The Cope-Chat approach previously described remains the crux of most chemoinformatic database systems. The implementation provides a mechanism to search on chemical (sub)structure or reaction scheme within the database.
How?
The pseudocode example below demonstrates how a simple database table could, in principle, be queried for the free base of strychnine if SQL were able to compare chemical structures directly.
SELECT quantity
FROM tblStores
WHERE quantity>65
AND buildingCode=110
AND chemicalStructure=[structure drawing of strychnine free base]
AND siUnit='g'
AND country='uk'
AND county='oxfordshire'
SQL, the language of the relational database, does not directly offer such rich or convenient functionality, however. The compound must first be preprocessed and its key attributes extracted or calculated. For strychnine, these details may include the molecular weight (334.42 g/mol), a fingerprint, and key functional groups or molecular characteristics such as a tertiary amine, an amide, the degree of unsaturation, etc. Apart from the molecular weight and the fingerprint, each of these characteristics could correspond to a numbered or labelled hole along the edges of a Cope-Chat card.
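The preprocessing step can be sketched in Python as follows. This is a minimal illustration, not how any particular product works: the atomic weights are the usual abridged values, the feature names and bit positions are hypothetical, and the functional groups are assigned by hand because detecting them automatically would need a chemistry toolkit.

```python
# Abridged standard atomic weights (g/mol).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Hypothetical "holes" on the card: one bit position per characteristic.
FEATURE_BITS = {"tertiaryAmine": 0, "amide": 1, "ester": 2, "weinreibAmide": 3}


def molecular_weight(formula_counts):
    """Sum atomic weights over an element -> atom count mapping."""
    total = sum(ATOMIC_WEIGHT[el] * n for el, n in formula_counts.items())
    return round(total, 2)


def punch_card(features):
    """Encode the features that are present as an integer bit mask."""
    mask = 0
    for name in features:
        mask |= 1 << FEATURE_BITS[name]
    return mask


# Strychnine free base, C21H22N2O2; the feature list is assigned by hand.
strychnine = {
    "molecularWeight": molecular_weight({"C": 21, "H": 22, "N": 2, "O": 2}),
    "card": punch_card(["tertiaryAmine", "amide"]),
}
print(strychnine)
```

Each set bit plays the role of one punched hole, so two compounds with the same functional groups produce the same mask regardless of the rest of their structure.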
For database queries attempting to retrieve information on a discrete chemical compound, the SQL query above could be rewritten:
SELECT something
FROM tableName
WHERE molecularWeight = 334.42
AND tertiaryAmine=1
AND ester=0
AND weinreibAmide=0
AND amide=1
AND .......
Such an SQL query may return a database table row containing information relevant to the chemical structure of strychnine. However, the result set may also contain rows for the many other chemical structures that have a molecular weight of 334.42 g/mol and match the other search predicates (e.g. the compound has a tertiary amine and no Weinreb amide functional group). Post-query processing is therefore required to ensure that the intermediate results truly match the original search criteria. This is generally a very CPU-intensive task, so the predicates above must reduce the candidate result set to a minimum.
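The two-stage search can be illustrated with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the table name tblCompounds, the structureKey column, and the rows other than strychnine are invented for the example, and a plain string comparison stands in for the expensive exact structure comparison. The molecular weight is matched within a small tolerance, since testing floating-point values for strict equality is fragile.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tblCompounds (
           name TEXT, structureKey TEXT, molecularWeight REAL,
           tertiaryAmine INTEGER, ester INTEGER,
           weinreibAmide INTEGER, amide INTEGER)"""
)
rows = [
    ("strychnine", "KEY-STRYCHNINE", 334.42, 1, 0, 0, 1),
    # Invented rows: one false positive and one obvious non-match.
    ("isomer A", "KEY-ISOMER-A", 334.42, 1, 0, 0, 1),
    ("compound B", "KEY-COMPOUND-B", 334.42, 0, 1, 0, 0),
]
conn.executemany("INSERT INTO tblCompounds VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Stage 1: cheap predicate filter evaluated by the database.
candidates = conn.execute(
    """SELECT name, structureKey
       FROM tblCompounds
       WHERE molecularWeight BETWEEN ? AND ?
         AND tertiaryAmine = 1 AND ester = 0
         AND weinreibAmide = 0 AND amide = 1
       ORDER BY name""",
    (334.415, 334.425),
).fetchall()


def is_exact_match(structure_key, query_key):
    """Stand-in for the expensive structure comparison."""
    return structure_key == query_key


# Stage 2: post-process the candidates to discard false positives.
hits = [name for name, key in candidates if is_exact_match(key, "KEY-STRYCHNINE")]
print(candidates)
print(hits)
```

Here the filter leaves two candidates and the post-processing step keeps one, which shows why tighter predicates mean less work in the second stage.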
For wildcard chemical database interrogation, such as substructure searching, the same type of query would be used. However, the post-processing step would have to interpret the substructure, open valencies, salts, R groups, etc.
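A substructure search can be sketched in the same two stages: a cheap fingerprint screen followed by a backtracking graph match. The sketch below is deliberately simplified and all of its names are hypothetical. Molecules are heavy-atom graphs only, bond orders are ignored, the "fingerprint" merely records which elements are present, and open valencies, salts and R groups are not handled at all.

```python
ELEMENT_BIT = {"C": 1, "N": 2, "O": 4}


def fingerprint(atoms):
    """Toy fingerprint: one bit per element present."""
    fp = 0
    for el in atoms:
        fp |= ELEMENT_BIT.get(el, 0)
    return fp


def may_contain(query_fp, mol_fp):
    """Screen: every bit set in the query must be set in the molecule."""
    return (query_fp & mol_fp) == query_fp


def adjacency(bonds, n_atoms):
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def has_substructure(q_atoms, q_bonds, m_atoms, m_bonds):
    """Backtracking search for a mapping of query atoms onto molecule atoms."""
    q_adj = adjacency(q_bonds, len(q_atoms))
    m_adj = adjacency(m_bonds, len(m_atoms))

    def extend(mapping):
        q = len(mapping)  # index of the next query atom to place
        if q == len(q_atoms):
            return True
        for m in range(len(m_atoms)):
            if m in mapping or m_atoms[m] != q_atoms[q]:
                continue
            # Bonds to already-placed query atoms must exist in the molecule.
            if all(mapping[p] in m_adj[m] for p in q_adj[q] if p < q):
                if extend(mapping + [m]):
                    return True
        return False

    return extend([])


library = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "dimethyl ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
    "ethylamine": (["C", "C", "N"], [(0, 1), (1, 2)]),
}
query_atoms, query_bonds = ["C", "C", "O"], [(0, 1), (1, 2)]  # C-C-O chain

screened = [name for name, (atoms, _) in library.items()
            if may_contain(fingerprint(query_atoms), fingerprint(atoms))]
hits = [name for name in screened
        if has_substructure(query_atoms, query_bonds, *library[name])]
print(screened)
print(hits)
```

Dimethyl ether passes the screen because it contains both carbon and oxygen, but it is rejected by the graph match because it has no carbon-carbon bond.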
Commercial off-the-shelf (COTS) chemoinformatics implementations that permit (sub)structure searching of databases containing chemical information usually encapsulate the pre-processing step (which generates the information used as the Cope-Chat holes) and the post-processing step (which ensures that the information retrieved truly matches the scientist's search query) within the database, using an extensible indexing mechanism. In IBM DB2 these database components are known as Extenders, in IBM Informix they are known as DataBlades, and in Oracle they are known as Data Cartridges. The extensible indexing implementation also hides the additional table(s) or data structures containing the search criteria extracted from the chemical moieties being indexed (i.e. the repository containing each moiety's molecular weight, fingerprints, and functional groups).
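The idea of pushing the checks into the database can be loosely illustrated with SQLite, which lets an application register its own SQL function. This is an analogy only: a user-defined function is not an extensible index, and it is not how Extenders, DataBlades or Data Cartridges are implemented. The table idxDescriptors, the function exact_match, the structure keys and the quantities are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblStores (structureKey TEXT, quantity REAL);
    -- Companion table that an extension would maintain out of sight.
    CREATE TABLE idxDescriptors (
        structureKey TEXT, molecularWeight REAL, amide INTEGER);
    INSERT INTO tblStores VALUES
        ('KEY-STRYCHNINE', 70.0), ('KEY-ISOMER-A', 90.0);
    INSERT INTO idxDescriptors VALUES
        ('KEY-STRYCHNINE', 334.42, 1), ('KEY-ISOMER-A', 334.42, 1);
    """
)


def exact_match(candidate_key, query_key):
    """Stand-in for the structure comparison done inside the database."""
    return 1 if candidate_key == query_key else 0


# Register the Python function so that SQL statements can call it by name.
conn.create_function("exact_match", 2, exact_match)

quantities = conn.execute(
    """SELECT s.quantity
       FROM tblStores AS s
       JOIN idxDescriptors AS d ON d.structureKey = s.structureKey
       WHERE d.molecularWeight BETWEEN 334.415 AND 334.425
         AND d.amide = 1
         AND exact_match(s.structureKey, ?) = 1""",
    ("KEY-STRYCHNINE",),
).fetchall()
print(quantities)
```

In a real extensible index the join to the descriptor table would also be hidden, so the user's query would mention only the stores table and the structure predicate.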
Further information on COTS chemical database systems implemented using technology centered around databases other than Oracle include: