- The objective of a similarity search is to find suitable
starting materials for the synthesis of a target compound. For this
purpose, WODCA searches in a catalog of chemicals for compounds
having a defined structural similarity to the query structure. WODCA
contains forty different
criteria for defining molecular similarity. The similarity
definitions are based either on substructures such as the
largest carbon skeleton or on generalized chemical reactions like
oxidation or ozonolysis. Other similarity definitions combine both
types of similarity.
- The scheme in Figure 1 illustrates the principles of a
similarity search. The user applies a similarity criterion to
a target molecule and obtains a transformed target. This transformed
target is compared to each transformed catalog compound, which has
been precomputed for each similarity criterion during installation of
the catalog. If a transformed target molecule and a transformed
catalog compound are identical, then the original catalog compound
is considered as similar and is proposed by WODCA as a suitable
starting material for the synthesis of the target.
Figure 1: Principle of a similarity search
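The lookup principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not WODCA's implementation: structures are represented by SMILES-like strings, and the hypothetical function `drop_stereo` plays the role of a similarity criterion (roughly "identity without stereochemistry"). The names `build_index` and `similarity_search` are invented for this sketch.

```python
from collections import defaultdict

def drop_stereo(smiles):
    """Toy criterion (assumption): strip SMILES stereo markers (@, /, \\)."""
    return "".join(ch for ch in smiles if ch not in "@/\\")

def build_index(catalog, criterion):
    """Precompute the transformed form of every catalog compound."""
    index = defaultdict(list)
    for name, structure in catalog.items():
        index[criterion(structure)].append(name)
    return index

def similarity_search(target, index, criterion):
    """Transform the target and return catalog compounds whose
    transformed form is identical to the transformed target."""
    return index.get(criterion(target), [])

# Tiny stand-in for a catalog of chemicals.
catalog = {
    "L-alanine": "C[C@H](N)C(=O)O",
    "D-alanine": "C[C@@H](N)C(=O)O",
    "glycine": "NCC(=O)O",
}
index = build_index(catalog, drop_stereo)
hits = similarity_search("C[C@H](N)C(=O)O", index, drop_stereo)
```

Because the transformed catalog is computed once in advance, each search in this sketch reduces to transforming the target and performing a single dictionary lookup.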
This method is very fast. A similarity search in a catalog
with more than 16,000 compounds takes less than five seconds on a
Sun SPARCstation10. The scheme in Figure 2 gives some examples for
the effect of the application of a similarity criterion to a query
compound. The transformed query compounds are compared to each
transformed catalog compound to find suitable starting materials.
The scheme shows only a selection of the similarity criteria
to give an impression of the structural changes resulting from the
similarity definitions. Clearly, a similarity criterion is only
useful if it is suited to the query compound. It makes no sense, for
example, to search for starting materials for vitamin A with the
criterion "Aromatic System + Alpha Atoms", because
vitamin A does not contain an aromatic ring system.
Figure 2: Selection of examples for the effect of
a similarity criterion on a query compound
- Presently, 40 different similarity criteria have been
implemented in WODCA. They range from rather narrowly focussed
criteria to quite general ones that have a broad scope. Some
criteria are based on the skeleton of a given structure and are
therefore called "topological" criteria. They often
correspond to the results of substructure searches. Other criteria
are based on generalized reactions such as substitution or oxidation
reactions; structures that can be interconverted by
substitution or oxidation reactions, respectively, are taken as similar.
In the following, the various similarity criteria are arranged into
groups that are based either on general structural features or on
generalized reactions. They are arranged roughly in order of
increasing specificity, starting with criteria that have quite a
broad scope and ending with quite narrow reaction types.
Before doing so, however, a highly specific similarity
criterion is given first: the identity check, which tests whether a chosen
structure is already contained in the catalog of available starting
materials, either considering stereochemical features or ignoring them.
In the following, examples of structures that are considered as
similar will be given. Furthermore, the transformed structure on
which the recognition of similarity is based will be indicated. The
transformed structure of a given query can be indicated in a
separate window (see the next section: WODCA's
"Search for Similar Compounds" Window).
A. Identity Check
- 1. Identity without stereochemistry
Stereochemistry is not considered. Structures that have the same
constitution are considered to be identical.
- 2. Identity with stereochemistry
Only structures that also correspond to each other in their
stereochemical features are considered to be identical.
- 3. Largest Molecule
In cases where the query structure consists of several molecules or
fragments (such as salts consisting of a cation and an anion),
only the largest entity will be taken for an identity check.
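The "largest molecule" selection can be illustrated with a small graph sketch. It assumes that "largest" means the fragment with the most atoms (the manual does not state the measure) and that a structure is given as a set of atom indices plus a bond list; the function names are hypothetical.

```python
def connected_components(atoms, bonds):
    """Split atom indices into connected fragments (depth-first search)."""
    neighbors = {a: set() for a in atoms}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, fragments = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, fragment = [start], set()
        while stack:
            atom = stack.pop()
            if atom not in fragment:
                fragment.add(atom)
                stack.extend(neighbors[atom] - fragment)
        seen |= fragment
        fragments.append(fragment)
    return fragments

def largest_fragment(atoms, bonds):
    """Assumption: size is measured by the number of atoms."""
    return max(connected_components(atoms, bonds), key=len)

# Sodium acetate, heavy atoms only: Na+ (0) and the acetate anion (1..4).
atoms = {0: "Na", 1: "C", 2: "C", 3: "O", 4: "O"}
bonds = [(1, 2), (2, 3), (2, 4)]
kept = largest_fragment(atoms, bonds)
```

In this example only the acetate fragment would be passed on to the identity check.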
B. Ring Systems and Carbon Skeletons
- These criteria only consider the carbon skeleton and ring
systems including heteroatoms in rings. The site of exo-substitution
by heteroatoms is not considered.
4. Carbon Skeleton
Molecules that share the same (largest) carbon skeleton are
considered as similar.
- 5. Reduced Carbon Skeleton
Structures are considered as similar after multiple bonds
(incorporated in the carbon skeleton) have been completely reduced.
- 6. (Largest) Ring System
Structures that have the same largest ring system are considered as
similar. Note that the definition of largest ring system is based on
the number of atoms a cyclic structure has; this includes the ring
atoms and the atoms attached to them. Thus, a benzene ring is
counted with 12 atoms, whereas cyclopentane counts with 15 atoms.
- 7. Rings with Skeleton
Structures that have the same ring system and correspond in the
number and site of carbon atoms in the substituents are taken as similar.
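The size rule quoted for criterion 6 (ring atoms plus the atoms attached to them) can be checked with a short sketch. The helper `monocycle` is a hypothetical convenience that builds a simple carbocycle with explicit hydrogens; it is not part of WODCA.

```python
def ring_system_size(ring_atoms, neighbors):
    """Count the ring atoms plus all atoms directly attached to them."""
    counted = set(ring_atoms)
    for atom in ring_atoms:
        counted.update(neighbors[atom])
    return len(counted)

def monocycle(n_ring, h_per_carbon):
    """Build a carbocycle with explicit hydrogens.

    Returns (ring_atoms, neighbors), where neighbors maps each atom
    index to the set of atom indices bonded to it.
    """
    ring = list(range(n_ring))
    neighbors = {i: {(i - 1) % n_ring, (i + 1) % n_ring} for i in ring}
    next_id = n_ring
    for i in ring:
        for _ in range(h_per_carbon):
            neighbors[i].add(next_id)
            neighbors[next_id] = {i}
            next_id += 1
    return ring, neighbors

benzene = ring_system_size(*monocycle(6, 1))       # 6 C + 6 H
cyclopentane = ring_system_size(*monocycle(5, 2))  # 5 C + 10 H
```

Under this counting the saturated five-membered ring is "larger" than benzene, which matches the numbers given in the text.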
C. Ring Systems and Substitution Pattern
- These criteria only consider ring systems and the position of
substituents at those rings. No distinction of the identity of the
substituents, whether they consist of heteroatoms or carbon atoms, is
made. All substituents are converted into chlorine atoms for the
8. Ring + Substitution Positions
Molecules that have the same ring system and substituents at the
same positions are considered as similar.
- 9. Reduced Ring + Substitution Positions
Here the same criterion as in the previous case is applied, but, in
addition, unsaturated rings are completely reduced into their
D. Carbon Skeleton and Substitution Pattern
- These criteria consider the (largest) carbon skeleton (including
ring systems) and the site of substitution by heteroatoms. No
distinction is made between the identity of heteroatoms. The first
criterion (10) allows only a limited interchange of substituents.
10. Element and ZH Exchange
All halogen atoms and OH, SH, and NH2 groups are
considered equivalent.
- All other criteria allow an interchange between all heteroatoms,
generalizing all heteroatoms into chlorine atoms. Different criteria
can be invoked depending on how heteroatoms in aromatic systems
(AR), atoms directly bonded to aromatic rings (A1), carbon-carbon
multiple bonds (CCMB), bond orders to substituents (BO), and multiple
substitution by heteroatoms (MU) are handled.
The following choices can be made:
a. AR: Heteroatoms in aromatic rings are considered as
substituents (+AR) or not (-AR). (This feature is presently not
b. A1: Heteroatoms directly attached to aromatic rings are
generalized (+A1) or not (-A1).
- c. CCMB: Carbon-carbon multiple bonds count as carrying
substituents (+CCMB) or not (-CCMB).
- d. BO: The bond order to substituents is considered (+BO) or
not (-BO).
- e. MU: Multiple substitution by heteroatoms is counted (+MU)
or not (-MU).
- Different combinations of these options give rise to a
series of similarity criteria of substitution patterns. The options
are presented on a separate panel (see Figure 3), which can be used to
select among these possibilities. The panel is
automatically displayed if WODCA is in novice mode (see Options
menu). Note that not every combination of parameters is allowed.
Figure 3: Help panel for the selection of a
similarity criterion of the type »substitution pattern«
11. Substitution Pattern (-AR+A1-CCMB-BO-MU)
- 12. Substitution Pattern (-AR+A1+CCMB-BO-MU)
- 13. Substitution Pattern (-AR+A1-CCMB-BO+MU)
- 14. Substitution Pattern (-AR+A1+CCMB-BO+MU)
- 15. Substitution Pattern (-AR+A1-CCMB+BO+MU)
- 16. Substitution Pattern (-AR+A1+CCMB+BO+MU)
- 17. Substitution Pattern (-AR-A1-CCMB-BO-MU)
- 18. Substitution Pattern (-AR-A1+CCMB-BO-MU)
- 19. Substitution Pattern (-AR-A1-CCMB-BO+MU)
- 20. Substitution Pattern (-AR-A1+CCMB-BO+MU)
- 21. Substitution Pattern (-AR-A1-CCMB+BO+MU)
- 22. Substitution Pattern (-AR-A1+CCMB+BO+MU)
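The relationship between the five options and criteria 11 to 22 can be written down as a lookup table. The option strings below are copied from the list above; the `Options` type and the functions `parse` and `criterion_for` are hypothetical names used only for this sketch.

```python
import re
from typing import NamedTuple, Optional

class Options(NamedTuple):
    ar: bool    # heteroatoms in aromatic rings count as substituents
    a1: bool    # heteroatoms attached to aromatic rings are generalized
    ccmb: bool  # C-C multiple bonds count as carrying substituents
    bo: bool    # bond order to substituents is considered
    mu: bool    # multiple substitution by heteroatoms is counted

def parse(code: str) -> Options:
    """Parse a string such as '-AR+A1-CCMB-BO-MU' into an Options tuple."""
    pairs = re.findall(r"([+-])(AR|A1|CCMB|BO|MU)", code)
    flags = {name: sign == "+" for sign, name in pairs}
    return Options(flags["AR"], flags["A1"], flags["CCMB"],
                   flags["BO"], flags["MU"])

# Criterion number -> option string, as listed in the manual.
CRITERIA = {
    11: "-AR+A1-CCMB-BO-MU", 12: "-AR+A1+CCMB-BO-MU",
    13: "-AR+A1-CCMB-BO+MU", 14: "-AR+A1+CCMB-BO+MU",
    15: "-AR+A1-CCMB+BO+MU", 16: "-AR+A1+CCMB+BO+MU",
    17: "-AR-A1-CCMB-BO-MU", 18: "-AR-A1+CCMB-BO-MU",
    19: "-AR-A1-CCMB-BO+MU", 20: "-AR-A1+CCMB-BO+MU",
    21: "-AR-A1-CCMB+BO+MU", 22: "-AR-A1+CCMB+BO+MU",
}
LOOKUP = {parse(code): number for number, code in CRITERIA.items()}

def criterion_for(options: Options) -> Optional[int]:
    """Return the criterion number, or None if the combination is not offered."""
    return LOOKUP.get(options)
```

For example, `criterion_for(parse("-AR-A1+CCMB+BO+MU"))` gives 22, while a combination such as +BO together with -MU does not occur in the list and yields `None`.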
E. Ring Systems and Carbon Skeleton Including α-Atoms
- These criteria include the heteroatoms directly bonded (α-atoms)
to ring systems or to the carbon skeleton in the search query. All
atoms beyond these α-atoms are
replaced by hydrogen atoms.
23. Carbon Skeleton + Alpha Atoms
- 24. Carbon Skeleton + Alpha Atoms (Keep Aromates)
- 25. Aromatic System + Alpha Atoms
- The similarity criteria in sections D and E already introduced
generalized reactions like substitution and also, in a more indirect
manner, reduction and oxidation reactions.
In other words, compounds that can be interconverted by such
reactions are considered as similar. The following groups of
similarity criteria address generalized reactions such as
hydrolysis, reduction, oxidation, elimination, and even ozonolysis
in an explicit manner.
F. Hydrolysis Reactions
- The following bonds are allowed to be hydrolyzed:
26. Hydrolysis (No Alpha Atoms)
- 27. Hydrolysis (With Alpha Atoms)
G. Reduction Reactions
- In the following, functional groups are reduced to their lowest
oxidation state. No carbon-heteroatom bonds will be cleaved.
28. Maximum Reduction
- 29. Maximum Reduction (No C-Aromates)
- 30. Maximum Reduction (No Aromates)
- 31. Carbon Skeleton with Alpha Atoms and Maximum Reduction
H. Oxidation Reactions
- These similarity criteria oxidize functional groups to their
maximum oxidation state; no carbon-heteroatom bonds will be broken.
Observe that the similarity criteria of this group quite often find
the same compounds as the criteria of the previous
section (G. Reduction Reactions). The reason is that functional
groups of different oxidation states will lead to the same (highly
oxidized) groups on oxidation or the same (highly reduced) groups
on reduction. Differences might occur with carbon-carbon double
bonds and ring systems. Whereas criteria 28 and 29 reduce aromatic rings
to their saturated form, saturated rings will not be oxidized to
aromatic rings. In the same manner, carbon-carbon double bonds are
reduced to single bonds by the criteria in section G, while they
are not oxidized by the following criteria.
32. Maximum Oxidation
- 33. Carbon Skeleton with Alpha Atoms and Max. Oxidation
- 34. Hydrolysis and Oxidation (No Alpha Atoms)
- 35. Hydrolysis and Oxidation (With Alpha Atoms)
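The overlap between the reduction and oxidation criteria can be illustrated with a deliberately simplified model. The two tables below are assumptions made for this sketch (a single oxygen-based oxidation ladder for a primary carbon, plus one alkene/alkane pair); they are not WODCA's actual transformation rules.

```python
# Hypothetical end points of "maximum reduction" for a few group labels.
REDUCE = {
    "primary alcohol": "primary alcohol",
    "aldehyde": "primary alcohol",
    "carboxylic acid": "primary alcohol",
    "alkene": "alkane",   # C=C is reduced to a single bond
    "alkane": "alkane",
}

# Hypothetical end points of "maximum oxidation" for the same labels.
OXIDIZE = {
    "primary alcohol": "carboxylic acid",
    "aldehyde": "carboxylic acid",
    "carboxylic acid": "carboxylic acid",
    "alkene": "alkene",   # C=C is left untouched by oxidation
    "alkane": "alkane",
}

def same_class(group_a, group_b, table):
    """Two groups count as similar if they transform to the same end point."""
    return table[group_a] == table[group_b]

# Groups on the same oxidation ladder agree under both criteria ...
both_agree = (same_class("aldehyde", "carboxylic acid", REDUCE)
              and same_class("aldehyde", "carboxylic acid", OXIDIZE))
# ... whereas a double bond is only matched to a single bond by reduction.
only_reduction = (same_class("alkene", "alkane", REDUCE)
                  and not same_class("alkene", "alkane", OXIDIZE))
```

In this toy model an aldehyde and a carboxylic acid are grouped together by either criterion, while an alkene and the corresponding alkane are grouped only by the reduction criterion.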
I. Elimination Reactions
- Structures that give the same product in an elimination reaction
are considered similar.
36. Elimination
1,4- and 1,2-eliminations are performed with free substituents (halogens,
chalcogens, and elements of the nitrogen group with no further
substituents). Decarboxylation of carboxylic acid groups is also
performed where possible (e.g.
- 37. Elimination and Substitution Pattern (-AR+A1+CCMB-BO-MU)
After the structural changes of the previous criterion have been
performed, each carbon-carbon double bond as well as each substituent
is replaced by a chlorine atom.
- 38. Carbon Skeleton with Alpha Atoms and Elimination
- Structures that will give the same product on ozonolysis are
considered as similar.
39. Ozonolysis
Non-aromatic carbon-carbon double bonds are broken and converted
into aldehyde functions. If more than one fragment results, only the
largest structure is considered.
- 40. Ozonolysis and Substitution Pattern (-AR+A1-CCMB-BO-MU)
After performing the transformation of the previous criterion, each
substituent is replaced by a chlorine atom.
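The ozonolysis transformation described in this group (cleave non-aromatic C=C bonds, cap each carbon with a C=O group, keep the largest fragment) can be sketched on a small molecular graph. The data layout, the function name `ozonolysis`, and the use of atom count as the measure of "largest" are assumptions of this sketch; hydrogens are implicit, so the sketch does not distinguish aldehydes from ketones.

```python
def ozonolysis(atoms, bonds, aromatic=frozenset()):
    """Cleave non-aromatic C=C bonds and return the largest fragment.

    atoms:    dict atom index -> element symbol (heavy atoms only)
    bonds:    dict frozenset({a, b}) -> bond order
    aromatic: set of bonds (frozensets) that must not be cleaved
    """
    atoms = dict(atoms)          # work on a copy
    cleaved = {}
    next_id = max(atoms) + 1
    for pair, order in bonds.items():
        a, b = tuple(pair)
        if order == 2 and atoms[a] == atoms[b] == "C" and pair not in aromatic:
            # Replace C=C by two C=O groups, one on each carbon.
            for carbon in (a, b):
                atoms[next_id] = "O"
                cleaved[frozenset((carbon, next_id))] = 2
                next_id += 1
        else:
            cleaved[pair] = order

    # Collect connected fragments of the cleaved structure.
    neighbors = {i: set() for i in atoms}
    for pair in cleaved:
        a, b = tuple(pair)
        neighbors[a].add(b)
        neighbors[b].add(a)
    fragments, seen = [], set()
    for start in atoms:
        if start in seen:
            continue
        stack, fragment = [start], set()
        while stack:
            atom = stack.pop()
            if atom not in fragment:
                fragment.add(atom)
                stack.extend(neighbors[atom] - fragment)
        seen |= fragment
        fragments.append(fragment)

    largest = max(fragments, key=len)   # assumption: size = atom count
    return {i: atoms[i] for i in sorted(largest)}

# 1-Pentene, heavy atoms only: C0=C1-C2-C3-C4
pentene_atoms = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C"}
pentene_bonds = {
    frozenset((0, 1)): 2, frozenset((1, 2)): 1,
    frozenset((2, 3)): 1, frozenset((3, 4)): 1,
}
product = ozonolysis(pentene_atoms, pentene_bonds)
```

For 1-pentene the sketch discards the one-carbon fragment and keeps the four-carbon carbonyl fragment, which would then serve as the transformed query.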
- If the user clicks with the left mouse button on the option
Similar Compounds... in the searches
menu, the window for the application of a similarity
search appears (Figure 4):
Figure 4: Window for the application of a
similarity criterion to a target compound
If you are currently reading the online manual of WODCA you can
use the hyperlinks on the picture in Figure 4. Just click with
the left mouse button on a window element of your interest and
follow the link to obtain directly an explanation. Use the back
button of your WWW browser to get back to this place.
- The similarity search window is shown in Figure 4. It
contains different window elements which are now explained.
The left hand area is called Query
(transformed). It displays the modified chemical structure
of a query that has been transformed by application of a similarity
criterion. The structure display for transformed queries
contains a context menu with the title Query Actions, which appears when
the right mouse button is pressed. The context menu
option Recalculate Coordinates is only useful in some cases:
Sometimes the structure display Query
(transformed) shows bad coordinates for the current
compound. After clicking on this menu entry, WODCA recalculates the
2D-coordinates of the current ensemble and displays the correct
Above the list
box of similarity criteria there are two switches to reduce
the number of similarity criteria presented. The first switch sets the
range to broad, intermediate, narrow or
unspecified. The second switch sets the focus
of the similarity search to functionality, skeleton,
rings, aromaticity or unspecified. Both
switches have only an effect on the number of similarity criteria
Below the list box there are three buttons:
The Search button starts a similarity search with the selected
similarity criterion. If it is pressed, the structure display is
updated with the transformed query and the catalog icon in the WODCA
main window will display the number of hits found. To view a
match list it is necessary to export
it to the CACTVS browser.
button does not start a similarity search, but displays the
transformed query for the chosen similarity criterion.
button shuts the window.
- Before a similarity search can be applied, a query
compound has to be defined and a catalog of chemicals must have been
loaded into WODCA. If a query compound is already saved in a CTX
structure file, it can be loaded into WODCA with the help of the
file menu. Otherwise it can be
exported from the CACTVS editor to WODCA. The file
menu is also useful to load a catalog of chemicals into
WODCA. Whether or not a catalog of chemicals is already loaded into WODCA
is indicated by the Catalog
icon in WODCA's
The next step is to click with the left mouse button on the entry
Similar Compounds... in WODCA's searches
menu. This opens the window
for the application of similarity searches. A
similarity criterion has to be selected from the list
box of similarity criteria. To start a similarity search
press the Search button in the lower part of the window with
the left mouse button. This makes WODCA search the catalog of
chemicals for similar compounds which could be suitable starting
materials for the synthesis of the target compound. If a similarity
search was successful, WODCA lists the compounds found in the
WODCA console and the
Match List icon in
WODCA's Information Area
indicates the number of hits found which are stored in a match list.
To view a match list, it has to be exported to the CACTVS match list
browser, which must first be opened from the Tools
menu. The export can be achieved in different ways. It is possible
to drag the match list icon with the left mouse button to the CACTVS
browser window. In this case, the match list is added to the
contents of the CACTVS match list browser. Alternatively, the match
list can be exported to the CACTVS match list browser via the Export
menu. In this case, the contents of the CACTVS match list
browser will be overwritten.
The picture in Figure 5 shows the
result of a similarity search:
Figure 5: CACTVS browser during the display of a match list
The match list contains 5 different compounds obtained from the
Acros catalog with the similarity search »Ring +
Substitution Positions«. The first three hits are displayed in
the CACTVS browser (see Figure 5).
The query structure was 2-methoxycarbonyl-3-phenyl-cyclopentanone
which is shown in Tutorial A.
- Last change: 2000-06-30