
Similarity Searches

What is a Similarity Search?

The objective of a similarity search is to find suitable starting materials for the synthesis of a target compound. For this purpose, WODCA searches a catalog of chemicals for compounds that have a defined structural similarity to the query structure. WODCA contains forty different criteria for defining molecular similarity. The similarity definitions are based either on substructures, such as the largest carbon skeleton, or on generalized chemical reactions, such as oxidation or ozonolysis. Other similarity definitions combine both types of criteria.

Principle of a Similarity Search

The scheme in Figure 1 illustrates the principle of a similarity search. The user applies a similarity criterion to a target molecule and obtains a transformed target. This transformed target is compared with each transformed catalog compound; the transformed catalog compounds are precomputed for every similarity criterion when the catalog is installed. If the transformed target and a transformed catalog compound are identical, the original catalog compound is considered similar, and WODCA proposes it as a suitable starting material for the synthesis of the target.

Figure 1: Principle of a similarity search
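The lookup principle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not WODCA's implementation: molecules are assumed to be canonical SMILES strings written by one canonicalizer, and the stand-in criterion `strip_stereo` (a hypothetical helper) merely deletes stereo marks, roughly in the spirit of the identity check without stereochemistry listed below. Because every catalog entry is transformed once and stored in a dictionary keyed by its transformed form, each search reduces to a single hash lookup, which suggests why such searches can be fast.

```python
import re
from collections import defaultdict


def strip_stereo(smiles: str) -> str:
    """Toy criterion (hypothetical): drop the SMILES stereo marks @, / and \\.

    Only meaningful if every SMILES string comes from the same canonicalizer,
    so that identical constitutions give identical text.
    """
    return re.sub(r"[@/\\]", "", smiles)


def build_index(catalog, transform):
    """Transform every catalog entry once (the 'installation' step)."""
    index = defaultdict(list)
    for compound_id, smiles in catalog.items():
        index[transform(smiles)].append(compound_id)
    return index


def similarity_search(query, index, transform):
    """Transform the query and return all catalog IDs with the same key."""
    return index.get(transform(query), [])


catalog = {
    "A1": "C[C@H](N)C(=O)O",   # L-alanine
    "A2": "C[C@@H](N)C(=O)O",  # D-alanine
    "A3": "NCC(=O)O",          # glycine
}
index = build_index(catalog, strip_stereo)
print(similarity_search("C[C@H](N)C(=O)O", index, strip_stereo))  # ['A1', 'A2']
```

Swapping in a different `transform` function changes the similarity criterion without touching the search itself, which mirrors the separation between criterion and lookup in Figure 1.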

This method is very fast: a similarity search in a catalog of more than 16,000 compounds takes less than five seconds on a Sun SPARCstation 10. The scheme in Figure 2 gives some examples of the effect of applying a similarity criterion to a query compound. The transformed query compounds are compared with each transformed catalog compound to find suitable starting materials. The scheme shows only a selection of the similarity criteria, to give an impression of the structural changes they produce. A similarity criterion is, of course, only useful if it suits the query compound. For example, it makes no sense to search for starting materials for vitamin A with the criterion »Aromatic System + Alpha Atoms«, because vitamin A does not contain an aromatic ring system.

Figure 2: Selection of examples for the effect of a similarity criterion on a query compound

List of WODCA's Similarity Criteria

Presently, 40 different similarity criteria have been implemented in WODCA. They range from narrowly focused criteria to quite general ones with a broad scope. Some criteria are based on the skeleton of a given structure and are therefore called "topological" criteria; they often correspond to the results of substructure searches. Other criteria are based on generalized reactions such as substitution or oxidation; they treat structures as similar if they can be interconverted by substitution or oxidation reactions, respectively.
In the following, the similarity criteria are arranged into groups based either on general structural features or on generalized reactions. They are ordered roughly by increasing specificity, starting with criteria that have quite a broad scope and ending with quite narrow reaction types.
Before that, however, a highly specific similarity criterion is given: the identity check, which tests whether a chosen structure is already contained in the catalog of available starting materials, either taking stereochemical features into account or neglecting them.
For each criterion, examples of structures that are considered similar are given, together with the transformed structure on which the recognition of similarity is based. The transformed structure of a given query can be displayed in a separate window (see the next section, WODCA's »Search for Similar Compounds« Window).

A. Identity Check

1. Identity without stereochemistry
Stereochemistry is not considered. Structures that have the same constitution are considered to be identical.

2. Identity with stereochemistry
Only structures that also correspond to each other in their stereochemical features are considered to be identical.

3. Largest Molecule
In cases where the query structure consists of several molecules or fragments (such as salts consisting of a cation and an anion), only the largest entity is used for the identity check.
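Selecting the largest entity of a multi-component query could look like the sketch below. It assumes the query is given as a SMILES string whose components are separated by dots (WODCA itself works with CACTVS structure files), and it measures size with a crude regular-expression atom count; `atom_count` and `largest_component` are hypothetical helper names.

```python
import re

# Crude SMILES atom counter: a bracket atom such as [Na+] counts once;
# otherwise match organic-subset symbols, trying Br and Cl before B and C.
ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_count(fragment: str) -> int:
    """Number of explicitly written atoms (implicit hydrogens are ignored)."""
    return len(ATOM.findall(fragment))


def largest_component(smiles: str) -> str:
    """Keep only the dot-separated component with the most atoms."""
    return max(smiles.split("."), key=atom_count)


print(largest_component("O=C([O-])c1ccccc1.[Na+]"))  # prints O=C([O-])c1ccccc1
print(largest_component("CCN.Cl"))                   # prints CCN
```

The identity check would then compare only this largest component with the catalog entries.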

B. Ring Systems and Carbon Skeletons

These criteria only consider the carbon skeleton and ring systems including heteroatoms in rings. The site of exo-substitution by heteroatoms is not considered.

4. Carbon Skeleton
Molecules that share the same (largest) carbon skeleton are considered as similar.

5. Reduced Carbon Skeleton
Structures are considered similar after the multiple bonds in their carbon skeletons have been completely reduced.
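Criteria 4 and 5 can be pictured as two successive graph transformations. The sketch assumes a hypothetical representation (a dictionary of atom elements and a dictionary of bond orders keyed by sorted index pairs); it extracts the largest connected set of carbon atoms together with its C-C bonds, and then sets all of those bond orders to 1. Comparing the results directly works here only because both example molecules use the same atom numbering; a real implementation would need a canonical form.

```python
def largest_carbon_skeleton(elements, bonds):
    """Criterion 4 sketch: largest connected carbon subgraph with its C-C bonds.

    elements: {atom_index: element}; bonds: {(i, j): bond_order} with i < j.
    Returns (frozenset of skeleton atoms, {bond: order} within the skeleton).
    """
    carbons = sorted(a for a, e in elements.items() if e == "C")
    adj = {a: set() for a in carbons}
    for i, j in bonds:
        if elements[i] == "C" and elements[j] == "C":
            adj[i].add(j)
            adj[j].add(i)
    seen, best = set(), set()
    for start in carbons:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # depth-first search over C-C bonds only
            a = stack.pop()
            if a not in comp:
                comp.add(a)
                stack.extend(adj[a] - comp)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    skeleton_bonds = {b: o for b, o in bonds.items() if b[0] in best and b[1] in best}
    return frozenset(best), skeleton_bonds


def reduced_skeleton(skeleton):
    """Criterion 5 sketch: the same skeleton with every bond order set to 1."""
    atoms, skeleton_bonds = skeleton
    return atoms, {b: 1 for b in skeleton_bonds}


# but-1-ene (C0=C1-C2-C3) and butan-1-ol (C0-C1-C2-C3, O4 on C0)
butene = ({0: "C", 1: "C", 2: "C", 3: "C"}, {(0, 1): 2, (1, 2): 1, (2, 3): 1})
butanol = ({0: "C", 1: "C", 2: "C", 3: "C", 4: "O"},
           {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 4): 1})

print(largest_carbon_skeleton(*butene) == largest_carbon_skeleton(*butanol))  # False
print(reduced_skeleton(largest_carbon_skeleton(*butene))
      == reduced_skeleton(largest_carbon_skeleton(*butanol)))  # True
```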

6. (Largest) Ring System
Structures that have the same largest ring system are considered similar. Note that the size of a ring system is defined by the number of atoms it comprises: the ring atoms plus all atoms attached to them. Thus, a benzene ring counts as 12 atoms (six carbon and six hydrogen atoms), whereas cyclopentane counts as 15 atoms (five carbon and ten hydrogen atoms).
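The size rule can be checked with a small calculation. In this sketch the ring atoms are assumed to be known already (ring perception is out of scope), hydrogens are stored as implicit counts per atom, and `ring_system_size` is a hypothetical helper: it counts the ring atoms, the distinct non-ring atoms bonded to them, and the hydrogens on the ring atoms.

```python
def ring_system_size(ring_atoms, bonds, implicit_h):
    """Ring atoms + distinct atoms attached to them, hydrogens included.

    ring_atoms: indices of the ring system (assumed already perceived);
    bonds: iterable of (i, j) pairs; implicit_h: {atom_index: hydrogen count}.
    """
    ring = set(ring_atoms)
    attached = set()
    for i, j in bonds:
        if i in ring and j not in ring:
            attached.add(j)
        elif j in ring and i not in ring:
            attached.add(i)
    hydrogens = sum(implicit_h.get(a, 0) for a in ring)
    return len(ring) + len(attached) + hydrogens


def cycle(n):
    """Bonds of a simple n-membered ring 0-1-...-(n-1)-0."""
    return [(i, (i + 1) % n) for i in range(n)]


benzene = ring_system_size(range(6), cycle(6), {i: 1 for i in range(6)})
cyclopentane = ring_system_size(range(5), cycle(5), {i: 2 for i in range(5)})
# toluene: methyl carbon 6 sits on ring atom 0, which therefore carries no H
toluene = ring_system_size(range(6), cycle(6) + [(0, 6)],
                           {0: 0, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1})
print(benzene, cyclopentane, toluene)  # 12 15 12
```

By this rule a saturated five-membered ring outranks an aromatic six-membered one, which is the point the note above makes.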

7. Rings with Skeleton
Structures that have the same ring system and correspond in the number and site of carbon atoms in the substituents are taken as similar.

C. Ring Systems and Substitution Pattern

These criteria consider only ring systems and the positions of substituents on those rings. No distinction is made as to the identity of the substituents, whether they consist of heteroatoms or carbon atoms: all substituents are converted into chlorine atoms for the search query.

8. Ring + Substitution Positions
Molecules that have the same ring system and substituents at the same positions are considered as similar.

9. Reduced Ring + Substitution Positions
The same criterion as in the previous case is applied, but, in addition, unsaturated rings are completely reduced to their saturated forms.
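For a single monocyclic carbocycle, criteria 8 and 9 can be sketched as follows. Every substituent counts as one chlorine, so only the number of substituents per ring position matters; the pattern is made independent of where the numbering starts by taking the smallest tuple over all rotations and reflections of the ring. The representation and helper names are hypothetical, and fused or heteroatom-containing ring systems are not handled.

```python
def canonical_pattern(counts):
    """Smallest tuple over all rotations and reflections of a ring."""
    n = len(counts)
    variants = []
    for seq in (list(counts), list(reversed(counts))):
        for k in range(n):
            variants.append(tuple(seq[k:] + seq[:k]))
    return min(variants)


def ring_substitution_key(ring_size, aromatic, substituent_counts, reduce_ring=False):
    """Key for criterion 8 (reduce_ring=False) or criterion 9 (reduce_ring=True).

    substituent_counts[i]: number of non-hydrogen substituents on the i-th
    ring atom, listed in ring order; each substituent stands for one Cl.
    """
    unsaturated = aromatic and not reduce_ring
    return ring_size, unsaturated, canonical_pattern(substituent_counts)


o_dichlorobenzene = ring_substitution_key(6, True, [1, 1, 0, 0, 0, 0])
o_cresol = ring_substitution_key(6, True, [0, 1, 1, 0, 0, 0])  # 2-methylphenol
m_xylene = ring_substitution_key(6, True, [1, 0, 1, 0, 0, 0])
dimethylcyclohexane = ring_substitution_key(6, False, [1, 1, 0, 0, 0, 0])

print(o_dichlorobenzene == o_cresol)             # True: both ortho-disubstituted
print(o_dichlorobenzene == m_xylene)             # False: ortho vs. meta
print(o_dichlorobenzene == dimethylcyclohexane)  # False under criterion 8
print(ring_substitution_key(6, True, [1, 1, 0, 0, 0, 0], reduce_ring=True)
      == dimethylcyclohexane)                    # True under criterion 9
```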

D. Carbon Skeleton and Substitution Pattern

These criteria consider the (largest) carbon skeleton (including ring systems) and the sites of substitution by heteroatoms; no distinction is made between different heteroatoms. The first criterion (10) allows only a limited interchange of substituents.

10. Element and ZH Exchange
All halogen atoms and the OH, SH and NH2 groups are considered equivalent.
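The equivalence classes of criterion 10 amount to a simple relabeling step. In the sketch below, the carbon skeleton is assumed to be given as an already canonical label and the substituent positions are assumed to be numbered consistently; `zh_exchange_key` is a hypothetical helper, and "X" is an arbitrary placeholder for the interchangeable groups.

```python
# Groups that criterion 10 treats as interchangeable (taken from the text above).
ZH_EQUIVALENT = {"F", "Cl", "Br", "I", "OH", "SH", "NH2"}


def zh_exchange_key(skeleton, substituents):
    """Hypothetical similarity key for criterion 10.

    skeleton: a label for the carbon skeleton, assumed already canonical;
    substituents: {skeleton_position: group}, positions numbered consistently.
    Groups outside ZH_EQUIVALENT (e.g. "NO2") are kept unchanged.
    """
    generalized = frozenset(
        (position, "X" if group in ZH_EQUIVALENT else group)
        for position, group in substituents.items()
    )
    return skeleton, generalized


chlorotoluene = zh_exchange_key("toluene", {4: "Cl"})   # 4-chlorotoluene
cresol = zh_exchange_key("toluene", {4: "OH"})          # 4-methylphenol
nitrotoluene = zh_exchange_key("toluene", {4: "NO2"})   # 4-nitrotoluene
print(chlorotoluene == cresol, chlorotoluene == nitrotoluene)  # True False
```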

All other criteria allow an interchange between all heteroatoms, generalizing every heteroatom into a chlorine atom. Different criteria can be invoked depending on how heteroatoms in aromatic rings (AR), heteroatoms directly bonded to aromatic rings (A1), carbon-carbon multiple bonds (CCMB), bond orders to substituents (BO) and multiple substitution by heteroatoms (MU) are handled.

The following choices can be made:
a. AR: Heteroatoms in aromatic rings are considered as substituents (+AR) or not (-AR). (This feature is presently not implemented.)

b. A1: Heteroatoms directly attached to aromatic rings are generalized (+A1) or not (-A1).

c. CCMB: Carbon-carbon multiple bonds count as carrying substituents (+CCMB) or not (-CCMB).

d. BO: The bond order to substituents is considered (+BO) or not (-BO).

e. MU: Multiple substitution by heteroatoms is counted (+MU) or not (-MU).

Different combinations of these options give rise to a series of similarity criteria for substitution patterns. A separate panel (see Figure 3) shows the available combinations and can be used to select among them. The panel is displayed automatically if WODCA is in novice mode (see Options menu). Note that not every combination of parameters is allowed.

Figure 3: Help panel for the selection of a similarity criterion of the type »substitution pattern«

11. Substitution Pattern (-AR+A1-CCMB-BO-MU)

12. Substitution Pattern (-AR+A1+CCMB-BO-MU)

13. Substitution Pattern (-AR+A1-CCMB-BO+MU)

14. Substitution Pattern (-AR+A1+CCMB-BO+MU)

15. Substitution Pattern (-AR+A1-CCMB+BO+MU)

16. Substitution Pattern (-AR+A1+CCMB+BO+MU)

17. Substitution Pattern (-AR-A1-CCMB-BO-MU)

18. Substitution Pattern (-AR-A1+CCMB-BO-MU)

19. Substitution Pattern (-AR-A1-CCMB-BO+MU)

20. Substitution Pattern (-AR-A1+CCMB-BO+MU)

21. Substitution Pattern (-AR-A1-CCMB+BO+MU)

22. Substitution Pattern (-AR-A1+CCMB+BO+MU)
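The twelve option strings above map one-to-one onto criteria 11 to 22. The sketch below builds that lookup table directly from the list; the helper names are hypothetical. In the listed criteria, AR is always off and +BO occurs only together with +MU, which the table reflects; combinations outside the list are rejected rather than guessed.

```python
import re

OPTIONS = ("AR", "A1", "CCMB", "BO", "MU")

# Criteria 11-22 in the order listed above.
LISTED = [
    "-AR+A1-CCMB-BO-MU", "-AR+A1+CCMB-BO-MU", "-AR+A1-CCMB-BO+MU",
    "-AR+A1+CCMB-BO+MU", "-AR+A1-CCMB+BO+MU", "-AR+A1+CCMB+BO+MU",
    "-AR-A1-CCMB-BO-MU", "-AR-A1+CCMB-BO-MU", "-AR-A1-CCMB-BO+MU",
    "-AR-A1+CCMB-BO+MU", "-AR-A1-CCMB+BO+MU", "-AR-A1+CCMB+BO+MU",
]


def parse_options(spec):
    """'-AR+A1-CCMB-BO-MU' -> (False, True, False, False, False) in OPTIONS order."""
    flags = {name: sign == "+" for sign, name in re.findall(r"([+-])([A-Z0-9]+)", spec)}
    if set(flags) != set(OPTIONS):
        raise ValueError(f"expected the options {OPTIONS}: {spec!r}")
    return tuple(flags[name] for name in OPTIONS)


CRITERION = {parse_options(spec): number for number, spec in enumerate(LISTED, start=11)}


def criterion_number(*, AR, A1, CCMB, BO, MU):
    """Return the criterion number for an allowed option combination."""
    key = (AR, A1, CCMB, BO, MU)
    if key not in CRITERION:
        raise ValueError(f"combination not offered: {key}")
    return CRITERION[key]


print(criterion_number(AR=False, A1=True, CCMB=False, BO=True, MU=True))  # 15
```

Keyword-only arguments make it impossible to pass the five switches in the wrong order.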

E. Ring Systems and Carbon Skeleton Including α-Atoms

These criteria include in the search query the heteroatoms directly bonded to ring systems or to the carbon skeleton (α-atoms). All atoms beyond these α-atoms are replaced by hydrogen atoms.

23. Carbon Skeleton + Alpha Atoms

24. Carbon Skeleton + Alpha Atoms (Keep Aromates)

25. Aromatic System + Alpha Atoms
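Criterion 23 can be sketched as "largest carbon skeleton plus its α-atoms": everything beyond the α-atoms is simply dropped, which corresponds to replacing it by hydrogen. The graph representation and the helper name are hypothetical, and the equality check at the end works only because both example esters share the same atom numbering.

```python
def skeleton_with_alpha_atoms(elements, bonds):
    """Criterion 23 sketch: largest carbon skeleton plus directly bonded heteroatoms.

    elements: {atom_index: element}; bonds: {(i, j): order} with i < j.
    Atoms beyond the alpha atoms are dropped (implicitly replaced by H).
    """
    carbons = sorted(a for a, e in elements.items() if e == "C")
    adj = {a: set() for a in elements}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    seen, skeleton = set(), set()
    for start in carbons:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # walk only through carbon neighbours
            a = stack.pop()
            if a not in comp:
                comp.add(a)
                stack.extend(n for n in adj[a] if elements[n] == "C" and n not in comp)
        seen |= comp
        if len(comp) > len(skeleton):
            skeleton = comp
    alpha = {n for a in skeleton for n in adj[a] if elements[n] != "C"}
    kept = skeleton | alpha
    return ({a: elements[a] for a in sorted(kept)},
            {b: o for b, o in bonds.items() if b[0] in kept and b[1] in kept})


# methyl propanoate: C0-C1-C2(=O3)-O4-C5; ethyl propanoate adds C6 on C5
methyl = ({0: "C", 1: "C", 2: "C", 3: "O", 4: "O", 5: "C"},
          {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1, (4, 5): 1})
ethyl = ({**methyl[0], 6: "C"}, {**methyl[1], (5, 6): 1})
print(skeleton_with_alpha_atoms(*methyl) == skeleton_with_alpha_atoms(*ethyl))  # True
```

Both esters reduce to the propanoic acid pattern, so they would be found as similar by this criterion.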

The similarity criteria in sections D and E already introduced generalized reactions such as substitution and also, in a more indirect manner, reduction and oxidation reactions.
In other words, compounds that can be interconverted by such reactions are considered similar. The following groups of similarity criteria address generalized reactions such as hydrolysis, reduction, oxidation, elimination and even ozonolysis explicitly.

F. Hydrolysis Reactions

The following bonds are allowed to be hydrolyzed:

26. Hydrolysis (No Alpha Atoms)

27. Hydrolysis (With Alpha Atoms)

G. Reduction Reactions

These criteria reduce functional groups to their lowest oxidation state. No carbon-heteroatom bonds are cleaved.

28. Maximum Reduction

29. Maximum Reduction (No C-Aromates)

30. Maximum Reduction (No Aromates)

31. Carbon Skeleton with Alpha Atoms and Maximum Reduction

H. Oxidation Reactions

These similarity criteria oxidize functional groups to their maximum oxidation state; no carbon-heteroatom bonds are broken. Note that the criteria of this group quite often find the same compounds as similar as the criteria of the previous section (G. Reduction Reactions). The reason is that functional groups of different oxidation states lead to the same, highly oxidized groups on oxidation, or to the same, highly reduced groups on reduction. Differences can occur with carbon-carbon double bonds and ring systems: criteria 28 and 29 reduce aromatic rings to their saturated forms, whereas the oxidation criteria do not oxidize saturated rings to aromatic rings. Likewise, the criteria in section G reduce carbon-carbon double bonds to single bonds, whereas the following criteria do not oxidize them.

32. Maximum Oxidation

33. Carbon Skeleton with Alpha Atoms and Max. Oxidation

34. Hydrolysis and Oxidation (No Alpha Atoms)

35. Hydrolysis and Oxidation (With Alpha Atoms)
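Why sections G and H often agree, and where they differ, can be illustrated with a toy model in which a molecule is reduced to a bag of functional-group labels. The oxidation ladders below are hypothetical and strongly simplified, not WODCA's rule set; following the text, the reduction criteria saturate C=C bonds while the oxidation criteria leave them unchanged.

```python
# Hypothetical, strongly simplified ladders (not WODCA's actual rules): each
# tuple lists group classes from lowest to highest oxidation state.
LADDERS = [
    ("primary alcohol", "aldehyde", "carboxylic acid"),
    ("secondary alcohol", "ketone"),
]
ALKENE, ALKANE = "C=C double bond", "C-C single bond"


def _ladder(group):
    """Return the ladder containing group, or None."""
    for ladder in LADDERS:
        if group in ladder:
            return ladder
    return None


def max_reduction(groups):
    """Section G sketch: lowest rung of each ladder; C=C becomes C-C."""
    out = []
    for group in groups:
        ladder = _ladder(group)
        if ladder:
            out.append(ladder[0])
        else:
            out.append(ALKANE if group == ALKENE else group)
    return tuple(sorted(out))


def max_oxidation(groups):
    """Section H sketch: highest rung of each ladder; C=C and C-C are left as they are."""
    return tuple(sorted(_ladder(g)[-1] if _ladder(g) else g for g in groups))


unsaturated_aldehyde = ("aldehyde", ALKENE)
saturated_alcohol = ("primary alcohol", ALKANE)
print(max_reduction(unsaturated_aldehyde) == max_reduction(saturated_alcohol))  # True
print(max_oxidation(unsaturated_aldehyde) == max_oxidation(saturated_alcohol))  # False
```

Alcohols, aldehydes and acids collapse to one class under either direction; only the C=C bond makes the two directions disagree, matching the explanation above.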

I. Elimination Reactions

Structures that give the same product in an elimination reaction are considered similar.

36. Elimination
1,4- and 1,2-eliminations are performed with free substituents (halogens, chalcogens and elements of the nitrogen group that carry no further substituents). Decarboxylation of carboxylic acid groups is also performed where possible (e.g., in β-ketocarboxylic acids).

37. Elimination and Substitution Pattern (-AR+A1+CCMB-BO-MU)
After the structural changes of the previous criterion have been performed, each carbon-carbon double bond and each substituent is replaced by a chlorine atom.

38. Carbon Skeleton with Alpha Atoms and Elimination
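The 1,2-elimination part of criterion 36 can be sketched on a small molecular graph: a free substituent (a singly bonded halogen, chalcogen or nitrogen-group atom with no other neighbour) is removed together with a hydrogen from an adjacent carbon, and the C-C bond becomes a double bond. This is a minimal sketch, not WODCA's procedure: 1,4-eliminations and decarboxylation are omitted, the graph representation is hypothetical, and the regiochemistry rule (the lowest-numbered eligible carbon loses the hydrogen) is an arbitrary assumption.

```python
# Halogens, chalcogens and nitrogen-group elements, as named in the text.
FREE_SUBSTITUENT_ELEMENTS = {"F", "Cl", "Br", "I", "O", "S", "Se", "Te", "N", "P", "As"}


def _bond(i, j):
    return (min(i, j), max(i, j))


def _neighbors(bonds, atom):
    return [j if i == atom else i for (i, j) in bonds if atom in (i, j)]


def eliminate_12(elements, hydrogens, bonds):
    """Repeat H-X 1,2-eliminations of free substituents until none is left.

    elements: {atom: element}; hydrogens: {atom: implicit H count};
    bonds: {(i, j): order} with i < j. The inputs are not modified.
    Assumption: when several carbons could lose H, the lowest index wins.
    """
    elements, hydrogens, bonds = dict(elements), dict(hydrogens), dict(bonds)
    changed = True
    while changed:
        changed = False
        for x in sorted(elements):
            nbrs = _neighbors(bonds, x)
            # free substituent: heteroatom singly bonded to exactly one carbon
            if (elements[x] not in FREE_SUBSTITUENT_ELEMENTS or len(nbrs) != 1
                    or elements[nbrs[0]] != "C" or bonds[_bond(x, nbrs[0])] != 1):
                continue
            a = nbrs[0]
            for b in sorted(_neighbors(bonds, a)):
                if (elements[b] == "C" and hydrogens.get(b, 0) > 0
                        and bonds[_bond(a, b)] == 1):
                    del bonds[_bond(a, x)]
                    del elements[x]
                    hydrogens.pop(x, None)
                    hydrogens[b] -= 1
                    bonds[_bond(a, b)] = 2
                    changed = True
                    break
            if changed:
                break
    return elements, hydrogens, bonds


# 2-chlorobutane and butan-2-ol: C0-C1(X4)-C2-C3
chlorobutane = ({0: "C", 1: "C", 2: "C", 3: "C", 4: "Cl"},
                {0: 3, 1: 1, 2: 2, 3: 3, 4: 0},
                {(0, 1): 1, (1, 2): 1, (2, 3): 1, (1, 4): 1})
butan2ol = ({0: "C", 1: "C", 2: "C", 3: "C", 4: "O"},
            {0: 3, 1: 1, 2: 2, 3: 3, 4: 1},
            {(0, 1): 1, (1, 2): 1, (2, 3): 1, (1, 4): 1})
print(eliminate_12(*chlorobutane) == eliminate_12(*butan2ol))  # True
```

The bond-order check keeps carbonyl oxygens and nitrile nitrogens from being treated as free substituents.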

K. Ozonolysis

Structures that will give the same product on ozonolysis are considered as similar.

39. Ozonolysis
Non-aromatic carbon-carbon double bonds are cleaved and converted into aldehyde functions. If more than one fragment results, only the largest one is considered.

40. Ozonolysis and Substitution Pattern (-AR+A1-CCMB-BO-MU)
After the transformation of the previous criterion has been performed, each substituent is replaced by a chlorine atom.
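Criterion 39 can be sketched as a graph edit followed by a fragment selection. The sketch below, using a hypothetical representation and helper names, cuts every C=C bond not listed as aromatic (aromaticity is not perceived here, so the caller must supply those bonds), attaches a doubly bonded oxygen to each former alkene carbon, and keeps the largest fragment. For the example alkenes this yields aldehydes; the comparison uses a crude invariant that is explicitly not a canonical form.

```python
from collections import Counter


def ozonolysis(elements, bonds, aromatic_bonds=frozenset()):
    """Criterion 39 sketch: cleave non-aromatic C=C, add C=O on both carbons.

    elements: {atom_index: element}; bonds: {(i, j): order} with i < j;
    aromatic_bonds: bonds to leave alone (supplied by the caller).
    Returns (elements, bonds) of the largest resulting fragment.
    """
    elements, bonds = dict(elements), dict(bonds)
    next_atom = max(elements) + 1
    for (i, j), order in list(bonds.items()):
        if (order == 2 and elements[i] == "C" and elements[j] == "C"
                and (i, j) not in aromatic_bonds):
            del bonds[(i, j)]
            for carbon in (i, j):
                elements[next_atom] = "O"
                bonds[(carbon, next_atom)] = 2
                next_atom += 1
    adj = {a: set() for a in elements}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    seen, fragments = set(), []
    for start in sorted(elements):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # depth-first search for one fragment
            a = stack.pop()
            if a not in comp:
                comp.add(a)
                stack.extend(adj[a] - comp)
        seen |= comp
        fragments.append(comp)
    largest = max(fragments, key=len)
    return ({a: elements[a] for a in largest},
            {b: o for b, o in bonds.items() if b[0] in largest})


def crude_invariant(elements, bonds):
    """Element counts plus bond-type counts; NOT a canonical form."""
    bond_types = Counter((tuple(sorted((elements[i], elements[j]))), order)
                         for (i, j), order in bonds.items())
    return Counter(elements.values()), bond_types


def chain_bonds(n):
    """Single bonds of an unbranched chain 0-1-...-(n-1)."""
    return {(i, i + 1): 1 for i in range(n - 1)}


pentene = ({i: "C" for i in range(5)}, {**chain_bonds(5), (0, 1): 2})  # pent-1-ene
hexene = ({i: "C" for i in range(6)}, {**chain_bonds(6), (1, 2): 2})   # hex-2-ene
print(crude_invariant(*ozonolysis(*pentene))
      == crude_invariant(*ozonolysis(*hexene)))  # True: both keep butanal
```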

WODCA's »Search for Similar Compounds« Window

If the user clicks the option Similar Compounds... in the searches menu with the left mouse button, the window for applying a similarity search appears (Figure 4):

Window elements in Figure 4: the button for closing the window, the button for transforming a query, the button for applying a similarity search, the list box for selecting a similarity criterion, the switch for setting the focus, the switch for setting the similarity range, and the structure display for transformed queries.

Figure 4: Window for the application of a similarity criterion to a target compound

Info

If you are reading the online version of the WODCA manual, you can use the hyperlinks on the picture in Figure 4: click a window element of interest with the left mouse button and follow the link to obtain an explanation directly. Use the back button of your web browser to return here.

The similarity search window is shown in Figure 4. It contains different window elements which are now explained.

How to Search for Starting Materials in a Catalog of Chemicals

Before a similarity search can be applied, a query compound has to be defined and a catalog of chemicals must be loaded into WODCA. If the query compound is already saved in a CTX structure file, it can be loaded into WODCA via the file menu; otherwise, it can be exported from the CACTVS editor to WODCA. The file menu is also used to load a catalog of chemicals into WODCA. The Catalog icon in WODCA's Information Area indicates whether a catalog of chemicals has already been loaded.
Next, click the entry Similar Compounds... in WODCA's searches menu with the left mouse button. This opens the window for applying similarity searches. Select a similarity criterion from the list box of similarity criteria, then start the search by clicking the Search button in the lower part of the window with the left mouse button. WODCA then searches the catalog of chemicals for similar compounds that could be suitable starting materials for the synthesis of the target compound. If the search is successful, WODCA lists the compounds found in the WODCA console, and the Match List icon in WODCA's Information Area shows the number of hits, which are stored in a match list.
To view a match list, export it to the CACTVS match list browser, which must first be opened via the Tools menu. The export can be done in two ways. Dragging the match list icon with the left mouse button onto the CACTVS browser window adds the match list to the current contents of the CACTVS match list browser. Alternatively, exporting via the Export menu overwrites the contents of the CACTVS match list browser.
The picture in Figure 5 shows the result of a similarity search:

Figure 5: CACTVS browser during the display of a match list

The match list contains five compounds from the Acros catalog, found with the similarity criterion »Ring + Substitution Positions«. The first three hits are displayed in the CACTVS browser (see Figure 5). The query structure was 2-methoxycarbonyl-3-phenylcyclopentanone, which is shown in Tutorial A.


Last change: 2000-06-30
Webmaster: matthias.pfoertner@chemie.uni-erlangen.de