The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules have similar fingerprints. Molecular fingerprints can be used in an initial phase of identifying novel drug candidates, by screening large databases for molecules whose fingerprints are similar to a query fingerprint. In this paper, we present a method that efficiently finds all fingerprints in a database whose Tanimoto coefficient with the query fingerprint is above a user-defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid splits each fingerprint into k shorter bitstrings and uses these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and the similarity within each cluster to compute similar bounds. We have implemented our method and tested it on a large industrial data set. Our experiments show that our method yields a three-fold speed-up over previous methods.
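To make the idea of threshold search with similarity bounds concrete, the following is a minimal Python sketch. It is not the paper's kD grid or Multibit tree: it is a plain linear scan that skips the exact comparison whenever a cheap upper bound already falls below the threshold. Fingerprints are assumed to be stored as Python integers, and the function names (`tanimoto`, `split_counts`, `upper_bound`, `search`), the contiguous-chunk splitting, and the use of `>=` for the threshold test are all illustrative assumptions. The bound itself follows from the fact that, per chunk, the intersection has at most min(a_i, b_i) set bits and the union at least max(a_i, b_i).

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a, b):
    """Tanimoto coefficient |A AND B| / |A OR B| of two bitstrings."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # assumed convention for two all-zero fingerprints
    return popcount(a & b) / union


def split_counts(fp, n_bits, k):
    """Popcounts of k contiguous chunks of an n_bits-wide fingerprint."""
    chunk = -(-n_bits // k)  # ceiling division
    mask = (1 << chunk) - 1
    return [popcount((fp >> (i * chunk)) & mask) for i in range(k)]


def upper_bound(counts_a, counts_b):
    """Upper bound on the Tanimoto coefficient from per-chunk popcounts."""
    num = sum(min(x, y) for x, y in zip(counts_a, counts_b))
    den = sum(max(x, y) for x, y in zip(counts_a, counts_b))
    return 1.0 if den == 0 else num / den


def search(query, database, threshold, n_bits, k=2):
    """Return (fingerprint, similarity) pairs with similarity >= threshold."""
    query_counts = split_counts(query, n_bits, k)
    hits = []
    for fp in database:
        # Skip the exact computation when the bound rules the candidate out.
        if upper_bound(query_counts, split_counts(fp, n_bits, k)) < threshold:
            continue
        similarity = tanimoto(query, fp)
        if similarity >= threshold:
            hits.append((fp, similarity))
    return hits


if __name__ == "__main__":
    db = [0b11110000, 0b11100000, 0b00001111, 0b11111111]
    print(search(0b11110000, db, threshold=0.7, n_bits=8, k=2))
```

In a real screening setting the per-chunk counts would be precomputed once per database fingerprint rather than recomputed for every query, which is where an index structure such as those described above would come in.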
|Lecture Notes in Computer Science
|Published - 2009
|9th International Workshop, WABI 2009 - Philadelphia, USA
Duration: 12 Sep 2009 → 13 Sep 2009
Conference number: 9