A tree based method for the rapid screening of chemical fingerprints

Thomas Greve Kristensen, Jesper Nielsen, Christian Nørgaard Storm Pedersen

Publication type: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Conference article; Research; Peer-reviewed


The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules will have similar fingerprints. Molecular fingerprints can be used in an initial phase of identifying novel drug candidates by screening large databases for molecules with fingerprints similar to a query fingerprint. In this paper, we present a method that efficiently finds all fingerprints in a database whose Tanimoto coefficient to the query fingerprint is above a user-defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid splits the fingerprints into k shorter bitstrings and utilises these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and the similarity within each cluster to compute similar bounds. We have implemented our method and tested it on a large industrial data set. Our experiments show that our method yields a three-fold speed-up over previous methods.
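To make the kD grid pruning idea concrete, here is a minimal Python sketch under stated assumptions: fingerprints are Python ints, the k parts are contiguous chunks of equal width, and the helper names (`part_counts`, `kd_upper_bound`, `screen`) are hypothetical. It uses the Tanimoto coefficient T(A, B) = c / (a + b - c), where c = |A ∩ B|, a = |A| and b = |B|. Because T increases with c, and the overlap inside part i can be at most min(a_i, b_i), substituting c_max = Σ min(a_i, b_i) yields an upper bound; any fingerprint whose bound is below the threshold can be skipped without computing T. This is not the authors' implementation: a real kD grid groups the database by count vectors so that whole cells are skipped at once, whereas this sketch scans linearly only to illustrate the bound.

```python
"""Sketch of Tanimoto screening with a kD-grid-style upper bound (hypothetical helpers)."""


def popcount(x: int) -> int:
    """Number of 1-bits in a non-negative int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient c / (|A| + |B| - c), where c = |A & B|."""
    c = popcount(a & b)
    union = popcount(a) + popcount(b) - c
    # Assumption: two all-zero fingerprints are scored 0.0.
    return c / union if union else 0.0


def part_counts(fp: int, n_bits: int, k: int) -> list[int]:
    """Split an n_bits fingerprint into k contiguous parts; return each part's 1-bit count."""
    size = -(-n_bits // k)  # ceiling division, so the k parts cover all n_bits bits
    mask = (1 << size) - 1
    return [popcount((fp >> (i * size)) & mask) for i in range(k)]


def kd_upper_bound(q_counts: list[int], d_counts: list[int]) -> float:
    """Upper bound on the Tanimoto coefficient using only per-part 1-bit counts.

    Part i of the intersection holds at most min(q_i, d_i) bits, and
    c / (a + b - c) increases with c, so plugging in the largest possible c
    can only overestimate the true coefficient.
    """
    c_max = sum(min(x, y) for x, y in zip(q_counts, d_counts))
    denom = sum(q_counts) + sum(d_counts) - c_max
    return c_max / denom if denom else 0.0


def screen(query: int, database: list[int], n_bits: int, k: int, threshold: float) -> list[int]:
    """Return every database fingerprint with Tanimoto >= threshold (linear scan)."""
    q_counts = part_counts(query, n_bits, k)
    # In a real index these count vectors would be precomputed once per fingerprint.
    indexed = [(fp, part_counts(fp, n_bits, k)) for fp in database]
    hits = []
    for fp, d_counts in indexed:
        if kd_upper_bound(q_counts, d_counts) < threshold:
            continue  # the bound proves this fingerprint cannot reach the threshold
        if tanimoto(query, fp) >= threshold:
            hits.append(fp)
    return hits


if __name__ == "__main__":
    query = 0b11110000
    database = [0b11110000, 0b00001111, 0b11100001]
    print(screen(query, database, n_bits=8, k=2, threshold=0.5))
```

For instance, with the query 0b11110000 split into k = 2 halves (counts [0, 4]), the entry 0b00001111 (counts [4, 0]) gets a bound of 0 and is discarded without computing its Tanimoto coefficient, while 0b11100001 (counts [1, 3]) gets a bound of 3/5 = 0.6, which in this case equals its true coefficient. The bound never prunes a true hit, so the result is exact; only the amount of work saved depends on how well the part counts separate the database.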
Book series: Lecture Notes in Computer Science
Pages (from-to): 194-205
Number of pages: 11
Status: Published - 2009
Event: 9th International Workshop, WABI 2009 - Philadelphia, USA
Duration: 12 Sep 2009 to 13 Sep 2009
Conference number: 9


Conference: 9th International Workshop, WABI 2009