A tree based method for the rapid screening of chemical fingerprints

Thomas Greve Kristensen, Jesper Nielsen, Christian Nørgaard Storm Pedersen

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Conference article › Research › peer-review

Abstract

The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules will have similar fingerprints. Molecular fingerprints can be used in an initial phase for identifying novel drug candidates by screening large databases for molecules with fingerprints similar to a query fingerprint. In this paper, we present a method which efficiently finds all fingerprints in a database with Tanimoto coefficient to the query fingerprint above a user defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid is based on splitting the fingerprints into k shorter bitstrings and utilising these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and similarity within each cluster to compute similar bounds. We have implemented our method and tested it on a large data set from the industry. Our experiments show that our method yields a three-fold speed-up over previous methods.
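The screening task described in the abstract can be illustrated with a minimal sketch. This is not the kD grid or the Multibit tree, whose details the abstract does not give. It is a plain linear scan that prunes with the well-known bit-count bound on the Tanimoto coefficient, min(|A|, |B|) / max(|A|, |B|). The function names, the use of Python integers as bitstrings, the convention that two all-zero fingerprints have similarity 1.0, and the inclusive threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer used as a bitstring."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A and B| / |A or B| of two fingerprints."""
    union = popcount(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return popcount(a & b) / union


def bitcount_upper_bound(a_bits: int, b_bits: int) -> float:
    """Upper bound on the Tanimoto coefficient from the bit counts alone."""
    larger = max(a_bits, b_bits)
    if larger == 0:
        return 1.0
    return min(a_bits, b_bits) / larger


def screen(query: int, database: List[int], threshold: float) -> List[Tuple[int, float]]:
    """Return (index, similarity) for every fingerprint scoring >= threshold."""
    query_bits = popcount(query)
    hits = []
    for index, fingerprint in enumerate(database):
        # Skip the exact computation when the bound already rules out a hit.
        if bitcount_upper_bound(query_bits, popcount(fingerprint)) < threshold:
            continue
        similarity = tanimoto(query, fingerprint)
        if similarity >= threshold:
            hits.append((index, similarity))
    return hits
```

Calling `screen(0b1111, [0b1111, 0b0111, 0b0001], 0.7)` would compare the query against three toy 4-bit fingerprints; the third is rejected by the bound without computing its exact similarity. The data structures named in the abstract aim to compute tighter bounds than this one so that more of the database can be skipped.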
Original language: English
Book series: Lecture Notes in Computer Science
Volume: 5724
Pages (from-to): 194-205
Number of pages: 11
ISSN: 0302-9743
Publication status: Published - 2009
Event: 9th International Workshop, WABI 2009 - Philadelphia, United States
Duration: 12 Sept 2009 to 13 Sept 2009
Conference number: 9

Conference

Conference: 9th International Workshop, WABI 2009
Number: 9
Country/Territory: United States
City: Philadelphia
Period: 12/09/2009 to 13/09/2009
