Using Inverted Indices for Accelerating LINGO Calculations

Thomas Greve Kristensen, Jesper Nielsen, Christian Nørgaard Storm Pedersen

    Publication: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Journal article; Research; Peer-reviewed


    The ever-growing size of chemical databases calls for the
    development of novel methods for representing and comparing
    molecules. One such method, LINGO, is based on fragmenting the
    SMILES string representation of molecules. Comparison of molecules
    can then be performed by calculating the Tanimoto coefficient, which
    is called LINGOsim when used on LINGO multisets. This paper
    introduces a verbose representation for storing LINGO multisets
    which makes it possible to transform them into sparse fingerprints
    such that fingerprint data structures and algorithms can be used to
    accelerate queries. The previous best method for rapidly
    calculating the LINGOsim similarity matrix required specialised
    hardware to yield a significant speedup over existing methods. By
    representing LINGO multisets in the verbose representation and using
    inverted indices it is possible to calculate LINGOsim similarity
    matrices roughly 2.6 times faster than existing methods without
    relying on specialised hardware.
    Journal: Journal of Chemical Information and Modeling
    Pages (from-to): 597-600
    Number of pages: 4
    Status: Published - 18 Feb 2011
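
The pipeline outlined in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the fragment length q = 4, the multiset form of the Tanimoto coefficient (sum of minimum counts over sum of maximum counts), and the reading of the "verbose representation" as numbering repeated LINGOs so that a multiset becomes a plain set are all assumptions, since the abstract does not spell them out. All function names are hypothetical, and any SMILES preprocessing is omitted.

```python
from collections import Counter, defaultdict


def lingos(smiles, q=4):
    """Multiset of all overlapping length-q substrings of a SMILES string.

    q=4 is an assumed default; the abstract does not state the length.
    """
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingosim(a, b):
    """Multiset Tanimoto: sum of min counts divided by sum of max counts."""
    inter = sum((a & b).values())  # Counter & keeps the minimum count
    union = sum((a | b).values())  # Counter | keeps the maximum count
    return inter / union if union else 0.0


def verbose(multiset):
    """Assumed verbose form: the k-th copy of a LINGO becomes feature (lingo, k).

    The result is an ordinary set, i.e. a sparse fingerprint, whose set
    Tanimoto equals the multiset Tanimoto of the original counts.
    """
    return {(lingo, k) for lingo, n in multiset.items() for k in range(n)}


def build_index(fingerprints):
    """Inverted index: feature -> list of ids of molecules containing it."""
    index = defaultdict(list)
    for mol_id, fp in enumerate(fingerprints):
        for feature in fp:
            index[feature].append(mol_id)
    return index


def query(index, sizes, fp):
    """Similarity of fp to every indexed molecule sharing at least one feature.

    Only the posting lists of the query's own features are visited, so
    molecules with no feature in common are never touched (similarity 0).
    """
    shared = Counter()
    for feature in fp:
        for mol_id in index.get(feature, ()):
            shared[mol_id] += 1
    return {m: c / (len(fp) + sizes[m] - c) for m, c in shared.items()}
```

A full similarity matrix would then be obtained by building the index once over all fingerprints and calling `query` for each molecule in turn. The speedup reported in the abstract refers to the authors' own implementation, not to this sketch.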

