SNPFile - A software library and file format for large scale association mapping and population genetics studies

  • Jesper Nielsen
  • Thomas Mailund

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review


Abstract

Background
High-throughput genotyping technology has enabled cost-effective typing of thousands of individuals in hundreds of thousands of markers for use in genome-wide studies. This vast improvement in data acquisition technology makes it an informatics challenge to efficiently store and manipulate the data. While spreadsheets and flat text files were adequate solutions earlier, the increased data size mandates more efficient solutions.

Results
We describe a new binary file format for SNP data, together with a software library for file manipulation. The file format stores genotype data together with any kind of additional data, using a flexible serialisation mechanism. The format is designed to be IO efficient for the access patterns of most multi-locus analysis methods.
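As a rough illustration of why a binary layout can be IO efficient for multi-locus access, the sketch below stores a genotype matrix marker by marker and reads a window of neighbouring markers through a memory map. It is a toy format written for this example only, not the SNPFile format or its library API: the header layout, the `TOY1` magic value, the one-byte-per-genotype encoding, the function names `write_toy_file` and `read_window`, and the assumption that analyses scan consecutive markers are all hypothetical.

```python
import mmap
import struct
import tempfile

# Hypothetical header: magic bytes, number of individuals, number of markers.
HEADER = struct.Struct("<4sII")
MAGIC = b"TOY1"


def write_toy_file(path, genotypes):
    """Write genotypes[marker][individual] (values 0, 1 or 2), one byte each,
    marker by marker, so neighbouring markers are adjacent on disk."""
    n_markers = len(genotypes)
    n_individuals = len(genotypes[0]) if n_markers else 0
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, n_individuals, n_markers))
        for row in genotypes:
            f.write(bytes(row))


def read_window(path, first_marker, n_window):
    """Return genotypes for n_window consecutive markers using a memory map,
    so only the pages covering the requested byte range need to be read."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        magic, n_individuals, n_markers = HEADER.unpack_from(mm, 0)
        if magic != MAGIC:
            raise ValueError("not a toy genotype file")
        if first_marker < 0 or first_marker + n_window > n_markers:
            raise IndexError("window outside marker range")
        start = HEADER.size + first_marker * n_individuals
        block = mm[start:start + n_window * n_individuals]
    # Split the contiguous byte block back into one list per marker.
    return [list(block[i * n_individuals:(i + 1) * n_individuals])
            for i in range(n_window)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = f"{tmp}/toy.bin"
        # Four markers typed in three individuals.
        write_toy_file(path, [[0, 1, 2], [2, 2, 0], [1, 0, 0], [0, 0, 1]])
        print(read_window(path, 1, 2))  # [[2, 2, 0], [1, 0, 0]]
```

Because a window of adjacent markers maps to a single contiguous byte range in this layout, a sliding-window analysis touches the file sequentially instead of parsing whole text lines. A real format would also need to handle missing genotypes and the additional serialised data described above, which this sketch leaves out.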

Conclusion
The new file format has been very useful for our own studies where it has significantly reduced the informatics burden in keeping track of various secondary data, and where the memory and IO efficiency has greatly simplified analysis runs. A main limitation of the file format is that it is only supported by the very limited set of analysis tools developed in our own lab. This is somewhat alleviated by a scripting interface that makes it easy to write converters to and from the format.

Original language: English
Journal: BMC Bioinformatics
Volume: 9
Issue: 526
Pages (from-to): 1-11
Number of pages: 11
ISSN: 1471-2105
DOIs
Publication status: Published - 8 Dec 2008
