Poseidon - Archaeogenetic Data Management
Archaeogenetics has become a rapidly accelerating field, with new data coming out faster than many individual researchers can keep track of and co-analyze. Counting samples currently being processed in the world's largest laboratories, we are now approaching genome-wide data for 10,000 ancient individuals. In addition, for many of those samples we also have rich metadata ranging from archaeological information to radiocarbon dating.
Currently, data is shared and published via academic papers. For genetic analyses at least, this mainly means releasing raw sequencing data into public repositories such as the ENA, while providing partial sample metadata in poorly formatted Excel tables in the supplement. This creates (at least) the following problems:
- Intermediate data such as genotypes are often not released at all, making it hard for others to reproduce analyses.
- The connection between individuals, contextual information, and genetic data becomes hard to maintain, because it has to bridge very different repositories and sources (Excel tables vs. personal homepages vs. public repositories).
- Meta-analyses spanning datasets require enormous amounts of work on data collection and curation.
Poseidon addresses these problems by providing i) a human- and machine-readable package format for genetic data together with archaeological and laboratory context information, ii) a set of tools to work with Poseidon-packages, and iii) a curated repository of publicly available Poseidon-packages.
You can find out more about Poseidon on the project's main webpage.