The same four chemical building blocks behind almost all life on Earth could one day be used to replace traditional computer storage. Genomics England is using a relational database to power the data science behind its ambitious 100,000 Genomes Project. The organisation, owned by the UK's Department of Health and Social Care, runs the project, which is sequencing 100,000 whole genomes from patients with rare diseases and their families, as well as from patients with common cancers. The project has now reached its halfway point, with over 50,000 genomes sequenced. By the end of 2018, the 100,000 Genomes Project will be complete, with more than 20 petabytes of data stored on the project's infrastructure.
The UCSC Genome Browser is a powerful tool that provides an open portal to the world's genomic information. But because it's so powerful, it is often difficult to use. Here is an authoritative book that will get you on the air, not only with the reference data, but also with the means to compare it with your own data. This is a terrific book.
'The book would suit a bioinformatician wishing to gain an introduction into genome database querying and interaction.' Microbiology Today '... provides a step-by-step account of how most commonly-used databases are compiled and updated, their applications and practical examples of how to use them. It is suitable for graduates and advanced undergraduates in bioinformatics or biology, or any researcher intent on exploiting the capabilities of databases as research tools more fully.'
Although data-sharing is crucial for making the best use of genetic data in diagnosing disease, many individuals who might donate data are concerned about privacy. Jagadeesh et al. describe a solution that combines a protocol from modern cryptography with frequency-based clinical genetics used to diagnose causal disease mutations in patients with monogenic disorders. This framework correctly identified the causal gene in cases involving actual patients, while protecting more than 99% of individual participants' most private variants.
At the moment, getting a genome sequenced is a lengthy process. After the customer purchases a kit and provides a sample, the sequencing itself, the base calling, and the analysis are all managed by a third party. This works brilliantly for research and healthcare systems, but leaves genomic consumers with very little control over how their data is managed. For the vast majority of people, the technology and the expertise to take control of their data are out of reach. This model could be about to change, with the open access publication of the 'CliveOme', the genome of Clive Brown, CTO of British biotech company Oxford Nanopore.