A research group headed by Stanford University scientists has developed a scalable, high-throughput method that generates high-fidelity host whole-genome and HLA sequences, viral genomes, and a representation of the human transcriptome from single nasopharyngeal swabs collected from patients with coronavirus disease 2019 (COVID-19). The results are available on the medRxiv* preprint server.
The ongoing and highly disruptive COVID-19 pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has crippled health care systems around the world and resulted in tremendous morbidity and mortality.
As in past viral outbreaks, viral sequencing has been crucial, although it has been limited by high costs and low throughput. Moreover, the collection of associated host genomic data (which can aid in tracking familial relationships and appraising genetic risk) has been hampered by the need to collect multiple samples.
Therefore, there is a pressing need for protocols that can produce these data in real time and at scale. Such protocols would not only contribute significantly to infection tracking but also inform the further development of therapeutics.
In this study, researchers from Stanford University, the Chan Zuckerberg Biohub, the University of Lausanne, and Illumina Inc. described a method for simultaneous viral and host sequencing from the residual material of single SARS-CoV-2 diagnostic nasopharyngeal swabs.
Analyzing hundreds of samples simultaneously
The researchers used low-pass sequencing of the whole host genome as an alternative to array-based genotyping, providing rich information for trait mapping at scale. Diagnostic swab residuals can regularly yield DNA of adequate quality for host genome and HLA sequencing.
Furthermore, they presented a high-throughput RNA sequencing workflow that captures full viral genomes and human transcriptome reads from hundreds of samples at once.
Finally, the researchers described how this method could be used to create a robust multi-omic foundation for data integration and sharing across institutions worldwide. This is particularly relevant because global data repositories have been pivotal for advancing research both before and during the current pandemic.
Copious data from a single nasopharyngeal swab
Although nasopharyngeal swabs have been used in the past to perform whole-genome sequencing of respiratory viruses at low throughput, this method substantially increases both the speed of the process and the number of subjects that can be sequenced.
More specifically, the authors reported viral genome coverage comparable to that of earlier approaches, while processing at least ten times as many samples in a single sequencing run.
Also noteworthy is that the same nasopharyngeal swab can provide abundant human genomic data, and it often yields enough DNA for deep sequencing of HLA genes, which are a crucial component of the host immune response.
The rise of multi-omic data repositories
Arguably the most significant application of the proposed workflow is that it allows the rapid development of large-scale, multicenter, and even global host and viral multi-omic data repositories.
With this method, fewer than a hundred sequencing centers could, within weeks, produce as many viral genomes as have been submitted for SARS-CoV-2 since the start of the pandemic, along with matched host genome, transcriptome, and HLA typing data.
Such a workflow could serve as an essential scaffold for integrating these complex inputs into centralized data repositories, in turn enabling the rapid discovery and implementation needed to overcome the devastating COVID-19 pandemic.
medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice/health-related behavior, or treated as established information.