Ultra-high throughput processing of SRA data on AWS

About our project

Big-data analysis of RNA-sequencing data to gain insights for developing vaccines and drugs against the spread of the coronavirus.

Where does SARS-CoV-2 come from?

The novel coronavirus, SARS-CoV-2, is believed to have stemmed from a zoonotic transfer from bats and/or pangolins to humans at a wet market in Wuhan, China, in 2019.

Impact of the pandemic

The resulting pandemic will infect millions and has already crippled the global economy. While there is an intense research effort to sequence isolates and understand the evolution of the virus in real time, our understanding of where it originated is limited by the sparse characterization of other members of the Coronaviridae family.

What we are doing

We are re-analyzing all RNA-sequencing data in the NCBI Sequence Read Archive (SRA) to discover new members of Coronaviridae. Our initial focus is mammalian RNA-sequencing libraries, followed by avian libraries, then metagenomic libraries, and finally all 1.6M entries (~14 petabytes).
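To give a feel for that scale, here is a rough back-of-the-envelope sketch based only on the two figures quoted above (1.6M entries, ~14 petabytes). It assumes decimal (SI) petabytes. The function names and the 30-day deadline are hypothetical, chosen purely for illustration, and are not taken from the project.

```python
# Back-of-the-envelope scale estimate from the figures quoted in the text.
TOTAL_ENTRIES = 1_600_000      # ~1.6M SRA entries (from the text)
TOTAL_BYTES = 14 * 10**15      # ~14 petabytes; assumes decimal (SI) units

# Processing order described in the text, highest priority first.
PRIORITY = ["mammalian", "avian", "metagenomic", "all remaining entries"]


def average_entry_gb(total_bytes: int = TOTAL_BYTES,
                     entries: int = TOTAL_ENTRIES) -> float:
    """Mean size of one SRA entry in gigabytes."""
    return total_bytes / entries / 10**9


def required_throughput_gb_per_s(days: float,
                                 total_bytes: int = TOTAL_BYTES) -> float:
    """Sustained throughput (GB/s) needed to read everything in `days` days."""
    seconds = days * 86_400
    return total_bytes / seconds / 10**9


if __name__ == "__main__":
    print(f"Average entry size: {average_entry_gb():.2f} GB")
    # The 30-day deadline is a hypothetical target, not a project figure.
    print(f"Throughput for a 30-day pass: "
          f"{required_throughput_gb_per_s(30):.2f} GB/s")
```

Under these assumptions the average entry is about 8.75 GB, and a single pass over the whole archive within a hypothetical 30 days would need roughly 5.4 GB/s of sustained throughput. That is the motivation for working through the libraries in priority order.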

Why is this data so important?

The resulting phylogenetic tree will serve as a definitive characterization of Coronaviridae and give the global research effort the deepest evolutionary conservation dataset, offering critical insight into the origins and evolution of this scourge.

Website:

Potential high-impact findings:

  • Are there viruses more closely related to SARS-CoV-2 than those currently known, specifically strains capable of recombination (mixing) with SARS-CoV-2, which would hinder the ongoing vaccine effort?
  • What is the observed recombination rate between different CoV species, and how frequently is recombination expected to occur?
  • What are the animal/environmental reservoirs for SARS-CoV-2?
  • Is culling of a species indicated to slow the spread?
  • Can we identify evolutionarily conserved (and thus functional) regulatory motifs and/or RNA secondary structures across Coronaviridae?

How can you participate in this project?

We are open to any form of participation (check out our TODO list and GitHub Issues), as well as financial support, so that results reach researchers around the world quickly.



Visit the GitHub page to learn more


Built with: dockerfile, hcl, html, python, shell
