Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is the causative agent of Coronavirus Disease 2019 (COVID-19). Since its first detection in December 2019, the disease has engulfed almost the entire world, spreading to more than 100 countries and causing more than 352,294 deaths as of 25 May 2020. This highly infectious virus spreads via respiratory droplets and aerosols when an uninfected person comes into contact with an infected one. With no drug or vaccine in sight, the world is slowly succumbing to the disease. Researchers around the world have therefore started collaborating and sharing their research data so that, with concerted effort, a cure can be developed quickly. In this challenging scenario, bioinformatics has emerged as one of the essential tools for analyzing viral data: it provides vital information about the genetic makeup of the virus and directly assists in the development of drugs and vaccines against the disease. The COVID-19 pandemic is far from over, and research is under way worldwide on effective diagnostic methods, treatments, and preventive vaccines. We wanted to automate the overall process, so we built a solution in our free time.
We took this as a challenge more than an opportunity and developed a bioinformatics application, “Biozene”, for researchers, scientists, and anyone else working with the DNA sequences of the virus.
What it does
Biozene is a bioinformatics application for computational biology that performs basic to advanced tasks in a short amount of time. It was developed with COVID-19 in mind, to help researchers, scientists, and anyone else working to fight the pandemic. It is a data analysis application for integrated, interactive genomic analytics that can compute and compare millions of COVID-19 DNA sequences. Biozene can help scientists decode the genome of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19. This approach could help in drug identification and vaccine development.
Features of Biozene:
1. Represent and analyze DNA sequences
2. View DNA Features
3. Protein Synthesis Analysis (Transcription, Translation, Complement, Amino Acid generation, etc.)
4. Generate Genome Diagrams
5. Mutation Rate Modeling
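To make the sequence operations named in the feature list (complement, transcription, translation) concrete, here is a minimal, dependency-free sketch. It is not Biozene's actual code, which builds on Biopython; the function names, the `to_stop` parameter, and the way the codon table is built are assumptions made for this illustration. The table itself is the standard genetic code in the conventional TCAG ordering.

```python
# Illustrative sketch only: plain-Python versions of basic sequence operations.
# All names below are hypothetical and are not taken from Biozene.

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Return the base-by-base complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)


def reverse_complement(dna):
    """Return the complement read in the opposite direction."""
    return complement(dna)[::-1]


def transcribe(dna):
    """Transcribe a coding-strand DNA string into mRNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(dna, to_stop=False):
    """Translate DNA into one-letter amino acids, codon by codon.

    Trailing bases that do not form a full codon are ignored; codons that
    contain letters other than A, C, G, T are reported as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


def gc_content(dna):
    """Return the fraction of G and C bases (0.0 for an empty string)."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0


if __name__ == "__main__":
    demo = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # short made-up sequence
    print(transcribe(demo))
    print(translate(demo))                # MAIVMGR*KGAR*
    print(translate(demo, to_stop=True))  # MAIVMGR
```

In practice a library such as Biopython handles ambiguous bases and alternative codon tables, which is one reason to prefer it over hand-rolled helpers like these.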
Benefits of Biozene:
1. Helps researchers and scientists
2. Saves hours of work
3. Free to use and supports all kinds of devices
How we built it
Biozene is a responsive SaaS application that supports all kinds of devices. The application runs in the cloud and performs tasks in real time. We used an end-to-end pipeline structure to build the solution: from data mining to deployment, we relied on several tools, frameworks, libraries, and languages. The core language behind Biozene is Python, with Tornado as the server engine. Below are the tools and technologies used to develop the app:
Data Analysis and Visualization: Pandas, NumPy, Matplotlib, Seaborn, etc.
Core Language: Python.
Main Libraries: Biopython and scikit-learn.
Cloud Technology and Server: AWS, Tornado, and Streamlit.
Additional Tools: Git, Anaconda, Colab, Instamojo, PuTTY, etc.
We gathered much of our information about bioinformatics tools and algorithms from two sources: NCBI and the University of Edinburgh. Biozene currently supports the FASTA and GenBank files that are available on NCBI.
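For readers unfamiliar with FASTA, the format is simple: each record starts with a `>` header line, followed by one or more lines of sequence. The sketch below shows one way such a file could be read in plain Python. It is an assumption-laden illustration, not Biozene's loader (a Biopython-based app would more likely use its `SeqIO` module); `parse_fasta` and `SAMPLE` are hypothetical names, and the sample records are made up rather than real SARS-CoV-2 data. GenBank files carry richer annotations and are not covered here.

```python
from io import StringIO

# Made-up records used only to exercise the parser.
SAMPLE = """\
>demo_record_1 made-up example
ATGGCC
ATTGTA

>demo_record_2 another made-up example
ggccgc
"""


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    Sequence lines belonging to one record are joined and upper-cased;
    blank lines and any text before the first '>' header are skipped.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    for name, sequence in parse_fasta(StringIO(SAMPLE)):
        print(name, len(sequence), sequence)
```

Because the function accepts any iterable of lines, the same code works on an open file object, for example `parse_fasta(open(path))` with a path of your own.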
Challenges we ran into
The biggest challenge of this year is the “Novel Coronavirus” itself. Biozene is deployed on free dynos from a cloud service provider, and our main challenge at the moment is funding: we are not able to scale up the cloud infrastructure. We relied on free services to prepare this production-level application. In the coming days, we would like to add features such as database integration, authentication, malware protection, and a better UX, and for those integrations we will need to upgrade the cloud services. Apart from that, we are really excited to bring Biozene into production to help our researchers and scientists.
Accomplishments that we're proud of
We developed this application in a very short span of time. After developing the solution, we reached out to a few reputed research institutes and labs in India that are working on vaccine development. They found our solution impactful, and at the moment we are exploring the opportunity to build a partnership with them so they can use Biozene and other custom analytics-based solutions. We are proud of our team members, who contributed to the development of the project.
What we learned
We took this pandemic as a challenge more than an opportunity. Biozene is one of our special projects, and so far we are really happy with the outcomes and the response from the community. At the moment, India is in complete lockdown and our young, dynamic team is working remotely from home. We learned the art of working under lockdown conditions. Apart from that, on the technical and development side, we learned awesome-streamlit, which we expect to have a major impact on the AI/ML domain in the next few months. The production-level deployment also helped us learn new concepts such as adding domains, add-ons, and custom pipelines. Learning is a continuous process, and we will keep learning going forward.
What's next for Biozene: Interactive Bioinformatics
We are adding more features over the coming days to provide a robust and practical tool to the bioinformatics community working to battle the deadly virus.
- Run BLAST
- Machine Learning Modeling (Cluster and Regression Analysis)
- Phylogenetics and sequence motifs
Biozene is free to use, so anyone can use it at their convenience. We are also developing a customized paid solution. Apart from the development, we are optimistic about collaborating with research labs here in India to accelerate their COVID-19-related R&D activities with Biozene and related custom solutions. We are committed to working with anyone fighting COVID-19 and to supporting their research and discovery efforts using AI, ML, and bioinformatics technologies.
Try it out
- COVIDathon - Decentralized AI Hackathon
- DeveloperWeek Global 2020
- Girls in Tech Virtual Hackathon
- Decypher : A Better Tomorrow
- Innovation in STEM Hackathon