Researchers all over the world are working intensively on the SARS-CoV-2 virus, and many new papers appear every day as preprints, e.g. on medRxiv or bioRxiv. The usual publication process involves a (possibly long) peer review in which other experts examine the content in detail before official publication. Because time is short, a good interface for accessing, sorting, and classifying the large number of preprints is needed.
What it does
Our website offers the following features:
- List and access all available preprints regarding SARS-CoV-2 from medRxiv and bioRxiv.
- Sort and filter the preprints by publishing date, author name, title, keywords, and category.
- Classify papers into given topics.
- Select one of the predefined topics, taken from the COVID-19 Open Research Dataset Challenge, and obtain a list of related papers.
- List papers that are related to a user-entered question and rate the resulting publications' relevance.
- Sort the papers by the number of Google Scholar citations of the authors.
How we built it
We built our website using a Python backend with Django and a PostgreSQL database. It is deployed on Amazon Web Services via AWS Elastic Beanstalk. To assign papers to a given topic, we use natural language processing techniques to find correlations between the topic and the content of a paper's abstract. As proposed by Daniel Wolffram, the classifier counts the occurrences of 800,000 keywords that were extracted from the COVID-19 Open Research Dataset Challenge (CORD-19) on Kaggle. Afterward, Latent Dirichlet Allocation is applied to calculate a distribution over topics for the given text. The distance between the two distributions serves as the similarity measure between the texts: the smaller the distance, the more closely related they are.
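The idea of comparing a topic and an abstract through keyword counts can be sketched in a few lines of Python. This is a simplified illustration, not our production classifier: the keyword list is a tiny hypothetical stand-in for the CORD-19 keywords, the Latent Dirichlet Allocation step is skipped entirely, and the Jensen-Shannon distance is an assumed choice of distance between the two normalized count distributions.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list for illustration; the real classifier
# uses about 800,000 keywords extracted from CORD-19.
KEYWORDS = ["transmission", "incubation", "vaccine", "antibody", "mask", "aerosol"]


def keyword_distribution(text, keywords=KEYWORDS, smoothing=1e-9):
    """Count keyword occurrences in text and normalize to a probability distribution."""
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    counts = Counter(tokens)
    # A small smoothing term keeps the distribution valid when no keyword occurs.
    raw = [counts[k] + smoothing for k in keywords]
    total = sum(raw)
    return [value / total for value in raw]


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base-2 logs): 0 for identical inputs, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def rank_abstracts(topic, abstracts):
    """Return (distance, abstract) pairs sorted so the closest abstract comes first."""
    topic_dist = keyword_distribution(topic)
    scored = [
        (jensen_shannon_distance(topic_dist, keyword_distribution(a)), a)
        for a in abstracts
    ]
    return sorted(scored, key=lambda pair: pair[0])


topic = "What is known about transmission, incubation, and aerosol spread?"
abstracts = [
    "We study vaccine candidates and the antibody response they induce.",
    "Aerosol transmission is analysed together with the incubation period.",
]
for distance, abstract in rank_abstracts(topic, abstracts):
    print(f"{distance:.3f}  {abstract}")
```

In this toy example the second abstract shares its keywords with the topic and is therefore ranked first. In the full pipeline, the count vectors would first be passed through the topic model, and the distance would be computed on the resulting topic distributions instead.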
For all of us, deploying a web application using AWS Elastic Beanstalk was a new experience. Though we had some time-consuming issues when setting it up, we learned a lot and are proud of getting it to work on time.
What we learned
To serve relevant papers for a user-entered question and to process the paper abstracts for machine learning, we needed to dive into natural language processing. In addition, we improved our knowledge of general web development and gained experience with deploying on AWS.
What's next for COVID-19 Publications
As of now, a paper is assigned to topics based solely on its abstract. A future version may consider the complete text of the paper and thereby achieve higher accuracy.
Furthermore, we plan to allow verified experts to evaluate and review the papers informally. These reviews may consist of short annotations and a rating of the paper's quality. This could start a discussion before a paper is officially peer-reviewed and give readers an early indication of the quality of the articles.
Try it out
- Hack Quarantine
- Northumbria University: Flatten the Curve
- COVID-19 Global Hackathon 1.0
- Tech Takes On COVID Hackathon
- LauzHack Against COVID-19
- HackTheU Unbox Your Ideas