Everyone is fighting Corona in some way, so I looked into the major issues we are facing and came up with this idea to help the community.

Finding Solutions to Fight Corona Problems:

  • Too much data is coming in on the Twitter stream
  • Categorizing the data:
  • Corona positive tweets (recovery cases, new policies to avoid Corona, successful trials) vs. negative tweets (new Corona cases reported, more patient deaths, spread to a new area)
  • Corona supply requests (we need to identify supply shortages by area)
  • Corona notices (major announcements)
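The categories above could be assigned with simple keyword matching. The following is only a minimal sketch under that assumption: the keyword lists, the category names, and the `categorize_tweet` function are hypothetical and are not taken from the actual platform, which may use a different technique.

```python
from typing import Dict, List

# Hypothetical keyword lists, chosen for illustration only.
# Categories are checked in this order, so the first match wins.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "supply_request": ["shortage", "need masks", "out of stock", "run out"],
    "notice": ["announce", "lockdown", "official statement"],
    "positive": ["recovered", "recovery", "discharged", "trial success"],
    "negative": ["new case", "died", "death", "spread"],
}


def categorize_tweet(text: str) -> str:
    """Return the first category whose keyword appears in the tweet text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    # Tweets that match no keyword are kept apart rather than guessed at.
    return "uncategorized"


if __name__ == "__main__":
    print(categorize_tweet("Hospital reports shortage of ventilators"))
```

Substring matching like this is easy to reason about but will misfile ambiguous tweets, so a trained classifier would be a natural replacement once labelled data is available.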

Country-based categorization:

  • Each country will be drilled down further into the following:
  • Death
  • Recovery
  • New case
  • Recurrence
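A per-country drill-down like the one listed above might be kept as nested counters. This is a rough sketch, assuming each tweet has already been reduced to a `(country, event_type)` pair; the `count_by_country` function and the event-type names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical identifiers for the four drill-down types.
EVENT_TYPES = ("death", "recovery", "new_case", "recurrence")


def count_by_country(events: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count events per country, with one counter per drill-down type."""
    counts = defaultdict(lambda: dict.fromkeys(EVENT_TYPES, 0))
    for country, event_type in events:
        if event_type not in EVENT_TYPES:
            continue  # ignore anything outside the four known types
        counts[country][event_type] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [("IN", "new_case"), ("IN", "death"), ("US", "recovery")]
    print(count_by_country(sample))
```

Every country that has at least one recognized event gets all four counters, so the web platform could render a full row without checking for missing keys.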

What it does

Currently, our platform processes Twitter data, aggregates it into several categories, and shows the results to the whole world through a web platform.

How we built it

  • Started building a PySpark Streaming app that categorizes the tweets
  • All processed data is saved to MongoDB
  • The data in MongoDB is visualized through a web platform
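The flow above (categorize, aggregate, store) can be sketched per micro-batch in plain Python. This is an illustration under assumptions, not the project's code: `summarize_batch`, `process_batch`, and the document fields are hypothetical, and `sink` stands in for whatever writes to MongoDB (for example pymongo's `insert_many` on a collection).

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Optional


def summarize_batch(categories: Iterable[str], batch_time: datetime) -> List[Dict]:
    """Turn one batch of tweet categories into one document per category."""
    counts = Counter(categories)
    return [
        {"category": category, "count": count, "batch_time": batch_time.isoformat()}
        for category, count in sorted(counts.items())
    ]


def process_batch(
    categories: Iterable[str],
    sink: Callable[[List[Dict]], object],
    batch_time: Optional[datetime] = None,
) -> int:
    """Summarize a batch and hand the documents to the sink.

    Returns the number of documents written.
    """
    batch_time = batch_time or datetime.now(timezone.utc)
    documents = summarize_batch(categories, batch_time)
    if documents:  # skip empty batches; bulk inserts typically reject empty lists
        sink(documents)
    return len(documents)


if __name__ == "__main__":
    stored: List[Dict] = []  # in-memory stand-in for a MongoDB collection
    process_batch(["positive", "negative", "positive"], stored.extend)
    print(stored)
```

Keeping the write behind a callable means the aggregation logic can be tested without a running database, and the same function could be called from a streaming job's per-batch hook.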

Web Platform Technologies considered:

  • Django-based API implementation
  • React-based front end

Challenges we ran into

Hosting it on AWS and connecting everything together.

Accomplishments that we're proud of

  • We took our platform from idea to a working deployment on an AWS server by the fifth day
  • We are processing data in near real time

What we learned

  • We need to handle concurrent user requests
  • Caching must be implemented
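One way to address both points is a small thread-safe cache with a time-to-live, so that many simultaneous requests for the same aggregate trigger only one database query. The sketch below is an assumption about how this could look; the `TTLCache` class and its parameters are hypothetical, and in a Django deployment the framework's built-in cache would be the more usual choice.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache values for a fixed number of seconds, safe to share across threads."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling compute() only if it is missing or expired."""
        now = self._clock()
        # The lock is held while computing, so concurrent callers wait
        # instead of all hitting the database at once.
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry[0] > now:
                return entry[1]
            value = compute()
            self._entries[key] = (now + self._ttl, value)
            return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)
    print(cache.get_or_compute("IN", lambda: {"new_case": 3}))
```

Holding a single lock during the computation is the simplest correct option, but it serializes requests for different keys too; per-key locks would be a reasonable refinement under heavier load.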

What's next for Corona Data Analysis Platform

  • Implement user request handling
  • Build visualizations on the data we have collected so far
  • Manage the large volume of data we're collecting
  • Country-based categorization


django, mongodb, pyspark, python, react
