The live website above provides real-time information about the spread of the coronavirus in your community and offers tools to help you inform your friends.
Even at a time of crisis, it is easy to lose sight of how one's actions affect the collective good. Numerous people and employers neglect to practice social distancing, ultimately undermining the effort to flatten the curve and save lives. For example, see this video or this other video (there are far too many of these).
This is where CoronAroundUS comes in.
We want to tackle this problem by visualizing the data to convey the importance of social distancing and by giving users the means to inform their friends and employers about the need to social distance. The only way to tackle the challenge of containing a global pandemic is for everyone to be on the same page and do their part.
What it does
Our website displays all current coronavirus cases to date, in addition to the number of predicted cases over the next 30 days. The site also displays each hospital in the US along with the number of beds still available at each one. These visualizations demonstrate the urgency of social distancing. Users can then use our personalized email template to write to loved ones, reminding them to social distance, or to employers who are not adhering to the call to let employees work from home. Ultimately, we hope this website will raise awareness, put the magnitude of this challenge into proper perspective, and show how each person can make a positive difference.
How we built it
We realized that the widely popular Johns Hopkins University data was not a reliable source for the site. Due to missing values, inconsistencies in the reports, and a lack of uniform formatting, we were forced to search for alternate sources of data. To solve this problem, we decided to scrape data directly from globally recognized organizations including the ECDC, WHO, MSCBS, IRCSS, and DXY.CN.
Using BeautifulSoup and Pandas, we scraped each of these websites to create our own live data pipeline. This pipeline includes data about all US states and counties, all Chinese provinces, all Spanish regions, all Italian regions, and all Australian regions, in addition to country-level data for every other country in the world. Furthermore, the pipeline updates every day at 9 a.m. PST and 9 p.m. PST to include the most recent statistics. The dataset includes the features Date, Country Name, Region/State Name, Latitude, Longitude, Number of Confirmed Cases, Number of Deaths, Population, and the date shelter-in-place orders were issued for each location. We also collected and cleaned data made available by the New York Times to determine the number of used and available beds at all hospitals within the US. Once we finished parsing and cleaning the data, we uploaded it to a MongoDB server for our frontend to fetch.
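To illustrate, the normalization step of a pipeline like ours can be sketched roughly as follows. This is a simplified sketch, not our actual code: the field names mirror the dataset schema above, and `normalize_row` is a hypothetical helper.

```python
from datetime import date

# Hypothetical sketch: raw scraped rows vary in shape by source,
# so each one is coerced into a single unified schema.
SCHEMA = ["Date", "Country", "Region", "Latitude", "Longitude",
          "Confirmed", "Deaths", "Population", "ShelterInPlaceDate"]

def normalize_row(raw: dict) -> dict:
    """Map a scraped record onto the unified schema, filling gaps with None."""
    row = {field: raw.get(field) for field in SCHEMA}
    # Case counts default to 0 when a source omits them entirely.
    for count_field in ("Confirmed", "Deaths"):
        row[count_field] = int(row[count_field] or 0)
    # Dates are stored as ISO strings before upload to MongoDB.
    if isinstance(row["Date"], date):
        row["Date"] = row["Date"].isoformat()
    return row

record = normalize_row({"Date": date(2020, 3, 28), "Country": "US",
                        "Region": "California", "Confirmed": "4914"})
# record now has all nine schema fields, with Deaths defaulted to 0.
```

In practice each scraper produced rows in a different layout, so a shared normalization step like this is what makes the merged dataset consistent.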
We tried various machine learning models to predict the number of confirmed cases and deaths from latitude and longitude plus the number of previously deceased, confirmed, and recovered cases by region. After trying SVMs, linear regression, polynomial regression, and XGBoost, we settled on a three-layer neural network with ReLU activations to perform the regression. We then uploaded these predictions to the MongoDB database as well for the frontend to fetch.
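A minimal sketch of that final model, using scikit-learn's `MLPRegressor` as a stand-in. Our actual architecture, hyperparameters, and training data differed; the random toy data here is illustrative only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy stand-in features: (latitude, longitude, prior confirmed, prior deaths).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y = X @ np.array([5.0, -2.0, 40.0, 10.0])  # illustrative target, not real counts

# Three hidden layers with ReLU activations, as in the write-up above.
model = MLPRegressor(hidden_layer_sizes=(64, 64, 64),
                     activation="relu",
                     max_iter=2000,
                     random_state=0)
model.fit(X, y)
preds = model.predict(X)
```

One reason a small ReLU network beat the simpler baselines for us is that case growth is nonlinear in time and region, which linear and polynomial fits struggle to capture without heavy feature engineering.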
We also built an email service using the SendGrid API to let users send templated emails to their loved ones or employers. In addition, we built an SMS service using the Twilio API so that users can reach people via their phones as well.
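The personalization itself amounts to filling in a template before handing the body off to SendGrid. A minimal sketch with Python's `string.Template`; the template text and field names here are hypothetical, and the real one linked back to our live visualizations.

```python
from string import Template

# Hypothetical email template; $-placeholders are filled per recipient.
EMAIL_TEMPLATE = Template(
    "Dear $recipient,\n\n"
    "There are currently $confirmed confirmed COVID-19 cases in $region. "
    "Please consider staying home and practicing social distancing.\n\n"
    "Stay safe,\n$sender"
)

def render_email(recipient, sender, region, confirmed):
    """Fill in the template; the rendered body is then sent via SendGrid."""
    return EMAIL_TEMPLATE.substitute(recipient=recipient, sender=sender,
                                     region=region, confirmed=confirmed)

body = render_email("Alex", "Sam", "Santa Clara County", 542)
```

The same rendered text can be shortened and passed to Twilio's messaging API for the SMS path, so both channels share one templating step.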
Challenges we ran into
Working remotely was a difficult challenge because we could not coordinate as well as we would have in person. We had to communicate effectively over Slack and schedule regular Zoom meetings to stay up to date.
Working with the data was also very difficult because the locations of confirmed cases were not always reported consistently and columns were sometimes missing. This was the most time-consuming part of the development process, and it ultimately forced us to develop our own data pipeline due to the inconsistencies in the publicly available datasets.
Our machine learning models started off with very poor accuracy, and it took us a while to realize there was a mistake in the data pipeline. This mistake, combined with communication overhead on our team, cost us a lot of time: Muntaser was not aware of the error in the data until a few hours later, so the rest of the team had to wait for him to return and rerun the models for predictions.
On the frontend side of things, coordinating work on the website was a challenge because we wanted it to look cohesive and concise. To keep our style consistent, we made mockups of the website in Figma so that all of us were on the same page.
Accomplishments that we're proud of
Despite the difficulty of not being able to meet in person, we were able to coordinate and deploy a polished and useful website.
What we learned
We should have spent more time at the beginning setting up checkpoint meetings. A lot of time was wasted while one of us waited for another person to respond. Every bug, mistake, or miscommunication cost us ten times as much time as it would have at an in-person hackathon.
What's next for CoronAroundUS
We would like to add more metrics for a more comprehensive overview of how hospitals are handling the COVID-19 outbreak. It would also help to add visualizations that compare and contrast projected outcomes with no controls versus with social distancing. This would help users see more directly how many lives they can save by choosing to stay at home.
Try it out