The outbreak of the novel coronavirus was the biggest news of the first month of 2020, especially for the many friends of ours who still have family in the affected area. After the WHO declared the situation in China a Public Health Emergency of International Concern, many countries stepped up their measures to control the spread of the disease. This grave situation motivated us to study the spreading trend carefully and to contribute our share using what we learned in our Machine Learning class. Specifically, we want to predict the trend of new confirmed cases by applying an LSTM model to the two major epidemic datasets.

What it does

Using the daily number of confirmed cases in Hubei Province, China, where COVID-19 cases were most concentrated, we attempt to predict how the outbreak will develop in New York State, US.

How I built it

Based on a literature review of prior work on time-series forecasting, we decided to use Long Short-Term Memory (LSTM) as our main model for training and testing. Our data source is the Johns Hopkins CSSE dataset, which aggregates data from various sources including the WHO, China CDC, US CDC, etc.

For the overall approach, we intend to follow the steps listed below:

  1. Collect and clean the data so it can be fed to an LSTM model
  2. Discover and visualize data in different regions to gain insights
  3. Preprocess the data through the same pipeline
  4. Build an LSTM model and train it using the Hubei data
  5. Fine-tune the model by comparing with actual data
  6. Forecast the trend of spread in other newly affected regions.
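
The data-preparation steps above can be sketched roughly as follows. This is a minimal illustration rather than our actual pipeline: the helper names (`to_daily_new`, `min_max_scale`, `make_windows`), the window length of 3, and the sample numbers are all made up for the example, and the scaling is a hand-written stand-in for the default behaviour of `MinMaxScaler()`.

```python
import numpy as np

def to_daily_new(cumulative):
    """Convert cumulative confirmed counts into daily new cases."""
    cumulative = np.asarray(cumulative, dtype=float)
    # The first day keeps its own count; later days are day-over-day differences.
    return np.diff(cumulative, prepend=0.0)

def min_max_scale(series):
    """Scale a 1-D series into [0, 1]; also return the bounds for later inversion."""
    lo, hi = series.min(), series.max()
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on a flat series
    return (series - lo) / span, lo, hi

def make_windows(series, window):
    """Build (samples, window, 1) inputs and next-day targets for an LSTM."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y

# Made-up cumulative counts, for illustration only (not real case data).
cumulative = [1, 3, 6, 10, 15, 21, 28, 36]
daily = to_daily_new(cumulative)
scaled, lo, hi = min_max_scale(daily)
X, y = make_windows(scaled, window=3)
print(X.shape, y.shape)
```

Each row of `X` holds `window` consecutive days and the matching entry of `y` is the day that follows, which is the three-dimensional input layout that Keras LSTM layers expect.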

Challenges I ran into

Time Series prediction beyond test data
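
One common way to forecast past the end of the available data is to feed each prediction back in as the newest input. The sketch below shows that idea under some assumptions: `forecast_recursive` and `mean_of_window` are hypothetical names, and the window-average predictor is only a placeholder for a trained model. It is not necessarily the approach we will end up using.

```python
import numpy as np

def forecast_recursive(predict_next, history, window, steps):
    """Roll a one-step predictor forward `steps` days beyond the known data.

    predict_next: callable taking a 1-D array of length `window`
                  and returning the next value.
    """
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        next_value = float(predict_next(np.array(buffer[-window:])))
        forecasts.append(next_value)
        buffer.append(next_value)  # the prediction becomes part of the next input
    return np.array(forecasts)

def mean_of_window(w):
    """Placeholder predictor: the average of the window (not an LSTM)."""
    return w.mean()

preds = forecast_recursive(mean_of_window, [1.0, 2.0, 3.0], window=3, steps=2)
print(preds)
```

With a trained Keras model, `predict_next` could wrap something like `model.predict(w.reshape(1, window, 1))[0, 0]`. Errors compound with this scheme, since every step builds on earlier predictions rather than on observed data.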

Accomplishments that I'm proud of

The fact that we actually trained an LSTM model without error

What I learned

LSTM Model, MinMaxScaler()
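
A minimal sketch of how these two pieces fit together, assuming TensorFlow 2.x and scikit-learn are installed. The layer size, window length, epoch count, and the synthetic series are arbitrary choices for illustration, not the settings or data of our project.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras

window = 5  # assumed look-back length

# Synthetic placeholder series shaped (days, 1); not real case data.
daily = np.linspace(0.0, 100.0, 40).reshape(-1, 1)

# MinMaxScaler() scales each feature into [0, 1] by default.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(daily)

# Sliding windows: X is (samples, window, 1), y is the value after each window.
X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]

model = keras.Sequential([
    keras.Input(shape=(window, 1)),
    keras.layers.LSTM(16),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=8, verbose=0)

# Predict one step from the last window, then undo the scaling.
pred_scaled = model.predict(X[-1:], verbose=0)
pred = scaler.inverse_transform(pred_scaled)
print(pred.shape)
```

The same fitted `scaler` is reused for `inverse_transform`, so predictions come back in the original units (number of cases) rather than in the scaled range.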

What's next for Predict Coronavirus Growth Trend of the World

We still need to carry out the last two steps listed under "How I built it", i.e. fine-tune the model and make predictions with the trained model.

keras, python, tensorflow
