Much like a stock index, the Twitter Pandemic Index reflects global sentiment about the COVID-19 pandemic.
What it does
Slide along the Twitter Pandemic Index and see the most frequent words, hashtags, emojis and symptoms mentioned on those up and down days.
For example, the Twitter Pandemic Index follows the ups and downs of several milestone events (ref: CNN timeline):
- 2020-02-02: first death outside China (Philippines)
- 2020-02-07: the world mourns the death of Dr Li Wenliang, whose early warning about the coronavirus was silenced by China
- 2020-02-29: first death in the US (Washington state)
- 2020-03-03: Federal Reserve cuts interest rate by 0.5%
- 2020-03-11: WHO declares COVID-19 a pandemic
- 2020-03-13: US announces relief package
One interesting observation is that from the middle of March the Twitter Pandemic Index rose to a new high (though still not in the positive zone) as people began working from home and used more :heart: and :prayer: emojis, suggesting that working from home improved sentiment and unified people, bringing out the best in humanity.
The frequency of symptoms mentioned in the tweets also provides an incidental indicator of influenza-like illnesses (ILI) among the population. By listening for symptoms self-reported in the tweets, policymakers can forecast the rise of ILI cases before they are confirmed and take earlier measures to flatten the curve.
How I built it
The Twitter Pandemic Index is calculated from the text of over 200 million tweets collected in the Large Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research by team members Juan M. Banda and Ramya Tekumalla.
Tweets were tokenized, preprocessed and annotated using the Social Media Mining Toolkit (SMMT) with a dictionary composed of terms from SNOMED, MeSH, ICD9/10, CPT and the Consumer Health Vocabulary, the last of which captures the layman terms common in social media. Sentiment scores were calculated with VADER.
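As a rough illustration of dictionary-based annotation, the sketch below matches a small, hypothetical layman-term dictionary against tweet text and counts the canonical symptoms it finds. It uses only the Python standard library and does not reproduce SMMT's actual interface; the dictionary entries and the `annotate` and `symptom_counts` function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary mapping layman terms to a canonical symptom.
# The real dictionary draws on SNOMED, MeSH, ICD9/10, CPT and the
# Consumer Health Vocabulary; these entries are illustrative only.
SYMPTOM_TERMS = {
    "fever": "fever",
    "high temperature": "fever",
    "cough": "cough",
    "coughing": "cough",
    "sore throat": "sore throat",
    "short of breath": "shortness of breath",
    "shortness of breath": "shortness of breath",
}

# Longest terms first so multi-word terms win over shorter overlaps.
_TERMS = sorted(SYMPTOM_TERMS, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(t) for t in _TERMS) + r")\b")


def annotate(text):
    """Return (start, end, canonical_symptom) for each dictionary hit."""
    lowered = text.lower()
    return [
        (m.start(), m.end(), SYMPTOM_TERMS[m.group(1)])
        for m in _PATTERN.finditer(lowered)
    ]


def symptom_counts(tweets):
    """Count how many tweets mention each canonical symptom (once per tweet)."""
    counts = Counter()
    for text in tweets:
        counts.update({symptom for _, _, symptom in annotate(text)})
    return counts
```

Counting each symptom once per tweet keeps a single repetitive tweet from inflating the totals; whether the original pipeline counts mentions or tweets is not stated.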
The Twitter Pandemic Index is primarily based on the VADER compound score and plotted alongside the number of confirmed cases to show the evolution of the pandemic since January 27, 2020, when tweet collection first began.
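One way the per-tweet scores could be rolled up into a daily index is sketched below. It assumes each tweet already carries a VADER compound score in the range -1 to 1 and simply averages those scores per calendar day. The writeup only says the index is "primarily based" on the compound score, so the averaging step, the `daily_index` name, and the made-up sample values are all assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_index(scored_tweets):
    """Average compound scores per day.

    scored_tweets: iterable of (date, compound_score) pairs.
    Returns a dict mapping each date to its mean score, in date order.
    """
    by_day = defaultdict(list)
    for day, compound in scored_tweets:
        by_day[day].append(compound)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Illustrative values only, not taken from the dataset.
sample = [
    (date(2020, 3, 11), -0.6),
    (date(2020, 3, 11), -0.2),
    (date(2020, 3, 13), 0.1),
    (date(2020, 3, 13), -0.3),
]
index = daily_index(sample)
for day, value in index.items():
    print(day.isoformat(), round(value, 2))
```

A daily series in this shape can be plotted directly against the confirmed case counts for the same dates.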
Challenges I ran into
We would have loved to incorporate more inputs, such as mobility data and the effective reproduction number (Rt), to make the Pandemic Index a more comprehensive measure that reflects the impact of human behavior on the spread of COVID-19. Right now, our main input, Twitter chatter, captures only the global topics of discussion.
We would also like to analyze the chatter by country to better understand the more nuanced concerns of people in various regions, e.g. pasta in Italy.
Accomplishments that I'm proud of
- Collecting and analyzing over 200 million tweets and recognizing key entities such as medical symptoms
- Presenting a Twitter Pandemic Index chronicling the pandemic’s ups and downs mapped to major events unfolding over time (e.g. trough coincided with WHO declaration of pandemic; peak coincided with US relief package)
What I learned
Crowdsourcing social media chatter reveals close to real-time human reactions and sentiments about key news events.
What's next for Tweet Pandemic Index
The frequency of symptoms mentioned in the tweets also provides an early indicator of influenza-like illnesses (ILI) among the population. By listening for symptoms self-reported in the tweets, policymakers can forecast the rise of ILI cases before they are confirmed and take earlier measures to flatten the curve.
Additional social network analysis of how news and Twitter posts emerge and go viral would help the media and policymakers manage the spread of misinformation and design more effective content and channels to inform and reassure the public.
Built With
ascend, google, python, smmt, spacy, vader