From our experience, we realized that there is an overwhelming amount of information about the coronavirus, spread across many topics, and simply not enough time in our daily lives to go through all the video publications by researchers, government press conferences, and news editorials. We also realized that this problem is even more acute for health care professionals and other front-line workers, who are working under tighter time pressure than ever while trying to stay in touch with society, other world events, and, most importantly, worldwide developments in the effort to combat this virus.
What it does
We were inspired by the problem of having too much information and few ways to grasp it all without going through it ourselves. From our research, we realized that video is the primary way people get information, and that the most common source of these videos is YouTube.
To solve this problem, we came up with 'Summarize This Vid For Me', which takes a YouTube video link, uses Google's Video Intelligence API to transcribe the audio into text, and applies an unsupervised natural language processing algorithm to summarize the key elements of the video by recognizing the most common sentences and words. In our web app, users specify a URL (pointing to the video) and their email address, and our service emails them the summary once the algorithm has finished. The web app can currently summarize videos under 15 minutes in length; we hope to eventually support longer videos and add a feature that searches the web for videos of a specific type.
We believe that our project, 'Summarize This Vid For Me', can be effective even in its current state in giving busy health professionals, as well as other professionals whose responsibilities have increased because of the virus, accurate and condensed summaries of videos they mean to watch but cannot due to time constraints. We believe this can benefit their health, since they can use the time saved to take care of themselves and to socialize (an integral part of mental health). More broadly, our project could also be used in finance to summarize publicly available analysis and market trends from different experts, and by the general public to keep up with news from many sources.
How I built it
Our team did not have much experience with front-end technologies, so we are happy to have built a working full-stack application whose front end communicates with a backend.
To build the algorithm, having little experience with building custom NLP models, we had to research common NLP libraries and how they can be used with text summarization techniques. From our research, we found two common ways of summarizing text: "Extractive Summarization" and "Abstractive Summarization". The former uses recurring data (words and sentences) to decide which existing sentences are the most important, while the latter builds on the existing text to generate "custom" sentences and summaries. Given time constraints, we had to use extractive summarization, but we hope to add abstractive summarization in the future. In the end, we based our summarization feature on the "page-rank" algorithm: we split the text into sentences, removed stop-words using NLTK, built a similarity matrix by constructing sentence vectors and comparing them by cosine distance with NumPy, used the similarities to construct an adjacency matrix with NetworkX, and then ranked the nodes with the "page-rank" algorithm.
We tried to deploy our algorithm using Cloud Functions and Pub/Sub to build a complete data pipeline, but owing to time constraints we ended up using Google Kubernetes Engine. The other Google services we used were the Video Intelligence API to transcribe the videos and Cloud Storage to store them, along with Twilio SendGrid to send the emails to the specified users.
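The extractive steps described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not our actual code: a small hard-coded stop-word set stands in for NLTK's list, plain Python stands in for the NumPy vector math, and a hand-written power iteration stands in for NetworkX's PageRank. It also scores pairs by cosine similarity (one minus the cosine distance). All function names (`split_sentences`, `sentence_vector`, `cosine_similarity`, `pagerank`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word set (assumption); the project used NLTK's list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "to", "in",
              "and", "or", "it", "this", "that", "for", "on", "with", "as", "by"}


def split_sentences(text):
    """Naive split after '.', '!' or '?' (a stand-in for a real tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_vector(sentence):
    """Bag-of-words counts with stop-words removed."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def pagerank(matrix, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted adjacency matrix."""
    n = len(matrix)
    scores = [1.0 / n] * n
    row_sums = [sum(row) for row in matrix]
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            incoming = 0.0
            for i in range(n):
                if row_sums[i]:
                    incoming += scores[i] * matrix[i][j] / row_sums[i]
                else:
                    # A sentence similar to nothing spreads its score evenly.
                    incoming += scores[i] / n
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(text, top_n=2):
    """Return the top_n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= top_n:
        return sentences
    vectors = [sentence_vector(s) for s in sentences]
    n = len(sentences)
    similarity = [[cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0
                   for j in range(n)] for i in range(n)]
    scores = pagerank(similarity)
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Calling `summarize(transcript, top_n=3)` on a transcript string would return the three sentences that are most similar to the rest of the text, which is why an off-topic sentence tends to be dropped.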
- Google Kubernetes Engine - Host the server on the backend
- Google Video Intelligence API
- Google Cloud Storage/Bucket
- Google Cloud Functions/Pub/Sub
- Twilio SendGrid
- aiohttp (async API calls)
- Natural Language Toolkit (NLTK)
- NetworkX (for TextRank and graphing)
Challenges I ran into
The main challenge we ran into came while researching different NLP algorithms: deciding which algorithm to choose, and how to "filter" the text we get from videos before giving it to our algorithm. This was difficult because videos are often quite unstructured and lack the repetitive patterns that an extraction algorithm relies on to make its assessments. Other challenges that we faced:
- Having the backend communicate with the front-end
- Setting up all the authentication while deploying the algorithm
- Centering the elements in CSS with a grid-column setup
- Setting up async calls in Python, since with synchronous calls the user may wait quite a while for a response
- Setting up the NLP algorithm to summarize info
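The async challenge in the list above can be illustrated with a small sketch. It uses only the standard-library `asyncio` (not our actual aiohttp server code) and shows the general idea: the request handler schedules the slow work as a background task and answers right away, and the summary is emailed later. Every name here (`transcribe`, `summarize_text`, `send_email`, `process_video`, `handle_request`, `sent_emails`) is a hypothetical placeholder, and the sleeps merely simulate slow API calls.

```python
import asyncio

# Stand-in for an outgoing mailbox so the sketch has an observable effect.
sent_emails = []


async def transcribe(url):
    """Placeholder for the slow transcription step."""
    await asyncio.sleep(0.01)
    return f"transcript of {url}"


async def summarize_text(transcript):
    """Placeholder for the summarization step."""
    await asyncio.sleep(0.01)
    return f"summary of {transcript}"


async def send_email(address, body):
    """Placeholder for the email step; records instead of sending."""
    sent_emails.append((address, body))


async def process_video(url, email):
    """Full pipeline: transcribe, summarize, then email the result."""
    transcript = await transcribe(url)
    summary = await summarize_text(transcript)
    await send_email(email, summary)


async def handle_request(url, email, background_tasks):
    """Schedule the pipeline and respond immediately without awaiting it."""
    task = asyncio.create_task(process_video(url, email))
    # Keep a reference so the task is not garbage-collected mid-flight.
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return {"status": "accepted"}


async def main():
    background_tasks = set()
    response = await handle_request(
        "https://example.com/video", "user@example.com", background_tasks)
    emails_at_response_time = len(sent_emails)  # pipeline has not run yet
    await asyncio.gather(*background_tasks)     # let the background work finish
    return response, emails_at_response_time
```

Running `asyncio.run(main())` shows that the response comes back before any email has been recorded, which is the behavior we wanted for the user.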
Accomplishments that I'm proud of
- Some of our team members learning HTML and CSS in one night
- Learning to use the Natural Language Toolkit and researching an NLP algorithm (TextRank) and how it ranks sentences to extract condensed information
- Using Video Intelligence API to transcribe video into text
- Setting up a full stack application
- Deploying a working application on the cloud and trying to set it all up with a data pipeline
What I learned
Overall, we learned how to set up a full-stack application that takes input, downloads the video the input points to, transcribes the video, and summarizes the resulting text. For the NLP segment, we used a TextRank algorithm, with NLTK helping to find sentence similarity and NetworkX ranking the sentences.
What's next for Summarize This Vid For Me
In the future, we hope to add a feature using Pandas that finds videos from different sources based on a keyword, so that rather than entering a URL, users can enter a keyword and we can summarize the data from all of these sources. We also hope to finish setting everything up on the cloud with a custom data pipeline, which would be a more scalable solution and far more secure than setting up API calls. Finally, we hope to add a database that stores the transcriptions of videos, since transcription is our most time-intensive step and we would not have to repeat it for videos we have already processed.
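As a rough sketch of the transcription database idea, assuming SQLite as the store (we have not chosen a database yet) and a hypothetical `transcribe` callable standing in for the Video Intelligence step, a lookup-before-transcribe cache could look like this. The table name, column names, and function names are all illustrative.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the transcript cache; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        "video_url TEXT PRIMARY KEY, transcript TEXT NOT NULL)"
    )
    return conn


def get_transcript(conn, video_url, transcribe):
    """Return the cached transcript, transcribing only on a cache miss."""
    row = conn.execute(
        "SELECT transcript FROM transcripts WHERE video_url = ?",
        (video_url,),
    ).fetchone()
    if row is not None:
        return row[0]
    transcript = transcribe(video_url)  # the slow, expensive step
    conn.execute(
        "INSERT INTO transcripts (video_url, transcript) VALUES (?, ?)",
        (video_url, transcript),
    )
    conn.commit()
    return transcript
```

With this shape, a second request for the same URL is answered from the table and never reaches the transcription step.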
Built With
aiohttp, google-cloud, google-storage-bucket, kubernetes, natural-language-toolkit, python, twilio-send-grid, video-intelligence-api