Even by the most optimistic estimates, demand for intensive care beds and ventilators will exceed supply as the number of COVID-19 cases increases. Public officials at all levels of government will need to determine quickly when and where additional treatment supplies should be sent.
By designing a tool that makes it easy to identify where supplies will likely run out, we can help officials allocate resources where they are most needed, in time to improve outcomes for patients. We aimed to combine Census Bureau data, Definitive Healthcare’s data on ICU beds and ventilator availability and utilization, and the expertise of our data science team to provide this much-needed tool.
What it does
Users can view a map of the United States and filter to a state and/or county of interest. Based on the filters a user applies, information is provided about that area’s total available treatment resources (beds, ventilators), as well as projected days until current resources in the health care system are maxed out.
Because this data can be viewed at a national, state, and county level, government officials at all levels can use this tool to plan resource distribution and understand the possible ramifications.
How we built it
The tool uses public information about population distribution, COVID-19 cases, and COVID-19 testing. Our data scientists have modeled the likely progression of the virus. We have combined these models with Definitive Healthcare’s data on hospital and health system capacity. This allows a user to see current ICU bed and ventilator capacity as well as estimated cases and estimated severe cases.
Our first step was to ingest and integrate several sets of data. We assembled a number of Census Bureau datasets covering population density, demographic information, and county estimates. We normalized the data and used FIPS identification numbers to merge these public datasets with Definitive Healthcare’s proprietary information on healthcare facilities.
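To illustrate the merge step, here is a minimal sketch in plain Python. It is not our production pipeline: the field names (`fips`, `population`, `icu_beds`, `ventilators`) and the sample values are made up for illustration. The one idea it shows is that county FIPS codes should be normalized to five-character, zero-padded strings before joining, since a code stored as a number loses its leading zero.

```python
def normalize_fips(code) -> str:
    """Return a county FIPS code as a 5-character, zero-padded string."""
    return str(code).strip().zfill(5)


def merge_on_fips(census_rows, capacity_rows):
    """Left-join census rows to capacity rows on the normalized FIPS code."""
    capacity_by_fips = {normalize_fips(r["fips"]): r for r in capacity_rows}
    merged = []
    for row in census_rows:
        fips = normalize_fips(row["fips"])
        facility = capacity_by_fips.get(fips, {})
        merged.append({
            "fips": fips,
            "population": row["population"],
            # None marks a county with no matching capacity record.
            "icu_beds": facility.get("icu_beds"),
            "ventilators": facility.get("ventilators"),
        })
    return merged


# Made-up sample rows; the numbers are placeholders, not real data.
census_rows = [
    {"fips": 1001, "population": 50000},     # numeric code, leading zero lost
    {"fips": "36061", "population": 1500000},
    {"fips": "48301", "population": 100},
]
capacity_rows = [
    {"fips": "01001", "icu_beds": 10, "ventilators": 8},
    {"fips": " 36061", "icu_beds": 500, "ventilators": 400},
]

merged = merge_on_fips(census_rows, capacity_rows)
```

Keeping unmatched counties in the output with empty capacity fields, rather than dropping them, makes gaps in the source data visible downstream.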
To model the growth rate of the number of cases, we used data from the New York Times on the number of cases per day by county and by state. We then fit an exponential function to the county-level data to model growth by days since the first case or March 1, whichever was later. After trying several alternatives, we found that an exponential curve fit the current data best. For counties that did not yet have any cases, or that had too few days of changing data, we fit an exponential curve to the state-level data and applied the state-level growth rate to those counties.
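The fitting approach can be sketched roughly as follows. This is an illustration under stated assumptions, not our exact code: it fits cases ≈ a · exp(r · day) by ordinary least squares on the logarithm of the case counts, which is one common way to fit an exponential, and the threshold `MIN_CHANGING_DAYS` is a hypothetical stand-in for what counts as "enough" changing data.

```python
import math

# Hypothetical threshold: distinct positive case counts needed for a county fit.
MIN_CHANGING_DAYS = 5


def fit_exponential(days, cases):
    """Fit cases ~ a * exp(r * day) by least squares on log(cases).

    Returns (a, r). Days with zero cases are skipped because log(0)
    is undefined.
    """
    points = [(d, math.log(c)) for d, c in zip(days, cases) if c > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two days with positive case counts")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    r = sxy / sxx
    a = math.exp(mean_y - r * mean_x)
    return a, r


def county_growth_rate(days, cases, state_rate, min_points=MIN_CHANGING_DAYS):
    """Use the county's own fit when it has enough changing data,
    otherwise fall back to the state-level growth rate."""
    changing = len({c for c in cases if c > 0})
    if changing < min_points:
        return state_rate
    _, rate = fit_exponential(days, cases)
    return rate
```

Fitting on the log scale weights early, small counts as heavily as later, larger ones. A nonlinear fit on the raw counts would weight them differently and could give somewhat different rates.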
Finally, we combined our COVID-19 day-by-day predictions with geographical capacity information to predict when case volumes would exceed the healthcare system’s ability to care for severe COVID-19 patients in each county. These predictions were displayed in a Tableau visualization.
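As a rough sketch of this last step, the function below walks a fitted exponential curve forward one day at a time and reports the first day on which projected severe cases exceed a county's capacity. The parameter names, the `severe_fraction` input (the share of cases assumed to need intensive care), and the one-year search horizon are illustrative assumptions, not values taken from our model.

```python
import math


def days_until_exceeded(a, r, capacity, severe_fraction, start_day=0, horizon=365):
    """Return the number of days after start_day until projected severe
    cases first exceed capacity, or None if that does not happen within
    the horizon.

    Total cases on a given day are modeled as a * exp(r * day), where
    day is counted on the same scale used to fit the curve.
    """
    for day in range(start_day, start_day + horizon + 1):
        severe_cases = severe_fraction * a * math.exp(r * day)
        if severe_cases > capacity:
            return day - start_day
    return None


# Example with made-up inputs: 10 cases on day 0, doubling daily,
# half of cases severe, 100 ICU beds.
example = days_until_exceeded(a=10, r=math.log(2), capacity=100, severe_fraction=0.5)
```

Returning `None` rather than a large number keeps "not projected to run out within the horizon" distinct from a real estimate.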
Challenges we ran into
Our challenges fell into three basic categories: availability of data, meaningfully linking the data, and modeling the data. Because access to local-level data varies across the US, we were pushed to focus on national sources that provide data at the county level. In terms of linking and modeling the data, we had initially hoped to create a growth rate model that incorporated features of each county, such as population density. However, we found two gaps in our data sources that limited our predictive models: nationwide data on relevant features was missing, and the case count data covered too short a period in too few counties. We had to scale back the scope of our estimation methods and used the state-level growth models instead.
From a visualization perspective, choosing the level of detail for aggregation proved challenging. We used Microsoft SQL Server to format the data by county versus by date, which allowed for a smoother transition to Tableau. In doing so, however, we uncovered several other issues with the integrity of the analysis. The data we had sourced was not uniform across sources, and when we tried to use it, gaps appeared that required immediate attention. Additionally, not every county had reported cases when we began this project, which made some estimations more difficult than anticipated.
Accomplishments that we're proud of
Much of the reporting and modeling our team has seen in the news focuses on tracking and mapping the growth and spread of COVID-19. We feel this data is incredibly important and leveraged some of that work here. However, as a team, we wanted to focus on capacity planning and predictive modeling. As a result, we are proud of doing our part to reframe the conversation around capacity and not just spread.
We believe our tool can be used by officials at all levels of government and by the public, as there will be many hard decisions to come about where to send resources. It has been truly fulfilling to help make those decisions as data-driven as possible.
We’re grateful for Definitive Healthcare’s willingness to allow us to use its data on health system capacity and proud to add that data to the growing national discourse.
Our hackathon team consists of members from a variety of Definitive’s internal departments. Usually we don’t have a chance to collaborate directly in our day-to-day work. Despite this, we were amazed at how quickly we gelled into a cohesive team with everyone pitching in.
What we learned
It was sobering to see that all available data still points to the United States being in an exponential growth phase. Even more sobering, however, is just how early we are on that curve and how close we already are as a nation to exceeding the capacity of many health systems.
At Definitive Healthcare, we work with data across the entire continuum of care all day, every day. Still, we were shocked to see the magnitude of COVID-19 as a stressor on the healthcare system. Our results highlight the disparity of access to intensive care treatment across the country and the ease with which that access can be overloaded.
On a more positive note, we learned that our team at Definitive can pull together a meaningful project on such short notice. Even as a brand-new team, we effectively communicated our strengths and needs and built something we are proud of.
What's next for Definitive COVID-19 Capacity Predictor
- Our dashboard’s data source is currently static, based on the data available as of 3/26/2020. This was a concession for our Minimum Viable Product submission, and we would like to incorporate live data feeds into the dashboard.
- We would like to add Custom-Value Submissions so that users can manually enter resource quantities where they believe resources are available. This would allow the model to change as tents with beds are added and mobile care units are deployed. Based on the user submission, the model would provide a custom estimate of when their resources will be maxed out.
- We would like to improve our modeling algorithm. Some of those improvements would likely include adding resolved cases to the model and incorporating local level data when we feel it’s appropriate.
- We would like to continue to add additional relevant data sources as they become available.
Built With
python, sql, tableau