It all started last month when I first started to pay closer attention to the coronavirus. At the time I was thinking about projects I could make to help my community once the coronavirus became a significant issue. For example, one option was to have people self-report their health (sick in general, healthy, suspected coronavirus, or confirmed case) and put the reports on a map to see how the virus spreads over time. Then one night I was thinking, for some reason, about how widespread QR codes are in places like Japan and how Snapchat lets you add friends through QR codes. Apparently I put two and two together while I was sleeping: I woke up and knew what my project was going to be. Coincidentally, when I checked my phone in bed I got an email saying that submissions for this hackathon had started that day. I was already 2 days behind schedule, so I got to work the second I got out of bed. The app I came up with combines parts of my first idea with the connection I made involving QR codes. Instead of tracking people's locations, I could try to model spread through a network of people.
What it does
So let's say I had to meet with someone outside of school about something. Once we meet up, we pull out our phones and open the app. After logging in, one person creates an "interaction", which generates a QR code on screen and logs everyone who is part of that meeting (the only things logged are the time and who took part; yes, more than two people can join). The other person opens the QR code scanner and scans the first person's screen. And that's it. I wanted to keep the interaction-logging part of the app simple, since if it was too hard or complicated, probably no one would use it.
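A minimal Python sketch of the interaction flow described above, assuming the QR code simply encodes a random token that identifies the interaction on the server. The names `Interaction`, `create_interaction`, `scan`, and the in-memory `INTERACTIONS` dict are hypothetical stand-ins for real Django models and endpoints, and drawing the payload as an actual QR image is left out.

```python
# Hypothetical sketch of QR-code interaction logging (not the actual CoronaGo code).
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Interaction:
    """One meeting. Only the time and the participants are recorded."""

    creator: str
    # Random token carried by the QR code; it identifies the interaction.
    token: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    participants: set = field(default_factory=set)

    def __post_init__(self) -> None:
        # Whoever creates the interaction is a participant too.
        self.participants.add(self.creator)

    def qr_payload(self) -> str:
        """Text the creator's phone would render as a QR code."""
        return json.dumps({"interaction": self.token})


# In-memory stand-in for the server's interaction table.
INTERACTIONS: dict = {}


def create_interaction(user: str) -> Interaction:
    """First phone: start a meeting and get something to show as a QR code."""
    interaction = Interaction(creator=user)
    INTERACTIONS[interaction.token] = interaction
    return interaction


def scan(user: str, payload: str) -> Interaction:
    """Another phone scanned the QR code: add that user to the meeting."""
    token = json.loads(payload)["interaction"]
    interaction = INTERACTIONS[token]
    interaction.participants.add(user)
    return interaction


meeting = create_interaction("alice")
scan("bob", meeting.qr_payload())
scan("carol", meeting.qr_payload())  # more than two people can join
print(sorted(meeting.participants))  # ['alice', 'bob', 'carol']
```

Keeping the payload down to an opaque token means the QR code itself carries no personal information; the server looks up who created the interaction.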
Once interactions are logged, you can go to your overview page, which shows all your previous interactions with other users. Another main feature I wanted to implement is self-reporting of your health status. If someone in the app reports themselves as having coronavirus, anybody they have been in contact with in the past 2 weeks gets a notification saying that they were directly exposed to someone with coronavirus. I wanted to include self-reporting, even though it could be abused, because of the problem time creates. Two weeks is a relatively long time, and trying to remember everyone you have been with over that span is really difficult. I imagine contact tracing for the CDC or WHO is very tedious and expensive, so this could ease the pain on that front. I would also say this is better than normal contact tracing in that it is much more scalable.
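The 14-day exposure check could look roughly like this, assuming each logged interaction is stored as a (timestamp, participants) pair. `exposed_contacts` and the sample log are hypothetical and only meant to show the filtering logic.

```python
from datetime import datetime, timedelta

# Exposure window matching the 2 weeks described above.
EXPOSURE_WINDOW = timedelta(days=14)


def exposed_contacts(interactions, reporter, reported_at):
    """Return everyone who shared an interaction with `reporter` during the
    14 days before `reported_at`.

    `interactions` is a list of (timestamp, participants) pairs.
    """
    start = reported_at - EXPOSURE_WINDOW
    contacts = set()
    for timestamp, participants in interactions:
        if start <= timestamp <= reported_at and reporter in participants:
            contacts.update(participants)
    contacts.discard(reporter)  # don't notify the person who reported
    return contacts


now = datetime(2020, 3, 15, 12, 0)
log = [
    (now - timedelta(days=2), {"alice", "bob"}),
    (now - timedelta(days=10), {"alice", "carol", "dave"}),
    (now - timedelta(days=20), {"alice", "erin"}),  # older than 2 weeks
    (now - timedelta(days=1), {"bob", "frank"}),    # alice wasn't there
]
print(sorted(exposed_contacts(log, "alice", now)))  # ['bob', 'carol', 'dave']
```

Each name returned would be someone who gets the "directly exposed" notification.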
The final main feature I wanted to create was analyzing how coronavirus is transmitted and producing a risk calculation that would benefit users. I was working on an algorithm that would build a network of people and estimate the probability of spread based on past data collected in the app. The main issue is that with no past data, you can't really produce a risk calculation, especially when very few people in your area have coronavirus. Once enough data is collected, I would report a risk probability for each user, representing the probability that they have the virus and could spread it. For example, if I had a risk of 0.3, that means I have a 30% chance of having coronavirus. Based on past data I would then infer a probability of spreading it to another person, let's say 0.4. Then 0.3 * 0.4 = 0.12, and that would be added to someone else's risk if I met with them. The algorithm would improve over time, so once enough people use it for long enough, the estimated probability of spread would hopefully approach the population parameter. The example I gave isn't how the algorithm would actually work; I used it to give a sense of what the result would look like.
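The toy arithmetic above can be written out as a short sketch. One caveat with simply adding contributions is that the total can climb past 1 after enough meetings, so `combined_risk` swaps in a different rule: the probability that at least one contact infects you, 1 - (1 - c1)(1 - c2)...(1 - ck). That rule assumes contacts are independent and is not the algorithm from this writeup; both function names are hypothetical. `math.prod` needs Python 3.8 or later.

```python
from math import prod


def contribution(sender_risk: float, transmission_prob: float) -> float:
    """Chance that a contact both has the virus and passes it to you."""
    return sender_risk * transmission_prob


def combined_risk(contributions) -> float:
    """Chance that at least one contact infects you, assuming the contacts
    are independent: 1 - (1 - c1)(1 - c2)...(1 - ck)."""
    return 1 - prod(1 - c for c in contributions)


c = contribution(0.3, 0.4)                 # the 0.3 * 0.4 example
print(round(c, 2))                          # 0.12
print(round(sum([c] * 3), 2))               # 0.36 by plain addition
print(round(combined_risk([c] * 3), 4))     # 0.3185
```

With ten such contacts, plain addition gives about 1.2, which can't be a probability, while the complement rule stays below 1.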
How I built it
The first day I mostly researched how to build the whole project and experimented with ways to calculate risk. I decided to use Flutter for the mobile app because I wanted it to be cross-platform, so users wouldn't be limited to one phone OS. I used Django for the backend since I have a lot more experience with it than with other options. I used PostgreSQL as my database since it works very nicely with Django. I then laid out a simple plan for building a few features of the app. In the image gallery there is a diagram I drew that gives an overview of the QR code interaction part of the app.
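For reference, pointing Django at PostgreSQL is a settings change along these lines. The database name, user, and environment variable names are placeholders, not values from the project, and Django's PostgreSQL backend also needs a driver such as psycopg2 installed.

```python
# settings.py (excerpt): use PostgreSQL as Django's default database.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        # Placeholder names; read from the environment so no secrets live in code.
        "NAME": os.environ.get("POSTGRES_DB", "coronago"),
        "USER": os.environ.get("POSTGRES_USER", "coronago"),
        "PASSWORD": os.environ.get("POSTGRES_PASSWORD", ""),
        "HOST": os.environ.get("POSTGRES_HOST", "localhost"),
        "PORT": os.environ.get("POSTGRES_PORT", "5432"),
    }
}
```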
For the 3 days I had to code, I used the Django REST framework to create endpoints and authentication for the mobile app. More on that in "Challenges I ran into".
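A sketch of the Django REST framework settings for authenticating the mobile app. The writeup doesn't say which authentication scheme was used, so token authentication here is an assumption; with it, the app would send an `Authorization: Token <key>` header with each request.

```python
# settings.py (excerpt): enable Django REST framework with token authentication.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "rest_framework",
    "rest_framework.authtoken",  # provides the Token model used below
]

REST_FRAMEWORK = {
    # Identify the caller from the "Authorization: Token <key>" header.
    "DEFAULT_AUTHENTICATION_CLASSES": [
        "rest_framework.authentication.TokenAuthentication",
    ],
    # Every endpoint requires a logged-in user unless a view overrides this.
    "DEFAULT_PERMISSION_CLASSES": [
        "rest_framework.permissions.IsAuthenticated",
    ],
}
```

DRF ships a view, `rest_framework.authtoken.views.obtain_auth_token`, that exchanges a username and password for a token and could back the app's login screen; running `migrate` creates the token table.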
My partner was working on the app itself, but there is more to that story, which I will explain shortly.
In total I spent more than 35 hours working over those 3 days, and I only managed to get close to finishing the backend. At the start I actually thought I had a week to build the whole thing, which seemed doable, but yesterday I found out that the 30th was not in fact a week after Friday.
Challenges I ran into
This entire project was a challenge. Let's start with the most recent setback. My partner's computer is extremely old; we're talking 2010-era with Windows 7. He tried to compile the Flutter app, which blue-screened his computer. It made a loud pop and refused to boot. For whatever reason he thought it was a good idea to try soldering something on the computer, which only made things worse and made it start to smoke. So whatever progress we made on the app was probably lost, or only survives in an earlier version. I guess this is what happens when your Git repo progress is only saved locally. I can't really speak for what my friend's experience was, but I can imagine it was pretty difficult.
While my computer didn't end up frying itself, I had the typical "I have no idea what the hell I'm doing" coding experience. I had never used Django REST framework, so I was learning as I went. In my opinion that's the only way to actually learn something: by struggling and feeling the pain as you go along. There were countless hours of writing unit tests and watching them fail, hidden bugs that took hours to find, and hours at a time locked away from my family because I was trying to finish as much as I could in the time I had. I'm exaggerating a tiny bit, but it certainly felt like everything took that long.
Neither of us had used Flutter before, so my friend was learning as he went as well. I was going to join him on the Flutter app once I finished the backend, so we would have been in the same boat.
Another pretty big challenge was trying to create an algorithm for calculating risk. I honestly had no idea where to start because I had never tried to create algorithms for modeling a network. I used the Python library networkx to create the networks. At first I tried to create something based on intuition, which was not very scientific of me, and it didn't turn out very promising. Next I did lots of research on the math behind epidemics. A lot of the information I found was extremely dense, complicated, and beyond my level of understanding. I don't have a college degree in statistics or calculus at my disposal since I'm only a junior in high school, so I tried to create my own model instead.
I made an example network in my notebook and tried to express the relationship between the nodes in each interaction with linear equations. For example, if there are 3 people in an interaction, person A can spread their risk to the other two, and each of the other two can spread their risk to person A. The issue was that to model the relationships between nodes, I had to know the probability that coronavirus spreads from one person to another. The pictures of my work can be found in the gallery. Some early research turned up something promising called R0, but that didn't end up going anywhere.
I made a huge conceptual error, and I only realized it after spending 9 hours on the math. Even though I don't have a degree in statistics, I am taking AP Statistics this year, so I tried to apply that knowledge. It's kind of funny how it seemed like I was making progress when in fact it was basically worthless. Let me explain. I took the binomial distribution formula and tried to solve for the probability (p hat) in terms of n (the sample size). Solving the equation wasn't the hard part; the hard part was understanding what the heck the answer represented.
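A minimal networkx sketch of a network like the one in the email below (people plus black "event" dots), assuming people and meetings are both nodes and each person is linked to every meeting they attended. `meetings` and `contact_degrees` are hypothetical names. It only measures how many meetings separate each person from an infected one; it does not compute a risk probability.

```python
import networkx as nx

# Hypothetical interaction log: each meeting lists who was there.
meetings = {
    "m1": ["alice", "bob", "carol"],
    "m2": ["carol", "dave"],
    "m3": ["dave", "erin"],
    "m4": ["frank", "grace"],
}

# Bipartite graph: person nodes connect only to meeting nodes.
G = nx.Graph()
for meeting, people in meetings.items():
    G.add_node(meeting, kind="meeting")
    for person in people:
        G.add_node(person, kind="person")
        G.add_edge(person, meeting)


def contact_degrees(graph, source):
    """Map each reachable person to how many meetings separate them from `source`.

    A path person-meeting-person has length 2 (a direct contact), a contact of
    a contact is at length 4, and so on, so the degree is path length // 2.
    """
    lengths = nx.single_source_shortest_path_length(graph, source)
    return {
        node: length // 2
        for node, length in lengths.items()
        if graph.nodes[node]["kind"] == "person" and node != source
    }


# Treat "alice" as the infected (red) node.
print(sorted(contact_degrees(G, "alice").items()))
# [('bob', 1), ('carol', 1), ('dave', 2), ('erin', 3)]
```

Frank and grace never appear because no chain of meetings connects them to alice. A risk model would still need a per-contact transmission probability to turn these degrees into numbers, which is exactly the roadblock described next.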
I made a lot of mistakes along the way and had to fix my work multiple times. After spending an insane amount of time on it, I sent my stats teacher a really long email asking for help (because she is awesome!). It goes into more depth about what I was doing, so I'll include part of the email here.
So there is this coronavirus-themed hackathon that I am participating in, and I have to submit a project by next week. My goal is to make an app that people can use to track their interactions by scanning each other's phones or a bar code. Now here's where the stats come in (and my problems). I was trying to come up with a way to calculate the probability that you will contract coronavirus based on the people who use the app and who they have been exposed to. I was trying to represent users using graph theory (nodes and edges) to see if that was promising. Here's an example I randomly generated: **see nodes.png** The red nodes represent infected people, the blue nodes represent unconfirmed/not symptomatic/not sick people, and the black dots represent events where multiple people met. I would assign each blue node a risk probability based on the number of indirect and direct connections it has to someone with coronavirus. I don't really need to show the math that computes each risk value from the connections, but the one roadblock I've hit requires me to figure out the transmission rate. So I started to use things from stats! I decided to try to model the risk with a binomial distribution. One of my main issues is that I have gone so deep into the math that I barely understand what it represents anymore, so hopefully I can explain it to you and clarify for myself what it all means. It's entirely possible that all of my work is wrong, so keep that in mind. I started off by trying to solve the binomial probability equation for p. I decided that if there is a meeting of group size n, the number of people who can be infected is n - 1 (since you can't infect yourself). When you substitute n - 1 into the combination, it simplifies to n. Then when you substitute that back in, you can solve for p using the quadratic formula (I set x equal to n - 1 just to keep things straight while solving).
I was actually super proud of myself, since up to that point I didn't really need to look anything up. My interpretation of n (at first) was that the result gave the probability of infecting a group of n people. **see binomialwork.png** When I graphed this on Desmos, I got this. I was not sure it was right, because as the number of people to infect in a group increased, the area under the curve decreased. I decided to roll with it since I was probably only going to use it for a value of 1. After this I thought that maybe the higher n values could represent the probability of consecutive infections: if I have 3 nodes and the first one is infected, O - O - O, maybe the probability at n = 3 would be the probability that the last O gets infected? What do you think? **see desmos1 image** This is where things started to get murky. I thought this didn't look right and was too perfect. I knew about something called R0, which is basically the expected number of people that one case passes the virus on to. For coronavirus, early estimates put it at about 2.2. Wikipedia defines it like this: "In epidemiology, the basic reproduction number (sometimes called the basic reproductive rate, denoted R0) of an infection can be thought of as the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection." Using that as my expected value, I plugged it into the expected value formula for a binomial distribution, E = np. I knew that 2.2 was the expected value for 1 person, so I substituted 1 for n. Then here is a questionable choice I made: I divided both sides by 2.2, so the left side is 1 but the right side is (1/2.2)*p. I thought this would sort of correct it and put it at the right "scaling".
This is what it looked like on Desmos: **see desmos2 image** It seemed to make sense that the value for spreading it to one person was not exactly one, but then I remembered that R0 is the total number of expected cases from one infected person, not the number per meeting. So I got stuck between a few ways of fixing this. I thought that dividing by a correction factor N (the number of people the infected person has seen) would account for the people he has already seen. But then I saw that as N increased, the probability of infecting someone else decreased. If I put N in the numerator instead, the value would go way above 1, so it couldn't be a probability. Then I wasn't sure what to do. Here are the 3 options I was thinking about and their graphs (the third one is the black graph in the image above): N at 1 **see desmos3 image** N at 3 **see desmos4 image** Then I remembered that R0 is not a set value, because it varies from person to person. Some people have infected nobody and others have infected hundreds. So I thought maybe I should build a sampling distribution, report a margin of error and a confidence interval for R values, and then calculate a range of risk values based on the confidence interval. This was all well and good until I realized that we haven't learned how to create a sampling distribution for data that isn't a proportion. Then I wondered what the area under the curve meant. I don't know a lot of calculus since I'm not taking it this year, but I took the integral of this between 1 and 2. I honestly have no idea what it means in this context, but I wondered if it would be useful to me in any way (looking at the area under the curve versus individual integer values for group sizes). **see desmos5 image** And now here I am.
At this point I'm wondering: if the risk probability is wrong but at least seems reasonable and is consistent, is it alright to say that it "somewhat estimates" risk when I put it in my app? I doubt anybody is expecting me to produce groundbreaking epidemiology research, so I'm sure I could get by. If you couldn't tell, I have no idea what the heck I'm doing, so if you have any advice on anything, and especially on how I can tell whether what I'm doing is right, please let me know. I've tried looking up the statistics behind actual models, but they require too much calculus and prior knowledge for me to understand them and actually code them.
The verdict was that the statistical methods I was using were completely wrong, which kind of sucked, so that is something I have to rework.
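Since the statistics need reworking anyway, here is a small sketch that checks two binomial facts the email leans on; it is not a replacement model. It confirms that choosing k = n - 1 turns the combination C(n, n - 1) into n, and it uses the binomial expected value E[X] = Np: if R0 is read as the expected number of people one case infects over N contacts, each contact would carry probability p = R0 / N. That reading assumes every contact is an independent trial with the same p, which real transmission is not. N = 20 is a made-up number of contacts, 2.2 is the R0 value from the email, and the function names are hypothetical. `math.comb` needs Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The step from the email: with k = n - 1, C(n, n - 1) simplifies to n,
# so P(X = n - 1) = n * p**(n - 1) * (1 - p).
for n in range(2, 8):
    assert comb(n, n - 1) == n


def per_contact_probability(r0: float, contacts: int) -> float:
    """Solve E[X] = N * p for p, reading R0 as the expected number of
    infections over `contacts` independent contacts."""
    if contacts < r0:
        raise ValueError("fewer contacts than R0 would make p greater than 1")
    return r0 / contacts


N = 20                                  # made-up number of contacts
p = per_contact_probability(2.2, N)
print(round(p, 3))                      # 0.11
expected = sum(k * binomial_pmf(k, N, p) for k in range(N + 1))
print(round(expected, 6))               # 2.2, recovering R0 as a check
```

Under this toy reading, p shrinking as N grows is expected rather than a bug: the same number of expected infections spread over more contacts leaves less for each one.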
Another setback was the burnout I was starting to experience. After working for so long at such high intensity, it really started to get to me. On Sunday morning I decided not to work as much on the project, since I knew there was no way in hell I was going to have a finished project by the deadline.
Accomplishments that I'm proud of
I'm honestly really proud of my partner and me, because we both had to go through a lot to get as far as we did. For me at least, not only is this the most I have ever coded in such a short span of time, but I don't think I could have made as much progress even with a whole week to work on the backend. The time constraint really pushed me to work faster and more efficiently. I just hope I can carry that intensity back into a normal time frame.
What I learned
I learned a lot in the time we had. Considering I had no prior knowledge of how to use Django REST framework, I'm happy with how much I learned. I also learned that you can't always get as much done as you want. Burnout is very real, and I tend to forget that when I want to get something done so badly.
What's next for CoronaGo
I'm extremely excited for what the future holds. Four days of work was a really short time span, and based on how much I got done in that time, I think this project isn't too far from being released on the app stores. It's going to take a lot of work, that's for sure, but I can't wait to see people actually using it. I think this was one of my best ideas so far (yes, I found out after the fact that there was a similar suggestion on the ideas page). I already registered the domain coronago.net, so I'm excited to keep working on it. I think this idea has a lot of potential!
I'm kind of sad that I don't have any tangible results to show (like a completed app), but this is the best my partner and I could do with the time we had. I'm pretty sure this was the first hackathon for both of us, so I would call it a success! I fully expect this won't do very well, since other people probably had similar ideas and worked on them for longer, but if it is possible to give feedback on any of my ideas, I would really appreciate it!
Built With
css, django, flutter, html, networkx, postgresql, python, rest