Inspiration
Patient Matching Challenge presented by Office Ally
What it does
Trained a model that groups multiple patient records belonging to the same person. For each pair of records, it builds a vector of Levenshtein edit distances across the chosen categories (fields) and uses that vector to decide whether the two records refer to the same patient.
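The pairwise comparison can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the field names in `FIELDS` are hypothetical (the page does not list the exact categories used), and `is_match` stands in for the trained classifier. Records that the classifier links pairwise are then merged into groups with a union-find structure, which is one common way to turn pairwise matches into groups of records for one person.

```python
from itertools import combinations

# Hypothetical field names; the actual chosen categories are not listed.
FIELDS = ["first_name", "last_name", "dob", "address"]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def pair_features(r1: dict, r2: dict) -> list[int]:
    """One case-insensitive edit distance per chosen field."""
    return [
        levenshtein(str(r1.get(f, "")).lower(), str(r2.get(f, "")).lower())
        for f in FIELDS
    ]


def group_records(records: list[dict], is_match) -> list[list[int]]:
    """Link every pair that is_match accepts, then return groups of record indices."""
    parent = list(range(len(records)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(records)), 2):
        if is_match(pair_features(records[i], records[j])):
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For example, "Jon Smith, 12 Oak St" and "John Smith, 12 Oak Street" (same date of birth) yield the distance vector `[1, 0, 0, 4]`. Note that comparing every pair grows quadratically with the number of records, which matters for larger datasets.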
How we built it
Used Python's scikit-learn to train an SVM classifier. We chose an SVM over logistic regression because it achieved higher accuracy on the provided data, although it may have overfit. In future work, logistic regression may be a better option for reducing overfitting. We decided that certain categories, such as name and address, are more important than others that may be very sparse, so we used only those categories in the model.
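One way to make the SVM versus logistic regression comparison less prone to rewarding overfitting is to score both models with k-fold cross-validation rather than on the training data. The sketch below assumes an RBF kernel, default regularization, and feature scaling; none of these settings come from the project. The demo data is synthetic (small distances for matches, larger ones for non-matches) and is only there to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, folds: int = 5) -> dict[str, float]:
    """Mean cross-validated accuracy for an SVM and a logistic regression."""
    models = {
        # Assumed settings: RBF kernel, C=1.0, features standardized first.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic placeholder data: 4 edit-distance features per record pair.
    rng = np.random.default_rng(0)
    X = np.vstack([
        rng.integers(0, 3, size=(100, 4)),   # matching pairs: small distances
        rng.integers(3, 15, size=(100, 4)),  # non-matching pairs: larger distances
    ])
    y = np.array([1] * 100 + [0] * 100)
    print(compare_models(X, y))
```

Because each fold is scored on pairs the model did not see during training, a large gap between training accuracy and these scores is a sign of overfitting.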
Challenges we ran into
Figuring out how to approach the problem (which categories and classifiers to use), handling missing input, and figuring out how to incorporate the model into a web interface
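The page does not describe how missing input was handled. A common approach, shown here as an assumption rather than the project's method, is to replace the distance for a missing field with a neutral fill value and add a 0/1 indicator feature so the classifier can tell "identical" apart from "unknown". The `distance` argument is a placeholder for whatever string distance is used (for example, Levenshtein).

```python
import pandas as pd


def missing_aware_features(r1: pd.Series, r2: pd.Series, fields, distance, fill=0) -> list[int]:
    """For each field, emit [distance or fill, missing_flag]."""
    features = []
    for f in fields:
        a, b = r1.get(f), r2.get(f)  # None if the field is absent
        missing = (
            pd.isna(a) or pd.isna(b)
            or str(a).strip() == "" or str(b).strip() == ""
        )
        # Neutral value when either side is missing; otherwise a real distance.
        features.append(fill if missing else distance(str(a).lower(), str(b).lower()))
        features.append(int(missing))  # indicator feature for the classifier
    return features
```

Without the indicator, a filled-in zero would look exactly like a perfect match on that field.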
Accomplishments that we're proud of
Achieving high accuracy on the provided Patient Matching data
What we learned
How to utilize machine learning tools with medical data
What's next for Patient Matching
Improving model accuracy and efficiency (to accommodate larger datasets), training with more data, and creating a web app/interface for this model
Technologies
fuzzy, matplotlib, nltk, numpy, pandas, python, scikit-learn