Patient Matching Challenge presented by Office Ally

What it does

Trained a model that groups multiple patient records belonging to the same person. For each pair of records, it computes a vector of Levenshtein edit distances over selected categories (fields) and uses that vector to decide whether the two records refer to the same patient.
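To make the comparison step concrete, here is a minimal sketch of building an edit distance vector for a pair of records. The field names, the lowercasing, and the choice to treat a missing value as an empty string are assumptions for illustration, not necessarily what the project did.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical field names; the real data set's columns may differ.
FIELDS = ("first_name", "last_name", "address")


def distance_vector(rec_a: dict, rec_b: dict, fields=FIELDS) -> list:
    """Return one edit distance per field, compared case-insensitively.

    Assumption: a missing or None value is treated as an empty string.
    """
    def clean(rec, field):
        return str(rec.get(field) or "").strip().lower()

    return [levenshtein(clean(rec_a, f), clean(rec_b, f)) for f in fields]


a = {"first_name": "Jon", "last_name": "Smith", "address": "12 Main St"}
b = {"first_name": "John", "last_name": "Smyth", "address": "12 Main St."}
print(distance_vector(a, b))  # [1, 1, 1]
```

Small distances in every field suggest the same person, while a large distance in any important field suggests different people. The classifier learns where to draw that line.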

How we built it

Used Python's scikit-learn to train an SVM classifier. We chose an SVM over logistic regression because it was more accurate on the provided data, although it may have overfit. On future data sets, logistic regression may be the better option for limiting overfitting. We decided that certain categories, such as name and address, are more important than others that may be very sparse, and so used only those categories in the model.
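The SVM versus logistic regression comparison could be run along these lines. This is a sketch under stated assumptions: the data is synthetic (not the Office Ally data), and the RBF kernel, default hyperparameters, and 5-fold cross-validation are illustrative choices rather than the settings we used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic edit distance vectors over three hypothetical fields.
# Matching pairs get small distances; non-matching pairs get large ones.
matches = rng.integers(0, 3, size=(50, 3))
non_matches = rng.integers(4, 15, size=(50, 3))
X = np.vstack([matches, non_matches]).astype(float)
y = np.array([1] * 50 + [0] * 50)  # 1 = same patient, 0 = different

models = {
    "svm": SVC(kernel="rbf"),  # kernel choice is an assumption
    "logistic_regression": LogisticRegression(),
}

# Cross-validation scores each model on held-out folds, which gives a
# fairer comparison than accuracy measured on the training data.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}

for name, score in scores.items():
    print(f"{name}: mean accuracy {score:.3f}")
```

Comparing held-out scores rather than training accuracy is one way to check the overfitting concern: a model that only looks better on the data it was trained on will lose its edge here.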

Challenges we ran into

Figuring out how to approach the problem (which categories and classifiers to use), handling missing input, and figuring out how to incorporate the model into a web interface

Accomplishments that we're proud of

Achieving high accuracy on the provided Patient Matching data

What we learned

How to utilize machine learning tools with medical data

What's next for Patient Matching

Improving model accuracy and efficiency (to accommodate larger data sets), training on more data, and creating a web app/interface for the model

fuzzy, matplotlib, nltk, numpy, pandas, python, scikit-learn
