Today, technology supports everyone: it keeps our traffic signals working, lets us use the internet, and even powers the toaster at home. That supporting role in society was our inspiration. With this project, we plan to extend that support to a new group of people.
The project has two protocols. The blind protocol helps a visually impaired user perceive the world around them through their other senses. At any time, the user can ask Specto to take a picture. Specto then tells the user which objects are in the area and, if there is any text in front of them, such as a book, reads it aloud. Specto can also alert the user when a wall or object is in front of them, helping to guide them through the world. Specto's deaf protocol is a web app that picks up spoken audio in the real world and turns it into text for the user to read. The user can type a reply, which is turned into audio for the other person to hear. Together, these two protocols support blind and deaf people in their everyday lives.
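As an illustration of the behaviour described above, here is a minimal sketch of how the detected objects and any recognized text could be combined into one spoken message. The function name `build_announcement` and the exact wording of the message are assumptions made for this example, not taken from the project code.

```python
def build_announcement(objects, text=""):
    """Combine detected object names and recognized text into one sentence to speak.

    Hypothetical helper: `objects` is a list of label strings (for example from an
    object detector) and `text` is whatever the OCR step returned.
    """
    # Drop empty labels and surrounding whitespace.
    names = [name.strip() for name in objects if name and name.strip()]

    parts = []
    if not names:
        parts.append("I could not recognize any objects.")
    else:
        if len(names) == 1:
            listing = names[0]
        else:
            listing = ", ".join(names[:-1]) + " and " + names[-1]
        parts.append(f"I can see: {listing}.")

    # Collapse line breaks and repeated spaces that OCR output often contains.
    cleaned = " ".join(text.split())
    if cleaned:
        parts.append(f"The text in front of you reads: {cleaned}")

    return " ".join(parts)


if __name__ == "__main__":
    print(build_announcement(["chair", "table", "book"], "Chapter  One\n"))
```

Keeping this step free of any hardware or network calls makes it easy to check on a laptop before running it on the device.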
This project has multiple parts, which were split among the members of our team. For the blind protocol, we used infrared sensors to tell the user when there is an object in front of them. The computing for the blind protocol happens on a Raspberry Pi. We used the Pi Camera to take a picture when the user asks Specto for one. The image-to-text algorithm and the multiple-object detection program then run to tell the user what is in front of them. For multiple-object detection we used the Google Vision API, and for image-to-text we used a Python library called pytesseract. To take a picture, the user says "Specto take a picture", which is detected using the speech recognition library. For the deaf protocol, we used the speech recognition library again to help deaf people communicate with someone else. It detects the other person's voice and turns it into text for the deaf person to read. The deaf person can reply by typing in the web app, and we use gTTS to turn that text into speech.
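The pieces named above can be wired together roughly as follows. This is a sketch under assumptions, not the project's actual code: it assumes the "speech recognition library" is the SpeechRecognition package used with its Google Web Speech recognizer, and the helper names (`is_capture_command`, `listen_once`, `read_text_from_image`, `text_to_mp3`) are hypothetical. Third-party imports are kept inside the functions so the command check can be tried without a microphone or camera.

```python
WAKE_PHRASE = "specto take a picture"  # the phrase quoted in the writeup


def is_capture_command(transcript: str) -> bool:
    """Return True if the transcript contains the wake phrase.

    Case, punctuation, and extra spaces are ignored.
    """
    cleaned = "".join(ch for ch in transcript.lower() if ch.isalnum() or ch.isspace())
    return WAKE_PHRASE in " ".join(cleaned.split())


def listen_once() -> str:
    """Record one utterance from the microphone and return its transcript.

    Assumes the SpeechRecognition package; returns "" if nothing was understood.
    """
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""


def read_text_from_image(image_path: str) -> str:
    """Run OCR on an image file with pytesseract and return the recognized text."""
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path)).strip()


def text_to_mp3(text: str, out_path: str) -> None:
    """Convert text to speech with gTTS and save it as an mp3 file."""
    from gtts import gTTS

    gTTS(text=text, lang="en").save(out_path)


if __name__ == "__main__":
    # Hypothetical flow: wait for the command, then read out a captured image.
    if is_capture_command(listen_once()):
        text = read_text_from_image("capture.jpg")  # assumed path of the camera output
        if text:
            text_to_mp3(text, "speech.mp3")
```

The same `listen_once` and `text_to_mp3` helpers would cover both directions of the deaf protocol: speech from the other person becomes text, and the typed reply becomes audio.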
We faced a few challenges. The first was getting the service key JSON file for the Google Vision API to work with our Python code. After that, our code could only detect multiple objects in the demo file, but we just had to change a few minor details to make it work for any image file. For image-to-speech in the blind protocol, we had trouble playing an mp3 file with playsound because the Pi was not able to find the mp3 files.
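Both file problems above are often caused by relative paths being resolved against the directory the script was launched from. The sketch below shows one common way to avoid that. It is not necessarily the fix we used, and the helper names `resolve_asset` and `use_service_key` are hypothetical. `GOOGLE_APPLICATION_CREDENTIALS` is the environment variable that Google's client libraries read to locate a service key file.

```python
import os
from pathlib import Path


def resolve_asset(name, base_dir):
    """Return the absolute path of `name` inside `base_dir`, or fail with a clear error."""
    path = (Path(base_dir) / name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Expected file not found: {path}")
    return str(path)


def use_service_key(key_file, base_dir):
    """Point Google client libraries at a service key JSON file.

    Sets GOOGLE_APPLICATION_CREDENTIALS to the absolute path of the key file.
    """
    key_path = resolve_asset(key_file, base_dir)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
    return key_path
```

In a script, `base_dir` could be `Path(__file__).resolve().parent`, so that a call such as `playsound(resolve_asset("speech.mp3", base_dir))` receives an absolute path no matter where the program is started from.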
What we learned
We learned how to organize a project with two protocols that work together to solve a single problem.
We are in the process of adding a feature for users with hearing disabilities who want to use our platform. Functionality to type a reply and play it back as audio is coming soon.
Because of social distancing, we had to work on this problem separately, so the finished project is still somewhat disjointed. After the pandemic, we plan to meet and connect the parts of the project to make it more unified.
Additionally, we would like to improve our use of the Vision API and make the voice-to-text more reliable.
Installation and other info
Additional information can be found here: GitHub ReadMe