How a maker uses image processing, speech recognition, and machine learning to solve Sudoku puzzles.
Many people keep their minds sharp by solving crosswords over a weekend, and others enjoy the challenge of solving 9x9 Sudoku puzzles. In Arijit’s case, he wanted to make a robot to solve the puzzle for him.
Using a Raspberry Pi and camera as the main parts, Arijit made a robot to recognise and solve the sudoku using image processing and machine learning-based techniques.
We caught up with Arijit to learn all about it.
Thank you for sharing details about your project with us, Arijit. Please tell our readers about yourself and what got you into electronics.
I am Arijit Das, a Computer Science engineer from India. From my childhood, I have loved to build things, so while studying engineering, I started to learn skills that could help me work on the things I always wanted to build.
At first, I learned programming, but then I realised that although I could build any software with that knowledge, I could not build any hardware project without a knowledge of electronics. From there, I started learning electronics and began my journey as a maker.
It’s been six years since then, and I have made quite a lot of projects. I have also created a YouTube channel https://www.youtube.com/c/SPARKLERSWeAreTheMakers to share my knowledge and my projects with people who are just starting their journey. A few of my other projects have been featured in top magazines and websites from all over the world.
Excellent. It’s great that you are sharing your projects to inspire others. What motivated you to make your robot, SUDO?
Sudoku problems are typically tough for most people to solve. A novice could need many hours to finish a straightforward sudoku problem. Since sudoku puzzles may be quite difficult, we wondered if it was possible to create a robot that could swiftly and simply solve any sudoku puzzle just by looking at it.
And if we can build a robot that can do this, the same robot can solve many other kinds of problems too, with the help of just a few additional algorithms.
Another motivation was to demonstrate how a robot like this solves complex problems. We humans solve sudokus intuitively, but a computer follows a very specific algorithm. If we tried to follow that algorithm by hand, it would take us days, maybe months.
But as computers are super fast, they can do it within a few seconds. So we also wanted to show all these things and that’s why this robot not only solves the problem, but also demonstrates the overall process of solving.
How does it work?
The operations of this robot are as follows:
After being turned on, the robot will be ready to react to voice commands.
The robot has an LED indicator on the top of its head. When the LED is on, the robot is listening to the user, and when it is off, the robot is processing the previous command.
When the user asks it to introduce itself, the robot will do so, as it has speaking abilities.
The robot will start capturing video when the user instructs it to start solving sudoku. Additionally, the video will be streamed on the screen.
The user must then hold a sudoku in front of the robot. The sudoku can be printed on paper or displayed on the screen of a digital device.
After holding the sudoku image in front of the robot, the user must order it to take a picture.
After capturing an image of the puzzle, the robot:
i. Detects the sudoku in the image.
ii. Extracts the digits from the detected sudoku and forms the unsolved puzzle.
iii. Demonstrates the process of solving the puzzle using a backtracking approach.
iv. When the puzzle is solved, the robot shows the result on a screen and waits for the user to direct it to go on to the next puzzle.
The robot will provide feedback at each stage of solving the sudoku puzzle using an artificial voice.
Brilliant. What parts does it use?
We have used the following components while building this robot:
- Raspberry Pi 4 B
- Raspberry Pi Camera Module V2
- Raspberry Pi 3.5 Inch Touch Screen Module
- Small Speaker
- Small LED
- Raspberry Pi Power Supply
- USB Mic
- Jumper Wires
- 3D Printed Body Parts
What kind of prototyping was needed?
At first, we simply started with a Raspberry Pi and camera. We began with the sudoku recognition part, and once that worked correctly, we used backtracking to solve the recognised sudoku. Only after these two main functions worked properly did we start working on the face animations, voice recognition, speaking and so on. Once all the functions worked perfectly, we attached the components to the body parts and made the final robot.
A sensible approach. What challenges did you need to overcome?
The main challenge was to recognise the sudoku puzzle accurately. As we are recognising the puzzle from a piece of paper or some kind of screen, we had to apply lots of image processing techniques to recognise the digits reliably.
This is because if the robot misreads even one digit, that can change the solution completely. Recognition depends on many factors: background light, digit size, font type, image quality and more. Thus, we needed to apply image processing techniques in a way that covers all these possible difficulties and still gives a very accurate result.
It's great to see the camera work on printed or onscreen puzzles. How accurate are the results?
The level of precision is highly influenced by the lighting. In 90% of circumstances, if someone holds the sudoku correctly in good lighting, it will accurately identify all of the sudoku's digits. One or two digits are incorrectly recognised in 10% of situations. It frequently gets mixed up with the numbers 6, 8, and 9.
Tell us about the code. How much was custom coding and/or third-party applications?
The code is completely written in Python 3. And here we had to use the following libraries:
- OpenCV: Used for image processing based tasks
- Numpy: Used for working with arrays
- Pytesseract: Used for optical character recognition
- Pygame: Used to create GUIs
- SpeechRecognition: Used to recognise voice commands
- Imutils: Used to capture video
Why did you choose to make it speech-controlled instead of using an old-fashioned touchscreen or pushbuttons?
At first, we created a motion-controlled robot that required us to hover our hands over it to operate it. However, we wanted to make it seem more real, so we added the ability for it to respond to voice commands as well.
Additionally, with voice control there is no restriction on the number of instructions we can use to operate it. Since it can currently only solve sudoku puzzles, we may add many more functions to it in the future.
In such a situation, it would be challenging to operate a variety of functions using push buttons, and since the robot's screen is mostly used for showing results, we did not want to utilise it for operating the robot.
You have given your SUDO a personality with a face and voice. Is this for extra fun or is there a deeper meaning?
Like I've said before, we wanted to create SUDO to be a fully functional robot that we can interact with and that feels like something real. Because we didn't want it to be a lifeless computer program, giving it a voice and a face made it much more engaging and interesting.
Does it work with just the 9x9 puzzles?
Yes, for now it only works on 9x9 puzzles.
Is your build all 3D printed?
It’s not completely 3D printed. Actually, when we started building this robot, we did not have a 3D printer. So I took one of my old toys and used it as the body of the robot. Later, we bought a 3D printer and printed a few parts to give it a better finish.
Repurposing an old toy was a good alternative. If our readers want to build one for themselves, do you have the code and build details available?
Yes, you will find the code and resources at https://github.com/Arijit1080/Speech-Controlled-Sudoku-Solving-Robot
What are you working on next?
After SUDO, I have built a voice-controlled robotic spider, called SPY-DER.
Great stuff. We’ll be sure to publish your robotic spider in an upcoming issue. Thank you for speaking with us.
| Part | Jaycar | Altronics | Core Electronics |
|---|---|---|---|
| 1 x Raspberry Pi 4 B | XC9100 | Z6302G | - |
| 1 x Raspberry Pi Camera Module V2 | XC9017 | - | ADA3099 |
| 1 x Raspberry Pi 3.5" Touchscreen Module | - | - | - |
| 1 x Small Speaker | AS3004 | C0606 | - |
| 1 x 5mm LED* | ZD0240 | Z0800 | ADA299 |
| 1 x Raspberry Pi Power Supply | XC9122 | - | - |
| 1 x Mini USB Microphone | - | - | ADA3367 |
| 1 x Pack of Jumper Leads | WC6026 | P1023 | ADA266 |
* Quantity required, may only be sold in packs.
The robot's body was constructed from one of my old toys. If you are handy with 3D printing, you could design an enclosure to suit, or use an enclosure from one of DIYODE’s advertising partners.
Next, mount the Raspberry Pi camera to the body/enclosure.
Connect jumper wires to the display.
Mount the speaker and LED, along with the wiring.
Plug the USB microphone into the Raspberry Pi and attach the speaker, camera, screen and LED. Note that the negative (short leg) on the LED needs to go to pin 39 (GND) on the RPi, and the positive (long leg) to pin 40 (GPIO21).
Finally, mount the Raspberry Pi and conceal the wiring in the base of the enclosure.
You will find the complete code with all the required libraries and resources at:
To program and configure the Raspberry Pi, follow these steps.
Using Raspberry Pi Imager, install the Raspberry Pi OS Buster (Legacy) on a memory card. To download visit:
Insert the memory card into the Raspberry Pi.
Utilise the official Raspberry Pi power supply to run the Pi.
Connect the Raspberry Pi to a large display via HDMI before beginning the setup process, or just use SSH.
Install the Raspberry Pi screen's necessary drivers (refer to your supplier’s user manual).
Enable the camera using “raspi-config”.
Set "Audio Output" to "3.5mm jack" using “raspi-config” or from settings.
To control the Raspberry Pi without the large display, connect it to WiFi.
Test and set up the microphone. You must modify the "/home/pi/.asoundrc" file in this step.
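The .asoundrc edit mentioned above typically looks like the fragment below, which routes capture to the USB mic and playback to the onboard audio. The card numbers here are assumptions; check yours with `arecord -l` and `aplay -l`:

```
pcm.!default {
  type asym
  capture.pcm "mic"
  playback.pcm "speaker"
}
pcm.mic {
  type plug
  slave { pcm "hw:1,0" }   # USB microphone, usually card 1
}
pcm.speaker {
  type plug
  slave { pcm "hw:0,0" }   # onboard audio (3.5mm jack), usually card 0
}
```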
Install all the required Python 3 libraries in the Raspberry Pi.
Finally, download the code and the required resources from my GitHub repository, and run the code.
A few crucial code methods to review:
faceAnimation(): This function creates facial animations and displays them on the screen by utilising the PyGame package. Figures below show the "face1.png" and "face2.png" files that this function uses to create the animations.
focusGrid(): Identifies the largest contour in the image provided, assesses its shape, cleans it up using specific filters, and then returns it.
splitUp(): Takes the cleaned-up largest contour as input, divides it into cells, and returns the cell matrix.
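The cell-splitting step can be sketched with plain NumPy slicing, assuming the grid has already been straightened into a square image:

```python
import numpy as np

def split_cells(grid_img):
    """Slice a square grid image into a 9x9 matrix of cell images."""
    rows = np.array_split(grid_img, 9, axis=0)
    return [np.array_split(r, 9, axis=1) for r in rows]

cells = split_cells(np.zeros((450, 450), np.uint8))
print(len(cells), len(cells[0]), cells[0][0].shape)  # 9 9 (50, 50)
```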
highlightDigit(): Eliminates the noisy regions from an input cell using connected component analysis, leaving just the digit in the cell.
getDigits(): Uses optical character recognition to pull out numbers from the highlighted cells. Pytesseract has been utilised for the OCR in this function.
extractGrid(): This method extracts the grid of recognised digits from the supplied sudoku picture using the focusGrid(), splitUp(), highlightDigit(), and getDigits() functions.
draw(): Draws the puzzle in the display with the help of pygame.
draw_box(): This method draws a red box around the cell the robot is currently working on when it is solving a sudoku problem.
draw_val(): Draws a value (digit) in a cell, while solving the puzzle.
show_puzzle(): Uses PyGame to display the puzzle on the screen. Internally, it makes use of the draw() method.
valid(): Determines whether placing a given value in a particular cell keeps the current sudoku solution valid; it is called at each step of the solving procedure.
solve(): Solves the sudoku using recursion. This function uses valid(), draw(), draw_box() functions internally.
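The valid()/solve() pair described above is the classic backtracking algorithm. Here is a minimal self-contained sketch, leaving out the draw_box()/draw_val() calls the robot interleaves to animate each trial placement:

```python
def valid(board, row, col, val):
    """Check that placing val at (row, col) breaks no sudoku rule."""
    if val in board[row]:                              # row constraint
        return False
    if any(board[r][col] == val for r in range(9)):    # column constraint
        return False
    br, bc = 3 * (row // 3), 3 * (col // 3)            # 3x3 box constraint
    return all(board[r][c] != val
               for r in range(br, br + 3) for c in range(bc, bc + 3))

def solve(board):
    """Fill empty cells (0) by recursive backtracking; True on success."""
    for row in range(9):
        for col in range(9):
            if board[row][col] == 0:
                for val in range(1, 10):
                    if valid(board, row, col, val):
                        board[row][col] = val
                        if solve(board):
                            return True
                        board[row][col] = 0   # undo and try the next value
                return False                  # no value fits: backtrack
    return True                               # no empty cell left: solved
```

solve() mutates the board in place, which is what lets the robot redraw the grid after every placement and show the algorithm at work.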
sudoku_solve(): This function controls the video streaming, captures the images, and passes them to the other functions so the puzzle can be solved. It also gives voice feedback at different stages of the solution process.
main: Here, we used the speech recognition technology, configured global variables, and created a number of threads. According to various user instructions, this portion of the code will change the values of global variables, and based on those values, various operations will be performed.
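The dispatch logic in main can be pictured as a phrase-to-state mapping that the worker threads poll. This is a simplified sketch with hypothetical command phrases and state names (the real ones are in the GitHub code), and it leaves out the actual microphone capture done with the SpeechRecognition library:

```python
import threading

# Shared state polled by the worker threads (camera, face animation, solver).
state = {"mode": "idle"}
state_lock = threading.Lock()

# Hypothetical phrases; the robot's real vocabulary may differ.
COMMANDS = {
    "introduce yourself": "introduction",
    "solve sudoku": "streaming",   # start the camera stream
    "capture": "solving",          # grab a frame and solve it
    "next": "streaming",           # move on to the next puzzle
}

def handle_phrase(phrase):
    """Map a recognised phrase to a mode change; unknown phrases are ignored.

    In the real robot the phrase comes from the speech recogniser; here it
    is just a string, so the dispatch logic can be exercised without a mic."""
    mode = COMMANDS.get(phrase.lower().strip())
    if mode is not None:
        with state_lock:
            state["mode"] = mode
    return state["mode"]

print(handle_phrase("Solve Sudoku"))  # streaming
```

Because the threads only ever read the shared mode, each voice command changes what the robot does next without restarting any of the long-running loops.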