Let’s explore image classification and train AI to tell the difference between a Raspberry Pi and an Arduino!
In this instalment, we’re going to jump forward a few steps with our exploration of Machine Learning and do some work with image classification.
As the process for image recognition is rather complex, we will depart from our Scikit-learn journey for now, but we will come back to it soon. After all, we still need to work through testing accuracy, train and test breakdowns, and so much more. We have only just skimmed the surface.
We will come back to it and explore some deeper functionality, such as writing our own classification algorithms, but I know you’re hungry for some tangible outcomes. So let’s get started with some image classification.
Now, it is indeed possible to perform image classification with Scikit-learn. However, it’s hard to pass up something more suitable for the job: Google’s Tensorflow.
Tensorflow is an open source Machine Learning platform developed by Google, which is fantastic.
With so much development on the platform, Tensorflow goes from strength to strength and sees improved performance consistently.
Tensorflow image classification leverages Deep Learning and Neural Networks to perform image analysis. We’ll explain more about this shortly.
HOW DOES IMAGE CLASSIFICATION WITH MACHINE LEARNING WORK?
While “regular” data-driven Machine Learning uses raw data values to make predictions, image processing is a more challenging problem. However, it’s not entirely dissimilar.
Image recognition using Machine Learning, at its core, simply treats the raw pixel values that make up each image as the data to compare.
Unlike regular table data, the decision making and classification involved is vastly more complicated. Why? Because the volume of data is exceptionally high for each image.
Consider the mapping of a very simple graphic, a black and white letter “A” that we’ve created.
At its most fundamental, as a black and white bitmap image, we have two potential values for each pixel: 1 (black) or 0 (white).
Note: These values may be inverted in reality. These values are purely demonstrative.
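As a rough sketch of the idea, here is one way such a bitmap could be written out in Python. The pixel layout is our own hypothetical 8 x 8 version of the letter, not the exact graphic, and the `render` helper is just an illustrative name.

```python
# Hypothetical 8 x 8 bitmap of the letter "A": 1 = black, 0 = white.
# The layout is illustrative only, not the original graphic.
LETTER_A = [
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def render(bitmap):
    """Return the bitmap as text, using '#' for black and '.' for white."""
    return "\n".join(
        "".join("#" if pixel else "." for pixel in row) for row in bitmap
    )


print(render(LETTER_A))
```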
If you squint at the above array you can almost make out the letter A. Of course, there are ways to make it more legible.
We’re using a modern font, and when bitmaps were being used more regularly, a system-style font that would cater for simpler shapes would have been used. But you get the idea.
However, whatever the contents of the image are, we tend to have more detail, and things aren’t so binary. This is where the next most complex image style comes in: greyscale.
Greyscale allows a range of grey values between pure black and pure white. How many values depends on the bit depth of the greyscale image, but it will always be more legible than the bitmap when representing shapes that aren’t optimised for a bitmap.
The biggest difference with greyscale is that each pixel requires a byte (or bytes) of data compared to a single bit (black or white, binary options).
As you can see, the greyscale image is much more recognisable even at only 8 x 8 pixels.
If we map each greyscale value to a simplified integer (0 to 10, with 0 being white and 10 being pure black) then we end up with a multidimensional array something like the code below.
While 0-10 isn’t efficient from a data storage perspective, it’s good for human comprehension.
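A minimal sketch of that kind of array, assuming made-up grey values for our letter (the numbers below are hypothetical, chosen only to look roughly like an anti-aliased "A"):

```python
# Hypothetical 8 x 8 greyscale "A": 0 = white, 10 = pure black.
# These values are invented for illustration.
GREY_A = [
    [0, 0, 1, 8, 8, 1, 0, 0],
    [0, 0, 6, 9, 9, 6, 0, 0],
    [0, 2, 9, 3, 3, 9, 2, 0],
    [0, 6, 8, 0, 0, 8, 6, 0],
    [1, 9, 10, 10, 10, 10, 9, 1],
    [4, 9, 2, 0, 0, 2, 9, 4],
    [8, 6, 0, 0, 0, 0, 6, 8],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# Flatten the grid row by row into a single list of feature values.
features = [value for row in GREY_A for value in row]
print(len(features))  # 64
```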
That’s 64 “features”, which not only correlate to the others in the dataset, but also have internal relationships to other pixels in terms of their position in the array too.
Keep in mind, this is a VERY small image by modern standards. 8 x 8 pixels isn’t even enough for an app thumbnail on your phone.
But we’ve over-simplified things here. In reality, we have colour: three values for each pixel in RGB, each between 0 (none of that colour) and 255 (full intensity). All three at 0 gives black, and all three at 255 gives white.
If we have a purple letter A stored as full colour, we would have a far more complex array such as:
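Here is a hedged sketch of how that might look in Python. The letter mask, the shade of purple `(128, 0, 128)` and the white background are all our own assumptions for illustration.

```python
# Hypothetical mask for the letter "A": "1" = letter pixel, "0" = background.
MASK = [
    "00011000",
    "00100100",
    "01000010",
    "01000010",
    "01111110",
    "01000010",
    "01000010",
    "00000000",
]

PURPLE = (128, 0, 128)    # (red, green, blue), an assumed shade of purple
WHITE = (255, 255, 255)   # assumed background colour

# Every pixel becomes a tuple of three channel values.
rgb_a = [[PURPLE if ch == "1" else WHITE for ch in row] for row in MASK]

# Flatten rows, pixels and channels into one list of raw values.
values = [channel for row in rgb_a for pixel in row for channel in pixel]

print(rgb_a[0])      # the first row of (R, G, B) pixels
print(len(values))   # 192
```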
That’s a whole whack of data, and all for just an 8 x 8 pixel image, barely enough to even make out what letter we have!!!
Now consider if we had a full HD image from a video, that’s a resolution of 1920 x 1080.
Think about that for a second... that’s 2,073,600 pixels. For an RGB colour image, that’s 6,220,800 different data points, which are important not only in raw value, but in their relationship to each other.
That’s insane! But wait, we’re still not done...
The current iPhone model takes an image of approximately 4032 x 3024, or 12,192,768 pixels.
In RGB, that’s 36,578,304 data points. Over 36 million data points, for one single iPhone image!!! You can see how complicated this gets, quickly.
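If you’d like to check the arithmetic yourself, a tiny helper does the job. The function name `data_points` is just our own illustrative choice.

```python
def data_points(width, height, channels=3):
    """Number of raw values in an image: one per pixel per colour channel."""
    return width * height * channels


print(data_points(8, 8, channels=1))  # 64        (8 x 8 greyscale)
print(data_points(8, 8))              # 192       (8 x 8 RGB)
print(data_points(1920, 1080))        # 6220800   (full HD, RGB)
print(data_points(4032, 3024))        # 36578304  (4032 x 3024 photo, RGB)
```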
Now, I bet you thought we were getting complicated with just a handful of features in our regression examples in Part Two!
NEURAL NETWORKS AND DEEP LEARNING
When we start to think about the relationships that need to be considered between pixels, even for an 8 x 8 pixel image, you quickly get a picture of just how complex things become.
This exceptional complexity of how each pixel relates to every other pixel in the image, and indeed how the red value of a pixel correlates to the blue and green values around it, is why Neural Networks and Deep Learning excel at these particular problems.
TENSORFLOW EXPERIMENTS WITH TEACHABLE MACHINE
As we’ve touched on previously, Google’s Tensorflow is a library used to develop Machine Learning applications, including neural networks, and it improves in performance daily (perhaps even by the second, as there’s so much evolution and improvement).
Tensorflow is open source, and very powerful. Tensorflow takes over where Scikit-learn leaves off, though you can actually use Tensorflow and Scikit-learn together too.
Before we dig too far into setup and configuration of Tensorflow, we’re going to run some fun examples with a freely accessible Google project called Teachable Machine.
Teachable Machine is a fun, browser-interfaced method to experiment with the sheer power of Machine Learning.
Interestingly, it can also help you train models to then deploy elsewhere. It’s like a GUI for AI! OK, so perhaps not, but you’ll quickly see the power contained within its code.
Now, it’s worth noting that this option has limited deployment ability. It allows you to train models, but you can’t create new models from scratch. That process is far more in-depth, and we’ll get there eventually. But it’s loads of fun anyway, and even with simple models, we can generate powerful results.
Using your favourite browser (preferably Chrome), head on over to https://teachablemachine.withgoogle.com/
Feel free to explore around, but what you want to do is train a new image classifier. You can jump to it here: https://teachablemachine.withgoogle.com/train/image
This is as simple as image classification training gets. We’re going to train our system to determine the difference between Raspberry Pi hardware, and Arduino hardware.
Remember how, with Scikit-learn, we had our features and labels? This process is no different, except the “features” are the images we’re going to upload, and each image has one label.
With only two classifications available, it’s only really for fun at this stage. But you’ll start to grasp how this can form the basis of a component identification utility or similar.
CREATING YOUR OWN REFERENCE LIBRARY
This is actually somewhat difficult and time consuming to do, regardless of what you’re training or how you’re training it. The raw data needs to be high quality, and the difficulty is compounded by the volume of images you require.
For everyday objects, such as determining if the image contains a person or a house, it’s likely that you could use a pre-trained model and never really have to train on your own images much. But if you’re pushing hard on Machine Learning, you’re likely to want more customised abilities than what is already trained.
So to help you all out, we’ve taken the liberty of shooting 600 different photos of Raspberry Pi and Arduino hardware to use for this process. You can thank us later :D
You’ll find all the images in the Resources section for this article, categorised into folders for easy use. In order to increase the data quality we’re using to train, the boards are photographed on a variety of backgrounds and angles, to help Tensorflow train well. We could have also included the underside of the board, but stuck with angles for now.
TEACHING YOUR MACHINE
This process is very straightforward using Teachable Machine. Simply label your two classes and upload the image folder that corresponds to the class label.
Set Classes and Upload
We’ll label ours Raspberry Pi Hardware, and Arduino Hardware.
You’ll need the “raspberry_pi_general” and “arduino_general” folders of images respectively from the resources, or use your own if you prefer.
When we’re working with Tensorflow at a code-level, images need to be conditioned for use, but Teachable Machine will help us out here and make sure everything is sanitised appropriately.
It’s generally accepted that you need around 100 images per class for a good classification, so we’ve provided that. In reality, it will work with far fewer, but we’re nothing if not thorough!
NOTE: If you’re uploading 100 images to each of your classes, that’s a reasonable amount of data. The Teachable Machine interface provides no feedback on the upload, so be patient.
Once you have uploaded your images, you should be greeted with something like this:
Train the Model
As prompted by the system, click the “Train Model” button to begin training.
It will take a little while; ours took approximately 30 seconds. That’s less time than it took to upload the images!
NOTE: You’ll need to keep the browser tab active while training is taking place. This is due to the resource-management implemented in Chrome and most browsers, to limit resource usage when the tab is not active.
You can experiment with tuning the training parameters, but we’ll leave the defaults here.
Teachable Machine is configured to use 85% of your samples to train the model, while 15% are reserved for testing. This is partly why it’s so important to have a good number of images to train with.
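To picture what an 85/15 split means for 100 images, here is a minimal sketch. How Teachable Machine actually selects its test samples isn’t something we can see, so the random shuffle, the `split_samples` function and the filenames below are assumptions for illustration only.

```python
import random


def split_samples(samples, train_fraction=0.85, seed=0):
    """Shuffle a copy of the samples, then cut it into train and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical filenames standing in for one class folder of 100 images.
images = [f"arduino_{i:03d}.jpg" for i in range(100)]

train, test = split_samples(images)
print(len(train), len(test))  # 85 15
```

With only 20 images in a class, the same split would leave just 3 for testing, which is not much to judge a model on.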
Providing you have no errors, you should see “Model Trained” and be ready to test it!
Now, you can preview classifications using images!
Grab your camera and a handy Raspberry Pi or Arduino board, and take a snap. Alternatively, pull up the website of your favourite retailer and borrow one of their images (we’re confident they won’t mind). Upload it to the system and it will provide you with a classification!
You’ll see that in our example, we’ve taken a photo of a Jaycar Uno+WiFi board, which it has detected as an Arduino board with 100% certainty! Amazing!
Alternatively, if you have a webcam available, you can switch on the webcam and watch the classification in real time.
EXPLAINING THE RANDOMNESS
It’s important to remember that this particular AI only knows two things in its universe: Arduino and Raspberry Pi. It doesn’t know about Pycom boards, resistors, humans, or anything else. As far as its ability to classify goes, those things don’t exist.
So, when a picture of anything but a Raspberry Pi or Arduino is presented (or you use the webcam and put your own face in it), you’ll find some highly variable results.
I’m happy to say that I am currently an Arduino, with 65% confidence! But that might change tomorrow, depending on the shirt I wear.
Naturally, the machine is trying to determine “which one are you” and “none” is not an option.
The same thing will happen if you’re training the model to classify between red and blue marbles. If you feed it a purple one, it’s going to struggle to provide you with a quality classification.
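One way to see why “none” is never on the table: classifiers like this typically finish with a softmax step, which turns raw scores into confidences that always add up to 100%. We’re assuming that behaviour here rather than quoting Teachable Machine’s internals, and the scores below are made up.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that always sum to 1."""
    peak = max(scores)  # subtract the peak for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up raw scores for an image that is neither board (say, a face).
labels = ["Arduino", "Raspberry Pi"]
probabilities = softmax([0.3, -0.3])

for label, probability in zip(labels, probabilities):
    print(f"{label}: {probability:.0%}")

# Whatever the input, the confidences are shared between the known classes.
print(round(sum(probabilities), 6))  # 1.0
```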
You could feasibly add 100 images of random unrelated content to a new class called “no classification” to provide your neural network with another option to classify with.
The specifics of your own application will determine whether or not this would be useful. You can always handle the results and reject a confidence of less than 80%, or whatever you determine is useful.
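Rejecting low-confidence results can be as simple as the sketch below. The function name `classify_with_threshold` and the dictionary of confidences are our own illustrative choices, not the format of anything Teachable Machine exports.

```python
def classify_with_threshold(probabilities, threshold=0.80):
    """Return (label, confidence), or (None, confidence) if below threshold."""
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence < threshold:
        return None, confidence  # not confident enough to commit to a label
    return label, confidence


# A face in front of the webcam: the best guess is rejected.
print(classify_with_threshold(
    {"Arduino Hardware": 0.65, "Raspberry Pi Hardware": 0.35}
))  # (None, 0.65)

# A clear photo of a board: the label is accepted.
print(classify_with_threshold(
    {"Arduino Hardware": 0.97, "Raspberry Pi Hardware": 0.03}
))  # ('Arduino Hardware', 0.97)
```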
You can now even export the trained model for use with Tensorflow.js, Tensorflow, or Tensorflow Lite.
We’ll actually be using Tensorflow.js next time, so if you’d like to export your model for next time, do so (we’ll also provide you with ours, so if you’re using our images there may not be a need).
I know what you’re thinking... “but it’s easy to tell the difference between an Arduino board and a Raspberry Pi”. Well yes, in many ways it is. While this is purely an example, we did make it fairly easy.
So, let’s make it a tougher challenge! Let’s create a Raspberry Pi Type Classifier.
Specifically, a classifier to tell Raspberry Pi 3B and Raspberry Pi 4 boards apart. You can see how this would be useful in the real world. You could use the fundamentals of what we’ll do here to classify ripe and unripe fruit, to tell two similar objects apart, and so much more. So, let’s go!
In the digital resources, you’ll find two folders of images, raspberry_pi3b and raspberry_pi4, which you’ll need for this next example.
We’ll run through the same process as before, using our images. We’ve also shortened the labels for each class to just RPi 3B and RPi 4.
Click the "Train Model" button and wait until it's finished.
With your model trained, you’re ready to upload a new image of a Raspberry Pi! We ducked over to Core Electronics’ website for an image of the RPi4 and uploaded it.
Success! 97% confidence that it’s a Raspberry Pi 4!
And indeed, when I switch over to the webcam and hold up a Model 3B to the camera, it’s 100% confident it’s a Raspberry Pi 3B, which it is.
INCREASING CLASS COUNT
Now that we’ve tested Tensorflow’s ability to recognise different Raspberry Pi variations, we’re going to add Arduino boards back into the mix.
We’ll create four different classes: Arduino UNO, Arduino MEGA, Raspberry Pi 3B, and Raspberry Pi 4. Each has 100 images in the resources under an appropriate folder name. Upload the images and train the model.
Now grab your favourite board and hold it up to the camera (or upload a photo). You’ll notice here that at certain angles and in certain lighting, the board recognition is just as accurate as before.
But you’ll also notice that the added complexity sometimes produces classifications that are simply incorrect, or that the model really just can’t figure it out.
“Failures” is perhaps a little harsh. However, I wanted to highlight where the classifier struggles and how our training data affects the outcomes.
Take a look at the two classifications below. I’m holding an RPi 3B to the camera.
In the first image, the prediction is correct. In the second, however, when I rotate the board 180°, the classification switches to a Raspberry Pi 4. As many of you will be aware, the ports were essentially reversed on the RPi 4 compared to previous models.
Without anything else to base it on, it’s a logical classification for our model to make. This is where the volume and quality of images used to train, and the quality of the image being classified, come into play. If you don’t provide enough data to the model for a particular angle, or the image is blurry, then your results will start to degrade.
While you’ve probably realised this already, it’s worth noting that 100 images that don’t show different angles and perspectives add little value compared with a smaller number of high-quality, varied images.
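To get a feel for why a 180° turn matters so much, consider this toy sketch. The tiny grid is a hypothetical stand-in for a photo, with the value 5 marking a row of ports along one edge.

```python
def rotate_180(grid):
    """Rotate a 2D grid by 180 degrees: reverse the row order, then each row."""
    return [row[::-1] for row in grid[::-1]]


# Hypothetical 3 x 4 "board": 5 marks a row of ports on the right-hand edge.
board = [
    [0, 0, 0, 5],
    [0, 1, 0, 5],
    [0, 0, 0, 5],
]

flipped = rotate_180(board)
for row in flipped:
    print(row)
# [5, 0, 0, 0]
# [5, 0, 1, 0]
# [5, 0, 0, 0]
```

The pixel values are exactly the same, but they now sit in different positions. To a model that has never seen that arrangement for this class, the ports appear to be on the opposite side.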
CLASSIFICATION STACKING vs COMPLEXITY
When developing classification models, it is perhaps worthwhile to consider whether you should do everything in one classification, or stack several.
As you can see in our misclassification below, we classify an Arduino UNO as a Raspberry Pi 3B. However, we struggled to make the same error when all it had to do was classify “Raspberry Pi or Arduino”.
If we stacked various pipelines, we could already have determined Raspberry Pi or Arduino, and limited the possibilities. The second stage only has to determine the particular Arduino board model, or whatever the case may be.
The same approach can be applied to classifying humans. If you’re trying to classify hair colour, you may find it useful to first determine that it’s a human and not an animal.
Naturally, this takes substantial data, careful training and well-planned pathways. However, it’s essentially like using the decision tree classifier we have previously experimented with, in addition to the power of an image classifier.
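The routing logic behind stacking can be sketched in a few lines. Everything here is hypothetical: the stub functions stand in for trained models, and the “image” is just a dictionary carrying the answers those models would produce.

```python
def stacked_classify(image, family_classifier, model_classifiers):
    """Stage one picks the board family; stage two picks the exact model."""
    family = family_classifier(image)
    # Only the classifier for the chosen family gets to run.
    model = model_classifiers[family](image)
    return family, model


# Stand-in classifiers. Real ones would each be a separately trained model.
def family_stub(image):
    return image["family"]


def arduino_stub(image):
    return image["model"]  # can only ever answer with an Arduino label


def raspberry_pi_stub(image):
    return image["model"]  # can only ever answer with a Raspberry Pi label


second_stage = {"Arduino": arduino_stub, "Raspberry Pi": raspberry_pi_stub}

# Hypothetical "image" with the answers the trained models would give.
photo = {"family": "Arduino", "model": "Arduino UNO"}
print(stacked_classify(photo, family_stub, second_stage))
# ('Arduino', 'Arduino UNO')
```

Because the second stage only chooses between boards of one family, an Arduino UNO can no longer come back as a Raspberry Pi 3B unless the first stage gets the family wrong.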
Thoughts to ponder... we’ll work our way through some of that next time. But while you’re pondering AI, consider this... about 500 of the 600 images we used are displayed below as a grid. When we consider how carefully we need to plan our Machine Learning to derive the desired outcomes, take a second to recognise the absolutely insane amount of data we can process with Machine Learning now.