Our great new series on Machine Learning is here! But before we dive in and swim around in some code, let's make sure we get the fundamentals.
WAIT A MINUTE! I have a thorough understanding of Machine Learning and experience using it to solve problems. This is designed as a practical, easy-to-follow series, particularly aimed at the maker community, with relevant and practical outcomes. There are many people out there with far greater knowledge on this topic than I have. For anyone who feels I have misrepresented a concept here, I welcome discussion - feel free to reach out. It will only serve to provide a greater resource long term and allow more makers to tackle ML.
WHY PART ZERO?
OK, so at the last minute, I decided to call this part zero instead of part one. Mainly because it’s a precursor to getting our hands dirty and doing the fun stuff with Machine Learning!
It doesn’t really matter, ultimately, but there’s a little coding humour in there since we start at zero. It also made sense, since it feels like this is precursor knowledge before we commence experimenting and coding with Machine Learning.
PROBLEM SOLVING WITH MACHINE LEARNING
Welcome to our brand new series, aimed at getting you started with Artificial Intelligence, and how it can be applied to your next project.
OK, that is the one and only time I’ll refer to this series as being on Artificial Intelligence. While Machine Learning (ML) and Artificial Intelligence (AI) are commonly interchanged, they are not the same thing. Machine Learning is really a subset of Artificial Intelligence. When promoting these concepts to those who are unfamiliar, Artificial Intelligence is usually the phrase being used. That’s probably because it sounds fancier, can command a higher price tag, and is more readily understood by the general public.
Think about this with the following scenario. If someone asks how a project you’ve developed works and you reply “Machine Learning Algorithms”, you might get a few blank stares. If you reply “Artificial Intelligence”, virtually anyone will understand what you mean, even if they don’t TRULY understand it. Using AI when you mean ML is a misnomer; however, the two are related to a degree. It’s much like “the cloud”, a phrase used to cover just about anything which involves data centres, software as a service, and much more.
WHAT IS ARTIFICIAL INTELLIGENCE?
The aim of artificial intelligence is to make decisions based on available data. However, what Artificial Intelligence truly means is something of a debate. This is largely because we can’t easily define “what is intelligence”.
However, the most agreed metric of whether something is truly Artificial Intelligence or not, is whether or not it can think like a human, and indeed learn like a human.
You might think this is fairly easy, but while a computer might have access to the entire internet, humans have a unique way of shaping our view of the world and information. Even though a computer can access billions of pieces of data in a fraction of a second, a traditional programme doesn’t know how to truly make sense of it.
Indeed, two different humans will make sense of data differently too. There are thousands of factors that influence essentially every decision we make. Some environmental, some personal, it’s a complex thing to comprehend.
Ever wondered why Aussies often like vegemite, but Americans usually think it’s strange? Or why a person might stop to lend assistance to a stranger even while putting themselves at risk? Or why some people will cross the road when they see a large dog approach, while others will want to pat it?
Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.
Note: there's actually some debate about whether or not this quote is from Einstein. But until we figure out otherwise, he can have the credit.
Human decisions are made using data, but not data alone. Every decision we make is filtered and influenced by knowledge, experience, skills, and less measurable factors such as “gut instinct” (though even this is arguably driven by data).
This is where an Artificial Intelligence attempts to cross the bridge between computers and humans. Rather than specifically coded algorithms to filter data through a series of predetermined criteria, the objectives are defined, and the AI creates the algorithms.
Just like a baby learning to walk isn’t given step by step instructions, AI is given an environment and a goal, and through testing and measuring outcomes, it finds the path to a solution. More than that, it can find multiple paths, then select the most optimal one.
If you have ever watched a baby learn to stand or walk, you will recognise this pattern of test and analyse. They’ll try and fail many times, gradually improving over time. This constant iteration of testing and improving is also how AI systems develop. However, while it might take a child months to learn to walk, Artificial Intelligence can learn solutions by simulating and testing different methods millions of times in no time at all.
While it is debatable whether or not it’s true AI, this deep learning has been used to create design iterations, optimise code, and so much more. It can simulate the efficiency of a million different production line layouts, or test the code runtime of a trillion different tweaks. These are all things which would take lifetimes to do as humans.
This sector of AI is less about behaving like a human, and more geared towards optimisation and solving problems we don’t even know are problems yet. In this instance, the rapid test/revision process can do what a human can do in a tiny fraction of the time.
NEED A NON-TECHNICAL EXAMPLE?
There’s a lot of concepts here. While I don’t really like to include a TV-series reference in an article like this, there is a powerful representation of this in a Netflix series called Black Mirror. In particular, there’s an episode named "Hang the DJ", part of the fourth season of this anthology. Without giving too much away, this episode essentially represents how Artificial Intelligence creates these iterative examples. It also highlights how "failed" tests are still valuable data.
While I doubt the writers of this episode had education in mind when they created it, this episode, in particular, has a special knack for doing so. I recommend you watch the episode, all the way to the end, and I’m certain you’ll experience a solid “ah ha!” moment right at the end. Indeed, if you enjoy this creative twist on real world technology, you’ll likely enjoy more in the series. While there are subtle links and ties throughout some episodes, you don’t have to start at the beginning and won’t be disadvantaged for watching "Hang the DJ" first.
WHAT IS MACHINE LEARNING?
Machine Learning (ML) uses data to analyse data. The more data you have, the more precise the results will become (as long as you’re asking the right questions).
What’s different about developing with Machine Learning compared to regular code is that you identify the source data and the outcomes you want, and let Machine Learning figure out how to get from one to the other.
We use what’s called supervised learning. This means that your data set is like a traditional textbook. You have lots of questions, but you also have the answers.
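To make the textbook analogy a little more concrete, here is a minimal Python sketch of what a labelled data set could look like. The attribute names (weight_g, ripeness) and every value are invented purely for illustration; they are assumptions, not real measurements.

```python
# A tiny, made-up labelled data set. Each entry pairs a "question"
# (the measured attributes) with its known "answer" (the label).
# All names and numbers here are hypothetical.
labelled_fruit = [
    ({"weight_g": 150, "ripeness": 8}, "Orange"),
    ({"weight_g": 4000, "ripeness": 3}, "Watermelon"),
]

# An unlabelled item: we have the question, but no answer yet.
mystery_fruit = {"weight_g": 200, "ripeness": 6}

# Separate the questions from the answers, like covering the answer
# section at the back of a textbook.
questions = [features for features, label in labelled_fruit]
answers = [label for features, label in labelled_fruit]

print(questions)
print(answers)
```

Supervised learning needs both lists. The mystery item is the kind of question we eventually want the model to answer for us.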
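WHAT IS DEEP LEARNING?
Deep Learning is a subset of Machine Learning. The main difference is that it uses layered neural networks that can learn directly from unstructured data such as images, audio, or text. It can be used with either supervised or unsupervised learning.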
While Machine Learning (and most of what we’re going to do) relies on good data structure and testing, Deep Learning takes things one step further, allowing the machine to find relevance in the data itself.
With Deep Learning, the system doesn’t rely on the programmer to provide feature extraction and classification. Instead, it will look for correlations and relationships in the data itself, in order to drive the desired classifications and outcomes.
We will move on to Deep Learning later in this series, once we’ve established the fundamentals of Machine Learning and built a few working models.
CONCEPTS IN MACHINE LEARNING
Machine Learning still relies heavily on good program design. You can have all the data in the world, but unless you know how to use it, it’s relatively pointless.
It’s important to realise that there’s no "magic" in machine learning. While this first installment is something of a primer, I didn’t want to leave you without something to think about. We’re going to start using these concepts in practice in the next installment, so let’s go over them with a few examples to ensure they can be understood.
While how you use the data is up to you, the process of Machine Learning is fairly consistent. It requires us to decide what aspects of our data are important, and build a model for using them.
In order to prime you for the coding coming next issue, we’ll go over some of the primary aspects and give a few examples to try and get those concepts well understood.
MACHINE LEARNING ALGORITHMS (CLASSIFIERS)
Machine Learning isn’t "magic" in its own right. There are still algorithms and code behind the outputs, but they can be far more complex than traditional code (often relying on heavy mathematical processing).
In order to demonstrate one of the most common Machine Learning algorithms, we’re going to use a dummy set of data for classifying fruit. This is just to get us used to thinking the right way about Machine Learning and the way we need to approach it.
A simple yet powerful algorithm used in Machine Learning is k-nearest neighbors (k-NN). And yes, that’s the American English spelling of neighbour, as is the case with most coding terms. k-NN is not new to Machine Learning; it has been used for statistical estimation and pattern recognition since the 1970s.
k-NN essentially assumes that similar things are found in similar areas. After all, fish swim in the sea, and humans tend to live on land, as a basic example.
Using very simple data, k-NN is fairly easy to grasp, because it uses the distance between plotted objects, and our minds are quite good at spatial awareness. So this is a great way to look at how the algorithm can do its thing.
To explore this, we’ll use our tiny dataset with three items of fruit in it. You'll notice that the last has no classification. That is, we don't know what it is.
Now this is rather important. Based on this data alone, the system will classify the last fruit as either an Orange, or a Watermelon. There is no "magic". Even if it's an apple, the system doesn't know what an apple is yet, nor how to classify one. This is a key point of Machine Learning. Data drives outcomes. Without the data, there is nothing.
You’ll notice our third fruit has no classification. This is essentially our objective. What is the type of fruit for the last item?
Note: we’re using only three items here. Realistically for Machine Learning, this is not enough data to train a model on, but we’re balancing simplicity and practicality for comprehension purposes. The data sets will be much larger for actual Machine Learning work. For this example to work properly, the data would need to contain many more completed examples of each fruit in order to increase accuracy.
Using a basic algorithm for trying to resolve this could prove quite difficult. If you started looking at weight, you might find yourself in trouble quickly. Take a look at our diagram for a 2D plot of the attributes weight and ripeness.
From this two-attribute example, it would be reasonable to draw a conclusion that our mystery fruit is closer to an Orange than it is a Watermelon, since it’s light but moderately ripe (since all fruits start out with no weight, this is a factor).
However, can we really draw this conclusion yet?
Here, we’ve visualised our two attributes, ripeness and weight. Our orange is ripe yet doesn’t weigh much. Our watermelon is heavy yet not very ripe, and our mystery item is somewhat ripe yet still lightweight. So how do we classify it? From the graph shown here, it fairly clearly appears to be closer to an Orange than a Watermelon.
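If you'd like to see "closer" as a number rather than a picture, here is a minimal Python sketch. It assumes both attributes are scored on the same 0 to 10 scale and uses straight-line (Euclidean) distance, which is a common choice for k-NN but not the only one. The scores are made up to match the description above; they are not read off the diagram.

```python
import math

# Made-up scores on a 0-10 scale: (weight, ripeness).
# These numbers are illustrative assumptions only.
known_fruit = {
    "Orange": (2.0, 8.0),      # light and ripe
    "Watermelon": (9.0, 3.0),  # heavy and not very ripe
}
mystery = (2.5, 6.0)           # light and somewhat ripe

# Straight-line distance from the mystery fruit to each known fruit.
distances = {
    name: math.dist(mystery, point) for name, point in known_fruit.items()
}

# The nearest neighbour is the known fruit with the smallest distance.
nearest = min(distances, key=distances.get)

for name, distance in distances.items():
    print(f"{name}: {distance:.2f}")
print("Nearest:", nearest)
```

With these particular numbers, the Orange comes out at roughly 2.06 units away and the Watermelon at roughly 7.16, so the sketch agrees with what the graph suggests. Change the scores and the answer can change with them.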
What if we change how we actually view the graph by rotating it... now it appears our mystery object is closer to the Watermelon. It is, quite literally, a matter of perspective.
So how do we deal with this? The answer is, we need more data. And this is what Machine Learning needs. The more data, the better.
The next graph shows just how important data volume can be for demonstrating proximity, and what makes algorithms such as k-NN so useful.
You can see from this image how three different types of object seem to cluster in patterns once we've plotted three different attributes. This is a very simple dataset using three types of fruit with weights and scores for ripeness and colour. You can see how easy it would be to classify a new fruit from those attributes, based on its proximity to the others.
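Classifying by proximity can be sketched in a few lines of Python. This is a bare-bones take on k-NN with k set to 3, assuming three attributes (weight, ripeness, colour) that are each scored from 0 to 10. The data, the Apple label, and the classify function name are all invented for illustration.

```python
import math
from collections import Counter

# Made-up training data: (weight, ripeness, colour), each scored 0-10,
# paired with a known label. Every value is an illustrative assumption.
training_data = [
    ((2.0, 8.0, 7.0), "Orange"),
    ((2.5, 7.0, 7.5), "Orange"),
    ((1.8, 6.5, 6.8), "Orange"),
    ((9.0, 3.0, 2.0), "Watermelon"),
    ((8.5, 4.0, 2.5), "Watermelon"),
    ((9.5, 5.0, 1.5), "Watermelon"),
    ((1.5, 5.0, 4.0), "Apple"),
    ((1.2, 6.0, 3.5), "Apple"),
    ((1.7, 4.5, 4.5), "Apple"),
]


def classify(sample, data, k=3):
    """Label a sample by majority vote among its k nearest neighbours."""
    # Sort the known items by straight-line distance to the sample.
    by_distance = sorted(data, key=lambda item: math.dist(sample, item[0]))
    # Count the labels of the k closest items and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify((2.2, 7.5, 7.2), training_data))
print(classify((1.4, 5.5, 3.8), training_data))
```

The first sample sits among the made-up oranges and the second among the made-up apples, so those are the labels the vote produces. If the vote is ever tied, this sketch simply goes with whichever tied label turned up first among the nearest neighbours; a real project would want a more deliberate rule.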
You might think “but I can classify a fruit just by looking at it”. Sure, that’s probably correct. But what about a fruit you’ve never seen before? After all, you’ve seen thousands of apples, bananas, oranges, and other types of food in your life. You draw on that experience. If you don’t know what a food tastes like, it probably “tastes like chicken”. This is because you’re unable to classify it. The same outcome would happen with Machine Learning for data it’s not trained with.
We can visualise the basic operation of k-NN in one, two, or three dimensions, but once we add a fourth, or fifth, or thousandth dimension, it becomes far too complex for us to visualise. This is a natural outcome of our own human brain development which is more about making sure we don’t fall off a cliff, can tell how far away the lion is that’s trying to eat us, and other such natural encounters which our brains developed for. Machine Learning also solves issues like “what do we do if it’s equally between two points”. It also adds another layer of complexity by analysing the interplay between all other nodes too. This is beyond the limit of what our brains can do without methodically working through things.
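While we can't picture a thousand dimensions, the arithmetic doesn't care. Here is a small Python sketch, assuming straight-line (Euclidean) distance, showing that the same calculation works for any number of attributes. The function name euclidean_distance is our own, chosen for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points with any number of attributes."""
    if len(a) != len(b):
        raise ValueError("Both points need the same number of attributes")
    # Square each difference, add them up, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two dimensions: easy to picture (a 3-4-5 triangle).
print(euclidean_distance((0, 0), (3, 4)))  # prints 5.0

# Six dimensions: impossible to picture, but the sum just has more terms.
print(euclidean_distance((1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 8)))  # prints 2.0
```

Python's built-in math.dist does the same job; it is written out by hand here only to show there is nothing mysterious inside.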
We can also write our own algorithms to improve accuracy with more complex datasets. We’ll tackle this in the near future.
Machine Learning relies on us, the programmers, to determine which features of our data are actually important. Sure, the model will “join the dots” for you, but this is about selecting which dots to join.
If you’re trying to determine someone’s hair colour, drawing correlations to the length of their feet or their favourite drink may not be useful. It’s still valid data, because if you were selling shoes, it would be important. This is why not all data will help a Machine Learning model deliver outcomes for you.
Indeed, if you include this data which, while accurate, may not support the desired outcome, you can actually reduce the performance of your Machine Learning Model.
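Here is a small Python sketch of that effect, using a nearest-neighbour lookup on made-up fruit data. The third attribute (which shop shelf the fruit sat on) is a hypothetical example of data that is accurate yet irrelevant to the fruit type; the nearest_label function and all values are assumptions for illustration.

```python
import math

# Made-up data. The first two numbers (weight, ripeness) relate to the
# fruit type; the third (shop shelf number) is accurate but irrelevant.
known_fruit = [
    ((2.0, 8.0, 9.0), "Orange"),
    ((9.0, 3.0, 1.0), "Watermelon"),
]
mystery = (2.5, 6.0, 1.0)  # light, somewhat ripe, found on shelf 1


def nearest_label(sample, data, features):
    """Return the label of the closest item, using only the chosen feature indexes."""
    def pick(point):
        return [point[i] for i in features]

    closest = min(data, key=lambda item: math.dist(pick(sample), pick(item[0])))
    return closest[1]


print(nearest_label(mystery, known_fruit, features=[0, 1]))     # weight and ripeness only
print(nearest_label(mystery, known_fruit, features=[0, 1, 2]))  # shelf number included
```

With only weight and ripeness, the mystery fruit lands next to the Orange. Once the shelf number is included, the Watermelon becomes the closer match, simply because the two happened to share a shelf. Nothing about the algorithm changed, only the data we chose to feed it.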
As programmers, we may have already experienced this as a logical fallacy. What’s a logical fallacy? It’s a series of facts which are true and verifiable, yet draw an incorrect conclusion. For example:
- A Raspberry Pi 3 has WiFi.
- A Raspberry Pi 4 has WiFi.
- Therefore a Raspberry Pi 3 is a Raspberry Pi 4.
Usually, this example compares a dog to a table because both have four legs. But you get the idea.
Adding more data doesn’t always resolve the issue either.
- A Raspberry Pi 3 has WiFi and a microprocessor.
- A Raspberry Pi 4 has WiFi and a microprocessor.
- Therefore a Raspberry Pi 3 is a Raspberry Pi 4.
See how important selecting the right data features can be?
To put it into code for even better clarity, let’s represent each board as a Python dictionary:

# Describe each board using the same three attributes
board1 = {"cpu": True, "hdmi": True, "clockspeed": 1.4}  # clock speed in GHz
board2 = {"cpu": True, "hdmi": True, "clockspeed": 1.5}  # clock speed in GHz

# Both boards have a CPU, so this comparison concludes they are the same
if board1["cpu"] == board2["cpu"]:
    print("Same!")
# prints Same!

We can still find ourselves in trouble. Yet if we ask the right questions, we can get better answers.

# Clock speed actually differs between the two boards
if board1["clockspeed"] != board2["clockspeed"]:
    print("Not Same!")
# prints Not Same!
This is a very simple example, using general logic, of how data selection directly influences outcomes, without changing the algorithm. While we’re using exceptionally simple algorithms here, the same theory applies to anything. Poor attribute selection can lead to poor conclusions.
In the real world, if our example was comparing prices between Raspberry Pi boards, and only looked at whether or not it had HDMI outputs, it may draw the incorrect conclusion that they’re the same, when in fact they’re very different.
When it comes to machine learning, the data you EXCLUDE is just as important as the data you include. Like background noise on a poorly recorded audio track, it takes away from the important information, making it more difficult to get accurate results.
The very same method that draws correlations between data could, in fact, draw incorrect conclusions based on the noise. Purely by necessity, all Raspberry Pi boards contain a USB interface, a processor, copper tracks, GPIO pins... the list goes on. All of these pieces of information are accurate data; however, depending on your question, they may lead a Machine Learning algorithm to deliver you the WRONG answer.
Training is the process of providing data to your Machine Learning Algorithm (Classifier). This is somewhat like the classroom education methods used to teach kids how to do something. We give them the questions, as well as the answers.
The classifier (algorithm) is the route to finding the answer. Like the “working out” portion of a maths problem. This is a key point here.
However, this process is still called Model Training, because it is the computer’s version of learning like kids in a classroom. It’s just the method that’s a little different.
As developers, we can tune and retrain the Model as many times as we like to increase the accuracy of the outputs. How do we do that? Model Testing.
We’ll perform some Model Training in next month’s installment.
Naturally, before we deploy our amazing models to production, we need to see how accurate they are with data that doesn’t yet have an answer.
There are several ways to do this. The simplest way, and perhaps the most common for a maker with a limited dataset, is to split the data into train and test groups. Even though you have the answers for all of your data, the answers for the test group are hidden from the model.
The model will then predict the answers for the test group, and we check how accurate it was against the real answers. This is like taking an exam, and then marking the answers.
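The exam analogy can be sketched in Python as follows. This is a simplified illustration rather than how any particular library does it: the data is made up, the split just holds back every fourth item, and the predict function is a bare nearest-neighbour lookup of our own.

```python
import math

# Made-up labelled data: (weight, ripeness) scored 0-10, plus the known answer.
labelled = [
    ((2.0, 8.0), "Orange"), ((2.5, 7.0), "Orange"),
    ((1.8, 7.5), "Orange"), ((3.0, 8.5), "Orange"),
    ((9.0, 3.0), "Watermelon"), ((8.5, 4.0), "Watermelon"),
    ((9.5, 2.5), "Watermelon"), ((8.0, 3.5), "Watermelon"),
]

# Hold back every fourth item for testing; train on the rest.
test_set = labelled[::4]
train_set = [item for i, item in enumerate(labelled) if i % 4 != 0]


def predict(sample, data):
    """1-nearest-neighbour: copy the label of the closest training item."""
    return min(data, key=lambda item: math.dist(sample, item[0]))[1]


# The "exam": predict from the attributes alone, then mark against the answers.
correct = sum(predict(features, train_set) == answer
              for features, answer in test_set)
accuracy = correct / len(test_set)
print(f"{correct}/{len(test_set)} correct, accuracy {accuracy:.0%}")
```

This tidy, made-up data is easy to get right, so both test items are marked correct. Real data is messier, which is exactly why we test before trusting a model.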
While ideally, you would test with data the model has never seen before, this isn’t always possible either. This is why the volume of good data will influence the accuracy of outcomes.
You can iteratively tune the model itself to drive up accuracy. Your application will determine how accurate you need to be, while your data will determine how accurate you can be.
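One simple form of tuning is to try a few different settings and compare the test accuracy of each. The Python sketch below does this for k in a hand-rolled k-NN. Everything in it is an assumption for illustration: the data is invented, one unusually heavy orange is planted among the watermelons on purpose, and the candidate values of k are arbitrary.

```python
import math
from collections import Counter

# Made-up training data: (weight, ripeness) scored 0-10.
# Note the deliberately odd, unusually heavy orange at the end of the first row.
train_set = [
    ((2.0, 8.0), "Orange"), ((2.5, 7.0), "Orange"),
    ((1.8, 7.5), "Orange"), ((8.2, 3.8), "Orange"),
    ((9.0, 3.0), "Watermelon"), ((8.5, 4.5), "Watermelon"),
    ((9.5, 2.5), "Watermelon"), ((7.5, 3.0), "Watermelon"),
]
test_set = [
    ((2.2, 7.8), "Orange"), ((8.3, 3.7), "Watermelon"),
    ((9.2, 2.8), "Watermelon"), ((2.4, 7.2), "Orange"),
]


def classify(sample, data, k):
    """Majority vote among the k nearest training items."""
    nearest = sorted(data, key=lambda item: math.dist(sample, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(k):
    """Fraction of test items the model labels correctly for a given k."""
    hits = sum(classify(features, train_set, k) == answer
               for features, answer in test_set)
    return hits / len(test_set)


# Try a few values of k and keep the score for each.
results = {k: accuracy(k) for k in (1, 3, 5)}
for k, score in results.items():
    print(f"k={k}: accuracy {score:.0%}")
```

With k set to 1, the odd orange fools the model on one test item. With 3 or 5 neighbours voting, it gets outvoted. Which setting works best depends entirely on your data, which is why we measure rather than guess.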
We have oddly high expectations for Machine Learning outputs. Perhaps because early computers were binary and things were simple. While that’s still true for most computing, the questions we’re asking our computers these days require far more comparative thinking than simple binary allows.
For whatever reason, we expect 100% perfection from them (maybe with a Windows Reboot thrown in there every other week). In certain scenarios that’s fair. You wouldn’t ever expect an incorrect answer from your calculator, and you don’t want your phone to dial the wrong number despite you selecting the correct contact. However, those are drastically simpler tasks and require no interpretation.
Machine Learning accuracy is really driven by good data, but also good tuning and testing. That part is on us. There are machines out there doing this iterative process autonomously too, but that’s beyond the scope of Machine Learning for Makers.
WHERE TO FROM HERE?
Next month, we’ll set up our Machine Learning development environment on a Raspberry Pi, and start applying these principles with some sample data sets to get some OUTPUT!