What The Tech

Google Voice Assistant With Raspberry Pi

Rob Bell

Issue 4, October 2017

As a kid, I was fascinated by the potential for voice control. In fact, even VOX (voice-operated relay) CB radios had me intrigued. Such a simple thing captivated my imagination from a young age.

In the 90s I was fortunate enough to get to experiment with what was rather advanced software for its time. Not only could it translate speech to text (which seemed amazing in itself), but it could open and close applications, and control the operating system in some amazing ways. Of course, back in the day, it was shareware, probably on CD (I think we’d progressed past floppy drives, but not as far as CD burners). Out of curiosity I recently ran a Google search for INCUBE, and it returned a few parts of the internet that needed dust blown off them! But I digress...

Back in those early days of the internet, multimedia was an emerging buzzword. But there were a few serious limitations: RAM was counted in megabytes, soundcards had limited quality, and computers had clock speeds in megahertz. The technology was really letting the concept down. It was ahead of its time, and for many years voice technology stayed on the fringe.

Now, we know how clear and accurate voice commands can be. Sure, we could create a feature film around what Siri doesn’t understand, but the reality is that this technology works exceptionally well. Whether it’s a Google, Apple, or Amazon product, voice-commanded technology is improving every day. With the inclusion of machine learning, we’re going to continue to see improvements at a rapid pace, while the voice command blooper reel will continue to shrink.


Google has long provided API access to some of their greatest technologies, which is fantastic. Backed by billions of dollars, they develop ideas that we could rarely dream of achieving on our own, and they open them up rather than making the system proprietary and hiding it behind an expensive corporate fee.

Sure, just as INCUBE did in the 90s, it is indeed possible to create a useful voice recognition system that’s self-contained. Though if I recall correctly, INCUBE required configuration of voice files to get things running, so it could learn your tone of voice. I don’t think it required individual voice commands to be recorded for every single action, but there was a reasonable level of setup (it was cutting-edge at the time though, let’s not forget that).

This is where the Google Assistant SDK comes in. You can leverage it, with fantastic speed and accuracy, on just about any piece of hardware capable of running Python.


The huge advantage of hardware-interfacing computers is clearly the GPIO. The Raspberry Pi is a logical choice for a Google Assistant project, but in reality you can use any board you like that can run Python. As long as your Pi has WiFi access, you could create a voice-commanded robot to follow your every command.

If you’re really game though, all you need is internet access, so it is possible to use a 3G module or similar if you need access outside of a WiFi network. The data demands are reasonably low so you’re not going to blow through your data any faster than a few cat videos and LOLs will!

But let’s just consider what ramifications this has for hardware. You can effectively voice control ANYTHING. Lights on? Sure. Door open? Definitely. Turn on the TV? You bet. Of course, the applications need to be ones that aren’t critical, or ones where safeguards are employed. For example, if the voice assistant mishears you and suddenly your robot is running away forever, it’s probably not running so fast that you can’t catch it. There are loads of creative ways to use it, where failure of the system won’t have any catastrophic implications.
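One way to keep a misheard command harmless is to act only on phrases that appear in an explicit allowlist of low-risk actions. The following is a minimal sketch of that idea, assuming the recognised speech arrives as plain text. The phrases and handler functions are hypothetical and are not part of the Google Assistant SDK.

```python
# Hypothetical allowlist dispatcher: only known, low-risk phrases trigger anything.

def lights_on():
    return "lights: on"


def lights_off():
    return "lights: off"


def tv_on():
    return "tv: on"


# Illustrative phrases only. Critical actions (such as unlocking a door)
# are deliberately left out of the table.
SAFE_ACTIONS = {
    "turn on the lights": lights_on,
    "turn off the lights": lights_off,
    "turn on the tv": tv_on,
}


def normalise(text):
    """Lower-case, trim spaces and end punctuation, and collapse whitespace."""
    return " ".join(text.lower().strip(" .!?").split())


def dispatch(transcript):
    """Run the handler for an allowlisted phrase; ignore everything else."""
    handler = SAFE_ACTIONS.get(normalise(transcript))
    if handler is None:
        return None  # unknown or misheard phrase: do nothing
    return handler()
```

Anything the assistant mishears simply falls through and returns `None`, so the worst outcome of a bad transcription is that nothing happens.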


We have our first project this month using the Google Assistant. It integrates the MQTT protocol, so you can remotely control anything you could action with an MQTT system. Next month, however, we’ll look at some simpler voice-command functionality that you can get started with and play around with. Even as an experiment, simple functionality such as turning an LED on or off, playing music via the Raspberry Pi’s audio output, and many other simple projects take on a whole new life when voice commands are added.
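To give a rough feel for how a voice command could become an MQTT message, here is a minimal sketch. It is not the code from this month’s project: the topic name, the phrases, and the JSON payload layout are all assumptions made for illustration, and the `publish` argument stands in for whatever MQTT client you use.

```python
import json

# Hypothetical phrase -> (topic, payload) table; the topic name is an assumption.
COMMANDS = {
    "led on": ("home/led", {"state": "on"}),
    "led off": ("home/led", {"state": "off"}),
}


def to_message(phrase):
    """Translate a recognised phrase into an MQTT topic and a JSON payload."""
    entry = COMMANDS.get(phrase.strip().lower())
    if entry is None:
        return None
    topic, payload = entry
    return topic, json.dumps(payload, sort_keys=True)


def send(phrase, publish):
    """Publish the message through any callable that takes (topic, payload)."""
    message = to_message(phrase)
    if message is None:
        return False  # phrase not recognised, nothing is sent
    publish(*message)
    return True
```

Passing the publishing function in as an argument keeps the sketch independent of any particular library. With the paho-mqtt package, for example, the `publish` method of a connected client could be handed in, since it accepts a topic and a payload.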


Those of you familiar with the Raspberry Pi will know it doesn’t have a hardware-level native microphone input. This doesn’t present much of a problem, however. USB sound cards are an easy go-to option. You can also use a USB webcam you might have lying around, as these often have a built-in microphone. Google Assistant won’t access the camera portion, but the webcam still provides a USB audio input for your Raspberry Pi. Problem solved.
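Once a USB sound card or webcam is plugged in, it helps to confirm that the Pi can see it as a capture device. The sketch below parses the text printed by the ALSA `arecord -l` command and builds `plughw:card,device` names from it. It assumes the usual English-language output layout shown in the comment, which may differ on your system, and the function names are hypothetical.

```python
import re
import subprocess

# Assumed layout of one line of `arecord -l` output, for example:
# card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): \S+ \[(.*?)\], device (\d+):")


def parse_capture_devices(listing):
    """Return (alsa_name, description) pairs from `arecord -l` style text."""
    devices = []
    for line in listing.splitlines():
        match = CARD_LINE.match(line)
        if match:
            card, description, device = match.groups()
            devices.append((f"plughw:{card},{device}", description))
    return devices


def find_capture_devices():
    """Run `arecord -l` (from the alsa-utils package) and parse its output."""
    result = subprocess.run(
        ["arecord", "-l"], capture_output=True, text=True, check=True
    )
    return parse_capture_devices(result.stdout)
```

Calling `find_capture_devices()` on the Pi should list the USB device if it was detected. An empty list suggests the device was not recognised as an audio input.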


You know what I would really like to do? Create two virtual assistants and initiate a conversation between them. Even if we have a little code to help them out with particular questions and answers.

Why? I think a more appropriate answer is “why not?”

While it’s unlikely to yield anything substantial (it’s a computer, remember), it’s fun to hear two machines talking to each other, even if for a brief period and about nothing in particular.

What would be more interesting to investigate, however, is creating a set of environment sensors and conditions which create a platform of discussion for these two sensorless devices. If we provided them with real-world input about the current temperature and humidity, could we then spark a conversation about the weather?
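As a very rough sketch of how sensor readings might seed such a conversation, the snippet below turns a temperature and humidity reading into an opening line that one assistant could speak to the other. The thresholds, wording, and function names are all assumptions chosen for illustration, and reading the actual sensor is left out.

```python
def describe(temperature_c, humidity_pct):
    """Map raw readings onto rough words, using assumed thresholds."""
    if temperature_c >= 30:
        feel = "hot"
    elif temperature_c >= 18:
        feel = "mild"
    else:
        feel = "cold"

    if humidity_pct >= 70:
        air = "humid"
    elif humidity_pct < 30:
        air = "dry"
    else:
        air = "comfortable"
    return feel, air


def opening_line(temperature_c, humidity_pct):
    """Build a sentence that one assistant could say to start the chat."""
    feel, air = describe(temperature_c, humidity_pct)
    return (
        f"It is {temperature_c:.1f} degrees and {humidity_pct:.0f} percent "
        f"humidity here. Does it feel {feel} and {air} to you?"
    )
```

The resulting sentence could then be passed to a text-to-speech step, so the second assistant hears a question grounded in real-world conditions.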

Of course, we can add real questions and answers, and follow-up questions, as part of the Google SDK; coding a false intelligence into them is achieved rather easily. But this isn’t really the point, and represents no artificial intelligence outside of the precise speech patterns themselves.


As the technology improves and continues to evolve, it’s likely that we’ll see more comfort and social acceptance around voice-commanding computers. Even though it’s more commonplace now, talking to your phone when there’s not an actual human on the other side of the conversation can still feel a little awkward when there are others around. Saying “hi mate, how are you?” is still far more socially acceptable than “OK Google, what’s the weather like where I’m going?”.

However, this paradigm of machine response to speech is becoming ubiquitous in machine interactions. So commonplace, in fact, that within a few years you will probably be able to “tell” the parking meter that you need to pay for parking, and it’ll know who you are from your speech patterns.

There’s no telling when talking aloud to your devices will become as universally accepted as talking to another human via your phone, but I’d suggest that we’re closer than you think. Until then though, go ahead and create a Raspberry Pi virtual assistant, and have a play with the system. It’s easier than you think, and loads of fun!