Turn on and off a mains appliance using voice commands? Yes, you can do that.
You’ve probably seen the new Google Home® devices that are now available, but what you might not realise is that you can actually build something very similar with a Raspberry Pi (RPi). And not only that, it’s very easy to interface it to your own hardware, such as a remote controlled power point.
The Google Voice Kit is a popular item, but you can achieve the same results with a RPi and a few other parts. This project however, is more than just a DIY version of the Voice Kit. Building on September’s MQTT Light Switch project, I’ve built this project which can take voice commands and then transmit them via MQTT to an Arduino. In turn, this operates a remote controlled power point. While this sounds like an elaborate solution to a problem, the MQTT interface means that we can incorporate other features (such as the app control from the MQTT Light Switch) to expand the functionality even further. And of course, it’s just plain cool to be able to speak a command and have it happen!
THE BROAD OVERVIEW
Just like the MQTT Light Switch, there are three parts to this project; and not surprisingly, they are all fairly interchangeable with the different parts of that project. One device (the publisher in the MQTT system) issues commands, which are relayed by a broker to a subscriber, which acts upon the command. In this case, the RPi running the Google Assistant voice recognition software creates messages based on spoken commands, which are transferred via the broker software to an Arduino. This then sends a radio signal via a 433MHz transmitter to a wireless power point to control an appliance.
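Under the hood, the messages relayed by the broker are tiny binary packets. Just as an illustration of what travels over the network (a hand-rolled sketch of an MQTT 3.1.1 QoS 0 PUBLISH packet; you don't need this for the build, since the client libraries and mosquitto tools handle it for you):

```python
def mqtt_publish_packet(topic: str, payload: bytes) -> bytes:
    """Build a minimal MQTT 3.1.1 PUBLISH packet (QoS 0, no retain)."""
    t = topic.encode("utf-8")
    # Variable header: 2-byte big-endian topic length, then the topic itself.
    var_header = len(t).to_bytes(2, "big") + t
    remaining = len(var_header) + len(payload)
    # Fixed header: packet type 3 (PUBLISH) in the high nibble, flags 0.
    # For brevity, this sketch only encodes remaining lengths under 128 bytes.
    assert remaining < 128, "multi-byte remaining-length encoding not implemented"
    return bytes([0x30, remaining]) + var_header + payload

pkt = mqtt_publish_packet("lounge/lamp", b"on")
print(pkt.hex())  # 300f000b6c6f756e67652f6c616d706f6e
```

The whole "lounge light on" message is only 17 bytes on the wire, which is part of why MQTT suits small microcontrollers like the Arduino so well.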
While this is just one use of the RPi with voice recognition software, the options are really interesting. When set up correctly, the Google Assistant has many different skills. It won’t be as capable as a Google Home® device due to hardware limitations, but it can respond to simple queries such as “Where’s the nearest pub?”. It can also be configured to do pretty much anything that can be done via a terminal command (and more), so the possibilities are well beyond what is mentioned in this article.
Of course, if you just want a really basic app-controlled power point, you can use an MQTT app for input to control a wireless power point, and it won’t even interfere with the normal remote control operation. For more information on MQTT, revisit the MQTT Light Switch article.
HOW IT WORKS
As you might imagine, a very big part of this project is setting up and configuring the RPi to do the voice recognition. There is an SD card image, which is designed to work with the Google Voice HAT, but the HAT appears to be hard to obtain. Fortunately, it’s not hard to replace the hardware with something equivalent; in this case, a USB sound card connected to a microphone and speakers, as well as an LED for troubleshooting. There’s some account setup (you’ll need a Google account to be able to access Google’s services), and finally some further configuring to get our specific commands working. There’s also an extra step to set up the “OK Google” hotword, if you wish to use it.
The MQTT system requires a broker, and this is probably the easiest part to install. It can actually be installed in just two commands on a RPi, and this is the most obvious place to install it (although you might want to wait until we’ve created our SD card image).
sudo apt-get update
sudo apt-get install mosquitto mosquitto-clients
In addition to this, it is recommended to give the Pi a fixed IP address under your router, so that any apps can easily find it. Look under your router’s DHCP settings for address reservation options.
The other piece of our setup is an Arduino running a sketch that is similar to that in the MQTT Light Switch project; it looks for MQTT message packets, and then acts upon them. In this case, the Arduino is connected to a 433MHz transmitter and emulates the signal from a remote control, which operates a Jaycar MS-6148 Wireless Power Point. The code emulates the original product's 433MHz commands to the wireless power point, providing us with electronic-controlled mains power. The beauty of this hardware is that it takes wireless signal in and sends a different wireless signal out, and so it could be hidden just about anywhere.
For the voice recognition section of the build, I used the following parts:
1 x Raspberry Pi 3 | XC9000 | Z6302B
1 x USB Sound Card | XC4953 | D0290
1 x 3.5mm Microphone | AM4092 | C0398
1 x USB Speakers | XC5191 | D0806A
1 x 16GB Micro SD Card | XC4989 | D0328
Of course, you will need other common RPi accessories, such as a HDMI cable and monitor, keyboard, mouse and power supply. It’s also recommended to have some plug-socket jumper cables to emulate the push button on the Google Voice HAT hardware; and an LED wired in series with an appropriate resistor, to give feedback on the status of the voice recogniser program.
As noted above, the MQTT broker can run on the RPi, but there are versions available for other operating systems such as Windows and even OpenWRT. If you haven’t already got one set up, I would use the RPi option, as it will need to be running for this project to work.
The Arduino side of things doesn’t require much work. The prototyping shield is optional, but I found it meant that everything slotted together in a compact way, and could easily be pulled apart later for use in other projects. The alternative is to solder some plug-socket jumper leads onto the legs of the 433MHz Transmitter Module, and plug these into the headers on the WiFi Shield.
1 x Arduino Leonardo Board | XC4430 | -
1 x ESP8266 WiFi Shield | XC4614 | -
1 x 433MHz Transmitter Module | ZW3100 | Z6900
1 x Prototyping Shield (optional) | XC4482 | Z6260
1 x Wireless Power Point | MS6148 | -
BUILDING THE CIRCUIT
As the WiFi Shield can plug directly onto the top of the main board, the only connections to make are to solder some jumper leads to the 433MHz Transmitter Module or, alternatively, to plug the 433MHz Transmitter Module and the jumper leads into a breadboard.
LEONARDO | 433MHZ TRANSMITTER MODULE
The ANT (antenna) connection does not need to connect to anything, but a short length of insulated wire (about 17cm is ¼ wave at 433MHz) should be attached to improve signal transmission. For the Prototyping Shield version, first solder the 433MHz Transmitter Module to the Prototyping Shield.
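The 17cm figure comes straight from the usual quarter-wave antenna formula. A quick back-of-the-envelope check (plain Python, just for the arithmetic):

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second
FREQ_HZ = 433.92e6            # nominal carrier frequency for these modules

wavelength_m = SPEED_OF_LIGHT / FREQ_HZ
quarter_wave_cm = wavelength_m / 4 * 100
print(round(quarter_wave_cm, 1))  # ~17.3, so 17cm of wire is close enough
```

Being a centimetre or so off isn't critical at these short ranges, but a wildly wrong length (or no antenna at all) will noticeably reduce range.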
Next, make the necessary connections between the shield and module. Note how the green antenna wire is threaded through the hole in the Prototyping Shield; this helps to stop it flexing and breaking off.
Alternatively, this is what it would look like if the connections are made directly to the WiFi Shield.
The shields are then attached to the top of the Leonardo to complete assembly of the Arduino side of the project.
The RPi doesn’t need much in the way of mechanical build. The USB Sound Card is plugged into one of the RPi’s USB ports, and the microphone and speakers are plugged into their respective locations on the USB Sound Card. It will be assumed that you have a keyboard, mouse and monitor set up to use with the RPi. Once the build is complete, the keyboard, mouse and monitor can be disconnected and the entire project can be run headless.
I found it easiest to solder the LED and resistor (I used a blue LED in series with a 100Ω resistor) to the plug ends of the plug-socket jumper leads; and instead of a switch, I left the ends exposed, so I could briefly touch them together to emulate a button push.
CODE AND SETUP
The code for the Arduino is similar to that used in the MQTT Light Switch project, and setup is similar. The following lines will need to be configured to match your WiFi network and MQTT broker.
//spec WIFI network and MQTT server
#define SSIDNAME "SSIDNAME"
#define SSIDPWD "SSIDPASSWORD"
#define MQTTBROKER "BROKER IP ADDRESS"
#define MQTTPORT "1883"
If you haven’t set up your broker yet, the easiest way is to use the RPi - just enter your RPi’s IP address for the broker. A good idea is to get into your router settings and give it a reserved address under DHCP. This also makes it easier if you need to SSH in, to tweak a setting while it’s running headless. The MQTTPORT shouldn’t need to change as 1883 is the default for most MQTT implementations.
You might also want to change the code below if you want to use different topic names and messages. The default topic names are the strings in the first five “if” statements, and the messages are “on” and “off”. These correspond to the commands that are run on the RPi, so both need to be changed to suit your specific customisation.
The Arduino uses an arbitrary code which is set in this line.
const unsigned long address=0x12340;
//this could be any 20bit value (not all tested)
This code will probably not be the same one as the remote you have, so the Arduino will have to be programmed as a second remote (these units can be paired with up to three remotes). After this, the sketch can be compiled and uploaded.
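If you do change the address in the sketch, make sure the new value still fits in 20 bits. A quick sanity check (plain Python; the random pick at the end is just one way to generate a candidate address):

```python
import random

def is_valid_address(addr: int) -> bool:
    # A 20-bit address must fall in the range 0x00000 to 0xFFFFF.
    return 0 <= addr < (1 << 20)

print(is_valid_address(0x12340))   # True - the default used in the sketch
print(is_valid_address(0x123456))  # False - 24 bits wide, too large

# One way to pick your own candidate address:
candidate = random.getrandbits(20)
print(f"0x{candidate:05X}")
```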
Next, ensure that the wireless power point operates correctly with the remote control. The wireless power point is in pairing mode for the first 30 seconds after it is turned on, so I found it easiest to plug the wireless power point into a switched power point and toggle the power to enter pairing mode. Pressing an “on” button during pairing mode will cause that button to be paired - you can tell that it works as the wireless power point will turn on at that time.
To set up the wireless power point, it is recommended to install an MQTT app or other form of client to be able to issue commands to manually trigger the Arduino. If you are using the MQTT Dashboard app, you will need to set up a broker and then “switch” controls on the “publish” tab.
Create four more switch objects corresponding to the topic names and messages in the Arduino code. If you have the mosquitto clients installed on a PC or Pi, you can run the following commands (substituting the IP address of your broker for 192.168.0.223):
mosquitto_pub -h 192.168.0.223 -t lounge/lamp -m on
mosquitto_pub -h 192.168.0.223 -t lounge/lamp -m off
Replace “lounge/lamp” with the other strings from the Arduino code to emulate the other buttons. Monitoring the Arduino debug on the serial monitor can also assist in checking whether everything is functioning.
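If you would rather drive these tests from a script than type each command by hand, a small helper that assembles the same mosquitto_pub command lines can be handy (a sketch assuming the broker address used above; pass the list to subprocess.run to actually send the message):

```python
import subprocess

BROKER = "192.168.0.223"  # substitute your broker's IP address

def pub_command(topic: str, message: str, host: str = BROKER) -> list[str]:
    """Assemble a mosquitto_pub command line for the given topic and message."""
    return ["mosquitto_pub", "-h", host, "-t", topic, "-m", message]

# For example, stepping through every power point in turn:
for topic in ("lounge/lamp", "bedroom/lamp", "spare/lamp"):
    for message in ("on", "off"):
        cmd = pub_command(topic, message)
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually publish
```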
Once the Arduino side of things is working, we can move on to the RPi. The first step is to download the image from https://dl.google.com/dl/aiyprojects/voice/aiyprojects-latest.img.xz, and write the image to a 16GB or greater card using a program like Etcher or Win32 Disk Imager.
Boot up the RPi with the new card installed, and connect to a WiFi network with internet access.
Click on the icon marked “Start dev terminal”, and run:
sudo leafpad /etc/asound.conf
Replace the contents of asound.conf with the following, then save.
This changes the software to use the USB sound card instead of the non-existent Voice HAT, and also allows the hotword trigger to be used; it is documented in this thread: https://www.raspberrypi.org/forums/viewtopic.php?t=183932&p=1167683.
Reboot the RPi and then double-click the “Check audio” and “Check WiFi” buttons to make sure everything is working so far.
If you haven’t got a Google account, now is a good time to get one set up. The next step is to set up our Google account to allow the RPi to access the Google Assistant API. There is some information about this on this website: https://aiyprojects.withgoogle.com/voice#users-guide-1-1--connect-to-google-cloud-platform
I found that these instructions didn’t match the web pages I was viewing, so I did the following:
Open https://console.cloud.google.com/ on the RPi’s browser (there’s a link to it on the bookmarks bar), and make sure you’re signed into your Google account. Then create a new project.
Open the project, and go to “APIs & Services > Dashboard”. Click on “Enable APIs and Services”. Type “assistant” in the search box and click “Google Assistant API”.
Click “Enable” and click “Get Credentials” under “Add credentials to your project”.
Select “Other UI” from “Where will you be calling the API from” and “User data” from “What data will you be accessing”. Then click “What credentials do I need”.
Click “Create client” and enter an arbitrary product name (I used “pivoice”), then click “Continue” and download credentials. The downloaded file “client_id.json” in /home/pi/Downloads should be renamed “assistant.json” and moved to the /home/pi folder.
Click on “Start dev terminal” and run “src/main.py”, which will start the voice recogniser. A browser window will open so that you can provide authentication, so click “Allow”, and close the browser window once this has completed.
By default, the voice recogniser is set to trigger from the switch on the Voice HAT, so make sure that src/main.py is still running in the terminal window, and trigger it by touching the GPIO23 wire to the GND wire. The terminal should indicate that it is listening, so speak into the microphone. “What day is it?” is something that Google should be able to handle. You can stop the voice recogniser by pressing ctrl-C.
If you get a message like “Actually, there’s some basic settings that need your permission first”, open Google account settings in a browser, and click through “Personal Info and Privacy” to “Manage your Google activity”. Go to “Activity controls” and turn on “Voice and Audio Activity” and “Device information”.
By default, hotword activation (“OK Google”) is not installed. The process for adding this is documented on the RPi forum at https://www.raspberrypi.org/forums/viewtopic.php?f=114&t=183932#p1164380
It is recommended that this be viewed on a browser on the RPi, so that all the code can be copied and pasted. Effectively, we add a hotword.py file to perform this function, and modify main.py to be able to use hotword.py.
Once the new code is set up, test it using:
src/main.py -T hotword
This is where it helps to have the LED, as you will see it turn solid on when it has received the trigger correctly. To set the hotword as the default trigger, edit the trigger setting in the voice recogniser’s ini file as follows:
#trigger = clap
trigger = hotword
Test this by running src/main.py, and saying “OK Google”. There will be a delay of a few seconds before the phrase is recognised, and this will be seen on the LED and output on the terminal. If this all works as expected, then our setup is practically complete.
By default, the voice recogniser is not set to auto-start, as it would error with the credentials not set up. To start the service, run the following:
sudo systemctl start voice-recognizer
To enable auto start, run:
sudo systemctl enable voice-recognizer
If you make changes to the code (which we will be doing shortly, to add our commands to interface the MQTT broker), you can also run:
sudo systemctl stop voice-recognizer
Do this before making changes, and then run the above start command afterwards, to restart the service.
To add manual commands, we need a trigger sentence, and a command to execute. It’s a good idea to test the trigger sentence while you have the voice recogniser running manually, as you can see what it thinks is being said. To add a set of commands to match what our Arduino is looking for, edit the “action.py” file (e.g. leafpad /home/pi/voice-recognizer-raspi/src/action.py) and place the new commands in this section, just before the “return actor” line.
# Makers! Add your own voice commands here.
The code we have inserted into this space is shown below.
actor.add_keyword(_('lounge light on'),SpeakShellCommandOutput(say,"mosquitto_pub -h 192.168.0.223 -t lounge/lamp -m on && echo ok",_("fail")))
actor.add_keyword(_('lounge light off'),SpeakShellCommandOutput(say,"mosquitto_pub -h 192.168.0.223 -t lounge/lamp -m off && echo ok",_("fail")))
actor.add_keyword(_('bedroom light on'),SpeakShellCommandOutput(say,"mosquitto_pub -h 192.168.0.223 -t bedroom/lamp -m on && echo ok",_("fail")))
actor.add_keyword(_('bedroom light off'),SpeakShellCommandOutput(say,"mosquitto_pub -h 192.168.0.223 -t bedroom/lamp -m off && echo ok",_("fail")))
actor.add_keyword(_('spare light on'),SpeakShellCommandOutput(say,"mosquitto_pub -h 192.168.0.223 -t spare/lamp -m on && echo ok",_("fail")))
actor.add_keyword(_('spare light off'),SpeakShellCommandOutput(say,"mosquitto_pub -h 192.168.0.223 -t spare/lamp -m off && echo ok",_("fail")))
Looking at the first line, “lounge light on” is the phrase that the recogniser is looking for, and the part in double quotes is the command that is run on the terminal. This actually runs two commands: the first part “mosquitto_pub…” publishes the MQTT message, and the second part “echo ok” simply prints text to the terminal; but because printed text gets read out by the RPi, it can also be used for audio feedback. The IP address, topic names and messages should, of course, be changed to match your setup.
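You can see why the “&& echo ok” pattern works from any Python prompt. This little sketch mimics what action.py does with the command string, using the shell built-ins true and false as stand-ins for a mosquitto_pub call that succeeds or fails:

```python
import subprocess

def run_and_report(command: str, ok: str = "ok", fail: str = "fail") -> str:
    """Mimic the action.py pattern: report 'ok' on success, 'fail' otherwise."""
    result = subprocess.run(command + " && echo " + ok,
                            shell=True, capture_output=True, text=True)
    # '&&' only runs 'echo ok' if the first command exited with status 0,
    # so a failed publish produces no output and a non-zero return code.
    return result.stdout.strip() if result.returncode == 0 else fail

print(run_and_report("true"))   # ok
print(run_and_report("false"))  # fail
```

So if the broker is unreachable, mosquitto_pub exits with an error, nothing is echoed, and the recogniser speaks the “fail” message instead.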
At this point, you should be able to restart the voice recogniser service and give voice commands to your power points.
WHERE TO FROM HERE?
Once the Google Assistant API is up and running, it can do much more than just control the lights. It has many of the features of a Google Home device, so you can try asking it things like “How’s the weather?” and “Where’s the nearest restaurant?”. But that’s not really what you set this up on a Raspberry Pi to do, is it?
The src/main.py and src/action.py files are well commented with tips about how to add actions directly to the Python code, so if you are comfortable with Python, this is another place to add functionality.
It’s worth noting, too, that we have integrated MQTT here, using a Raspberry Pi as well as an Arduino system to provide all the functionality. By having MQTT in the middle, we can effectively control the system using our familiar MQTT app and process. This is valuable in many ways, but it also requires additional hardware that perhaps you don’t want to use.
If you aren’t interested in hooking up an MQTT broker and an Arduino, then you probably want to be able to control the GPIO on the RPi directly. This process would be faster, and require less hardware. Of course, there's no redundancy with a backup / remote app, unless you create that directly on the Raspberry Pi also.
To use the Raspberry Pi GPIO and interface directly with the real world, you’ll need to make a few changes to the action.py file. If you’re solely an Arduino fan and aren’t too familiar with Python, it’s not too difficult to see the differences in convention and figure it out. As with any other alterations, remember to stop the voice recogniser before making changes.
Add the following line to the rest of the imports at the start of the file:
import RPi.GPIO as GPIO
Just above the section marked “Makers! Implement your own actions here”, put a small actor class that drives a GPIO pin. A minimal version, following the same pattern as the existing actors in action.py, looks like this:

class GpioWrite(object):

    def __init__(self, gpio, value):
        GPIO.setmode(GPIO.BCM)
        GPIO.setup(gpio, GPIO.OUT)
        self.gpio = gpio
        self.value = value

    def run(self, voice_command):
        GPIO.output(self.gpio, self.value)

And finally, inside the “make_actor” function (with the other lines we added earlier), add these lines:

actor.add_keyword(_('on'), GpioWrite(24, True))
actor.add_keyword(_('off'), GpioWrite(24, False))
Then restart the voice recogniser, and use the keywords “on” and “off” to turn GPIO 24 on and off. It’s a bit more involved, but it is also possible to set pins as inputs and read back their state.
How Accurate Is A Voice Command?
One of the cool things about modern voice command technology is that it leverages deep learning and continuously improves accuracy thanks to machine learning.
Traditionally, software such as Dragon NaturallySpeaking, which aimed to get us off the keyboard and onto the microphone, used self-contained voice data. It was efficient and effective, and it really carved out a niche (though arguably, it was aimed at users in an era before most of us spent all day in front of a keyboard, and before many of us could type better than we could write with a pen).
Where this type of voice pattern recognition falls down, however, is in its ability to detect voice patterns when you change users. If you use something like Dragon NaturallySpeaking daily, it will become increasingly accurate for your own voice. If you then hand the microphone to a colleague, however, it will instantly lose substantial accuracy.
Of course, even the Google Assistant, Apple's Siri, and others, can be tripped up rapidly with a unique accent. But they learn, and learn fast. Once it's learnt, everyone benefits from the development.
Soon, these assistants will have better conversational artificial intelligence, and will be able to decipher implied speech, sarcasm, and other nuances that usually trip them up. The way we interact with these systems will become more natural, relaxed, and conversational. For now, even those of us who use them daily tend to talk to them like robots.
Hey Siri! CALL-JOHN-SMITH. "Calling Jack Quiff Mobile"...
We've all been there... but this is just the beginning of what's to come. Soon, we'll be able to have a conversation with a virtual assistant as casually as we would with a mate in a bar.
Background noise, slurred speech, and random queries will all be seamlessly handled by the virtual assistant, with the accuracy of a human (or probably better). That’s exciting, and it’s right around the corner.