This article includes additional downloadable resources.
Please log in to access.

With all these IoT devices out there... something’s going to crash.

A somewhat inevitable part of computing and complex electronic hardware is that, now and again, things just don’t work according to plan.

The idea for this particular project came while driving out of a carpark. I was in my car, which has an integrated GPS, reversing camera, and stereo, as is fairly common now. But after using the reversing camera, the system, for whatever reason, failed to return to “stereo mode” once I resumed normal driving. After a few minutes, the system did reboot itself, and eventually brought up my stereo controls.

It’s not the first time I’ve had this problem, but it happens very infrequently. Certainly not a regular occurrence where I would take the car to the dealer to have it looked at. But it did get me thinking. What happens when our IoT devices hang in the field? It’s not always a bug in the code, or a hardware fault, or really anything explainable which we would encounter during development. Sure, poorly written code can create hanging hardware and all sorts of problems, but sometimes these things just happen.

We’ve seen it with web server implementations and other things that should be accessible long-term. Sometimes, for whatever reason, they simply stop responding. In the case of non-IoT devices (or integrated data-push scenarios), the updates simply stop coming. If we’re physically near the device that has hung, it’s usually just a case of rebooting the hardware, and everything springs back to life. We wonder for a moment what went wrong, but usually just get on with life until it happens again.

The problem becomes far more complex, however, when we’re nowhere near the device that has stopped responding. With an IoT device, we could be anywhere in the world. This inevitable “reset” action then becomes something of a challenge.

There had to be a simple way to ensure our hardware was alive and kicking. Absolutely there is. And it’s the Arduino Keepalive.

THE BROAD OVERVIEW

The concept of a keepalive is fairly common in networking technologies, where it ensures all devices know the connection is active, and will be kept active for a certain time. However, our concept varies from this to a degree, and really just shares the name. The diagram below helps convey the logic flow for how we’re implementing it in our setup.

On a TCP/IP network, we can use a “ping” to try and obtain a response from given devices. But many of our little DIY gadgets don’t have network connectivity, and either way we still have the problem of taking action when we don’t receive a response.

This rather simple system leverages an ATtiny85, with hardware-level digital communication between the ATtiny and the device we’re trying to keep running. The hardware requirements for our code are exceptionally minimal. You can get away with barely any additional electronics (other than the ATtiny) once you have programmed the ATtiny to do the job.

The ATtiny can run from the 5V power available to your Arduino, and only three connections are then required between the keepalive and the device being monitored: one to send the request, one to receive, and a connection to the reset pin.

HOW DOES IT WORK?

For the purposes of clarity, we’ll refer to our ATtiny85 as the “keepalive”, and the IoT device (in our example it’s an Arduino Uno), as the “DUT”, which stands for “device under test”.

Every minute or so, the keepalive sends a digital high to an interrupt on the DUT. The DUT should then send back a reply via a digital high on a second pin.

The keepalive checks for a response, and uses some logic around what to do next. As you can see in the next schematic, we listen for a response on the keepalive. Assuming it’s received, the keepalive goes back to idle for another minute or so.

While unlikely, it’s possible that the response either doesn’t come, or is missed by the keepalive. If a response isn’t received, a second chance is given. So we send another digital high from the keepalive, and check for a response.

If we finally receive the response, our keepalive goes back to idle for another minute. If we still don’t detect a response, we assume the DUT has hung, and the software is no longer running. We then pull the reset pin on the DUT low, which reboots the DUT. Assuming there is no hardware malfunction, the DUT should return to full operation, performing whatever tasks have been programmed.

One caveat to this configuration is that if you have a miscoded DUT, which isn’t appropriately sending the responses to the keepalive, it will be perpetually restarting.

It means we need code on both sides (i.e. the keepalive, and the DUT) to work properly.
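
The call, second chance, and reset sequence described above can be sketched as one small function. This is only an illustration of the logic, not the code from the downloadable resources: runKeepaliveCycle, pingAndWait and resetDut are hypothetical names, and the two callbacks stand in for whatever pin handling your keepalive actually uses.

```cpp
// Hypothetical names throughout; this models the decision logic only.
enum class Outcome { Alive, ResetIssued };

// Runs one keepalive cycle.
// pingAndWait: sends the "are you alive" pulse, returns true if a reply was seen.
// resetDut:    briefly pulls the DUT reset line low.
Outcome runKeepaliveCycle(bool (*pingAndWait)(), void (*resetDut)()) {
  const int kMaxAttempts = 2;  // the first ping plus the "second chance"
  for (int attempt = 0; attempt < kMaxAttempts; ++attempt) {
    if (pingAndWait()) {
      return Outcome::Alive;   // reply seen: go back to idle until next time
    }
  }
  resetDut();                  // both pings went unanswered: reboot the DUT
  return Outcome::ResetIssued;
}
```

On the ATtiny85, pingAndWait would raise the request pin and watch for the reply, while resetDut would briefly pull the DUT's reset pin low. Note that a second ping is only sent when the first one goes unanswered.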

HARDWARE OPTIONS

It’s entirely possible to use two identical Arduino boards, or pretty much anything, for the keepalive.

We have used an inexpensive ATtiny85 eight-pin microcontroller IC as the keepalive which facilitates the restart, although you could use another Arduino as we did during development. Certainly you could use this idea to monitor a Raspberry Pi too. Just be aware of any logic voltage incompatibilities (e.g. 3.3V versus 5V). While you can level-convert the logic to make things compatible, since the logic is going both ways, it’s really more effort than it’s worth, so you’re better off trying to match controllers.

The DUT needs minimal changes to its code, just 12 extra lines, which makes little impact on the running of the original sketch. It also needs two I/O pins, one of which must be capable of generating interrupts. On an Arduino Uno this is limited to pin 2 or 3; check the documentation for your particular board. The keepalive requires 5V and ground for power, and uses three of its pins to control the DUT. One is an output to the DUT that carries the “are you alive” pulse. The second is an input that detects the “yes, I’m alive” pulse from the DUT. The third pin is connected to the reset pin on the DUT. If the DUT should hang or otherwise not respond to the keepalive, then it gets rebooted.

THE BUILD

Parts Required:	Jaycar	Altronics
1 × ATtiny85	ZZ8721	Z5105
1 × ATtiny85-Compatible Programmer	-	-
1 × Arduino UNO (or other interrupt supporting microcontroller)	XC4410	Z6280

Note: This project really only requires two Arduino-compatible devices. We’re using an UNO as the DUT simply because they’re common; and an ATtiny85 because they’re very cheap and readily available. You could also use an UNO as the keepalive or some other combination. So long as the DUT supports interrupts, the sketches should load without any issues. However you’ll need to adjust the pinouts for the keepalive in the sketch accordingly.

As the ATtiny85 is a small device and needs no external components for our application, hardware setup is very straightforward. We recommend that you use a small breadboard that supports the ATtiny DIP8 package. This is the simplest way to connect jumper wires. We had considered a small PCB, built like an Arduino module, but it seemed to be overkill for such a simple wiring task. However, you could achieve a module-like configuration fairly easily with some other prototyping board.

There are only a few connections required.

ATTINY85 AS KEEPALIVE	UNO AS DUT
Pin 4	GND
Pin 8	5V
Pin 5	Pin 3
Pin 6	Pin 11
Pin 7	Reset

As you would have noticed, we’re powering the ATtiny85 directly from the DUT, which seems to make sense. If the DUT loses power, then there’s nothing to restart anyway, so it’s not a massive problem. If you’re using two UNOs you can run one from the other without overloading the internal regulator, or power each of them independently.

THE CODE

This is one of those ideas that takes very little code to execute well. We’ve tried keeping things as simple as possible.

THE KEEPALIVE CODE

You’ll need to programme your ATtiny85 first. There are several methods available for doing this, however a USB programmer is usually the simplest, which is why we recommend this method. If you’re used to advanced programming methods you probably won’t need our assistance anyway!

If you haven’t used an ATtiny85 before, you’ll have to add support for it, via the boards manager in the Arduino IDE.

Load up the code onto your ATtiny85. It's fairly straightforward, though we do use the "TinyPinChange" library for a reliable pin change event. The ATtiny85 doesn't behave quite the same as a regular Arduino board. For this reason, if you are testing on an UNO or similar, you may need to make some adjustments accordingly.

#include "TinyPinChange.h"

It's quite easy to use, and we implement it with the following.

TinyPinChange_Init();
virtPort0 = TinyPinChange_RegisterIsr(rebooterReceivePin, itsAlive);
TinyPinChange_EnablePin(rebooterReceivePin);

This allows us to detect the returned pulse from the DUT after we send out our ping to see if it is indeed alive.
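
As a rough sketch of what the callback side might look like, the handler only needs to record that a pulse arrived. The body of itsAlive shown here is an assumption, and dutReplied, armForReply and replySeen are hypothetical names. A pin change interrupt fires on both the rising and falling edge of the reply pulse, so setting a flag (rather than counting) keeps things simple.

```cpp
// Written from the pin change callback, read from the main loop,
// so it must be volatile.
volatile bool dutReplied = false;

// Callback handed to TinyPinChange_RegisterIsr(). Assumed body:
// keep it short and only note that the DUT answered.
void itsAlive() {
  dutReplied = true;
}

// Call just before sending a ping so an old reply can't be mistaken
// for a new one.
void armForReply() {
  dutReplied = false;
}

// Call after the reply window has passed.
bool replySeen() {
  return dutReplied;
}
```

The main loop would call armForReply(), send the ping, wait a moment, then check replySeen() to decide whether a second chance or a reset is needed.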

By default we send a pulse every 60 seconds, however you can easily increase or decrease the frequency.

unsigned long waitTime = 60000;

Simply update it to the time you prefer (in milliseconds) and you're set. Keep in mind that a very short waitTime could create additional load on your microcontroller, and you should really set the minimum to a few seconds, since the call and response actions take a little time to complete.
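
If the interval is timed with millis() rather than delay() (an assumption, since the timing method isn't described here), it's worth comparing times with unsigned subtraction. That way the check keeps working when the 32-bit millisecond counter wraps around, which happens after roughly 49.7 days. intervalElapsed is a hypothetical helper name.

```cpp
#include <stdint.h>

// Returns true once at least waitTime milliseconds have passed since lastPing.
// Unsigned subtraction gives the right answer even if the counter has
// wrapped past zero between lastPing and now.
bool intervalElapsed(uint32_t now, uint32_t lastPing, uint32_t waitTime) {
  return (uint32_t)(now - lastPing) >= waitTime;
}
```

In a sketch, this would typically be called as intervalElapsed(millis(), lastPing, waitTime), with lastPing updated each time a ping is sent.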

THE DUT CODE

The DUT needs to know to listen for the ping from the keepalive. Otherwise the call would perpetually go unanswered and the keepalive would keep trying to restart the system.

Since the restart action from the keepalive is a hardware-level reset, if you don’t load the code correctly and have the keepalive properly attached, you could certainly find yourself in a restart loop.

Since this keepalive idea may be integrated into your own sketches (after all, you’re unlikely to need to keep an Arduino running if it has no code!), we’ve tried to make implementation as easy as possible.

We have created the dut.h file, which provides all the code required for operation. All that's required is to use a #include in your own sketch and initialise the code. We have also provided a template sketch for clarity, which you can use as a basis for your own sketches which implement the code.

Note: This will knock out the operation of pins 3 and 11 on your Arduino. If you need these for other purposes, you’ll need to adjust the pin configuration and wiring accordingly. Keep in mind that the pin receiving a ping from the keepalive must have interrupt support. The reset pin, however, can’t be changed, as it’s a hardwired interface at the chip level.

// Your library includes and
// variable declarations here
#include "dut.h"
  
void setup() 
{
  // This configures the I/O pin and sets up ISR
  dutSetup();         
  // Your main setup code here
}
  
void loop() {
  // put your main code here, to run repeatedly:
}

As you can see it's just a few lines more than an empty sketch! This file is included as DUT_setup_example.ino in the digital resources.

The pin setup for the DUT is inside the dut.h file, which can easily be updated. However, pin 3 is the default as it's interrupt-enabled on the UNO.

#define dutIOPin 3
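
To give a feel for what the DUT side involves, here is a minimal sketch of a responder. It is not the contents of the supplied file: dutReplyPin, dutPingIsr and the 50 microsecond pulse width are assumptions, while dutIOPin, dutSetup() and pin 11 come from the article and wiring table. The block at the top only provides stand-ins so the same code can be compiled and checked on a PC; on a real board it is skipped.

```cpp
#ifdef ARDUINO
#include <Arduino.h>
#else
// Host-side stand-ins (not part of any real sketch) so the logic can be
// exercised without hardware.
#include <stdint.h>
const uint8_t INPUT = 0, OUTPUT = 1, LOW = 0, HIGH = 1, RISING = 3;
uint8_t pinLevel[20];
int replyPulses = 0;
void (*attachedIsr)() = nullptr;
void pinMode(uint8_t, uint8_t) {}
void delayMicroseconds(unsigned int) {}
void digitalWrite(uint8_t pin, uint8_t level) {
  if (level == HIGH && pinLevel[pin] == LOW) ++replyPulses;  // count rising edges
  pinLevel[pin] = level;
}
uint8_t digitalPinToInterrupt(uint8_t pin) { return pin; }
void attachInterrupt(uint8_t, void (*isr)(), int) { attachedIsr = isr; }
#endif

#define dutIOPin 3      // ping input from the keepalive (interrupt-capable)
#define dutReplyPin 11  // reply output; hypothetical name, pin from the wiring table

// Runs whenever the keepalive raises the ping line: answer with a short pulse.
void dutPingIsr() {
  digitalWrite(dutReplyPin, HIGH);
  delayMicroseconds(50);  // assumed pulse width
  digitalWrite(dutReplyPin, LOW);
}

// Configures the I/O pins and attaches the interrupt handler.
void dutSetup() {
  pinMode(dutIOPin, INPUT);
  pinMode(dutReplyPin, OUTPUT);
  digitalWrite(dutReplyPin, LOW);
  attachInterrupt(digitalPinToInterrupt(dutIOPin), dutPingIsr, RISING);
}
```

Because the reply in this sketch is generated inside the interrupt handler, it confirms that interrupts are still being serviced. If you would rather prove that loop() itself is still turning over, one variation is to set a flag in the handler and send the reply from loop() instead.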

OPTIONAL ADDITIONS

While not essential for operation, you can easily add LEDs on the two communication lines to watch for call and response actions.

You can also add a third LED to the reset pin, which is pulled to ground on reset. Take careful note of the orientation and wiring for this LED. Since the reset pin is pulled to ground rather than sent high, this LED needs to connect to 5V, not ground, and is therefore reversed when compared to the other two.

See the amended diagrams for wiring.

Additional Parts Required:	Jaycar	Altronics
3 × Red LED	ZD0150	Z0800
3 × 330R Resistors*	RR0560	R7546

*Quantity shown, may be sold in packs.

TESTING

If you have taken the extra step of installing the LEDs, you can test operation before you install any software on your DUT. After 60 seconds (less if you have changed the interval time in the keepalive code), you will see LED1 briefly illuminate. After a second interval passes, you’ll see it happen again. Since the DUT is not running, no response will be received. You will then see LED3 illuminate and the Arduino will restart.

If the DUT and keepalive both have code installed, you can easily test the restart actions by disconnecting the feedback wire from the DUT. Even if the response is sent, the keepalive will assume it was not, and attempt a restart after both responses fail to arrive. Note that the call and response pulses are very rapid, so they don't hold up your DUT. We also have a "slow" version of the code in the resources if you wish.

WHERE TO FROM HERE?

If you are pushing the limits of your microcontroller’s memory, you may find that this system is invaluable to keep things running. However, in its current form, there are no additional actions taken when a restart is initiated.

If you know that restarts happen and would like to keep a log of the events, you could upgrade the ATtiny85 to a more capable piece of hardware, such as an ESP-based WiFi-enabled chip. It could then easily send an alert each time a restart takes place.
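
Even before adding WiFi, a keepalive could hold a small record of recent restarts in memory. The sketch below is one possible way to do that, assuming the timestamps come from millis() on the keepalive; RestartLog and its members are hypothetical names, and the log is lost if the keepalive itself loses power.

```cpp
#include <stdint.h>

// Fixed-size ring buffer holding the most recent restart times.
struct RestartLog {
  static constexpr uint8_t kCapacity = 8;  // how many restarts to remember
  uint32_t timestamps[kCapacity] = {};     // milliseconds since keepalive power-up
  uint8_t next = 0;                        // slot the next entry will use
  uint16_t total = 0;                      // restarts seen since power-up

  // Call each time the keepalive resets the DUT.
  void record(uint32_t nowMs) {
    timestamps[next] = nowMs;
    next = (next + 1) % kCapacity;         // oldest entry is overwritten when full
    if (total < 0xFFFF) ++total;           // saturate rather than wrap
  }

  // Time of the most recent restart (0 if none has been recorded yet).
  uint32_t latest() const {
    return timestamps[(next + kCapacity - 1) % kCapacity];
  }
};
```

A more capable keepalive could then report total and latest() whenever it next has a connection.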

Sometimes, there’s simply no logical reason for a freeze event to happen. Sometimes it’s code, a tiny memory error that the software failed to handle, a math error, or something else so minor that it never appears again. Of course, with the external interfacing we do via GPIO, occasionally the hardware will trigger a fault because something strange has happened in the real world that is outside of expected parameters.

Whatever the cause, restarting it will often fix things, and now it will happen automatically, and your Arduino can continue doing its thing - whatever that may be.


Arduino Keepalive

A Reset is Good For The Soul
