Secret Code

The Basics of Datalogging

Oliver Higgins

Issue 10, April 2018

Data logging is the collection of data over a period of time, and is something often used in scientific experiments. Data logging systems typically monitor a process using sensors linked to a computer. Most data logging can be done automatically under computer control. If you can read it, you can log it.

One of the first projects I ever designed and built using the Arduino Uno was a GPS data logger for a racing motorcycle. The job of this data logger was to log a timestamp, then a GPS location, followed by the various inputs on the bike, including throttle position, brake activation, lean angle and G-forces. This data was then plotted on a graph over time, and on a map showing where the rider was braking and accelerating.

Taking data logging to the extreme, consider the modern Formula 1 racing car, where each car has around 300 sensors.

Data is sent from the car to the pit garage over around 1,000 to 2,000 telemetry channels, which are transmitted wirelessly on the 1.5GHz frequency. These channels are encrypted to stop other teams from reading the data. The typical delay between the data being collected and it being received in the pits is about 2ms. For each race, the amount of data collected is in the range of 1.5 billion samples. When combined with practice and qualifying sessions, the total is typically around five billion samples! During a 90-minute practice session, teams can receive between five and six gigabytes of raw, compressed-format data from one car alone.

Data Files

When logging data, there are two ways in which we can choose to save it: as a traditional text file or as a binary file. Both have advantages and disadvantages, and which one suits you best depends on your final goal.

Text-based files are human-readable and simple to interpret. Examples are eXtensible Markup Language (XML) and Comma Separated Value (CSV) files. Both can be opened in a text editor such as Notepad++. In a CSV file, each row of data is written with a comma separating each value. This is raw and not without its issues when defining data, but it is very easy to implement, and spreadsheet programs such as Excel will open it. XML is also human-readable, but its greater definition of the individual data elements comes at a significant overhead in terms of file size.

By contrast, a binary file has all its data written at a byte level. Each number, reading or measurement is recorded in either a set pattern or a matched pair-style arrangement. For the human reader, this is nonsensical; however, the files are compact, meaning you can either store much more data or use a smaller space more efficiently. For example, the number 129 fits in a single byte: 10000001. To store it as human-readable text means storing the character 1, then 2, and finally 9, which uses three bytes. In ASCII these are 00110001, 00110010 and 00111001.

For the purpose of these experiments, we will be sticking to CSV files, which make debugging and interpreting the data much easier.
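As a rough illustration of the size difference, the short Python sketch below stores the number 129 both ways and prints the bytes each form produces. The variable names and the bits() helper are our own for this example, and it assumes the text form uses ASCII characters.

```python
import struct

reading = 129

# Binary: pack the value into one unsigned byte ("B" format code).
as_binary = struct.pack("B", reading)

# Text: store the digits "1", "2" and "9" as ASCII characters.
as_text = str(reading).encode("ascii")


def bits(data):
    """Return each byte of `data` as an 8-bit binary string."""
    return [format(byte, "08b") for byte in data]


print(len(as_binary), bits(as_binary))  # 1 ['10000001']
print(len(as_text), bits(as_text))      # 3 ['00110001', '00110010', '00111001']
```

The binary form needs one byte and the text form needs three. The gap grows with larger readings, which is why binary logs are attractive when storage is tight.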

Data Types

We generally stick to three main data types: text (strings), numbers, and time/date. However, some of these can be used and interpreted in different ways. It is crucial during the design stages of your project that you ask yourself “what do I want or need the outcome to be, and how do the elements of the data relate to one another?”

One key thing that your data must have is a unique identifier for each record. When data logging, this is usually a timestamp or some type of counter. It can be sequential, but it does not need to be. When logging any data, numbers are generally faster to process than text. If your project is time-sensitive in its intervals, try using an index or timestamp instead of the time and date itself.

While opinions may differ, we find a Unix timestamp to be one of the best and simplest ways to do this. Few languages do not support it, and it is native to languages like C/C++ that were born out of Unix computing labs. The timestamp is a whole number: the number of seconds since the Epoch, which is 01/01/1970 00:00:00 UTC. Using this we get a unique number that is universally and easily translated. Just a reminder: this is a count of whole seconds, so samples must be collected at intervals of one second or longer to keep each timestamp unique. If you need faster cycle rates, then you will need to investigate other data types and processors.
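To show how easily a Unix timestamp translates, here is a minimal Python sketch that converts one to a readable date and back again. The helper names to_utc() and to_unix() are our own, and the sample value is taken from this article's example output. The sketch works in UTC, so your local time may differ.

```python
from datetime import datetime, timezone


def to_utc(unix_seconds):
    """Convert whole seconds since the Epoch into a UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


def to_unix(moment):
    """Convert a timezone-aware datetime back into whole seconds."""
    return int(moment.timestamp())


stamp = 1518085427
moment = to_utc(stamp)
print(moment.isoformat())  # 2018-02-08T10:23:47+00:00
print(to_unix(moment))     # 1518085427
```

Because the value is just an integer, it sorts and subtracts like any other number, and the conversion to a calendar date can be left until you analyse the data.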

In this part, we break down how to make the system create a CSV file and write data to it. In this case, we will just iterate a variable, then apply some basic algorithms to give you a data stream. What we have done though, is supply the bare minimum required to get this working on the Arduino or Raspberry Pi platform (using Python).

What's a datalogger?

We have been discussing data-logging but what is a data-logger? A datalogger is generally a commercial product designed to record data for a particular task. Take for example small data-loggers used by air-conditioning companies. These are often as small as lithium cell batteries and on the surface have no discernible way of retrieving the data. They are used in solving problems with air-conditioning systems that require multiple spaces to be measured at the same time. They record data in a particular way and are often proprietary in how the data is read and interpreted.

These can often be left “in the wild” for months or even years at a time. We can then extend this to many types of test equipment such as multimeters, oscilloscopes and multichannel bench-top data-loggers. These units will log 16 or more simultaneous data channels, allowing you to test multiple data streams and compare them against each other with a common index point. If you are designing circuits that are TTL or logic level, a data-logger will prove invaluable in finding faults and issues.

Arduino

Firstly, we will look at the Arduino. For this article, we will be using the XC4356 shield from Jaycar. This shield contains an SD card slot and a Real Time Clock (RTC). Both of these can be used on their own, and both use standard libraries. The SD library is pre-installed, but you will need to install the RTClib library. A copy is included, although we recommend getting the latest version through the Arduino Library Manager. To do this, open up the Arduino IDE and go to the menu bar, selecting Sketch > Include Library > Manage Libraries to open the Library Manager. Once you have this open, type in “RTClib”. You will then find RTClib, which will be used in this sketch.

If you decide to run the SD test code or any other code that uses the SD card with this shield, then be mindful that the standard Chip Select pin is 4, but on this shield it is connected to pin 10. Bear in mind that the SD card FAT library carries a large overhead. If your sketch ends up large or close to the capacity of your microcontroller, it may result in runtime errors.

The following code is truncated from the full code set. While the code itself is not large we will just cover the key points.

void setup() {
  rtc.begin(); 
  //Uncomment the below line to
  //set the time on the RTC 
  //rtc.adjust(DateTime(__DATE__, __TIME__)); 
}

One of the first things we need to do is to set the time. The RTC that we used, in this case, had the time and date set, but it was for 2012. The first time you plan to run any code you need to set the time and date. In this case, we have included a commented out line that will set the time and date based on the time and date of the sketch being compiled. Uncomment this and upload your sketch. Then the next time you upload your code, either comment it out again or delete the line. If you leave the line in place, then each time the code is restarted the time will be reset to whatever time it was when you compiled the sketch. This will be fine for the testing phase when you are recompiling, and the unit is plugged into the computer, but once it is out in the field, then it will reset the time whenever there is a restart.

int i = 1;
void loop() { 
  DateTime now = rtc.now(); 
  String dataString = ""; 
  dataString += String(now.unixtime()); 
  dataString += ","; 
  dataString += String(i); 
  dataString += ","; 
  dataString += String(-i); 
  dataString += ","; 
  dataString += String(i - 10); 
  dataString += ","; 
  dataString += String(i + 5); 
  dataString += ","; 
  dataString += String(i * i);

This code is what generates our data for the row that we will be logging. At the start of the loop we create a DateTime variable that will hold the time at the moment we start the logging. We then read the current time from the RTC using the rtc.now() function (we declared this object outside of this code).

Next, we declare the String that we will use to hold the data. Once this is done we start to concatenate the string together. First, we get the current time in the Unix timestamp format and convert it to a string. Then we add a comma (“,”) to the string before moving to the next value.

We then get the string of the variable “i”, and add the comma. This goes on slightly modifying the string output of “i” with some simple math. You could easily put in one of the analogue or digital ports, GPS data, or anything you can read from. Please note: this code is intentionally verbose to highlight how each part comes together.

  File dataFile = SD.open("datalog.csv", FILE_WRITE);
  if (dataFile) {
    dataFile.println(dataString);
    dataFile.close();
  } // end of if (dataFile)

  i++;
  delay(5000);
} // end of loop()

With our String complete and ready to write, we open the file datalog.csv on the SD card. We have specified that it be opened as FILE_WRITE, which starts writing at the end of the file so new rows are added after existing ones. If the file is not there, then the SD library will create it. Writing to the file is very similar to using the Serial.println function: we call println on the dataFile object, passing in the string we just created. Once complete, we close the file, increment “i” for the purpose of this article, and then delay the system for five seconds.

Python

Using Python on the Raspberry Pi is even easier than the Arduino version. In the below example we have just specified to write to the user directory; however, you may find it better to write to a USB storage device. If you do, you will need to locate the mount point of your USB device and replace it in the path below.

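import time
from time import sleep

# "a" opens the file in append mode, creating it if needed
file = open("/home/pi/data_log.csv", "a")
i = 0
while True:
    i = i + 1
    # Unix timestamp followed by the same test values as the Arduino sketch
    file.write(str(int(time.time())) + "," + str(i) + "," + str(-i) + "," + str(i - 10) + "," + str(i + 5) + "," + str(i * i) + "\n")
    file.flush()
    sleep(5)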

We start by importing time and sleep, with time being used for our timestamp and sleep to add in a delay. Next, we create a file object giving the location of the file and the name we will use. The “a” is to append to the file. Similar to the Arduino example we use “i” as our counter, then we enter a never-ending loop. The first line of the loop increments the “i” variable, so the first value we log is 1.

Next, we go straight to the file write. Python handles its strings much more easily than C++/Arduino, and we can build the same string on one line. Note: we need to include the “\n” newline character at the end, as file.write does not add one automatically like the println function does.

Next, we flush the file so that the buffered row is written out before the sleep function is called. After five seconds the code loops and we start again. We have included a file in the resources. Note: we have added in a line to make this loop 50 times before being terminated, and it then uses the file.close() function to release the file back to the operating system.
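The file in the resources is not reproduced here, but a bounded version of the loop might look something like the sketch below. This is our own illustration rather than the resource file itself: the make_row() and log_rows() names and their parameters are assumptions, and a “with” block is used so the file is closed automatically in place of an explicit file.close() call.

```python
import time


def make_row(timestamp, i):
    """Build one CSV row: timestamp, i, -i, i-10, i+5, i*i."""
    values = [timestamp, i, -i, i - 10, i + 5, i * i]
    return ",".join(str(value) for value in values) + "\n"


def log_rows(path, count=50, interval=5):
    """Append `count` rows to `path`, pausing `interval` seconds between rows."""
    with open(path, "a") as log_file:
        for i in range(1, count + 1):
            log_file.write(make_row(int(time.time()), i))
            log_file.flush()  # push the row out before we sleep
            time.sleep(interval)
    # Leaving the "with" block closes the file for us.


# Example call (writes 50 rows, five seconds apart):
# log_rows("/home/pi/data_log.csv")
```

Separating the row-building from the file handling also makes it easier to swap the test values for real sensor readings later.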

The resulting data will look like this:

1518085427,1,-1,-9,6,1 
1518085432,2,-2,-8,7,4 
1518085437,3,-3,-7,8,9 
1518085442,4,-4,-6,9,16 
1518085447,5,-5,-5,10,25 
1518085452,6,-6,-4,11,36 
1518085457,7,-7,-3,12,49 
1518085462,8,-8,-2,13,64 
1518085467,9,-9,-1,14,81 
1518085472,10,-10,0,15,100 

Reading Your Results

Congratulations! You now have a dataset! But what do we do now that we have it?

There are many tools available to help you analyse your data. MATLAB would be the gold standard but it has a steep learning curve. One of the quickest and simplest ways is to use a spreadsheet program like Microsoft Excel, Apache’s OpenOffice, or Google’s online offering, Sheets. Sheets is free to use for anybody with a Google account. Go to https://docs.google.com/spreadsheets/u/0/ and select a blank sheet. Once created, go to File, then Import, and locate the CSV file that we created. You will see all your data laid out here in neat, tabular divisions. You will note that column A is the Unix timestamp. To convert it to something we can easily use, insert a new column next to column A. In this column insert the following formula:

=A1/(60*60*24)+"1/1/1970"

Once complete select the “123” icon from the toolbar and select “Date Time” and you will now have the correct time and date for your sample. Select the cell again and copy down using the handle on the right-hand side to the bottom of your data, replicating the formula and format. What we need to do next is visualise this data. Select all of your data from B1 all the way down to G50. With all of this data selected go to “Insert” on the menu bar, then select “Chart”. Google Sheets will create a line chart by default, and you will be able to “see” what your data looks like.
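If you would rather stay in Python than use a spreadsheet, the same timestamp conversion can be sketched with the standard csv module. The spreadsheet formula divides the seconds by 60*60*24 to get days and adds them to the Epoch date, and datetime.fromtimestamp does the equivalent job here. The read_log() helper is our own name, the sample rows are taken from the example output above, and times are shown in UTC.

```python
import csv
import io
from datetime import datetime, timezone


def read_log(csv_file):
    """Return a list of (datetime, [values]) tuples from an open CSV log."""
    rows = []
    for record in csv.reader(csv_file):
        if not record:
            continue  # skip blank lines
        when = datetime.fromtimestamp(int(record[0]), tz=timezone.utc)
        values = [int(field) for field in record[1:]]
        rows.append((when, values))
    return rows


# Two rows from the example output stand in for a real file here.
sample = io.StringIO("1518085427,1,-1,-9,6,1\n1518085432,2,-2,-8,7,4\n")
for when, values in read_log(sample):
    print(when.strftime("%Y-%m-%d %H:%M:%S"), values)
# 2018-02-08 10:23:47 [1, -1, -9, 6, 1]
# 2018-02-08 10:23:52 [2, -2, -8, 7, 4]
```

To read a real log, pass in a file opened with open(path, newline="") instead of the in-memory sample.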

If you want to take your data analysis to the next level you should take the time to learn and understand MATLAB. It is more than a program; it is a language in itself. It is designed specifically for engineers and scientists. Using MATLAB, you can analyse data, develop algorithms, and create models and applications. The language, apps, and built-in math functions enable you to quickly explore multiple approaches to arrive at a solution. It lets you take your ideas from research to production by deploying to enterprise applications and embedded devices.

Data logging is a fundamental part of exploring the world around us, and fundamental to getting your data ready for the Internet of Things (IoT). But remember, while everyone is pushing hard with the IoT ideals, there is very much a place and time for the simple data logger working in remote and difficult situations. The data you will be able to gain will be invaluable to any project.