Secret Code

Computer Languages

Mike Hansell

Issue 21, April 2019

When we write a program or sketch for our beloved Arduino, Raspberry Pi or any other computer, we generally use a programming language such as C or Python. The brain of your computer, the Central Processing Unit (CPU), does NOT understand this at all. So how does it know what to do and when to do it?


Our computers do not understand English, Swahili, C or Python. They have an internal language called machine language. Machine language works at the level of 0’s and 1’s, so it is not particularly easy to understand, which is why other languages have been developed.


High level languages are developed to make it easier for programmers to write useful programs. Even so, some computer languages are all but incomprehensible. For example, the most difficult language I have seen is APL. This is an example of an APL program that plays the game of Life:

life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}

As previously stated, a complete mystery.

Some high level computer languages are much closer to English than the APL above. BASIC (Beginner's All-purpose Symbolic Instruction Code) was designed to be an easier language for beginners. It is still very popular. The early versions of BASIC were not as easy to understand as more modern versions and allowed very sloppy coding practices.

An example of BASIC code:

10  FOR j = 1 TO 10
20    PRINT j
30  NEXT j

Early BASIC required the use of line numbers. The big problem with early BASIC was the GOTO statement. This allowed code to branch to a specific line number and continue execution from there.

10  IF k = 1 THEN GOTO 200
20  IF k = 2 THEN GOTO 300
30  REM if k does not equal 1 or 2, execution continues here
200 PRINT "k = 1"
300 PRINT "k = 2"

The code above shows testing the value of variable k. If the value of k equals 1 then code execution continues from line 200. If the value of k equals 2, then code execution continues from line 300. If the value of k does not equal 1 or 2 then code execution would continue from the next line.

Note: Without further code, if k = 1 then the program would print both k = 1 and k = 2. This is only an example of syntax, not a fully functional program.

Consider that line 200 or 300 means very little to anybody and would be difficult for the original coder to recall after a short time.

BASIC also has the GOSUB keyword, short for “go to subroutine”. A subroutine is a block of code that performs some function and ends with a RETURN statement. When the RETURN is reached, execution continues at the statement (line) following the GOSUB that called the subroutine. So GOSUB allows branching as before but in a more controlled and predictable manner.

A GOTO or GOSUB does not have to depend on a variable being equal to a value (as in k = 1). We can branch on other comparisons too: >, >=, <, <=, = and not equal (<>). We can also RETURN from a subroutine early when a condition such as > is met.

IF k > 1 THEN GOSUB bigger
IF k < 1 THEN GOSUB smaller
END
bigger:  PRINT "Greater than": RETURN
smaller: PRINT "Less than": RETURN

The previous code is an example of more modern BASIC syntax. Note the use of named subroutines. This demonstrates that computer code can make decisions, which is a very powerful concept. Being able to make conditional branches is fundamental to all programming languages right down to machine code.

Translating the above simple examples to Arduino C looks like this:

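int k = 1;
void setup() {
  Serial.begin(9600);  // assumed baud rate
  if (k > 1) doGreaterThan();
  else if (k < 1) doLessThan();
  else doEqual();
}
void loop() {}
void doGreaterThan() {
  Serial.println("Greater than");
}
void doLessThan() {
  Serial.println("Less than");
}
void doEqual() {
  Serial.println("Equal");  // assumed message
}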

You’ll notice in C that line numbers are not used. doGreaterThan() is a function (the C term for a subroutine) that we have written to handle the case of variable k being > 1. We can name the function anything within reason. Xy24-gev() is meaningless compared to doGreaterThan(). Similarly, GOTO 200 tells the reader far less than a descriptive function name.

There are many high level languages available depending on the hardware platform you are using. The Arduino’s standard environment is a hybrid of C and C++, with some hardware-specific code thrown in. More visual programming environments for the Arduino include ArduBlock and Snap4Arduino.

The limited hardware of the Arduino range is a contributing factor to the limited range of alternate programming languages.

The language most often used with the Raspberry Pi is Python. Several other languages have been ported (rewritten or adapted) to the RPi, including C, C++, Scratch (which we have featured in several articles) and web oriented languages like HTML, JavaScript and Perl.


A compiler converts the source code (the high level human readable code) of a high level language to executable machine code. Not all languages have or need a compiler. The C language is generally compiled.

Compiling makes debugging a little slower: if you change the source code, it needs to be re-compiled before you can test it again. In the case of the Arduino, this is often not a major hurdle as sketches (programs) written for an Arduino are relatively short, meaning that they compile and are uploaded to the CPU fairly quickly.

An interpreter reads the source code line by line every time the program is run, and executes the appropriate machine language instructions to perform the necessary action such as adding, multiplying, comparing variables etc, in the same fashion, every single time. This is generally relatively slow, however, it does allow easy modification of the source code to improve or debug it.
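To make the idea concrete, here is a minimal sketch in Python of a line-by-line interpreter. It only understands a tiny made-up subset of BASIC (LET, PRINT, GOTO and IF ... THEN GOTO), the name run_basic is our own invention, and it leans on Python's eval() to work out expressions, so treat it as an illustration rather than a real BASIC.

```python
def run_basic(source):
    """Interpret a tiny BASIC-like subset and return the printed lines."""
    program = {}
    for raw in source.strip().splitlines():
        number, statement = raw.strip().split(None, 1)
        program[int(number)] = statement.strip()
    order = sorted(program)          # line numbers in execution order
    variables, output = {}, []
    pc = 0                           # position in 'order', not a line number
    while pc < len(order):
        statement = program[order[pc]]
        pc += 1                      # default: fall through to the next line
        keyword, _, rest = statement.partition(" ")
        if keyword == "LET":
            name, expression = rest.split("=", 1)
            variables[name.strip()] = eval(expression.strip(), {}, variables)
        elif keyword == "PRINT":
            output.append(str(eval(rest.strip(), {}, variables)))
        elif keyword == "GOTO":
            pc = order.index(int(rest))
        elif keyword == "IF":
            condition, target = rest.split("THEN GOTO")
            if eval(condition.strip(), {}, variables):
                pc = order.index(int(target))
        else:
            raise SyntaxError("unknown statement: " + statement)
    return output


PROGRAM = """
10 LET j = 1
20 PRINT j
30 LET j = j + 1
40 IF j <= 3 THEN GOTO 20
50 PRINT j * 10
"""

print(run_basic(PROGRAM))
```

Notice that every statement is looked up and decoded again each time it is reached, even inside the loop. That repeated work is one reason interpreted code tends to run slower than compiled code.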

BASIC was initially an interpreted language only. Modern BASIC may have an interpreter and/or it may be compiled depending on the implementation.

Depending on the language, it may be possible to 'include' extra external code libraries. This is very common and often a simple means to provide extra functionality.


Some languages or versions are neither interpreted directly from source nor compiled to CPU machine code. Instead, they are converted to an intermediate code, often called p-code. If a compiler is written for a PC using Windows, the executable code that it generates would not run on a Mac or Linux computer. However, if the program is converted to an intermediate p-code that isolates the source code from the underlying CPU and/or operating system, then that p-code could be run in many different environments provided there is a suitable package to execute the p-code for that environment. P-code is generally smaller than native machine executable code but as it is interpreted, it may well run slower.
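As a rough illustration of the p-code idea, the Python sketch below converts a simple expression into instructions for an imaginary stack machine and then executes them. The instruction names (PUSH, ADD, SUB, MUL) and both function names are assumptions made up for this example. They do not belong to any real p-code system.

```python
def compile_to_pcode(expression):
    """Translate a postfix expression such as "2 3 + 4 *" into p-code."""
    operators = {"+": "ADD", "-": "SUB", "*": "MUL"}
    pcode = []
    for token in expression.split():
        if token in operators:
            pcode.append((operators[token],))
        else:
            pcode.append(("PUSH", int(token)))
    return pcode


def run_pcode(pcode):
    """A tiny stack-based virtual machine that executes the p-code."""
    stack = []
    for instruction in pcode:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
            continue
        right = stack.pop()
        left = stack.pop()
        if instruction[0] == "ADD":
            stack.append(left + right)
        elif instruction[0] == "SUB":
            stack.append(left - right)
        elif instruction[0] == "MUL":
            stack.append(left * right)
        else:
            raise ValueError("unknown p-code instruction")
    return stack.pop()


pcode = compile_to_pcode("2 3 + 4 *")   # (2 + 3) * 4
print(pcode)
print(run_pcode(pcode))
```

The list returned by compile_to_pcode() is the portable part. Any computer with its own version of run_pcode() could execute it, whatever its CPU or operating system.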


To understand machine code, we need to understand a little of the internal operation of a computer. For this discussion, we’ll keep to microprocessors and microcontrollers.


The CPU (brain) of an Arduino UNO is the ATmega328P made by Atmel. The CPU in a Raspberry Pi 3 is based on the Cortex-A53 designed by ARM and is incorporated into an even larger IC. The original IBM PC used an 8088 CPU from Intel. The 8088 was quite a step up from the CPUs generally used in microcomputers before the IBM PC was launched. Modern PCs and Macs use a much more advanced CPU, like the Intel Core i7.

We should understand that one of the major developments in CPU design has been the introduction of ‘multi-core’ CPU’s. This means that there are multiple pretty-much independent CPU’s in the one package. The advantage of this is that instead of being bound to one task, there are multiple CPU’s able to do multiple tasks.

We also need to understand that a microcontroller is a superset of (more than) a microprocessor. It may have built in counters, timers, flash memory, read only memory, Analogue to digital converters, Digital to Analogue converters, GPIO etc. All this but still based on the same basic hardware.


In electronics, the term ‘bus’ usually refers to a group of similar signals. You may be familiar with the terms USB bus, PCI and PCI-E bus. The data bus is part of the connection between the CPU and physical memory systems like RAM etc. This memory can be read and, depending on the system, written to.

So it can read, but what does it read? In the most basic terms, it reads binary: 0’s and 1’s. Depending on the CPU it may be able to read or write a single bit (a ‘binary digit’, being 0 or 1), an 8-bit byte, a 16-bit value or bigger. Another term that is used is word. A word is usually the same size as the data bus, but not always. The size of a word is specific to the CPU. The CPU of an Arduino UNO is 8-bit: it reads and writes data one byte at a time, although its program instructions are stored as 16-bit words. Modern personal computers may be using 32 or 64 bit words. Some VLIW (Very Long Instruction Word) CPUs may be using 128 or 256 bit words.
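A few lines of Python show what these sizes mean in practice. This is only a sketch, and the helper names max_unsigned and wrap are our own. The number of bits sets the largest value that can be held, and anything bigger simply wraps around.

```python
def max_unsigned(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1


def wrap(value, bits):
    """Keep only the lowest 'bits' bits, as a fixed-width register would."""
    return value & max_unsigned(bits)


for bits in (1, 8, 16, 32, 64):
    print(bits, "bits can hold 0 to", max_unsigned(bits))

print(format(5, "08b"))      # the value 5 written as an 8-bit byte
print(wrap(255 + 1, 8))      # one more than a byte can hold
```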

The address bus also connects the CPU to physical memory and is used to determine which of the available memory locations is going to be read or written. Let’s say you are looking up a component in your favourite electronics store’s catalogue. You would refer to the index which would give you the page number. Similarly, a computer refers to a memory address (the catalogue index) and reads the data it finds there. This data is transferred to the CPU via the data bus.

The control bus carries many important signals, for example, a signal that determines whether a memory location is to be read or written. Other signals on the control bus are dependent on the CPU architecture, but may include a clock signal (not the time, but a signal related to timing of access of memory etc), reset and others.
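The three buses can be modelled very loosely in Python. In this sketch (the Memory class and its access method are hypothetical names chosen for the example) the address argument plays the part of the address bus, the data argument and return value play the part of the data bus, and the write flag stands in for the read/write signal on the control bus. The number of address lines decides how many memory locations can be reached.

```python
class Memory:
    def __init__(self, address_lines):
        # n address lines can select 2 to the power n locations
        self.size = 1 << address_lines
        self.cells = bytearray(self.size)

    def access(self, address, write=False, data=0):
        """One bus cycle: write 'data' if the write line is set, then
        return the byte now held at 'address'."""
        if not 0 <= address < self.size:
            raise IndexError("address needs more lines than the bus has")
        if write:
            self.cells[address] = data & 0xFF   # 8-bit data bus
        return self.cells[address]


ram = Memory(address_lines=16)
print(ram.size)                                # locations reachable
ram.access(0x1000, write=True, data=0x2A)      # write cycle
print(hex(ram.access(0x1000)))                 # read cycle
```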

In the case of the 328p, none of the data bus, address bus or control bus is exposed at the hardware level, i.e. on the pins of the CPU package itself. In the case of the RPi, the CPU is, in fact, part of a bigger SoC or System on a Chip which seems to include all of the memory, in which case it doesn’t need to expose the buses.

CPU’s like the Core i7 do expose the buses as these CPU’s often have access to far more resources (like RAM) than could be included on the CPU.


A register can be thought of as very high speed internal memory that can be used to temporarily hold a value. Depending on the CPU, it may have many internal registers or few. The 328p has 32 x 8-bit general purpose registers. The RPi’s CPU has 13 x 32 bit registers plus many other non-general purpose registers.


The program counter is an internal register that keeps track of the address of the current instruction. On many CPUs it is not directly writeable. Instead, it is changed by jump, branch, call and return instructions.


The status register tracks the outcomes of mathematical and logical operations. Depending on the CPU it may be anywhere from 8 bits (e.g. 328p) to 32 bits (Intel x86) wide. It is this register that allows testing for values being equal, greater than etc.
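The Python sketch below imitates how an 8-bit compare might set status bits. The flag names Z (zero), N (negative) and C (carry, used here to signal a borrow) are common, but the exact set of flags and their meaning vary from CPU to CPU, so take the details as assumptions made for the example.

```python
def compare(a, b):
    """Subtract b from a in 8 bits and report the resulting status flags.
    The result itself is thrown away; only the flags are kept."""
    result = (a - b) & 0xFF
    return {
        "Z": result == 0,            # set when the two values are equal
        "N": bool(result & 0x80),    # set when bit 7 of the result is 1
        "C": b > a,                  # set when the subtraction had to borrow
    }


print(compare(5, 5))   # equal values
print(compare(3, 5))   # first value smaller
print(compare(9, 5))   # first value bigger
```

A program never needs to look at the subtraction result. Testing Z answers "were they equal?" and testing C answers "was the first value smaller?" when the values are treated as unsigned.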


The stack is an area of memory that is reserved to allow the executing program to temporarily store the values of registers if they are likely to be overwritten by other code. For example, some instructions may require specific data to be stored in a register before they are used. To keep a copy of the current value, it can be ‘pushed’ onto the stack. After the code that overwrites that register has completed, the original register value can be ‘pulled’ from the stack. When using calls (rather than branches) the return address, that is the address of the instruction following the call, is automatically pushed onto the stack, and it is pulled off again when the subroutine returns. The stack pointer, as you may have gathered, points to the current top of the stack in memory.
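Here is a small Python sketch of a stack in action. The Machine class, its single register R1 and the addresses used are all invented for the example. It also assumes, as is the case on most CPUs, that the address saved by a call is that of the instruction after the call. The stack pointer is not modelled separately; the end of the Python list plays that role.

```python
class Machine:
    def __init__(self):
        self.stack = []                # the end of the list is the stack top
        self.registers = {"R1": 0}
        self.pc = 0                    # program counter

    def push(self, value):
        self.stack.append(value)

    def pull(self):
        return self.stack.pop()

    def call(self, target):
        self.push(self.pc + 1)         # remember where to come back to
        self.pc = target

    def ret(self):
        self.pc = self.pull()          # resume after the original call


cpu = Machine()
cpu.pc = 10
cpu.registers["R1"] = 42
cpu.push(cpu.registers["R1"])          # save R1 before it gets overwritten
cpu.call(200)                          # jump to a subroutine at address 200
cpu.registers["R1"] = 7                # the subroutine uses R1 for itself
cpu.ret()                              # back to the instruction after the call
cpu.registers["R1"] = cpu.pull()       # restore the saved value
print(cpu.pc, cpu.registers["R1"])
```

Values come off the stack in the reverse of the order they went on, which is why the return address has to be pulled before the saved copy of R1.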


Modern fast CPU’s have long exceeded the speed of the RAM they use. When reading from this memory the CPU may need to slow down waiting for the data to become available. The idea of a cache is that when an instruction or data is read from memory, more is read than is required at the time, on the assumption that it may be required for subsequent operations. This is held in fast memory and can be accessed as necessary from there.
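The read-ahead idea can be sketched in Python as follows. The Cache class is hypothetical and much simpler than a real cache: it fetches 4 values at a time (an assumed line size) and never has to throw anything away to make room.

```python
class Cache:
    def __init__(self, memory, line_size=4):
        self.memory = memory
        self.line_size = line_size
        self.lines = {}        # start address of a line -> values held
        self.hits = 0
        self.misses = 0

    def read(self, address):
        start = address - (address % self.line_size)
        if start in self.lines:
            self.hits += 1     # already in fast memory
        else:
            self.misses += 1   # slow trip to RAM, fetch the whole line
            self.lines[start] = self.memory[start:start + self.line_size]
        return self.lines[start][address - start]


ram = list(range(100, 116))    # 16 memory locations holding 100..115
cache = Cache(ram)
values = [cache.read(address) for address in range(8)]
print(values, "hits:", cache.hits, "misses:", cache.misses)
```

Reading eight neighbouring addresses costs only two slow trips to RAM in this model, because each trip brings back the next three values as well.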

Modern CPU’s have independent caches for instructions and data. Each of these caches may have several ‘levels’ of cache with different sizes used for different functions.

The 328p has no cache. The Cortex-A53 has 2 levels of cache with configurable sizes. The Core i7 has 3 levels of cache, with sizes of 32kB (level 1), 256kB (level 2) and 2MB (level 3).


We now know that our computer reads instructions and gets data as it runs a program. What happens if a hardware device (or maybe another program) needs urgent attention? Maybe you pressed a button connected to a GPIO pin and the program needs to perform some action (maybe illuminate an LED). How can this be accommodated?

A hardware interrupt is an external signal that is applied to a specific pin on the CPU. The number of pins that accept interrupts depends on the CPU.

When the interrupt occurs, assuming that the CPU is set to handle them, the program calls a piece of code that the user has created to handle it. This is generally referred to as an Interrupt Service Routine, or ISR for short. As the main code is not being executed while the ISR runs, the ISR should execute as quickly as possible, so that the main code can continue. In the case of microcontrollers, timing of events may be critical. For example, if the program reads a sensor every 10ms, and if the ISR runs for 20ms, a reading of the sensor will be missed.

Depending on the CPU, it may be possible to programmatically disable interrupts. This could be useful in ultra-critical timing operations. Many CPU's have a non-maskable interrupt, often referred to as NMI. This type of interrupt cannot be disabled. This would be useful, for example, if it is determined that the power is failing. In this case, we could save our data and initiate a graceful shutdown.

While we’re discussing interrupts we should look at multi-tasking. In computer terms, multi-tasking makes it appear that several things are happening at once, but in fact, the CPU can attend to only one task at any time. If it is regularly switched between several tasks, then it will appear to be doing many things at once, i.e. multi-tasking. Many modern CPU’s have multiple cores, that is multiple CPU’s in the one package. Each core may be working on a separate task or multi-tasking, making the whole system faster.
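Switching between tasks can be imitated in Python with generators. In this sketch (the task and round_robin names are our own) each task does one small step and then hands control back, and a simple scheduler gives every task a turn in order. A real operating system does the switching with timer interrupts rather than relying on tasks to give way, so this shows the idea only.

```python
def task(name, steps):
    """A task that does its work one small step at a time."""
    for step in range(1, steps + 1):
        yield f"{name} step {step}"


def round_robin(tasks):
    """Give each task one step in turn until all of them have finished."""
    log = []
    queue = list(tasks)
    while queue:
        current = queue.pop(0)
        try:
            log.append(next(current))   # run one step of this task
            queue.append(current)       # then send it to the back of the queue
        except StopIteration:
            pass                        # task finished, drop it
    return log


for line in round_robin([task("A", 2), task("B", 3)]):
    print(line)
```

Only one step ever runs at a time, yet the output shows tasks A and B both making progress, which is the illusion multi-tasking creates.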

Some computers have many discrete CPU's. In fact, modern super-computers may use thousands of individual CPU's working in tandem.

While the 328p has only one core it is easy to set up a regular ‘heartbeat’ interrupt to handle the above case of regularly monitoring a sensor, or anything else really. The simplest method is to use one of the digital GPIO pins to generate a stream of pulses, basically a PWM signal and feed that back to one of the interrupt handling pins. With the appropriate ISR, the main program can run and when the interrupt occurs, the ISR can read the sensor.


The design of a CPU includes a definition of its instruction set. Each instruction is given a mnemonic (pronounced nem-on-ic), which is a short group of letters such as JMP for a jump instruction, to allow easier understanding of the individual instructions. Instruction sets vary greatly from CPU to CPU. An instruction tells the CPU what to do: maybe add 2 registers, write a value to a GPIO pin, branch to a subroutine etc. While the CPU reads raw binary 0's and 1's, humans don't, as we've already discussed. Assembly language allows a program to be written with these mnemonics instead of the raw machine code.

Mnemonics are like keywords in a high level language. While they are human readable (to some extent) they are not the final product that the computer will execute.


When a computer is started (powered on, reset etc) it needs to find some instructions on what to do. This is called bootstrapping, derived from an old term, “pull yourself up by your bootstraps”.

We know that a computer has basic facilities like read and write. It can read the contents of memory (EEPROM, ROM, RAM) but without a lot of extra software (an operating system like Linux, Windows, macOS, etc.) it can’t read a hard disk drive, USB drive etc, just basic hardware.

When we power on an Arduino it loads a bootstrap program. When we start Windows or macOS, there are many stages of booting. There will be some code stored in ROM or similar. This will allow reading of storage media such as a hard disk drive or USB flash drive. This then leads to the loading of the operating system (Windows, macOS, Linux etc).


RISC is an acronym of Reduced Instruction Set Computer. CISC is an acronym of Complex Instruction Set Computer. The CPU’s in PC’s and Macs use a CISC instruction set.

CISC architectures have instructions that, for example, do very heavy maths, offer many ways to access external memory, or perform multiple operations in one instruction.

RISC architectures have a reduced number of simple short instructions. It was proposed that this scheme would outperform the CISC system, and in some cases this is true. The 328p is a RISC processor with 131 instructions that typically take only one clock cycle (the default clock of 16MHz gives a clock cycle of 62.5ns). The Cortex-A53, which is the CPU of the Raspberry Pi 3 Model B+, has multiple instruction sets. A total count of its instructions has proven difficult to determine.
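The 62.5ns figure comes straight from the clock frequency, as this short Python sketch shows. The function names are our own, and the 160-instruction routine is a made-up example that assumes every instruction takes exactly one clock cycle.

```python
def cycle_time_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1_000_000_000 / clock_hz


def run_time_us(cycles, clock_hz):
    """Time in microseconds for a given number of clock cycles."""
    return cycles * cycle_time_ns(clock_hz) / 1000


print(cycle_time_ns(16_000_000))       # one cycle at 16MHz
print(run_time_us(160, 16_000_000))    # 160 single-cycle instructions
```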


Now that we have a good understanding of how high level languages work and how the hardware of a CPU works, it’s time to get down to the lower level software.

Assembly language (sometimes referred to as Assembler language) is a human readable representation of the machine code the CPU actually runs and is written with the mnemonics discussed earlier.

It is the task of the assembler program to convert (to assemble) the source code into executable machine code. Assemblers generally allow the use of defined or named constants, often referred to as equates or labels.

For example:

* We use the * character to show that this is a comment
ORG        0000  * Program runs from here
myVar EQU  0x12  * This is an EQUATE, similar to a
                 * #define, meaning that the
                 * constant myVar = 0x12 and can't
                 * be changed.
myLab LDA, B
JMP        myLab * This is an example of using a
                 * label

ORG is short for origin and tells the assembler where in memory the program is designed to run. Each type of CPU will have a memory address that is expected to hold instructions to execute at startup and reset, i.e. for bootstrapping.
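To show roughly what an assembler does, here is a toy one written in Python. The instruction set is entirely invented (the mnemonics LOAD, ADD, JMP and HALT and their opcode numbers are assumptions, not those of any real CPU), each opcode and operand takes one byte, and labels are written with a trailing colon. It makes two passes: the first collects the equates and label addresses, the second produces the machine code.

```python
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}  # invented


def assemble(source):
    symbols, instructions, address = {}, [], 0
    # Pass 1: strip comments, record equates and labels, track addresses.
    for raw in source.splitlines():
        text = raw.split("*", 1)[0].strip()      # '*' starts a comment
        if not text:
            continue
        parts = text.split()
        if len(parts) == 3 and parts[1] == "EQU":
            symbols[parts[0]] = int(parts[2], 0)
            continue
        if parts[0].endswith(":"):
            symbols[parts[0][:-1]] = address     # label = current address
            parts = parts[1:]
        if parts:
            instructions.append(parts)
            address += len(parts)                # opcode byte plus operands
    # Pass 2: replace mnemonics and symbols with numbers.
    code = []
    for parts in instructions:
        code.append(OPCODES[parts[0]])
        for operand in parts[1:]:
            if operand in symbols:
                code.append(symbols[operand])
            else:
                code.append(int(operand, 0))
    return bytes(code)


SOURCE = """
myVar EQU 0x12        * a named constant
start: LOAD myVar     * 'start' is a label for address 0
       ADD 1
       JMP start
       HALT
"""

print(assemble(SOURCE).hex(" "))
```

The names myVar and start exist only in the source. In the output they have been replaced by the plain numbers 0x12 and 0x00.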

Note: Many programmers do not regularly use assembler and possibly have never touched it.


In our discussion of the BASIC language, we mentioned that the GOTO statement is or was available depending on the particular implementation of the language. We also mentioned that GOSUB is a better choice in that it allows easier to understand code.

There are machine code versions of these ideas. GOTO can be implemented in branch or jump instructions. These simply move execution to a nominated place in the program. The GOSUB keyword is similar to the typical call instruction. It calls (branches to) a subroutine but uses a return instruction at the end of the subroutine to continue execution from the next instruction.

In the BASIC examples, we showed how we can compare values and make decisions based on the result. Machine code typically has several types of compare instructions. These may compare to a given value, to a value in a register or to a value stored in memory. The important concept here is that mathematical or logical operations set the appropriate bits in the status register.

We can, again depending on the instruction set of the CPU, test and branch on the value of bits in the status register. For example, if a comparison found two values to be equal (say i = 1), then we could branch to code that handles that condition. If the result was not equal, we could branch to different code to handle that too.
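The compare-then-branch pattern can be sketched with a tiny imaginary CPU in Python. The instruction names CMP, BEQ (branch if equal), JMP, PRINT and HALT are assumptions for this example, and only a single status flag, Z, is modelled.

```python
def run(program, registers):
    """Execute a list of instruction tuples and return what was printed."""
    flags = {"Z": False}
    output, pc = [], 0
    while True:
        op, *args = program[pc]
        pc += 1                              # default: next instruction
        if op == "CMP":                      # compare register with a value
            flags["Z"] = registers[args[0]] == args[1]
        elif op == "BEQ":                    # branch only if Z is set
            if flags["Z"]:
                pc = args[0]
        elif op == "JMP":                    # unconditional branch
            pc = args[0]
        elif op == "PRINT":
            output.append(args[0])
        elif op == "HALT":
            return output
        else:
            raise ValueError("unknown instruction: " + op)


PROGRAM = [
    ("CMP", "i", 1),             # 0: sets Z if i equals 1
    ("BEQ", 4),                  # 1: if equal, continue from address 4
    ("PRINT", "i is not 1"),     # 2
    ("HALT",),                   # 3
    ("PRINT", "i is 1"),         # 4
    ("HALT",),                   # 5
]

print(run(PROGRAM, {"i": 1}))
print(run(PROGRAM, {"i": 5}))
```

Note that BEQ never looks at the register itself. It only tests the flag left behind by the earlier CMP, which is how real machine code makes its decisions.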


We need to remember that the 328p is a RISC CPU. It does NOT have the bells and whistles of a CISC CPU because it doesn’t need them. RISC CPU’s are designed to be relatively simple but fast.

The 328p has CP (compare), CPI (compare with immediate) and CPC (compare with carry) instructions to compare 2 values. CP compares values in 2 registers. CPI compares the value in a register with a given data value, i.e. the value is part of the instruction. CPC compares values in 2 registers plus the carry bit. These instructions set the bits in the status register, often referred to as flags, as necessary.

We’d also expect to be able to do some basic maths like add and subtract. Some CPU’s can perform extensive floating point operations.

SUB: Subtracts the value of one register from another and stores the result back in the first register.

SUBI: Subtracts a supplied (immediate) value from a register and stores the result back in that register.

Both of these instructions use 2 bytes (1 word) and execute in one clock cycle which is typical of the RISC architecture.

The 328p also supports multiplication but oddly not division.

There are two call (call a subroutine) related instructions, CALL and RCALL. CALL allows access to the whole address space of the CPU. RCALL allows access to a restricted address space: roughly 2048 words below or 2048 words above the current program counter. In typical RISC fashion, the offset is included in the 16-bit (single word) RCALL instruction itself. The CALL instruction uses two words (4 bytes). You can see that we can save 2 bytes of program space if we can use an RCALL rather than a CALL. When working with the limited available memory of an Arduino, every byte counts.
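The Python sketch below works out whether a subroutine is close enough to be reached with RCALL. It assumes, following the AVR instruction set manual, that the offset is a 12-bit signed number (from -2048 to +2047 words) measured from the instruction after the RCALL, and that the instruction word is the bit pattern 1101 followed by those 12 offset bits. The function names are our own.

```python
def rcall_offset(pc, target):
    """Offset k with target = pc + k + 1, or None if RCALL cannot reach."""
    k = target - (pc + 1)
    if -2048 <= k <= 2047:
        return k
    return None


def encode_rcall(k):
    """Pack the 12-bit offset into a single 16-bit instruction word."""
    return 0xD000 | (k & 0x0FFF)


def call_size_bytes(pc, target):
    """RCALL needs one word (2 bytes); CALL needs two words (4 bytes)."""
    return 2 if rcall_offset(pc, target) is not None else 4


print(rcall_offset(100, 50), hex(encode_rcall(rcall_offset(100, 50))))
print(call_size_bytes(100, 50))      # near enough for RCALL
print(call_size_bytes(0, 3000))      # too far, CALL required
```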

A little surprisingly, the AVR instruction set used by the 328p includes an instruction that supports the DES (Data Encryption Standard) algorithm, although it is implemented only on some AVR devices (the XMEGA family) rather than on the 328p itself. It uses the first sixteen of the internal registers to hold the data and the key, and requires sixteen iterations of this instruction to perform a complete DES operation.

In contrast, the Core i7 CPU has instructions with a size of 1 byte up to 15 bytes. Certainly CISC, not RISC.

For a good look at the Core i7 refer to:


Depending on the CPU, there may be several different methods to access a data value or memory location. Let’s assume we want to read memory address 0x1000 and put that value into register R4. We’ll use pseudo-code for the following examples.

Using direct addressing, we could have an instruction like:

LOAD R4, (0x1000)
    *The ( ) around 0x1000 implies ‘the
    *contents of’ memory address 0x1000

When using direct addressing, the memory address to use is part of the instruction. The pseudo-code above would copy the contents of memory location 0x1000 to register R4.

Another common addressing technique is generally referred to as indexed addressing. This means we use a value stored in memory or a register plus another value to give the final address.

Let’s assume that we want to read the second element of an array. In a high level language we could use something like:

byte myArray[5] = {255, 1, 3, 5, 7};
byte myValue = 0;
myValue = myArray[2];
// NB myValue = 3 as referencing elements in an array generally starts from element[0], which in this case = 255

In Assembler we could use indexed addressing, like this:

LOAD R1, addressOfMyArray  
  *We have reserved some memory for this array so we know where it is in memory
LOAD R3, 2
  *We want the 2nd element of the array
LOAD R4, (R1) + R3 

This means:

  1. Put the value addressOfMyArray into register R1. Note that at this point the value in R1 may be a data value or an address. The program doesn’t know.
  2. As we want the second element of the array, put that element number (2 in this case) into R3.
  3. Determine the memory address to read by adding the value held in R3 to the address held in R1.
  4. Read the contents of the computed (indexed) memory address and store the value in register R4.

We have effectively read the second element of the array. Indexed addressing modes can be much more complicated than these examples.
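The same steps can be mimicked in Python, with a list standing in for memory and a dictionary for the registers. This is only a model of the pseudo-code: the array has been placed at address 16, which is an arbitrary choice, and the three load functions are hypothetical names for the addressing modes discussed.

```python
memory = [0] * 32
ADDRESS_OF_MY_ARRAY = 16                        # arbitrary location
memory[ADDRESS_OF_MY_ARRAY:ADDRESS_OF_MY_ARRAY + 5] = [255, 1, 3, 5, 7]
registers = {"R1": 0, "R3": 0, "R4": 0}


def load_immediate(register, value):
    """LOAD Rn, value : the value is part of the instruction."""
    registers[register] = value


def load_direct(register, address):
    """LOAD Rn, (address) : the address is part of the instruction."""
    registers[register] = memory[address]


def load_indexed(register, base_register, index_register):
    """LOAD Rn, (Rbase) + Rindex : the address is worked out at run time."""
    address = registers[base_register] + registers[index_register]
    registers[register] = memory[address]


load_immediate("R1", ADDRESS_OF_MY_ARRAY)       # step 1
load_immediate("R3", 2)                         # step 2
load_indexed("R4", "R1", "R3")                  # steps 3 and 4
print(registers["R4"])
```

Changing the value in R3 is all it takes to read a different element, which is what makes indexed addressing so handy for stepping through arrays in a loop.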


You may have thought that we were at the lowest level of CPU operation, but some CPU's have another lower level called Microcode.

Some instructions are very complex and in some CPU’s these are broken down internally and invisibly to the user and programmer into smaller instructions. These are called microcode instructions.

As this technology is very valuable (secret) to manufacturers such as Intel, little information is available on how it all works, however, it is known that the internal microcode can be changed to fix bugs or enhance operation.

Intel CPU’s use microcode. It appears the 328p does not, and there is not enough data available for the ARM Cortex-A53 to determine whether it uses microcode or not.


Depending on how the source code is written, especially with respect to dependence on external code modules, a compiler will either produce an executable program or it will produce an object code file. Object code files may be combined (linked) with other modules to form an executable program.


A linker may be used (sometimes invisibly) to combine several modules, libraries etc into 1 module or the final executable program.


A cross-compiler, if available, may be used to generate executable code on one computer that is designed to run on another CPU architecture or operating system.


Like the cross compiler discussed previously, a cross assembler runs on one type of machine and converts assembler source code into executable code for another.


A disassembler may be used to turn executable code into assembler source code. The original source code probably had many comments to identify what the code was doing. As these are stripped from the executable code because the CPU wouldn't understand them, the disassembler can't reproduce them.
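A toy disassembler in Python shows why the comments and names cannot be recovered. It uses an invented instruction set (the opcode numbers and the mnemonics LOAD, ADD, JMP and HALT are assumptions, not those of a real CPU) in which each opcode is one byte followed by a fixed number of one-byte operands.

```python
# opcode -> (mnemonic, number of operand bytes); all invented for the example
MNEMONICS = {
    0x01: ("LOAD", 1),
    0x02: ("ADD", 1),
    0x03: ("JMP", 1),
    0xFF: ("HALT", 0),
}


def disassemble(code):
    """Turn raw machine code bytes back into mnemonic text."""
    lines, position = [], 0
    while position < len(code):
        name, operand_count = MNEMONICS[code[position]]
        operands = code[position + 1:position + 1 + operand_count]
        lines.append(" ".join([name] + [f"0x{byte:02X}" for byte in operands]))
        position += 1 + operand_count
    return lines


machine_code = bytes([0x01, 0x12, 0x02, 0x01, 0x03, 0x00, 0xFF])
for line in disassemble(machine_code):
    print(line)
```

The mnemonics come back, but any labels, named constants and comments in the original source are gone. All that is left are bare numbers such as 0x12 and 0x00.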


Decompilers are similar to disassemblers, except that they generate source code in a high level language like C++.

Note: Disassembling and decompiling may be seen to be necessary to allow interoperability between software but is often used in hacking. This may be prohibited by a software license agreement or prohibited by law.