UART, which stands for Universal Asynchronous Receiver and Transmitter, is a serial communication protocol often used by microcontrollers to talk to computers or other microcontrollers. It’s commonly used to send human-readable characters and strings, whereas other serial protocols, like SPI and I2C, typically carry raw binary data between chips. UART is extremely useful for debugging and for interacting with the user, so let’s take a look!
Quick disclaimer: this communication protocol is usually referred to as serial. However, SPI and I2C are also examples of serial (and synchronous) communication, so I will refer to this as UART, though technically UART refers to the hardware that implements this protocol, rather than the protocol itself.
UART Protocol
Serial interfaces like SPI and I2C have clocks that tell whatever is receiving data when the data should be sampled. For example, SPI mode 1 samples data on the falling edge, and I2C always samples data on the rising edge of the clock (I’ll make posts elaborating on this in the future). Because they share a clock, these are called synchronous interfaces. UART, as the name suggests, is asynchronous, which means that there is no clock.
So, if you have no clock, then how are you supposed to interpret data? How frequently are you supposed to sample the data, and when should you start sampling? Well, let’s see what a UART packet looks like:

From ATMEGA32U4 datasheet
When nothing is happening, the UART line is held high; you can see that the signal is high during idle. When data is sent, the line is first driven low to indicate the start of a transmission. This start bit is not part of the data; it only lets whatever is reading the line know that a transmission is about to begin. After one unit of time, the line is driven high or low for the first data bit, which is the least significant bit. Then, after another unit of time, the line is driven high or low for the second bit. This repeats until all data bits are sent; each frame carries 5, 6, 7, 8 or 9 data bits. Then, if parity is used, the parity bit is sent. Lastly, the line is driven high for one or two stop bits, ending the data frame. Afterwards, the line either returns to idle, or the next data frame begins with its own start bit.
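To make the frame layout concrete, here is a small host-side C++ sketch that lists the line levels for one frame with 8 data bits, no parity and one stop bit. The function name `encode_8n1` is hypothetical; it only assumes the standard UART bit order, least significant bit first.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Line levels for one frame: 1 = high (idle), 0 = low.
// Layout: start bit, 8 data bits (LSB first), stop bit -> 10 bit times.
std::array<int, 10> encode_8n1(std::uint8_t byte) {
    std::array<int, 10> levels{};
    levels[0] = 0;                         // start bit: line pulled low
    for (int i = 0; i < 8; ++i) {
        levels[1 + i] = (byte >> i) & 1;   // data bits, least significant first
    }
    levels[9] = 1;                         // stop bit: line back high
    return levels;
}
```

For `'A'` (0x41), this yields a low start bit, then the data bits 1 0 0 0 0 0 1 0, then a high stop bit.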
The start bit is crucial: its falling edge lets whatever is listening to the line know that a transmission is beginning, so the receiver can use it to know when to start paying attention. But I also said “one unit of time.” What is that, exactly? Well, this is where the asynchronous part becomes important. In SPI and I2C, since the master provides the clock, the slave always knows when to sample. However, since there is no clock in UART, the sender and receiver must each be told ahead of time how fast bits will be sent. This rate, in bits per second, is called the baud rate. Popular baud rates are 9600 and 115200. If you’ve ever used an Arduino and seen the line Serial.begin(9600), then that’s what the 9600 refers to. If the sender and receiver both use the same baud rate, then proper communication is possible because both devices send and read bits at the same pace. However, if there is a mismatch, then the receiver will sample the line at the wrong times and see garbled nonsense.
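Since the baud rate is all the receiver has to go on, every timing decision follows from it. A small C++ sketch of the arithmetic (the function names are hypothetical): at 9600 baud one bit lasts 1/9600 s, about 104 µs, and a 10-bit frame (start, 8 data, stop) limits throughput to 960 bytes per second.

```cpp
#include <cassert>

// Duration of one bit in microseconds at a given baud rate.
double bit_time_us(double baud) {
    assert(baud > 0);
    return 1e6 / baud;
}

// Upper bound on payload bytes per second when each byte occupies
// frame_bits bit times (10 for start + 8 data + 1 stop).
double max_bytes_per_second(double baud, int frame_bits = 10) {
    assert(frame_bits > 0);
    return baud / frame_bits;
}
```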
The parity bit in the data frame is useful for detecting errors. If the parity is set to even, then the number of 1s among the data bits and the parity bit should be even; if the parity is odd, that count should be odd. For example, if the parity is even and the data being sent is 0b00001110, then the data contains three 1s. The parity bit will therefore be 1, so that the total of the data bits plus the parity bit is 4, which is even. If the parity were odd, the parity bit would be 0 for the same data, so that the total is 3, which is odd. If the receiver is expecting even parity but counts an odd number of 1s, then an odd number of bits (most likely one) was corrupted, either in the data or in the parity bit itself. If two bits flip, however, the parity still checks out and the error goes unnoticed, so parity is a cheap check rather than a guarantee.
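The parity rule is easy to express in code. In this host-side C++ sketch, `parity_bit` and `parity_ok` are hypothetical names; the data is held in a `uint16_t` so that 9-bit frames fit too.

```cpp
#include <cassert>
#include <cstdint>

enum class Parity { Even, Odd };

// Parity bit that makes the total count of 1s (data + parity)
// even or odd, as requested.
int parity_bit(std::uint16_t data, Parity mode) {
    int ones = 0;
    for (; data != 0; data >>= 1) {
        ones += data & 1;
    }
    int even_bit = ones % 2;   // 1 when the data alone has an odd count
    return mode == Parity::Even ? even_bit : 1 - even_bit;
}

// Receiver-side check: true if the data and received parity bit agree.
bool parity_ok(std::uint16_t data, int received_parity, Parity mode) {
    return parity_bit(data, mode) == received_parity;
}
```

With even parity, 0b00001110 gets a parity bit of 1. One flipped bit (0b00001111) fails the check, but two flipped bits (0b00001101) pass it.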
The most popular frame format is referred to as 8N1. This means that each data frame has 8 data bits, No parity bit, and 1 stop bit. However, there’s no need to adhere to this; if so desired, you can have 5 data bits, odd parity and 2 stop bits. What’s important is that both the sender and the receiver use the same format. Otherwise, the receiver will typically report framing or parity errors, or simply decode garbage, and that data is usually ignored.
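To compare formats, here is a C++ sketch using a hypothetical `FrameFormat` struct. Each frame costs one start bit, plus the data bits, plus an optional parity bit, plus the stop bits. So 8N1 takes 10 bit times per byte, while 5 data bits with odd parity and 2 stop bits takes 9 bit times for only 5 bits of payload.

```cpp
#include <cassert>

enum class ParityMode { None, Even, Odd };

// Hypothetical description of a frame format such as 8N1 or 5O2.
struct FrameFormat {
    int data_bits;       // 5 to 9
    ParityMode parity;
    int stop_bits;       // 1 or 2
};

// Bit times per frame: start + data + optional parity + stop.
int frame_bits(const FrameFormat& f) {
    assert(f.data_bits >= 5 && f.data_bits <= 9);
    assert(f.stop_bits == 1 || f.stop_bits == 2);
    return 1 + f.data_bits + (f.parity != ParityMode::None ? 1 : 0) + f.stop_bits;
}

// Sender and receiver must agree on every field.
bool formats_match(const FrameFormat& a, const FrameFormat& b) {
    return a.data_bits == b.data_bits && a.parity == b.parity &&
           a.stop_bits == b.stop_bits;
}
```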
Why UART?
So it’s cool that we learned how UART works, but why use it? SPI allows for way faster communication, and I2C allows addressing specific devices on a bus. USB has the advantages of both SPI and I2C; it’s faster than UART, and can address specific devices. Surely, any of these protocols would be superior to UART? Well, the answer is that UART is easy for the hardware to implement. UART uses fewer signals and wires than SPI. The lack of addressing makes it simpler than I2C. And USB is a beast of an interface, with the documentation on the protocol easily running to hundreds of pages. USB also needs a lot of dedicated hardware and software. Therefore, if you want a simple interface to send and receive data, then UART is the way to go. This makes UART perfect for embedded systems.
So say you have a microcontroller with UART ready to go; how do you hook it up to your computer? That’s the whole point, right? Letting a user send commands and read data? Well, computers generally don’t have microcontroller-compatible UART interfaces built into them, so you’ll have to get an adapter. I use an FTDI Friend, which translates between the powerful and complex USB protocol that computers use and the simple UART protocol that microcontrollers use, allowing my computer to talk to microcontrollers. On the software side, you’ll need a program that can connect to the USB adapter. When I plug in my FTDI Friend (after installing drivers for it), it shows up as a COM port. I use Realterm to connect to the COM port, where I can read microcontroller output and send commands by typing. You should check out Adafruit’s guide for a more detailed walkthrough.
UART Hardware

From ATMEGA32U4 datasheet
In the ATMEGA32U4, the hardware that we’ll use is actually a USART, which stands for Universal Synchronous and Asynchronous Receiver and Transmitter. A USART can operate in either synchronous or asynchronous mode, but we’ll only use it in asynchronous mode, which is why I’ve been referring to it as UART until now.
The clock generator is responsible for generating the baud rate. Much like a timer, it divides the system clock down with a prescaler and counter to produce the desired baud rate. The equation is given below:

From ATMEGA32U4 datasheet
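For normal-speed asynchronous mode (U2X1 = 0), the ATmega32U4 datasheet relates the two as BAUD = f_OSC / (16 × (UBRR + 1)), or equivalently UBRR = f_OSC / (16 × BAUD) − 1. Here is a host-side C++ sketch of both directions. The function names are hypothetical, and double-speed mode, which divides by 8 instead of 16, is not covered.

```cpp
#include <cassert>
#include <cstdint>

// Normal-speed asynchronous mode (U2X = 0): the clock is divided by 16.
// UBRR = f_osc / (16 * baud) - 1, truncated by integer division.
std::uint16_t ubrr_for(std::uint32_t f_osc, std::uint32_t baud) {
    assert(baud > 0 && f_osc >= 16UL * baud);
    return static_cast<std::uint16_t>(f_osc / (16UL * baud) - 1);
}

// Inverse: the baud rate a given UBRR value actually produces.
double actual_baud(std::uint32_t f_osc, std::uint16_t ubrr) {
    return static_cast<double>(f_osc) / (16.0 * (ubrr + 1));
}
```

With a 16 MHz clock (an assumed value for illustration), 9600 baud gives UBRR = 103, which produces about 9615 baud, close enough for reliable communication.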
The CPU puts data in UDR (USART Data Register), and the transmitter loads it into a shift register, which shifts it out one bit per baud clock period. This ensures that data is output at the right baud rate. The transmitter also generates the parity bit, if one is used (it won’t be in our case).
The Receiver is more complex. Basically, it takes the received data and writes it to UDR. However, in order to do that, it must perform clock and data recovery, which is a fancy way of saying the hardware works hard to make sure the data is sampled correctly. The receiver also performs a parity check and reports if a parity error has occurred. But all of this is done in hardware, so the software doesn’t have to worry about it. All the software needs to know is that the receiver loads the received data into UDR.
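As a rough illustration of data recovery, consider how the datasheet describes it for normal-speed mode. The receiver oversamples each bit 16 times and decides the bit's value by majority vote over three samples near the middle of the bit (samples 8, 9 and 10). The C++ sketch below models only that vote, on a hypothetical array of 16 samples; start-bit detection and resynchronization are left out.

```cpp
#include <array>
#include <cassert>

// One bit period, oversampled 16 times. Each entry is 0 or 1.
using BitSamples = std::array<int, 16>;

// Majority vote over samples 8, 9 and 10 (1-based), i.e. indices 7..9.
// A single glitch near the middle of the bit cannot flip the result.
int recover_bit(const BitSamples& s) {
    int highs = s[7] + s[8] + s[9];
    return highs >= 2 ? 1 : 0;
}
```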
But wait… the transmitter reads from UDR, and the receiver writes to UDR? Wouldn’t that mean the transmitter would just transmit whatever the receiver just received? Or that writing to UDR to transmit something would overwrite the data written by the receiver? As it turns out, UDR is actually two registers masquerading as one:

From ATMEGA32U4 datasheet
From the CPU’s perspective, UDR has a write-only portion and a read-only portion. When the code wants to transmit something, the CPU writes to the write-only portion of UDR, and then the transmitter loads that data into the shift register. When the receiver receives data, that data is written to the read-only portion of UDR, and then the CPU can read it. Because UDR keeps transmit data and receive data separate, there is no risk of conflict.
It would actually be more accurate to describe UDR as three registers: one write-only, and two read-only:

Above, you can see that UDR, the write-only portion, feeds the transmit shift register. Meanwhile, the receive shift register feeds two read-only registers. This is because the received bytes go into a buffer. If the receiver receives a byte, it’ll load that byte into one of the two UDR read-only registers. If it then receives a second byte, that byte will be loaded into the other UDR read-only register. If a third byte is received, that data is held in the receive shift register, to avoid overwriting data in UDR. If a fourth byte is received before the CPU reads UDR, then the data being held in the receive shift register is lost. This buffer gives the CPU more time to read data out of the receiver. Each time UDR is read, one spot opens up, making room for one more received byte.
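Here is a host-side C++ model of the receive path just described. `RxModel` is a hypothetical class that ignores timing and error flags. Like the description above, it treats the byte waiting in the shift register as the one lost on an overrun.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Two-entry FIFO (the read-only UDR) fed by a shift register that can
// hold one more completed byte.
class RxModel {
public:
    // A complete byte arrives on the line.
    void receive(std::uint8_t byte) {
        if (fifo_.size() < 2) {
            fifo_.push_back(byte);
        } else {
            if (shift_) {
                ++overruns_;            // waiting byte is overwritten: lost
            }
            shift_ = byte;
        }
    }

    // CPU reads UDR: pops the oldest byte and frees one FIFO slot.
    std::optional<std::uint8_t> read_udr() {
        if (fifo_.empty()) return std::nullopt;
        std::uint8_t b = fifo_.front();
        fifo_.pop_front();
        if (shift_) {                   // waiting byte moves into the free slot
            fifo_.push_back(*shift_);
            shift_.reset();
        }
        return b;
    }

    int overruns() const { return overruns_; }

private:
    std::deque<std::uint8_t> fifo_;
    std::optional<std::uint8_t> shift_;
    int overruns_ = 0;
};
```

Receiving 'a', 'b', 'c', 'd' without any reads leaves 'a' and 'b' in the FIFO, loses 'c', and holds 'd' in the shift register.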
Likewise, the transmitter has a buffer. When data is written to UDR’s write-only portion while the transmitter is idle, that data is immediately moved into the transmit shift register, where it is shifted out bit by bit, which takes time. During this time, UDR is free again, so the CPU can load the next byte. As soon as the first byte has been sent, the second byte, which was waiting in UDR, is moved into the transmit shift register. This allows the transmitter to send data back to back, even though the CPU only loads data into UDR periodically.
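A tick-based C++ sketch of that double buffering follows. `TxModel` is hypothetical, and one tick stands for one bit time. If the CPU refills UDR whenever it is empty, the line never goes idle between bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// UDR (one byte) feeding a shift register that needs frame_bits ticks
// to send each byte.
class TxModel {
public:
    explicit TxModel(int frame_bits = 10) : frame_bits_(frame_bits) {}

    bool udr_empty() const { return !udr_; }

    // CPU writes UDR; refused if UDR still holds a byte.
    bool write_udr(std::uint8_t byte) {
        if (udr_) return false;
        udr_ = byte;
        load_if_idle();
        return true;
    }

    // Advance one bit time.
    void tick() {
        if (bits_left_ == 0) { ++idle_ticks_; return; }
        if (--bits_left_ == 0) {            // current frame finished
            sent_ += static_cast<char>(current_);
            load_if_idle();                 // next byte starts immediately
        }
    }

    const std::string& sent() const { return sent_; }
    int idle_ticks() const { return idle_ticks_; }

private:
    void load_if_idle() {
        if (bits_left_ == 0 && udr_) {
            current_ = *udr_;
            udr_.reset();                   // UDR free for the CPU again
            bits_left_ = frame_bits_;
        }
    }

    int frame_bits_;
    std::optional<std::uint8_t> udr_;
    std::uint8_t current_ = 0;
    int bits_left_ = 0;
    std::string sent_;
    int idle_ticks_ = 0;
};
```

Sending "Hi!" this way over 40 ticks keeps the line busy for 30 consecutive ticks, and all 10 idle ticks come at the end.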
Software
So now we know how UART works, why we should use it, and what hardware we have to work with. Let’s look at how to code for it.
The first thing we have to do is configure the hardware. This involves setting up the baud rate, the number of data bits, the parity bit, and the number of stop bits. In addition, we have to set the pin directions; the transmit (TX) pin should be an output, and the receive (RX) pin should be an input, right? Actually, in some applications the microcontroller might only be receiving data, so the TX pin isn’t used by the transmitter at all, and might be an input for something else. Likewise, if the microcontroller is only transmitting data, then the RX pin might be used as an output for something else. So we’ll have to be careful about that. Lastly, we have to configure the USART to act as a UART. The constructor is shown below:

At the top is the constructor name and its arguments: it takes in a baud rate, mode (receive only, transmit only, both, neither), data length, parity and number of stop bits. All of this is used to configure the UART as desired. Additionally, RX_Buffer is created; this is the ring buffer we worked on last time. I’ll go more into it later, but for now the important part is that (a) the ring buffer is created, and (b) the global variable RX_Buffer_Ptr is set up to point to RX_Buffer. Besides that, the constructor is pretty straightforward: the USART is configured to work as a UART, and then helper functions set up everything else. Note that the constructor has default values for its arguments. Unless told otherwise, the constructor assumes a baud rate of 9600 with 8N1 format, with both receive and transmit. If you’re happy with those settings, then you can call the constructor with no arguments! Convenient!
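Default arguments are plain C++, so the idea can be shown on a host. This is a sketch only: `UartConfigSketch`, `UartMode` and `UartParity` are hypothetical stand-ins for HAL_UART and its option types. It only stores the settings, whereas the real constructor would write the USART registers.

```cpp
#include <cassert>
#include <cstdint>

enum class UartMode { Neither, RxOnly, TxOnly, Both };
enum class UartParity { None, Even, Odd };

class UartConfigSketch {
public:
    // Defaults: 9600 baud, 8N1, receive and transmit both enabled.
    explicit UartConfigSketch(std::uint32_t baud = 9600,
                              UartMode mode = UartMode::Both,
                              std::uint8_t data_bits = 8,
                              UartParity parity = UartParity::None,
                              std::uint8_t stop_bits = 1)
        : baud_(baud), mode_(mode), data_bits_(data_bits),
          parity_(parity), stop_bits_(stop_bits) {}

    std::uint32_t baud() const { return baud_; }
    UartMode mode() const { return mode_; }
    std::uint8_t data_bits() const { return data_bits_; }
    UartParity parity() const { return parity_; }
    std::uint8_t stop_bits() const { return stop_bits_; }

private:
    std::uint32_t baud_;
    UartMode mode_;
    std::uint8_t data_bits_;
    UartParity parity_;
    std::uint8_t stop_bits_;
};
```

One trade-off of this design is that default arguments can only be omitted from the right. So `UartConfigSketch uart(115200);` changes just the baud rate, but changing only the parity still means spelling out the baud rate, mode and data length first.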

The code for setting up the UART is shown above. It’s mostly just bit manipulation and follows the datasheet. However, take a look at set_baud, which is a bit different. Here, F_CPU is the system clock frequency in hertz, and needs to be defined as a macro. Then, UBRR_val is calculated using the equation provided in the datasheet, and UBRR1 gets the value. Note that since UBRR1 is a 16-bit register, you have to write the high byte first, then the low byte, because writing the low byte triggers an immediate update of the baud rate prescaler.
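The set_baud code isn't shown here, so it's unclear whether it rounds or truncates. The choice can matter, because integer division truncates and the error grows at high baud rates. This C++ sketch assumes a 16 MHz clock purely for illustration, and its helper names are hypothetical. At 115200 baud, truncating gives UBRR = 7 (about 8.5% fast), while rounding gives UBRR = 8 (about 3.5% slow).

```cpp
#include <cassert>
#include <cstdint>

// Normal-speed async mode, rounded to nearest instead of truncated:
// UBRR = round(f_osc / (16 * baud)) - 1, using integer math.
std::uint16_t ubrr_rounded(std::uint32_t f_osc, std::uint32_t baud) {
    assert(baud > 0);
    return static_cast<std::uint16_t>((f_osc + 8UL * baud) / (16UL * baud) - 1);
}

// Baud rate error in percent for a chosen UBRR value.
double baud_error_percent(std::uint32_t f_osc, std::uint32_t baud,
                          std::uint16_t ubrr) {
    double actual = static_cast<double>(f_osc) / (16.0 * (ubrr + 1));
    return (actual / baud - 1.0) * 100.0;
}
```

The datasheet's baud rate tables list the error for common clock and baud rate combinations, which is a quick way to check whether a setting is usable.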

Here’s the code to read and write a single byte (there is currently no code to read and write 9-bit data). send_byte simply waits until the write-only portion of UDR is ready to take data, and then performs the write. read_byte is more complex. I could have written it to wait until data is received and then return that value, but then there’s the risk of waiting forever, which freezes the program. Instead, I wrote is_available, which checks whether data has been received, and read_byte uses it: if data has indeed been received, the data is written to the provided reference, val, and the method returns true to indicate that the read was successful. If no data has been received, the method returns false to indicate that val has not been updated.
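The send_byte and read_byte pattern can be shown off-target with a fake register block. In this C++ sketch, `FakeUsart` is hypothetical and stands in for the UCSR1A and UDR1 registers. The bit positions (RXC1 = 7, UDRE1 = 5) follow the ATmega32U4 UCSR1A layout. The functions are free-function versions of the methods described above, not the author's code.

```cpp
#include <cassert>
#include <cstdint>

// ATmega32U4 UCSR1A bit positions.
constexpr std::uint8_t RXC1_BIT = 7;   // receive complete
constexpr std::uint8_t UDRE1_BIT = 5;  // data register empty

// Hypothetical stand-in for the registers. On the AVR these would be the
// volatile UCSR1A and UDR1 from <avr/io.h>.
struct FakeUsart {
    std::uint8_t ucsra = 1u << UDRE1_BIT;  // UDR empty, nothing received
    std::uint8_t udr_written = 0;          // write side of UDR
    std::uint8_t udr_received = 0;         // read side of UDR
};

// Blocking send: wait until UDR can take a byte, then write it.
void send_byte(FakeUsart& u, std::uint8_t b) {
    while (!(u.ucsra & (1u << UDRE1_BIT))) {
        // spin until hardware sets UDRE1
    }
    u.udr_written = b;
}

// True if a received byte is waiting.
bool is_available(const FakeUsart& u) {
    return (u.ucsra & (1u << RXC1_BIT)) != 0;
}

// Non-blocking read: fill val and return true, or return false at once.
bool read_byte(FakeUsart& u, std::uint8_t& val) {
    if (!is_available(u)) return false;
    val = u.udr_received;
    u.ucsra &= static_cast<std::uint8_t>(~(1u << RXC1_BIT));  // buffer now empty
    return true;
}
```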

This method, instead of sending a single byte, transmits a null-terminated string. In other words, the method sends consecutive bytes, starting from the provided pointer, and stops when it finds a null terminator (‘\0’, the byte 0x00). However, take note of the first line of the method: it checks the global variable tx_busy. I’ll explain why this is done, and the significance of this variable, next.
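Here is a host-side C++ sketch of that blocking loop. `emit()` and `tx_log` are hypothetical stand-ins for send_byte and UDR. Returning false when tx_busy is set is an assumption about what happens after the check on the first line.

```cpp
#include <cassert>
#include <string>

// Shared with the interrupt-driven sender, so declared volatile.
volatile bool tx_busy = false;
std::string tx_log;                 // stands in for the UART output

void emit(char c) { tx_log += c; }  // stands in for send_byte

// Blocking: sends bytes up to (not including) the null terminator.
// Refuses to start while an interrupt-driven transfer is running.
bool send_string(const char* s) {
    if (tx_busy) return false;
    while (*s != '\0') {
        emit(*s++);
    }
    return true;
}
```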


Transmitting a string can take a long time. The string can be pretty long, and even if it isn’t, a low baud rate will make send_string take a long time to execute. Since send_string keeps the CPU busy-waiting until it is complete, it can be a time-consuming method to call. So what to do? The answer is to use interrupts! In send_string_int, if tx_busy is true, the method returns false to indicate failure. However, if tx_busy is false, then tx_busy is set to true. Then, the global variable tx_char_ptr is set to point to the provided pointer, which is the start of the string to be sent. Lastly, the USART Data Register Empty (UDRE) interrupt is enabled. This means that an interrupt occurs whenever UDR is ready to receive more data to transmit.
The second picture above shows the interrupt service routine. When UDR is ready to receive more data, the ISR loads the byte located at tx_char_ptr into UDR, which sends that one byte to the transmitter. Then, the pointer is incremented so that the next time the ISR is called, the next byte is sent. However, if the byte to be sent is a null character, which indicates the end of the string, then the ISR disables its own interrupt, and then tx_busy is set to false. At this point, the string has finished sending, and send_string_int can be called again.
This approach has its advantages and disadvantages. The advantage is that it doesn’t block the CPU from executing code. send_string’s while loops will cause the CPU to do nothing but wait for the transmission to finish. send_string_int, meanwhile, just sets up some global variables and registers, then returns. The ISR, which only runs when the transmitter is ready for more data, is fast and intermittent. Between executions of the ISR, the CPU can continue executing other pieces of code. Therefore, send_string_int is non-blocking, and allows the CPU to work on other stuff while also sending the string.
The disadvantage, however, is that the code becomes more complex. With send_string, you can be confident that the string has finished sending when the method returns. With send_string_int, you can’t be so sure, since the method returns when the setup is complete, not when the string has finished sending.
So here’s a question: what happens if I try to send a string while another string is still being sent? With send_string, you can call it twice back to back without a problem, since the second call only starts executing once the first string has finished sending. This is not the case with send_string_int; if not coded properly, calling send_string_int twice back to back would cause the second call to interrupt the transmission of the first string, since it would be called before the first string has finished transmitting. This is why the global variable tx_busy is important. The first send_string_int sets tx_busy to true, and tx_busy is only set back to false inside the ISR, once the string has finished sending. Therefore, when the second send_string_int is called, it sees that the transmitter is busy with the first string, returns false, and leaves the first transmission alone. This is also why send_string checks tx_busy: though a blocking send_string cannot itself be cut short by another send, it can still cut into an interrupt-driven transmission that is already in progress.
By the way, the only reason tx_char_ptr and tx_busy are global variables is that this is the only way to share data with an interrupt service routine: an ISR cannot have any arguments or a return value. This is a key instance where programming for embedded systems differs from conventional programming: global variables are almost always a necessary evil when ISRs are used.
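The interrupt-driven sender can be modeled on a host by calling the ISR body by hand. In this C++ sketch, `udrie` stands in for the UDRIE1 enable bit and `tx_log` for UDR1; both are hypothetical. On the AVR, the body of `udre_isr` would live in `ISR(USART1_UDRE_vect)` from `<avr/interrupt.h>`. The variables shared with the ISR are declared `volatile` so the compiler re-reads them instead of caching them in registers.

```cpp
#include <cassert>
#include <string>

volatile bool tx_busy = false;
const char* volatile tx_char_ptr = nullptr;
volatile bool udrie = false;   // stand-in for the UDRE interrupt enable
std::string tx_log;            // stand-in for UDR1

// Starts a transfer and returns immediately; false if one is running.
bool send_string_int(const char* s) {
    if (tx_busy) return false;
    tx_busy = true;
    tx_char_ptr = s;
    udrie = true;              // enable the "UDR empty" interrupt
    return true;
}

// ISR body: runs each time UDR can accept another byte.
void udre_isr() {
    char c = *tx_char_ptr;
    if (c == '\0') {           // end of string
        udrie = false;         // disable this interrupt
        tx_busy = false;
        return;
    }
    tx_log += c;               // stands in for UDR1 = c
    tx_char_ptr = tx_char_ptr + 1;
}
```

Because send_string_int stores only the pointer, the string must stay alive until tx_busy goes false. A string literal or a global buffer works. A local buffer that goes out of scope would leave the ISR reading stale stack memory.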


Let’s return to RX_Buffer, which we saw in the constructor. Firstly, what is it? The first image above shows the RX_Buffer declaration: it is a ring buffer of size UART_RX_BUFFER_SIZE that holds chars. UART_RX_BUFFER_SIZE is a macro, which in this case is 20. RX_Buffer is allocated in the initialization list of the constructor. Then, in the constructor, the global variable RX_Buffer_Ptr is set to point to RX_Buffer. RX_Buffer_Ptr exists because I want an ISR to have access to RX_Buffer, but since RX_Buffer is a member of the HAL_UART class, I can’t make it a global variable. Therefore, I did the next best thing: a global variable that points to RX_Buffer. Since the constructor sets up RX_Buffer_Ptr, the ISR I write later can access RX_Buffer by dereferencing RX_Buffer_Ptr.


Now that RX_Buffer and RX_Buffer_Ptr are all set up, let’s see how they’re used. Because we put in all the effort to create a ring buffer class previously, an interrupt based receiver is really easy to code! First is the ISR. This ISR fires when the receiver has received data (as long as the interrupt is enabled). When data is received, that data is pushed into the ring buffer. The ISR is so simple because the logistics of data management are hidden by the ring buffer class; loose coupling at its finest!
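Here is a host-side C++ sketch of the pointer-to-member-buffer pattern together with the receive ISR. Several pieces are hypothetical stand-ins: `RingBuffer` for the previous post's ring buffer (its push/pop API is assumed), `HalUartSketch` for HAL_UART, and `fake_udr` for UDR1. On the AVR, the ISR body would live in `ISR(USART1_RX_vect)`.

```cpp
#include <cassert>
#include <cstddef>

// Minimal fixed-capacity ring buffer; push fails when full.
template <typename T, std::size_t N>
class RingBuffer {
public:
    bool push(T v) {
        if (count_ == N) return false;
        data_[(head_ + count_) % N] = v;
        ++count_;
        return true;
    }
    bool pop(T& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }
private:
    T data_[N]{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

constexpr std::size_t UART_RX_BUFFER_SIZE = 20;
using RxRing = RingBuffer<char, UART_RX_BUFFER_SIZE>;

// Global pointer so the parameterless ISR can reach the member buffer.
RxRing* RX_Buffer_Ptr = nullptr;
char fake_udr = 0;   // stand-in for the received byte in UDR1

class HalUartSketch {
public:
    HalUartSketch() { RX_Buffer_Ptr = &RX_Buffer; }
    RxRing RX_Buffer;
};

// Receive-complete ISR body: read UDR and push it into the ring buffer.
void rx_isr() {
    char c = fake_udr;
    if (RX_Buffer_Ptr) RX_Buffer_Ptr->push(c);
}
```

One consequence of this design is that RX_Buffer_Ptr points at whichever instance was constructed last. So it assumes a single UART object that is never copied or moved.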
The ISR is responsible for loading up the ring buffer with data. Conversely, read_rx_buffer is responsible for reading data out of the ring buffer. When this method is called, the provided buffer is filled up with characters that had been saved in the ring buffer up to that point. There are a couple of nuances here:
- In order to prevent an overflow, the output buffer size must be specified. Let’s say that the buffer size is 10.
- The buffer will be null terminated. That means at most the buffer will have buffer_size-1 valid characters, so in this case 9.
- The buffer will be filled as much as possible. This means the buffer will be filled with characters until either the ring buffer is empty, or the limit has been reached (9).
- The method returns the number of characters copied from the ring buffer to the provided buffer. This is useful because the return value can be used as the condition of a while loop or an if statement; if no characters have been received, then the ring buffer will be empty, and the method will return a zero, so the while loop or if statement will not execute.
This is why the ring buffer is so important: it makes the ISR (and read_rx_buffer) extremely simple by hiding how characters are saved and retrieved.
The only caveat to using read_rx_buffer is that the receive complete interrupt (and global interrupts) must be enabled. Otherwise, the ISR will not execute, and the ring buffer will not be loaded with data.
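The four rules above translate directly into code. In this C++ sketch, a `std::deque` named `rx_ring` stands in for the ring buffer, and the signature of `read_rx_buffer` is an assumption based on the description rather than the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

std::deque<char> rx_ring;   // stand-in for the ring buffer the ISR fills

// Copies up to buffer_size - 1 characters out of the ring buffer, then
// null-terminates. Returns how many characters were copied (0 if none).
std::size_t read_rx_buffer(char* buffer, std::size_t buffer_size) {
    if (buffer_size == 0) return 0;   // no room even for the terminator
    std::size_t n = 0;
    while (n < buffer_size - 1 && !rx_ring.empty()) {
        buffer[n++] = rx_ring.front();
        rx_ring.pop_front();
    }
    buffer[n] = '\0';
    return n;
}
```

Because the return value is zero when nothing has arrived, a loop like `while (read_rx_buffer(buf, sizeof buf)) { /* handle buf */ }` drains everything received so far and then falls through.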

One last thing: C++ allows functions to be overloaded. This means you can have multiple implementations of the same method, as long as the argument types or the number of arguments differ. The send_string shown before is good for sending strings like “Hello World”. However, if I have an integer foo, then sending it is pretty tricky because foo is not a string. To send foo, I would have to convert the integer to a string, and then pass that to send_string as an argument. The same is true for floats and other data types. Since doing that every time you want to send something is a pain in the butt, I wrote overloaded send_string methods to do it for me. I noticed I almost always use the same format when debugging: a header, the value of a variable, and then a footer. For example, “foo: ” + (value of foo) + “\r\n”. Overloading send_string for this pattern makes debugging much, much easier. An example of using these overloaded methods is shown below:
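Here is a host-side C++ sketch of the header, value, footer overloads. `tx_log` stands in for the UART output, and the overload set is a hypothetical illustration rather than the author's exact signatures. Using `int` and `double` parameters keeps calls such as `send_string("foo: ", 42, "\r\n")` unambiguous. A `long` and `double` pair would not work, because an `int` argument converts equally well to either.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

std::string tx_log;   // stand-in for bytes sent over the UART

void send_string(const char* s) {
    while (*s != '\0') tx_log += *s++;
}

// Overload: header + integer value + footer.
void send_string(const char* header, int value, const char* footer) {
    char digits[12];   // fits any 32-bit int plus the terminator
    std::snprintf(digits, sizeof digits, "%d", value);
    send_string(header);
    send_string(digits);
    send_string(footer);
}

// Overload: header + floating-point value (two decimals) + footer.
void send_string(const char* header, double value, const char* footer) {
    char digits[32];
    std::snprintf(digits, sizeof digits, "%.2f", value);
    send_string(header);
    send_string(digits);
    send_string(footer);
}
```

On the AVR itself, avr-libc's default printf family does not format `%f` unless you link the floating-point version of vfprintf. `dtostrf()` from `<stdlib.h>` is a common alternative for the floating-point overload.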

Now, we can send and receive data from microcontrollers! Next time, I’ll talk about SPI!