This post was translated by ChatGPT from the original Mandarin, so it may contain inaccuracies or mistakes.
The Raspberry Pi Foundation launched the Raspberry Pi Pico, a microcontroller board, in early 2021. It felt quite novel at the time, since Raspberry Pi had never released a product line like this before. After buying one, I initially set it aside without exploring it further.
Upon closer inspection, I realized that the specifications are actually quite impressive:
Raspberry Pi Pico | Specification | Arduino Nano |
---|---|---|
Dual-core Arm Cortex-M0+ (RP2040) | MCU | ATmega328P |
133MHz | Clock Speed | 16MHz |
32-bit | Bit Width | 8-bit |
264kB | SRAM | 2kB |
2MB external QSPI flash (the RP2040 itself has no flash) | Flash | 32kB |
UART ×2, SPI ×2, I2C ×2, USB 1.1 (host/device), timer, RTC | Peripherals | UART ×1, SPI, I2C, timers |
With dual cores, a 32-bit architecture, 264kB of SRAM, and 2MB of flash, it far outperforms the Arduino Nano and even beats some STM32 models. It is also affordable, at just over 100 TWD. Besides C/C++, it supports MicroPython, and it offers debugging over SWD, so you can set breakpoints and inspect memory or variables through GDB.
Another feature that amazed me is PIO, or Programmable I/O. But before diving deeper into that, let’s first discuss what GPIO is and the problems that GPIO and PIO aim to solve!
What is GPIO?
GPIO stands for General Purpose Input/Output: microcontroller pins whose input or output you control in software. For example, your program can drive a specific pin high or low.
A simple example is an LED. To make an LED blink, connect one leg to ground and the other to a GPIO pin (through a current-limiting resistor), then toggle the pin's output level in your program. Besides blinking LEDs, GPIO is also used for data transmission, for protocols such as I2C or UART.
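To make this concrete without real hardware, here is a host-side sketch that models the output levels of all GPIO pins as the bits of one 32-bit variable and toggles the LED pin once per call. The names gpio_out, model_gpio_put, and model_blink_step are hypothetical; on an actual Pico the SDK's gpio_put(pin, value) plays the role of model_gpio_put, and you would sleep between toggles so the blinking is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit n of gpio_out is the level driven on GPIO n. */
static uint32_t gpio_out = 0;

/* Stand-in for the SDK's gpio_put(): drive one pin high or low. */
static void model_gpio_put(unsigned pin, bool high)
{
    assert(pin < 32);
    if (high)
        gpio_out |= 1u << pin;    /* set the pin's bit: output high */
    else
        gpio_out &= ~(1u << pin); /* clear the pin's bit: output low */
}

/* One blink step: invert the LED pin and return its new level. */
static bool model_blink_step(unsigned led_pin)
{
    assert(led_pin < 32);
    bool now_high = (gpio_out & (1u << led_pin)) == 0;
    model_gpio_put(led_pin, now_high);
    return now_high;
}
```

Calling model_blink_step(2) repeatedly alternates GPIO2 between high and low while leaving every other bit of gpio_out untouched.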
Peripherals
To let microcontrollers communicate with external devices, they usually include hardware support for common communication protocols. For example, the Arduino has a hardware UART, and the ATmega32U4 AVR chip on a Pro Micro even has built-in USB.
However, if a microcontroller doesn't have these communication protocol features built-in, developers need to purchase corresponding ICs to implement them or use GPIO pins to implement communication protocols on their own. This concept is somewhat similar to the difference between hardware decoding and software decoding.
For instance, in Arduino we can use SoftwareSerial to implement the UART protocol in software, as in my Arduino CO2 sensor project from last year[1]:
// https://github.com/kjj6198/MH-Z14A-arduino/blob/master/co2.ino#L14
...
SoftwareSerial co2Serial(3, 4); // RX, TX
co2Serial.write(commands, 9); // send command
co2Serial.readBytes(response, 9);
SoftwareSerial uses GPIO pins to carry out the UART protocol. Its advantage is that the Arduino's hardware UART stays free for communicating with the computer (for example, for debugging), while SoftwareSerial handles the connection to other external devices.
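Bit banging UART means software produces every bit of the frame itself. As a minimal sketch of what has to be generated, assuming the common 8N1 format (one start bit, eight data bits sent least significant bit first, no parity, one stop bit), the hypothetical uart_8n1_frame below computes the line level for each bit period of one byte. A bit-banged transmitter would then write each level to the TX pin and wait exactly one bit period before moving on.

```c
#include <assert.h>
#include <stdint.h>

#define UART_FRAME_BITS 10 /* 1 start + 8 data + 1 stop */

/* Line level for each bit period of one 8N1 frame (1 = high, 0 = low). */
static void uart_8n1_frame(uint8_t byte, uint8_t levels[UART_FRAME_BITS])
{
    assert(levels);
    levels[0] = 0;                                   /* start bit: pull the idle-high line low */
    for (int i = 0; i < 8; ++i)
        levels[1 + i] = (uint8_t)((byte >> i) & 1u); /* data bits, LSB first */
    levels[9] = 1;                                   /* stop bit: back to idle high */
}
```

For 0x55 (0b01010101) the frame is 0, 1, 0, 1, 0, 1, 0, 1, 0, 1: the start bit, the data bits from bit 0 up to bit 7, and the stop bit.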
Data Transmission Relies on Precise Timing
In hardware data transmission, timing is crucial; sometimes you even have to count CPU cycles precisely to avoid errors. The SoftwareSerial implementation shows these timing calculations:
void SoftwareSerial::begin(long speed)
{
// Details omitted for brevity
uint16_t bit_delay = (F_CPU / speed) / 4; // one bit period, in iterations of a 4-cycle delay loop
// Other timing calculations based on GCC versions
...
}
The code is short, but it shows how important timing is for data transmission: it even accounts for the different number of CPU cycles that different GCC versions produce. Hardware timers and interrupts could be used instead, but the number of hardware timers is limited.
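To see what that formula produces, here is the arithmetic written as two small C helpers with hypothetical names. The division by 4 mirrors the SoftwareSerial excerpt above, where bit_delay counts iterations of a 4-cycle delay loop; the GCC-specific adjustments the excerpt omits are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles that make up one bit period at the given baud rate. */
static uint32_t cycles_per_bit(uint32_t f_cpu, uint32_t baud)
{
    assert(baud > 0);
    return f_cpu / baud;
}

/* Same expression as SoftwareSerial::begin(): 4-cycle delay-loop iterations per bit. */
static uint16_t bit_delay(uint32_t f_cpu, uint32_t baud)
{
    return (uint16_t)(cycles_per_bit(f_cpu, baud) / 4);
}
```

On a 16 MHz ATmega328P at 9600 baud, one bit is 1666 CPU cycles (bit_delay is 416), so a 10-bit frame takes roughly 16,660 cycles, most of them spent in delay loops rather than useful work.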
Bit Banging
Implementing communication protocols in code is convenient, but it can be very demanding for the processor: the faster the communication, the more of its time goes into timing. So when you need precisely timed output, or want to keep protocol work from overloading the processor, PIO can help.
PIO (Programmable I/O)
Introduction
As mentioned earlier, the problem is that the timing required by communication protocols eats into processor time. PIO meets these timing requirements without using the processor, and it runs at up to the processor's clock frequency (133MHz). You can think of PIO as a small processor dedicated to GPIO: it doesn't occupy the main processor, and it communicates with the main processor through FIFOs and IRQs.
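The FIFOs are what decouple the two sides. According to the RP2040 datasheet, each state machine has a 4-word TX FIFO (processor to state machine) and a 4-word RX FIFO (state machine to processor), which can be joined into a single 8-word FIFO. The sketch below models one 4-word FIFO as a ring buffer on the host; fifo_t, fifo_put, and fifo_get are hypothetical names, not SDK functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PIO_FIFO_DEPTH 4 /* RP2040: 4 words per direction (8 if joined) */

typedef struct {
    uint32_t data[PIO_FIFO_DEPTH];
    unsigned head;  /* index of the oldest word */
    unsigned count; /* number of words currently stored */
} fifo_t;

/* Producer side (e.g. the CPU writing the TX FIFO). Returns false when full. */
static bool fifo_put(fifo_t *f, uint32_t word)
{
    assert(f);
    if (f->count == PIO_FIFO_DEPTH)
        return false;
    f->data[(f->head + f->count) % PIO_FIFO_DEPTH] = word;
    f->count++;
    return true;
}

/* Consumer side (e.g. a state machine executing pull). Returns false when empty. */
static bool fifo_get(fifo_t *f, uint32_t *word)
{
    assert(f && word);
    if (f->count == 0)
        return false;
    *word = f->data[f->head];
    f->head = (f->head + 1) % PIO_FIFO_DEPTH;
    f->count--;
    return true;
}
```

In the SDK, pio_sm_put_blocking() and pio_sm_get_blocking() are the processor-side counterparts: they wait while the TX FIFO is full or the RX FIFO is empty, which are exactly the cases where fifo_put and fifo_get return false here.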
Each RP2040 contains two PIO blocks, and each block has four state machines. Each state machine can be reconfigured through programming, allowing for the implementation of different communication interfaces dynamically.
PIO provides a minimal assembly language with only 9 instructions and two general-purpose scratch registers (X and Y), and each PIO block's instruction memory holds at most 32 instructions. Minimal as that looks, it is enough for most communication protocols.
(Images sourced from the RP2040 datasheet)
From the image, we can see that the four state machines share the same instruction memory, which has four read ports so that every state machine can fetch instructions simultaneously without blocking.
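Each PIO instruction is a 16-bit word, which helps explain why the instruction set is so small. Per the RP2040 datasheet's encoding tables, bits 15 to 13 select the instruction (PUSH and PULL share one encoding and are distinguished by bit 7), and bits 12 to 8 hold the shared delay/side-set field. The decoder below is a host-side sketch of that layout; pio_op_t, pio_decode_op, and pio_delay_sideset_field are hypothetical names.

```c
#include <stdint.h>

typedef enum { OP_JMP, OP_WAIT, OP_IN, OP_OUT, OP_PUSH, OP_PULL, OP_MOV, OP_IRQ, OP_SET } pio_op_t;

/* Major opcode from bits 15..13; PUSH and PULL share 0b100 and differ in bit 7. */
static pio_op_t pio_decode_op(uint16_t insn)
{
    switch (insn >> 13) {
    case 0: return OP_JMP;
    case 1: return OP_WAIT;
    case 2: return OP_IN;
    case 3: return OP_OUT;
    case 4: return (insn & 0x0080u) ? OP_PULL : OP_PUSH;
    case 5: return OP_MOV;
    case 6: return OP_IRQ;
    default: return OP_SET; /* 0b111 */
    }
}

/* Bits 12..8: delay cycles and side-set bits share this 5-bit field. */
static unsigned pio_delay_sideset_field(uint16_t insn)
{
    return (insn >> 8) & 0x1Fu;
}
```

For example, 0xE001 (set pins, 1) decodes as OP_SET, 0xA042 (mov y, y, which pioasm emits for nop) as OP_MOV, and 0xB342 (nop [19]) carries 19 in its delay field.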
Introduction to State Machines
Each PIO block contains four state machines that share the same program memory. However, each state machine can be configured for different GPIO pins. For example, if you implement UART, the four state machines allow you to set up to four completely independent UARTs.
A State Machine consists of the following components:
- OSR (Output Shift Register): 32-bit; receives data from the main processor through the TX FIFO
- ISR (Input Shift Register): 32-bit; sends data back to the main processor through the RX FIFO
- X, Y registers: two 32-bit general-purpose scratch registers
- PC: program counter
- Clock divider: a state machine can run at the full system clock, which is too fast for most communication protocols, so each one has its own divider to slow it down (range 1 to 65536)
- Code: the program, read from the instruction memory shared by the block's four state machines
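The divider is a fixed-point number: the RP2040 datasheet describes a 16-bit integer part and an 8-bit fractional part, so the effective divisor is INT + FRAC/256 (an integer part of 0 encodes 65536). The sketch below is a hypothetical helper that splits a desired state-machine frequency into those two parts, using 125 MHz purely as an example system clock; in the SDK, sm_config_set_clkdiv() takes a float and sm_config_set_clkdiv_int_frac() takes the two parts directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Split sys_hz / target_hz into the 16.8 fixed-point divider of a state machine.
   Returns false if the result falls outside 1.0 .. 65536.0.
   (65536 itself would be written to the hardware as an integer part of 0.) */
static bool pio_clkdiv(uint32_t sys_hz, uint32_t target_hz,
                       uint32_t *div_int, uint8_t *div_frac)
{
    assert(target_hz > 0 && div_int && div_frac);
    uint64_t div_x256 = ((uint64_t)sys_hz * 256u) / target_hz; /* divider * 256, truncated */
    if (div_x256 < 256u || div_x256 > 65536u * 256u)
        return false;
    *div_int = (uint32_t)(div_x256 >> 8);     /* integer part */
    *div_frac = (uint8_t)(div_x256 & 0xFFu);  /* fractional part, in 1/256 steps */
    return true;
}
```

For a 2 kHz state-machine clock the divider is exactly 62500; for 3 MHz it is 41 + 170/256 ≈ 41.66, so the fractional part recovers what plain integer division would lose. Anything slower than about 1.9 kHz at 125 MHz is out of range.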
IO Mapping
IO mapping is more involved than on most other microcontrollers and may seem convoluted at first, but the design makes sense once you understand it. Each state machine has four pin mappings: input, output, set, and side-set.
- Input: reads data from external sensors or devices (similar to digitalRead in Arduino)
- Output: drives pins high or low (similar to digitalWrite in Arduino)
- Set: also drives pins high or low, but through the SET instruction, which writes an immediate value instead of data from OSR
- Side-set: changes the level or direction of pins at the same time as another instruction executes
The set and side-set mappings are probably the hardest to grasp, and we'll discuss them further below. A single GPIO can belong to several mappings at once; for example, the same pin can be mapped as both input and output.
Each mapping is configured with a base pin and a pin count. For instance, to map GPIO0 and GPIO1 to SET, set the base pin to GPIO0 and the count to 2. This means the pins in a mapping are always consecutive: you cannot map non-adjacent pins such as GPIO0, GPIO3, and GPIO5 to OUT.
Input and output mappings can cover up to 32 pins, although the RP2040 has only 30 GPIOs. Set and side-set can cover at most 5 pins.
In summary, IO mapping has several characteristics:
- The same pin can have multiple states simultaneously, such as being both set and output.
- Input and output can support a maximum of 32 pins; set and sideset can support a maximum of 5 pins.
- Pins must be consecutive, for example, GPIO0 to GPIO3.
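Since each mapping is just a base pin plus a count, the pins it covers can be written as a bit mask. A small sketch with a hypothetical helper, pin_mapping_mask; it also models the wrap-around from GPIO31 back to GPIO0 that the RP2040 datasheet describes for PIO pin mappings.

```c
#include <assert.h>
#include <stdint.h>

/* Bit mask of the consecutive pins base, base+1, ..., base+count-1 (mod 32). */
static uint32_t pin_mapping_mask(unsigned base, unsigned count)
{
    assert(base < 32 && count <= 32);
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; ++i)
        mask |= 1u << ((base + i) % 32); /* wraps from pin 31 to pin 0 */
    return mask;
}
```

A SET mapping with base GPIO0 and count 2 covers mask 0x3; base 30 with count 4 covers pins 30, 31, 0, and 1 (mask 0xC0000003).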
IRQ (Interrupt Request)
You can trigger interrupts or synchronize states between state machines using IRQ flags.
Introduction to PIO Assembly Language
PIO offers a simple yet powerful assembly language with only 9 instructions:
- SET
- IN
- OUT
- PULL
- PUSH
- JMP
- WAIT
- MOV
- IRQ
The syntax is similar to ordinary assembly language, but PIO assembly has several operand names to remember:
- Pins: the pins mapped to this state machine, numbered relative to the base pin. If the mapping starts at GPIO0, pin0 is GPIO0; if it starts at GPIO2, pin0 is GPIO2.
- Pindirs: the pin directions; 0 for input, 1 for output.
- X, Y: scratch registers.
- OSR: Output Shift Register.
- ISR: Input Shift Register.
- Data: an immediate value of up to 5 bits, i.e. 0 to 31.
With registers and jumps, PIO meets the basic requirements for Turing completeness, so in theory you could use it for arithmetic. PIO isn't designed for that, though, so doing so is more of an experiment.
Wait Function (Delay)
To get precise timing without wasting instruction memory, PIO offers a very handy feature: append [n] to an instruction to wait n extra cycles after it executes. For example:
loop:
set pins, 1 [1] ; set requires 1 cycle, then wait another cycle
set pins, 0
jmp loop
This has the same effect as:
loop:
set pins, 1
nop
set pins, 0
jmp loop
This way, the pin stays high for 2 cycles and low for 2 cycles (set pins, 0 plus jmp) without spending instruction space on nop to adjust the timing. The number in [] can range from 0 to 31 (fewer when side-set is used, as described below).
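The delay rule can be checked by expanding a program into per-cycle pin levels. The sketch below is a simplified host model, not how PIO is implemented: it assumes every instruction takes exactly 1 cycle plus its [n] delay and that the pin changes in the cycle its set executes. step_t and render_waveform are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction of a looping program: optionally set the pin, then delay. */
typedef struct {
    int sets_pin;   /* 1 for "set pins, level"; 0 for e.g. jmp (pin unchanged) */
    int level;      /* level written when sets_pin is 1 */
    unsigned delay; /* the [n] suffix: extra cycles after the instruction */
} step_t;

/* Expand the program (looping forever) into `cycles` per-cycle pin levels. */
static void render_waveform(const step_t *prog, size_t n, int *out, size_t cycles)
{
    assert(prog && n > 0 && out);
    int level = 0;
    size_t pc = 0, t = 0;
    while (t < cycles) {
        if (prog[pc].sets_pin)
            level = prog[pc].level;
        /* Each instruction occupies 1 cycle plus its delay. */
        for (unsigned c = 0; c <= prog[pc].delay && t < cycles; ++c)
            out[t++] = level;
        pc = (pc + 1) % n; /* the final jmp loop returns to the start */
    }
}
```

Encoding the first example as {1, 1, 1} (set pins, 1 [1]), {1, 0, 0} (set pins, 0), and {0, 0, 0} (jmp loop) yields 1, 1, 0, 0, 1, 1, 0, 0 for the first eight cycles: two cycles high, two cycles low.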
Side-Set
You can simultaneously set the side-set pin states while executing instructions. You need to explicitly declare the number of side-set pins to be used in the program:
.side_set 1
This indicates that one pin will be used as a side-set. You can set up to 5 pins, and this can overlap with other mappings (like input or output).
This is very handy for state transitions. In UART, for example, the line idles high and the start bit is low, so we can change the line level in the same instruction that pulls data (example adapted from pico-examples/uart_tx.pio):
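.program uart_tx
.side_set 1 opt ; opt: instructions may omit side, keeping the pin as it is
; 8n1 UART transmit: OUT pin 0 and side-set pin 0 are both the TX pin
pull side 1 [7] ; stop bit (or idle high while waiting for data)
set x, 7 side 0 [7] ; start bit for 8 cycles; preload the bit counter
bitloop:
out pins, 1 ; shift one bit (LSB first) from OSR to the TX pin
jmp x-- bitloop [6] ; 1 + 6 delay cycles: each bit takes 8 cycles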
In this example, every pull sets the side-set pin to 1, which serves as the stop bit (or as the idle level while waiting for data). The instruction that loads the x register also uses side-set to drive the pin to 0 as the start bit. This way we don't spend an extra instruction and cycle just to change the pin, which is quite convenient.
Here, each bit lasts 8 cycles: out takes 1 cycle and jmp takes 1 cycle plus 6 delay cycles. Within one bit period you could therefore execute up to 8 instructions, and any leftover cycles are easily padded with delays to keep the timing exact.
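The cycle counts can be checked with a small model of the program above. This sketch walks through uart_tx.pio one instruction at a time and records the TX line level on every state-machine cycle. It assumes that the OUT pin and the side-set pin are the same TX pin and that OSR shifts right (LSB first); hold_level and uart_tx_waveform are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append `level` to the waveform for `n` cycles (bounded by cap). */
static void hold_level(uint8_t *line, size_t *t, size_t cap, uint8_t level, unsigned n)
{
    for (unsigned i = 0; i < n && *t < cap; ++i)
        line[(*t)++] = level;
}

/* Cycle-by-cycle model of one pass through uart_tx.pio for `byte`.
   Returns the number of cycles written (80 for a full pass). */
static size_t uart_tx_waveform(uint8_t byte, uint8_t *line, size_t cap)
{
    assert(line);
    size_t t = 0;
    hold_level(line, &t, cap, 1, 1 + 7);         /* pull side 1 [7]: idle/stop level */
    hold_level(line, &t, cap, 0, 1 + 7);         /* set x, 7 side 0 [7]: start bit */
    uint32_t osr = byte;                         /* right shift: LSB leaves first */
    for (int x = 7; ; --x) {
        uint8_t bit = (uint8_t)(osr & 1u);
        osr >>= 1;
        hold_level(line, &t, cap, bit, 1);       /* out pins, 1 */
        hold_level(line, &t, cap, bit, 1 + 6);   /* jmp x-- bitloop [6] keeps the level */
        if (x == 0)
            break;                               /* jmp falls through once x was 0 */
    }
    return t;
}
```

Every bit, including the start bit, occupies 8 consecutive entries, so the state machine clock must run at 8 × the baud rate; the stop bit comes from the pull at the top of the next pass.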
Note that using side-set reduces the number of bits available for delay. For example:
.side_set 3 ; Use three pins for side-set
set x, 1 side 0 [3] ; 3 of the 5 bits go to side-set, leaving 2 bits (5 - 3) for delay: 0 to 3
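The trade-off is simple arithmetic on the shared 5-bit field. A sketch with a hypothetical helper: per the RP2040 datasheet, each declared side-set pin takes one bit, and declaring side-set as opt takes one more bit for the enable flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Largest [n] delay allowed after declaring `.side_set <sideset_count>`
   (plus one enable bit when declared with `opt`). The 5-bit field is shared. */
static unsigned max_delay(unsigned sideset_count, bool opt)
{
    unsigned used = sideset_count + (opt ? 1u : 0u);
    assert(used <= 5);
    return (1u << (5 - used)) - 1u;
}
```

Without side-set the maximum delay is 31; with .side_set 3 it drops to 3; and .side_set 1 opt, as declared in pico-examples' uart_tx.pio, leaves 3 bits, so [7] is the longest delay that program can use.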
SET
This instruction writes an immediate value to the destination. If the destination is pins, the value's bits drive the configured SET pins (bit 0 to the base pin, bit 1 to the next pin, and so on).
set X, 30 ; Set X to 30
set pins, 1 ; Set SET pins to 1
set pins, 5 ; Set the SET pins to 0b101: first and third pins high (needs at least 3 SET pins)
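A host-side sketch of what set pins does to the pin levels, assuming a SET mapping given by a base pin and count as described in the IO mapping section: bit i of the immediate value goes to pin base + i, and pins outside the mapping keep their level. apply_set_pins is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "set pins, value": bit i of value drives pin (base + i) mod 32
   for the `count` pins in the SET mapping; other pins keep their level. */
static uint32_t apply_set_pins(uint32_t pins, unsigned base, unsigned count, unsigned value)
{
    assert(base < 32 && count <= 5 && value <= 31);
    for (unsigned i = 0; i < count; ++i) {
        uint32_t bit = 1u << ((base + i) % 32);
        if ((value >> i) & 1u)
            pins |= bit;  /* this value bit is 1: drive high */
        else
            pins &= ~bit; /* this value bit is 0: drive low */
    }
    return pins;
}
```

With base GPIO2 and three SET pins, set pins, 5 drives GPIO2 and GPIO4 high and GPIO3 low; with only one SET pin, just bit 0 of the value is used.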
IN
This instruction shifts a given number of bits from the source into ISR; whether ISR shifts left or right is configured per state machine. For example:
in osr, 1
This shifts the least significant bit of OSR into ISR (OSR itself is unchanged). Alternatively:
in pins, 4
This means reading 4 bits of data from the input pins (starting from the base pin) into ISR.
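The shift itself can be sketched on the host. Following the RP2040 datasheet's description, IN takes the least significant n bits of the source; with a left shift the existing ISR contents move up and the new bits enter at the bottom, with a right shift they enter at the top, and the shift counter saturates at 32. isr_t and isr_shift_in are hypothetical names; in the SDK, the direction is chosen with sm_config_set_in_shift().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t isr;
    unsigned count; /* bits shifted in so far (the input shift counter) */
} isr_t;

/* Model of "in <source>, n": the n least significant bits of src enter ISR. */
static void isr_shift_in(isr_t *s, uint32_t src, unsigned n, bool shift_right)
{
    assert(s && n >= 1 && n <= 32);
    uint32_t data = (n == 32) ? src : (src & ((1u << n) - 1u));
    if (n == 32)
        s->isr = data;                                /* whole register replaced */
    else if (shift_right)
        s->isr = (s->isr >> n) | (data << (32 - n));  /* new bits enter at the MSB end */
    else
        s->isr = (s->isr << n) | data;                /* new bits enter at the LSB end */
    s->count = (s->count + n > 32) ? 32 : s->count + n;
}
```

Shifting left by 1 bit three times with the values 1, 0, 1 leaves ISR holding 0b101 with a count of 3; shifting right instead would place the first bit at the top of the register.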
OUT
This instruction shifts data from OSR and places it in the destination. For example:
out pins, 2
This means taking 2 bits from OSR and sending them to the output pins.
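The mirror image on the OSR side, sketched the same way: OUT removes n bits from OSR and hands them to the destination right-aligned. With a right shift the least significant bits leave first, which suits LSB-first protocols like UART; with a left shift the most significant bits leave first. osr_shift_out is a hypothetical name; the SDK selects the direction with sm_config_set_out_shift().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "out <dest>, n": remove n bits from OSR and return them
   right-aligned (the value that reaches the pins, X, Y, ...). */
static uint32_t osr_shift_out(uint32_t *osr, unsigned n, bool shift_right)
{
    assert(osr && n >= 1 && n <= 32);
    uint32_t out;
    if (n == 32) {
        out = *osr;
        *osr = 0;
    } else if (shift_right) {
        out = *osr & ((1u << n) - 1u); /* LSBs leave first */
        *osr >>= n;
    } else {
        out = *osr >> (32 - n);        /* MSBs leave first */
        *osr <<= n;
    }
    return out;
}
```

With OSR = 0b1011 and a right shift, out pins, 2 first emits 0b11 and then 0b10.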
PULL
This instruction moves a 32-bit word from the TX FIFO into OSR. With no parameters, pull defaults to block: the state machine stalls until the TX FIFO has data (any word will do; it doesn't have to use all 32 bits), then continues. For example:
loop:
pull ; Will only continue after data is available in tx, otherwise it will stall here
out pins, 32 ; Sends the 32 bits of OSR to the output pins
jmp loop
Additionally, several parameters can be configured:
- ifempty: executes only if OSR is empty (its shift count has reached the pull threshold); otherwise does nothing.
- block: stalls while the TX FIFO is empty (the default).
- noblock: if the TX FIFO is empty, copies register X into OSR instead, equivalent to MOV OSR, X.
For a UART transmitter, which must hold the line idle until there is something to send, block achieves this effect naturally.
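The block/noblock difference can be sketched with a host model of the TX FIFO. model_pull and tx_fifo_t are hypothetical names; the model returns PULL_STALL where the real state machine would simply stay on the pull instruction, and it ignores ifempty and autopull.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t words[4]; /* RP2040 TX FIFO depth: 4 words */
    unsigned count;    /* words available, oldest in words[0] */
} tx_fifo_t;

typedef enum { PULL_OK, PULL_STALL } pull_result_t;

/* Model of "pull block" (block = true) and "pull noblock" (block = false). */
static pull_result_t model_pull(tx_fifo_t *fifo, uint32_t *osr, uint32_t x, bool block)
{
    assert(fifo && osr);
    if (fifo->count == 0) {
        if (block)
            return PULL_STALL; /* state machine waits here and retries */
        *osr = x;              /* noblock on an empty FIFO: same as mov osr, x */
        return PULL_OK;
    }
    *osr = fifo->words[0];
    for (unsigned i = 1; i < fifo->count; ++i) /* move the queue forward */
        fifo->words[i - 1] = fifo->words[i];
    fifo->count--;
    return PULL_OK;
}
```

With an empty FIFO, pull block leaves OSR untouched and stalls, while pull noblock loads X; once the CPU has queued data, both take the oldest word.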
PUSH
This instruction moves the contents of ISR into the RX FIFO and clears ISR to 0.
- iffull: executes only when ISR is full (its shift count has reached the push threshold); otherwise does nothing.
- block: stalls while the RX FIFO is full.
JMP
This instruction jumps to a given address when its condition is met. PIO's jmp supports these conditions:
- No condition: always jumps (the default).
- !X: jumps when X is 0.
- X--: jumps when X is not 0, and decrements X on every execution.
- !Y: jumps when Y is 0.
- Y--: jumps when Y is not 0, and decrements Y on every execution.
- X!=Y: jumps when X is not equal to Y.
- PIN: jumps based on the level of an input pin (selected with the sm_config_set_jmp_pin function): jumps if the pin is high, does not jump if it is low.
- !OSRE: jumps when OSR is not empty.
set x, 30 ; Set x once, before the loop
loop:
jmp x-- loop ; Jump back while x is non-zero (checked before the decrement)
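The conditions can be summarized as a small evaluator. This is a host-side sketch with hypothetical names (jmp_cond_t, sm_state_t, jmp_taken); the key detail it encodes is that x-- and y-- test the value before decrementing, and the decrement happens whether or not the jump is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { JMP_ALWAYS, JMP_NOT_X, JMP_X_DEC, JMP_NOT_Y, JMP_Y_DEC,
               JMP_X_NE_Y, JMP_PIN, JMP_NOT_OSRE } jmp_cond_t;

typedef struct {
    uint32_t x, y;
    bool jmp_pin_high; /* level of the pin chosen with sm_config_set_jmp_pin() */
    bool osr_empty;
} sm_state_t;

/* Returns true if the jump is taken. x--/y-- test the old value, then decrement. */
static bool jmp_taken(sm_state_t *s, jmp_cond_t cond)
{
    assert(s);
    switch (cond) {
    case JMP_ALWAYS:   return true;
    case JMP_NOT_X:    return s->x == 0;
    case JMP_X_DEC:    return s->x-- != 0;
    case JMP_NOT_Y:    return s->y == 0;
    case JMP_Y_DEC:    return s->y-- != 0;
    case JMP_X_NE_Y:   return s->x != s->y;
    case JMP_PIN:      return s->jmp_pin_high;
    case JMP_NOT_OSRE: return !s->osr_empty;
    }
    return false;
}
```

With x preloaded to 30 before the loop, jmp x-- loop executes 31 times (x = 30 down to 0) and leaves x at 0xFFFFFFFF after falling through.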
WAIT
This instruction stalls until a condition is true. The condition source can be:
- GPIO: the GPIO with the given absolute number, independent of the state machine's IO mapping.
- PIN: an input pin given by its index within the state machine's IN mapping.
- IRQ: the specified IRQ flag; execution continues once it reaches the desired polarity.
wait 1 pin 0 ; Waits until input pin0 is 1 before executing the next line of code
wait 0 gpio 1 ; Waits until GPIO1 is 0 before executing the next line of code
MOV
This instruction copies data from the source to the destination. Two operators can be applied to the source during the copy:
- ! or ~: bitwise NOT.
- :: : reverses the bit order.
This design also aims to save instruction space.
mov X, Y ; Copies Y into X
mov pins, X ; Copies X into pins
mov pins, ::X ; Copies X into pins after reversing its bits
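For reference, here are the two source operators expressed in C. Without ::, reversing a 32-bit word needs a bit-by-bit loop like the one below; in PIO the reversal happens as part of a single mov. The function names are hypothetical.

```c
#include <stdint.h>

/* What "mov dst, ::src" does: bit 0 swaps with bit 31, bit 1 with bit 30, ... */
static uint32_t bit_reverse32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u); /* take the lowest remaining bit of v */
        v >>= 1;
    }
    return r;
}

/* What "mov dst, !src" (or ~src) does. */
static uint32_t bit_invert32(uint32_t v)
{
    return ~v;
}
```

This comes in handy when data arrives in one bit order but has to go out in the other, for example an LSB-first word on an MSB-first bus.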
IRQ
This instruction sets or clears an IRQ flag, which can trigger an interrupt on the main processor depending on how the main program is configured.
irq wait 1 rel
Available parameters include:
- wait: sets the flag, then stalls until the flag is cleared again (for example by the main program).
- nowait: sets the flag and continues without waiting for it to be cleared.
- clear: clears the IRQ flag.
If no parameter is specified, the default is nowait, so execution continues without waiting for the flag to be cleared. Adding rel makes the index relative: the state machine's number (0 to 3) is added to the two lowest bits of the index, modulo 4. The same program running on different state machines therefore raises different flags, which lets the main program tell them apart.
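The rel calculation as a one-line sketch, following the RP2040 datasheet's description (modulo-4 addition on the two low bits, bit 2 unchanged); irq_rel_index is a hypothetical name.

```c
#include <assert.h>

/* Flag actually targeted by "irq ... <index> rel" on state machine sm (0-3). */
static unsigned irq_rel_index(unsigned index, unsigned sm)
{
    assert(index < 8 && sm < 4);
    return (index & 0x4u) | ((index + sm) & 0x3u);
}
```

So irq 1 rel on state machine 2 raises flag 3, and irq 3 rel on the same state machine wraps around to flag 1.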
Specifying Code Execution Locations
While a state machine is running, it jumps back to the start of its program after executing the last instruction. However, we can add directives that tell PIO which part of the program to loop over:
set x, 8
.wrap_target
set pins, 1
set pins, 0
.wrap
By placing code between .wrap_target and .wrap, we define the section the state machine repeats: after the last instruction before .wrap, execution continues at .wrap_target without an extra jmp instruction or cycle. In this example, set x, 8 runs only once.
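A sketch of how the program counter advances under wrapping, assuming the example above is loaded at offset 0, so set x, 8 sits at address 0, .wrap_target points at address 1, and .wrap follows address 2. next_pc is a hypothetical helper that ignores jumps and stalls; in the SDK, the same two addresses are configured with sm_config_set_wrap().

```c
#include <assert.h>

/* Next program counter: after the instruction at wrap_top (.wrap) the PC
   goes to wrap_bottom (.wrap_target) at no extra cost; otherwise it advances. */
static unsigned next_pc(unsigned pc, unsigned wrap_bottom, unsigned wrap_top)
{
    assert(pc < 32 && wrap_bottom < 32 && wrap_top < 32);
    return (pc == wrap_top) ? wrap_bottom : (pc + 1) % 32;
}
```

Starting at address 0, the sequence is 0, 1, 2, 1, 2, ...: the two set pins instructions alternate forever and set x, 8 is never revisited.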
Integrating Main Code with PIO (C/C++)
The pico-examples repository contains many PIO examples for reference. Here, we'll use a simple blink program as an example[2].
First, we write the PIO code in a file with the .pio extension:
.program blink
.wrap_target
set pins, 1
nop [19]
nop [19]
set pins, 0
nop [19]
nop [19]
.wrap
This program is straightforward: it sets the pin to 1, waits 40 cycles, sets it to 0, and waits another 40 cycles, which makes the LED blink. Next, we need initialization code, which the official documentation recommends writing directly in the .pio file:
.program blink
.wrap_target
set pins, 1
nop [19]
nop [19]
set pins, 0
nop [19]
nop [19]
.wrap
% c-sdk {
void blink_program_init(PIO pio, uint sm, uint offset, uint pin, float div) {
pio_sm_config c = blink_program_get_default_config(offset);
pio_gpio_init(pio, pin);
pio_sm_set_consecutive_pindirs(pio, sm, pin, 1, true); // Required: make the pin an output, since SET only sets its level
sm_config_set_set_pins(&c, pin, 1);
sm_config_set_clkdiv(&c, div);
pio_sm_init(pio, sm, offset, &c);
}
%}
The C code is wrapped in % c-sdk { ... %}.
- blink_program_get_default_config (auto-generated by pioasm) returns the default configuration.
- pio_gpio_init(pio, pin) hands the GPIO pin over to PIO.
- pio_sm_set_consecutive_pindirs(pio, sm, pin, 1, true) sets the pin direction (false for input, true for output).
- sm_config_set_set_pins(&c, pin, 1) configures the SET pin mapping (base pin and count).
- sm_config_set_clkdiv(&c, div) sets the clock divider.
- pio_sm_init(pio, sm, offset, &c) initializes the state machine.
Besides sm_config_set_set_pins, there are similar functions for the other mappings, such as sm_config_set_sideset_pins, sm_config_set_out_pins, and sm_config_set_in_pins.
To use PIO, add hardware_pio to target_link_libraries:
cmake_minimum_required(VERSION 3.12)
include($ENV{PICO_SDK_PATH}/external/pico_sdk_import.cmake)
include($ENV{PICO_SDK_PATH}/tools/CMakeLists.txt)
project(pio C CXX ASM)
set(CMAKE_C_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)
pico_sdk_init()
add_executable(${PROJECT_NAME}
main.c
)
pico_add_extra_outputs(${PROJECT_NAME})
pico_generate_pio_header(${PROJECT_NAME}
${CMAKE_CURRENT_LIST_DIR}/blink.pio
)
target_link_libraries(${PROJECT_NAME}
pico_stdlib
hardware_pio # required for the PIO API
)
pico_enable_stdio_usb(pio 1)
pico_enable_stdio_uart(pio 1)
Next, in main.c:
#include <stdio.h>
#include "pico/stdlib.h"
#include "hardware/pio.h"
#include "hardware/clocks.h"
#include "blink.pio.h" // Automatically generated header after compiling PIO
void blink(PIO pio, uint sm, uint offset, uint pin, uint freq);
int main()
{
stdio_init_all();
PIO pio = pio0;
uint offset = pio_add_program(pio, &blink_program);
blink(pio, 0, offset, 2, 2000);
while (true)
{
printf("test\n"); // newline so each message appears on its own line
sleep_ms(200);
}
}
void blink(PIO pio, uint sm, uint offset, uint pin, uint freq)
{
float div = (float)clock_get_hz(clk_sys) / freq; // run the state machine at freq Hz
blink_program_init(pio, sm, offset, pin, div); // Function declared within PIO
pio_sm_set_enabled(pio, sm, true);
}
To query clock frequencies and use the PIO functions, main.c includes hardware/clocks.h and hardware/pio.h. After compiling and uploading the code, you will see the LED blinking while the serial output keeps printing the test string, showing that the two run completely independently!
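As a rough check of this example's timing, assuming a 125 MHz system clock (clock_get_hz(clk_sys) reports the real value): each half period is set (1 cycle) plus two nop [19] (20 cycles each), so one blink period is 82 state-machine cycles. The macro and function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLINK_CYCLES_PER_HALF   (1u + 20u + 20u) /* set + nop [19] + nop [19] */
#define BLINK_CYCLES_PER_PERIOD (2u * BLINK_CYCLES_PER_HALF)

/* Divider computed by blink(): system clock / state-machine frequency. */
static uint32_t blink_div(uint32_t sys_hz, uint32_t sm_hz)
{
    assert(sm_hz > 0);
    return sys_hz / sm_hz;
}

/* Resulting LED blink frequency in millihertz. */
static uint32_t blink_freq_mhz(uint32_t sm_hz)
{
    return (uint32_t)((uint64_t)sm_hz * 1000u / BLINK_CYCLES_PER_PERIOD);
}
```

At 2000 Hz the divider is 62500 and the LED toggles at roughly 24.4 Hz. At 133 MHz the same call would need a divider of 66500, above the 65536 maximum, so a slower blink calls for more delay cycles in the PIO program rather than a larger divider.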
Conclusion
This article primarily provides an introductory understanding of PIO and its syntax. In the next piece, I will attempt to use PIO to implement common communication protocols like UART, or communicate with the DHT11 temperature sensor, deepening my understanding of PIO. PIO is a relatively novel concept for me, and as far as I know, this is the first time a microcontroller has been designed this way.
This design not only keeps the processor from spending too much time on communication protocols but also frees developers from hardware limitations, since the desired functionality can be implemented directly in PIO. That's why I'm very excited about the applications PIO could enable. The RP2040 is also sold separately, and you can even follow the official documentation to design your own board[3].
The official documentation[4] is quite detailed, and I highly recommend reading it to understand the design rationale. If you have questions about specific SDK functions, check the SDK's API documentation.
Footnotes
1. https://blog.kalan.dev/2020-07-24-arduino-esp32-co2-sensor-2/
2. For setting up the development environment, refer to https://www.raspberrypi.com/documentation/microcontrollers/raspberry-pi-pico.html
3. https://datasheets.raspberrypi.com/rp2040/hardware-design-with-rp2040.pdf
4. https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf