A First Look at Raspberry Pi Pico PIO


Raspberry Pi Pico	Spec	Arduino Nano
ARM Cortex-M0+ dual core	MCU	ATmega328P
133MHz	Clock frequency	16MHz
32-bit	Word size	8-bit
264kB	SRAM	2kB
2MB (the RP2040 itself has no flash)	Flash	32kB
UARTx2 USB host 1.1 SPI Timer RTC	Peripherals	UARTx1 SPI Timer

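With two cores, a 32-bit architecture, 264kB of SRAM, and 2MB of flash memory, the Pico is far ahead of the Arduino Nano and even beats some STM32 parts. It is also cheap: you can get one for a little over 100 New Taiwan dollars. Besides C/C++ it supports MicroPython, and it even offers debugging support, so you can set breakpoints and inspect memory or variables through gdb.

Another feature that impressed me is PIO, Programmable I/O. Before going further, though, let's talk about GPIO and the problem PIO is trying to solve.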

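What is GPIO?

GPIO stands for general-purpose input/output. On a microcontroller it usually means pins that can act as inputs or outputs under software control: a program can drive a given pin high or low, or read its level.

The simplest example is an LED. Suppose we want to blink an LED: connect one leg of the LED to ground and the other to a GPIO pin, then toggle the output level from the program to get a blinking effect. Beyond blinking LEDs, GPIO pins are also used for data transfer, for example I2C or UART.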

Peripherals

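To let a microcontroller talk to external devices, manufacturers usually build in hardware for a few common protocols. The Arduino, for example, supports UART; if you use a Pro Micro, its AVR chip, the ATmega32U4, also has built-in USB that you can use directly.

The drawback is that if the microcontroller has no built-in support for a protocol, the developer has to buy a dedicated IC for it or implement the protocol by hand on GPIO pins. The idea is a bit like the difference between hardware decoding and software decoding.

For example, on Arduino we can use SoftwareSerial to implement UART in software. I used it in the Arduino CO2 sensor project¹(https://blog.kalan.dev/2020-07-24-arduino-esp32-co2-sensor-2/) I wrote up last year: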

// https://github.com/kjj6198/MH-Z14A-arduino/blob/master/co2.ino#L14
...
SoftwareSerial co2Serial(3, 4); // RX, TX
co2Serial.write(commands, 9); // send command
co2Serial.readBytes(response, 9);

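Under the hood, SoftwareSerial implements the UART protocol on GPIO pins. Its advantage is that the Arduino's native UART stays free for talking to the computer, which makes debugging easier, while the software serial port handles communication with other external devices.

[Figure: Arduino UART]

Data transfer depends on precise timing

Hardware data transfer depends heavily on timing, sometimes down to counting individual CPU cycles to avoid errors. Here is part of the SoftwareSerial implementation: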

void SoftwareSerial::begin(long speed)
{
  // (omitted)
  // Precalculate the various delays, in number of 4-cycle delays
  uint16_t bit_delay = (F_CPU / speed) / 4;
  // 12 (gcc 4.8.2) or 13 (gcc 4.3.2) cycles from start bit to first bit,
  // 15 (gcc 4.8.2) or 16 (gcc 4.3.2) cycles between bits,
  // 12 (gcc 4.8.2) or 14 (gcc 4.3.2) cycles from last bit to stop bit
  // These are all close enough to just use 15 cycles, since the inter-bit
  // timings are the most critical (deviations stack 8 times)
  _tx_delay = subtract_cap(bit_delay, 15 / 4);
  // Only setup rx when we have a valid PCINT for this pin
  if (digitalPinToPCICR((int8_t)_receivePin)) {
    #if GCC_VERSION > 40800
    // Timings counted from gcc 4.8.2 output. This works up to 115200 on
    // 16Mhz and 57600 on 8Mhz.
    //
    // When the start bit occurs, there are 3 or 4 cycles before the
    // interrupt flag is set, 4 cycles before the PC is set to the right
    // interrupt vector address and the old PC is pushed on the stack,
    // and then 75 cycles of instructions (including the RJMP in the
    // ISR vector table) until the first delay. After the delay, there
    // are 17 more cycles until the pin value is read (excluding the
    // delay in the loop).
    // We want to have a total delay of 1.5 bit time. Inside the loop,
    // we already wait for 1 bit time - 23 cycles, so here we wait for
    // 0.5 bit time - (71 + 18 - 22) cycles.
    _rx_delay_centering = subtract_cap(bit_delay / 2, (4 + 4 + 75 + 17 - 23) / 4);
    // There are 23 cycles in each loop iteration (excluding the delay)
    _rx_delay_intrabit = subtract_cap(bit_delay, 23 / 4);
    // There are 37 cycles from the last bit read to the start of
    // stopbit delay and 11 cycles from the delay until the interrupt
    // mask is enabled again (which _must_ happen during the stopbit).
    // This delay aims at 3/4 of a bit time, meaning the end of the
    // delay will be at 1/4th of the stopbit. This allows some extra
    // time for ISR cleanup, which makes 115200 baud at 16Mhz work more
    // reliably
    _rx_delay_stopbit = subtract_cap(bit_delay * 3 / 4, (37 + 11) / 4);
    #else // Timings counted from gcc 4.3.2 output
    // Note that this code is a _lot_ slower, mostly due to bad register
    // allocation choices of gcc. This works up to 57600 on 16Mhz and
    // 38400 on 8Mhz.
    _rx_delay_centering = subtract_cap(bit_delay / 2, (4 + 4 + 97 + 29 - 11) / 4);
    _rx_delay_intrabit = subtract_cap(bit_delay, 11 / 4);
    _rx_delay_stopbit = subtract_cap(bit_delay * 3 / 4, (44 + 17) / 4);
    #endif
    ...
    tunedDelay(_tx_delay); // if we were low this establishes the end
  }
  ...
}

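There is not much code, but to get the timing right the authors counted the CPU cycles consumed under each gcc version and subtracted them before delaying, which shows how much timing matters for data transfer. The same thing could be implemented with timers and interrupts, but the number of hardware timers is limited too.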
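To make the arithmetic concrete, here is a minimal host-side sketch in plain C (not Arduino code) that reproduces the transmit-delay calculation from the excerpt. The `subtract_cap` helper is reimplemented on the assumption that it subtracts and clamps the result at a minimum of 1, and `tx_delay_for` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract `sub` from `num`, clamping the result to a minimum of 1
 * (assumed to mirror SoftwareSerial's private helper). */
static uint16_t subtract_cap(uint16_t num, uint16_t sub)
{
    return (num > sub) ? (uint16_t)(num - sub) : 1;
}

/* Hypothetical helper: transmit delay, in units of 4-cycle busy-wait
 * iterations, for a given CPU clock and baud rate. */
static uint16_t tx_delay_for(uint32_t f_cpu, uint32_t baud)
{
    assert(baud > 0);
    uint16_t bit_delay = (uint16_t)((f_cpu / baud) / 4);
    /* About 15 cycles of instruction overhead per bit, as in the excerpt. */
    return subtract_cap(bit_delay, 15 / 4);
}
```

At 16MHz and 9600 baud, one bit lasts 1666 CPU cycles, or 416 four-cycle delay iterations. After subtracting the 3 iterations of overhead, `tx_delay_for(16000000, 9600)` gives 413.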

Bit Banging

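Implementing a communication protocol in software is convenient, but the downside is that it eats a lot of processor time: the higher the communication frequency, the more of the processor's time goes into getting the timing right. When you need precisely timed output, or want to keep the processor from spending too many resources on a protocol, PIO can help.

PIO (Programmable I/O)

Overview

As mentioned above, the problem is that the timing a protocol demands costs processor resources. PIO meets those timing requirements without using the processor, at up to the same frequency as the processor itself (133MHz). You can think of PIO as a small extra processor sitting inside the GPIO hardware: it does not take any time from the main processor, it is designed specifically for driving GPIO, and it can communicate with the main processor through FIFOs and IRQs.

An RP2040 contains two PIO blocks, and each block contains 4 state machines. Every state machine can be reprogrammed from software, so different communication interfaces can be implemented at run time.

PIO offers a minimal assembly language with only 9 instructions and two scratch registers, and a program can hold at most 32 instructions. It looks very spare, but it is enough to cover most communication protocols.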

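[Figure: PIO architecture diagram, taken from the RP2040 datasheet]

The diagram shows that the four state machines share one copy of the program. The instruction memory has four read ports, so every state machine can fetch instructions at the same time without blocking the others.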

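State machines

Every PIO block has four state machines that share the same program memory, but each state machine can be configured for different GPIO pins. If you implement UART, for example, the 4 state machines give you up to four fully independent UARTs.

A state machine consists of the following parts:

OSR (output shift register): 32 bits; receives data from the main processor through the TX FIFO
ISR (input shift register): 32 bits; sends data to the main processor through the RX FIFO
X and Y registers: two general-purpose scratch registers per state machine
PC: program counter
Clock divider: a state machine can run at up to the main processor's frequency, which is too fast for most protocols, so the clock divider lets you slow it down (the divisor ranges from 1 to 65536)
Program code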


IO mapping

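IO mapping is a little more complicated than on other microcontrollers. It feels roundabout at first, but once it clicks the design makes a lot of sense. Each IO can take four roles: input, output, set, and sideset.

input: reads data from external sensors or devices (similar to digitalRead on Arduino)
output: lets the program drive the level high or low (similar to digitalWrite on Arduino)
set: sets the level of a pin (similar to output, with some differences)
sideset: changes the level or direction of other pins while an instruction executes

Set and sideset are probably the harder ones to understand; we will come back to them below. One GPIO can hold several roles at once: for example, I can map a GPIO as an input and as an output at the same time.

Each mapping is configured with a base pin and a pin count. To make GPIO0 and GPIO1 SET pins, for example, set the base pin to GPIO0 and the count to 2. It follows that the pins of any one role are always consecutive: you cannot have OUTPUT pins on GPIO0, GPIO3, and GPIO5.

INPUT and OUTPUT support up to 32 pins each, although the RP2040 on the Pico only has 30 GPIOs. Set and sideset support at most 5 pins each.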

[Figure: Pico IO mapping]

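To sum up, IO mapping has these properties:

A pin can hold several roles at once, for example both set and output
input and output support up to 32 pins; set and sideset support at most 5
The pins of a mapping must be consecutive, for example GPIO0 through GPIO3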
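The base-pin-plus-count rule can be pictured as a bitmask over the GPIOs. The following is a rough host-side sketch in plain C, not SDK code: `mapping_mask` is a hypothetical helper, and the modulo-32 wraparound is an assumption of this model about how pin indices past 31 are treated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitmask of the GPIOs covered by a mapping that
 * starts at `base` and spans `count` consecutive pins. Indices wrap
 * around modulo 32, which is this sketch's model of the hardware. */
static uint32_t mapping_mask(unsigned base, unsigned count)
{
    assert(base < 32 && count <= 32);
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; i++) {
        mask |= UINT32_C(1) << ((base + i) % 32);
    }
    return mask;
}
```

With base GPIO0 and count 2 the mask is 0x3, covering GPIO0 and GPIO1. Two mappings may share pins, so a non-zero `mapping_mask(0, 2) & mapping_mask(1, 1)` is perfectly legal.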

IRQ (Interrupt Request)

IRQ flags can be used to trigger an interrupt or to synchronize state machines with each other.

PIO assembly language

PIO provides a simple but powerful assembly language with only 9 instructions:

SET
IN
OUT
PULL
PUSH
JMP
WAIT
MOV
IRQ

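Writing it is much like writing any other assembly language, so I won't dwell on the syntax, but a few operand names are worth memorizing first:

pins: the pins mapped to this state machine. If the mapping starts at GPIO0, pin 0 is GPIO0; if it starts at GPIO2, pin 0 is GPIO2
pindirs: the pin directions; 0 is input, 1 is output
X, Y: the scratch registers
osr: the output shift register
isr: the input shift register
data: an immediate value of at most 5 bits, that is, 0 to 31

With registers and jmp available, PIO has the basic ingredients of Turing completeness. In theory you could do arithmetic with it, but PIO was never designed for computation, so treat that as an experiment for fun.

Delay

To allow precise timing control without wasting instruction memory, PIO has a very handy feature: append [] to an instruction to specify how many extra cycles to wait after it executes. For example: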

loop:
  set pins, 1 [1] ; set takes 1 cycle, then wait 1 more cycle
  set pins, 0
  jmp loop

This is equivalent to:

loop:
  set pins, 1
  nop
  set pins, 0
  jmp  loop

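This keeps the pin high for 2 cycles and low for 2 cycles without spending instruction slots on nop padding. The number inside [] is also a 5-bit value, so it ranges from 0 to 31.

Side-Set

Side-set lets an instruction set the level of the side-set pins while it executes. The program has to declare explicitly how many pins it uses:

.side_set 1

This declares 1 pin as side-set. Up to 5 pins can be used, and they may overlap with other mappings such as input or output.

This is very handy for state transitions. In UART, for instance, the line stays high while idle and the start bit is low, so we can switch the level in the same instruction that starts pulling data (example adapted from uart_tx.pio in pico-examples):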

.program uart_tx
.side_set 1 opt ; opt: instructions without a side value leave the pin unchanged
  pull side 1 [7]
  set x, 7 side 0 [7]
bitloop:
  out pins, 1
  jmp x-- bitloop [6]

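In this example, each pull also drives the side-set pin to 1, which serves as the stop bit. The instruction that loads the x register uses side-set to drive the pin to 0 as the start bit. No extra cycle or instruction is spent on changing the level, which is very convenient.

Each bit lasts 8 cycles here, so up to 8 instructions can run within one bit, and any cycles left over are easily padded with the delay feature to get exact timing.

Note that using side-set reduces the number of cycles available for delay. For example: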

.side_set 3 ; use three pins for side-set
set x, 1 side 0 [3] ; 3 of the 5 bits go to side-set, leaving 2 bits (5-3) for delay, i.e. 0 to 3
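The trade-off between side-set pins and delay cycles is simple bit arithmetic. Here is a small host-side sketch in plain C: `max_delay_cycles` is a hypothetical helper, and the extra bit charged for the `opt` flag of `.side_set` reflects my reading of the RP2040 datasheet rather than anything stated above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: largest delay value usable in [] when the
 * program declares `.side_set <pins>`, optionally with `opt`.
 * Delay and side-set share one 5-bit field; `opt` costs one extra bit. */
static unsigned max_delay_cycles(unsigned side_set_pins, bool opt)
{
    unsigned used_bits = side_set_pins + (opt ? 1u : 0u);
    assert(used_bits <= 5);
    unsigned delay_bits = 5 - used_bits;
    return (1u << delay_bits) - 1u;
}
```

Without side-set the full range 0 to 31 is available, and `max_delay_cycles(3, false)` returns 3, matching the example above.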

SET

Writes data to the destination. If the destination is pins, the value goes to the pins mapped as SET.

set X, 30 ; set X to 30
set pins, 1 ; set the SET pins to 1
set pins, 5 ; set the SET pins to 5 (0b101 in binary)

IN

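Shifts bit count bits from the source into the ISR, shifting left or right depending on the configuration. For example:

in osr, 1

shifts the least significant bit of the OSR's contents into the ISR. Or:

in pins, 4

reads 4 bits from the input pins (as before, counting from the base pin) and shifts them into the ISR.

OUT

Shifts data out of the OSR and writes it to the destination. For example:

out pins, 2

takes 2 bits from the OSR and writes them to the output pins.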
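To see what "taking bits from the OSR" means, here is a small software model in plain C. It is only a sketch: `osr_model` and `osr_out` are hypothetical names, and it assumes the OSR is configured to shift right, so the least significant bits leave first.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of an output shift register (sketch only). */
typedef struct {
    uint32_t bits;   /* current OSR contents */
} osr_model;

/* Model of `out <dest>, n` with right-shift: return the n least
 * significant bits and shift the register right by n, filling with 0. */
static uint32_t osr_out(osr_model *osr, unsigned n)
{
    assert(n >= 1 && n <= 32);
    if (n == 32) {
        /* Shifting a 32-bit value by 32 is undefined in C, so handle it here. */
        uint32_t all = osr->bits;
        osr->bits = 0;
        return all;
    }
    uint32_t taken = osr->bits & ((UINT32_C(1) << n) - 1);
    osr->bits >>= n;
    return taken;
}
```

Starting from 0xB4 (binary 1011 0100), the first `osr_out(&osr, 2)` returns binary 00 and leaves 0x2D behind; the next call returns binary 01.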

PULL

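Loads 32 bits from the TX FIFO into the OSR. With no argument after pull, the default is block: the program stalls until the TX FIFO has data (the word does not have to use all 32 bits) and only then continues. For example:

loop:
  pull ; continues once the TX FIFO has data; otherwise stalls on this line
  out pins, 32 ; write the OSR's contents to the output pins
  jmp loop

The following options are available:

ifempty: pull only when the OSR is empty; otherwise do nothing
block: stall while the TX FIFO has no data
noblock: if the TX FIFO is empty, copy scratch register X into the OSR instead, with the same effect as MOV OSR, X

A protocol like UART sits idle while there is nothing to transfer and only starts processing once data arrives; block makes this behavior easy to get.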

PUSH

Pushes the ISR's contents into the RX FIFO and clears the ISR to 0.

iffull: push only when the ISR is full; otherwise do nothing
block: stall while the RX FIFO is full

JMP

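Jumps to the given address when the condition holds. PIO's jmp supports several conditions:

No condition: when no condition is given, always jump
!X: jump when X = 0
X--: jump when X is non-zero; X is decremented after the test either way
!Y: jump when Y = 0
Y--: jump when Y is non-zero; Y is decremented after the test either way
X!=Y: jump when X != Y
PIN: jump depending on the level of an input pin (selected with the sm_config_set_jmp_pin function)
- jump when the pin is high
- do not jump when the pin is low
!OSRE: jump when the OSR is not empty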

  set x, 30 ; initialize x
loop:
  jmp x-- loop ; while x is non-zero, jump back to loop; x is decremented each time
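Because `jmp x--` tests the value before decrementing it, a loop that starts with X = N runs N + 1 times, which is an easy off-by-one to miss. The following is a minimal model in plain C; `count_loop_passes` is a hypothetical name, and the model assumes the loop body ends in a single `jmp x-- loop`.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `jmp x-- label`: the branch is taken when X is non-zero
 * before the decrement, and X is decremented either way. This function
 * counts how many times a loop body ending in `jmp x-- loop` runs. */
static uint32_t count_loop_passes(uint32_t x_initial)
{
    assert(x_initial <= 31);   /* `set x, N` only takes a 5-bit immediate */
    uint32_t x = x_initial;
    uint32_t passes = 0;
    int taken;
    do {
        passes++;              /* the loop body executes once */
        taken = (x != 0);      /* condition is tested on the old value */
        x--;                   /* post-decrement; unsigned wraps at 0 */
    } while (taken);
    return passes;
}
```

This is why the UART transmitter loads 7 into x to send 8 data bits: `count_loop_passes(7)` returns 8.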

WAIT


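Stalls until the condition becomes true. It takes a few special source types:

GPIO: selects the GPIO pin with the given index. Note that this is an absolute pin number, unrelated to the state machine's IO mapping.
PIN: selects the input pin with the given index (relative to the state machine's IO mapping)
IRQ: waits until the IRQ flag with the given index matches the polarity before moving on to the next instruction

wait 1 pin 0 ; wait until input pin 0 is 1 before executing the next instruction
wait 0 gpio 1 ; wait until GPIO1 is 0 before executing the next instruction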

MOV

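Copies data from the source to the destination. Two special operators help with manipulation:

! or ~: apply a bitwise NOT while copying
:: (two colons): copy with the bit order reversed

These are there to save instruction space as well.

mov X, Y ; copy Y to X
mov pins, X ; copy X to pins
mov pins, ::X ; copy X to pins with its bit order reversed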
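For readers who want to check what the two operators do to a value, here is a short sketch in plain C. The names `mov_op`, `reverse_bits32`, and `mov_value` are hypothetical; the code only models the value that ends up in the destination, not the hardware itself.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MOV_NONE, MOV_INVERT, MOV_REVERSE } mov_op;

/* Reverse the bit order of a 32-bit word (bit 0 <-> bit 31, and so on). */
static uint32_t reverse_bits32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Model of the value written by `mov dest, <op> src`. */
static uint32_t mov_value(mov_op op, uint32_t src)
{
    switch (op) {
    case MOV_NONE:    return src;
    case MOV_INVERT:  return ~src;
    case MOV_REVERSE: return reverse_bits32(src);
    }
    assert(!"unknown mov operation");
    return 0;
}
```

For example, reversing 0x0000000F moves the four set bits to the top of the word, giving 0xF0000000, while inverting 0x0000FFFF gives 0xFFFF0000.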

IRQ

Sets or clears an IRQ flag. Once a flag is set, the main program's configuration decides whether it triggers an interrupt.

irq wait 1 rel

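The available options are:

wait: stall until the flag has been cleared
nowait: continue without waiting for the flag to be cleared
clear: clear the IRQ flag

With no option the default is nowait, that is, execution continues without waiting for the flag to be cleared. Adding rel makes the flag index relative to the state machine: the state machine's index is added to the two least significant bits of the flag index, modulo 4. This lets the main program handle each state machine separately.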
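The rel arithmetic is easier to follow with numbers. Here is a minimal sketch in plain C, based on my reading of the RP2040 datasheet: `irq_rel_index` is a hypothetical helper, and it assumes that only the two lowest bits of the index take part in the modulo-4 addition while bit 2 is left alone.

```c
#include <assert.h>

/* Hypothetical helper: IRQ flag actually addressed by `irq <index> rel`
 * when executed by state machine `sm` (0..3). The two least significant
 * bits are added to the state machine number modulo 4; bit 2 is kept. */
static unsigned irq_rel_index(unsigned index, unsigned sm)
{
    assert(index < 8 && sm < 4);
    unsigned low = ((index & 3u) + sm) & 3u;
    return (index & 4u) | low;
}
```

So `irq wait 1 rel` running on state machine 0 uses flag 1, on state machine 2 it uses flag 3, and on state machine 3 it wraps around to flag 0.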

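Specifying where execution wraps

While a state machine is enabled, it jumps back to the start and repeats after running off the end of the program. We can also add directives that tell PIO where the repeated section starts and ends: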

set x, 8
.wrap_target
  set pins, 1
  set pins, 0
.wrap

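Wrapping a section in .wrap_target and .wrap tells the state machine which part to repeat, which saves writing a jmp.

Integrating the main program with PIO (C/C++)

pico-examples has plenty of PIO examples for reference; here we use a simple blink as the example.²

First we write the PIO program, in a file with the .pio extension: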

.program blink
.wrap_target
  set pins, 1
	nop [19]
	nop [19]
	set pins, 0
	nop [19]
	nop [19]
.wrap

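This program is simple: it uses set to drive the pin to 1, waits 40 cycles, then sets it to 0 and waits again, which gives a blinking LED. Next we need the initialization code, which the official documentation recommends writing directly in the .pio file: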

.program blink
.wrap_target
  set pins, 1
	nop [19]
	nop [19]
	set pins, 0
	nop [19]
	nop [19]
.wrap

% c-sdk {
void blink_program_init(PIO pio, uint sm, uint offset, uint pin, float div) {
   pio_sm_config c = blink_program_get_default_config(offset);
   pio_gpio_init(pio, pin);
   pio_sm_set_consecutive_pindirs(pio, sm, pin, 1, true); // configure the pin as an output
   sm_config_set_set_pins(&c, pin, 1);
   sm_config_set_clkdiv(&c, div);
   pio_sm_init(pio, sm, offset, &c);
}
%}

The C code is wrapped in % c-sdk { and %}.

blink_program_get_default_config returns the default configuration (this function is generated automatically)
pio_gpio_init(pio, pin) hands the GPIO pin over to PIO
pio_sm_set_consecutive_pindirs(pio, sm, pin, 1, true) sets the pin direction (false is input, true is output)
sm_config_set_set_pins(&c, pin, 1) configures the set pins
sm_config_set_clkdiv(&c, div) sets the clock divider
pio_sm_init(pio, sm, offset, &c) initializes the state machine

Besides sm_config_set_set_pins there are also functions such as sm_config_set_sideset_pins.

To use PIO, you also need to add hardware_pio to target_link_libraries:

cmake_minimum_required(VERSION 3.12)
include($ENV{PICO_SDK_PATH}/external/pico_sdk_import.cmake)
include($ENV{PICO_SDK_PATH}/tools/CMakeLists.txt)

project(pio C CXX ASM)
set(CMAKE_C_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)

pico_sdk_init()

add_executable(${PROJECT_NAME}
  main.c
)

pico_add_extra_outputs(${PROJECT_NAME})
pico_generate_pio_header(${PROJECT_NAME}
  ${CMAKE_CURRENT_LIST_DIR}/blink.pio
)

+ target_link_libraries(${PROJECT_NAME}
+	pico_stdlib
+	hardware_pio
+)

pico_enable_stdio_usb(pio 1)
pico_enable_stdio_uart(pio 1)

Next, in main.c:

#include <stdio.h>
#include "pico/stdlib.h"
#include "hardware/pio.h"
#include "hardware/clocks.h"
#include "blink.pio.h" // header generated automatically when the .pio file is assembled

void blink(PIO pio, uint sm, uint offset, uint pin, uint freq);

int main()
{
	stdio_init_all();
	PIO pio = pio0;
	
	uint offset = pio_add_program(pio, &blink_program);
	blink(pio, 0, offset, 2, 2000);
	while (true)
	{
		printf("test");
		sleep_ms(200);
	}
}

void blink(PIO pio, uint sm, uint offset, uint pin, uint freq)
{
	float div = clock_get_hz(clk_sys) / freq;
	blink_program_init(pio, sm, offset, pin, div); // function declared in the .pio file
	pio_sm_set_enabled(pio, sm, true);
}

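We include hardware/pio.h and hardware/clocks.h for the PIO functions and the clock information. After building and uploading the program, the LED blinks while the test string keeps appearing on the serial port, which shows that the two really do run separately without affecting each other!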
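It is worth working out how fast the LED actually blinks with these settings. The sketch below is plain host-side C, not SDK code: it assumes the SDK's default 125MHz system clock (the value `clock_get_hz(clk_sys)` would be expected to return), and the names `clock_divider` and `blink_rate_hz` are hypothetical.

```c
#include <assert.h>

/* Cycles in one pass of the blink program:
 * each `set` takes 1 cycle and each `nop [19]` takes 1 + 19 = 20. */
enum { BLINK_CYCLES_PER_PERIOD = 2 * (1 + 20 + 20) };

/* Clock divider passed to the init function, as computed in blink(). */
static float clock_divider(unsigned sys_clk_hz, unsigned sm_freq_hz)
{
    assert(sm_freq_hz > 0);
    return (float)sys_clk_hz / (float)sm_freq_hz;
}

/* Resulting on/off frequency of the LED in Hz. */
static float blink_rate_hz(unsigned sm_freq_hz)
{
    return (float)sm_freq_hz / (float)BLINK_CYCLES_PER_PERIOD;
}
```

With freq = 2000 the divider comes out at 62500, just inside the 1 to 65536 range, and one on/off period takes 82 state machine cycles, so the LED blinks at roughly 24Hz. A slower, clearly visible blink needs more delay cycles in the PIO program, since the divider is already close to its maximum.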

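Closing thoughts

This post was a first look at PIO and its syntax. In the next one I will try using PIO to implement common protocols such as UART, or to talk to a DHT11 temperature sensor, to get a firmer grip on PIO. PIO is a novel concept to me, and as far as I know this is the first microcontroller I have seen designed this way.

Besides keeping the processor from spending too many resources on protocols, this design frees developers from the limits of built-in hardware support: they can implement whatever they need directly in PIO. I am really looking forward to the applications PIO will enable. The RP2040 is also sold on its own, and you can even build your own board by following the official documentation³.

The official documentation⁴ is quite detailed and I recommend reading it; it makes clear why things were designed this way. If you have questions about the SDK functions, you can look them up in the SDK documentation.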
