I suddenly thought of recording interesting topics, especially those that can be visualized. If I have extra energy, I will write notes for each topic every day. Anyway, I find the problem of Advent Of Code Day10 quite interesting, so I'll record it first.
Part1
The first part of the problem is relatively monotonous. It requires implementing a CPU that can only execute two instructions: noop
and addx
. This CPU has an X register and calculates the value of the register at a specific cycle. By following the instructions and calculating based on the input, the answer should be obtained quickly. The problem description actually involves some hardware-related knowledge, which is not relevant to solving the problem, but I find it interesting. We'll discuss it in more detail later.
class CPU {
x: number;
cycles: number;
constructor() {
this.x = 1;
this.cycles = 0;
}
tick() {
this.cycles += 1;
}
noop() {
this.tick();
}
addx(arg) {
this.tick();
this.tick();
this.x += parseInt(arg);
}
}
Part2
The second part is more interesting. The register in this case represents the position of a sprite, and the cycle number represents the current position on the screen. The problem description provides a link - Racing the Beam. After adding the code for screen drawing logic, it should be complete.
The logic for determining whether to draw #
can also be done using bitwise operations. For example, using 0x111000 >> register
to offset represents the position of the sprite, and then performing an AND operation with the current cycle will give the character on the current screen. In my case, I took a simpler approach and compared them one by one:
class CPU {
// ...
draw() {
const column = this.cycles % 40;
if (column === 0 && this.cycles !== 0) {
this.screen += "\n";
}
if (this.x - 1 === column || this.x === column || this.x + 1 === column) {
this.screen += "#";
return "#";
} else {
this.screen += ".";
return ".";
}
}
}
I created a simple visualization where you can input instructions in a text box and press Execute to draw according to the instructions. (Note: Only addx
and noop
are supported.)
After finishing it, I realized that while turning instructions into a screen image is straightforward, the reverse process is much more challenging. That is, given an image, generating the corresponding instructions. To achieve this, you need to calculate the cycle number while moving the position for generating the next pixel, and it should not be too early or too late. I looked at the discussion on Reddit and found someone with the same idea, who wrote Python code to generate the corresponding instructions from an ASCII image.
Atari 2600
Regarding the Atari 2600 mentioned in the problem, early game consoles did not have additional video frame buffers to use, and the CPU's memory was too small (only 128KB) to fill the entire screen. Therefore, it was necessary to reuse memory (RAM) as much as possible.
In programs that require graphical calculations, a frame buffer is usually used to share the workload with the CPU. The CPU only needs to put the corresponding pixel into the frame buffer, and the GPU will handle the rest, such as drawing at the right time or encoding/decoding into the appropriate output format. However, in the absence of a frame buffer, the image must be computed in real-time, and the engineer can only move data within the limited memory.
The problem lies in the fact that running instructions on the CPU also takes time (cycles). If memory movement occurs too late or too early, it can result in dropped frames. Therefore, memory must be moved at the correct time. This is similar to racing the electron beam of a CRT monitor, which is why it is called "Racing the Beam."
Understanding these limitations and then looking back at the sprites on the Atari (such as Pac-Man), it is truly a miracle to create a game with such limited memory and a slow CPU. Fortunately, I am not an engineer from that era, or I might have become unemployed.
Looking at the Wikipedia page, the Atari 2600 was developed using the MOS 6507, which was specially modified by MOS Technology for the development of the Atari game console. It was half the price of the 6502.
Two years ago, after finishing reading the story of Nintendo, I became curious and bought a MOS 6502. However, the 6502 does not have built-in ROM, and as far as I understand, the 6502 requires Parallel ROM to read data. Nowadays, it is almost only possible to buy Serial EEPROM, so my plan to play with the 6502 has been put on hold. After reading this story, I feel motivated again. (But I still don't know where to buy Parallel EEPROM.)
Clock Circuit
To ensure the proper operation of the CPU, an oscillator is usually used to generate a fixed frequency. But why do we need this oscillator? It wasn't until I played Turing Complete that I truly understood this.
Inside the CPU, there are many circuits, some of which are simple, like adders and logic gates, whose output depends solely on the current input. In terms of functions, they can be represented as . In digital logic, these circuits are referred to as Combinatorial.
There is another type of circuit that has memory, such as registers, which are composed of flip-flops. These circuits have memory and their output depends not only on the input but also on the current state. In digital logic, these circuits are referred to as Sequential. Having memory is one of the necessary conditions for achieving Turing completeness, although it may seem obvious.
Returning to the need for an oscillator, its main purpose is to ensure that the state changes mentioned earlier occur at the same time. To achieve this, a unified clock source is required. For example, suppose I want to execute an instruction that adds the values of registers r1 and r2 and stores the result in register r1.
add r1, r2
In actual circuits, the values of r1 and r2 are simultaneously fetched and sent to the adder circuit, and after obtaining the result, it is written back to r1, rather than executing in sequence:
- Fetch the value of r1
- Fetch the value of r2
- Send them to the adder circuit
- Obtain the result and write it to r1
To ensure that the adder circuit receives the correct inputs, a unified clock trigger signal is required to ensure that the changes in the registers occur consistently.
However, I should also emphasize that except for low-power or embedded CPUs, modern CPUs (including Intel and AMD) have extremely complex architectures. They have done many magical things to maximize CPU performance, unlike the simplicity described earlier. For example:
- Instruction Pipeline: CPU instruction execution can be divided into four stages: Fetch, Decode, Execution, and Write Back. Under certain conditions, we can fetch the next instruction before the write back stage.
- Out of order execution: Instructions can be executed in a random order based on the circuit state, rather than in the order specified by the instructions.
- Branch prediction: Guessing which branch a section of code will take based on certain techniques.
Execution Cycle
For applications that require precise timing, the execution cycle is crucial because incorrect timing can lead to completely different results. You can find more details on this matter in two of my previous articles.
- Building an Air Quality Monitoring Application with Arduino and ESP32 (Part 2) - Data Communication with UART
- Exploring Raspberry Pi Pico PIO
In general, applications (or communication protocols) that require precise timing control have dedicated hardware to handle this. However, sometimes it may be due to hardware limitations (such as limited UART) or the desired protocol not being supported by the available hardware. In such cases, implementing it using the CPU can lead to the problem of bit banging, which burdens the CPU and affects performance.