2022 Advent Of Code: Cathode-Ray Tube

Written byKalanKalan
💡

If you have any questions or feedback, pleasefill out this form

This post is translated by ChatGPT and originally written in Mandarin, so there may be some inaccuracies or mistakes.

I suddenly thought about recording interesting topics, especially those that can be visualized. If I have some extra energy, I plan to write daily notes on these topics. Anyway, I find the problem from Advent Of Code Day 10 quite interesting, so I wanted to document it.

Part 1

The first part of the problem is relatively straightforward; it involves implementing a CPU that can only execute two instructions (noop and addx). This CPU has an X register, and the task is to calculate the value of the register at specific cycles. Following the instruction descriptions and computing based on the input should yield quick answers. The problem description contains several hardware-related concepts that, while not crucial for solving the problem, I find fascinating and will delve into later.

class CPU {
  x: number;
  cycles: number;

  constructor() {
    this.x = 1;
    this.cycles = 0;
  }

  tick() {
    this.cycles += 1;
  }

  noop() {
    this.tick();
  }

  addx(arg) {
    this.tick();
    this.tick();
    this.x += parseInt(arg);
  }
}

Part 2

The second part is much more interesting. This register actually represents the position of a sprite, and the cycle count indicates where the current screen rendering is at. The problem description includes a link – Racing the Beam. After adding the screen drawing logic to the code, it should work fine.

The logic to determine whether to draw a # can also be accomplished using bitwise operations. For example, using 0x111000 >> register to represent the sprite's position after shifting, then performing an AND operation with the current cycle gives the character for the current screen. I opted for a simpler approach by directly comparing values one by one:

class CPU {
  // ...

  draw() {
    const column = this.cycles % 40;
    if (column === 0 && this.cycles !== 0) {
      this.screen += "\n";
    }

    if (this.x - 1 === column || this.x === column || this.x + 1 === column) {
      this.screen += "#";
      return "#";
    } else {
      this.screen += ".";
      return ".";
    }
  }
}

Here, I've created a simple visualization. After entering commands in the text box and pressing Execute, the drawing will occur according to the commands. (Note: only addx and noop are supported.)

Advent Of Code Day 10 Simulation

After completing this, I realized that while turning commands into screen visuals is straightforward, doing the reverse is much more challenging. That is, given an image, generating the corresponding commands. Achieving this requires calculating the cycle count while moving to the next pixel generation position, and it must be done at the right time. I saw a discussion on Reddit and found that someone had the same idea, where they wrote a Python script that generates corresponding commands from ASCII images.

Atari 2600

As mentioned in the problem, the Atari 2600 was an early gaming console that did not have an additional video frame buffer to utilize, and the CPU's memory was too limited (only 128KB) to fill the entire screen. Therefore, memory (RAM) had to be reused as much as possible.

In applications that require drawing computations, frame buffers are typically used to offload work from the CPU, allowing it to send pixel data to the frame buffer, while the GPU handles the rest, such as drawing at the correct time or encoding/decoding to the corresponding output format. However, without frame buffer assistance, images must be calculated in real time, and engineers can only move data around within limited memory.

The challenge is that running CPU instructions also takes time (cycles), and if memory movement occurs too late or too early, it can lead to frame drops. Thus, memory must be moved at the correct timing. It's akin to racing against the CRT's electron beam, which is why the term Racing the Beam was coined.

After understanding these constraints, looking back at the sprites on the Atari (like Pac-Man), I find it miraculous that games could be developed under such limited memory and slow CPUs. Thankfully, I wasn't an engineer during that era; otherwise, I might have been out of a job.

According to Wikipedia, the Atari 2600 was developed using the MOS 6507, which was a modified version specifically for the Atari console, being half the cost of the 6502.

Two years ago, after finishing the story of Nintendo, I got curious and bought a MOS 6502. However, the 6502 doesn't have built-in ROM, and from what I understand, it requires Parallel ROM to read data. Nowadays, it's almost impossible to find Parallel EEPROM, so my plans to experiment with the 6502 have been on hold. After reading this story, I feel a bit motivated again. (But still, I don't know where to buy Parallel EEPROM.)

Clock Circuit

To ensure the CPU operates correctly, a fixed frequency oscillator is typically required. But why do we need this oscillator? It wasn't until I played Turing Complete that I truly understood this.

Inside a CPU, there are many circuits. Some are quite simple, like adders and logic gates, where the output solely depends on the current input, represented in function terms as f(x)=yf(x) = y. In digital logic, these are referred to as Combinatorial circuits.

Another type of circuit has memory, such as registers, which are composed of flip-flop circuits. These types of circuits have memory, meaning their output depends not only on the input but also on the current state. In digital logic, this is represented as Sequential circuits. The presence of memory might seem obvious, but it's one of the necessary conditions for achieving Turing completeness.

Returning to the question of why an oscillator is needed, it mainly ensures that the state changes mentioned above occur at the same moment, which requires a unified clock source. For instance, if I want to execute an instruction that adds the values of registers r1 and r2 and stores the result in r1:

add r1, r2

In an actual circuit, the values of r1 and r2 are simultaneously fetched and sent to the adder circuit, and the result is written back to r1, rather than executing in a sequential manner:

  • Fetch value from r1
  • Fetch value from r2
  • Send to adder circuit
  • Write result back to r1

To ensure that the inputs to the adder circuit are accurate, a unified clock triggering signal is needed to guarantee consistent changes in the registers.

However, it's worth emphasizing that besides low-power or embedded CPUs, modern CPUs (including Intel and AMD) have extremely complex architectures. To maximize CPU performance, many intricate techniques are employed that aren't as straightforward as previously discussed. For example:

  1. Instruction Pipeline: The execution of instructions can be divided into four main stages: Fetch, Decode, Execution, and Write Back. Under certain conditions, we can begin fetching the next instruction before the current one has been written back.
  2. Out of order execution: A sequence of instructions can be executed randomly by the CPU based on circuit states (not following the original order).
  3. Branch Predictor: A technique to intelligently guess which branch of instructions will be taken.

Execution Cycles

For applications that require timing, execution cycles are crucial, as incorrect timing can lead to entirely different outcomes. More details on this topic can be found in two previous articles I wrote.

Generally, applications (or transmission protocols) that require precise timing control have dedicated hardware to handle these tasks. However, sometimes hardware limitations (like those found in UART) or a lack of hardware support for the desired protocol can lead to challenges. In such cases, implementing through the CPU may encounter issues related to bit banging, causing excessive CPU load and impacting performance.

If you found this article helpful, please consider buying me a coffee ☕ It'll make my ordinary day shine ✨

Buy me a coffee