Kalan's Blog

Kalan 頭像照片,在淡水拍攝,淺藍背景

四零二曜日電子報上線啦!訂閱訂起來

Software Engineer / Taiwanese / Life in Fukuoka
This blog supports RSS feed (all content), you can click RSS icon or setup through third-party service. If there are special styles such as code syntax in the technical article, it is still recommended to browse to the original website for the best experience.

Current Theme light

我會把一些不成文的筆記或是最近的生活雜感放在短筆記,如果有興趣的話可以來看看唷!

Please notice that currenly most of posts are translated by AI automatically and might contain lots of confusion. I'll gradually translate the post ASAP

2022 Advent Of Code: Cathode-Ray Tube

I suddenly thought of recording interesting topics, especially those that can be visualized. If I have extra energy, I will write notes for each topic every day. Anyway, I find the problem of Advent Of Code Day10 quite interesting, so I'll record it first.

Part1

The first part of the problem is relatively monotonous. It requires implementing a CPU that can only execute two instructions: noop and addx. This CPU has an X register and calculates the value of the register at a specific cycle. By following the instructions and calculating based on the input, the answer should be obtained quickly. The problem description actually involves some hardware-related knowledge, which is not relevant to solving the problem, but I find it interesting. We'll discuss it in more detail later.

class CPU {
  x: number;
  cycles: number;

  constructor() {
    this.x = 1;
    this.cycles = 0;
  }

  tick() {
    this.cycles += 1;
  }

  noop() {
    this.tick();
  }

  addx(arg) {
    this.tick();
    this.tick();
    this.x += parseInt(arg);
  }
}

Part2

The second part is more interesting. The register in this case represents the position of a sprite, and the cycle number represents the current position on the screen. The problem description provides a link - Racing the Beam. After adding the code for screen drawing logic, it should be complete.

The logic for determining whether to draw # can also be done using bitwise operations. For example, using 0x111000 >> register to offset represents the position of the sprite, and then performing an AND operation with the current cycle will give the character on the current screen. In my case, I took a simpler approach and compared them one by one:

class CPU {
  // ...

  draw() {
    const column = this.cycles % 40;
    if (column === 0 && this.cycles !== 0) {
      this.screen += "\n";
    }

    if (this.x - 1 === column || this.x === column || this.x + 1 === column) {
      this.screen += "#";
      return "#";
    } else {
      this.screen += ".";
      return ".";
    }
  }
}

I created a simple visualization where you can input instructions in a text box and press Execute to draw according to the instructions. (Note: Only addx and noop are supported.)

Advent Of Code Day 10 Simulation

After finishing it, I realized that while turning instructions into a screen image is straightforward, the reverse process is much more challenging. That is, given an image, generating the corresponding instructions. To achieve this, you need to calculate the cycle number while moving the position for generating the next pixel, and it should not be too early or too late. I looked at the discussion on Reddit and found someone with the same idea, who wrote Python code to generate the corresponding instructions from an ASCII image.

Atari 2600

Regarding the Atari 2600 mentioned in the problem, early game consoles did not have additional video frame buffers to use, and the CPU's memory was too small (only 128KB) to fill the entire screen. Therefore, it was necessary to reuse memory (RAM) as much as possible.

In programs that require graphical calculations, a frame buffer is usually used to share the workload with the CPU. The CPU only needs to put the corresponding pixel into the frame buffer, and the GPU will handle the rest, such as drawing at the right time or encoding/decoding into the appropriate output format. However, in the absence of a frame buffer, the image must be computed in real-time, and the engineer can only move data within the limited memory.

The problem lies in the fact that running instructions on the CPU also takes time (cycles). If memory movement occurs too late or too early, it can result in dropped frames. Therefore, memory must be moved at the correct time. This is similar to racing the electron beam of a CRT monitor, which is why it is called "Racing the Beam."

Understanding these limitations and then looking back at the sprites on the Atari (such as Pac-Man), it is truly a miracle to create a game with such limited memory and a slow CPU. Fortunately, I am not an engineer from that era, or I might have become unemployed.

Looking at the Wikipedia page, the Atari 2600 was developed using the MOS 6507, which was specially modified by MOS Technology for the development of the Atari game console. It was half the price of the 6502.

Two years ago, after finishing reading the story of Nintendo, I became curious and bought a MOS 6502. However, the 6502 does not have built-in ROM, and as far as I understand, the 6502 requires Parallel ROM to read data. Nowadays, it is almost only possible to buy Serial EEPROM, so my plan to play with the 6502 has been put on hold. After reading this story, I feel motivated again. (But I still don't know where to buy Parallel EEPROM.)

Clock Circuit

To ensure the proper operation of the CPU, an oscillator is usually used to generate a fixed frequency. But why do we need this oscillator? It wasn't until I played Turing Complete that I truly understood this.

Inside the CPU, there are many circuits, some of which are simple, like adders and logic gates, whose output depends solely on the current input. In terms of functions, they can be represented as f(x)=yf(x) = y. In digital logic, these circuits are referred to as Combinatorial.

There is another type of circuit that has memory, such as registers, which are composed of flip-flops. These circuits have memory and their output depends not only on the input but also on the current state. In digital logic, these circuits are referred to as Sequential. Having memory is one of the necessary conditions for achieving Turing completeness, although it may seem obvious.

Returning to the need for an oscillator, its main purpose is to ensure that the state changes mentioned earlier occur at the same time. To achieve this, a unified clock source is required. For example, suppose I want to execute an instruction that adds the values of registers r1 and r2 and stores the result in register r1.

add r1, r2

In actual circuits, the values of r1 and r2 are simultaneously fetched and sent to the adder circuit, and after obtaining the result, it is written back to r1, rather than executing in sequence:

  • Fetch the value of r1
  • Fetch the value of r2
  • Send them to the adder circuit
  • Obtain the result and write it to r1

To ensure that the adder circuit receives the correct inputs, a unified clock trigger signal is required to ensure that the changes in the registers occur consistently.

However, I should also emphasize that except for low-power or embedded CPUs, modern CPUs (including Intel and AMD) have extremely complex architectures. They have done many magical things to maximize CPU performance, unlike the simplicity described earlier. For example:

  1. Instruction Pipeline: CPU instruction execution can be divided into four stages: Fetch, Decode, Execution, and Write Back. Under certain conditions, we can fetch the next instruction before the write back stage.
  2. Out of order execution: Instructions can be executed in a random order based on the circuit state, rather than in the order specified by the instructions.
  3. Branch prediction: Guessing which branch a section of code will take based on certain techniques.

Execution Cycle

For applications that require precise timing, the execution cycle is crucial because incorrect timing can lead to completely different results. You can find more details on this matter in two of my previous articles.

In general, applications (or communication protocols) that require precise timing control have dedicated hardware to handle this. However, sometimes it may be due to hardware limitations (such as limited UART) or the desired protocol not being supported by the available hardware. In such cases, implementing it using the CPU can lead to the problem of bit banging, which burdens the CPU and affects performance.

Prev

HTML and CSS can solve many problems, but JavaScript is also very important.

Next

2022 Advent Of Code (day9) - Rope Bridge

If you found this article helpful, please consider buy me a drink ☕️ It'll make my ordinary day shine✨

Buy me a coffee