This post was originally written in Mandarin and translated by ChatGPT, so there may be some inaccuracies or mistakes.

Numerical Stability and Errors

The expression (0.1 + 0.2) == 0.3 looks like it should obviously be true. However, because of the way computers store floating-point numbers, this comparison evaluates to false in most programming languages.

But why does the equation (0.5 + 0.25) == 0.75 hold true?

When performing floating-point arithmetic, some degree of error is inevitable. Next, we’ll discuss the origins of these errors and what can be done to mitigate them in calculations.

How Floating-Point Numbers Are Stored in Computers

Floating-point numbers are represented in computers as bit patterns, as specified by the IEEE 754 standard. Single-precision floating-point numbers use 32 bits, while double-precision uses 64. The representation is divided into a sign bit, an exponent part, and a fractional part:

  • Sign Bit (1 bit): 0 for positive; 1 for negative
  • Exponent Part: the power of two, stored with a bias added (explained below)
  • Fractional Part: the bits of the significand that follow the implied leading 1
| Type   | Size   | Exponent Part | Fractional Part |
| ------ | ------ | ------------- | --------------- |
| Single | 32 bit | 8 bit         | 23 bit          |
| Double | 64 bit | 11 bit        | 52 bit          |

Taking single precision as an example, the exponent part uses 8 bits, giving a raw range of 0 to 255. To allow negative exponents, the IEEE 754 standard adds a fixed offset (the bias) to the actual exponent before storing it. For single precision, this bias is 127.

For example, the representation of -14.75 in floating-point would be:

  • Sign Bit: negative, so it is 1
  • Exponent Part: first convert 14.75 to binary, 1110.11. Normalizing gives 1.11011 × 2^3, so the stored exponent is 3 + 127 = 130, which is 10000010 in binary
  • Fractional Part: 11011, with the remaining bits set to 0

Thus, the single-precision representation of -14.75 is: 1 10000010 11011000000000000000000
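
As a sanity check, here is a minimal Python sketch (mine, not part of the original article) that uses the standard struct module to dump those bits:

```python
import struct

# Pack -14.75 as a single-precision (32-bit) float, then reread the
# same 4 bytes as an unsigned integer to get at the raw bit pattern.
bits = struct.unpack(">I", struct.pack(">f", -14.75))[0]
pattern = f"{bits:032b}"

print(pattern[0])    # 1                        -> sign: negative
print(pattern[1:9])  # 10000010                 -> exponent: 130 = 3 + bias 127
print(pattern[9:])   # 11011000000000000000000  -> fraction: 11011, rest zeros
```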

From this, we can see where errors originate. The digits after the binary point are a sum of negative powers of two (2^-1, 2^-2, 2^-3, ...). A decimal fraction like 0.1 has no finite expansion in powers of two, so it can only be stored as a close approximation, never exactly. The earlier 0.5 + 0.25 works without error precisely because both operands are exact powers of two: 2^-1 + 2^-2.
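
The two comparisons from the introduction can be checked directly; a quick sketch in Python (any language using IEEE 754 doubles behaves the same):

```python
import math

# Neither 0.1, 0.2 nor 0.3 has an exact binary representation, so the
# stored approximations do not line up exactly.
print(0.1 + 0.2 == 0.3)    # False
print(0.1 + 0.2)           # 0.30000000000000004

# 0.5 and 0.25 are exact powers of two (2**-1 and 2**-2), so this is exact.
print(0.5 + 0.25 == 0.75)  # True

# The usual workaround: compare with a tolerance instead of ==.
print(math.isclose(0.1 + 0.2, 0.3))  # True
```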

Why Store This Way?

Scientific notation helps us manage numbers of vastly different scales while keeping a consistent level of precision. For instance, 0.00000012345 and 1234567890000 can be written as 1.2345E-7 and 1.23456789E12, respectively. Computers store numbers in floating-point format in much the same way, just in base 2 rather than base 10.
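
Python exposes this "binary scientific notation" directly through float.hex(), which prints the significand in hexadecimal and the exponent as a power of two:

```python
# float.hex() shows the base-2 scientific notation a double actually stores:
# a hexadecimal significand and a 'p' exponent (a power of two).
print((0.5).hex())    # 0x1.0000000000000p-1  -> exactly 1.0 * 2**-1
print((14.75).hex())  # 0x1.d800000000000p+3  -> 1.84375 * 2**3 = 14.75
print((0.1).hex())    # 0x1.999999999999ap-4  -> only an approximation of 0.1
```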

Real numbers are infinite, but computer storage is limited, so errors are unavoidable regardless of how we store numbers. What we can do is find a trade-off between precision and the range of representable numbers.
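
One way to see that trade-off, as a small sketch using only the standard library: round-trip 0.1 through a 32-bit float and compare it with the 64-bit double Python uses natively:

```python
import struct

# Python's own floats are 64-bit doubles, so forcing 0.1 through a
# 32-bit float shows how much precision single precision gives up.
single = struct.unpack(">f", struct.pack(">f", 0.1))[0]

print(f"{0.1:.20f}")     # 0.10000000000000000555  (double, ~16 good digits)
print(f"{single:.20f}")  # 0.10000000149011611938  (single, ~7 good digits)
```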

Significant Figures

Significant figures help us gauge precision. Numbers like 0.001 or 0.0135 can be written as $1 \times 10^{-3}$ and $1.35 \times 10^{-2}$. Here, 0.001 has one significant figure, while 0.0135 has three. The more significant figures, the higher the precision.
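
As a side note (not from the original article), Python's g format specifier rounds to significant figures rather than decimal places, which is handy for experimenting:

```python
# The precision in 'g' format counts significant figures, not decimals.
print(f"{0.0137:.1g}")  # 0.01
print(f"{0.0137:.2g}")  # 0.014
print(f"{0.0137:.3g}")  # 0.0137
```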

Wikipedia gives a simple set of rules for determining which figures are significant:

  • All non-zero digits are significant
  • Zeros between non-zero digits are significant
  • Leading zeros are always insignificant
  • For numbers requiring a decimal point, trailing zeros (zeros after the last non-zero digit) are significant
  • For numbers not requiring a decimal point, trailing zeros may or may not be significant; additional notation (such as an overline) or an explicit statement of the measurement uncertainty is needed to tell

Cancellation of Significant Digits

When subtracting two floating-point numbers with very similar values, most of the leading digits cancel out, leaving far fewer significant figures in the result. This phenomenon is known as cancellation of significant digits (often called catastrophic cancellation).

For example, 1.234567890 − 1.234567889 is exactly 0.000000001: two operands with ten significant figures each produce a result with only one. With insufficient precision, the computed difference can even come out as 0.
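
The same effect is visible with ordinary Python doubles; a small sketch (the exact trailing digits come from the binary approximations of the two operands):

```python
a = 1.234567890
b = 1.234567889
diff = a - b

# The true difference is 1e-09, but the leading digits cancel and the
# rounding noise already present in a and b dominates what is left.
print(diff)                     # ~1.0000000827e-09, not exactly 1e-09
print(abs(diff - 1e-9) / 1e-9)  # relative error ~8e-08: half the digits gone
```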

This is something to be particularly mindful of in numerical computations. For example, consider the half-angle identity (a rearrangement of the double-angle formula):

$$\sin^2\left(\frac{\theta}{2}\right) = \frac{1-\cos\theta}{2}$$

When the angle is small, cos θ is very close to 1, so the subtraction 1 − cos θ is prone to losing significant figures. For instance, with an angle of 1 degree, evaluating the right-hand side with 6 significant figures:

$$\frac{1-\cos 1\degree}{2} = \frac{1 - 0.999847}{2} = 0.0000765 = 7.65 \times 10^{-5}$$

If we instead evaluate the left-hand side, looking up the value of sin(0.5°):

$$\sin^2\left(\frac{1\degree}{2}\right) = 0.00872653^{2} = 0.0000761523 = 7.61523 \times 10^{-5}$$

The two results already disagree in the third significant figure, even though every input carried six. It's crucial to be careful in numerical calculations. To avoid loss of significant figures, consider the following methods:

  • Avoid arithmetic operations between two numbers that are very close in absolute value

  • Rewrite the calculation with an equivalent formula (like the half-angle identity above; see the sketch after this list)

    • Essentially, this works by eliminating the subtraction of two nearly equal numbers altogether.
  • Increase the working precision
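
As a rough sketch of why the rewrite helps (in Python; the angle 1e-8 radians is my choice, not a value from the article, picked to make the cancellation total in double precision):

```python
import math

theta = 1.0e-8  # radians; so small that cos(theta) rounds to exactly 1.0

naive = (1 - math.cos(theta)) / 2  # subtracts two nearly equal numbers
stable = math.sin(theta / 2) ** 2  # algebraically identical, no subtraction

print(naive)   # 0.0     -- every significant figure has cancelled away
print(stable)  # 2.5e-17 -- correct (~ theta**2 / 4 for small theta)
```

At 1 degree the damage is milder in double precision, but it is the same pattern the 6-significant-figure example above shows.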

Conclusion

More experienced engineers are likely already aware that floating-point arithmetic introduces errors and understand why 0.1 + 0.2 != 0.3. This article dug into how fractional numbers are stored, the IEEE 754 standard, and the loss of significant figures.

The half-angle formula is something most of us first met in high school trigonometry; back then it was, for me, just a formula to plug numbers into.

However, in real-world applications the calculations are performed by computers, and real inputs rarely involve nice, round angles like 30 or 60 degrees. Teachers also seldom mention the loss of significant figures lurking in these formulas.

If you found this article helpful, please consider buying me a coffee ☕ It'll make my ordinary day shine ✨
