
Discrete Digital Deception: Bits Fooling You

How tech plays the game, making fake look the same.

Introduction: Welcome to the Matrix!

What’s your screen time today?

We’re living in an increasingly digital world, where nothing is truly continuous. Everything you see, hear, and experience digitally is an approximation, a clever lie designed to trick your senses. Let’s dive into the digital world, where reality is cut into tiny, discrete pieces to fit the rigid framework of 1s and 0s.

From how your computer adds numbers to how your favorite song streams, everything in the digital realm is a deception orchestrated to satisfy our limited human perception. Let’s uncover the tricks of the digital trade.


Examples of Everyday Digital Trickery


Lights, Camera, Illusion!

Let’s talk about videos. The smoothness of a video is an illusion created by displaying a sequence of still images (frames) in rapid succession. At, say, 120 frames per second (fps), the stills flip by so quickly that your brain fuses them into continuous motion. Beyond a certain threshold, usually around 24–60 fps for most humans, our eyes can no longer pick out the individual frames. Your pet might not share this experience, though: animals like dogs and cats have higher flicker fusion thresholds, meaning they perceive the gaps between frames more easily and may see the screen as flickering chaos.

[Image: confused “huh?” dog meme]

But the illusion isn’t just about the fps of the video. Whether you are watching a video or reading this page, the screen also has a refresh rate, measured in hertz (Hz), which determines how many times per second the screen updates the displayed content. For example, a screen with a 60Hz refresh rate updates 60 times per second, while a 120Hz screen updates twice as often. If the video’s fps exceeds the screen’s refresh rate, the extra frames are skipped or blended, which can degrade the smoothness of motion. Conversely, if the refresh rate exceeds the video’s fps, frames are simply shown more than once; motion can never be smoother than the source content allows.

In essence, fps is how the software delivers the frames, while refresh rate is how the hardware displays them. For the smoothest experience, they need to work in harmony.
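As a rough mental model of that harmony (a simplified sketch, not how any real video player schedules frames), you can map each screen refresh to the newest source frame available at that instant:

# A simplified model: at every screen refresh, show the newest video frame
# delivered so far. Integer arithmetic keeps the mapping exact.
def frames_shown(video_fps, refresh_hz, refreshes=6):
    return [tick * video_fps // refresh_hz for tick in range(refreshes)]

print(frames_shown(120, 60))  # [0, 2, 4, 6, 8, 10] -> every other frame is skipped
print(frames_shown(24, 60))   # [0, 0, 0, 1, 1, 2]  -> frames are repeated to fill refreshes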

Even a single frame in that video isn’t “continuous” like reality. What looks real to you is just an arrangement of tiny pixels. More pixels = higher resolution = sharper illusion. But zoom in on any pixel, and you realize it’s just a tiny blob of color, hardly the Mona Lisa!

Think of pixels like LEGO bricks. A low-res image is like using chunky blocks to build a castle. It gets the idea across but looks rough. A high-resolution image is like using tiny LEGO pieces that capture subtle details. Either way, it’s still just LEGO!
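To make the LEGO analogy concrete, here’s a minimal NumPy sketch (the array sizes and gradient values are made up purely for illustration) that lowers the resolution of a tiny grayscale image by averaging blocks of pixels, the digital equivalent of swapping tiny bricks for chunky ones:

import numpy as np

# An 8x8 grayscale "image": a smooth horizontal gradient from 0 to 255
high_res = np.tile(np.linspace(0, 255, 8), (8, 1))

# Downsample to 2x2 by averaging each 4x4 block of pixels
low_res = high_res.reshape(2, 4, 2, 4).mean(axis=(1, 3))

print(high_res.shape, "->", low_res.shape)  # (8, 8) -> (2, 2)
print(np.round(low_res, 1))                 # each value is one chunky "brick" covering 16 pixels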

[Image: Wojak and Chad meme]


Your Ears Are Just as Gullible

Did you know your favorite song is just numbers? Yep, digital audio records sound by taking snapshots of it, typically 44,100 times per second (44.1 kHz). Each snapshot is a single value, and when played back in sequence, it tricks your ears into hearing a smooth melody.
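Here’s a minimal sketch of that idea (assuming NumPy is available; the 440 Hz tone and one-second duration are arbitrary choices for illustration): sampling a pure tone 44,100 times per second turns a continuous wave into nothing more than an array of numbers.

import numpy as np

sample_rate = 44_100   # snapshots per second (44.1 kHz)
duration = 1.0         # seconds of audio
frequency = 440.0      # an A4 note, chosen purely for illustration

# The instants at which we snapshot the (ideal) continuous wave
t = np.arange(int(sample_rate * duration)) / sample_rate

# One second of "music" is now just 44,100 numbers
samples = np.sin(2 * np.pi * frequency * t)

print(len(samples))   # 44100
print(samples[:5])    # the first few snapshots: just numbers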

Here’s the twist: even at a live concert with analog instruments, the sound you hear isn’t entirely pure. For a large audience, microphones capture the sound waves, convert them into electrical signals, and loudspeakers recreate them. While the final sound mimics the real thing, it remains an interpretation, one step removed from nature’s true analog form.


The Math: When 1.1 + 2.2 ≠ 3.3

Let’s check with numbers. In the digital world, even basic math can be misleading. If you’ve ever tried to add 1.1 and 2.2 in Python (or any other programming language), you’ll notice the result isn’t exactly 3.3.

1.1 + 2.2 == 3.3

Output:

False

[Image: math lady meme]

The digital world relies on the IEEE 754 Standard for Floating-Point Arithmetic, a method of representing fractional numbers in binary. Because computers can only store a finite number of binary digits, and binary can’t represent most decimal fractions exactly, rounding errors creep in. Think of it as trying to pour a gallon of water into a pint-sized jar: instead of dealing with precise values, computers approximate. The result? Slight inaccuracies like the one above; what you’re seeing is the closest value the machine can represent.

Think of public transport, like a bus or train, with predetermined stops. Your destination might not match a stop exactly, so you get off at the nearest one and walk the rest of the way. Similarly, computers approximate numbers to the nearest value they can represent, though it’s rarely a perfect match.
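To see how far from your destination the nearest stop actually is, you can ask Python for the exact value it stores when you write 0.1. A minimal sketch using the standard-library decimal and fractions modules:

from decimal import Decimal
from fractions import Fraction

# Passing the float 0.1 (not the string '0.1') exposes the value actually stored
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The same stored value as an exact ratio with a power-of-two denominator
print(Fraction(0.1))
# 3602879701896397/36028797018963968

The stored number is astonishingly close to 0.1, but it is not 0.1, and that tiny gap is what the rest of this section is about.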


How IEEE 754 Impacts Machine Learning

When computers store real numbers, they don’t store them exactly as we write them. Instead, they use a system called floating-point arithmetic, which is like scientific notation for computers. For example, the number $123.45$ in scientific notation is written as $1.2345 \times 10^{2}$. Similarly, IEEE 754 represents a floating-point number in binary as:

\(\text{Number} = (-1)^{\text{Sign}} \times \text{Mantissa} \times 2^{\text{Exponent}}\) where:

  • Sign: 0 for positive, 1 for negative.
  • Exponent: Scales the number up or down (stored with a bias to include negative values).
  • Mantissa (Fraction): Stores the significant digits; the number is normalized so it begins with 1, and that leading 1 is implied rather than stored (the hidden bit).

Example: Representing 0.1 in IEEE 754

In base-$2$, $0.1$ is a repeating binary fraction:

\[\text{0.1}_{10} = 0.0001100110011..._2\]

Normalize it so it looks like $1.<\text{mantissa}> \times 2^{<\text{exponent}>}$:

\[\text{0.1} = 1.1001100110011... \times 2^{-4}\]

For a $32$-bit float:

  • Sign (1 bit): $0$ (positive)
  • Exponent (8 bits): Add bias of $2^{8-1}-1 = 2^7-1 = 127$ to $-4$: $-4 + 127 = 123 \rightarrow 01111011$
  • Mantissa (23 bits): Take the fraction bits $1001100110011…$ rounded to the nearest representable value at $23$ bits: $10011001100110011001101$ (the discarded bits begin with $11…$, so rounding to nearest bumps the last kept bit up).

Result:

\[\text{Binary Representation: } 0\ 01111011\ 10011001100110011001101\]

In decimal, this becomes approximately:

\[\text{0.1} \approx 0.10000000149011612\]
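To double-check the hand-worked result, here’s a small sketch (using only the standard library’s struct module) that splits the 32-bit encoding of 0.1 into its sign, exponent, and mantissa fields; note the last mantissa bit is 1 because IEEE 754 rounds to the nearest value rather than simply truncating:

import struct

# Reinterpret the float32 encoding of 0.1 as a 32-bit unsigned integer
bits = struct.unpack('!I', struct.pack('!f', 0.1))[0]

sign     = bits >> 31            # 1 bit
exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
mantissa = bits & 0x7FFFFF       # 23 bits; the leading 1 is implicit

print("sign     =", sign)                                     # 0
print("exponent =", f"{exponent:08b}", "->", exponent - 127)  # 01111011 -> -4
print("mantissa =", f"{mantissa:023b}")                       # 10011001100110011001101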

Let’s see how this approximation causes issues in Python.

# Representing 0.1 in Python
import struct

# Convert 0.1 to IEEE 754 binary representation
binary_0_1 = struct.unpack('!I', struct.pack('!f', 0.1))[0]
binary_str = f"{binary_0_1:032b}"  # Format as 32-bit binary

# Print binary representation and real value
print("Binary Representation of 0.1 (IEEE 754):", binary_str)
print("Actual Stored Value of 0.1:", struct.unpack('!f', struct.pack('!I', binary_0_1))[0])

# Adding 0.1 three times
result = 0.1 + 0.1 + 0.1
print("\nResult of 0.1 + 0.1 + 0.1:", result)
print("Is result equal to 0.3?:", result == 0.3)

Output:

Binary Representation of 0.1 (IEEE 754): 00111101110011001100110011001101
Actual Stored Value of 0.1: 0.10000000149011612

Result of 0.1 + 0.1 + 0.1: 0.30000000000000004
Is result equal to 0.3?: False

This confirms the binary representation of $0.1$ is only an approximation in IEEE 754. Repeated addition of this approximate value accumulates the tiny error, resulting in a sum slightly larger than $0.3$.

In machine learning and numerical computations, such errors can:

  • Impact Equality Comparisons: Direct comparisons (e.g., a == b) may fail due to tiny differences caused by floating-point errors.

  • Break Algorithms: Algorithms relying on precise equality checks (e.g., sorting, clustering) may behave unexpectedly.

How to Mitigate This?

Use math.isclose:

Instead of direct equality, use a tolerance:

import math

print("Is result approximately equal to 0.3?:", math.isclose(result, 0.3))

Output:

Is result approximately equal to 0.3?: True
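In machine learning code you are usually comparing whole arrays rather than single numbers; NumPy offers the analogous np.isclose and np.allclose. A quick sketch:

import numpy as np

a = np.array([0.1 + 0.1 + 0.1, 1.1 + 2.2])
b = np.array([0.3, 3.3])

print(a == b)            # [False False] -> exact comparison fails element-wise
print(np.isclose(a, b))  # [ True  True] -> element-wise comparison with tolerance
print(np.allclose(a, b)) # True          -> one verdict for the whole array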

Use the decimal Module:

Python’s decimal module provides precise arithmetic for situations where exact results are needed:

from decimal import Decimal

result = Decimal('0.1') + Decimal('0.1') + Decimal('0.1')
print("Result with Decimal:", result)
print("Is result equal to 0.3?:", result == Decimal('0.3'))

Output:

Result with Decimal: 0.3
Is result equal to 0.3?: True

In machine learning, we frequently standardize data to scale features to a mean of 0 and a standard deviation of 1. However, this process is not immune to the quirks of the IEEE 754 floating-point standard, which governs how numbers are represented in digital systems. These quirks can introduce precision errors that disrupt calculations, particularly in large-scale or small-scale datasets.

In machine learning workflows, these floating-point quirks can cause:

  • Distorted Feature Scaling: Features with extreme magnitudes (very large or small) lose accuracy during preprocessing.
  • Poor Convergence in Models: Gradient-based optimizers rely on precise calculations, and any error in feature scaling propagates during training.
  • Amplified Noise in Sparse Data: Sparse datasets may introduce unexpected biases due to exaggerated small values after standardization.

These issues manifest prominently during standardization due to the formula:

\[z = \frac{x - \mu}{\sigma}\]

Where:

  • $x$ is the data point,
  • $\mu$ is the mean,
  • $\sigma$ is the standard deviation.

When $\mu$ or $\sigma$ is very large or very small, floating-point precision errors can cause distortion.


Amplification of Precision Errors

If $\sigma$ (standard deviation) is very small or $\mu$ (mean) involves large numbers, significant digits may be lost during subtraction or division.
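You can see the loss in isolation before running the full example: adding a tiny value to a huge one and then subtracting the huge one back does not return the tiny value (a quick sketch; the exact digits depend on your platform’s 64-bit floats):

# The small offset cannot be stored exactly next to 1e10, so subtracting
# 1e10 back recovers only a rounded version of it
tiny = (1e10 + 1e-5) - 1e10
print(tiny)   # roughly 9.5367431640625e-06, not 1e-05

The standardization example below is built on exactly this effect.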

import numpy as np

data = np.array([1e10 + 1e-5, 1e10 + 2e-5, 1e10 + 3e-5])
mean = np.mean(data)
std = np.std(data)

# Standardization
standardized = (data - mean) / std
print("Standardized Data:", standardized)

The mean of these values is:

\[\mu = \frac{(1 \times 10 ^ {10} + 1 \times 10 ^ {-5}) + (1 \times 10 ^ {10} + 2 \times 10 ^ {-5}) + (1 \times 10 ^ {10} + 3 \times 10 ^ {-5})}{3} = 1 \times 10 ^ {10} + 2 \times 10 ^ {-5}\]

The (population) standard deviation measures the spread of the data points. Since the values sit at equal increments of $1 \times 10 ^ {-5}$ around the mean:

\[\sigma = \sqrt{\frac{(-10^{-5})^2 + 0^2 + (10^{-5})^2}{3}} = \sqrt{\tfrac{2}{3}} \times 10 ^ {-5} \approx 8.165 \times 10 ^ {-6}\]

For each data point, the standardized value is:

\[z = \frac{x - \mu}{\sigma}\]

Substituting the values:

  • For $x = 1 \times 10 ^ {10} + 1 \times 10 ^ {-5}$: \(z = \frac{(1 \times 10 ^ {10} + 1 \times 10 ^ {-5}) - (1 \times 10 ^ {10} + 2 \times 10 ^ {-5})}{\sqrt{2/3} \times 10 ^ {-5}} = \frac{-1 \times 10 ^ {-5}}{\sqrt{2/3} \times 10 ^ {-5}} \approx -1.22\)
  • For $x = 1 \times 10 ^ {10} + 2 \times 10 ^ {-5}$: \(z = \frac{(1 \times 10 ^ {10} + 2 \times 10 ^ {-5}) - (1 \times 10 ^ {10} + 2 \times 10 ^ {-5})}{\sqrt{2/3} \times 10 ^ {-5}} = 0\)
  • For $x = 1 \times 10 ^ {10} + 3 \times 10 ^ {-5}$: \(z = \frac{(1 \times 10 ^ {10} + 3 \times 10 ^ {-5}) - (1 \times 10 ^ {10} + 2 \times 10 ^ {-5})}{\sqrt{2/3} \times 10 ^ {-5}} = \frac{+1 \times 10 ^ {-5}}{\sqrt{2/3} \times 10 ^ {-5}} \approx +1.22\)

Thus, you would ideally expect the output to be (with $\pm\sqrt{3/2} \approx \pm 1.2247$):

Standardized Data: [-1.22474487  0.          1.22474487]

Well, run the code yourself and be amazed to see the output as:

Standardized Data: [-1.31982404 -0.21997067  1.09985336]

Due to the IEEE 754 floating-point standard, the tiny $10^{-5}$ offsets cannot be stored exactly once they ride on top of a number as large as $1 \times 10 ^ {10}$: a 64-bit float carries only about 15–16 significant decimal digits. When the nearly equal large values are subtracted during standardization, most of those digits cancel and the remainder is dominated by rounding error. This loss of significant digits, known as catastrophic cancellation, is what distorts the standardized result.
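One common workaround (a sketch of one option, not the only one) is to keep the huge common offset out of the floating-point arithmetic altogether. Standardization is unaffected by adding a constant to every data point, so you can standardize just the small offsets and recover essentially the ideal result:

import numpy as np

base = 1e10                             # the large common offset, deliberately kept separate
offsets = np.array([1e-5, 2e-5, 3e-5])  # the part of the data that actually varies

# z-scores are invariant to adding a constant, so standardizing the offsets
# is mathematically equivalent to standardizing base + offsets
standardized = (offsets - np.mean(offsets)) / np.std(offsets)
print("Standardized Data:", standardized)   # close to [-1.2247, 0, 1.2247]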


Errors in Sparse or Noisy Data

Sparse datasets with very small values (on the order of $10 ^ {-9}$, $10 ^ {-10}$) are particularly prone to precision issues. Standardizing these values often magnifies errors.

import numpy as np

data = np.array([1e-9, 2e-9, 3e-9])
mean = np.mean(data)
std = np.std(data)

standardized = (data - mean) / std
print("Standardized Data:", standardized)

Output:

Standardized Data: [-1.22474487  0.          1.22474487]

The tiny differences between the values appear greatly exaggerated after standardization: gaps on the order of $10^{-9}$ are stretched to values of order $1$. Any rounding error or noise baked into those tiny inputs gets stretched by the same factor, and for truly extreme magnitudes (approaching the limits of the floating-point range) precision degrades further, which can make the transformation misleading rather than helpful.


Understanding these limitations helps us make better decisions when working with digital data, ensuring our machine learning models remain robust and reliable.


Final Thoughts: The Future of Illusions

The digital world is all about tricking your senses, and it’s only going to get better at it. With AI, hyper-realistic graphics, and 3D audio, we’re headed toward a future where it’ll be nearly impossible to distinguish real from digital. While this is exciting, it also makes you wonder: how much of our perception is based on what’s real, and how much is just clever engineering?

As Albert Einstein said, “The human mind has first to construct forms, independently, before we can find them in things.” Perception, in other words, is not passive but an active construction process, susceptible to errors and illusions.

The next time you watch a movie, listen to music, or even do some math on your computer, remember: it’s all discrete, it’s all outward appearance, but wow, does it work.

[Image: “This is fine” dog meme]

