How Human Perception of Brightness and Color Shapes Video Encoding Strategies


Reducing the file size by just 5% can allow 100 users to enjoy seamless streaming playback.

In the digital content world, video has become the dominant format, and with that, video compression is more critical than ever. Speaking at India’s Biggest GenAI Summit for developers, MLDS 2025, Arvind Sasikumar, co-founder and CTO at Quinn, shared insights on optimising video transcoding to balance efficiency, quality, and playback performance.

The benefits of video compression are clear: smaller file sizes lead to reduced storage costs and lower data transfer expenses. But beyond these obvious advantages, compression directly impacts user experience. One fundamental aspect that often goes unnoticed is that playback is effectively binary: a video is either playing smoothly or buffering. There’s no middle ground.

Sasikumar explained, “Consider an example. If 100 users each have a 1.9 Mbps internet connection, but the video they are watching has a 2 Mbps bitrate, every user will experience buffering. However, by reducing the file size by just 5%, all 100 users can enjoy seamless playback. This demonstrates why compression is not just about reducing numbers, it’s about eliminating interruptions that degrade the viewing experience.”
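The arithmetic behind that example can be checked with a short sketch. It is a simplification that assumes a constant video bitrate and ignores protocol overhead and network jitter; the function names are illustrative, and bitrates are expressed in whole kbps to avoid floating-point rounding.

```python
def plays_smoothly(connection_kbps: int, video_kbps: int) -> bool:
    """A stream plays without buffering only if the link can sustain its bitrate."""
    return connection_kbps >= video_kbps


def smooth_viewers(connections_kbps, video_kbps):
    """Count how many viewers can play the video without buffering."""
    return sum(1 for c in connections_kbps if plays_smoothly(c, video_kbps))


connections = [1900] * 100                 # 100 users on 1.9 Mbps links
original_kbps = 2000                       # 2 Mbps video
reduced_kbps = original_kbps * 95 // 100   # 5% smaller: 1900 kbps

print(smooth_viewers(connections, original_kbps))  # 0
print(smooth_viewers(connections, reduced_kbps))   # 100
```

The jump from 0 to 100 viewers shows the threshold nature of buffering: a small saving matters only when it moves the bitrate below what the connection can sustain.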

There are two primary types of compression: lossy and lossless. Lossy compression reduces file size by permanently discarding some data, typically the detail that viewers or listeners are least likely to notice. This is the preferred method for video and audio because human perception can compensate for minor quality losses. Lossless compression retains all original data, so the original can be reconstructed exactly. This is ideal for text and data files where precision is crucial, but it is not practical for most video delivery because the resulting files remain very large.
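The difference between the two can be illustrated with a toy sketch. It uses Python's standard `zlib` module for the lossless case and simple quantization as a stand-in for the lossy case; the synthetic sample data and the `quantize` helper are assumptions made for illustration, not how a video codec actually works.

```python
import zlib


def quantize(samples, step):
    """Lossy step: snap each sample down to a multiple of `step`."""
    return [(s // step) * step for s in samples]


samples = [(i * 37) % 256 for i in range(1024)]  # synthetic 8-bit data
raw = bytes(samples)

# Lossless: decompressing returns exactly the original bytes.
restored = zlib.decompress(zlib.compress(raw))
print(restored == raw)  # True

# Lossy: quantization discards detail that cannot be recovered.
coarse = quantize(samples, step=16)
print(coarse == samples)  # False
print(max(abs(a - b) for a, b in zip(samples, coarse)))  # error stays below the step size
```

The lossy version is no longer identical to the input, but every value stays within one quantization step of the original, which is the kind of bounded error that perception can often tolerate.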

It’s All About Perception

Humans don’t perceive pixels individually; instead, we process visual information contextually. Compression algorithms leverage this by reducing redundant data without noticeable quality loss.

Sasikumar mentioned that one common technique is chroma subsampling, where brightness (luminance) is prioritised over color detail. Since the human eye is more sensitive to brightness than color, encoding schemes like 4:2:0 store color at a quarter of the resolution and cut the raw data roughly in half compared with full-resolution color (4:4:4), without significant perceptual impact.
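A rough sketch of where the saving comes from: a frame has one brightness (luma) plane and two color (chroma) planes, and 4:2:0 halves the chroma planes in both dimensions. The function names are illustrative, and averaging 2x2 blocks is one simple downsampling choice among several; real encoders may filter differently.

```python
def samples_per_frame(width, height, scheme="4:2:0"):
    """Count raw samples: one luma plane plus two chroma planes."""
    luma = width * height
    if scheme == "4:4:4":      # chroma at full resolution
        chroma_w, chroma_h = width, height
    elif scheme == "4:2:2":    # chroma halved horizontally
        chroma_w, chroma_h = width // 2, height
    elif scheme == "4:2:0":    # chroma halved in both directions
        chroma_w, chroma_h = width // 2, height // 2
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return luma + 2 * chroma_w * chroma_h


def subsample_420(plane):
    """Downsample a chroma plane (rows of ints, even dimensions) by averaging 2x2 blocks."""
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
         for x in range(0, len(plane[0]), 2)]
        for y in range(0, len(plane), 2)
    ]


full = samples_per_frame(1920, 1080, "4:4:4")
sub = samples_per_frame(1920, 1080, "4:2:0")
print(full, sub, sub / full)  # 6220800 3110400 0.5
```

For a 1080p frame, 4:2:0 carries exactly half the raw samples of 4:4:4 while leaving the brightness plane untouched.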

Beyond individual frames, modern video encoding techniques exploit similarities between consecutive frames to achieve higher compression rates. Instead of storing each frame as a separate image, encoders analyse differences between frames and store only the changes.
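The idea of storing only the changes can be sketched as follows. This is a deliberately simplified model that records changed pixels in a dictionary; real encoders operate on blocks and store compressed residuals rather than sparse pixel lists, and the helper names are illustrative.

```python
def frame_delta(prev, curr):
    """Return {(row, col): new_value} for every pixel that changed."""
    return {
        (y, x): curr[y][x]
        for y in range(len(curr))
        for x in range(len(curr[0]))
        if curr[y][x] != prev[y][x]
    }


def apply_delta(prev, delta):
    """Rebuild the current frame from the previous frame plus the stored changes."""
    frame = [row[:] for row in prev]
    for (y, x), value in delta.items():
        frame[y][x] = value
    return frame


frame1 = [[0] * 4 for _ in range(4)]
frame2 = [row[:] for row in frame1]
frame2[1][2] = 255  # a single pixel changes between frames

delta = frame_delta(frame1, frame2)
print(delta)                                  # {(1, 2): 255}
print(apply_delta(frame1, delta) == frame2)   # True
```

Here one stored entry replaces sixteen pixel values, and the decoder can still reconstruct the second frame exactly.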

A robust compression algorithm must optimise three key areas: compressing individual frames efficiently, minimising redundant data between frames, and leveraging human perception to maintain quality at lower bitrates. One of the most effective strategies is motion estimation and motion compensation.

Instead of encoding pixel-by-pixel differences, encoders track how regions of the image (in practice, blocks of pixels) move between frames and store that movement as motion vectors. This significantly reduces the amount of data required to represent motion.

He explained with an example: if a ball moves across the screen while the background remains static, the algorithm records only the ball’s movement rather than re-encoding the entire frame. This principle underpins most modern video encoding formats. Compression efficiency, however, depends on how accurately the motion is estimated.
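The ball example can be sketched with exhaustive block matching, one basic form of motion estimation. This is a minimal illustration, assuming grayscale frames stored as lists of rows and using the sum of absolute differences (SAD) as the matching cost; production encoders use faster search patterns and sub-pixel refinement, and the function names are illustrative.

```python
def sad(ref, cur, ry, rx, cy, cx, size):
    """Sum of absolute differences between a block in ref and a block in cur."""
    return sum(
        abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
        for i in range(size)
        for j in range(size)
    )


def find_motion_vector(ref, cur, cy, cx, size, search):
    """Search ref within +/-search pixels for the best match of cur's block at (cy, cx).

    Returns ((dy, dx), cost), where (dy, dx) points from the block's position
    in the current frame to the matching block in the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(ref, cur, cy, cx, cy, cx, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= height - size and 0 <= rx <= width - size:
                cost = sad(ref, cur, ry, rx, cy, cx, size)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best, best_cost


def frame_with_ball(top, left, size=8, ball=2):
    """Black frame with a bright square 'ball' whose corner is at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for i in range(ball):
        for j in range(ball):
            frame[top + i][left + j] = 255
    return frame


previous = frame_with_ball(2, 1)
current = frame_with_ball(2, 4)   # the ball moved 3 pixels to the right

print(find_motion_vector(previous, current, cy=2, cx=4, size=2, search=3))
# ((0, -3), 0)
```

A cost of zero means the block is fully described by the vector alone, so the encoder needs to store only the displacement for that block. When the match is imperfect, the leftover difference must also be encoded, which is why estimation accuracy affects file size.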

Role of I-Frames in Video Encoding

I-frames (Intra-coded frames) are encoded without reference to any other frame, which makes them the key reference points in video encoding. Their placement plays a crucial role in ensuring smooth playback and efficient compression. The first frame should be an I-frame so that decoding can start. Periodic I-frames improve quality and facilitate efficient seeking. When a scene transition occurs, pixel values change abruptly and motion-based prediction becomes ineffective. Using an I-frame at scene transitions prevents quality loss.
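These three placement rules can be sketched as a simple decision function. It is an illustration only: the per-frame scene-change scores, the threshold, and the interval length are hypothetical inputs, and the sketch labels every other frame "P" without modelling B-frames or any real encoder's rate control.

```python
def choose_frame_types(scene_change_scores, gop_size=48, scene_threshold=0.4):
    """Label each frame 'I' or 'P'.

    scene_change_scores: hypothetical per-frame values in [0, 1], where higher
    means the frame differs more from the one before it.
    """
    types = []
    frames_in_gop = 0  # frames emitted since (and including) the last I-frame
    for index, score in enumerate(scene_change_scores):
        is_first = index == 0
        interval_elapsed = frames_in_gop >= gop_size
        scene_cut = score >= scene_threshold
        if is_first or interval_elapsed or scene_cut:
            types.append("I")
            frames_in_gop = 1
        else:
            types.append("P")
            frames_in_gop += 1
    return types


scores = [0.0, 0.1, 0.9, 0.0, 0.1, 0.0, 0.0]
print("".join(choose_frame_types(scores, gop_size=4)))  # IPIPPPI
```

In the output, the first I-frame starts the stream, the second is triggered by the scene cut at frame 2, and the third is forced because four frames have passed since the previous I-frame.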

Motion significantly influences compression strategies. Low-motion videos require less data since frame-to-frame changes are minimal. Fast-motion videos demand more data but also tolerate more aggressive compression, as human perception cannot detect fine details in rapid movement. Without perceptual compression techniques, fast-motion videos could be ten times larger than low-motion ones. However, optimised encoding can reduce this difference to just two to three times.

Further, lighting, contrast, and color variations also impact compression efficiency. Since human vision perceives brightness and color differently, encoding strategies must consider these perceptual factors to optimise quality across various scenes.

What’s Next?

Traditional metrics like PSNR (Peak Signal-to-Noise Ratio) measure pixel-level differences between original and compressed frames. However, PSNR does not always align with human perception. To address this, Netflix developed VMAF (Video Multi-Method Assessment Fusion), an open-source perceptual quality metric.
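For reference, PSNR follows directly from the mean squared error between the two frames. The sketch below assumes 8-bit grayscale frames stored as lists of rows with identical dimensions; the function name is illustrative.

```python
import math


def psnr(original, compressed, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized frames."""
    squared_error = 0
    count = 0
    for row_o, row_c in zip(original, compressed):
        for a, b in zip(row_o, row_c):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


reference = [[100, 100], [100, 100]]
slightly_off = [[101, 99], [100, 100]]

print(psnr(reference, reference))              # inf
print(round(psnr(reference, slightly_off), 2)) # 51.14
```

Because the score depends only on the size of the pixel errors, two distortions with the same mean squared error receive the same PSNR even if one is far more visible to a viewer than the other.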

VMAF predicts how viewers will perceive quality by combining several quality measurements in a model trained on subjective viewer ratings. This helps find the optimal balance between compression efficiency and visual fidelity.

Additionally, segmenting videos into chunks allows for parallel processing and efficient re-encoding, which helps achieve high-quality compression while optimising resources. Minimising re-encoding is also important: lossy encoding discards information on every pass, so unnecessary re-encodes degrade quality.
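A minimal sketch of chunked, parallel processing is shown below. The `encode_chunk` function is a hypothetical stand-in for a call to a real encoder, and the chunk size and worker count are arbitrary; in practice each chunk would need to begin on an I-frame so that it can be decoded independently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(total_frames, chunk_size):
    """Return (start, end) frame ranges covering the video; end is exclusive."""
    return [
        (start, min(start + chunk_size, total_frames))
        for start in range(0, total_frames, chunk_size)
    ]


def encode_chunk(frame_range):
    """Hypothetical stand-in for a real encoder invocation on one chunk."""
    start, end = frame_range
    return f"encoded[{start}:{end}]"


def encode_in_parallel(total_frames, chunk_size, workers=4):
    """Encode all chunks concurrently and return results in playback order."""
    chunks = split_into_chunks(total_frames, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_chunk, chunks))


print(encode_in_parallel(250, 100))
# ['encoded[0:100]', 'encoded[100:200]', 'encoded[200:250]']
```

Keeping chunks independent also limits re-encoding: if only one part of a video changes, only the affected chunk needs another lossy pass while the rest is reused untouched.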