Why Self-Supervised Learning Thrives on Redundant Data




What’s keeping researchers awake at night? It’s the nagging question of whether AI systems can truly learn like humans simply by looking at data. Maybe self-supervised learning guru Yann LeCun has the answers. 

In a recent discussion on X, LeCun was summoned to talk about the critical role of redundancy in self-supervised learning (SSL). According to him, SSL thrives on data redundancy, enabling it to uncover structure and patterns within the input. 

If the data has redundancy, which means there are repeated or similar parts, SSL can use it to learn useful structures and insights. However, “highly compressed data has no redundancy and appears random. SSL can not learn anything from random data,” said LeCun.

Self-Supervised Learning thrives on redundancy.
The more redundancy in the data, the more structure can be learned.

Highly compressed data has no redundancy and appears random.
SSL can not learn anything from random data.
Conversely, highly redundant data is completely…

— Yann LeCun (@ylecun) September 7, 2024

He highlighted that highly compressed data, devoid of redundancy, appears random and cannot be learned from. Highly redundant data, on the other hand, is completely predictable, and so lacks the novelty SSL needs to extract useful information.

The balance lies somewhere in between, where enough redundancy allows SSL to model the structure but still leaves room for learning from less predictable aspects. 
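A toy sketch makes the intuition concrete. The snippet below is only an illustration, not anything from LeCun's actual work; a simple bigram next-symbol predictor stands in for a real SSL model. It learns near-perfect structure from a redundant sequence but nothing useful from a random one.

import random
from collections import Counter, defaultdict

random.seed(0)

def next_symbol_accuracy(seq, split=0.8):
    """Fit a bigram predictor on the first part of a sequence and measure
    next-symbol accuracy on the rest -- a crude stand-in for the
    'predict the missing part' objective used in self-supervised learning."""
    cut = int(len(seq) * split)
    train, test = seq[:cut], seq[cut:]
    counts = defaultdict(Counter)
    for a, b in zip(train, train[1:]):
        counts[a][b] += 1
    hits = 0
    for a, b in zip(test, test[1:]):
        pred = counts[a].most_common(1)[0][0] if counts[a] else None
        hits += (pred == b)
    return hits / (len(test) - 1)

# Highly redundant data: a repeating pattern with plenty of structure to exploit.
redundant = list("ABCD" * 2500)
# "Compressed-looking" data: uniformly random symbols, no structure at all.
random_seq = [random.choice("ABCD") for _ in range(10_000)]

print("redundant:", next_symbol_accuracy(redundant))    # close to 1.0
print("random:   ", next_symbol_accuracy(random_seq))   # close to 0.25 (chance level)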

LeCun has been making this argument for a long time. While speaking with AIM two years ago, he said that the average human processes an image in about 100 milliseconds, roughly ten images per second. By the time humans are five years old, they have already seen about a billion frames.
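These figures lend themselves to a quick back-of-envelope check. The sketch below is only a rough sanity check, not LeCun's own working: the ten-images-per-second rate and the 20 MB/s optic-nerve bandwidth (quoted later in this piece) come from the article, while the waking-hours share and the size of an LLM training corpus are assumptions added here.

# Rough sanity check of the numbers above -- not LeCun's own working.
SECONDS_PER_YEAR = 365 * 24 * 3600
waking_fraction = 16 / 24            # assume ~16 waking hours per day

# "About a billion frames by age five" at ~10 images per second:
frames_by_five = 10 * 5 * SECONDS_PER_YEAR * waking_fraction
print(f"frames seen by age five: ~{frames_by_five:.1e}")         # ~1e9

# Bytes of visual input by age four, using the 20 MB/s optic-nerve figure
# quoted later in the article, versus an assumed LLM training corpus of
# ~2e13 bytes (roughly 1e13 tokens at ~2 bytes each).
visual_bytes = 20e6 * 4 * SECONDS_PER_YEAR * waking_fraction
llm_bytes = 2e13
print(f"visual bytes by age four: ~{visual_bytes:.1e}")
print(f"ratio to LLM corpus: ~{visual_bytes / llm_bytes:.0f}x")  # same ballpark as the quoted 50x

The exact multiple depends on the assumed waking hours and corpus size, which is why this lands in the same ballpark as, rather than exactly at, the "50 times" figure LeCun cites.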

What is the Contention?

“In four years, a child has seen 50 times more data than the biggest LLMs,” LeCun emphasised. Not everyone agrees, though.

While many see the quality of available data as a bottleneck in building intelligent systems, LeCun believes otherwise. For him, the main issue is not the unavailability of data but how learning systems take advantage of the data that is available. He and those who agree with him have made this point several times.

“This point comes up frequently, but it doesn’t quite make sense,” said an unimpressed Francois Chollet, the creator of Keras and another deep learning guru, starting a debate around how much information humans actually learn through vision.

Chollet emphasised that while redundancy is useful, one must also consider the “post-compression” measure of information, as raw data isn’t always a good indicator of meaningful information. He pointed out that while visual feed data may appear to have a high bandwidth, much of it is autocorrelated and redundant in time and space, thus reducing its true informational value.
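Chollet’s point about autocorrelation can be illustrated with a toy experiment; this is an illustration of the idea, not his calculation. A stream of “frames” in which each frame barely differs from the last collapses under an off-the-shelf compressor, while frames of independent noise do not, even though both streams have exactly the same raw size.

import random
import zlib

random.seed(0)

def compressed_size(frames):
    """Concatenate the frames and deflate them -- a rough proxy for the
    'post-compression' measure of information Chollet refers to."""
    return len(zlib.compress(bytes(b for frame in frames for b in frame), 9))

N_FRAMES, FRAME_SIZE = 200, 1000

# "Video-like" stream: each frame is the previous one with ~1% of its values
# changed, so it is heavily autocorrelated in time.
frame = [random.randrange(256) for _ in range(FRAME_SIZE)]
video_like = []
for _ in range(N_FRAMES):
    for _ in range(10):
        frame[random.randrange(FRAME_SIZE)] = random.randrange(256)
    video_like.append(list(frame))

# Independent noise frames: the same raw size, but no redundancy to exploit.
noise = [[random.randrange(256) for _ in range(FRAME_SIZE)] for _ in range(N_FRAMES)]

print("raw bytes:             ", N_FRAMES * FRAME_SIZE)
print("video-like, compressed:", compressed_size(video_like))
print("noise, compressed:     ", compressed_size(noise))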

LeCun, who has been a big proponent of AI achieving animal-like intelligence before going for human intelligence, emphasised that learning simply cannot happen without a degree of redundancy in the data. The more redundancy there is, the more structure SSL can harness. He backed up his argument by referring to the human visual system.

Vision is Not All You Need?

LeCun explained that while the human eye has about 60 million photosensors, four layers of neurons reduce this raw data down to a million optic nerve fibres. This compression, LeCun argues, reduces excessive redundancy while allowing for essential features to be captured by the brain.

This underscores the vast gap between the bandwidth of text, which LeCun calls “too low”, and that of visual data, which is far more redundant and thus ideal for self-supervised learning. In his view, video data offers the right balance of redundancy, making it a much richer modality for training models than text.

But this is not universally agreed upon. Chollet questioned LeCun’s estimate of the bandwidth of the human visual system, arguing that the raw optical input carries far less information than LeCun suggests. According to Chollet, the true bandwidth is under 1 MB/s, significantly lower than LeCun’s estimate of 20 MB/s.
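To get a feel for how far apart the two estimates are, one can divide each by the roughly one million optic nerve fibres mentioned above. This is only a way of putting the disagreement on a per-fibre scale, not how either researcher arrived at his number.

# Implied per-fibre rates for each headline figure -- purely illustrative.
fibers = 1_000_000
for label, mb_per_s in [("LeCun  (~20 MB/s)", 20.0), ("Chollet (<1 MB/s)", 1.0)]:
    bytes_per_fiber = mb_per_s * 1e6 / fibers
    print(f"{label}: ~{bytes_per_fiber:.0f} bytes/s per fibre "
          f"(~{bytes_per_fiber * 8:.0f} bits/s)")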

“My point is just: the claim that the information bandwidth of the human visual system is 20MB/s (based on optic nerve count) is pure nonsense,” said Chollet. To which LeCun said, “What is pure nonsense is claiming that the relevant quantity is the number of bits after compression.”

The question remains: which holds more information about the world, a POV video of a four-year-old or all the text on the internet? “If only raw data matters, then why are blind children intelligent at all?” asked Chollet. To which LeCun replied that they learn through touch, which has high bandwidth and is also extremely redundant.

Though the word “redundancy” sounds counterintuitive for AI models, it turns out to be genuinely helpful when the goal is to teach AI the way humans learn, by seeing consistent, similar examples again and again. At the same time, the argument for data quality over sheer redundancy still holds ground when it comes to building text-based models.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words.
