Challenges on the path to AGI
Context
Over the last few months, I have been carving out time to understand AI development a layer below the hype. While I have been a regular user of ChatGPT since version 3.0, my learning was limited to listening to podcasts and reading tweets about AI. Now I am reading papers and books on deep learning, which meant starting with brushing up my linear algebra, matrix multiplication and Python. In your 40s, it is not easy to go back to skills you haven't practised since college, but it has been great fun.
The pace of change is so fast that I typically bookmark 8-10 pieces for later reading and never get to them. So I started publishing these bookmarks as a daily digest, to force myself to go a little deeper. That helped, but spending a bit more time on these developments still did not feel meaningfully productive. So I am now trying to write longer pieces on topics where I want to go deeper.
Two Key Challenges
The first topic I have picked is to understand where we are on the path to AGI. This is very relevant for business practitioners, as we may experience a trough in expectations over the coming years. Leading enterprises and early adopters have found AI beneficial in accelerating certain kinds of tasks, but the resulting productivity improvement has been modest. The gains at a prosumer level are much higher; a clear example is boilerplate programming, which tools like Replit, Cursor, Windsurf and GitHub Copilot have been great at solving. Even so, this hasn't scaled enough to replace swathes of human developers. The challenge in creating generalizable intelligence that works across novel tasks outside the training data is twofold.
Data Efficiency and High-Quality Data
As AI models grow, they need vast amounts of high-quality data to generalize well, but research suggests we're hitting limits. High-quality human-generated data is becoming scarce, and simply adding more data often yields diminishing returns. As a result, models may overfit and struggle with new, unseen scenarios, which is why data efficiency has become a central challenge.
Reward Functions for Non-Deterministic Tasks
For tasks with unpredictable outcomes, like creative reasoning, designing reward functions is tough. The evidence leans toward this being a significant hurdle: models can misinterpret rewards, leading to unintended behaviors. This complexity makes it hard to define clear success metrics.
Overall, these challenges are well recognized in AI research, though ongoing work on synthetic data and new scaling methods like test-time compute offers hope.
Comprehensive Analysis of AI Scaling Challenges
This section provides a detailed examination of the challenges in scaling AI with increasing compute, focusing on the availability of high-quality data, data efficiency, and the constraints in creating reward functions for non-deterministic tasks. The analysis draws on recent discussions and research to offer a thorough understanding of the complexities involved.
Background on AI Scaling and Generalization
AI scaling refers to the practice of improving model performance by increasing computational resources, data, and model parameters. The generalization of intelligence implies that AI systems should apply learned knowledge to new, unseen situations effectively. Recent advancements, such as large language models (LLMs), have relied heavily on scaling, but this approach has met significant challenges, particularly as computational power continues to grow exponentially.
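To make the scaling idea concrete, here is a minimal Python sketch of a Chinchilla-style power law, where loss falls as parameters and training tokens grow. The functional form follows the commonly cited scaling-law literature, but the constants and the 70B/1.4T baseline below are illustrative placeholders, not fitted values from any paper.

```python
# A Chinchilla-style power law: loss falls as parameters (N) and
# training tokens (D) grow, but with diminishing returns. Constants
# here are illustrative placeholders, not fitted values.

def estimated_loss(n_params: float, n_tokens: float,
                   e: float = 1.7, a: float = 400.0, b: float = 410.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """L(N, D) = E + A / N^alpha + B / D^beta."""
    return e + a / n_params**alpha + b / n_tokens**beta

# Doubling N and D together buys smaller and smaller loss reductions.
for scale in (1, 2, 4, 8):
    n, d = scale * 70e9, scale * 1.4e12  # hypothetical 70B / 1.4T baseline
    print(f"{scale}x scale -> estimated loss {estimated_loss(n, d):.4f}")
```

Each doubling of scale shaves less off the loss than the one before it, which is the diminishing-returns pattern the rest of this section explores.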
Challenge 1: Availability of High-Quality Data and Data Efficiency
One of the primary bottlenecks in AI scaling is the availability of high-quality data. As models scale, they require vast datasets to train on, but the supply of high-quality, human-generated data is finite and at risk of exhaustion. According to a recent article from Our World in Data, today's largest public datasets are about ten times bigger than what most AI models currently use, containing hundreds of trillions of words. However, data needs are growing exponentially (training data demand has doubled every 9-10 months since 2010, and LLM dataset sizes have tripled yearly), which suggests we may soon face a data scarcity issue.
The challenge is not just quantity but quality. High-quality data ensures models generalize well, but as noted in the Exponential View article, there's a log-linear relationship between concept frequency in training data and model performance. This means exponentially more data is needed for linear improvements, especially for rare concepts, which are critical for generalization. The risk of overfitting also increases with larger models, as they can become too optimized for training data, struggling with new scenarios, as highlighted in the Medium article on language model scaling laws.
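To see what a log-linear relationship implies in practice, here is a toy Python calculation; the score scale and the constant k are made up purely for illustration.

```python
# Toy log-linear relationship: score = k * log10(examples).
# Each extra point of score then requires 10x more examples,
# i.e. linear gains demand exponential data. k is arbitrary.
k = 10.0
for target_score in (10, 20, 30, 40):
    examples_needed = 10 ** (target_score / k)
    print(f"score {target_score} needs ~{examples_needed:,.0f} examples")
```

Going from a score of 10 to 40 in this toy setup requires a thousand times more examples, which is why rare concepts are so costly to learn.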
To illustrate, the following table summarizes the key data-related challenges:

| Challenge | Description |
| --- | --- |
| Data scarcity | The stock of high-quality, human-generated text is finite, while demand grows exponentially |
| Diminishing returns | A log-linear frequency-performance relationship means exponentially more data is needed for linear gains, especially on rare concepts |
| Overfitting | Larger models can over-optimize for training data and struggle with new, unseen scenarios |
Innovations like synthetic data generation and learning from alternative data modalities are being explored to address these issues, as mentioned in the Exponential View article, but they are not yet mature solutions.
Challenge 2: Constraints in Creating Reward Functions for Non-Deterministic Tasks
The second challenge, creating reward functions for non-deterministic tasks, is equally significant, particularly for reinforcement learning (RL) and generative AI applications. Non-deterministic tasks, such as creative writing, multi-step reasoning, or robotics in dynamic environments, involve outcomes that vary unpredictably, making it hard to define clear success metrics.
Reward functions are central to RL: they guide the agent to maximize cumulative reward. Designing them for non-deterministic tasks is complex; it requires a deep understanding of the desired objective and often involves trial and error. Poorly designed rewards can lead to biased outcomes or exploitation, where models find loopholes that achieve high scores without fulfilling the intended goal, a phenomenon known as reward hacking.
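Here is a toy Python sketch of reward hacking, with a deliberately mis-specified proxy (answer length standing in for answer quality); the candidates and names are hypothetical, chosen only to make the failure mode visible.

```python
# Reward hacking in miniature: the intended goal is answer quality,
# but the computable proxy rewards length. The "agent" (here, a max
# over candidates) games the proxy rather than the intent.

def proxy_reward(answer: str) -> float:
    return float(len(answer))  # mis-specified: length != quality

candidates = [
    "A concise, correct answer.",
    "Padding padding padding. " * 20,  # junk that games the proxy
]

winner = max(candidates, key=proxy_reward)
print("Proxy picks:", winner[:40], "...")
```

The padded junk wins under the proxy, even though any human evaluator would prefer the concise answer.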
The LessWrong post on AI safety further elaborates on reward misspecification, or the outer alignment problem, where the specified reward function does not align with the intended objective. This is particularly problematic in non-deterministic environments, where the same action in the same state can lead to different outcomes. For example, a reward function might need to account for time-varying outcomes, but treating time as a state characteristic can make states non-identical, complicating the learning process.
The GeeksforGeeks overview of RL also notes that the effectiveness of RL depends heavily on the reward function, and poorly designed rewards can lead to suboptimal behaviors, especially in non-deterministic settings. Hence, the design of reward functions for such tasks remains constrained, limiting the ability to scale AI intelligence effectively.
To further illustrate, the following table summarizes the main reward-function challenges:

| Challenge | Description |
| --- | --- |
| Reward hacking | Models exploit loopholes in the reward to score highly without fulfilling the intended goal |
| Reward misspecification | The specified reward does not align with the intended objective (the outer alignment problem) |
| Non-determinism | The same action in the same state can lead to different outcomes, making success metrics unstable |
Recent shifts, such as the focus on test-time compute mentioned in the TechCrunch article, suggest a move towards allowing models more time to "think" during inference, which could help address some non-deterministic challenges by breaking down problems into smaller, more manageable steps. However, this is still an emerging strategy and does not fully resolve the reward function design issue.
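As a rough illustration of the test-time-compute idea, here is a best-of-N sampling sketch in Python; generate and verify are hypothetical stand-ins for a model's sampling and scoring calls, not any real API.

```python
import random

def generate(prompt: str) -> str:
    # stand-in for sampling one candidate solution from a model
    return f"candidate-{random.randint(0, 999)} for: {prompt}"

def verify(candidate: str) -> float:
    # stand-in for a learned verifier or reward-model score
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # spending more inference compute (larger n) improves the pick
    # without retraining the underlying model
    return max((generate(prompt) for _ in range(n)), key=verify)

print(best_of_n("Show that the sum of two even numbers is even."))
```

Note that this strategy still leans on a scoring function, so it shifts rather than removes the reward-design problem.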
Interconnection and Broader Implications
These two challenges—data efficiency and reward function design—are interconnected. For instance, the lack of high-quality data can exacerbate the difficulty in training models for non-deterministic tasks, as models need diverse, representative data to learn effective reward signals. Conversely, the inability to create robust reward functions limits the ability to leverage available data efficiently, as models may not learn the right behaviors even with sufficient data.
The Foundation Capital article suggests that AI scaling may be hitting an S-curve, where additional inputs of data, compute, and model size yield increasingly modest gains, reinforcing the need for new approaches beyond traditional scaling. Hence, both challenges highlight the limitations of current scaling strategies and the need for innovations in data efficiency and reward modelling.
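To visualize the S-curve claim, here is a toy logistic curve in Python; the shape, midpoint and slope are illustrative choices only, not fitted to any real benchmark.

```python
import math

# Toy S-curve: capability rises steeply with early input growth,
# then flattens, so each extra order of magnitude buys less.
# Midpoint and slope are arbitrary illustrative choices.

def capability(log_inputs: float, midpoint: float = 5.0,
               slope: float = 1.0) -> float:
    return 1.0 / (1.0 + math.exp(-slope * (log_inputs - midpoint)))

for x in range(2, 11, 2):
    gain = capability(x) - capability(x - 2)
    print(f"inputs ~10^{x}: capability {capability(x):.2f} (+{gain:.2f})")
```

The incremental gain per step shrinks toward the top of the curve, which is the "increasingly modest gains" pattern the article describes.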
It's worth noting that the difficulties in scaling AI are multifaceted. Other significant bottlenecks include:
Conclusion
The generalization of intelligence with increasing compute faces challenges in data availability, data efficiency, and creating reward functions for non-deterministic tasks. While scaling has driven significant AI advancements, these hurdles are critical and actively debated in the field. Ongoing efforts, such as synthetic data generation and test-time compute, offer potential solutions, but they are not yet fully realized, underscoring the complexity of scaling AI intelligence effectively.