Hi! Here's your Friday, November 29, 2024 edition of AI in the News. Today, let's dive into "reasoning models".
Let’s start with a recap of the latest developments:
This week, Alibaba launched Qwen with Questions (QwQ), a 32-billion-parameter open-source reasoning model (VentureBeat)
- It outperforms OpenAI’s o1-preview on the AIME and MATH benchmarks, but is less effective on LiveCodeBench coding tasks.
- The model emphasizes reflection and self-questioning, leading to improved problem-solving capabilities, as noted in their blog post: “When given time to ponder, question, and reflect, the model’s understanding of mathematics and programming blossoms.”
- QwQ is available under an Apache 2.0 license for commercial use, and its open nature allows users to understand its reasoning process, despite the lack of detailed training documentation.
Last week, DeepSeek made headlines by releasing R1 (TechCrunch)
- R1 is also a reasoning model that competes with OpenAI’s o1, capable of fact-checking itself and spending more processing time on complex queries.
- The model performs comparably to o1 on the AIME and MATH benchmarks but struggles with certain logic problems and can be easily jailbroken.
- DeepSeek-R1 is backed by High-Flyer Capital Management, which aims to build “superintelligent” AI.
In September, OpenAI released o1, its first model with ‘reasoning’ abilities (The Verge)
- o1 is designed to solve complex problems and write code more effectively than previous models, though it is slower and more expensive than GPT-4.
- The model uses a new training methodology involving reinforcement learning and a “chain of thought” approach, resulting in improved accuracy and fewer hallucinations, though the issue is not eliminated.
🤔 Which leaves us with this question: What’s a reasoning model?
- These models rely on “chain-of-thought reasoning”: they break a complex task down into simpler intermediate steps before answering. This makes them well suited to tasks in coding, math, and science, for example.
- A good example is the “strawberry test”: counting how many “r”s appear in the word “strawberry.” Many “ordinary” models fail this test, while reasoning models typically pass it (though some non-reasoning models also find the correct answer); see the sketch after this list.
- Most models are built in two stages: pre-training on large general datasets and fine-tuning with curated, expert-annotated data for specific tasks. "There are indications that some of the o1 AI models were trained on extensive examples of chain-of-thought reasoning that have been annotated by experts." "This raises questions about the extent to which self-improvement, rather than expert-guided training, contributes to its capabilities."
These models take different approaches to the transparency of their chain of thought. For example, “the reflection that o1 performs upon its reasoning is not available to be examined, depriving users of insights into the system’s functioning,” whereas DeepSeek displays its reasoning.
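To illustrate the difference, here is a minimal sketch of separating a model’s visible reasoning from its final answer. It assumes, purely for illustration, that the reasoning is wrapped in `<think>…</think>` tags; actual delimiters and visibility vary by provider.

```python
import re

# Hypothetical raw model output; the <think> delimiters are an assumption
# made for illustration -- providers expose (or hide) reasoning differently.
raw_output = (
    "<think>The word is strawberry: s-t-r-a-w-b-e-r-r-y. "
    "I see an r at positions 3, 8, and 9, so the count is 3.</think>"
    'There are 3 occurrences of the letter "r" in "strawberry".'
)

match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
reasoning = match.group(1) if match else ""
answer = re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()

print("Reasoning:", reasoning)  # the chain of thought, if the model exposes it
print("Answer:", answer)        # the user-facing response
```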
If you want to explore some of these models:
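Here is a minimal sketch using the Hugging Face transformers library to try QwQ locally. The model identifier “Qwen/QwQ-32B-Preview” is the one published at release; treat it and the settings below as assumptions to verify against the model card, and note that a 32-billion-parameter model needs substantial GPU memory (or quantization) to run.

```python
# Minimal sketch: running QwQ locally with Hugging Face transformers.
# Assumes the published model id "Qwen/QwQ-32B-Preview" and enough GPU
# memory for a 32B model; check the model card before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": 'How many "r"s are in the word "strawberry"?'}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

At the time of writing, DeepSeek’s R1 preview was accessible through its chat interface rather than as downloadable weights, so trying it means using DeepSeek’s own site.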
Briefly Noted
A group of news organizations has filed a copyright lawsuit against OpenAI
- They claim that OpenAI's AI models use their content without permission, violating copyright law.
- The suit seeks up to $20,000 in statutory damages per article used by OpenAI; at that rate, 100,000 articles alone would come to $2 billion, which could put the total value of the suit in the billions of dollars.
- The organizations argue that this practice undermines their business and threatens the future of journalism, stating, "We deserve to be compensated for our work."