xAI Unveils Grok 3, Fine-Tuned LLMs Dominate Text-to-SQL

Fine-Tuned by Genloop - #4

Dear Readers,

Welcome to Edition 4 of Fine-Tuned by Genloop – your go-to guide for the latest in LLM customization. Last week, we released a deep dive on Text-to-SQL, packed with insights from our enterprise experience. The response has been incredible! If you haven’t checked it out yet, we've got a summary waiting for you in our top blogs section.

In this edition, we cover xAI’s launch of Grok 3, Perplexity’s open-sourcing of DeepSeek-R1, OpenAI’s roadmap for GPT-4.5 and 5, and key takeaways from the Paris AI Summit.

GenAI is evolving at lightning speed—let’s dive into the biggest developments from the past two weeks!

🌟 AI Industry Highlights

1. xAI Unveils Grok 3 with Advanced Reasoning

xAI on Monday unveiled its updated Grok 3 artificial intelligence model, as the Elon Musk-led startup pushes to keep pace with competitors' advanced reasoning and search capabilities.

Key developments:

Performance Claims: xAI states Grok 3 outperforms Google's Gemini, OpenAI's GPT-4o, Anthropic's Claude 3.5, and DeepSeek's V3 across math, science, and coding benchmarks
New Features: The model introduces advanced web searching with "deep search," online game coding capabilities, and a "big brain" mode for complex reasoning
Immediate Availability: Now available to X Premium+ subscribers ($40/month) or directly through Grok's standalone platforms

Musk referred to Grok 3 as "kind of a beta" and promised rapid improvements. He also teased an upcoming voice mode similar to conversational features in competing apps. The release comes amid Musk's growing AI ambitions, including his recent $97 billion offer to buy OpenAI and his promise to open-source Grok 2's code when Grok 3 is "mature and stable" in the coming months.

2. Perplexity Open-Sources Uncensored DeepSeek-R1 Model

Perplexity has open-sourced R1 1776, a version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information. While the original DeepSeek-R1 achieved performance close to state-of-the-art reasoning models like o1 and o3-mini, it was limited by its refusal to respond to sensitive topics, especially those censored by the Chinese Communist Party.

Key points:

Censorship Limitations: The original model would ignore questions about sensitive topics and respond with canned CCP talking points
Post-Training Approach: Perplexity collected ~40k multilingual prompts on 300 censored topics, ensuring users had explicitly given permission to train on this data
Implementation Challenge: A major hurdle was gathering factual responses with valid chain-of-thought reasoning traces for censored prompts

This development helps unlock R1's powerful reasoning capabilities while mitigating bias and censorship, making advanced AI reasoning more widely accessible.

3. OpenAI’s GPT-4.5 and GPT-5 Roadmap

OpenAI has revealed plans for its next-generation models, confirming that GPT-4.5 (codename: Orion) will be its last non-chain-of-thought model, paving the way for the upcoming GPT-5, which promises to unify reasoning and language capabilities.

What’s Changing?

GPT-4.5 (Orion) – The final iteration before a fundamental shift towards deep reasoning models.
GPT-5 – A router model that intelligently delegates tasks to the appropriate sub-models.
o3 will no longer be a standalone model, instead merging into GPT-5 within ChatGPT.

This move validates Ilya Sutskever’s earlier prediction that pre-training alone is no longer enough. Scaling compute has reached its limits, and the industry must explore new paradigms. However, what happens to controls and determinism requirements like SLAs in enterprise applications? Given a question, can I not be sure how soon the model will answer? We are yet to see. There is more work to be done.

4. Google Makes Gemini 2.0 Available to All

Google has made its latest AI model, dubbed Gemini 2.0, available to all. The Gemini 2.0 lineup includes three models:

Gemini 2.0 Flash – A high-performance yet cost-effective model, now generally available.
Gemini 2.0 Flash-Lite – A budget-friendly variant aimed at wider accessibility.
Gemini 2.0 Pro – The most advanced model, optimized for coding and complex reasoning tasks.

Key Upgrades:

2-million token context window in Gemini 2.0 Pro.
Built-in tool use, including Google Search integration.
Enhanced multimodal capabilities, enabling understanding of images, video, and audio.
Improved cost efficiency, eliminating pricing differences between short and long prompts.

Notably, Google’s experimental “thinking” model saw significant gains, scoring 73.3% on AIME (an advanced math competition) and 74.2% on GPQA Diamond (complex science questions). It is currently the most used model of the week on OpenRouter.

5. World Powers Shift AI Regulation at Paris Summit

The AI Action Summit in Paris highlighted growing global divides over AI governance. Unlike past summits that focused on existential risks, this event saw a pivot toward investment and competition.

Key Takeaways:

The U.S. and U.K. refused to sign agreements on global AI governance, military AI restrictions, and algorithmic bias.
Only 26 out of 60 nations agreed to limit autonomous military AI, signaling a lack of global consensus.
France pledged $114B to AI startups and infrastructure, while the EU announced a $210B initiative to boost technological self-sufficiency.
The EU withdrew the AI “liability directive”, opting for a pro-business stance to compete with the U.S. and China.

Why It Matters:

The regulatory shift signals a focus on AI-driven economic growth rather than restrictive oversight.
Governments are moving beyond doomsday AI narratives and toward practical strategies for managing security, bias, and innovation.

We are optimistically following how these policies shape the AI landscape.

6. Humane's AI Pin Discontinued as HP Buys Assets for $116M

Humane announced on Tuesday that HP has acquired most of its assets for $116 million, bringing an abrupt end to its short-lived AI Pin. This serves as a stark reminder that just applying AI to anything doesn't automatically make it successful - product-market fit and real utility remain essential.

Key points:

Complete Shutdown: After February 28, AI Pins will no longer connect to Humane's servers, disabling calling, messaging, AI queries/responses, and cloud access
HP Acquisition Focus: HP is acquiring Humane's engineers, product managers, and technology (including its CosmOS AI operating system)
New Direction: The Humane team will form "HP IQ," an AI innovation lab focused on building intelligent ecosystems across HP products and services

This acquisition marks a dramatic shift from Humane's original aspirations. The company had previously sought between $750 million and $1 billion in acquisition offers last May. The AI Pin faced significant challenges since its April 2024 launch, including disappointing reviews, more returns than sales by last summer, battery fire concerns, and a $200 price drop in October.

📚 Featured Blog Posts

We've got two fascinating reads that showcase how the AI landscape is evolving:

1. Text to SQL: The Ultimate Guide for 2025

Text-to-SQL is a popular GenAI use case, where we see enterprises struggling to achieve high accuracy despite trying multiple approaches. We discovered a more effective solution through fine-tuning.

Key points:

Current Approaches Fall Short: Using top models like O1, RAG with GPT-4o, or agents hit an 85% accuracy ceiling with 20+ second response times
Fine-Tuning Breakthrough: Fine-tuning open-weight LLMs on business-specific query-SQL pairs achieved 95% accuracy with under 7-second responses
Simpler Engineering: The approach eliminated complex failure recovery needs while retaining domain memory

We've compiled a comprehensive comparison of all approaches to help you choose the best solution for your needs. We're happy to discuss specifics in a 1-1 chat. Feel free to schedule a time here.

Read the complete guide

2. Highlights of NeurIPS 2024

The 38th NeurIPS Conference reaffirmed its position as the leading AI research event, drawing record attendance with over 4,000 accepted papers, 56 workshops, and 14 tutorials at the Vancouver Convention Center. We've documented our key learnings and highlights to share with you. Better late than never!

Key points:

Sutskever's Bold Prediction: OpenAI co-founder declared "pre-training as we know it will unquestionably end" since "we have but one internet," suggesting alternative data generation approaches
Groundbreaking Research: Best Paper Awards recognized innovations in visual autoregressive modeling, neural networks with higher-order derivatives, and LLM training improvements
AI-Assisted Publishing: Experimental "Checklist Assistant" helped 70% of authors improve submissions while highlighting both strengths and limitations of AI in academic publishing

Read our full conference breakdown

🔬 Research Corner

Our team has been diving deep into groundbreaking research papers, and two particularly caught our attention:

1. SmolLM2 Training Report

Hugging Face's SmolLM2, a 1.7B parameter language model, achieves remarkable performance through a data-centric training strategy. The team placed significant emphasis on data quality, employing 18 customized SLMs for data processing.

Key highlights:

Massive Data Training: SmolLM2 is trained on 11T tokens (5.5T tokens, 2 epochs), enabling it to outperform similarly sized models like Qwen2.5-1.5B and Llama3.2-1B
Iterative Data Rebalancing: Dataset adjustments after each phase optimize generalization and prevent overfitting to low-quality sources, with higher-quality datasets introduced in later stages
Strategic Dataset Development: Three new datasets (FineMath, Stack-Edu, and SmolTalk) were introduced to improve reasoning and instruction-following capabilities

This research highlights how smaller models can remain competitive with strategic data selection and training methodologies. We believe enterprises will soon feasibly train their own SLMs from scratch for domain-adapted advantages.

2. AlphaGeometry2: AI Surpassing Olympiad Gold Medalists

Google DeepMind's AlphaGeometry2 represents a major leap in AI-driven mathematical reasoning. This new version significantly improves on the original, now solving 84% of International Math Olympiad (IMO) geometry problems—outperforming an average IMO gold medalist.

Key highlights that caught our attention:

Expanded Problem Scope: AlphaGeometry2 extends its domain language to handle more complex geometry problems, including locus theorems, linear equations, and non-constructive proofs, increasing IMO problem coverage from 66% to 88%
Optimized Architecture: A faster C++-based symbolic engine, refined rule set, and novel multi-tree search with knowledge sharing have boosted problem-solving efficiency, improving the solve rate from 54% to 84%
Advanced Auto-Formalization: The system converts natural language problems into structured format using Gemini models in a two-step process - generating multiple formalized versions with few-shot prompting, then refining them into a final structured representation

This work showcases how AI is advancing beyond pattern recognition into structured mathematical reasoning, bringing us closer to AI systems capable of higher-level abstract thinking.

Looking Forward

The AI landscape is experiencing an unprecedented surge in development, and its trajectory promises to become even more captivating in the coming days. We are witnessing remarkable technical advancements. However, the true challenge lies in harnessing domain intelligence on top of general intelligence, in developing models that possess a deep understanding of business domains. Our text-to-SQL study underscores the pivotal role that this aspect will play in putting GenAI to production.

Thank you for reading! Share your thoughts with us, and don't forget to subscribe to stay updated on the latest in LLM customization.

About Genloop

Genloop delivers customized LLMs that provide unmatched cost, control, simplicity, and performance for production enterprise applications. Please visit genloop.ai, catch us on Linkedin, or email founder@genloop.ai for more details.

Stay Curious,

The Genloop Team

Fine-Tuned by Genloop - #4

🌟 AI Industry Highlights

1. xAI Unveils Grok 3 with Advanced Reasoning

Key developments:

2. Perplexity Open-Sources Uncensored DeepSeek-R1 Model

Key points:

3. OpenAI’s GPT-4.5 and GPT-5 Roadmap

What’s Changing?

4. Google Makes Gemini 2.0 Available to All

Key Upgrades:

5. World Powers Shift AI Regulation at Paris Summit

Key Takeaways:

Why It Matters:

6. Humane's AI Pin Discontinued as HP Buys Assets for $116M

Key points:

Recommended by LinkedIn

📚 Featured Blog Posts

1. Text to SQL: The Ultimate Guide for 2025

Key points:

2. Highlights of NeurIPS 2024

Key points:

🔬 Research Corner

1. SmolLM2 Training Report

2. AlphaGeometry2: AI Surpassing Olympiad Gold Medalists

Key highlights that caught our attention:

Looking Forward

About Genloop

Fine-Tuned by Genloop

499 followers

More articles by Ayush Gupta

Qwen3's Hybrid Thinking Capabilities Set New Standards

The Intelligence Edge: Why Custom LLMs Go Beyond Privacy

Lessons from China’s DeepSeek Adoption

Llama4 and DeepSeek Inference keep the Open-Source Momentum

GPT-4o Image Generator Goes Viral While Raising Copyright Concerns

Google Shrinks AI: Gemma 3 Packs Llama’s Power in 1/3rd the Size

Qwen and DeepSeek Lead AI Wave as OpenAI and Anthropic Falter

DeepSeek’s Impact Reshapes AI, Markets, and Global Power

OpenAI’s Stargate Bet while DeepSeek R1 Closes the Gap

US AI Export Restrictions Will Push the Rise of Domain Memory Agents

Insights from the community

Others also viewed

Experiments in summarization using LLMs

OpenAI’s $40 Billion Bet: How Data-Driven AI is Poised to Transform Our World"

OpenAI O3: AGI is Finally Here

Will Anthropic’s MCP Become the Backbone of Agentic AI ?

Weekly Dose of AI: OpenAI’s Open-Weight Move, Amazon’s Nova Act, & 5 Must-Try AI Tools

Trends in AI — February 2025: Reasoning Models

📫 AI in the News: Reasoning Models Make Waves—But What Are They Really?

Google Launches Gemini 2.5 Pro: A Major Leap in AI Reasoning Capabilities

Despite the drama, OpenAI is still the most impactful technology to watch.

Phased Approach | Are Mini Models Good for Business?

Explore topics