Fine-Tuning LLMs Using PEFT: Lessons from Fine-Tuning for Text-to-SQL Conversion
With Generative AI rapidly evolving, fine-tuning large language models (LLMs) is becoming a crucial tool for building domain-specific AI solutions. While general-purpose models like GPT-4 and Gemini perform remarkably well across a broad spectrum of tasks, they often struggle with specialized use cases that require deep contextual understanding—such as complex SQL generation from natural language queries.
At our company, we explored fine-tuning an open-source LLM using PEFT (Parameter-Efficient Fine-Tuning) for NL-to-SQL conversion. The goal? Build a model that not only understands SQL syntax but also handles complex queries, multiple joins, nested subqueries, and custom macros—tasks where even the best general-purpose models often fall short.
In this article, I’ll walk through:
✅ When to fine-tune instead of relying on pre-trained LLMs
✅ How PEFT (Parameter-Efficient Fine-Tuning) improves efficiency
✅ Key hyperparameters that impact fine-tuning performance
✅ Our real-world fine-tuning results and what worked best
👉 Note: This article has been reviewed and approved for publication by my company, ensuring it aligns with our best practices and insights gained from real-world projects.
Why Fine-Tune Instead of Using GPT-4 or Gemini?
Before diving into the fine-tuning details, let’s answer a fundamental question: why fine-tune at all?
✅ Cost-Efficiency: Running a 7B parameter LLM in-house is significantly cheaper than making repeated API calls to proprietary models. Fine-tuning allows better control over inference costs.
✅ Faster Inference: In our tests, the fine-tuned model delivered responses 1.5x to 2x faster than GPT-4 on complex SQL generation tasks.
✅ Better Performance for Domain-Specific Tasks: Large models generalize well, but when the task requires deep domain-specific expertise (e.g., enterprise SQL patterns, custom macros, proprietary functions), fine-tuning yields far superior results.
❌ When Fine-Tuning is NOT Ideal: If the model has no prior exposure to a task (e.g., generating code in an entirely new programming language), fine-tuning might lead to hallucinations and conflicts with pre-trained knowledge.
In our case, since LLMs already understand SQL syntax, fine-tuning was the perfect approach to refine and enhance the model’s ability to generate complex queries.
PEFT Fine-Tuning Explained Simply (LoRA Approach)
Fine-tuning an entire LLM is computationally expensive. Instead, PEFT (Parameter-Efficient Fine-Tuning) allows us to update only a fraction of the model’s weights, making fine-tuning efficient.
🔹 Instead of updating a full weight matrix (W) of the model, LoRA adds two small low-rank matrices (A and B) whose product approximates the weight update, using far fewer trainable parameters.
🔹 During training, only A and B are updated, while the original model weights remain frozen.
🔹 After fine-tuning, the learned update (B·A) is merged back into the original weights, so inference runs as fast as the base model.
💡 Result? Faster, more memory-efficient training while retaining the power of the base model. This method is particularly effective for LLM adaptation in enterprise settings.
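To make the idea concrete, here is a minimal sketch of the LoRA update for a single linear layer. This is illustrative only: the dimensions, rank, and scaling value below are made up, and in practice the peft library implements all of this for you.

```python
import torch

d_out, d_in, r = 4096, 4096, 16      # illustrative dimensions and rank
W = torch.randn(d_out, d_in)         # frozen pretrained weight matrix
A = torch.randn(r, d_in) * 0.01      # small trainable matrix A (r x d_in)
B = torch.zeros(d_out, r)            # small trainable matrix B (d_out x r), starts at zero
alpha = 16                           # scaling factor (lora_alpha)

def lora_forward(x):
    # Base projection plus the low-rank update; only A and B receive gradients.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

# After training, the update can be folded back into W:
W_merged = W + (alpha / r) * (B @ A)
```

Because only A and B are trained, the gradients and optimizer state are tiny compared with full fine-tuning.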
Fine-Tuning Our LLM for Text-to-SQL
1️⃣ Loading the Model & Tokenizer
We used BitsAndBytesConfig for 4-bit quantization, which cuts memory usage sharply while keeping generation quality largely intact.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization; compute happens in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# model_name points to the open-source base model (a 7B-parameter checkpoint in our case)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,            # 4-bit quantization (do not also pass load_in_8bit)
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token_id = tokenizer.eos_token_id   # reuse EOS as the padding token
tokenizer.padding_side = 'right'                  # pad on the right for causal LM training
```
📌 Key takeaways: 4-bit NF4 quantization with double quantization sharply reduces memory usage, bfloat16 is used for compute, and the tokenizer reuses the EOS token as the pad token with right-side padding for causal LM training.
2️⃣ Preprocessing & Preparing Training Data
We structured the dataset into three columns: the natural-language question, the relevant database schema, and the target SQL query.
Each training example was formatted into a structured prompt, such as:
### Task
Generate a SQL query to answer the following question:
`How many hotels are available in each city?`
### Database Schema
CREATE TABLE Hotels(
HotelID INT PRIMARY KEY,
HotelName VARCHAR(30),
City VARCHAR(30)
);
### Answer
```sql
SELECT Hotels.City, COUNT(*)
FROM Hotels
GROUP BY Hotels.City;
```
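As a reference, here is a minimal sketch of how such a prompt can be assembled from the three dataset columns. The column names (question, schema, answer) are illustrative, not necessarily the exact names we used.

```python
SQL_FENCE = "`" * 3  # three backticks, built this way so they don't close this code block

def build_prompt(example):
    # Assemble the structured prompt from one dataset row.
    return (
        "### Task\n"
        "Generate a SQL query to answer the following question:\n"
        f"`{example['question']}`\n\n"
        "### Database Schema\n"
        f"{example['schema']}\n\n"
        "### Answer\n"
        f"{SQL_FENCE}sql\n{example['answer']}\n{SQL_FENCE}"
    )

# Applied to a Hugging Face dataset, e.g.:
# dataset = dataset.map(lambda ex: {"text": build_prompt(ex)})
```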
3️⃣ Defining LoRA Adapters & Hyperparameters
Choosing the **right hyperparameters** was crucial.
```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

peft_config = LoraConfig(
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "lm_head"],  # attention projections + output head
    lora_alpha=128,       # scaling factor; with r=128 the effective scale (alpha/r) is 1
    lora_dropout=0.05,    # light regularization on the adapter layers
    r=128,                # rank of the low-rank matrices A and B
    bias="none",
    task_type="CAUSAL_LM"
)

model = prepare_model_for_kbit_training(model)   # prepare the quantized model for training (fp32 norms, gradient checkpointing)
model = get_peft_model(model, peft_config)       # wrap the base model with LoRA adapters
```
📌 Key Learnings: Targeting all four attention projections plus lm_head gives the adapters broad coverage of the model, a rank of 128 paired with lora_alpha=128 keeps the update scale at 1, and a small dropout (0.05) adds light regularization.
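A quick sanity check with PEFT's built-in helper confirms that only a small fraction of the weights will actually be trained (the exact percentage depends on the base model):

```python
# Prints something like: trainable params: ... || all params: ... || trainable%: ...
model.print_trainable_parameters()
```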
4️⃣ Training the Model
```python
from transformers import TrainingArguments
from trl import SFTTrainer

args = TrainingArguments(
    output_dir="./output_model_dir",
    num_train_epochs=4,
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    logging_steps=500,
    save_steps=500,
    evaluation_strategy="steps"
)

# dataset is the prompt-formatted train/test split prepared in step 2
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=args
)

trainer.train()
```
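After training, the LoRA adapters can be merged back into the base weights, as described earlier. A minimal sketch of the usual pattern follows; the paths are illustrative, and the base model is reloaded un-quantized before merging, which is the common approach when the model was trained in 4-bit.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Save only the small adapter weights produced by training
trainer.save_model("./output_model_dir/adapter")

# Reload the base model in bf16 and merge the adapter into it for standalone inference
base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
merged = PeftModel.from_pretrained(base, "./output_model_dir/adapter").merge_and_unload()
merged.save_pretrained("./output_model_dir/merged")
tokenizer.save_pretrained("./output_model_dir/merged")
```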
🛠 Lessons from training:
🚀 Results: How Our Fine-Tuned Model Performed
✅ More Accurate SQL Generation: The fine-tuned model outperformed GPT-4 on complex queries involving multiple joins, nested subqueries, and our custom macros.
✅ 2x Faster Inference: Queries returned in 1 to 1.5 seconds, much faster than GPT-4.
✅ Better Handling of Enterprise-Specific Macros: Custom SQL functions were understood and applied correctly.
Input (Natural Language Query):
"Get the top 5 product categories that generated the highest revenue in the last quarter, including total sales and their percentage contribution to overall revenue."
Output (SQL Query):
```sql
WITH category_sales AS (
    SELECT
        c.category_name,
        SUM(s.sales_amount) AS total_sales
    FROM sales s
    JOIN products p ON s.product_id = p.product_id
    JOIN categories c ON p.category_id = c.category_id
    WHERE s.sale_date BETWEEN '2024-10-01' AND '2024-12-31'
    GROUP BY c.category_name
),
total_revenue AS (
    SELECT SUM(total_sales) AS overall_sales FROM category_sales
)
SELECT
    cs.category_name,
    cs.total_sales,
    ROUND((cs.total_sales / tr.overall_sales) * 100, 2) AS percentage_contribution
FROM category_sales cs
CROSS JOIN total_revenue tr
ORDER BY cs.total_sales DESC
LIMIT 5;
```
⚠️ Limitations & Next Steps
Final Thoughts
💡 Fine-tuning LLMs is a game-changer for domain-specific applications. If your use case involves structured queries, complex reasoning, or domain expertise, fine-tuning a PEFT-based model can outperform generic LLMs like GPT-4 and Gemini while being cheaper and faster.
If you’re exploring fine-tuning, start small, iterate fast, and tune your hyperparameters carefully! 🚀
Would love to hear from others working on fine-tuning LLMs—what has your experience been like? 👇✨
#LLM #AI #FineTuning #PEFT #SQL #MachineLearning #NLP