Generative AI: My Adventures with Vibe Coding

We have all been hearing a lot about it. Vibe coding is in discussions everywhere: how it will revolutionize our lives, how generative AI can auto-generate your code and leave technical specialists in the dust.

Vibe coding is intuition driven, with no specific waterfall model for program development. There are no functional or technical specifications; the person and the machine simply chat with each other, over one or more turns, until something works.

To some, it carries negative connotations: a potentially haphazard, undisciplined approach that can lead to inconsistent, buggy code that is hard to maintain, especially in large projects.

I wanted to give it a spin and see for myself what the fun is about. The narrative below covers my experience and what I see coming out of this experiment.


The Setup

To begin testing, I decided to use Grok as my generative AI tool of choice and see where things would go.

For starters, I selected a simpler reinforcement learning algorithm that I felt Grok should have no trouble turning into perfectly executable Python code. The algorithm I wanted to implement was a Deep Q-Network (DQN), and I wanted to train an agent to play tic-tac-toe after learning the game by itself over 10,000 iterations.

I ran all of this on a computer with an Intel i7 CPU and 32 GB of RAM; no GPUs were used. For running the code, I used the Spyder 5.5.1 IDE with Python 3.12.7 inside the Anaconda 3 distribution.
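If you want to reproduce this on a similar setup, a quick sanity check of the environment helps before running any generated code. The snippet below is my own, not something Grok produced; it simply prints the interpreter and library versions and confirms that PyTorch will fall back to the CPU.

import sys
import numpy as np
import torch

print("Python:", sys.version.split()[0])             # expecting 3.12.x per the setup above
print("NumPy:", np.__version__)
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # False on a CPU-only machine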


The Execution

I must start by saying that I am by no means a Python expert, but I have a general understanding of the language. I have some familiarity with the popular machine learning frameworks, including TensorFlow and PyTorch, and some awareness of popular libraries like NumPy, pandas, and scikit-learn. I am not proficient at writing long stretches of code from scratch, but I can analyze what a given piece of code is attempting to do. This makes me a perfect guinea pig for this 'vibe coding' test.

To get started, I gave the LLM the shortest of prompts: "can you vibe code a deep q learning network for playing tic tac toe?"

I was pleasantly surprised that it picked PyTorch (which also happens to be my preferred framework) to build the DQN architecture. The processing was very quick, and Grok's response to my prompt was:

"I'll create a Python implementation of a Deep Q-Learning (DQN) network for playing Tic-Tac-Toe. This will include a Tic-Tac-Toe environment and a DQN agent that learns to play optimally through self-play. The code uses PyTorch for the neural network and NumPy for board state management.

The approach will:

  • Represent the 3x3 Tic-Tac-Toe board as a 9-element vector (0 for empty, 1 for X, -1 for O)
  • Use a neural network with two hidden layers to approximate the Q-function
  • Implement experience replay and epsilon-greedy exploration
  • Train the agent by playing against itself
  • Include a simple environment class to handle game mechanics"

It took Grok less than a minute to generate the entire code for implementing this DQN. Here is the code it produced, for those who may be interested in the technique:

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import random
from collections import deque

# Tic-Tac-Toe Environment
class TicTacToe:
    def __init__(self):
        self.board = np.zeros(9)  # 0: empty, 1: X, -1: O
        self.current_player = 1   # X starts

    def reset(self):
        self.board = np.zeros(9)
        self.current_player = 1
        return self.board

    def available_actions(self):
        return [i for i in range(9) if self.board[i] == 0]

    def make_move(self, action):
        if action in self.available_actions():
            self.board[action] = self.current_player
            self.current_player = -self.current_player
            return True
        return False

    def check_winner(self):
        # Check rows, columns, diagonals
        lines = [
            [0, 1, 2], [3, 4, 5], [6, 7, 8],  # Rows
            [0, 3, 6], [1, 4, 7], [2, 5, 8],  # Columns
            [0, 4, 8], [2, 4, 6]              # Diagonals
        ]
        for line in lines:
            if abs(sum(self.board[line])) == 3:
                return self.board[line[0]]  # 1 if X wins, -1 if O wins
        if len(self.available_actions()) == 0:
            return 0  # Draw
        return None  # Game ongoing

    def get_state(self):
        return self.board.copy()

# DQN Neural Network
class DQN(nn.Module):
    def __init__(self, input_size=9, hidden_size=128, output_size=9):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# DQN Agent
class DQNAgent:
    def __init__(self):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = DQN().to(self.device)
        self.target_model = DQN().to(self.device)
        self.target_model.load_state_dict(self.model.state_dict())
        self.optimizer = optim.Adam(self.model.parameters(), lr=0.001)
        self.memory = deque(maxlen=10000)
        self.gamma = 0.99  # Discount factor
        self.epsilon = 1.0  # Exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.batch_size = 32
        self.target_update = 100  # Update target network every 100 steps

    def select_action(self, state, env, training=True):
        state = torch.FloatTensor(state).to(self.device)
        available_actions = env.available_actions()
        
        if training and random.random() < self.epsilon:
            return random.choice(available_actions)
        
        with torch.no_grad():
            q_values = self.model(state)
            # Mask invalid actions
            q_values = q_values.cpu().numpy()
            q_values[[i for i in range(9) if i not in available_actions]] = -float('inf')
            return np.argmax(q_values)

    def store_transition(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def train(self):
        if len(self.memory) < self.batch_size:
            return
        
        batch = random.sample(self.memory, self.batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)

        states = torch.FloatTensor(states).to(self.device)
        actions = torch.LongTensor(actions).to(self.device)
        rewards = torch.FloatTensor(rewards).to(self.device)
        next_states = torch.FloatTensor(next_states).to(self.device)
        dones = torch.FloatTensor(dones).to(self.device)

        # Current Q values
        current_q = self.model(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        
        # Next Q values from target model
        with torch.no_grad():
            next_q = self.target_model(next_states).max(1)[0]
            target_q = rewards + (1 - dones) * self.gamma * next_q

        # Compute loss
        loss = nn.MSELoss()(current_q, target_q)

        # Optimize
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # Decay epsilon
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

# Training Loop
def train_dqn(episodes=10000):
    env = TicTacToe()
    agent = DQNAgent()
    step_counter = 0

    for episode in range(episodes):
        state = env.reset()
        done = False
        total_reward = 0

        while not done:
            # X's turn (agent)
            action = agent.select_action(state, env)
            env.make_move(action)
            next_state = env.get_state()
            winner = env.check_winner()

            if winner is not None:
                done = True
                if winner == 1:  # X wins
                    reward = 1
                elif winner == -1:  # O wins
                    reward = -1
                else:  # Draw
                    reward = 0
            else:
                # O's turn (random opponent)
                opponent_action = random.choice(env.available_actions())
                env.make_move(opponent_action)
                next_state = env.get_state()
                winner = env.check_winner()
                reward = 0
                if winner == -1:  # O wins
                    reward = -1
                    done = True
                elif winner == 0:  # Draw
                    reward = 0
                    done = True

            agent.store_transition(state, action, reward, next_state, done)
            agent.train()
            state = next_state
            total_reward += reward
            step_counter += 1

            # Update target network
            if step_counter % agent.target_update == 0:
                agent.target_model.load_state_dict(agent.model.state_dict())

        if episode % 1000 == 0:
            print(f"Episode {episode}, Total Reward: {total_reward}, Epsilon: {agent.epsilon:.3f}")

    return agent

# Function to play a game against the trained agent
def play_game(agent):
    env = TicTacToe()
    state = env.reset()
    done = False

    while not done:
        print_board(state)
        if env.current_player == 1:
            action = agent.select_action(state, env, training=False)
            env.make_move(action)
        else:
            print("Your turn (O). Enter position (0-8):")
            action = int(input())
            while action not in env.available_actions():
                print("Invalid move. Try again:")
                action = int(input())
            env.make_move(action)

        state = env.get_state()
        winner = env.check_winner()
        if winner is not None:
            done = True
            print_board(state)
            if winner == 1:
                print("X (AI) wins!")
            elif winner == -1:
                print("O (You) win!")
            else:
                print("It's a draw!")

def print_board(board):
    chars = {1: 'X', -1: 'O', 0: ' '}
    for i in range(0, 9, 3):
        print(f"{chars[board[i]]} | {chars[board[i+1]]} | {chars[board[i+2]]}")
        if i < 6:
            print("-" * 9)

if __name__ == "__main__":
    # Train the agent
    trained_agent = train_dqn(episodes=10000)
    # Play a game against the trained agent
    print("Let's play Tic-Tac-Toe! You are O, AI is X.")
    play_game(trained_agent)
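
One small addition I would suggest, which is my own and not part of Grok's output: saving the trained weights so the training run does not have to be repeated every time you want to play. The file name used here is an arbitrary choice.

# Save the trained network's weights after train_dqn() finishes
torch.save(trained_agent.model.state_dict(), "tictactoe_dqn.pt")

# Later, rebuild an agent and load the saved weights back in
agent = DQNAgent()
agent.model.load_state_dict(torch.load("tictactoe_dqn.pt", map_location=agent.device))
agent.epsilon = 0.0  # disable exploration when just playing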
        

Did it Run?

At first, no! Upon executing the code as is, the interpreter threw an error. I copied the error right back into Grok, which recognized the mistake, identified the errant line, and provided the corrected code for the fix.

It took about 5 minutes to resolve this error, and the program executed on the second attempt. The agent took about 20 minutes to train itself on the CPU and was ready to play soon after. I played several games against the agent and was fairly impressed with the roughly 50-50 split between us. Your mileage will vary; it also depends on how many episodes you let the agent train.
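If you want to put a number on how well the agent plays rather than eyeballing a few games, a small evaluation loop helps. The sketch below is my own addition rather than Grok's output; it assumes the TicTacToe and DQNAgent classes from the generated code above and pits the trained agent (X) against a random opponent (O) over many games.

import random

def evaluate(agent, games=500):
    # Tally wins, draws, and losses for the agent playing X against a random O
    results = {"win": 0, "draw": 0, "loss": 0}
    for _ in range(games):
        env = TicTacToe()
        state = env.reset()
        winner = None
        while winner is None:
            if env.current_player == 1:
                # Agent plays greedily, no exploration
                action = agent.select_action(state, env, training=False)
            else:
                # Random opponent
                action = random.choice(env.available_actions())
            env.make_move(action)
            state = env.get_state()
            winner = env.check_winner()
        if winner == 1:
            results["win"] += 1
        elif winner == -1:
            results["loss"] += 1
        else:
            results["draw"] += 1
    print(results)

# Example usage after training:
# evaluate(trained_agent, games=500)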

Findings

This experience validated my sense that the technology can get code generation off the ground without programmers, but that it is prone to errors. This was one of the simplest programs, and I was able to get it running relatively quickly.

For more complex architectures, even with DQN (such as the Lunar Lander environment in OpenAI Gym), there were far more errors. Fixing one error at a time would cause cascading errors in other classes within the program. Feeling brave, I also attempted the more advanced Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) architectures to see how far I could get. The number of errors I encountered there was even worse, and it required a much greater level of intervention to get the code to work. I had to figure out the corrections myself, as the LLM started repeating itself after a while.

Conclusion

Vibe coding stands in sharp contrast with structured programming approaches that encourage planning, testing protocols, specifications, and collaboration. The larger the project, the more imperative it becomes for teams to formalize their approach.

While intuition is valuable for an experienced developer, relying solely on vibes in a complex project may not be a good idea. I still see areas where this approach could give a rapid start on designing the simpler elements that make up a more complex project.

