This document describes a vision-based hand gesture recognition system built on a convolutional neural network (CNN). The system captures images of hand gestures with a camera, pre-processes them, and classifies each gesture with a CNN model. The architecture comprises convolutional layers, max-pooling layers, dropout layers, and fully connected layers. The network was trained on a dataset of images covering 7 distinct hand gestures, and in testing it recognized them with over 90% accuracy. This vision-based approach enables natural human-computer interaction without physical input devices.
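To make the described pipeline concrete, the sketch below traces one forward pass through a convolution, ReLU, max-pooling, and a fully connected softmax layer for 7 gesture classes, using plain NumPy. The input size (32x32), the single 3x3 filter, and the random weights are illustrative assumptions; the actual system uses a deeper trained model with multiple filters and dropout, which this sketch does not reproduce.

```python
import numpy as np

def conv2d(x, k):
    # Valid cross-correlation of a 2-D image with a 2-D kernel,
    # the core operation of a convolutional layer.
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, size=2):
    # Non-overlapping max pooling: keep the largest value in each
    # size-by-size window, halving each spatial dimension for size=2.
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

def softmax(z):
    # Stable softmax: exponentiate shifted logits and normalize to sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((32, 32))            # pre-processed grayscale hand image (assumed size)
kernel = rng.standard_normal((3, 3))    # one 3x3 filter (random stand-in for learned weights)

feat = np.maximum(conv2d(image, kernel), 0)      # convolution + ReLU -> 30x30 feature map
pooled = max_pool(feat)                           # 2x2 max pooling -> 15x15
flat = pooled.ravel()                             # flatten for the fully connected layer

W = rng.standard_normal((7, flat.size)) * 0.01    # dense weights, one row per gesture class
probs = softmax(W @ flat)                         # class probabilities over the 7 gestures

print(probs.shape)   # (7,)
```

In the real system, `kernel` and `W` would be learned by backpropagation over the gesture dataset, and the predicted gesture would be `probs.argmax()`.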