This document summarizes LLaMA, a collection of open and efficient foundation language models ranging from 7B to 65B parameters. [1] LLaMA is trained exclusively on publicly available data, and LLaMA-13B outperforms GPT-3 (175B) on most benchmarks while being small enough to run inference on a single GPU, making it more accessible than other large models. [2] Key architectural aspects of LLaMA include pre-normalization (RMSNorm), the SwiGLU activation function, rotary positional embeddings, and efficient implementation techniques. The largest models were trained on 1.4 trillion tokens of publicly available data, and developing the models used 2048 A100 GPUs over a period of approximately 5 months. [3] Evaluation shows LLaMA matching or outperforming much larger models on common sense reasoning, question answering, reading comprehension, and other benchmarks.
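
To make the architectural choices concrete, here is a minimal PyTorch sketch of the three components named above: RMSNorm pre-normalization, a SwiGLU feed-forward block, and rotary positional embeddings. All class names, dimensions, and the rotate-half formulation of RoPE are illustrative assumptions, not the reference implementation.

```python
# Illustrative sketch of LLaMA-style building blocks (assumed names/shapes,
# not the reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Pre-normalization: scale the input by its root-mean-square."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight


class SwiGLU(nn.Module):
    """Feed-forward block using the SwiGLU activation: silu(xW) * (xV)."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w = nn.Linear(dim, hidden_dim, bias=False)
        self.v = nn.Linear(dim, hidden_dim, bias=False)
        self.out = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.out(F.silu(self.w(x)) * self.v(x))


def apply_rotary(x, base: float = 10000.0):
    """Apply rotary positional embeddings to a (batch, seq, dim) tensor."""
    _, seqlen, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seqlen, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()   # (seq, half)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate pairs of dimensions by a position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)        # (batch, seq, dim)
    h = RMSNorm(64)(x)                # normalize *before* the sub-layer
    q = apply_rotary(h)               # rotary embeddings on queries/keys
    out = SwiGLU(64, 4 * 64)(h)       # SwiGLU feed-forward
    print(q.shape, out.shape)
```

The design intent is that normalization happens on the input of each sub-layer rather than its output, which is reported to improve training stability at scale.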