This document provides an overview of Hadoop: its architecture, installation, configuration, and common commands. It describes the large-scale data challenges that Hadoop addresses by distributing both storage and processing across a cluster of machines. Hadoop's two core components are HDFS for storage and MapReduce for computation. HDFS splits files into blocks spread across the cluster and achieves fault tolerance by replicating each block on multiple nodes, while MapReduce processes large datasets in parallel through a two-phase map and reduce programming model. The document also outlines how to install and configure Hadoop in pseudo-distributed (single-node) and fully distributed (multi-node) modes.
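To make the map and reduce model concrete, here is a minimal sketch of the canonical word-count job using the standard org.apache.hadoop.mapreduce API. The mapper emits a (word, 1) pair for every token, and the reducer sums the counts for each word; class and path names are illustrative, not taken from this document.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: for each input line, emit (word, 1) for every token.
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE);
            }
          }
        }
      }

      // Reduce phase: the framework groups values by key, so each call
      // receives one word and all of its 1s; sum them for the total count.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

A job like this is packaged into a jar and submitted with hadoop jar wordcount.jar WordCount <input> <output>; the framework handles splitting the input, scheduling map tasks near the data, and shuffling intermediate pairs to the reducers.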
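For the pseudo-distributed setup mentioned above, the key difference from a real cluster is that all daemons run on one machine, so a single replica per block suffices. A minimal sketch of the two relevant configuration files follows, assuming a NameNode listening on localhost; the port value is illustrative.

    <!-- core-site.xml: point the default filesystem at the local single-node HDFS -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: one replica per block, since there is only one DataNode -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

In fully distributed mode the same properties point at the real NameNode host, and dfs.replication is typically left at its default of 3 so each block survives the loss of individual nodes.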