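Data pre-processing is an important step that cleans, normalizes, and transforms raw data and performs feature extraction and selection to produce the final training set. It addresses common problems with real-world data, which is often incomplete (missing values), noisy (errors and outliers), and inconsistent (conflicting codes or formats). The main tasks are data cleaning, integration, transformation, reduction, and discretization. Data cleaning fills in missing values and identifies or removes outliers. Data integration combines data from multiple sources into one coherent data set. Data transformation normalizes, aggregates, and generalizes the data, for example by scaling values to a common range or replacing raw values with higher-level concepts. Data reduction decreases the number of attributes and tuples through binning, clustering, sampling, and other techniques. Discretization replaces continuous values with a small number of intervals or labels.

Data mining tools include traditional data mining programs, dashboards that monitor business performance, and text-mining tools that extract information from both structured and unstructured data.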
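The two cleaning operations mentioned above can be sketched with Python's standard library. This is a minimal illustration, not a prescribed method: it assumes missing values are represented as `None`, fills them with the median of the observed values (chosen over the mean because the mean would be pulled upward by the outlier), and treats anything outside Tukey's fences (1.5 times the interquartile range beyond the quartiles) as an outlier. The function names and the sample data are hypothetical.

```python
from statistics import median, quantiles

def fill_missing_with_median(values):
    """Replace None entries with the median of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def remove_iqr_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical readings with two missing entries and one obvious outlier (250.0).
raw = [12.0, 15.0, None, 14.0, 13.0, 250.0, None, 16.0]
filled = fill_missing_with_median(raw)   # None -> 14.5
cleaned = remove_iqr_outliers(filled)    # 250.0 is dropped
print(cleaned)
```

Filling before removing outliers is a design choice here; in practice one might instead flag outliers as missing first and then impute, so that the outlier cannot influence the fill value at all.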
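Normalization, one of the transformation steps listed above, is commonly done with min-max scaling or z-score standardization. The sketch below implements both in plain Python under the assumption that the attribute is a list of numbers; the function names and the `incomes` data are hypothetical.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # constant attribute: every value maps to new_min
        return [new_min for _ in values]
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

def z_score(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

incomes = [20000, 35000, 50000, 80000]  # hypothetical attribute values
scaled = min_max(incomes)               # [0.0, 0.25, 0.5, 1.0]
standardized = z_score(incomes)
print(scaled, standardized)
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers, which is one reason cleaning usually comes before normalization.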
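Two of the reduction techniques named above, binning and sampling, can be sketched briefly. This example assumes equal-frequency binning followed by smoothing by bin means (each value is replaced by its bin's mean, reducing the number of distinct values), and simple random sampling without replacement to reduce the number of tuples. The helper names, the `prices` data, and the fixed seed are illustrative assumptions; clustering is omitted to keep the sketch short.

```python
import random
from statistics import mean

def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)  # spread any remainder over the first bins
        bins.append(data[start:end])
        start = end
    return bins

def smooth_by_bin_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[mean(b)] * len(b) for b in bins]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical attribute values
bins = equal_frequency_bins(prices, 3)        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
smoothed = smooth_by_bin_means(bins)          # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]

# Simple random sample of 4 tuples without replacement; the seed only makes runs repeatable.
rng = random.Random(42)
sample = rng.sample(prices, k=4)
print(bins, smoothed, sample)
```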
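Discretization, and the related idea of generalizing raw values to higher-level concepts, can be illustrated with equal-width intervals. In this sketch the range of the attribute is split into `n_bins` intervals of equal width, each value is mapped to its interval index, and the indices are then replaced by concept labels. The function name, the `ages` data, and the labels are hypothetical; real concept hierarchies are usually defined by domain experts rather than by equal widths.

```python
def equal_width_bin_indices(values, n_bins):
    """Map each value to the index (0 .. n_bins-1) of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # all values identical: put everything in one bin
        return [0] * len(values)
    # v >= lo, so the index is never negative; the maximum value would
    # land at index n_bins, so clamp it into the last interval.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [15, 22, 30, 38, 45, 52, 60, 75]       # hypothetical attribute values
indices = equal_width_bin_indices(ages, 3)     # intervals [15,35), [35,55), [55,75]
labels = ["young", "middle-aged", "senior"]    # hypothetical concept labels
generalized = [labels[i] for i in indices]
print(indices, generalized)
```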