Statistics

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).[7] Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.
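
The short Python sketch below is a minimal illustration of these two sets of properties; the sample values are made up, and only the standard library is used.

    import statistics

    sample = [2.3, 2.9, 3.1, 2.7, 3.4, 2.8, 3.0]  # hypothetical measurements

    # Central tendency (location): the distribution's typical value.
    mean = statistics.mean(sample)
    median = statistics.median(sample)

    # Dispersion (variability): how far members depart from the center.
    std_dev = statistics.stdev(sample)      # sample standard deviation
    variance = statistics.variance(sample)

    print(f"mean={mean:.3f}, median={median:.3f}")
    print(f"stdev={std_dev:.3f}, variance={variance:.3f}")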

A standard statistical procedure involves the collection of data leading to a test of the relationship between two statistical data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and it is compared, as an alternative, to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, giving a "false positive") and Type II errors (the null hypothesis fails to be rejected when an actual relationship between populations exists, giving a "false negative").[8] Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.[7]
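
The sketch below walks through this procedure on simulated data, assuming NumPy and SciPy are available: two data sets are compared under a null hypothesis of no difference in means, and the significance level alpha plays the role of the accepted Type I error rate.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    group_a = rng.normal(loc=10.0, scale=2.0, size=50)  # first data set (simulated)
    group_b = rng.normal(loc=11.0, scale=2.0, size=50)  # second data set (simulated)

    # Null hypothesis H0: the two groups share the same mean (no relationship).
    t_stat, p_value = stats.ttest_ind(group_a, group_b)

    alpha = 0.05  # significance level = accepted Type I error rate
    if p_value < alpha:
        print(f"p={p_value:.4f} < {alpha}: reject H0 (a false positive is still possible)")
    else:
        print(f"p={p_value:.4f} >= {alpha}: fail to reject H0 (a false negative is still possible)")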

Statistical measurement processes are also prone to error in the data they generate. Many of these errors are classified as random (noise) or systematic (bias), but other types (e.g., blunders, such as when an analyst reports incorrect units) can also occur. The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems.
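
As a small illustration of how such problems can bias an estimate, the sketch below simulates censoring: values above a hypothetical detection limit are never observed, so the naive mean of the observed data understates the true mean.

    import numpy as np

    rng = np.random.default_rng(1)
    true_values = rng.normal(loc=100.0, scale=15.0, size=10_000)  # simulated population

    detection_limit = 110.0                                 # hypothetical censoring threshold
    observed = true_values[true_values <= detection_limit]  # values above the limit are lost

    print(f"true mean     : {true_values.mean():.2f}")
    print(f"observed mean : {observed.mean():.2f}  (biased low by censoring)")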
