MET444-DATA ANALYTICS FOR
ENGINEERS
DATA
• Data is a collection of values that convey information. It can
describe quantity, quality, facts, statistics, or other basic units of
meaning.
• Data can be collected and used to help decision-making. It can also
be information in an electronic form that can be stored and used by
a computer.
Data analytics for engineers-  introduction
DA in manufacturing
Evolution of Analytic Scalability
Analytic scalability
• Analytic scalability is the ability to use data to understand and solve a
large variety of problems. And because problems come in many
forms, analytics must be flexible enough to address problems in
different ways. This might include the use of statistical tools and
forecasting.
• In analytic scalability, we have to pull the data together in a separate
analytics environment and then start performing analysis.
Traditional Analytic Architecture
• Traditional analytics collects data from heterogeneous sources and pulls it all
together into a separate analytics environment for analysis. That environment can be an
analytical server or a personal computer with more computing capability. In such
environments the data must be shipped out of its source systems, which can raise issues
of data security and confidentiality.
Modern In-Database Architecture
• Data from heterogeneous sources is collected, transformed, and loaded
into a data warehouse for final analysis by decision makers.
• The processing stays in the database where the data has been consolidated.
• The data is presented in aggregated form for querying.
• Queries from users are submitted to OLAP (online analytical processing)
engines for execution. Such in-database architectures are evaluated on their
query throughput rather than on transaction throughput, as traditional
database environments are.
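The contrast between pulling data out for analysis and aggregating inside the database can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for a data warehouse, and the table name, column names, and readings are assumptions made up for the example.

```python
import sqlite3

# Hypothetical sensor readings; table and column names are illustrative only.
rows = [
    ("press_1", 20.5),
    ("press_1", 21.5),
    ("lathe_2", 30.0),
    ("lathe_2", 34.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (machine TEXT, temperature REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

# Traditional style: ship every row to the client, then aggregate there.
pulled = conn.execute("SELECT machine, temperature FROM readings").fetchall()
grouped = {}
for machine, temperature in pulled:
    grouped.setdefault(machine, []).append(temperature)
client_side = {m: sum(v) / len(v) for m, v in grouped.items()}

# In-database style: the engine aggregates and returns one row per machine.
in_database = dict(
    conn.execute(
        "SELECT machine, AVG(temperature) FROM readings GROUP BY machine"
    ).fetchall()
)
conn.close()
```

Both approaches give the same averages, but the second moves only one row per machine out of the database instead of every reading.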
Grid Computing
• Grid computing is a form of distributed computing whereby a "super
and virtual computer" is composed of a cluster of networked, loosely
coupled computers, acting in concert to perform very large tasks.
• Grid computing (Foster and Kesselman, 1999) is a growing technology
that facilitates the execution of large-scale, resource-intensive
applications on geographically distributed computing resources.
• Facilitates flexible, secure, coordinated large scale resource sharing
among dynamic collections of individuals, institutions, and resource
Parallel Computing and Distributed Computing
• Parallel Computing:
In parallel computing, multiple processors perform the tasks assigned to
them simultaneously. Memory in parallel systems can be either shared or distributed.
Parallel computing provides concurrency and saves time and money.
• Distributed Computing:
In distributed computing, multiple autonomous computers appear
to the user as a single system. In distributed systems there is no shared memory,
and the computers communicate with each other through message passing. In distributed
computing a single task is divided among different computers.
Parallel Computing vs. Distributed Computing
Parallel Computing | Distributed Computing
Many operations are performed simultaneously | System components are located at different locations
A single computer is required | Uses multiple computers
Multiple processors perform multiple operations | Multiple computers perform multiple operations
It may have shared or distributed memory | It has only distributed memory
Processors communicate with each other through a bus | Computers communicate with each other through message passing
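The idea of dividing one job among several workers can be sketched with Python's standard concurrent.futures module. This is a minimal illustration, not a benchmark: threads are used so the example runs anywhere, whereas CPU-bound work in CPython would usually use ProcessPoolExecutor to get true parallelism. The function name and data are assumptions made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Work done independently by one worker on its share of the data."""
    return sum(x * x for x in chunk)


data = list(range(1, 101))
# Split the data into four chunks, one per worker.
chunks = [data[i::4] for i in range(4)]

# Each worker processes its own chunk; results come back in chunk order.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum_of_squares, chunks))

# Combine the partial results into the final answer.
total = sum(partial_sums)
```

The combined total equals what a single loop over all the data would produce; only the way the work is divided changes.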
Massively Parallel Processing (MPP)
• Massively Parallel Processing (MPP) is the "shared nothing" approach to
parallel computing. It is a type of computing in which many CPUs work
in parallel to execute a single program.
• One of the most significant differences between Symmetric Multi-Processing
(SMP) and Massively Parallel Processing is that with MPP, each
of the many CPUs has its own memory. This helps prevent the
hold-up that can occur with SMP when all
of the CPUs attempt to access the shared memory simultaneously.
Execution of tasks in MPP
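The shared-nothing idea can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions: the Node class, the hash partitioning scheme, and the data are hypothetical, and on a real MPP system each node would run its local step concurrently on its own CPU and memory.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One simulated MPP node: it owns its partition and nothing else."""
    rows: list = field(default_factory=list)

    def local_aggregate(self):
        # Each node scans only its own rows and returns a small summary.
        return sum(self.rows), len(self.rows)


def distribute(values, n_nodes):
    """Hash-partition values across nodes so no row lives on two nodes."""
    nodes = [Node() for _ in range(n_nodes)]
    for v in values:
        nodes[hash(v) % n_nodes].rows.append(v)
    return nodes


def global_mean(nodes):
    """Coordinator step: combine per-node (sum, count) pairs into one mean."""
    partials = [node.local_aggregate() for node in nodes]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


values = list(range(1, 1001))
nodes = distribute(values, n_nodes=4)
mean_value = global_mean(nodes)
```

Only the small (sum, count) summaries travel to the coordinator, so no node ever needs access to another node's memory.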
Cloud Computing
• Cloud computing is the delivery of computing services over the Internet.
Cloud services allow individuals and businesses to use software and
hardware that are managed by third parties at remote locations.
• Examples of cloud services include online file storage, social networking
sites, webmail, and online business applications. The cloud computing
model allows access to information and computer resources from
anywhere that a network connection is available.
• Cloud computing provides a shared pool of resources, including data
storage space, networks, computer processing power, and specialized
corporate and user applications.
Characteristic features of cloud
A cloud service should:
1) Mask the underlying infrastructure from the user
2) Be elastic to scale on demand
3) Be offered on a pay-per-use basis
The National Institute of Standards and Technology (NIST) lists five essential characteristics:
1) On-demand self-service
2) Broad network access
3) Resource pooling
4) Rapid elasticity
5) Measured service
Two types of cloud environment
1. Public Cloud
– The services and infrastructure are provided off-site over the internet
– Greatest level of efficiency in shared resources
– Less secure and more vulnerable than private clouds
2. Private Cloud
– Infrastructure operated solely for a single organization
– The same features of a public cloud
– Offer the greatest level of security and control
– Necessary to purchase and own the entire cloud infrastructure
STATISTICAL CONCEPTS
Sampling Fundamentals
• Sampling may be defined as the selection of some part of an aggregate or
totality on the basis of which a judgment or inference about the aggregate
or totality is made.
• It is the process of obtaining information about an entire population by
examining only a part of it.
• In most research work and surveys, the usual approach is
to make generalisations about the population from a sample.
NEED FOR SAMPLING
1. Sampling can save time and money. A sample study is usually less
expensive than a census study and produces results at a relatively faster
speed.
2. Sampling may enable more accurate measurements, because a sample study is
generally conducted by trained and experienced investigators.
3. Sampling remains the only way when population contains infinitely many
members.
4. Sampling remains the only choice when a test involves the destruction of
the item under study.
5. Sampling usually makes it possible to estimate the sampling errors and, thus,
assists in obtaining information concerning some characteristic of the population.
SOME FUNDAMENTAL DEFINITIONS
• Universe/Population:
• From a statistical point of view, the term ‘Universe’ refers to the total of the items or
units in any field of inquiry, whereas the term ‘population’ refers to the total of items
about which information is desired. The attributes that are the object of study are
referred to as characteristics, and the units possessing them are called elementary
units.
• The population or universe can be finite or infinite. The population is said to be finite if
it consists of a fixed number of elements so that it is possible to enumerate it in its
totality. For instance, the population of a city, the number of workers in a factory are
examples of finite populations.
• An infinite population is that population in which it is theoretically impossible to
observe all the elements. Thus, in an infinite population the number of items is infinite
i.e., we cannot have any idea about the total number of items. The number of stars in the
sky and the possible rolls of a pair of dice are examples of infinite populations.
Sampling frame
• The elementary units, or groups or clusters of such units, may form
the basis of the sampling process, in which case they are called
sampling units. A list containing all such sampling units is known as
the sampling frame. Thus the sampling frame consists of a list of items from
which the sample is to be drawn.
Sampling design:
• A sample design is a definite plan for obtaining a sample from the
sampling frame. It refers to the technique or procedure the
researcher adopts in selecting the sampling units from which
inferences about the population are drawn.
• The sampling design is determined before any data are collected.
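One possible sampling design, simple random sampling, can be sketched as follows. The slides do not prescribe a particular design, so this choice, the part-ID frame, and the function name are assumptions made for illustration.

```python
import random

# Hypothetical sampling frame: a list of every sampling unit (here, part IDs).
sampling_frame = [f"part_{i:04d}" for i in range(1, 501)]


def draw_simple_random_sample(frame, sample_size, seed=None):
    """Select sample_size distinct units, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)


# The design (method, sample size, seed) is fixed before any data are collected.
sample = draw_simple_random_sample(sampling_frame, sample_size=25, seed=42)
```

Fixing the seed makes the selection reproducible, which helps when the sampling plan has to be documented or audited.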
Statistic(s) and parameter(s)
• A statistic is a characteristic of a sample, whereas a parameter is a
characteristic of a population. Thus, when we work out certain measures
such as mean, median, mode or the like ones from samples, then they are
called statistic(s) for they describe the characteristics of a sample.
• But when such measures describe the characteristics of a population, they
are known as parameter(s).
• For instance, the population mean is a parameter, whereas the sample
mean (X̄) is a statistic. To obtain the estimate of a parameter from a
statistic constitutes the prime objective of sampling analysis.
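The distinction between a parameter and a statistic can be illustrated with a small simulation. The population below is synthetic (hypothetical shaft diameters drawn from a normal distribution) and exists only so that the population mean can be computed directly for comparison.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical finite population: 10,000 shaft diameters in mm.
population = [rng.gauss(25.0, 0.1) for _ in range(10_000)]

# Parameter: computed from every element of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample and used as an estimate of the parameter.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

# In practice the parameter is unknown; here it is available for comparison.
estimation_error = sample_mean - population_mean
```

In real studies only the sample mean is observed, and the population mean is what it is meant to estimate.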
Sampling error
• Sample surveys study only a small portion of the
population, and as such there would naturally be a certain amount of
inaccuracy in the information collected. This inaccuracy may be
termed sampling error or error variance.
• In other words, sampling errors are those errors which arise on
account of sampling, and they generally happen to be random
variations (in the case of random sampling) in the sample estimates
around the true population values.
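The random variation of sample estimates around the true value can be seen by drawing many samples from the same population. This is a sketch under assumptions: the population is synthetic, and the sample sizes and number of repeats are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population of 5,000 measurements.
population = [rng.uniform(0, 100) for _ in range(5_000)]
true_mean = statistics.mean(population)


def sample_mean_errors(sample_size, repeats=200):
    """Return (sample mean - true mean) for repeated simple random samples."""
    return [
        statistics.mean(rng.sample(population, sample_size)) - true_mean
        for _ in range(repeats)
    ]


# Spread of the sampling error for small and large samples.
spread_small = statistics.pstdev(sample_mean_errors(sample_size=10))
spread_large = statistics.pstdev(sample_mean_errors(sample_size=400))
```

Larger samples give estimates that cluster more tightly around the true population mean, so the sampling error shrinks as the sample size grows.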
Mean, Median, Mode, Standard deviation
• The mean, median and mode are all estimates of where the "middle"
of a set of data is.
• These values are useful when creating groups or bins to organize
larger sets of data.
• The standard deviation measures the typical distance between the actual
data and the mean (formally, the square root of the average squared deviation).
Mean
• The mean (also known as the average) is obtained by dividing the sum of
observed values by the number of observations, "n". Although data
points fall above, below, or on the mean, it can be considered a good
estimate for predicting subsequent data points.
Median
• The median is the middle value of a set of data containing an odd
number of values, or the average of the two middle values of a set of
data with an even number of values. The median is especially helpful
when separating data into two equal sized bins.
Mode
• The mode of a set of data is the value which occurs most frequently
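The three measures defined above can be computed with Python's standard statistics module. The data values are made up for illustration.

```python
import statistics

odd_data = [2, 3, 3, 5, 7]        # n = 5 (odd number of values)
even_data = [2, 3, 3, 5, 7, 10]   # n = 6 (even number of values)

# Mean: sum of observed values divided by the number of observations.
mean_odd = sum(odd_data) / len(odd_data)

# Median: the middle value for odd n, the average of the two middle values for even n.
median_odd = statistics.median(odd_data)
median_even = statistics.median(even_data)

# Mode: the value that occurs most frequently.
mode_odd = statistics.mode(odd_data)
```

For this data the mean and the median differ, which is common whenever the values are not symmetric around the middle.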
Standard Deviation
• The standard deviation gives an idea of how close the entire set of
data is to the average value. Data sets with a small standard deviation
have tightly grouped, precise data. Data sets with large standard
deviations have data spread out over a wide range of values
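A worked example of the population standard deviation, computed step by step and checked against the standard library. The data values are made up for illustration; for a sample rather than a whole population, statistics.stdev (which divides by n - 1) would normally be used instead.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean of the data.
mean = sum(data) / len(data)

# Step 2: squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Step 3: square root of the average squared deviation.
population_sd = math.sqrt(sum(squared_deviations) / len(data))

# For comparison: the plain average of absolute distances from the mean.
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)

# The library function gives the same population standard deviation.
library_sd = statistics.pstdev(data)
```

The standard deviation and the average absolute distance are related but not identical, because squaring gives more weight to values far from the mean.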
  • 10. Analytic scalability • Analytic scalability is the ability to use data to understand and solve a large variety of problems. Because problems come in many forms, analytics must be flexible enough to address them in different ways, for example with statistical tools and forecasting. • Traditionally, this meant pulling the data together in a separate analytics environment before any analysis could start.
  • 11. Traditional Analytic Architecture • Traditional analytics collects data from heterogeneous data sources and pulls all of it together into a separate analytics environment for analysis. That environment can be an analytical server or a personal computer with more computing capability. • In such environments the data must be shipped out of its source systems, which can raise data security and confidentiality issues.
  • 12. Modern In-Database Architecture • Data from heterogeneous sources is collected, transformed and loaded into a data warehouse for final analysis by decision makers. • The processing stays in the database where the data has been consolidated. • The data is presented in aggregated form for querying. • Queries from users are submitted to OLAP (online analytical processing) engines for execution. Such in-database architectures are tested for their query throughput rather than for transaction throughput, as traditional database environments are.
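The difference between shipping rows to a separate analytics environment and keeping the processing in the database can be sketched with Python's built-in `sqlite3` module. This is only an illustration: SQLite is not a data warehouse or an OLAP engine, and the `sales` table and its values are hypothetical.

```python
import sqlite3

# Hypothetical table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("south", 20.0)],
)

# Traditional style: ship every row to the client, then aggregate there.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# In-database style: the database aggregates and returns one row per region.
db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)
conn.close()

print(client_totals)
print(db_totals)
```

Both approaches give the same totals, but the second moves only the aggregated result out of the database instead of every row.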
  • 14. Grid Computing • Grid computing is a form of distributed computing in which a "super and virtual computer" is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks. • Grid computing (Foster and Kesselman, 1999) is a technology that facilitates the execution of large-scale, resource-intensive applications on geographically distributed computing resources. • It facilitates flexible, secure, coordinated large-scale resource sharing among dynamic collections of individuals, institutions, and resources.
  • 15. Parallel Computing and Distributed Computing • Parallel Computing: In parallel computing, multiple processors perform the tasks assigned to them simultaneously. Memory in parallel systems can be either shared or distributed. Parallel computing provides concurrency and saves time and money. • Distributed Computing: In distributed computing, multiple autonomous computers appear to the user as a single system. There is no shared memory, and the computers communicate with each other through message passing. A single task is divided among different computers.
  • 16. Parallel Computing vs. Distributed Computing

    | Parallel computing | Distributed computing |
    | --- | --- |
    | Many operations are performed simultaneously | System components are located at different locations |
    | A single computer is required | Uses multiple computers |
    | Multiple processors perform multiple operations | Multiple computers perform multiple operations |
    | May have shared or distributed memory | Has only distributed memory |
    | Processors communicate with each other through a bus | Computers communicate with each other through message passing |
  • 17. Massively Parallel Processing (MPP) • Massively Parallel Processing (MPP) is the "shared nothing" approach to parallel computing: many CPUs work in parallel to execute a single program. • One of the most significant differences between Symmetric Multi-Processing (SMP) and MPP is that in MPP each CPU has its own memory. This avoids the hold-up that can occur in SMP when all of the CPUs attempt to access the shared memory simultaneously.
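The "shared nothing" idea can be sketched in plain Python: each node works only on its own partition and returns a small partial result, and a coordinator combines the partials. This is a simulation under stated assumptions. The nodes run one after another here rather than on separate CPUs, and the helper names `partition`, `local_aggregate` and `mpp_mean` are hypothetical.

```python
def partition(values, n_nodes):
    """Split values round-robin into n_nodes private partitions."""
    return [values[i::n_nodes] for i in range(n_nodes)]


def local_aggregate(part):
    """Work done by one node, using only its own partition."""
    return sum(part), len(part)


def mpp_mean(values, n_nodes=4):
    """Combine per-node partial sums and counts into a global mean."""
    partials = [local_aggregate(p) for p in partition(values, n_nodes)]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


readings = list(range(1, 101))  # illustrative data: 1..100
result = mpp_mean(readings)
print(result)  # 50.5
```

Only the partial sums and counts cross node boundaries, so no node ever needs access to another node's memory.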
  • 20. Cloud Computing • Cloud computing is the delivery of computing services over the Internet. Cloud services allow individuals and businesses to use software and hardware that are managed by third parties at remote locations. • Examples of cloud services include online file storage, social networking sites, webmail, and online business applications. The cloud computing model allows access to information and computer resources from anywhere that a network connection is available. • Cloud computing provides a shared pool of resources, including data storage space, networks, computer processing power, and specialized corporate and user applications.
  • 21. Characteristic features of cloud • A cloud service should: 1) Mask the underlying infrastructure from the user 2) Be elastic to scale on demand 3) Be billed on a pay-per-use basis • The National Institute of Standards and Technology (NIST) lists five essential characteristics: 1) On-demand self-service 2) Broad network access 3) Resource pooling 4) Rapid elasticity 5) Measured service
  • 22. Two types of cloud environment 1. Public Cloud – The services and infrastructure are provided off-site over the internet – Greatest level of efficiency in shared resources – Less secure and more vulnerable than private clouds 2. Private Cloud – Infrastructure operated solely for a single organization – Offers the same features as a public cloud – Offers the greatest level of security and control – Requires purchasing and owning the entire cloud infrastructure
  • 25. Sampling Fundamentals • Sampling may be defined as the selection of some part of an aggregate or totality on the basis of which a judgment or inference about the aggregate or totality is made. • It is the process of obtaining information about an entire population by examining only a part of it. • In most research work and surveys, the usual approach is to make generalisations about the population from a sample.
  • 26. NEED FOR SAMPLING 1. Sampling can save time and money. A sample study is usually less expensive than a census study and produces results faster. 2. Sampling may enable more accurate measurements, because a sample study is generally conducted by trained and experienced investigators. 3. Sampling remains the only way when the population contains infinitely many members. 4. Sampling remains the only choice when a test involves the destruction of the item under study. 5. Sampling usually makes it possible to estimate the sampling errors and thus assists in obtaining information concerning some characteristic of the population.
  • 27. SOME FUNDAMENTAL DEFINITIONS • Universe/Population: From a statistical point of view, the term ‘Universe’ refers to the total of the items or units in any field of inquiry, whereas the term ‘population’ refers to the total of items about which information is desired. The attributes that are the object of study are referred to as characteristics, and the units possessing them are called elementary units. • The population or universe can be finite or infinite. The population is said to be finite if it consists of a fixed number of elements so that it is possible to enumerate it in its totality. For instance, the population of a city and the number of workers in a factory are examples of finite populations. • An infinite population is one in which it is theoretically impossible to observe all the elements. Thus, in an infinite population the number of items is infinite, i.e., we cannot have any idea about the total number of items. The number of stars in the sky and the possible rolls of a pair of dice are examples of infinite populations.
  • 28. Sampling frame • The elementary units, or groups or clusters of such units, may form the basis of the sampling process, in which case they are called sampling units. A list containing all such sampling units is known as the sampling frame. Thus the sampling frame consists of a list of items from which the sample is to be drawn.
  • 29. Sampling design • A sample design is a definite plan for obtaining a sample from the sampling frame. It refers to the technique or procedure the researcher adopts in selecting the sampling units from which inferences about the population are drawn. • The sampling design is determined before any data are collected.
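As a minimal sketch of the two definitions above, the snippet below builds a sampling frame and applies one possible design, simple random sampling without replacement. The part IDs, the frame size, the sample size and the seed are all assumptions chosen for illustration. Fixing them up front mirrors the point that the design is set before any data are collected.

```python
import random

# Hypothetical sampling frame: IDs of 500 parts produced in one shift.
sampling_frame = [f"part-{i:03d}" for i in range(500)]

# Design decisions, fixed before any data are collected.
SAMPLE_SIZE = 25
rng = random.Random(42)  # fixed seed so the draw is reproducible

# Simple random sampling: every unit in the frame has an equal chance
# of selection, and no unit is selected twice.
sample = rng.sample(sampling_frame, k=SAMPLE_SIZE)

print(len(sample), sample[:3])
```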
  • 30. Statistic(s) and parameter(s) • A statistic is a characteristic of a sample, whereas a parameter is a characteristic of a population. Thus, when we work out measures such as the mean, median or mode from samples, they are called statistic(s), for they describe the characteristics of a sample. • When such measures describe the characteristics of a population, they are known as parameter(s). • For instance, the population mean is a parameter, whereas the sample mean (X̄) is a statistic. Obtaining an estimate of a parameter from a statistic is the prime objective of sampling analysis.
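The distinction can be sketched with a finite population that we can enumerate fully. The shaft diameters below are synthetic values generated purely for illustration, and the population and sample sizes are assumptions.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical finite population: diameters (mm) of 1000 shafts.
population = [rng.gauss(25.0, 0.2) for _ in range(1000)]

# Parameter: computed from every element of the population.
population_mean = statistics.fmean(population)

# Statistic: computed from a sample, used to estimate the parameter.
sample = rng.sample(population, k=30)
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:.3f}")
print(f"sample mean (statistic):     {sample_mean:.3f}")
```

In practice the population mean is usually unknown, which is why the sample mean is used as its estimate.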
  • 31. Sampling error • Sample surveys study only a small portion of the population, so there is naturally a certain amount of inaccuracy in the information collected. This inaccuracy may be termed sampling error or error variance. • In other words, sampling errors are those errors which arise on account of sampling. In random sampling they are generally random variations of the sample estimates around the true population values.
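One way to see sampling error as random variation around the true value is to draw many random samples from a known population and record how far each sample mean lands from the population mean. The population, the sample sizes and the number of repeats below are assumptions chosen for illustration, and `sampling_errors` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population with a known ("true") mean.
population = [rng.uniform(0, 100) for _ in range(5000)]
true_mean = statistics.fmean(population)


def sampling_errors(n, repeats=200):
    """Sample mean minus true mean, for repeated random samples of size n."""
    return [
        statistics.fmean(rng.sample(population, n)) - true_mean
        for _ in range(repeats)
    ]


# Spread of the errors for a small and a large sample size.
spread_small = statistics.pstdev(sampling_errors(10))
spread_large = statistics.pstdev(sampling_errors(250))

print(f"spread of errors, n=10:  {spread_small:.2f}")
print(f"spread of errors, n=250: {spread_large:.2f}")
```

The errors scatter on both sides of zero, and their spread shrinks as the sample size grows.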
  • 32. Mean, Median, Mode, Standard deviation • The mean, median and mode are all estimates of where the "middle" of a set of data is. • These values are useful when creating groups or bins to organize larger sets of data. • The standard deviation measures the typical distance between the data and the mean: it is the square root of the average squared deviation from the mean.
  • 33. Mean • The mean (also known as the average) is obtained by dividing the sum of the observed values by the number of observations, n. Although individual data points fall above, below, or on the mean, it can be considered a good estimate for predicting subsequent data points.
  • 34. Median • The median is the middle value of an ordered set of data containing an odd number of values, or the average of the two middle values when the set contains an even number of values. The median is especially helpful when separating data into two equal-sized bins.
  • 35. Mode • The mode of a set of data is the value which occurs most frequently. • Standard Deviation • The standard deviation gives an idea of how close the entire set of data is to the average value. Data sets with a small standard deviation have tightly grouped, precise data. Data sets with a large standard deviation have data spread out over a wide range of values.
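The four measures can be computed with Python's standard `statistics` module. The data set is made up for illustration. Note that `pstdev` treats the data as a whole population (divides by n), while `stdev` treats it as a sample (divides by n - 1).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # even count: average of the two middle values
mode = statistics.mode(data)      # most frequent value
pop_sd = statistics.pstdev(data)  # population standard deviation
sample_sd = statistics.stdev(data)  # sample standard deviation

# Odd count: the median is the single middle value after sorting.
odd_median = statistics.median([3, 1, 2])

print(mean, median, mode, pop_sd, odd_median)  # 5 4.5 4 2.0 2
```

Here the squared deviations from the mean sum to 32, so the population variance is 32 / 8 = 4 and the population standard deviation is 2.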