Kokila Rudresh
Devangana Khokhar
Divide-and-Conquer Testing in Data Analytics Domain
VodQA 2015
Data Analytics: An Introduction
Collection
Processing Modelling Inference Visualization
Data Analytics: Use Cases
Business Intelligence
Social Networks
Astronomy and
Astrophysics
Robotics and Artificial Intelligence
Life Sciences
Finance and Stock
Market
Medical Imaging
Computer Graphics
Computer Vision
Energy Exploration
Data Analytics: Why Testing is Important
Volume
Domain
Complexity
Variety
Computations
Testing
Thou shalt not leave the application untested!
Data Analytics: Testing Challenges
Data
Validation
Model
Implementation
Business
Perspective
Data Analytics: Typical System Implementation
Extract
Transform
Load
Source
Data
Raw Data, ETL, Simulation, Aggregation, Visualization
Format
Consistency
Completeness
Divide-and-Conquer Testing
Extract
Transform
Load
Source
Data
Pre-ETL Validations
Divide-and-Conquer Testing
Extract
Transform
Load
Source
Data
Post-ETL Tests
Meta-data
Data transformation
Data quality checks
Business-specific validations
Divide-and-Conquer Testing
Extract
Transform
Load
Source
Data
Simulation Validations
Model Validation
Implementation
Computation
Divide-and-Conquer Testing
Extract
Transform
Load
Source
Data
Aggregation Validations
Data Hierarchy
Data Scope
Summarized Values
Divide-and-Conquer Testing
Extract
Transform
Load
Source
Data
UI Validations
Information Representation
Data Format
Result Intuitiveness
Learnings
ANALYSE
CODE
TEST
Initial Data Flow
• Predefined data template
• Pre-ETL data validations
Domain Knowledge
• KT sessions involving SMEs
• Core computations
Business Involvement
• Test data closer to real
time data
• User flows prioritization
Learnings
Implementation
• Alternate implementation
• SME validation
Computation
• Addressing the right
problem
• Computational Factors
ANALYSE
CODE
TEST
Learnings
Testing Process
• Stepwise data validation
• Defect investigation
Test Automation
• Data combinations
• XML test data
Test Execution
• CI test execution
• Execution frequency
Test Data
• Data distribution
• Edge case data
Testing Tools
• Spreadsheet gear
• Excel macros
ANALYSE
CODE
TEST
Domain
Context
Integrating
Business
Use-cases
Design and
Testing
Challenges
Testing
Approach
Learnings
Summary
kokila@thoughtworks.com
devangk@thoughtworks.com @DevanganaK
Divide-and-conquer approach towards data analytics testing

Editor's Notes

  • #3: Data analytics is the process of collecting and examining data with the goal of discovering useful information. It can be exploratory (for example, log file analysis) or driven by a specific problem statement (for example, market basket analysis). The result is not always a decision-making system; sometimes it is a decision-support system. The process has four steps.
      Collection: data is gathered in raw format from sources such as online sources, surveys and satellites.
      Processing: data is organized into a standard format.
      Analysis: mathematical models are built to fit the existing data and are then used to infer results for new data.
      Visualization: results are communicated as tables, graphs and charts.
  • #4: Let's take some examples.
      Banks: analyze withdrawal and spending patterns to prevent fraud or identity theft.
      E-commerce: companies examine navigation patterns and previous purchases to determine customers' buying patterns.
      Energy: industries are looking into how energy consumption and operating costs can be optimized within a facility.
      Data analysis is the lifeline of any business; no business can sustain itself without analyzing the available data. Data analytics is used in many industries to help companies and organizations make better business decisions.
  • #5: Testing plays a crucial role in building a data analytics product. Analytics is the lifeline of any progressive business and is critical to making informed decisions for business planning. The complexities of domain, computation, volume and variety need to be tackled with a planned testing approach.
  • #6: Data validation: ensure that the data is of the right quality throughout the process, across every stage of the data flow (gathering, representing, cleansing, transforming and so on).
      Model implementation: this is a crucial part and needs in-depth domain knowledge. Validate that the chosen model is relevant to the domain, understand the statistical model thoroughly, including every parameter involved in the computation, and validate that the computations are implemented as required.
      Business perspective: the data is available, the analysis has been performed and some results are out. How should they be shared with the business? We need a clear vision of the business problem we are trying to solve, and the business perspective is needed to ensure that the data presented serves its purpose. What kinds of charts and graphs are to be displayed, what level of data aggregation is required, and is the UI intuitive?
  • #7: Raw data: gather data in raw format.
      ETL: process and organize the data. Extract data from multiple sources, transform it into the required format, and load it into the database.
      Simulation: initial analysis leads to modelling, which in turn yields the model parameters. Model implementation applies the statistical models or algorithms and their computations.
      Aggregation: analysis and computation happen on granular data, which then needs to be aggregated at various hierarchies and levels as per the business requirement.
      Visualization: communicate the results of the analysis effectively through tables, graphs and charts.
  • #8: Format: is the data provided in the required format (CSV or Excel)? How many files or worksheets are there, what sort of data is in each sheet, and what are the data types? Check text casing, data formats, number formats and so on.
      Consistency: data needs to be consistent across data sets. For example, sales data exists for a particular city but the city is missing from the reference data, or a cheque is cleared but there is no corresponding money transaction.
      Completeness: the data is as complete as expected. Every data set has mandatory and optional fields; in customer data, name, phone and email might be mandatory while address is optional. For example, in retail data an inventory table might show a reduction of 5 units while the corresponding sales data does not reflect that sale, so some data is missing.
  • #9: Post-ETL validations.
      Metadata: ensure the data model design is aligned with the real-world domain. This includes data type, data length and index/constraint checks, and validating the data modelling (dimensions and facts).
      Transformation: validate that the transformed data values are the expected values, and validate the transformation rules and the source-to-target mapping. This is usually done by comparing counts, aggregates and actual data between source and target.
      Quality: data checks (text case, special characters, number checks and precision, date format and so on) and data constraint checks that ensure the transformation respects the model (foreign key constraints, unique key constraints, null values and so on), plus a check that all the expected data is loaded into the DB completely.
      Business-specific: business-side, domain-specific validations such as possible values, covering both client-agnostic and client-specific data checks.
  • #10: Model validation: validate that the chosen model is relevant to the domain. This is done by applying the model to past historical data and uses statistical metrics such as R².
      Implementation: understand the logic behind the model and algorithms, and get the right values for the model parameters.
      Computation: validate the core analytics engine's computation step by step.
  • #11: Aggregation: data should be aggregated at the required hierarchy level. Only the data relevant to the scope should be considered for aggregation. The summarized values computed for that selected data should be validated.
  • #12: UI validations: ensure the correct representation of data in the form of tables, charts and graphs. Validate the format of the representation (units, scale, alignment, unit conversion and so on). Cover the usability aspects of the tables, graphs and charts: color combinations, filtering, UI interaction and so on.
  • #13: Initial client data flow: set a predefined data template and run pre-validations before data handover.
      Domain knowledge: the work is domain-intensive, so hold KT sessions within the team and validate the understanding with SMEs. Mimic the simulation calculations in Excel with a smaller data set to understand them thoroughly.
      Business involvement: provide test data sets close to the real-time data, and prioritize the test scenarios to reflect real user experience.
  • #14: Implementation: there was no easy way to come up with expected data, so we decided on a parallel implementation, with business involvement in testing the model implementation.
      Computation/performance: understand the transformations, data explosions, data representation and table joins, and analyze the factors in the computation that influence time and memory.
  • #15: Test data: decide what subset of data suffices to get the best data distribution, bridge the gap between ideal and real-world data, and come up with an edge-case data set.
      Testing process: test the data at every stage of data transformation, and investigate defects with QA/Dev pairing.
      Tools: choose tools that fit the purpose and suit their intended users, such as Spreadsheet gear, Excel macros and App manager.
      Automation: the DB structure varies per client, so there are generic (metadata SQL) tests and client-specific tests. There are too many data combinations, hence a data-driven framework, with XML test data to segregate the data for the various clients.
      Execution: due to hardware, memory and time constraints, organize the test execution in CI cautiously. Although automation was implemented at every stage, we deliberately decided how much automation coverage each stage required and set the test execution frequency accordingly.
      Summary: divide and conquer; QA/Dev pairing; data combinations (the system is used by multiple users with differing backgrounds and varying metadata), with test data in XML to support this; 20% of the possible data set to cover 80% of the common use cases; SME involvement in edge cases; automation at every layer, with caution about its extent; execution frequency driven by resource usage, computation time and SME availability; choice of tools.
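Sketch for note #8 (pre-ETL validations). The three checks named there (format, consistency, completeness) could look roughly like the following. This is a minimal illustration only: the column names, the sample rows and the `validate_sales` helper are assumptions made for the example, not the template used in the talk.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def validate_sales(rows, known_cities):
    """Return (row_number, problem) tuples for a hypothetical sales extract.

    Assumed template: city, units and customer_email are mandatory,
    address is optional, and units must be numeric.
    """
    mandatory = ("city", "units", "customer_email")
    problems = []
    for number, row in enumerate(rows, start=1):
        # Completeness: every mandatory field must be present and non-blank.
        for field in mandatory:
            if not (row.get(field) or "").strip():
                problems.append((number, f"missing {field}"))
        # Format: units must parse as a number when it is present.
        units = (row.get("units") or "").strip()
        if units and not is_number(units):
            problems.append((number, "units is not numeric"))
        # Consistency: the city must exist in the reference data.
        city = (row.get("city") or "").strip()
        if city and city not in known_cities:
            problems.append((number, f"unknown city {city}"))
    return problems


# Illustrative data only.
SAMPLE = """city,units,customer_email,address
Pune,5,a@example.com,
Atlantis,3,b@example.com,Sea floor
Delhi,five,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = validate_sales(rows, known_cities={"Pune", "Delhi"})
for row_number, problem in problems:
    print(row_number, problem)
```

Running such checks before the data enters ETL keeps bad input from surfacing later as a confusing computation defect.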
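Sketch for note #9 (post-ETL validations). Comparing counts, aggregates and actual data between source and target can be illustrated with an in-memory SQLite database. The table names `staging_sales` and `fact_sales`, the `amount` column and the `reconcile` helper are hypothetical; a real project would point the same queries at its own source and target stores.

```python
import sqlite3


def reconcile(conn, source, target, amount_column):
    """Compare row count and column total between two tables.

    Table and column names are interpolated directly into the SQL, so
    they must come from trusted test configuration, never user input.
    """
    query = "SELECT COUNT(*), COALESCE(SUM({col}), 0) FROM {table}"
    src = conn.execute(query.format(col=amount_column, table=source)).fetchone()
    tgt = conn.execute(query.format(col=amount_column, table=target)).fetchone()
    return {
        "count_matches": src[0] == tgt[0],
        "sum_matches": abs(src[1] - tgt[1]) < 1e-9,
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_sales (id INTEGER, amount REAL);
CREATE TABLE fact_sales (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
INSERT INTO staging_sales VALUES (1, 10.0), (2, 20.5), (3, 4.5);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 20.5);
""")

result = reconcile(conn, "staging_sales", "fact_sales", "amount")

# Row-level comparison: ids present in the source but absent from the target.
missing = conn.execute(
    "SELECT id FROM staging_sales EXCEPT SELECT id FROM fact_sales"
).fetchall()
print(result, missing)
```

The count and sum checks are cheap and flag that something was lost; the `EXCEPT` query then narrows it down to the specific rows.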
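Sketch for note #10 (model validation). The note mentions applying the model to past historical data and scoring it with a metric such as R². The standard coefficient of determination, 1 - SS_res / SS_tot, can be computed as below; the sample numbers are invented for illustration, and any acceptance threshold would be a project-specific decision.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Invented historical observations and the model's predictions for them.
historical = [1.0, 2.0, 3.0]
model_output = [1.0, 2.0, 4.0]
score = r_squared(historical, model_output)
print(score)
```

A value close to 1 means the model explains most of the variation in the historical data; a value near 0 means it does no better than predicting the mean.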
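Sketch for note #11 (aggregation validations). One way to check hierarchy, scope and summarized values together is to recompute the totals independently from the granular records and compare them with what the system reports. The record layout, the `aggregate` and `mismatches` helpers, and the sample figures are all assumptions for the example.

```python
from collections import defaultdict


def aggregate(records, level, in_scope=lambda record: True):
    """Sum 'units' per value of the given hierarchy level, for in-scope records."""
    totals = defaultdict(float)
    for record in records:
        if in_scope(record):
            totals[record[level]] += record["units"]
    return dict(totals)


def mismatches(expected, reported, tolerance=1e-9):
    """Return the sorted keys whose expected and reported totals differ."""
    keys = set(expected) | set(reported)
    return sorted(
        key for key in keys
        if abs(expected.get(key, 0.0) - reported.get(key, 0.0)) > tolerance
    )


# Invented granular data: region is the parent level of city.
records = [
    {"region": "North", "city": "Delhi", "year": 2015, "units": 5},
    {"region": "North", "city": "Chandigarh", "year": 2015, "units": 3},
    {"region": "South", "city": "Bangalore", "year": 2015, "units": 7},
    {"region": "South", "city": "Chennai", "year": 2014, "units": 4},
]

# Scope: only 2015 data should be included in the summary.
expected = aggregate(records, "region", in_scope=lambda r: r["year"] == 2015)

# Pretend the system under test reported these totals (South wrongly includes 2014).
reported = {"North": 8.0, "South": 11.0}
print(mismatches(expected, reported))
```

Changing `level` to `"city"` runs the same comparison one step lower in the hierarchy.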
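Sketch for note #14 (parallel implementation). When expected values are hard to derive by hand, a second, deliberately simple implementation can serve as the oracle. The talk does not say which computations were duplicated; sample variance is used here purely as a stand-in, with a streaming (Welford) version playing the production code and a two-pass version playing the reference.

```python
import math


def variance_streaming(values):
    """Sample variance in one pass (Welford's method); stands in for production code."""
    count, mean, m2 = 0, 0.0, 0.0
    for value in values:
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    if count < 2:
        raise ValueError("need at least two values")
    return m2 / (count - 1)


def variance_reference(values):
    """Sample variance in two passes; the simple alternate implementation."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values) / (len(values) - 1)


# Invented data sets, including a small edge case.
datasets = [
    [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
    [1.0, 1.0],
    [0.1, 0.2, 0.3],
]

agreement = [
    math.isclose(variance_streaming(d), variance_reference(d),
                 rel_tol=1e-9, abs_tol=1e-12)
    for d in datasets
]
print(agreement)
```

Agreement between the two does not prove either is right, which is why the note pairs this approach with SME validation.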
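Sketch for note #15 (data-driven automation with XML test data). Keeping per-client cases in XML and feeding them to one generic test routine might look like this. The XML layout, the client names and the `double` function under test are hypothetical; the actual schema used by the team is not given in the notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one <client> element per client, one <case> per test case.
TEST_DATA = """
<clients>
  <client name="acme">
    <case input="10" expected="20"/>
    <case input="0" expected="0"/>
  </client>
  <client name="globex">
    <case input="-3" expected="-6"/>
  </client>
</clients>
"""


def load_cases(xml_text):
    """Yield (client, input, expected) tuples from the XML test data."""
    root = ET.fromstring(xml_text)
    for client in root.findall("client"):
        for case in client.findall("case"):
            yield (
                client.get("name"),
                float(case.get("input")),
                float(case.get("expected")),
            )


def double(value):
    """Stand-in for the computation under test."""
    return value * 2


def run_cases(xml_text, function):
    """Run every case through function and collect the failures."""
    failures = []
    for client, given, expected in load_cases(xml_text):
        actual = function(given)
        if abs(actual - expected) > 1e-9:
            failures.append((client, given, expected, actual))
    return failures


failures = run_cases(TEST_DATA, double)
print(failures)
```

Adding a client or a data combination then means editing the XML only, while the test code stays unchanged.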