How to handle Dynamic Width File in Spark
A dynamic-width file is a common source format on mainframe systems. The demonstration below is one efficient
way to handle a dynamic-width file using Scala, Spark RDDs, and DataFrames. Check this code and execute it in your REPL.
Source File
Schema of the File
Code to be Executed
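A minimal sketch of the parsing step, in plain Scala so it can be pasted into any REPL. The record layout is an assumption: a pipe-delimited line with four fixed fields (id, fname, lname, numberofsubject) followed by numberofsubject pairs of subject and marks. The field names come from the query later in this deck; the delimiter, the `SubjectMark` and `Score` case classes, and `DynamicWidthParser` are hypothetical names chosen for illustration.

```scala
// Assumed layout (hypothetical):
//   id|fname|lname|numberofsubject|subject1|marks1|subject2|marks2|...
case class SubjectMark(subject: String, marks: Int)
case class Score(id: Int, fname: String, lname: String,
                 numberofsubject: Int, subjectwisemarks: Seq[SubjectMark])

object DynamicWidthParser {
  // Number of fields that are always present before the repeating group.
  private val FixedFields = 4

  // Returns None for malformed lines instead of throwing, so that bad
  // records can be dropped with flatMap when this is applied to an RDD.
  def parseLine(line: String): Option[Score] = {
    val f = line.split("\\|", -1).map(_.trim)
    if (f.length < FixedFields) None
    else scala.util.Try {
      val n = f(3).toInt
      // The record width depends on the count carried in the record itself.
      require(f.length == FixedFields + 2 * n, "field count does not match numberofsubject")
      val marks = f.drop(FixedFields).grouped(2).map {
        case Array(subject, m) => SubjectMark(subject, m.toInt)
      }.toList
      Score(f(0).toInt, f(1), f(2), n, marks)
    }.toOption
  }
}
```

On an RDD of lines this could be applied as `lines.flatMap(DynamicWidthParser.parseLine)`, which keeps only the records that parse cleanly.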
DataFrame Schema
Registering as a Temp Table and Showing the Data
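A hedged sketch of this step, assuming Spark 2.0 or later, where `createOrReplaceTempView` replaces the older `registerTempTable`. The two sample rows are invented for illustration and are not the data from the original slides. The case classes and `ScoreTempView` are hypothetical names; only the column names and the table name `score` are taken from the query in this deck.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Case classes must live outside the method so Spark can derive encoders.
case class SubjectMark(subject: String, marks: Int)
case class Score(id: Int, fname: String, lname: String,
                 numberofsubject: Int, subjectwisemarks: Seq[SubjectMark])

object ScoreTempView {
  // Builds a DataFrame from already-parsed records and registers it as "score".
  def register(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Hypothetical rows standing in for the parsed dynamic-width file.
    val rows = Seq(
      Score(1, "John", "Doe", 2, Seq(SubjectMark("Math", 80), SubjectMark("Physics", 70))),
      Score(2, "Jane", "Roe", 1, Seq(SubjectMark("Math", 90)))
    )
    val df = spark.sparkContext.parallelize(rows).toDF()
    df.createOrReplaceTempView("score")
    df
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-width-sketch")
      .master("local[*]")
      .getOrCreate()
    val df = register(spark)
    df.printSchema()                              // subjectwisemarks appears as an array of structs
    spark.sql("SELECT * FROM score").show(false)  // false = do not truncate wide columns
    spark.stop()
  }
}
```

In `spark-shell` a session named `spark` already exists, so `ScoreTempView.register(spark)` can be called directly.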
Implementing an Analytical Query on the Temp Table
SELECT id, fname, lname,
       CAST(sum(subject_wise_marks.marks) / numberofsubject AS DOUBLE) AS percentage
FROM score
LATERAL VIEW explode(subjectwisemarks) marks_table AS subject_wise_marks
GROUP BY id, fname, lname, numberofsubject;
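To make the arithmetic of the query concrete, here is a plain-Scala sketch of what it computes for a single student: the exploded marks are summed and divided by numberofsubject. It assumes one row per student, and the case classes and `PercentageSketch` are hypothetical names. Returning `None` mirrors two Spark behaviours: `LATERAL VIEW explode` drops rows whose array is empty, and dividing by zero yields null in the default (non-ANSI) mode.

```scala
case class SubjectMark(subject: String, marks: Int)
case class Score(id: Int, fname: String, lname: String,
                 numberofsubject: Int, subjectwisemarks: Seq[SubjectMark])

object PercentageSketch {
  // sum(marks) / numberofsubject, as in the SQL above, for one student row.
  def percentage(s: Score): Option[Double] =
    if (s.numberofsubject == 0 || s.subjectwisemarks.isEmpty) None
    else Some(s.subjectwisemarks.map(_.marks).sum.toDouble / s.numberofsubject)
}
```

For a student with marks 80 and 70 across two subjects, this gives (80 + 70) / 2 = 75.0. The value reads as a percentage only if each subject is marked out of 100.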
Result