APACHE HIVE
(Apache Hadoop Sub Project)


Agenda:
 Story – Making of Apache Hive
 What is Apache Hive
 Physical Layout
 Hive CLI
 Hive QL
Introduction to Apache Hive
Can Elephants Fly?




Concern: Can Hadoop be used more efficiently and fruitfully by developers?

                 © 2012 Sabre Holdings Pvt. Ltd. | All rights reserved   3
Thinking…. ?
Step 1. Give him Wings




                                                        Mr. Hadoop energizing himself.




Thinking… ?
Step 2. Pray to Gravity

Thanks to gravity, the sky never fell down on us ;)
But wait, 2012 is not over yet. Keep praying.




                     Mr. Hadoop enjoying his first air ride.

   “God did not create the universe, gravity did” - Stephen Hawking

Upshot of the down-fall




Victims
Mr. Hadoop – The Flying Elephant


Blame Gravity! The Fall will have a huge impact.




Saving Life…
Step 1. Shrink


BEFORE -




          ACME Elephant Shrinker


AFTER -


Saving Life…
Step 2. Genetic Engineering & a bit of magic
         BEFORE                                                     AFTER




                                             Mr. Hadoop

                                                                    Ms. Hive




                    Injecting Insecto-receptors



Behind the scenes…?




Hive was initially developed by Facebook.


 Hive is a data warehouse infrastructure built
  on top of Hadoop.
 Supports analysis of large datasets stored in
  Hadoop-compatible file systems such as HDFS
  and the Amazon S3 file system.
 Provides a SQL-like query language called
  HiveQL.
 Provides indexing to accelerate queries.


   Warehouse directory in HDFS
     /user/hive/warehouse
   Tables ~ subdirectories of the warehouse directory
   Partitions ~ subdirectories of the corresponding
    table directory
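
The layout above can be sketched as a small path builder. This is a minimal illustration, not Hive code: the table name `bookings` and the partition columns `year` and `month` are hypothetical, and only the default database is modelled. It relies on Hive's convention of naming each partition directory `key=value`.

```python
from posixpath import join

WAREHOUSE_DIR = "/user/hive/warehouse"  # default warehouse location from the slide


def table_dir(table: str, warehouse: str = WAREHOUSE_DIR) -> str:
    """Return the HDFS directory of a table: one subdirectory of the warehouse."""
    return join(warehouse, table)


def partition_dir(table: str, partition: dict, warehouse: str = WAREHOUSE_DIR) -> str:
    """Return the directory of one partition: nested key=value subdirectories."""
    parts = [f"{key}={value}" for key, value in partition.items()]
    return join(table_dir(table, warehouse), *parts)


# Hypothetical table and partition values, for illustration only.
print(table_dir("bookings"))
print(partition_dir("bookings", {"year": "2012", "month": "06"}))
```

Each additional partition column adds one more directory level under the table directory.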




 Hive queries are implicitly converted to
  map-reduce jobs by the Hive engine.
 The compiler translates each query into a
  directed acyclic graph of map-reduce jobs.
 These map-reduce jobs are submitted to Hadoop
  for execution.
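
To make the "directed acyclic graph of jobs" idea concrete, here is a minimal sketch using Python's standard `graphlib`. The stage names and dependencies are invented for illustration and do not come from a real Hive plan; the point is only that a job can run once every job it depends on has finished.

```python
from graphlib import TopologicalSorter

# Hypothetical plan for one query: job name -> jobs whose output it reads.
plan = {
    "join": set(),
    "aggregate": {"join"},
    "filter_lookup": set(),
    "final_join": {"aggregate", "filter_lookup"},
}


def execution_order(dag):
    """Return batches of jobs; jobs within one batch do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(execution_order(plan))
```

Jobs in the same batch could in principle run in parallel; whether they actually do depends on the execution engine and its configuration.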



   The /user/hive directory is created automatically
    the first time a Hive session is started.
   The /user/hive/warehouse directory must be
    accessible to all users.
     hadoop dfs -chmod -R 1777 /user/hive/warehouse
   Setting the sticky bit is recommended if the Hadoop
    version installed on the cluster supports it.
   The /tmp directory should also be made a sticky
    directory.
     hadoop dfs -chmod -R 1777 /tmp
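
The octal mode 1777 used in the commands above combines full read/write/execute access for everyone (777) with the sticky bit (the leading 1). A small sketch with Python's standard `stat` module decodes it; the helper name `describe` is ours, and the decoding follows POSIX conventions rather than anything Hive-specific.

```python
import stat

MODE = 0o1777  # the octal mode passed to chmod on the slide


def describe(mode: int) -> dict:
    """Decode a directory mode into its symbolic form and two flags of interest."""
    return {
        "symbolic": stat.filemode(stat.S_IFDIR | mode),
        "sticky": bool(mode & stat.S_ISVTX),
        "world_writable": bool(mode & stat.S_IWOTH),
    }


print(describe(MODE))
```

On a world-writable directory, the sticky bit (shown as the trailing `t`) restricts deleting or moving an entry to its owner, the directory owner, or the superuser, which is why it is recommended for shared locations such as the warehouse and /tmp.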

   The Hive CLI (command-line interface) is
    invoked with the hive command.
     % hive




 DML
  ▪ SELECT
 DDL
  ▪ SHOW TABLES
  ▪ CREATE TABLE
  ▪ ALTER TABLE
  ▪ DROP TABLE




   Normal (managed) tables are created under the
    warehouse directory; the source data is moved
    into the warehouse.
   Normal tables are directly visible when browsing
    the HDFS warehouse directory.
   Dropping a normal table deletes both the source
    data and the table metadata.
   External tables read directly from files in HDFS.
   External tables are not visible in the warehouse
    directory.
   Dropping an external table deletes only the
    metadata, not the source data.
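
The difference in drop behaviour can be modelled with a toy example on the local file system. This is only an analogy: the `Metastore` class, the table names `sales` and `clicks`, and the directory names are all hypothetical, and nothing here talks to Hive or HDFS.

```python
import shutil
import tempfile
from pathlib import Path


class Metastore:
    """Toy stand-in for table metadata: name -> (data location, is_external)."""

    def __init__(self):
        self.tables = {}

    def create(self, name, location, external):
        self.tables[name] = (Path(location), external)

    def drop(self, name):
        location, external = self.tables.pop(name)  # metadata is always removed
        if not external:
            shutil.rmtree(location)  # normal table: the data is removed too


with tempfile.TemporaryDirectory() as root:
    managed = Path(root, "warehouse", "sales")
    external = Path(root, "landing", "clicks")
    for directory in (managed, external):
        directory.mkdir(parents=True)
        (directory / "part-00000").write_text("row\n")

    metastore = Metastore()
    metastore.create("sales", managed, external=False)
    metastore.create("clicks", external, external=True)
    metastore.drop("sales")
    metastore.drop("clicks")

    managed_left = managed.exists()    # data of the normal table
    external_left = external.exists()  # data of the external table

print(managed_left, external_left)
```

After both drops the metadata is empty, the normal table's directory is gone, and the external table's files are still in place.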

 HiveQL supports joins only on equality
  conditions. Complex boolean expressions and
  inequality conditions are not supported.
 More than two tables can be joined.
 The number of map-reduce jobs (n) generated for
  a join depends on the columns used.
     If the same column is used for all the tables, then n = 1
     Otherwise n > 1
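
The rule for n can be approximated with a short counting function. This is a deliberately simplified model, not Hive's planner: it assumes each join clause is an equality between two (table, column) operands and merges a clause into the current job whenever it reuses a column already serving as a join key in that job. The tables `a`, `b`, `c` and their columns are hypothetical.

```python
def count_join_jobs(clauses):
    """Estimate the number of map-reduce jobs for a chain of equi-joins.

    clauses: list of ((table, column), (table, column)) pairs, in query order.
    """
    jobs = 0
    job_keys = set()  # join-key columns used by the job currently being built
    for left, right in clauses:
        current = {left, right}
        if jobs == 0 or not (current & job_keys):
            jobs += 1           # no shared join column: start a new job
            job_keys = current
        else:
            job_keys |= current  # same column reused: stays in the same job
    return jobs


# a JOIN b ON a.key = b.key1 JOIN c ON c.key = b.key1
same_column = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key1"))]
# a JOIN b ON a.key = b.key1 JOIN c ON c.key = b.key2
different_column = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key2"))]

print(count_join_jobs(same_column), count_join_jobs(different_column))
```

In the first query only `b.key1` is involved for table `b`, so the model gives n = 1; in the second, `b` is joined on two different columns, so it gives n = 2.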


 HiveQL does not follow the SQL-92 standard
 Unsupported or limited features
     No materialized views
     No transaction-level support
     Limited subquery support




Hadoop – Entering into the new world!




Reach me




                    Tapan Avasthi
Associate Software Developer Intern, Travelocity Global
           tapan.avasthi@travelocity.com
             tapan.k.avasthi@gmail.com


Ad

More Related Content

What's hot (20)

Hive paris
Hive parisHive paris
Hive paris
Szehon Ho
 
Hive on mesos Strata
Hive on mesos StrataHive on mesos Strata
Hive on mesos Strata
Szehon Ho
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
DataWorks Summit
 
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonBig Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive Comparison
Caserta
 
Oracle Migration to Postgres in the Cloud
Oracle Migration to Postgres in the CloudOracle Migration to Postgres in the Cloud
Oracle Migration to Postgres in the Cloud
EDB
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
Julien Le Dem
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
DataWorks Summit/Hadoop Summit
 
Authoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using SliderAuthoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using Slider
DataWorks Summit
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
DataWorks Summit
 
Big Data Certification
Big Data CertificationBig Data Certification
Big Data Certification
Adam Doyle
 
SQL et in-memory sur Hadoop avec Pivotal et HAWQ
SQL et in-memory sur Hadoop avec Pivotal et HAWQSQL et in-memory sur Hadoop avec Pivotal et HAWQ
SQL et in-memory sur Hadoop avec Pivotal et HAWQ
Modern Data Stack France
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
Rajesh Nadipalli
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
DataWorks Summit
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
Hortonworks
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
Pietro Michiardi
 
Big data overview by Edgars
Big data overview by EdgarsBig data overview by Edgars
Big data overview by Edgars
Andrejs Vorobjovs
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
Hortonworks
 
Introduction to pig
Introduction to pigIntroduction to pig
Introduction to pig
Ravi Mutyala
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DataWorks Summit
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?
Hortonworks
 
Hive on mesos Strata
Hive on mesos StrataHive on mesos Strata
Hive on mesos Strata
Szehon Ho
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
DataWorks Summit
 
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonBig Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive Comparison
Caserta
 
Oracle Migration to Postgres in the Cloud
Oracle Migration to Postgres in the CloudOracle Migration to Postgres in the Cloud
Oracle Migration to Postgres in the Cloud
EDB
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
Julien Le Dem
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
DataWorks Summit/Hadoop Summit
 
Authoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using SliderAuthoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using Slider
DataWorks Summit
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
DataWorks Summit
 
Big Data Certification
Big Data CertificationBig Data Certification
Big Data Certification
Adam Doyle
 
SQL et in-memory sur Hadoop avec Pivotal et HAWQ
SQL et in-memory sur Hadoop avec Pivotal et HAWQSQL et in-memory sur Hadoop avec Pivotal et HAWQ
SQL et in-memory sur Hadoop avec Pivotal et HAWQ
Modern Data Stack France
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
Rajesh Nadipalli
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
DataWorks Summit
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
Hortonworks
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
Pietro Michiardi
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
Hortonworks
 
Introduction to pig
Introduction to pigIntroduction to pig
Introduction to pig
Ravi Mutyala
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DataWorks Summit
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?
Hortonworks
 

Similar to Introduction to Apache Hive (20)

Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Sematext Group, Inc.
 
Hadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA confHadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA conf
Sujee Maniyam
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview
EMC
 
Track B-2: Advancing Collaboration & eLearning to Achieve Mission Goals, ...
Track B-2: Advancing Collaboration & eLearning to Achieve Mission Goals, ...Track B-2: Advancing Collaboration & eLearning to Achieve Mission Goals, ...
Track B-2: Advancing Collaboration & eLearning to Achieve Mission Goals, ...
scoopnewsgroup
 
The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016キーノート...
The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016キーノート...The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016キーノート...
The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016キーノート...
Hadoop / Spark Conference Japan
 
Building infrastructure for Big Data
Building infrastructure for Big DataBuilding infrastructure for Big Data
Building infrastructure for Big Data
PromptCloud
 
Node.js and Photoshop Generator - JSConf Asia 2013
Node.js and Photoshop Generator - JSConf Asia 2013Node.js and Photoshop Generator - JSConf Asia 2013
Node.js and Photoshop Generator - JSConf Asia 2013
Andy Hall
 
Paremus Cloud and OSGi Beyond the VM - OSGi Cloud Workshop March 2012
Paremus Cloud and OSGi Beyond the VM - OSGi Cloud Workshop March 2012Paremus Cloud and OSGi Beyond the VM - OSGi Cloud Workshop March 2012
Paremus Cloud and OSGi Beyond the VM - OSGi Cloud Workshop March 2012
mfrancis
 
Hadoop-as-a-Service for Lifecycle Management Simplicity
Hadoop-as-a-Service for Lifecycle Management SimplicityHadoop-as-a-Service for Lifecycle Management Simplicity
Hadoop-as-a-Service for Lifecycle Management Simplicity
DataWorks Summit
 
Go daddy.com Cloud Storage Solution (Adam Knapp)
Go daddy.com Cloud Storage Solution (Adam Knapp)Go daddy.com Cloud Storage Solution (Adam Knapp)
Go daddy.com Cloud Storage Solution (Adam Knapp)
Ontico
 
HBase and Hadoop at Adobe
HBase and Hadoop at AdobeHBase and Hadoop at Adobe
HBase and Hadoop at Adobe
Cosmin Lehene
 
Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFS
DataWorks Summit
 
OWF12/Java Sacha labourey
OWF12/Java Sacha laboureyOWF12/Java Sacha labourey
OWF12/Java Sacha labourey
Paris Open Source Summit
 
Machine Learning and Hadoop: Present and Future
Machine Learning and Hadoop: Present and FutureMachine Learning and Hadoop: Present and Future
Machine Learning and Hadoop: Present and Future
Data Science London
 
Hadoop operations
Hadoop operationsHadoop operations
Hadoop operations
DataWorks Summit
 
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Michael Arnold
 
Oop2012 keynote Design Driven Development
Oop2012 keynote Design Driven DevelopmentOop2012 keynote Design Driven Development
Oop2012 keynote Design Driven Development
Michael Chaize
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Allen Wittenauer
 
eFolder Webinar: How One Partner Leverages Dell AppAssure and StorageCraft
eFolder Webinar: How One Partner Leverages Dell AppAssure and StorageCrafteFolder Webinar: How One Partner Leverages Dell AppAssure and StorageCraft
eFolder Webinar: How One Partner Leverages Dell AppAssure and StorageCraft
Dropbox
 
Hadoop's Impact on the Future of Data Management | Amr Awadallah
Hadoop's Impact on the Future of Data Management | Amr AwadallahHadoop's Impact on the Future of Data Management | Amr Awadallah
Hadoop's Impact on the Future of Data Management | Amr Awadallah
Cloudera, Inc.
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Sematext Group, Inc.
 
Hadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA confHadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA conf
Sujee Maniyam
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview
EMC
 
Track B-2: Advancing Collaboration & eLearning to Achieve Mission Goals, ...
Track B-2: Advancing Collaboration & eLearning to Achieve Mission Goals, ...Track B-2: Advancing Collaboration & eLearning to Achieve Mission Goals, ...
Track B-2: Advancing Collaboration & eLearning to Achieve Mission Goals, ...
scoopnewsgroup
 
The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016キーノート...
The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016キーノート...The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016キーノート...
The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016キーノート...
Hadoop / Spark Conference Japan
 
Building infrastructure for Big Data
Building infrastructure for Big DataBuilding infrastructure for Big Data
Building infrastructure for Big Data
PromptCloud
 
Node.js and Photoshop Generator - JSConf Asia 2013
Node.js and Photoshop Generator - JSConf Asia 2013Node.js and Photoshop Generator - JSConf Asia 2013
Node.js and Photoshop Generator - JSConf Asia 2013
Andy Hall
 
Paremus Cloud and OSGi Beyond the VM - OSGi Cloud Workshop March 2012
Paremus Cloud and OSGi Beyond the VM - OSGi Cloud Workshop March 2012Paremus Cloud and OSGi Beyond the VM - OSGi Cloud Workshop March 2012
Paremus Cloud and OSGi Beyond the VM - OSGi Cloud Workshop March 2012
mfrancis
 
Hadoop-as-a-Service for Lifecycle Management Simplicity
Hadoop-as-a-Service for Lifecycle Management SimplicityHadoop-as-a-Service for Lifecycle Management Simplicity
Hadoop-as-a-Service for Lifecycle Management Simplicity
DataWorks Summit
 
Go daddy.com Cloud Storage Solution (Adam Knapp)
Go daddy.com Cloud Storage Solution (Adam Knapp)Go daddy.com Cloud Storage Solution (Adam Knapp)
Go daddy.com Cloud Storage Solution (Adam Knapp)
Ontico
 
HBase and Hadoop at Adobe
HBase and Hadoop at AdobeHBase and Hadoop at Adobe
HBase and Hadoop at Adobe
Cosmin Lehene
 
Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFS
DataWorks Summit
 
Machine Learning and Hadoop: Present and Future
Machine Learning and Hadoop: Present and FutureMachine Learning and Hadoop: Present and Future
Machine Learning and Hadoop: Present and Future
Data Science London
 
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Michael Arnold
 
Oop2012 keynote Design Driven Development
Oop2012 keynote Design Driven DevelopmentOop2012 keynote Design Driven Development
Oop2012 keynote Design Driven Development
Michael Chaize
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Allen Wittenauer
 
eFolder Webinar: How One Partner Leverages Dell AppAssure and StorageCraft
eFolder Webinar: How One Partner Leverages Dell AppAssure and StorageCrafteFolder Webinar: How One Partner Leverages Dell AppAssure and StorageCraft
eFolder Webinar: How One Partner Leverages Dell AppAssure and StorageCraft
Dropbox
 
Hadoop's Impact on the Future of Data Management | Amr Awadallah
Hadoop's Impact on the Future of Data Management | Amr AwadallahHadoop's Impact on the Future of Data Management | Amr Awadallah
Hadoop's Impact on the Future of Data Management | Amr Awadallah
Cloudera, Inc.
 
Ad

Recently uploaded (20)

Assurance Best Practices: Unlocking Proactive Network Operations
Assurance Best Practices: Unlocking Proactive Network OperationsAssurance Best Practices: Unlocking Proactive Network Operations
Assurance Best Practices: Unlocking Proactive Network Operations
ThousandEyes
 
Stretching CloudStack over multiple datacenters
Stretching CloudStack over multiple datacentersStretching CloudStack over multiple datacenters
Stretching CloudStack over multiple datacenters
ShapeBlue
 
Outcome Over Output: How UXers Can Leverage an Outcome-Based Mindset by Malin...
Outcome Over Output: How UXers Can Leverage an Outcome-Based Mindset by Malin...Outcome Over Output: How UXers Can Leverage an Outcome-Based Mindset by Malin...
Outcome Over Output: How UXers Can Leverage an Outcome-Based Mindset by Malin...
UXPA Boston
 
MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...
MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...
MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...
ICT Frame Magazine Pvt. Ltd.
 
RDM Training: Publish research data with the Research Data Repository
RDM Training: Publish research data with the Research Data RepositoryRDM Training: Publish research data with the Research Data Repository
RDM Training: Publish research data with the Research Data Repository
CSUC - Consorci de Serveis Universitaris de Catalunya
 
DNF 2.0 Implementations Challenges in Nepal
DNF 2.0 Implementations Challenges in NepalDNF 2.0 Implementations Challenges in Nepal
DNF 2.0 Implementations Challenges in Nepal
ICT Frame Magazine Pvt. Ltd.
 
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Building Connected Agents:  An Overview of Google's ADK and A2A ProtocolBuilding Connected Agents:  An Overview of Google's ADK and A2A Protocol
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Suresh Peiris
 
AI and Gender: Decoding the Sociological Impact
AI and Gender: Decoding the Sociological ImpactAI and Gender: Decoding the Sociological Impact
AI and Gender: Decoding the Sociological Impact
SaikatBasu37
 
John Carmack’s Slides From His Upper Bound 2025 Talk
John Carmack’s Slides From His Upper Bound 2025 TalkJohn Carmack’s Slides From His Upper Bound 2025 Talk
John Carmack’s Slides From His Upper Bound 2025 Talk
Razin Mustafiz
 
Proposed Feature: Monitoring and Managing Cloud Usage Costs in Apache CloudStack
Proposed Feature: Monitoring and Managing Cloud Usage Costs in Apache CloudStackProposed Feature: Monitoring and Managing Cloud Usage Costs in Apache CloudStack
Proposed Feature: Monitoring and Managing Cloud Usage Costs in Apache CloudStack
ShapeBlue
 
Pushing the Limits: CloudStack at 25K Hosts
Pushing the Limits: CloudStack at 25K HostsPushing the Limits: CloudStack at 25K Hosts
Pushing the Limits: CloudStack at 25K Hosts
ShapeBlue
 
SQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptxSQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptx
Scott Keck-Warren
 
Reducing Bugs With Static Code Analysis php tek 2025
Reducing Bugs With Static Code Analysis php tek 2025Reducing Bugs With Static Code Analysis php tek 2025
Reducing Bugs With Static Code Analysis php tek 2025
Scott Keck-Warren
 
Is Your QA Team Still Working in Silos? Here's What to Do.
Is Your QA Team Still Working in Silos? Here's What to Do.Is Your QA Team Still Working in Silos? Here's What to Do.
Is Your QA Team Still Working in Silos? Here's What to Do.
marketing943205
 
Agentic AI, A Business Overview - May 2025
Agentic AI, A Business Overview - May 2025Agentic AI, A Business Overview - May 2025
Agentic AI, A Business Overview - May 2025
Peter Morgan
 
CloudStack + KVM: Your Local Cloud Lab
CloudStack + KVM:   Your Local Cloud LabCloudStack + KVM:   Your Local Cloud Lab
CloudStack + KVM: Your Local Cloud Lab
ShapeBlue
 
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
UXPA Boston
 
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Gary Arora
 
Breaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP DevelopersBreaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP Developers
pmeth1
 
Secondary Storage for a microcontroller system
Secondary Storage for a microcontroller systemSecondary Storage for a microcontroller system
Secondary Storage for a microcontroller system
fizarcse
 
Assurance Best Practices: Unlocking Proactive Network Operations
Assurance Best Practices: Unlocking Proactive Network OperationsAssurance Best Practices: Unlocking Proactive Network Operations
Assurance Best Practices: Unlocking Proactive Network Operations
ThousandEyes
 
Stretching CloudStack over multiple datacenters
Stretching CloudStack over multiple datacentersStretching CloudStack over multiple datacenters
Stretching CloudStack over multiple datacenters
ShapeBlue
 
Outcome Over Output: How UXers Can Leverage an Outcome-Based Mindset by Malin...
Outcome Over Output: How UXers Can Leverage an Outcome-Based Mindset by Malin...Outcome Over Output: How UXers Can Leverage an Outcome-Based Mindset by Malin...
Outcome Over Output: How UXers Can Leverage an Outcome-Based Mindset by Malin...
UXPA Boston
 
MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...
MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...
MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...
ICT Frame Magazine Pvt. Ltd.
 
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Building Connected Agents:  An Overview of Google's ADK and A2A ProtocolBuilding Connected Agents:  An Overview of Google's ADK and A2A Protocol
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Suresh Peiris
 
AI and Gender: Decoding the Sociological Impact
AI and Gender: Decoding the Sociological ImpactAI and Gender: Decoding the Sociological Impact
AI and Gender: Decoding the Sociological Impact
SaikatBasu37
 
John Carmack’s Slides From His Upper Bound 2025 Talk
John Carmack’s Slides From His Upper Bound 2025 TalkJohn Carmack’s Slides From His Upper Bound 2025 Talk
John Carmack’s Slides From His Upper Bound 2025 Talk
Razin Mustafiz
 
Proposed Feature: Monitoring and Managing Cloud Usage Costs in Apache CloudStack
Proposed Feature: Monitoring and Managing Cloud Usage Costs in Apache CloudStackProposed Feature: Monitoring and Managing Cloud Usage Costs in Apache CloudStack
Proposed Feature: Monitoring and Managing Cloud Usage Costs in Apache CloudStack
ShapeBlue
 
Pushing the Limits: CloudStack at 25K Hosts
Pushing the Limits: CloudStack at 25K HostsPushing the Limits: CloudStack at 25K Hosts
Pushing the Limits: CloudStack at 25K Hosts
ShapeBlue
 
SQL Database Design For Developers at PhpTek 2025.pptx
Introduction to Apache Hive

  • 1. APACHE HIVE (Apache Hadoop Sub Project). Agenda:
      ▪ Story – Making of Apache Hive
      ▪ What is Apache Hive
      ▪ Physical Layout
      ▪ Hive CLI
      ▪ Hive QL
  • 3. Can Elephants Fly? Concern: Can Hadoop be used more efficiently/fruitfully by developers? © 2012 Sabre Holdings Pvt. Ltd. | All rights reserved
  • 5. Thinking…? Step 1. Give him wings. Mr. Hadoop energizing himself.
  • 6. Thinking…? Step 2. Pray to gravity. Thanks to gravity, the sky never fell down on us ;) But wait, 2012 is not yet over. Keep praying. Mr. Hadoop enjoying his first air ride. “God did not create the universe, gravity did” - Stephen Hawking
  • 8. Upshot of the downfall. Victims. Mr. Hadoop – The Flying Elephant. Blame gravity! The fall will have a huge impact.
  • 10. Saving Life… Step 1. Shrink. [Figure: BEFORE and AFTER, using the ACME Elephant Shrinker]
  • 11. Saving Life… Step 2. Genetic engineering and a bit of magic. [Figure: BEFORE, Mr. Hadoop; AFTER, Ms. Hive; injecting insecto-receptors]
  • 13. Behind the scenes…? Hive was initially developed by Facebook.
  • 14. Hive is a data warehouse infrastructure built on top of Hadoop.
      ▪ Supports analysis of large datasets stored in Hadoop-compatible file systems such as HDFS and the Amazon S3 filesystem.
      ▪ Provides a SQL-like query language called HiveQL.
      ▪ Provides indexing to accelerate queries.
  • 15. Warehouse directory in HDFS:
      ▪ /user/hive/warehouse
      ▪ Tables ~ subdirectories of the warehouse directory.
      ▪ Partitions ~ subdirectories of the corresponding table directory.
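As a rough illustration of this layout, the sketch below builds the HDFS directory for a table or for one of its partitions. The helper `warehouse_path`, the `page_views` table and the `dt`/`country` partition columns are made up for the example. It assumes the default warehouse root from the slide and Hive's usual `column=value` naming for partition subdirectories.

```python
from posixpath import join

# Default warehouse root quoted on the slide.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def warehouse_path(table, partition=None):
    """Return the HDFS directory for a table, or for one of its partitions.

    `partition` is an ordered list of (column, value) pairs; each pair
    becomes one nested `column=value` subdirectory under the table directory.
    """
    parts = [WAREHOUSE_ROOT, table]
    for column, value in (partition or []):
        parts.append(f"{column}={value}")
    return join(*parts)


# Hypothetical table and partition columns, for illustration only.
print(warehouse_path("page_views"))
print(warehouse_path("page_views", [("dt", "2012-06-01"), ("country", "IN")]))
```

A table partitioned on two columns therefore ends up as two levels of subdirectories under its table directory.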
  • 16. Hive queries are implicitly converted to map-reduce code by the Hive engine.
      ▪ The compiler translates each query into a directed acyclic graph of map-reduce jobs.
      ▪ These map-reduce jobs are sent to Hadoop for execution.
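To make the "directed acyclic graph of jobs" idea concrete, here is a small sketch that orders a hypothetical three-job plan so that each job runs only after the jobs it depends on. This is not Hive's compiler or planner. The job names are invented, and the only assumption is that a job cannot start before its inputs exist.

```python
from graphlib import TopologicalSorter

# Hypothetical plan for one query: each job maps to the set of jobs
# whose output it consumes. The job names are invented for illustration.
job_dependencies = {
    "join_job": set(),
    "group_by_job": {"join_job"},
    "order_by_job": {"group_by_job"},
}


def execution_order(dependencies):
    """Return the jobs in an order that respects every dependency edge."""
    return list(TopologicalSorter(dependencies).static_order())


print(execution_order(job_dependencies))
```

Because the graph is acyclic, such an order always exists; a cycle would make `TopologicalSorter` raise `CycleError`.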
  • 17. The /user/hive directory is created automatically the first time a Hive session is started.
      ▪ The /user/hive/warehouse directory must be accessible by all users.
      ▪ hadoop dfs -chmod -R 1777 /user/hive/warehouse
      ▪ Activating the sticky bit is recommended if the Hadoop version installed on the cluster supports it.
      ▪ The /tmp directory should also be made a sticky directory.
      ▪ hadoop dfs -chmod -R 1777 /tmp
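The mode 1777 used in these commands can be unpacked with Python's standard `stat` module. The sketch only decodes the number and does not touch HDFS; the helper name `describe_mode` is made up for the example. The leading 1 is the sticky bit, which in a world-writable directory restricts deleting or renaming a file to its owner, the directory owner or the superuser.

```python
import stat

# 1777 is an octal mode: the leading 1 is the sticky bit, and 777 grants
# read, write and execute to owner, group and others.
MODE = 0o1777


def describe_mode(mode):
    """Return the `ls -l` style string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | mode)


print(describe_mode(MODE))        # drwxrwxrwt
print(bool(MODE & stat.S_ISVTX))  # True: the sticky bit is set
```

The trailing `t` in `drwxrwxrwt` is how the sticky bit shows up in a directory listing.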
  • 18. The Hive CLI (Command Line Interface) can be invoked with the hive command.
      ▪ % hive
  • 21. DML: SELECT. DDL: SHOW TABLES, CREATE TABLE, ALTER TABLE, DROP TABLE.
  • 24. Normal tables are created under the warehouse directory (the source data is moved into the warehouse).
      ▪ Normal tables are directly visible through HDFS directory browsing.
      ▪ On dropping a normal table, both the source data and the table metadata are deleted.
      ▪ External tables read directly from HDFS files.
      ▪ External tables are not visible in the warehouse directory.
      ▪ On dropping an external table, only the metadata is deleted, not the source data.
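The difference in drop behaviour can be modelled with a toy example. The dictionaries below are in-memory stand-ins for HDFS and the metastore, not Hive's API, and the table names and the /data/raw path are invented for illustration.

```python
# Toy in-memory stand-ins for HDFS and the metastore; not Hive's API.
hdfs_files = {
    "/user/hive/warehouse/managed_logs/part-0": "row data",
    "/data/raw/external_logs/part-0": "row data",
}
metastore = {
    "managed_logs": {"location": "/user/hive/warehouse/managed_logs", "external": False},
    "external_logs": {"location": "/data/raw/external_logs", "external": True},
}


def drop_table(name):
    """Drop a table: metadata always goes, data only for normal tables."""
    table = metastore.pop(name)
    if not table["external"]:
        prefix = table["location"] + "/"
        for path in [p for p in hdfs_files if p.startswith(prefix)]:
            del hdfs_files[path]


drop_table("managed_logs")
drop_table("external_logs")
print(sorted(hdfs_files))  # only the external table's file remains
print(metastore)           # {}
```

After both drops the metastore is empty, but the file behind the external table is still in place.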
  • 28. Hive QL supports joins only on equality expressions; complex boolean expressions and inequality conditions are not supported.
      ▪ More than two tables can be joined.
      ▪ The number of map-reduce jobs (n) generated for a join depends on the columns being used.
      ▪ If the same column is used for all the tables, then n = 1.
      ▪ Otherwise n > 1.
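The n = 1 versus n > 1 rule can be read as a simple check: a multi-table join fits into one map-reduce job when every table is joined on a single column of its own. The sketch below encodes that reading only. It is a simplification rather than Hive's planner, and the function name and the tables a, b, c are assumptions for the example.

```python
from collections import defaultdict


def needs_single_job(join_conditions):
    """Return True if every table is joined on one and the same column.

    `join_conditions` is a list of equality conditions, each written as
    ((left_table, left_column), (right_table, right_column)).
    """
    columns_per_table = defaultdict(set)
    for left, right in join_conditions:
        for table, column in (left, right):
            columns_per_table[table].add(column)
    return all(len(columns) == 1 for columns in columns_per_table.values())


# b is joined on key1 in both conditions, so one job (n = 1) is enough.
same_key = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key1"))]
# b is joined on key1 and then on key2, so more than one job (n > 1).
mixed_keys = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key2"))]

print(needs_single_job(same_key))    # True
print(needs_single_job(mixed_keys))  # False
```

In the second case table b would have to be shuffled on two different keys, which is why a single map-reduce pass is not enough.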
  • 30. HiveQL doesn’t follow the SQL-92 standard. Lack of support:
      ▪ No materialized views
      ▪ No transaction-level support
      ▪ Limited sub-query support
  • 31. Hadoop – Entering into the new world!
  • 32. Reach me: Tapan Avasthi, Associate Software Developer Intern, Travelocity Global. tapan.avasthi@travelocity.com, tapan.k.avasthi@gmail.com