Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc
Big Data & Hadoop
D. Praveen Kumar
Junior Research Fellow
Department of Computer Science & Engineering
Indian Institute of Technology (Indian School of Mines)
Dhanbad, Jharkhand, India
Head of IT & ITES, Skill Subsist Impels Ltd, Tirupati.
March 25, 2017
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 1 / 60
1 Introduction
2 Big Data
3 Sources of Big Data
4 Tools
5 HDFS
6 Installation
7 Configuration
8 Starting & Stopping
9 Map Reduce
10 Execution
Data
Data means a value or set of values.
Examples:
March 1st, 2017
20, 30, 40
ΨΦϕ
Information
Meaningful or processed data is called information.
Examples:
Data Types
The kind of data that may appear in a computer.
Examples: int, float, char, double
Abstract data types: user-defined data types.
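As a small illustration in Java (the language used for the MapReduce code later in these slides), the built-in types listed above and one user-defined type might look like this. The `Student` class and its fields are made up for the example.

```java
// Built-in (primitive) types next to one user-defined type.
// Student and its fields are hypothetical, chosen only for illustration.
final class DataTypesDemo {

    // A user-defined type groups several values under one name.
    static final class Student {
        final int rollNumber;
        final char grade;
        final double marks;

        Student(int rollNumber, char grade, double marks) {
            this.rollNumber = rollNumber;
            this.grade = grade;
            this.marks = marks;
        }
    }

    public static void main(String[] args) {
        int count = 20;          // whole number
        float ratio = 0.5f;      // single-precision floating point
        char letter = 'A';       // single character
        double average = 33.75;  // double-precision floating point

        Student s = new Student(1, letter, average);
        System.out.println(count + " " + ratio + " "
                + s.rollNumber + " " + s.grade + " " + s.marks);
    }
}
```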
Traditional approaches
Traditional approaches to store and process the data
1 File system
2 RDBMS (Relational Database Management Systems)
3 Data Warehouse & Mining Tools
4 Grid Computing
5 Volunteer Computing
GUESTS = 4
Transportation from the railway station to your home: one auto or car is sufficient.
Mom can prepare food or snacks without difficulty.
Your house is sufficient for accommodation.
Facilities like beds, bathrooms, water and TV are the ones you already use.
You can talk to each other, crack jokes and keep them happy.
Expenditure is nearly Rs. 1,000/-
GUESTS = 100
Transportation = 25 autos/cars or two buses
Food = catering
Accommodation = lodge
Facilities = AC, TV and all other facilities
Maintenance = somewhat difficult
Expenditure = nearly Rs. 90,000/-
GUESTS = 10000
Transportation = 2500 autos or 500 buses
Food = catering
Accommodation = all lodges, function halls and cottages in the town
Facilities = AC, TV and all other facilities are somewhat difficult to provide
Maintenance = more difficult
Expenditure = nearly Rs. 2,00,000/-
Grid Computing
Volunteer Computing
GUESTS = 10000000
Transportation = how many autos?
Food = ?
Accommodation = ?
Facilities = ?
Maintenance = ?
Cost = ?
Problems
The same problems arise in a computing environment:
It is difficult to handle a huge and ever-growing amount of data
Processing the data is not possible with only a few machines
Distributing large data sets is difficult
Constructing online or offline models is very difficult
Solution
A single solution to all these problems is
What is Big Data?
Big data refers to voluminous amounts of structured or
unstructured data that organizations can potentially mine and
analyze.
Big data is a huge amount of large data sets characterized by
Data generation
How Data Is Generated
Internet of Events
The Internet is the main source generating the vast amount of data.
4 Internet of Events
4 Questions of Data Analysts
1 What happened?
2 Why did it happen?
3 What will happen?
4 What is the best that can happen?
Big Data Platforms and Analytical Software
Hadoop
Here we go with
Hadoop History
Hadoop was created by Doug Cutting, the creator of Lucene.
He was also involved in a project called Nutch, from which Hadoop
grew.
Nutch combined MapReduce and NDFS (Nutch
Distributed File System).
Later, this part of Nutch was split out and named Hadoop (MapReduce + HDFS
(Hadoop Distributed File System)).
Hadoop
Apache Hadoop is an open-source software framework for
distributed storage and distributed processing of very large data
sets on computer clusters built from commodity hardware.
Hadoop
The base Apache Hadoop framework is composed of the following
modules:
Hadoop Common: libraries and utilities needed by
other Hadoop modules
Hadoop Distributed File System (HDFS): a distributed
file system that stores data
Hadoop YARN: a resource-management platform
Hadoop MapReduce: large-scale data processing
Hadoop Components
Hadoop Components
HDFS- Goals
The design goals of HDFS
1 Very Large files
2 Streaming Data Access
3 Commodity Hardware
HDFS- Limitations
HDFS is not a good fit for
1 Lots of small files
2 Low latency database access
3 Multiple writers, arbitrary file modifications
HDFS- Concepts
1 Blocks
2 Namenodes
3 Datanodes
4 HDFS Federation
5 HDFS High Availability
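The first concept, blocks, can be made concrete with a little arithmetic: HDFS stores a file as a sequence of fixed-size blocks. The sketch below assumes a block size of 128 MB, the usual default in Hadoop 2.x (the slides do not state one, and it is configurable). `BlockMath` and its methods are illustrative names, not part of Hadoop.

```java
// Illustrative helper, not a Hadoop class.
final class BlockMath {
    // Assumed block size: 128 MB, the usual Hadoop 2.x default (configurable).
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks needed to hold a file of the given size (ceiling division).
    static long blockCount(long fileSizeBytes) {
        if (fileSizeBytes < 0) {
            throw new IllegalArgumentException("file size must not be negative");
        }
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Size of the final block; a short last block holds only the remaining bytes.
    static long lastBlockSize(long fileSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        long remainder = fileSizeBytes % BLOCK_SIZE;
        return remainder == 0 ? BLOCK_SIZE : remainder;
    }
}
```

Under that assumption a 1 GB file needs 8 blocks, which is why HDFS suits very large files better than lots of small ones: every file, however small, costs at least one block entry on the NameNode.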
Requirements
Necessary
Java >= 7
ssh
Linux OS (Ubuntu >= 14.04)
Hadoop framework
Optional
Eclipse
Internet connection
Java 7 & Installation
Hadoop requires a working Java installation; Java 1.7 or
later is recommended.
Use one of the following commands to install Java on Linux:
sudo apt-get install openjdk-7-jdk (or)
sudo apt-get install default-jdk
Java PATH Setup
We need to set the Java path.
Open the .bashrc file located in the home directory:
gedit ~/.bashrc
Add the line below at the end:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Installation & Configuration of SSH
Hadoop requires SSH (Secure Shell) access to manage its
nodes, i.e. remote machines plus your local machine if you
want to use Hadoop on it.
Install SSH using the following command:
sudo apt-get install ssh
Next, generate a DSA SSH key for the user and authorize it for login:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Download & Extract Hadoop
Download Hadoop from the Apache Download Mirrors
http://mirror.fibergrid.in/apache/hadoop/common/
Extract the contents of the Hadoop package to a location of your
choice. I picked /usr/local/hadoop.
$ cd /usr/local
$ sudo tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 hadoop
Add Hadoop configuration in .bashrc
Add Hadoop configuration in .bashrc in home directory.
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
Create tmp directory, DataNode & NameNode directories
Run the command below to create the NameNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/namenode
Run the command below to create the DataNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/datanode
Run the commands below to create the Hadoop tmp directory and hand it to the Hadoop user (hadoop1 here):
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
Files to Configure
The following are the files we need to configure
core-site.xml
hadoop-env.sh
mapred-site.xml
hdfs-site.xml
Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following snippets between the
<configuration> ... </configuration> tags in the core-site.xml
file.
Add the property below to specify the location of the tmp directory:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>
Add the property below to specify the default file
system and its port number:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
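To check that a file such as core-site.xml has the expected shape, the property list can be read back with the XML parser that ships with the JDK. This is only a sketch for inspection: Hadoop itself loads these files through its own `Configuration` class, and `SiteXmlReader` is a made-up helper name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads <property><name>/<value> pairs from a *-site.xml string.
final class SiteXmlReader {

    // Returns property name -> value in file order; rejects malformed input.
    static Map<String, String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<String, String>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Covers XML syntax errors and properties missing a name or value.
            throw new IllegalArgumentException("not a readable configuration file", e);
        }
    }
}
```

Tags must be typed without spaces inside the angle brackets, i.e. `<property>` and not `< property >`; the parser rejects the spaced form as malformed XML.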
Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Uncomment the JAVA_HOME line and set it to the correct Java path:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Add property in
/usr/local/hadoop/etc/hadoop/mapred-site.xml
In this file we set the host name and port at which the MapReduce job
tracker runs. Add the following to mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
Add properties in ... etc/hadoop/hdfs-site.xml
In the file hdfs-site.xml add the following:
Set the replication factor
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Specify the NameNode directory
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>
Specify the DataNode directory
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>
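A replication factor of 1 suits a single-node setup, where only one DataNode exists to hold a copy. The sketch below shows what the factor means for disk usage and fault tolerance in general; `ReplicationMath` and its methods are names invented for illustration, and the factor 3 mentioned afterwards is Hadoop's usual default for multi-node clusters, not something these slides configure.

```java
// Illustrative arithmetic only, not a Hadoop class.
final class ReplicationMath {

    // Raw disk space used across the cluster for the given amount of user data.
    static long rawBytes(long dataBytes, int replication) {
        if (dataBytes < 0 || replication < 1) {
            throw new IllegalArgumentException("need dataBytes >= 0 and replication >= 1");
        }
        return dataBytes * replication;
    }

    // How many copies of a block can be lost while one copy still survives.
    static int copiesThatCanBeLost(int replication) {
        if (replication < 1) {
            throw new IllegalArgumentException("need replication >= 1");
        }
        return replication - 1;
    }
}
```

With a factor of 1 no copy can be lost without losing the block; with a factor of 3 the same data takes three times the disk space but survives the loss of two copies.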
Formatting the HDFS filesystem via the NameNode
The first step in starting up your Hadoop installation is
formatting the Hadoop file system.
You need to do this only the first time you set up Hadoop.
Do not format a running Hadoop filesystem, as you will lose all
the data currently in HDFS.
To format the filesystem, run the command:
hadoop namenode -format
Starting single-node cluster
Run the command:
start-all.sh
This will start a NameNode, SecondaryNameNode,
DataNode, ResourceManager and NodeManager on your
machine.
A handy tool for checking whether the expected Hadoop
processes are running is jps:
hadoop1@hadoop1:/usr/local/hadoop$ jps
2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
2727 DataNode
3242 NodeManager
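Reading the jps listing by eye works for one machine; the same check can be sketched in code. The helper below takes the text printed by jps and reports which of the five daemons named above are absent. `JpsCheck` is a hypothetical name, and the sketch assumes each jps line has the form "pid name" as in the listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares jps output with the daemons expected on a single node.
final class JpsCheck {
    // Daemons the slide expects after start-all.sh.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "SecondaryNameNode", "DataNode", "ResourceManager", "NodeManager");

    // Returns the expected daemons that do not appear in the given jps output.
    static List<String> missing(String jpsOutput) {
        Set<String> running = new HashSet<String>();
        for (String line : jpsOutput.split("\\r?\\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]);  // second column is the process name
            }
        }
        List<String> result = new ArrayList<String>();
        for (String name : EXPECTED) {
            if (!running.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

An empty result means all five daemons are up; a missing DataNode, for example, usually means its log file is the next place to look.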
Stopping your single-node cluster
Run the command
stop-all.sh
to stop all the daemons running on your machine. The output will look
like this:
stopping NodeManager
localhost: stopping ResourceManager
stopping NameNode
localhost: stopping DataNode
localhost: stopping SecondaryNameNode
Map-Reduce Framework
MapReduce is a programming paradigm.
It relies on two functions, Map and Reduce.
MapReduce is used to manage many large-scale computations.
The framework takes care of scheduling tasks, monitoring
them and re-executing failed tasks.
The framework tries to schedule tasks on the nodes
where the data is already present.
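The re-execution of failed tasks can be pictured as a retry loop. The sketch below is a single-machine simplification with hypothetical names (`RetryingRunner`, `runWithRetries`); the real framework tracks tasks across the cluster and may rerun a failed task on a different node.

```java
import java.util.concurrent.Callable;

// Simplified, hypothetical model of "re-execute the failed task".
final class RetryingRunner {

    // Runs the task, trying again after each failure, up to maxAttempts times.
    // Returns the number of attempts used; throws if every attempt failed.
    static int runWithRetries(Callable<Void> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.call();
                return attempt;
            } catch (Exception e) {
                last = e;  // remember the failure and try again
            }
        }
        throw new IllegalStateException("task failed after " + maxAttempts + " attempts", last);
    }
}
```

The programmer writes only the task itself; the retry policy stays inside the framework, which is the division of labour the slide describes.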
Map-Reduce Computation Steps
The key-value pairs from each Map task are collected by a
master controller and sorted by key. The keys are divided
among all the Reduce tasks, so all key-value pairs with the
same key wind up at the same Reduce task.
The Reduce tasks work on one key at a time, and combine
all the values associated with that key in some way. The
manner of combination of values is determined by the code
written by the user for the Reduce function.
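The grouping step described above can be sketched in plain Java, without a cluster. Each key is assigned to a Reduce task by taking a non-negative hash modulo the number of tasks, which is the formula used by Hadoop's default HashPartitioner (Hadoop hashes its own key types rather than Java strings, so actual task numbers differ). `ShuffleSketch` and its methods are names invented for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of "same key ends up at the same Reduce task".
final class ShuffleSketch {

    // Convenience constructor for a (key, value) pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new SimpleEntry<String, Integer>(key, value);
    }

    // Chooses the Reduce task for a key: non-negative hash modulo the task count.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Collects Map output into one sorted key -> values table per Reduce task.
    static List<Map<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReduceTasks) {
        List<Map<String, List<Integer>>> perTask =
                new ArrayList<Map<String, List<Integer>>>();
        for (int i = 0; i < numReduceTasks; i++) {
            perTask.add(new TreeMap<String, List<Integer>>());  // TreeMap keeps keys sorted
        }
        for (Map.Entry<String, Integer> p : mapOutput) {
            Map<String, List<Integer>> table =
                    perTask.get(partition(p.getKey(), numReduceTasks));
            List<Integer> values = table.get(p.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                table.put(p.getKey(), values);
            }
            values.add(p.getValue());
        }
        return perTask;
    }
}
```

Because the task is a pure function of the key, every pair with that key lands in the same table, and the Reduce function then sees all of its values together.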
Hadoop - MapReduce
Hadoop - MapReduce (Word Count) Example
MapReduce - WordCountMapper
In the WordCountMapper class we perform the following operations:
Read a line from file
Split line into Words
Assign Count 1 to each word
WordCountMapper source code
public static class WordCountMapper
    extends Mapper<Object, Text, Text, IntWritable> {
  // Reused for every record so no new objects are allocated per word.
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Called once per input record (one line of text with the default input
  // format); emits (word, 1) for each whitespace-separated token.
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}
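To see what the mapper emits without starting Hadoop, the same tokenizing loop can be run on an ordinary string. This is a plain-Java imitation of the map step, not the Hadoop class itself; `MapStepSketch` is a name made up for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java imitation of the map step: one (word, 1) pair per token.
final class MapStepSketch {

    // Splits the line on whitespace and returns the pairs in input order.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return out;
    }
}
```

Tokens are split on whitespace only, so "Big", "big" and "big," are three different words to this mapper. Lower-casing or stripping punctuation would have to be added to the map function if such words should be counted together.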
MapReduce - WordCountReducer
In the WordCountReducer class we perform the following operations:
Sum the list of values
Assign sum to corresponding word
WordCountReducer source code
public static class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  // Reused output value holding the total for the current word.
  private IntWritable result = new IntWritable();

  // Called once per distinct word with all the counts emitted for it.
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
WordCountJob
public class WordCountJob {
  // WordCountMapper and WordCountReducer are declared static, so they are
  // expected to be nested inside this class.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountJob.class);
    job.setMapperClass(WordCountMapper.class);
    job.setCombinerClass(WordCountReducer.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] is the input path, args[1] the output path.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
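The job registers WordCountReducer as the combiner as well as the reducer. That is safe for word count because addition gives the same total whether counts are summed in one pass or pre-summed per map task first. A plain-Java sketch of that equivalence, without Hadoop classes; `CombinerSketch` and its methods are names invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Shows that combining per map task and then reducing equals reducing directly.
final class CombinerSketch {

    // Counts the words of one input split: the output of one map task after combining.
    static Map<String, Integer> countLocal(String split) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String w : split.trim().split("\\s+")) {
            if (w.isEmpty()) {
                continue;  // blank input yields one empty token; skip it
            }
            Integer old = counts.get(w);
            counts.put(w, (old == null ? 0 : old) + 1);
        }
        return counts;
    }

    // Adds up partial counts per word: the work of the final reduce step.
    static Map<String, Integer> sumByWord(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                Integer old = totals.get(e.getKey());
                totals.put(e.getKey(), (old == null ? 0 : old) + e.getValue());
            }
        }
        return totals;
    }
}
```

The benefit of the combiner is less data sent between machines: each map task forwards one pair per distinct word instead of one pair per occurrence. An operation such as averaging would not be safe to reuse as a combiner in this way.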
Imports to Include
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
Execution of Hadoop Program in Eclipse
Step 1:
1 Start Hadoop in a terminal using the command:
$ start-all.sh
2 Use the jps command to check whether all Hadoop services have
started.
Step 2: Open Eclipse
Step 3: Go to File ⇒ New ⇒ Project
Select Java Project and click the Next button
Enter a project name and click the Finish button
Continue...
Step 4: The project is created and appears in the side panel
1 Right click on the project ⇒ New ⇒ Class
2 Enter the class name and then click Finish
3 Write the MapReduce program in that class
Step 5: Write the Java program
Continue...
Step 6: Importing JAR files
1 Right click on the project and select Properties (Alt+Enter)
2 Select Java Build Path ⇒ click on Libraries, then click on Add
External JARs
3 Select the JARs from the following Hadoop library folders:
/usr/local/hadoop/share/hadoop/common/libs
/usr/local/hadoop/share/hadoop/hdfs/libs
/usr/local/hadoop/share/hadoop/httpfs/libs
/usr/local/hadoop/share/hadoop/mapreduce/libs
/usr/local/hadoop/share/hadoop/yarn/libs
/usr/local/hadoop/share/hadoop/tools/
Continue ....
Step 7: Set input file path
1 Create a folder in the home directory
2 Copy text files into it
3 Note the path of the input folder
Step 8: Set input and output path
1 Right click on the source file ⇒ Run As ⇒ Run Configurations ⇒
Arguments
2 Enter your input and output paths separated by a single space
3 Click on Run
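The two paths entered in the Arguments tab reach the program as args[0] and args[1] of WordCountJob.main. If one is missing, reading args[1] fails with an ArrayIndexOutOfBoundsException, which says little about the cause. A small guard such as the following could be called at the top of main; `JobArgs` is a hypothetical helper, not part of the slides' program.

```java
// Hypothetical helper: validates the two program arguments before the job is built.
final class JobArgs {
    final String inputPath;
    final String outputPath;

    private JobArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Expects exactly two arguments: input path first, output path second.
    static JobArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: WordCountJob <input path> <output path>");
        }
        return new JobArgs(args[0], args[1]);
    }
}
```

Note also that Hadoop refuses to write into an output directory that already exists, so the output path should name a folder that has not been created yet.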
Thank You
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 60 / 60
Ad

More Related Content

What's hot (20)

HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
vishal choudhary
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019
Juan Fabian
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
KrishnenduKrishh
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Edureka!
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Databricks
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
Databricks
 
Oracle MAA (Maximum Availability Architecture) 18c - An Overview
Oracle MAA (Maximum Availability Architecture) 18c - An OverviewOracle MAA (Maximum Availability Architecture) 18c - An Overview
Oracle MAA (Maximum Availability Architecture) 18c - An Overview
Markus Michalewicz
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
Sandip Darwade
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
Home
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
Marin Dimitrov
 
Hadoop
Hadoop Hadoop
Hadoop
ABHIJEET RAJ
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
Kent Graziano
 
Introduction to Azure Data Factory
Introduction to Azure Data FactoryIntroduction to Azure Data Factory
Introduction to Azure Data Factory
Slava Kokaev
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
EMC
 
Presto best practices for Cluster admins, data engineers and analysts
Presto best practices for Cluster admins, data engineers and analystsPresto best practices for Cluster admins, data engineers and analysts
Presto best practices for Cluster admins, data engineers and analysts
Shubham Tagra
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics
Araf Karsh Hamid
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache SparkRow/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
DataWorks Summit/Hadoop Summit
 
NoSQL Architecture Overview
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture Overview
Christopher Foot
 
HADOOP TECHNOLOGY ppt
Hadoop basics

  • 1. Big Data & Hadoop. D. Praveen Kumar, Junior Research Fellow, Department of Computer Science & Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, Jharkhand, India; Head of IT & ITES, Skill Subsist Impels Ltd, Tirupati. March 25, 2017. Sree Venkateswara College of Engineering, Nellore, A. P.
  • 2. Outline: 1 Introduction 2 Big Data 3 Sources of Big Data 4 Tools 5 HDFS 6 Installation 7 Configuration 8 Starting & Stopping 9 Map Reduce 10 Execution
  • 3. Data: Data means a value or a set of values. Examples: March 1st 2017; 20, 30, 40; ΨΦϕ
  • 4. Information: Meaningful or preprocessed data is called information. Examples:
  • 5. Data Types: The kinds of data that may appear in a computer. Examples: int, float, char, double. Abstract data types: user-defined data types.
  • 6. Traditional approaches: Traditional approaches to storing and processing data: 1 File system 2 RDBMS (Relational Database Management Systems) 3 Data Warehouse & Mining Tools 4 Grid Computing 5 Volunteer Computing
  • 7. GUESTS = 4: Transportation from the railway station to your home: one auto or car is sufficient. Food: Mom can prepare meals or snacks without difficulty. Accommodation: your house is sufficient. Facilities: the beds, bathrooms, water and TV that you already use are available. You can talk to each other, crack jokes and keep everyone happy. Expenditure: nearly Rs. 1,000/-
  • 8. GUESTS = 100: Transportation = 25 autos or cars, or two buses. Food = catering. Accommodation = lodge. Facilities = AC, TV and all other facilities. Maintenance = somewhat difficult. Expenditure = nearly Rs. 90,000/-
  • 9. GUESTS = 10000: Transportation = 2500 autos or 500 buses. Food = catering. Accommodation = all lodges, function halls and cottages in the town. Facilities = AC, TV and all other facilities are somewhat difficult to provide. Maintenance = more difficult. Expenditure = nearly Rs. 2,00,000/-
  • 10. Grid Computing
  • 11. Volunteer Computing
  • 12. GUESTS = 10000000: Transportation = how many autos? Food = ? Accommodation = ? Facilities = ? Maintenance = ? Cost = ?
  • 13. Problems: The same difficulties arise in a computing environment. It is difficult to handle a huge and ever-growing amount of data. Processing the data is not possible with only a few machines. Distributing large data sets is difficult. Constructing online or offline models is very difficult.
  • 14. Solution: A single solution to all these problems is
  • 15. What is Big Data? Big data refers to voluminous amounts of structured or unstructured data that organizations can potentially mine and analyze. Big data is a huge collection of large data sets characterized by
  • 16. Data generation
  • 17. How data is generated
  • 18. Internet of Events: The Internet is the main source of the vast amount of data being generated.
  • 19. 4 Internet of Events
  • 20. 4 Questions of Data Analysts: 1 What happened? 2 Why did it happen? 3 What will happen? 4 What is the best that can happen?
  • 21. Big Data Platforms and Analytical Software
  • 22. Hadoop: Here we go with
  • 23. Hadoop History: Hadoop was created by Doug Cutting, the creator of Lucene. He was also involved in a project called Nutch, whose distributed components were the starting point for Hadoop. Nutch combined MapReduce and NDFS (Nutch Distributed File System). Later, these parts of Nutch were split out into a separate project named Hadoop (MapReduce + HDFS (Hadoop Distributed File System)).
  • 24. Hadoop: Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
  • 25. Hadoop: The base Apache Hadoop framework is composed of the following modules:
      Hadoop Common: contains libraries and utilities needed by the other Hadoop modules
      Hadoop Distributed File System (HDFS): a distributed file system that stores data
      Hadoop YARN: a resource-management platform
      Hadoop MapReduce: for large-scale data processing
  • 26. Hadoop Components
  • 27. Hadoop Components
  • 28. HDFS Goals: The design goals of HDFS: 1 Very large files 2 Streaming data access 3 Commodity hardware
  • 29. HDFS Limitations: HDFS is not a good fit for: 1 Lots of small files 2 Low-latency database access 3 Multiple writers and arbitrary file modifications
  • 30. HDFS Concepts: 1 Blocks 2 Namenodes 3 Datanodes 4 HDFS Federation 5 HDFS High Availability
  • 31. Requirements:
      Necessary: Java >= 7; ssh; Linux OS (Ubuntu >= 14.04); Hadoop framework
      Optional: Eclipse; Internet connection
  • 32. Java 7 Installation: Hadoop requires a working Java installation; Java 1.7 or later is recommended. Either of the following commands installs Java on a Linux platform:
      sudo apt-get install openjdk-7-jdk
      (or)
      sudo apt-get install default-jdk
  • 33. Java PATH Setup: We need to set the Java path. Open the .bashrc file located in the home directory:
      gedit ~/.bashrc
    Add the line below at the end:
      export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  • 34. Installation & Configuration of SSH: Hadoop requires SSH (Secure Shell) access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it. Install SSH using the following command:
      sudo apt-get install ssh
    First, generate a DSA SSH key for the user and authorize it for login:
      ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
      cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
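The key-authorization step on slide 34 can be made safe to repeat. The following is a minimal sketch, assuming a helper named `authorize_key` (a hypothetical name, not part of OpenSSH) that appends the public key only when it is missing and tightens the permissions that sshd commonly expects. Recent OpenSSH releases disable DSA keys by default, so an RSA or Ed25519 key may be needed in place of the DSA key used on the slide.

```shell
# Sketch: append a public key to authorized_keys only if it is not already there.
# authorize_key is a hypothetical helper name, not an OpenSSH command.
authorize_key() {
  local pub_key="$1"                 # e.g. ~/.ssh/id_dsa.pub
  local ssh_dir="${2:-$HOME/.ssh}"   # directory that holds authorized_keys
  local auth_file="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"
  touch "$auth_file"
  # -F: fixed string, -x: whole-line match, -f: read the pattern from the key file
  if ! grep -qxF -f "$pub_key" "$auth_file"; then
    cat "$pub_key" >> "$auth_file"
  fi
  # sshd usually refuses keys when these are group- or world-writable
  chmod 700 "$ssh_dir"
  chmod 600 "$auth_file"
}

# Usage after generating the key as on the slide:
#   authorize_key ~/.ssh/id_dsa.pub
#   ssh localhost    # should now log in without a password prompt
```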
  • 35. Download & Extract Hadoop: Download Hadoop from the Apache Download Mirrors: http://mirror.fibergrid.in/apache/hadoop/common/ Extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop.
      $ cd /usr/local
      $ sudo tar xzf hadoop-2.7.2.tar.gz
      $ sudo mv hadoop-2.7.2 hadoop
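The extract-and-rename steps on slide 35 can also be done in one pass. This is a sketch only: `install_hadoop` is a hypothetical helper name, and the tarball name in the usage comment is the one used on the slide.

```shell
# Sketch: unpack a Hadoop release tarball directly into the target directory.
# install_hadoop is a hypothetical helper name.
install_hadoop() {
  local tarball="$1"   # e.g. hadoop-2.7.2.tar.gz
  local dest="$2"      # e.g. /usr/local/hadoop
  mkdir -p "$dest"
  # --strip-components=1 drops the top-level hadoop-2.7.2/ directory,
  # so no separate mv step is needed afterwards
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Usage (run with sudo when the destination is under /usr/local):
#   install_hadoop hadoop-2.7.2.tar.gz /usr/local/hadoop
```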
  • 36. Add Hadoop configuration in .bashrc: Add the Hadoop configuration to the .bashrc file in the home directory.
      export HADOOP_INSTALL=/usr/local/hadoop
      export PATH=$PATH:$HADOOP_INSTALL/bin
      export PATH=$PATH:$HADOOP_INSTALL/sbin
      export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
      export HADOOP_HDFS_HOME=$HADOOP_INSTALL
      export HADOOP_COMMON_HOME=$HADOOP_INSTALL
      export YARN_HOME=$HADOOP_INSTALL
      export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
      export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
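Editing .bashrc by hand makes it easy to add the same exports twice. The sketch below, assuming a hypothetical helper `add_hadoop_env` and a marker comment of our own choosing, appends the slide's variables once and skips the file on later runs.

```shell
# Sketch: append the Hadoop exports from slide 36 to a shell rc file exactly once.
# add_hadoop_env and the marker text are illustrative names, not Hadoop tooling.
add_hadoop_env() {
  local rc_file="$1"       # e.g. ~/.bashrc
  local install_dir="$2"   # e.g. /usr/local/hadoop
  local marker="# hadoop environment (added by setup sketch)"

  # Skip if a previous run already wrote the block
  if grep -qF "$marker" "$rc_file" 2>/dev/null; then
    return 0
  fi
  {
    echo "$marker"
    echo "export HADOOP_INSTALL=$install_dir"
    # Single quotes keep the variable references literal in the rc file
    echo 'export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin'
    echo 'export HADOOP_MAPRED_HOME=$HADOOP_INSTALL'
    echo 'export HADOOP_HDFS_HOME=$HADOOP_INSTALL'
    echo 'export HADOOP_COMMON_HOME=$HADOOP_INSTALL'
    echo 'export YARN_HOME=$HADOOP_INSTALL'
    echo 'export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native'
    echo 'export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"'
  } >> "$rc_file"
}

# Usage:
#   add_hadoop_env ~/.bashrc /usr/local/hadoop
#   . ~/.bashrc    # reload so the current shell sees the variables
```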
  • 37. Create tmp, NameNode & DataNode directories: Run the command below to create the NameNode directory:
      mkdir -p /usr/local/hadoopdata/hdfs/namenode
    Run the command below to create the DataNode directory:
      mkdir -p /usr/local/hadoopdata/hdfs/datanode
    Run the commands below to create the tmp directory for Hadoop:
      sudo mkdir -p /app/hadoop/tmp
      sudo chown hadoop1:hadoop1 /app/hadoop/tmp
      sudo chmod 750 /app/hadoop/tmp
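The three directories from slide 37 can be created together. A minimal sketch follows, with `make_hdfs_dirs` as a hypothetical helper name; the ownership change from the slide is left to the caller because the user name differs per machine.

```shell
# Sketch: create the NameNode, DataNode and tmp directories in one call.
# make_hdfs_dirs is a hypothetical helper name.
make_hdfs_dirs() {
  local data_root="$1"   # e.g. /usr/local/hadoopdata/hdfs
  local tmp_dir="$2"     # e.g. /app/hadoop/tmp
  mkdir -p "$data_root/namenode" "$data_root/datanode" "$tmp_dir"
  # Same mode as on the slide: owner full access, group read and execute
  chmod 750 "$tmp_dir"
}

# Usage (with sudo where the parent directories are root-owned):
#   make_hdfs_dirs /usr/local/hadoopdata/hdfs /app/hadoop/tmp
#   sudo chown hadoop1:hadoop1 /app/hadoop/tmp
```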
  • 38. Files to Configure: The following are the files we need to configure: core-site.xml, hadoop-env.sh, mapred-site.xml, hdfs-site.xml
  • 39. Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml: Add the following snippets between the <configuration> ... </configuration> tags in the core-site.xml file.
    Add the property below to specify the location of the tmp directory:
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
      </property>
    Add the property below to specify the location of the default file system and its port number:
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
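To see the two core-site.xml properties in the context of a complete file, here is a sketch that writes the whole file from a shell function. `write_core_site` is a hypothetical helper name, and the function overwrites any existing core-site.xml in the given directory. The property name fs.default.name follows the slide; Hadoop 2.x also accepts the newer name fs.defaultFS for the same setting.

```shell
# Sketch: write a complete core-site.xml holding the two properties from slide 39.
# write_core_site is a hypothetical helper; it overwrites an existing file.
write_core_site() {
  local conf_dir="$1"   # e.g. /usr/local/hadoop/etc/hadoop
  mkdir -p "$conf_dir"
  # Quoted 'EOF' keeps the heredoc body literal (no shell expansion)
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Base directory for Hadoop's temporary files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <!-- Default file system URI: HDFS on this machine, port 9000 -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Usage:
#   write_core_site /usr/local/hadoop/etc/hadoop
```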
  • 40. Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh: Uncomment the JAVA_HOME line and give the correct path for Java:
      export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  • 41. Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml
    This file specifies the host name and port at which the MapReduce job tracker runs.
    Add the following to mapred-site.xml:
        <property>
          <name>mapred.job.tracker</name>
          <value>localhost:54311</value>
        </property>
  • 42. Add properties in ... etc/hadoop/hdfs-site.xml
    Add the following to hdfs-site.xml.
    Set the replication factor:
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
    Specify the NameNode directory:
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/usr/local/hadoopdata/hdfs/namenode</value>
        </property>
    Specify the DataNode directory (the Hadoop 2 property name is dfs.datanode.data.dir, not dfs.datanode.name.dir):
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:/usr/local/hadoopdata/hdfs/datanode</value>
        </property>
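A typo in a property name is easy to miss, because Hadoop ignores names it does not know. The sketch below is one possible way to list the name/value pairs in a configuration file before starting the cluster. It uses only the JDK's XML parser, not Hadoop itself. The class name `SiteXmlCheck`, the method `readProperties` and the inline `SAMPLE` string are assumptions made for illustration. The sample uses `dfs.datanode.data.dir`, the Hadoop 2 name for the DataNode storage directory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop.
class SiteXmlCheck {
    // Sample hdfs-site.xml content; the values follow the slides.
    static final String SAMPLE =
            "<configuration>"
            + "<property><name>dfs.replication</name><value>1</value></property>"
            + "<property><name>dfs.namenode.name.dir</name>"
            + "<value>file:/usr/local/hadoopdata/hdfs/namenode</value></property>"
            + "<property><name>dfs.datanode.data.dir</name>"
            + "<value>file:/usr/local/hadoopdata/hdfs/datanode</value></property>"
            + "</configuration>";

    // Parses Hadoop-style configuration XML and returns name -> value in file order.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                result.put(name, value);
            }
            return result;
        } catch (Exception e) {
            // Malformed XML or a property without <name>/<value> ends up here.
            throw new IllegalArgumentException("not a readable configuration file", e);
        }
    }

    public static void main(String[] args) {
        readProperties(SAMPLE).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running `main` prints one `name = value` line per property, which can be compared by eye against the names on the slides.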
  • 43. Formatting the HDFS filesystem via the NameNode
    The first step in starting up your Hadoop installation is formatting the Hadoop file system. You need to do this only the first time you set up Hadoop.
    Do not format a running Hadoop filesystem, as you will lose all the data currently in HDFS.
    To format the filesystem, run the command:
        hadoop namenode -format
  • 44. Starting single-node cluster
    Run the command:
        start-all.sh
    This starts a NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager on your machine.
    A handy tool for checking whether the expected Hadoop processes are running is jps:
        hadoop1@hadoop1:/usr/local/hadoop$ jps
        2598 NameNode
        3112 ResourceManager
        3523 Jps
        2917 SecondaryNameNode
        2727 DataNode
        3242 NodeManager
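The jps check above can also be automated. The following is a minimal sketch that takes the text printed by `jps` and reports which of the five daemons named on this slide are absent. The class `JpsCheck` and the method `missingDaemons` are hypothetical names. The sketch only parses text and does not invoke `jps` itself.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: checks jps output for the daemons listed on the slide.
class JpsCheck {
    static final Set<String> EXPECTED = new LinkedHashSet<>(Arrays.asList(
            "NameNode", "SecondaryNameNode", "DataNode", "ResourceManager", "NodeManager"));

    // Returns the expected daemons that do not appear in the given jps output.
    static Set<String> missingDaemons(String jpsOutput) {
        Set<String> missing = new LinkedHashSet<>(EXPECTED);
        for (String line : jpsOutput.split("\\R")) {
            // Each jps line has the form "<pid> <process name>".
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                missing.remove(parts[1]);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "2598 NameNode",
                "3112 ResourceManager",
                "3523 Jps",
                "2917 SecondaryNameNode",
                "2727 DataNode",
                "3242 NodeManager");
        System.out.println("Missing daemons: " + missingDaemons(sample)); // prints "Missing daemons: []"
    }
}
```

An empty result means every expected daemon is running. Any name in the result points to the daemon whose log file is worth reading first.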
  • 45. Stopping your single-node cluster
    Run the following command to stop all the daemons running on your machine:
        stop-all.sh
    The output will look like this:
        stopping NodeManager
        localhost: stopping ResourceManager
        stopping NameNode
        localhost: stopping DataNode
        localhost: stopping SecondaryNameNode
  • 46. Map-Reduce Framework
    MapReduce is a programming paradigm that relies on two functions, Map and Reduce.
    MapReduce is used to manage many large-scale computations.
    The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks.
    The framework tries to schedule tasks on the nodes where the data is already present.
  • 47. Map-Reduce Computation Steps
    The key-value pairs from each Map task are collected by a master controller and sorted by key.
    The keys are divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce task.
    The Reduce tasks work on one key at a time and combine all the values associated with that key in some way.
    How the values are combined is determined by the code the user writes for the Reduce function.
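The steps above can be sketched in plain Java without Hadoop. The example below assigns each key to a Reduce task by hashing, then groups the values per key in sorted order inside each task. The class `ShuffleSketch` and its methods are hypothetical. The formula in `partitionFor` is modeled on hash partitioning as done by Hadoop's default `HashPartitioner`, to the best of my knowledge. Everything else is a simplification that runs in memory on one machine.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the shuffle between Map and Reduce.
class ShuffleSketch {
    // Convenience constructor for a (key, value) pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Chooses the Reduce task for a key; the mask keeps the hash non-negative.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns one sorted map per Reduce task: key -> all values emitted for that key.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> pairs, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> tasks = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            tasks.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> p : pairs) {
            int task = partitionFor(p.getKey(), numReduceTasks);
            tasks.get(task)
                 .computeIfAbsent(p.getKey(), k -> new ArrayList<>())
                 .add(p.getValue());
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
                pair("data", 1), pair("big", 1), pair("hadoop", 1), pair("data", 1));
        List<TreeMap<String, List<Integer>>> tasks = shuffle(mapOutput, 2);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("reduce task " + i + ": " + tasks.get(i));
        }
    }
}
```

Because the task index depends only on the key, both `("data", 1)` pairs land in the same task, which is what lets a Reduce task see every value for a key.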
  • 48. Hadoop - MapReduce
  • 49. Hadoop - MapReduce (Word Count) Example
  • 50. MapReduce - WordCountMapper
    In the WordCountMapper class we perform the following operations:
        Read a line from the file
        Split the line into words
        Assign a count of 1 to each word
  • 51. WordCountMapper source code
        public static class WordCountMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            // Reusable writable holding the constant count 1
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            // Called once per input line; emits (word, 1) for every token
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
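The mapper splits lines with `StringTokenizer` and its default delimiters, which are whitespace characters only. As a result, punctuation and letter case stay part of each token. The sketch below reproduces just that tokenizing step in plain Java so the effect can be seen without a cluster. The class `MapStepSketch` and the tab-separated `word<TAB>1` output form are assumptions chosen for readability, not Hadoop types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for the map step, using plain strings instead of Hadoop writables.
class MapStepSketch {
    // Emits one "word<TAB>1" record per whitespace-separated token in the line.
    static List<String> map(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken() + "\t1");
        }
        return out;
    }

    public static void main(String[] args) {
        // "data," and "data" as well as "big" and "Big" are emitted as different keys.
        for (String record : map("big data, Big data")) {
            System.out.println(record);
        }
    }
}
```

If "data," and "data" should count as the same word, one option is to lower-case the line and strip punctuation before tokenizing. The code on the slide does not do this.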
  • 52. MapReduce - WordCountReducer
    In the WordCountReducer class we perform the following operations:
        Sum the list of values
        Assign the sum to the corresponding word
  • 53. WordCountReducer source code
        public static class WordCountReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            // Called once per word; values holds every count emitted for that word
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
  • 54. WordCountJob
        public class WordCountJob {
            // WordCountMapper and WordCountReducer are nested inside this class

            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Job.getInstance replaces the deprecated new Job(conf, name) constructor
                Job job = Job.getInstance(conf, "word count");
                job.setJarByClass(WordCountJob.class);
                job.setMapperClass(WordCountMapper.class);
                // The reducer doubles as combiner to pre-sum counts on the map side
                job.setCombinerClass(WordCountReducer.class);
                job.setReducerClass(WordCountReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                // args[0] is the input path, args[1] the output path
                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }
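The job registers `WordCountReducer` as the combiner as well as the reducer. This works because summing is associative and commutative: adding up partial sums gives the same total as adding up the individual 1s. The sketch below illustrates this in plain Java with two input splits. The class `CombinerSketch` and its method names are hypothetical, and the "splits" are ordinary strings rather than HDFS blocks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of why a summing reducer can also serve as a combiner.
class CombinerSketch {
    // Counts the words of one split locally: map output already summed, as a combiner would leave it.
    static Map<String, Integer> countSplit(String split) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Plays the role of the reduce step: adds up the partial counts from all splits.
    static Map<String, Integer> sumCounts(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> split1 = countSplit("big data big");
        Map<String, Integer> split2 = countSplit("data hadoop");
        // Prints {big=2, data=2, hadoop=1}, the same counts as for the unsplit text.
        System.out.println(sumCounts(Arrays.asList(split1, split2)));
    }
}
```

A reduce function that is not associative and commutative, such as one computing an average directly, could not be reused as a combiner this way.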
  • 55. Import statements to include
        import java.io.IOException;
        import java.util.StringTokenizer;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.hadoop.util.GenericOptionsParser;
  • 56. Execution of Hadoop Program in Eclipse
    Step 1:
        1 Start Hadoop in a terminal using the command: $ start-all.sh
        2 Use the jps command to check whether all Hadoop services have started.
    Step 2: Open Eclipse
    Step 3: Go to File ⇒ New ⇒ Project
        Select Java Project and click the Next button
        Enter a project name and click the Finish button
  • 57. Continue...
    Step 4: The project is created and appears on the right side
        1 Right-click on the project ⇒ New ⇒ Class
        2 Enter the name of the class and click Finish
        3 Write the MapReduce program in that class
    Step 5: Write the Java program
  • 58. Continue...
    Step 6: Importing JAR files
        1 Right-click on the project and select Properties (Alt+Enter)
        2 Select Java Build Path ⇒ click on Libraries, then click on Add External JARs
        3 Select the jars in the following Hadoop library directories:
            /usr/local/hadoop/share/hadoop/common/libs
            /usr/local/hadoop/share/hadoop/hdfs/libs
            /usr/local/hadoop/share/hadoop/httpfs/libs
            /usr/local/hadoop/share/hadoop/mapreduce/libs
            /usr/local/hadoop/share/hadoop/yarn/libs
            /usr/local/hadoop/share/hadoop/tools/
  • 59. Continue...
    Step 7: Set input file path
        1 Create a folder in your home directory
        2 Copy the text files into it
        3 Select the path of the input folder
    Step 8: Set input and output path
        1 Right-click on the source file ⇒ Run As ⇒ Run Configurations ⇒ Arguments
        2 Enter your input and output paths, separated by a single space
        3 Click Run
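The two paths entered in Step 8 arrive in the program as `args[0]` and `args[1]`. A common stumbling block is that Hadoop's file output format refuses to run when the output directory already exists. The sketch below shows one way to check both arguments up front. It assumes the paths are on the local filesystem, as in a local Eclipse run; for paths inside HDFS the check would have to go through Hadoop's own filesystem API instead. The class `RunArgsCheck` and the method `validate` are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for the two program arguments (local paths assumed).
class RunArgsCheck {
    // Returns null when the arguments look usable, otherwise a short problem description.
    static String validate(String[] args) {
        if (args.length != 2) {
            return "usage: <input path> <output path>";
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        if (!Files.exists(input)) {
            return "input path does not exist: " + input;
        }
        if (Files.exists(output)) {
            return "output path already exists: " + output;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        System.out.println(problem == null ? "arguments look fine" : problem);
    }
}
```

When rerunning the job with the same arguments, deleting or renaming the previous output folder first avoids the second error.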
  • 60. Thank you