Integrating LVM with Hadoop and providing Elasticity to Data Node Storage
What is LVM?
LVM is a tool for logical volume management which includes allocating disks, striping, mirroring and resizing logical volumes. With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes. LVM physical volumes can be placed on other block devices which might span two or more disks.
The physical volumes are combined into volume groups, with the exception of the /boot partition. The /boot partition cannot be on a logical volume because the boot loader cannot read it. If the root (/) partition is on a logical volume, create a separate /boot partition which is not part of a volume group. Since a physical volume cannot span multiple drives, to span more than one drive, create one or more physical volumes per drive.
The volume groups can be divided into logical volumes, which are assigned mount points, such as /home and / and file system types, such as ext2 or ext3. When "partitions" reach their full capacity, free space from the volume group can be added to the logical volume to increase the size of the partition. When a new hard drive is added to the system, it can be added to the volume group, and partitions that are logical volumes can be increased in size.
🌀 TASK DESCRIPTION :
🔅7.1: Elasticity Task -- a) Integrating LVM with Hadoop and providing Elasticity to DataNode Storage
First, set up the Hadoop cluster and start the instance that is configured as the DataNode. Then follow the steps below.
Step 1 : Installing the LVM software on the DataNode that will be integrated with LVM
Command : yum install lvm2
The installation is complete.
Step 2 : Launching 2 EBS volumes in the same Availability Zone in which the instance is running
Both volume1 and volume2 are currently in the "available" state.
Step 3 : Attaching the volumes to the instance
The volumes are attached to the instance named "instance" as /dev/sdf and /dev/sdg.
This can also be verified by connecting to the instance and running the "fdisk -l" command in the terminal.
Disk /dev/xvdf of size 5 GB and disk /dev/xvdg of size 6 GB have been added to the instance.
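As an illustration only, the same attachment can also be scripted with the AWS CLI; the volume and instance IDs below are placeholders, not values from this setup.
# attach the two EBS volumes as /dev/sdf and /dev/sdg (IDs are hypothetical)
aws ec2 attach-volume --volume-id vol-0aaa111 --instance-id i-0bbb222 --device /dev/sdf
aws ec2 attach-volume --volume-id vol-0ccc333 --instance-id i-0bbb222 --device /dev/sdg
# on the instance, list the disks to confirm they appear as /dev/xvdf and /dev/xvdg
fdisk -l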
Step 4 : Creation of physical volumes
Converting both the EBS volumes into physical volumes using the "pvcreate" command.
We can see the complete information about the created physical volumes with the "pvdisplay" command.
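A minimal sketch of these two commands, assuming the device names /dev/xvdf and /dev/xvdg seen in the fdisk output above:
# initialize both EBS disks as LVM physical volumes
pvcreate /dev/xvdf /dev/xvdg
# show details (name, size, allocation) of the physical volumes
pvdisplay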
Step 5 : Creation of volume group
Creating a volume group from the two physical volumes created previously with the "vgcreate" command. Here "myvg" is the name of the volume group.
With the "vgdisplay" command, we can see the complete information about the created volume group. The total size of the volume group is 10.99 GB, combining the sizes of both physical volumes, i.e. physical volume /dev/xvdf of size 5 GB and physical volume /dev/xvdg of size 6 GB.
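For illustration, the volume group above could be created as follows, using the names from this setup:
# combine both physical volumes into a single volume group named myvg
vgcreate myvg /dev/xvdf /dev/xvdg
# show the volume group details, including total and free size
vgdisplay myvg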
Step 6 : Creation of logical volume
After successfully creating the volume group, we now create a logical volume sized according to the requirement. Initially we create a logical volume of size 5 GB; later we will increase it as needed.
Command : lvcreate --size <lv_size> --name <lv_name> <vg_name>
A logical volume named "mylv" of size 5 GB has been created. This can be seen using the "lvdisplay" command.
Now, running the "vgdisplay" command will show that the volume group has 5.99 GB of free space left, as 5 GB has been allocated to the logical volume.
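Filling in the values used in this setup, the command template above would look like this (a sketch of the same step):
# create a 5 GB logical volume named mylv inside the myvg volume group
lvcreate --size 5G --name mylv myvg
# verify the new logical volume
lvdisplay /dev/myvg/mylv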
Step 7 : Formatting the logical volume
While formatting, we have to give the full path of the logical volume.
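The exact command is not shown above, so here is a minimal sketch, assuming an ext4 filesystem and the default LVM device path /dev/myvg/mylv:
# format the logical volume; the filesystem type (ext4) is an assumption
mkfs.ext4 /dev/myvg/mylv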
Step 8 : Mounting the volume to the data node folder
The logical volume has been successfully mounted on the folder "/slave", and the total size of this folder is now 4.9 GB, of which 4.6 GB is available; the rest is taken up by filesystem metadata and reserved space.
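A sketch of the mount step, assuming "/slave" is the directory already configured as the DataNode data directory (dfs.data.dir / dfs.datanode.data.dir) in hdfs-site.xml:
# create the mount point if it does not already exist
mkdir -p /slave
# mount the formatted logical volume on the DataNode directory
mount /dev/myvg/mylv /slave
# confirm the new size of /slave
df -h /slave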
Step 9 : Starting the Hadoop NameNode service
Starting the instance that is configured as the NameNode.
Starting the NameNode service --
Command : hadoop-daemon.sh start namenode
Step 10 : Starting the Hadoop DataNode service
Command : hadoop-daemon.sh start datanode
Step 11 : Checking the storage contributed by the DataNode to the Hadoop cluster
Command : hadoop dfsadmin -report
As the DataNode folder is mounted on the logical volume of size 4.9 GB, it contributes 4.86 GB to the Hadoop cluster. We can also extend or reduce this contributed capacity by extending or reducing the size of the logical volume mounted on the DataNode folder.
Step 12 : Extending the size of the logical volume
Command : lvextend --size <size_to_extend> <volume_path>
Now the total size of the logical volume has become 8 GB. This can be verified using the "lvdisplay" command.
But the total size of the DataNode folder is still 4.9 GB; it has not been extended yet. So we need to resize the filesystem over the extended part of the volume so that it can also be used to store data.
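Using the values from this setup, extending the 5 GB logical volume to 8 GB could look like this (a sketch; lvextend accepts either a relative or an absolute size):
# grow mylv by 3 GB, from 5 GB to 8 GB (equivalently: lvextend --size 8G /dev/myvg/mylv)
lvextend --size +3G /dev/myvg/mylv
# confirm the new size of the logical volume
lvdisplay /dev/myvg/mylv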
Step 13 : Resizing the filesystem over the extended volume part
Command : resize2fs <volume_path>
After resizing the filesystem, the total size of the DataNode folder "/slave" has also extended to 7.9 GB.
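A sketch with the same device path as before; resize2fs grows an ext2/3/4 filesystem to fill the underlying volume, here while it is still mounted:
# grow the filesystem to use the full 8 GB of the extended logical volume
resize2fs /dev/myvg/mylv
# verify that /slave now shows the larger size
df -h /slave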
Step 14 : Confirming the contribution of the extended storage by the DataNode to the Hadoop cluster
Command : hadoop dfsadmin -report
Now the DataNode is contributing 7.81 GB, whereas it was previously contributing 4.86 GB.
Thank You for reading..!!