HDFS: Optimization, Stabilization and Supportability
April 13, 2016
Chris Nauroth
email: cnauroth@hortonworks.com
twitter: @cnauroth
About Me
Chris Nauroth
• Member of Technical Staff, Hortonworks
– Apache Hadoop committer, PMC member, and Apache Software Foundation member
– Major contributor to HDFS ACLs, Windows compatibility, and operability improvements
• Hadoop user since 2010
– Prior employment experience deploying, maintaining and using Hadoop clusters
Motivation
• HDFS engineers are on the front line for operational support of Hadoop.
– HDFS is the foundational storage layer for typical Hadoop deployments.
– Therefore, challenges in HDFS have the potential to impact the entire Hadoop ecosystem.
– Conversely, application problems can become visible at the layer of HDFS operations.
• Analysis of Hadoop Support Cases
– Support case trends reveal common patterns for HDFS operational challenges.
– Those challenges inform what needs to improve in the software.
• Software Improvements
– Optimization: Identify bottlenecks and make them faster.
– Stabilization: Prevent unusual circumstances from harming cluster uptime.
– Supportability: When something goes wrong, provide visibility and tools to fix it.
Thank you to the entire community of Apache contributors.
Logging
• Logging requires a careful balance.
– Too little logging hides valuable operational information.
– Too much logging causes information overload, increased load and greater garbage collection overhead.
• Logging APIs
– Hadoop codebase currently uses a mix of logging APIs.
– Commons Logging and Log4J 1 require additional guard logic to prevent execution of expensive messages.
if (LOG.isDebugEnabled()) {
  LOG.debug("Processing block: " + block); // expensive toString() implementation!
}
– SLF4J simplifies this.
LOG.debug("Processing block: {}", block); // calls toString() only if debug enabled
• Pitfalls
– Forgotten guard logic.
– Logging in a tight loop.
– Logging while holding a shared resource, such as a mutually exclusive lock.
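To make the pitfalls concrete, here is a minimal sketch (illustrative only, not HDFS code) of parameterized SLF4J logging and of moving a log statement out of a lock's critical section:

import java.util.concurrent.locks.ReentrantLock;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingPitfalls {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingPitfalls.class);
  private final ReentrantLock lock = new ReentrantLock();

  public void process(Object block) {
    // Parameterized logging: block.toString() runs only if debug is enabled,
    // so no explicit isDebugEnabled() guard is needed for simple arguments.
    LOG.debug("Processing block: {}", block);

    String summary;
    lock.lock();
    try {
      // Do only the work that needs the lock; capture what to log.
      summary = "state=" + block;
    } finally {
      lock.unlock();
    }
    // Log after releasing the lock so a slow appender cannot stall other threads.
    LOG.info("Processed block: {}", summary);
  }
}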
HADOOP-12318: better logging of LDAP exceptions
• Failure to log full details of an authentication failure.
– Very simple patch, huge payoff.
– Include exception details when logging failure.
• Before:
throw new SaslException("PLAIN auth failed: " + e.getMessage());
• After:
throw new SaslException("PLAIN auth failed: " + e.getMessage(), e);
HDFS-9434: Recommission a datanode with 500k blocks may pause NN for 30 seconds
• Logging is too verbose
– Summary of patch: don’t log too much!
– Move detailed logging to trace level.
– It’s still accessible for edge case troubleshooting, but it doesn’t impact base operations.
• Before:
LOG.info("BLOCK* processOverReplicatedBlock: " +
"Postponing processing of over-replicated " +
block + " since storage + " + storage
+ "datanode " + cur + " does not yet have up-to-date " +
"block information.");
• After:
if (LOG.isTraceEnabled()) {
LOG.trace("BLOCK* processOverReplicatedBlock: Postponing " + block
+ " since storage " + storage
+ " does not yet have up-to-date information.");
}
Troubleshooting
• Kerberos is hard.
– Many moving parts: KDC, DNS, principals, keytabs and Hadoop configuration.
– Management tools like Apache Ambari automate initial provisioning of principals, keytabs and configuration.
– When it doesn’t work, finding root cause is challenging.
• Metrics are vital for diagnosis of most operational problems.
– Metrics must be capable of showing that there is a problem. (e.g. RPC call volume spike)
– Metrics also must be capable of identifying the source of that problem. (e.g. user issuing RPC calls)
HADOOP-12426: kdiag
• Kerberos misconfiguration diagnosis.
– Attempts to diagnose multiple sources of potential Kerberos misconfiguration problems.
– DNS
– Hadoop configuration files
– KDC configuration
• kdiag: a command-line tool for diagnosis of Kerberos problems
– Automatically trigger Java diagnostics, such as -Dsun.security.krb5.debug.
– Prints various environment variables, Java system properties and Hadoop configuration options related to
security.
– Attempt a login.
– If keytab used, print principal information from keytab.
– Print krb5.conf.
– Validate kinit executable (used for ticket renewals).
HDFS-6982: nntop
• Find activity trends of HDFS operations.
– HDFS audit log contains a record of each file system operation to the NameNode.
– NameNode metrics contain raw counts of operations.
– Identifying load trends from particular users or particular operations has always required ad-hoc scripting to
analyze the above sources of information.
• nntop: HDFS operation counts aggregated per operation and per user within time windows.
– curl 'http://127.0.0.1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'
– Look for the “TopUserOpCounts” section in the returned JSON.
"ops": [
{
"totalCount": 1,
"opType": "delete",
"topUsers": [
{
"count": 1,
"user": "chris"
}
HDFS-7182: JMX metrics aren't accessible when NN is busy
• Lock contention while attempting to query NameNode JMX metrics.
– JMX metrics are often queried in response to operational problems.
– Some metrics data required acquisition of a lock inside the NameNode. If another thread held this lock, then
metrics could not be accessed.
– During times of high load, the lock is likely to be held by another thread.
– At a time when the metrics are most likely to be needed, they were inaccessible.
– This patch addressed the problem by acquiring the metrics data without requiring the lock to be held.
Managing Load
• RPC call load.
– It’s too easy for a single inefficient job to overwhelm a cluster with too much RPC load.
– RPC servers accept calls into a single shared queue.
– Overflowing that queue causes increased latency and rejection of calls for all callers, not just the single inefficient
job that caused the problem.
– Load problems can be mitigated with enhanced admission control, client back-off and throttling policies
tailored to real-world usage patterns.
HADOOP-10282: FairCallQueue
• Hadoop RPC Architecture
– Traditionally, Hadoop RPC internally admits incoming RPC calls into a single shared queue.
– Worker threads consume the incoming calls from that shared queue and process them.
– In an overloaded situation, calls spend more time waiting in the queue for a worker thread to become available.
– At the extreme, the queue overflows, which then requires rejecting the calls.
– This tends to punish all callers, not just the caller that triggered the unusually high load.
• RPC Congestion Control with FairCallQueue
– Replace single shared queue with multiple prioritized queues.
– Call is placed into a queue with priority selected based on the calling user’s current history.
– Calls are dequeued and processed with greater frequency from higher-priority queues.
– Under normal operations, when the RPC server can keep up with load, this is not noticeably different from the
original architecture.
– Under high load, this tends to deprioritize users triggering unusually high load, thus allowing room for other
processes to make progress. There is less risk of a single runaway job overwhelming a cluster.
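For illustration only, a hedged configuration sketch: FairCallQueue is selected per RPC server port, so the property names below assume the NameNode RPC port is 8020 (the keys follow the FairCallQueue documentation, but exact names can vary by release, and equivalent entries would normally go in core-site.xml rather than code):

import org.apache.hadoop.conf.Configuration;

public class FairCallQueueConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumption: the NameNode RPC server listens on port 8020.
    // Replace the single shared FIFO call queue with FairCallQueue.
    conf.set("ipc.8020.callqueue.impl", "org.apache.hadoop.ipc.FairCallQueue");
    // DecayRpcScheduler tracks each user's recent call volume and picks a priority queue.
    conf.set("ipc.8020.scheduler.impl", "org.apache.hadoop.ipc.DecayRpcScheduler");
    System.out.println(conf.get("ipc.8020.callqueue.impl"));
  }
}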
HADOOP-10597: RPC Server signals backoff to clients when all request queues are full
• Client-side backoff from overloaded RPC servers.
– Builds upon work of the RPC FairCallQueue.
– If an RPC server’s queue is full, then optionally send a signal to additional incoming clients to request backoff.
– Clients are aware of the signal and react by performing exponential backoff before sending additional calls (sketched below).
– Improves quality of service for clients when server is under heavy load. RPC calls that would have failed will
instead succeed, but with longer latency.
– Improves likelihood of server recovering, because client backoff will give it more opportunity to catch up.
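The client-side reaction can be sketched generically as randomized exponential backoff (an illustrative sketch, not the actual Hadoop RPC client code):

import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public class ExponentialBackoffSketch {

  // Retries a call with randomized exponential backoff, as a client would after
  // receiving a backoff signal from an overloaded RPC server.
  public static <T> T callWithBackoff(Callable<T> call, int maxAttempts) throws Exception {
    Random random = new Random();
    long baseDelayMs = 100;
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (Exception serverBusy) {
        if (attempt >= maxAttempts) {
          throw serverBusy; // give up after the final attempt
        }
        // The delay doubles on each attempt, with jitter to avoid synchronized retries.
        long delayMs = (baseDelayMs << (attempt - 1)) + random.nextInt(100);
        TimeUnit.MILLISECONDS.sleep(delayMs);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical call standing in for an HDFS RPC such as getFileStatus.
    System.out.println(callWithBackoff(() -> "ok", 5));
  }
}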
HADOOP-12916: Allow RPC scheduler/callqueue backoff using response times
• More flexibility in back-off policies.
– Triggering backoff when the queue is full is in some sense too late. The problem has already grown too severe.
– Instead, track call response time, and trigger backoff when response time exceeds bounds.
– Any amount of queueing increases RPC response latency. Reacting to unusually high RPC response time can
prevent the problem from becoming so severe that the queue overflows.
Performance
• Garbage Collection
– NameNode heap must scale up in relation to the number of file system objects (files, directories, blocks, etc.).
– Recent hardware trends can cause larger DataNode heaps too. (Nodes have more disks and those disks are
larger, therefore the memory footprint has increased for tracking block state.)
– Much has been written about garbage collection tuning for large heap JVM processes.
– In addition to recommending configuration best practices, we can optimize the codebase to reduce garbage
collection pressure.
• Block Reporting
– The process by which DataNodes report information about their stored blocks to the NameNode.
– Full Block Report: a complete catalog of all of the node’s blocks, sent infrequently.
– Incremental Block Report: partial information about recently added or deleted blocks, sent more frequently.
– All block reporting occurs asynchronously with respect to user-facing operations, so it does not impact end-user latency directly.
– However, inefficiencies in block reporting can overwhelm a cluster to the point that it can no longer serve end user
operations sufficiently.
HDFS-7097: Allow block reports to be processed during checkpointing on standby name node
• Coarse-grained locking impedes block report processing.
– NameNode has a global lock required to enforce mutual exclusion for some operations.
– One such operation is checkpointing performed at the HA standby NameNode: process of creating a new fsimage
representing the full metadata state and beginning a new edit log. This can take a long time in large clusters.
– Block report processing also required holding the lock, and therefore could not proceed during a checkpoint.
• Coarse-grained lock contention can lead to cascading failure and downtime.
– Checkpointing holds lock.
– Frequent incremental block reports from DataNodes block while waiting to acquire the lock.
– Eventually consumes all available RPC handler threads, all waiting to acquire lock.
– In extreme case, blocks HA NameNode failover, because there is no RPC handler thread available to handle the
failover request.
– Even if HA failover can succeed, may still leave cluster in a state where it appears many nodes have gone dead,
because their blocked heartbeats couldn’t be processed.
• Solution: allow block report processing without holding global lock.
– Block reports now can be processed concurrently with a checkpoint in progress.
– Like most multi-threading and locking logic, required careful reasoning to ensure change was safe.
HDFS-7435: PB encoding of block reports is very inefficient
• Block report RPC message encoding can cause memory allocation inefficiency and garbage
collection churn.
– HDFS RPC messages are encoded using Protocol Buffers.
– Block reports encoded each block ID, length and generation stamp in a Protocol Buffers repeated long field.
– Behind the scenes, this becomes an ArrayList with a default capacity of 10.
– DataNodes in large clusters almost always send a larger block report than this, so ArrayList reallocation churn is almost
guaranteed.
– Data type contained in the ArrayList is Long (note capitalization: the boxed type, not primitive long).
– Boxing and unboxing causes additional allocation requirements.
• Solution: a more GC-friendly encoding of block reports.
– Within the Protocol Buffers RPC message, take over serialization directly.
– Manually encode number of longs, followed by list of primitive longs.
– Eliminates ArrayList reallocation costs.
– Eliminates boxing and unboxing costs by deserializing straight to primitive long.
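A simplified illustration of the allocation difference (not the actual Protocol Buffers code): buffering block fields as boxed Longs in a growing ArrayList versus writing primitive longs into an exactly sized array.

import java.util.ArrayList;
import java.util.List;

public class BlockReportEncodingSketch {

  // Boxed encoding: each add() may reallocate the backing array (default
  // capacity 10) and wraps every primitive value in a java.lang.Long object.
  static List<Long> encodeBoxed(long[] blockIds, long[] lengths, long[] genStamps) {
    List<Long> out = new ArrayList<>();            // capacity 10 by default
    for (int i = 0; i < blockIds.length; i++) {
      out.add(blockIds[i]);                        // autoboxing allocates a Long
      out.add(lengths[i]);
      out.add(genStamps[i]);
    }
    return out;
  }

  // Primitive encoding: one exact-sized array, no boxing, no reallocation.
  static long[] encodePrimitive(long[] blockIds, long[] lengths, long[] genStamps) {
    long[] out = new long[3 * blockIds.length];
    for (int i = 0; i < blockIds.length; i++) {
      out[3 * i] = blockIds[i];
      out[3 * i + 1] = lengths[i];
      out[3 * i + 2] = genStamps[i];
    }
    return out;
  }

  public static void main(String[] args) {
    long[] ids = {1L, 2L}, lens = {128L, 256L}, stamps = {1000L, 1001L};
    System.out.println(encodeBoxed(ids, lens, stamps).size());     // 6 boxed Longs
    System.out.println(encodePrimitive(ids, lens, stamps).length); // 6 primitives
  }
}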
HDFS-7609: Avoid retry cache collision when Standby NameNode loading edits
• Idempotence and at-most-once delivery of HDFS RPC messages.
– Some RPC message processing is inherently idempotent: can be applied multiple times, and the final result is still
the same. Example: setPermission.
– Other messages are not inherently idempotent, but the NameNode can still provide an “at-most-once” processing
guarantee by temporarily tracking recently executed operations by a unique call ID. Example: rename.
– The data structure that does this is called the RetryCache (see the sketch below).
– This is important in failure modes, such as an HA failover or a network partition, which may cause a client to send
the same message more than once.
• Erroneous multiple RetryCache entries for same operation.
– Duplicate entries caused slowdown.
– Particularly noticeable during an HA transition.
– Bug fix to prevent duplicate entries.
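Conceptually, the RetryCache behaves like a map keyed by client ID and call ID; the sketch below is a simplified illustration, not the NameNode implementation (which also expires entries and tracks in-progress calls):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class RetryCacheSketch {
  // Cached outcome of a non-idempotent operation, keyed by "clientId:callId".
  private final Map<String, Boolean> completed = new ConcurrentHashMap<>();

  // Executes the operation at most once per (clientId, callId); a retried
  // RPC with the same IDs returns the cached result instead of re-running.
  public boolean execute(String clientId, long callId, Supplier<Boolean> operation) {
    String key = clientId + ":" + callId;
    return completed.computeIfAbsent(key, k -> operation.get());
  }

  public static void main(String[] args) {
    RetryCacheSketch cache = new RetryCacheSketch();
    // First delivery performs the rename; a duplicate delivery (e.g. after an
    // HA failover) reuses the cached result rather than renaming twice.
    boolean first = cache.execute("client-1", 42L, () -> true);
    boolean retry = cache.execute("client-1", 42L, () -> { throw new IllegalStateException(); });
    System.out.println(first + " " + retry);
  }
}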
HDFS-9710: Change DN to send block receipt IBRs in batches
• Incremental block reports trigger multiple RPC calls.
– When a DataNode receives a block, it sends an incremental block report RPC to the NameNode immediately.
– Even when multiple blocks are received close together, each receipt translates to its own incremental block report RPC.
– With consideration of all DataNodes in a large cluster, this can become a huge number of RPC messages for the
NameNode to process.
• Solution: batch multiple block receipt events into a single RPC message.
– Reduces RPC overhead of sending multiple messages.
– Scales better with respect to number of nodes and number of blocks in a cluster.
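A conceptual sketch of the batching idea (illustrative only, not the DataNode's actual code): buffer block-received events and flush them to the NameNode as one report.

import java.util.ArrayList;
import java.util.List;

public class IncrementalBlockReportBatcher {
  private final List<Long> pendingBlockIds = new ArrayList<>();

  // Called when a block finishes being written to this DataNode.
  public synchronized void blockReceived(long blockId) {
    pendingBlockIds.add(blockId);
  }

  // Invoked periodically (e.g. by a heartbeat-style thread); sends one RPC
  // covering every block received since the previous flush.
  public synchronized void flush() {
    if (pendingBlockIds.isEmpty()) {
      return;
    }
    sendIncrementalBlockReport(new ArrayList<>(pendingBlockIds)); // one RPC, many receipts
    pendingBlockIds.clear();
  }

  private void sendIncrementalBlockReport(List<Long> blockIds) {
    // Placeholder for the RPC to the NameNode.
    System.out.println("IBR with " + blockIds.size() + " block receipts");
  }

  public static void main(String[] args) {
    IncrementalBlockReportBatcher batcher = new IncrementalBlockReportBatcher();
    batcher.blockReceived(1L);
    batcher.blockReceived(2L);
    batcher.blockReceived(3L);
    batcher.flush(); // previously: three RPCs; batched: a single RPC
  }
}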
Liveness
• "...make progress despite the fact that its concurrently executing components ("processes") may
have to "take turns" in critical sections, parts of the program that cannot be simultaneously run
by multiple processes." -Wikipedia
• DataNode Heartbeats
– Responsible for reporting health of a DataNode to the NameNode.
– Operational problems of managing load and performance can block timely heartbeat processing.
– Heartbeat processing at the NameNode can be surprisingly costly due to contention on a global lock and
asynchronous dispatch of commands (e.g. delete block).
• Blocked heartbeat processing can cause cascading failure and downtime.
– Blocked heartbeat processing can make the NameNode think DataNodes are not heartbeating at all, and
therefore are not running.
– DataNodes that stop running are flagged by the NameNode as dead.
– Too many dead DataNodes make the cluster inoperable as a whole.
– Dead DataNodes must have their replicas copied to other DataNodes to satisfy replication requirements.
– Erroneously flagging DataNodes as dead can cause a storm of wasteful re-replication activity.
HDFS-9239: DataNode Lifeline Protocol: an alternative protocol for reporting DataNode health
• The lifeline keeps the DataNode alive, despite conditions of unusually high load.
– Optionally run a separate RPC server within the NameNode dedicated to processing of lifeline messages sent by
DataNodes.
– Lifeline messages are a simplified form of heartbeat messages, but do not have the same costly requirements for
asynchronous command dispatch, and therefore do not need to contend on a shared lock.
– Even if the main NameNode RPC queue is overwhelmed, the lifeline still keeps the DataNode alive.
– Prevents erroneous and costly re-replication activity.
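For illustration, the lifeline server is enabled by giving it its own RPC address in the NameNode configuration. A hedged sketch: the property name below is the one introduced by HDFS-9239, while the host and port values are assumptions, and the entry would normally live in hdfs-site.xml rather than code:

import org.apache.hadoop.conf.Configuration;

public class LifelineConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Configuring a lifeline RPC address causes the NameNode to start the
    // dedicated lifeline server; DataNodes send lifeline messages to it.
    conf.set("dfs.namenode.lifeline.rpc-address", "nn1.example.com:8050");
    System.out.println(conf.get("dfs.namenode.lifeline.rpc-address"));
  }
}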
HDFS-9311: Support optional offload of NameNode HA service health checks to a separate RPC server
• RPC offload of HA health check and failover messages.
– Similar to problem of timely heartbeat message delivery.
– NameNode HA requires messages sent from the ZKFC (ZooKeeper Failover Controller) process to the
NameNode.
– Messages are related to handling periodic health checks and initiating shutdown and failover if necessary.
– A NameNode overwhelmed with unusually high load cannot process these messages.
– Delayed processing of these messages slows down NameNode failover, and thus creates a visibly prolonged
outage period.
– The lifeline RPC server can be used to offload HA messages, and similarly keep processing them even in the
case of unusually high load.
Optimizing Applications
• HDFS Utilization Patterns
– Sometimes it’s helpful to look a layer higher and assess what applications are doing with HDFS.
– FileSystem API unfortunately can make it too easy to implement inefficient call patterns.
HIVE-10223: Consolidate several redundant FileSystem API calls
• Hadoop FileSystem API can cause applications to make redundant RPC calls.
• Before:
if (fs.isFile(file)) { // RPC #1
...
} else if (fs.isDirectory(file)) { // RPC #2
...
}
• After:
FileStatus fileStatus = fs.getFileStatus(file); // Just 1 RPC
if (fileStatus.isFile()) { // Local, no RPC
...
} else if (fileStatus.isDirectory()) { // Local, no RPC
...
}
• Good for Hive, because it reduces latency associated with NameNode RPCs.
• Good for the whole ecosystem, because it reduces load on the NameNode, a shared service.
PIG-4442: Eliminate redundant RPC call to get file information in HPath
• A similar story of redundant RPC within Pig code.
• Before:
long blockSize = fs.getHFS().getFileStatus(path).getBlockSize(); // RPC #1
short replication = fs.getHFS().getFileStatus(path).getReplication(); // RPC #2
• After:
FileStatus fileStatus = fs.getHFS().getFileStatus(path); // Just 1 RPC
long blockSize = fileStatus.getBlockSize(); // Local, no RPC
short replication = fileStatus.getReplication(); // Local, no RPC
• Revealed from inspection of HDFS audit log.
– HDFS audit log shows a record of each file system operation executed against the NameNode.
– This continues to be one of the most significant sources of HDFS troubleshooting information.
– In this case, manual inspection revealed a suspicious pattern of multiple getfileinfo calls for the same path from a
Pig job submission.
HDFS-9924: Asynchronous HDFS Access
• Current Hadoop FileSystem API is inherently synchronous.
– Issue a single synchronous file system call.
– In the case of HDFS, that call is implemented with a synchronous RPC.
– Block waiting for the result.
– Then, client application may proceed.
• Some application usage patterns would benefit from asynchronous access.
– Some applications regularly issue a large sequence of multiple file system calls, with no data dependencies
between the results of those calls.
– For example, Hive partition logic can involve hundreds or thousands of rename operations, where each rename
can execute independently, with no data dependencies on the results of other renames.
public Future<Boolean> rename(Path src, Path dst) throws IOException;
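A usage sketch of how such an interface could be consumed, assuming a hypothetical asyncFs object that exposes the rename signature above (illustrative only; the eventual HDFS-9924 API may differ):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import org.apache.hadoop.fs.Path;

public class AsyncRenameSketch {

  // Hypothetical asynchronous interface mirroring the signature above.
  interface AsyncRenamer {
    Future<Boolean> rename(Path src, Path dst) throws IOException;
  }

  public static void main(String[] args) throws Exception {
    // Stand-in implementation; a real one would issue non-blocking RPCs.
    AsyncRenamer asyncFs = (src, dst) -> CompletableFuture.completedFuture(true);

    // Issue many independent renames without waiting on each result...
    List<Future<Boolean>> pending = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      pending.add(asyncFs.rename(new Path("/staging/part-" + i),
                                 new Path("/warehouse/part-" + i)));
    }
    // ...then collect the results once, overlapping the RPC round trips.
    int succeeded = 0;
    for (Future<Boolean> f : pending) {
      if (f.get()) {
        succeeded++;
      }
    }
    System.out.println("Renamed " + succeeded + " paths");
  }
}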
Summary
• A variety of recent enhancements have improved the ability of HDFS to serve as the foundational
storage layer of the Hadoop ecosystem.
• Optimization
– Performance
– Optimizing Applications
• Stabilization
– Liveness
– Managing Load
• Supportability
– Logging
– Troubleshooting
Thank you!
Q&A
Ad

More Related Content

What's hot (20)

Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
DataWorks Summit
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
DataWorks Summit
 
HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
DataWorks Summit/Hadoop Summit
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Databricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
Inderaj (Raj) Bains
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
Rommel Garcia
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
DataWorks Summit
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
DataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Distributed Locking in Kubernetes
Distributed Locking in KubernetesDistributed Locking in Kubernetes
Distributed Locking in Kubernetes
Rafał Leszko
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Owen O'Malley
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
Jurriaan Persyn
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
DataWorks Summit
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
DataWorks Summit
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Databricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
Inderaj (Raj) Bains
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
DataWorks Summit
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
DataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Distributed Locking in Kubernetes
Distributed Locking in KubernetesDistributed Locking in Kubernetes
Distributed Locking in Kubernetes
Rafał Leszko
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Owen O'Malley
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
Jurriaan Persyn
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
DataWorks Summit
 

Viewers also liked (20)

Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Impetus Technologies
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
 
Making It To Veteren Cassandra Status
Making It To Veteren Cassandra StatusMaking It To Veteren Cassandra Status
Making It To Veteren Cassandra Status
Eric Lubow
 
It's not the size of your cluster, it's how you use it
It's not the size of your cluster, it's how you use itIt's not the size of your cluster, it's how you use it
It's not the size of your cluster, it's how you use it
DataWorks Summit/Hadoop Summit
 
Tame that Beast
Tame that BeastTame that Beast
Tame that Beast
DataWorks Summit/Hadoop Summit
 
Presentation from physical to virtual to cloud emc
Presentation   from physical to virtual to cloud emcPresentation   from physical to virtual to cloud emc
Presentation from physical to virtual to cloud emc
xKinAnx
 
Contributing to Open Source - A Beginners Guide
Contributing to Open Source - A Beginners GuideContributing to Open Source - A Beginners Guide
Contributing to Open Source - A Beginners Guide
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
 
Rocking the World of Big Data at Centrica
Rocking the World of Big Data at CentricaRocking the World of Big Data at Centrica
Rocking the World of Big Data at Centrica
DataWorks Summit/Hadoop Summit
 
HDFS Deep Dive
HDFS Deep DiveHDFS Deep Dive
HDFS Deep Dive
Yifeng Jiang
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
DataWorks Summit/Hadoop Summit
 
Powering a Virtual Power Station with Big Data
Powering a Virtual Power Station with Big DataPowering a Virtual Power Station with Big Data
Powering a Virtual Power Station with Big Data
DataWorks Summit/Hadoop Summit
 
Running Spark in Production
Running Spark in ProductionRunning Spark in Production
Running Spark in Production
DataWorks Summit/Hadoop Summit
 
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
NTT DATA OSS Professional Services
 
Java 9: The (G1) GC Awakens!
Java 9: The (G1) GC Awakens!Java 9: The (G1) GC Awakens!
Java 9: The (G1) GC Awakens!
Monica Beckwith
 
Apache Hive on ACID
Apache Hive on ACIDApache Hive on ACID
Apache Hive on ACID
DataWorks Summit/Hadoop Summit
 
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesApache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Big Data Spain
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Spark Summit
 
On Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and AmbariOn Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and Ambari
DataWorks Summit/Hadoop Summit
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Impetus Technologies
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
 
Making It To Veteren Cassandra Status
Making It To Veteren Cassandra StatusMaking It To Veteren Cassandra Status
Making It To Veteren Cassandra Status
Eric Lubow
 
It's not the size of your cluster, it's how you use it
It's not the size of your cluster, it's how you use itIt's not the size of your cluster, it's how you use it
It's not the size of your cluster, it's how you use it
DataWorks Summit/Hadoop Summit
 
Presentation from physical to virtual to cloud emc
Presentation   from physical to virtual to cloud emcPresentation   from physical to virtual to cloud emc
Presentation from physical to virtual to cloud emc
xKinAnx
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
 
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
HDFS新機能総まとめin 2015 (日本Hadoopユーザー会 ライトニングトーク@Cloudera World Tokyo 2015 講演資料)
NTT DATA OSS Professional Services
 
Java 9: The (G1) GC Awakens!
Java 9: The (G1) GC Awakens!Java 9: The (G1) GC Awakens!
Java 9: The (G1) GC Awakens!
Monica Beckwith
 
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesApache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Big Data Spain
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Spark Summit
 
Ad

Similar to HDFS: Optimization, Stabilization and Supportability (20)

Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
Chris Nauroth
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
Chris Nauroth
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
DataWorks Summit
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
Chris Nauroth
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Etu Solution
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Hortonworks
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
Big Data Joe™ Rossi
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
saipriyacoool
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Community
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
hdhappy001
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
DataWorks Summit
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
MongoDB
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
Alluxio, Inc.
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
Alfresco Software
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
Ramesh Pabba - seeking new projects
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
Ramesh Pabba - seeking new projects
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networks
pbelko82
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
DataWorks Summit/Hadoop Summit
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
DataWorks Summit
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
Owen O'Malley
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
Chris Nauroth
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
Chris Nauroth
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
DataWorks Summit
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
Chris Nauroth
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Etu Solution
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Hortonworks
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
Big Data Joe™ Rossi
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
saipriyacoool
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Community
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
hdhappy001
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
DataWorks Summit
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
MongoDB
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
Alluxio, Inc.
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
Alfresco Software
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networks
pbelko82
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
DataWorks Summit
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
Owen O'Malley
 
Ad

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 

Recently uploaded (20)

How to Build an AI-Powered App: Tools, Techniques, and Trends
How to Build an AI-Powered App: Tools, Techniques, and TrendsHow to Build an AI-Powered App: Tools, Techniques, and Trends
How to Build an AI-Powered App: Tools, Techniques, and Trends
Nascenture
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier VroomAI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
UXPA Boston
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
IT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information TechnologyIT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information Technology
SHEHABALYAMANI
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptxReimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
John Moore
 
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Wonjun Hwang
 
May Patch Tuesday
May Patch TuesdayMay Patch Tuesday
May Patch Tuesday
Ivanti
 
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Alan Dix
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Gary Arora
 
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Cyntexa
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Maarten Verwaest
 
ACE Aarhus - Team'25 wrap-up presentation
ACE Aarhus - Team'25 wrap-up presentationACE Aarhus - Team'25 wrap-up presentation
ACE Aarhus - Team'25 wrap-up presentation
DanielEriksen5
 

HDFS: Optimization, Stabilization and Supportability

  • 7. © Hortonworks Inc. 2011 Troubleshooting
    • Kerberos is hard.
      – Many moving parts: KDC, DNS, principals, keytabs and Hadoop configuration.
      – Management tools like Apache Ambari automate initial provisioning of principals, keytabs and configuration.
      – When it doesn’t work, finding root cause is challenging.
    • Metrics are vital for diagnosis of most operational problems.
      – Metrics must be capable of showing that there is a problem. (e.g. RPC call volume spike)
      – Metrics also must be capable of identifying the source of that problem. (e.g. user issuing RPC calls)
    Page 7 Architecting the Future of Big Data
  • 8. © Hortonworks Inc. 2011 HADOOP-12426: kdiag
    • Kerberos misconfiguration diagnosis.
      – Attempts to diagnose multiple sources of potential Kerberos misconfiguration problems.
      – DNS
      – Hadoop configuration files
      – KDC configuration
    • kdiag: a command-line tool for diagnosis of Kerberos problems
      – Automatically trigger Java diagnostics, such as -Dsun.security.krb5.debug.
      – Prints various environment variables, Java system properties and Hadoop configuration options related to security.
      – Attempt a login.
      – If keytab used, print principal information from keytab.
      – Print krb5.conf.
      – Validate kinit executable (used for ticket renewals).
    Page 8 Architecting the Future of Big Data
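    For example, an illustrative kdiag invocation might look like the following; the exact flags vary by Hadoop release, so check the usage output of hadoop kdiag on your cluster, and the keytab path and principal shown here are placeholders.

      hadoop kdiag \
        --keytab /etc/security/keytabs/nn.service.keytab \
        --principal nn/nn1.example.com@EXAMPLE.COM \
        --out kdiag.txt

    Running it as the service user on the affected host and attaching the output file to a support case captures most of the information listed above in one step.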
  • 9. © Hortonworks Inc. 2011 HDFS-6982: nntop
    • Find activity trends of HDFS operations.
      – HDFS audit log contains a record of each file system operation to the NameNode.
      – NameNode metrics contain raw counts of operations.
      – Identifying load trends from particular users or particular operations has always required ad-hoc scripting to analyze the above sources of information.
    • nntop: HDFS operation counts aggregated per operation and per user within time windows.
      – curl 'http://127.0.0.1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'
      – Look for the “TopUserOpCounts” section in the returned JSON.
        "ops": [ {
          "totalCount": 1,
          "opType": "delete",
          "topUsers": [ {
            "count": 1,
            "user": "chris"
          }
    Page 9 Architecting the Future of Big Data
  • 10. © Hortonworks Inc. 2011 HDFS-7182: JMX metrics aren't accessible when NN is busy
    • Lock contention while attempting to query NameNode JMX metrics.
      – JMX metrics are often queried in response to operational problems.
      – Some metrics data required acquisition of a lock inside the NameNode. If another thread held this lock, then metrics could not be accessed.
      – During times of high load, the lock is likely to be held by another thread.
      – At a time when the metrics are most likely to be needed, they were inaccessible.
      – This patch addressed the problem by acquiring the metrics data without requiring the lock to be held.
    Page 10 Architecting the Future of Big Data
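    For illustration, a minimal sketch of the general pattern the fix relies on (hypothetical class, not the actual HDFS-7182 change): keep the counters behind the metrics in atomic accumulators that writers update while doing their normal work, so the JMX read path never has to acquire the service's lock.

      import java.util.concurrent.atomic.LongAdder;

      // Hypothetical example: metrics readable without taking the service's global lock.
      public class LockFreeMetrics {
        private final LongAdder blocksProcessed = new LongAdder();
        private final LongAdder pendingDeletions = new LongAdder();

        // Called from worker threads, which may hold the service lock for other reasons.
        public void incrBlocksProcessed() { blocksProcessed.increment(); }
        public void incrPendingDeletions() { pendingDeletions.increment(); }
        public void decrPendingDeletions() { pendingDeletions.decrement(); }

        // Called from the JMX/metrics thread; never blocks on the service lock.
        public long getBlocksProcessed() { return blocksProcessed.sum(); }
        public long getPendingDeletions() { return pendingDeletions.sum(); }
      }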
  • 11. © Hortonworks Inc. 2011 Managing Load
    • RPC call load.
      – It’s too easy for a single inefficient job to overwhelm a cluster with too much RPC load.
      – RPC servers accept calls into a single shared queue.
      – Overflowing that queue causes increased latency and rejection of calls for all callers, not just the single inefficient job that caused the problem.
      – Load problems can be mitigated with enhanced admission control, client back-off and throttling policies tailored to real-world usage patterns.
    Page 11 Architecting the Future of Big Data
  • 12. © Hortonworks Inc. 2011 HADOOP-10282: FairCallQueue
    • Hadoop RPC Architecture
      – Traditionally, Hadoop RPC internally admits incoming RPC calls into a single shared queue.
      – Worker threads consume the incoming calls from that shared queue and process them.
      – In an overloaded situation, calls spend more time waiting in the queue for a worker thread to become available.
      – At the extreme, the queue overflows, which then requires rejecting the calls.
      – This tends to punish all callers, not just the caller that triggered the unusually high load.
    • RPC Congestion Control with FairCallQueue
      – Replace single shared queue with multiple prioritized queues.
      – Call is placed into a queue with priority selected based on the calling user’s current history.
      – Calls are dequeued and processed with greater frequency from higher-priority queues.
      – Under normal operations, when the RPC server can keep up with load, this is not noticeably different from the original architecture.
      – Under high load, this tends to deprioritize users triggering unusually high load, thus allowing room for other processes to make progress. There is less risk of a single runaway job overwhelming a cluster.
    Page 12 Architecting the Future of Big Data
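    As a configuration sketch only (the exact property names and the RPC port, 8020 here, should be verified against the documentation for your Hadoop release), FairCallQueue is enabled per RPC port in core-site.xml on the NameNode, along these lines:

      <!-- core-site.xml: illustrative sketch; verify keys for your release -->
      <property>
        <name>ipc.8020.callqueue.impl</name>
        <value>org.apache.hadoop.ipc.FairCallQueue</value>
      </property>
      <property>
        <name>ipc.8020.scheduler.impl</name>
        <value>org.apache.hadoop.ipc.DecayRpcScheduler</value>
      </property>

    The scheduler setting is what tracks each caller’s recent history and maps calls to priority levels; FairCallQueue then services the higher-priority queues more frequently.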
  • 13. © Hortonworks Inc. 2011 HADOOP-10597: RPC Server signals backoff to clients when all request queues are full
    • Client-side backoff from overloaded RPC servers.
      – Builds upon work of the RPC FairCallQueue.
      – If an RPC server’s queue is full, then optionally send a signal to additional incoming clients to request backoff.
      – Clients are aware of the signal, and react by performing exponential backoff before sending additional calls.
      – Improves quality of service for clients when server is under heavy load. RPC calls that would have failed will instead succeed, but with longer latency.
      – Improves likelihood of server recovering, because client backoff will give it more opportunity to catch up.
    Page 13 Architecting the Future of Big Data
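    Continuing the configuration sketch above (again, the property name follows HADOOP-10597 and should be verified for your release), the backoff signal is an opt-in per RPC port:

      <!-- core-site.xml: illustrative sketch; enables server-to-client backoff signaling -->
      <property>
        <name>ipc.8020.backoff.enable</name>
        <value>true</value>
      </property>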
  • 14. © Hortonworks Inc. 2011 HADOOP-12916: Allow RPC scheduler/callqueue backoff using response times
    • More flexibility in back-off policies.
      – Triggering backoff when the queue is full is in some sense too late. The problem has already grown too severe.
      – Instead, track call response time, and trigger backoff when response time exceeds bounds.
      – Any amount of queueing increases RPC response latency. Reacting to unusually high RPC response time can prevent the problem from becoming so severe that the queue overflows.
    Page 14 Architecting the Future of Big Data
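    A hedged sketch of how response-time-based backoff might be configured (the property names below are assumptions based on HADOOP-12916 and the DecayRpcScheduler; confirm them against your release before use):

      <!-- core-site.xml: illustrative sketch; back off when per-priority response times exceed thresholds -->
      <property>
        <name>ipc.8020.decay-scheduler.backoff.responsetime.enable</name>
        <value>true</value>
      </property>
      <property>
        <!-- one threshold per priority level, highest priority first -->
        <name>ipc.8020.decay-scheduler.backoff.responsetime.thresholds</name>
        <value>10s,20s,30s,40s</value>
      </property>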
  • 15. © Hortonworks Inc. 2011 Performance
    • Garbage Collection
      – NameNode heap must scale up in relation to the number of file system objects (files, directories, blocks, etc.).
      – Recent hardware trends can cause larger DataNode heaps too. (Nodes have more disks and those disks are larger, therefore the memory footprint has increased for tracking block state.)
      – Much has been written about garbage collection tuning for large heap JVM processes.
      – In addition to recommending configuration best practices, we can optimize the codebase to reduce garbage collection pressure.
    • Block Reporting
      – The process by which DataNodes report information about their stored blocks to the NameNode.
      – Full Block Report: a complete catalog of all of the node’s blocks, sent infrequently.
      – Incremental Block Report: partial information about recently added or deleted blocks, sent more frequently.
      – All block reporting occurs asynchronously with respect to user-facing operations, so it does not impact end user latency directly.
      – However, inefficiencies in block reporting can overwhelm a cluster to the point that it can no longer serve end user operations sufficiently.
    Page 15 Architecting the Future of Big Data
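    For the garbage collection bullets above, a minimal hadoop-env.sh sketch of the kind of NameNode JVM settings operators commonly apply (the heap size, collector choice and log path here are placeholders, not recommendations for any specific cluster):

      # hadoop-env.sh: illustrative sketch only; size the heap for your object counts
      export HADOOP_NAMENODE_OPTS="-Xms32g -Xmx32g \
        -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled \
        -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hadoop/hdfs/nn-gc.log \
        ${HADOOP_NAMENODE_OPTS}"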
  • 16. © Hortonworks Inc. 2011 HDFS-7097: Allow block reports to be processed during checkpointing on standby name node
    • Coarse-grained locking impedes block report processing.
      – NameNode has a global lock required to enforce mutual exclusion for some operations.
      – One such operation is checkpointing performed at the HA standby NameNode: process of creating a new fsimage representing the full metadata state and beginning a new edit log. This can take a long time in large clusters.
      – Block report processing also required holding the lock, and therefore could not proceed during a checkpoint.
    • Coarse-grained lock contention can lead to cascading failure and downtime.
      – Checkpointing holds lock.
      – Frequent incremental block reports from DataNodes block waiting to acquire lock.
      – Eventually consumes all available RPC handler threads, all waiting to acquire lock.
      – In extreme case, blocks HA NameNode failover, because there is no RPC handler thread available to handle the failover request.
      – Even if HA failover can succeed, may still leave cluster in a state where it appears many nodes have gone dead, because their blocked heartbeats couldn’t be processed.
    • Solution: allow block report processing without holding global lock.
      – Block reports now can be processed concurrently with a checkpoint in progress.
      – Like most multi-threading and locking logic, required careful reasoning to ensure change was safe.
    Page 16 Architecting the Future of Big Data
  • 17. © Hortonworks Inc. 2011 HDFS-7435: PB encoding of block reports is very inefficient
    • Block report RPC message encoding can cause memory allocation inefficiency and garbage collection churn.
      – HDFS RPC messages are encoded using Protocol Buffers.
      – Block reports encoded each block ID, length and generation stamp in a Protocol Buffers repeated long field.
      – Behind the scenes, this becomes an ArrayList with a default capacity of 10.
      – DataNodes in large clusters almost always send a larger block report than this, so ArrayList reallocation churn is almost guaranteed.
      – Data type contained in the ArrayList is Long (note capitalization, not primitive long).
      – Boxing and unboxing causes additional allocation requirements.
    • Solution: a more GC-friendly encoding of block reports.
      – Within the Protocol Buffers RPC message, take over serialization directly.
      – Manually encode number of longs, followed by list of primitive longs.
      – Eliminates ArrayList reallocation costs.
      – Eliminates boxing and unboxing costs by deserializing straight to primitive long.
    Page 17 Architecting the Future of Big Data
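    A simplified Java sketch of the encoding idea (not the actual HDFS wire format, which lives inside a Protocol Buffers message and uses variable-length encoding; the class and method names here are hypothetical): write a count followed by the primitive longs, so neither side allocates Long objects or resizes an ArrayList.

      import java.io.ByteArrayOutputStream;
      import java.io.DataOutputStream;
      import java.io.IOException;

      // Hypothetical example: manual encoding of block fields as primitive longs.
      public class BlockListEncoder {
        // Each block contributes three longs: ID, length and generation stamp.
        public static byte[] encode(long[] blockFields) throws IOException {
          ByteArrayOutputStream bytes = new ByteArrayOutputStream();
          DataOutputStream out = new DataOutputStream(bytes);
          out.writeInt(blockFields.length);   // number of longs that follow
          for (long field : blockFields) {
            out.writeLong(field);             // primitive long, no boxing
          }
          out.flush();
          return bytes.toByteArray();
        }
      }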
  • 18. © Hortonworks Inc. 2011 HDFS-7609: Avoid retry cache collision when Standby NameNode loading edits
    • Idempotence and at-most-once delivery of HDFS RPC messages.
      – Some RPC message processing is inherently idempotent: can be applied multiple times, and the final result is still the same. Example: setPermission.
      – Other messages are not inherently idempotent, but the NameNode can still provide an “at-most-once” processing guarantee by temporarily tracking recently executed operations by a unique call ID. Example: rename.
      – The data structure that does this is called the RetryCache.
      – This is important in failure modes, such as an HA failover or a network partition, which may cause a client to send the same message more than once.
    • Erroneous multiple RetryCache entries for same operation.
      – Duplicate entries caused slowdown.
      – Particularly noticeable during an HA transition.
      – Bug fix to prevent duplicate entries.
    Page 18 Architecting the Future of Big Data
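    For illustration, a minimal sketch of an at-most-once guard (a hypothetical class, far simpler than the NameNode's actual RetryCache, which also expires entries and stores them compactly): results are cached under a key that identifies one logical call, so a retried delivery replays the cached result instead of re-executing.

      import java.util.concurrent.Callable;
      import java.util.concurrent.ConcurrentHashMap;
      import java.util.concurrent.ConcurrentMap;

      // Hypothetical example: at-most-once execution for non-idempotent operations.
      public class SimpleRetryCache<R> {
        private final ConcurrentMap<String, R> completed = new ConcurrentHashMap<>();

        // Key should uniquely identify one logical call, e.g. clientId + ":" + callId.
        public R executeOnce(String key, Callable<R> operation) throws Exception {
          R cached = completed.get(key);
          if (cached != null) {
            return cached;               // duplicate delivery: replay the earlier result
          }
          R result = operation.call();   // first delivery: execute the operation
          completed.put(key, result);    // a real cache also handles concurrent duplicates and expiry
          return result;
        }
      }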
  • 19. © Hortonworks Inc. 2011 HDFS-9710: Change DN to send block receipt IBRs in batches
    • Incremental block reports trigger multiple RPC calls.
      – When a DataNode receives a block, it sends an incremental block report RPC to the NameNode immediately.
      – Even multiple block receipts translate to multiple individual incremental block report RPCs.
      – With consideration of all DataNodes in a large cluster, this can become a huge number of RPC messages for the NameNode to process.
    • Solution: batch multiple block receipt events into a single RPC message.
      – Reduces RPC overhead of sending multiple messages.
      – Scales better with respect to number of nodes and number of blocks in a cluster.
    Page 19 Architecting the Future of Big Data
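    A minimal sketch of the batching pattern (hypothetical names, not the actual DataNode code): receipts are buffered locally and drained into one report, so a single RPC carries many block receipt events.

      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical example: batching block receipt events into a single report.
      public class ReceiptBatcher {
        private final List<Long> receivedBlockIds = new ArrayList<>();

        // Called whenever a block is received; no RPC happens here.
        public synchronized void blockReceived(long blockId) {
          receivedBlockIds.add(blockId);
        }

        // Called periodically, or once the batch reaches a size threshold.
        public synchronized List<Long> drainBatch() {
          List<Long> batch = new ArrayList<>(receivedBlockIds);
          receivedBlockIds.clear();
          return batch;                   // the caller sends this batch in one RPC
        }
      }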
  • 20. © Hortonworks Inc. 2011 Liveness
    • "...make progress despite the fact that its concurrently executing components ("processes") may have to "take turns" in critical sections, parts of the program that cannot be simultaneously run by multiple processes." -Wikipedia
    • DataNode Heartbeats
      – Responsible for reporting health of a DataNode to the NameNode.
      – Operational problems of managing load and performance can block timely heartbeat processing.
      – Heartbeat processing at the NameNode can be surprisingly costly due to contention on a global lock and asynchronous dispatch of commands (e.g. delete block).
    • Blocked heartbeat processing can cause cascading failure and downtime.
      – Blocked heartbeat processing can make the NameNode think DataNodes are not heartbeating at all, and therefore are not running.
      – DataNodes that stop running are flagged by the NameNode as dead.
      – Too many dead DataNodes make the cluster inoperable as a whole.
      – Dead DataNodes must have their replicas copied to other DataNodes to satisfy replication requirements.
      – Erroneously flagging DataNodes as dead can cause a storm of wasteful re-replication activity.
    Page 20 Architecting the Future of Big Data
  • 21. © Hortonworks Inc. 2011 HDFS-9239: DataNode Lifeline Protocol: an alternative protocol for reporting DataNode health
    • The lifeline keeps the DataNode alive, despite conditions of unusually high load.
      – Optionally run a separate RPC server within the NameNode dedicated to processing of lifeline messages sent by DataNodes.
      – Lifeline messages are a simplified form of heartbeat messages, but do not have the same costly requirements for asynchronous command dispatch, and therefore do not need to contend on a shared lock.
      – Even if the main NameNode RPC queue is overwhelmed, the lifeline still keeps the DataNode alive.
      – Prevents erroneous and costly re-replication activity.
    Page 21 Architecting the Future of Big Data
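    A configuration sketch of how the lifeline server is turned on (the property comes from HDFS-9239; the hostname and port are placeholders, and the key should be checked against hdfs-default.xml for your release):

      <!-- hdfs-site.xml: illustrative sketch; giving the lifeline RPC server its own address enables it -->
      <property>
        <name>dfs.namenode.lifeline.rpc-address</name>
        <value>nn1.example.com:8050</value>
      </property>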
  • 22. © Hortonworks Inc. 2011 HDFS-9311: Support optional offload of NameNode HA service health checks to a separate RPC server.
    • RPC offload of HA health check and failover messages.
      – Similar to problem of timely heartbeat message delivery.
      – NameNode HA requires messages sent from the ZKFC (ZooKeeper Failover Controller) process to the NameNode.
      – Messages are related to handling periodic health checks and initiating shutdown and failover if necessary.
      – A NameNode overwhelmed with unusually high load cannot process these messages.
      – Delayed processing of these messages slows down NameNode failover, and thus creates a visibly prolonged outage period.
      – The lifeline RPC server can be used to offload HA messages, and similarly keep processing them even in the case of unusually high load.
    Page 22 Architecting the Future of Big Data
  • 23. © Hortonworks Inc. 2011 Optimizing Applications
    • HDFS Utilization Patterns
      – Sometimes it’s helpful to look a layer higher and assess what applications are doing with HDFS.
      – FileSystem API unfortunately can make it too easy to implement inefficient call patterns.
    Page 23 Architecting the Future of Big Data
  • 24. © Hortonworks Inc. 2011 HIVE-10223: Consolidate several redundant FileSystem API calls.
    • Hadoop FileSystem API can cause applications to make redundant RPC calls.
    • Before:
      if (fs.isFile(file)) { // RPC #1
        ...
      } else if (fs.isDirectory(file)) { // RPC #2
        ...
      }
    • After:
      FileStatus fileStatus = fs.getFileStatus(file); // Just 1 RPC
      if (fileStatus.isFile()) { // Local, no RPC
        ...
      } else if (fileStatus.isDirectory()) { // Local, no RPC
        ...
      }
    • Good for Hive, because it reduces latency associated with NameNode RPCs.
    • Good for the whole ecosystem, because it reduces load on the NameNode, a shared service.
    Page 24 Architecting the Future of Big Data
  • 25. © Hortonworks Inc. 2011 PIG-4442: Eliminate redundant RPC call to get file information in HPath.
    • A similar story of redundant RPC within Pig code.
    • Before:
      long blockSize = fs.getHFS().getFileStatus(path).getBlockSize(); // RPC #1
      short replication = fs.getHFS().getFileStatus(path).getReplication(); // RPC #2
    • After:
      FileStatus fileStatus = fs.getHFS().getFileStatus(path); // Just 1 RPC
      long blockSize = fileStatus.getBlockSize(); // Local, no RPC
      short replication = fileStatus.getReplication(); // Local, no RPC
    • Revealed from inspection of HDFS audit log.
      – HDFS audit log shows a record of each file system operation executed against the NameNode.
      – This continues to be one of the most significant sources of HDFS troubleshooting information.
      – In this case, manual inspection revealed a suspicious pattern of multiple getfileinfo calls for the same path from a Pig job submission.
    Page 25 Architecting the Future of Big Data
  • 26. © Hortonworks Inc. 2011 HDFS-9924: Asynchronous HDFS Access
    • Current Hadoop FileSystem API is inherently synchronous.
      – Issue a single synchronous file system call.
      – In the case of HDFS, that call is implemented with a synchronous RPC.
      – Block waiting for the result.
      – Then, client application may proceed.
    • Some application usage patterns would benefit from asynchronous access.
      – Some applications regularly issue a large sequence of multiple file system calls, with no data dependencies between the results of those calls.
      – For example, Hive partition logic can involve hundreds or thousands of rename operations, where each rename can execute independently, with no data dependencies on the results of other renames.
      public Future<Boolean> rename(Path src, Path dst) throws IOException;
    Page 26 Architecting the Future of Big Data
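    Until an asynchronous FileSystem API is generally available, applications can approximate the benefit themselves. The sketch below is a hypothetical helper, not the HDFS-9924 API: it overlaps many independent renames by issuing the existing synchronous calls from a thread pool.

      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;

      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      // Hypothetical example: overlapping independent rename calls with a thread pool.
      public class ParallelRenamer {
        public static void renameAll(FileSystem fs, List<Path[]> srcDstPairs) throws Exception {
          ExecutorService pool = Executors.newFixedThreadPool(16);
          try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path[] pair : srcDstPairs) {
              final Path src = pair[0];
              final Path dst = pair[1];
              // Each rename is still a synchronous RPC, but the RPCs overlap in time.
              Callable<Boolean> rename = () -> fs.rename(src, dst);
              results.add(pool.submit(rename));
            }
            for (Future<Boolean> result : results) {
              result.get();               // surface any failure from the renames
            }
          } finally {
            pool.shutdown();
          }
        }
      }

    This only hides client-side latency; each rename still costs the NameNode one RPC, so a true asynchronous API remains attractive.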
  • 27. © Hortonworks Inc. 2011 Summary
    • A variety of recent enhancements have improved the ability of HDFS to serve as the foundational storage layer of the Hadoop ecosystem.
    • Optimization
      – Performance
      – Optimizing Applications
    • Stabilization
      – Liveness
      – Managing Load
    • Supportability
      – Logging
      – Troubleshooting
    Page 27 Architecting the Future of Big Data
  • 28. © Hortonworks Inc. 2011 Thank you! Q&A

Editor's Notes

  • #3: Thank Arpit.
  • #4: We’ll look at specific Apache JIRA issues, some not yet shipped, some still in progress. Small patches often yield big wins. Sometimes those patches are even small enough to fit on a PowerPoint slide, as you’re about to see. Some are larger.
  • #5: These are common challenges for any large Java codebase, not just specific to Hadoop.
  • #6: Too little logging. Size of code change: 3 characters. Without this extra logging information, diagnosis is very challenging.
  • #7: Too much logging.
  • #8: Kerberos is notorious for obtuse error messages that don’t directly point out root cause.
  • #9: These are often steps we need to follow in any case that requires Kerberos troubleshooting. Codifying these steps into a standard tool makes gathering this information easier and more consistent.
  • #10: Helps find the naughty user who is overwhelming your cluster.
  • #15: “smoothing”
  • #16: In contrast to managing an overloaded situation, how can we more effectively handle more load?
  • #18: Garbage collection friendly data structures are particularly relevant to the NameNode, which has a large heap size requirement.
  • #19: Data structure not efficient for duplicate entries. (Not the use case.)
  • #24: We’ve talked about how HDFS can better react to overloaded conditions, and we’ve talked about improving HDFS to handle more total load. What is the source of that load? Is it legitimate?
  • #26: I encourage you to explore and analyze the HDFS audit log in your clusters.
  • #27: Improving the API to encourage more efficient applications.
  • #28: Performance of HDFS itself and also optimizing applications.