This document discusses data deduplication techniques for big data stored in HDFS (Hadoop Distributed File System). It begins by defining data deduplication as a data compression technique that eliminates duplicate copies of repeating data to reduce storage space. The document then reviews the main levels and types of deduplication: file-level and block-level granularity, inline and post-process timing, and client-side and target-based placement. It discusses how deduplication can significantly reduce storage requirements for backup applications and file systems. However, security and privacy concerns arise when sensitive user data is deduplicated in the cloud. The document therefore proposes a new authorized deduplication scheme that takes users' access control policies into account in addition to the data content itself.
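To make the block-level idea concrete, the following is a minimal Python sketch of content-hash (fingerprint) deduplication over fixed-size blocks. The 4 MiB block size, SHA-256 fingerprints, and the in-memory dictionaries standing in for a block store and file recipes are illustrative assumptions, not the document's design; an authorized scheme of the kind proposed here would additionally factor the user's access privileges into the duplicate check rather than relying on the data content alone.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB blocks; real systems tune this


def deduplicate(path, block_store, recipes):
    """Split a file into fixed-size blocks, fingerprint each block with
    SHA-256, and store only blocks whose fingerprint has not been seen.

    block_store: dict mapping fingerprint -> block bytes (unique blocks)
    recipes:     dict mapping file path -> ordered list of fingerprints
    """
    recipe = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in block_store:
                # First time this content is seen: keep one physical copy.
                block_store[fingerprint] = block
            # Duplicate blocks contribute only a reference, not new storage.
            recipe.append(fingerprint)
    recipes[path] = recipe


def restore(path, block_store, recipes):
    """Reassemble a file's bytes from its recipe of block fingerprints."""
    return b"".join(block_store[fp] for fp in recipes[path])
```

In this sketch, a post-process deduplicator would run such a pass over data already written to storage, while an inline deduplicator would apply the same fingerprint check before the write is committed; file-level deduplication is the special case where the whole file is treated as a single block.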