This document summarizes key points from chapters 11 and 15 of Programming Hive. It discusses choosing compression codecs for intermediate and final outputs in Hive, how compression schemes such as LZO, Snappy, and BWT (the Burrows-Wheeler transform used by bzip2) work, and how to enable compression in Hive. It also covers Hive file formats such as SequenceFiles, RCFiles, and ORCFiles. RCFiles store data in a columnar layout and use RLE compression, and ORCFiles provide faster reads than RCFiles. The document recommends LZO and Snappy as fast codecs that still achieve good compression ratios.
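To illustrate why a columnar layout pairs well with run-length encoding (RLE), here is a minimal Python sketch. It is not RCFile or ORC code; the helper names and sample rows are hypothetical. Storing each column contiguously places repeated values next to each other, so RLE can collapse them into (value, count) pairs.

```python
from itertools import groupby

def to_columns(rows):
    """Transpose row-oriented records into one list per column."""
    return [list(col) for col in zip(*rows)]

def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# Hypothetical rows: (date, country, id)
rows = [
    ("2011-09-26", "JP", 1),
    ("2011-09-26", "JP", 2),
    ("2011-09-26", "US", 3),
    ("2011-09-27", "US", 4),
]
columns = to_columns(rows)
encoded = [rle_encode(col) for col in columns]
for col in encoded:
    print(col)
```

The date and country columns shrink to two runs each, while the unique `id` column gains nothing, which is why RLE pays off mainly on sorted or low-cardinality columns.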
I Went to Hadoop Conference Japan 2011 Fall (moai kids)
The document summarizes Hadoop Conference Japan 2011 Fall. Key points include:
1) The conference was held on September 26th, 2011 and featured talks from companies such as Cloudera, Hortonworks, and MapR about their Hadoop distributions.
2) Major Hadoop updates discussed include Hadoop 0.23 and MapReduce version 2, which brings improvements such as running processing frameworks beyond MapReduce, for example Giraph for graph processing, and Spark.
3) Cloudera, Hortonworks, and MapR promoted their Hadoop distributions, and the talks covered how their approaches to distribution and support differ.
The document discusses performance tuning for HBase. It provides tips on garbage collection tuning, using MemStore-Local Allocation Buffers (MSLAB), compression, optimizing splits and compactions, load balancing, merging regions, client API best practices, and configuration settings. It also recommends tools such as the HBase PerformanceEvaluation tool and YCSB for load testing HBase.
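One common split-related tactic (not necessarily the one the document describes) is to pre-split a table at creation time so that writes spread across regions immediately. Assuming row keys are uniformly distributed fixed-width hex strings (for example, hashed IDs), the split points can be computed as below. `hex_split_keys` is a hypothetical helper in the spirit of HBase's `RegionSplitter.HexStringSplit`, not a call into HBase itself.

```python
def hex_split_keys(num_regions, key_width=8):
    """Return num_regions - 1 split points that divide the fixed-width
    hex key space [00..0, ff..f] into equally sized ranges."""
    if num_regions < 2:
        raise ValueError("need at least two regions to split")
    space = 16 ** key_width  # number of distinct keys of this width
    return [format(space * i // num_regions, f"0{key_width}x")
            for i in range(1, num_regions)]

print(hex_split_keys(4))  # ['40000000', '80000000', 'c0000000']
```

The resulting strings would be encoded to bytes and passed as the split keys when the table is created. If the keys are not uniformly distributed, evenly spaced splits will still produce hot regions.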
This document compares the compression algorithms gzip, LZO, and Snappy. It finds that Snappy compresses data very quickly but achieves a lower compression ratio than LZO or gzip. LZO compresses better than Snappy but is slower. Gzip achieves the highest compression ratio but is the slowest. The document recommends choosing an algorithm based on the required balance of compression ratio, speed, and resource usage for the task.
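Snappy and LZO are not in Python's standard library, so this sketch does not benchmark them. As a stand-in, it shows the same speed-versus-ratio trade-off using DEFLATE (the algorithm behind gzip) via the `zlib` module at different compression levels. The sample data is hypothetical, and timings will vary by machine.

```python
import time
import zlib

def measure(data, level):
    """Compress data with zlib at the given level; return (ratio, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(compressed) == data  # lossless round trip
    return len(compressed) / len(data), elapsed

# Repetitive, log-like sample input (hypothetical).
sample = b"".join(b"2011-09-26 INFO request id=%d status=200\n" % i
                  for i in range(20000))

for level in (1, 6, 9):
    ratio, seconds = measure(sample, level)
    print(f"level {level}: ratio {ratio:.3f}, {seconds * 1000:.1f} ms")
```

Lower levels typically run faster and produce larger output. Choosing a codec for a Hadoop job follows the same reasoning, just across algorithms rather than levels.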
This document provides information about using Sony's FeliCa SDK with Adobe AIR/Flash to access RFID tags. It describes the FeliCaProxy software, which allows reading from and writing to FeliCa tags. It includes ActionScript code snippets showing how to open the FeliCa port, poll for card IDs, and perform read/write operations. It also mentions other resources for learning about FeliCa and using NFC on Android devices.
This document discusses the HandlerSocket plugin for MySQL. It presents a Java client for HandlerSocket that enables high-performance operations on MySQL databases. Code examples show how to open a connection, execute commands, and retrieve results. Performance test results show HandlerSocket outperforming standard JDBC access in query response time. System monitoring output covering disk I/O, context switches, and CPU usage is also shown.
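The document's Java client is not reproduced here. To show what such a client sends over the wire, here is a Python sketch of request encoding based on the HandlerSocket text protocol as I understand it: tab-separated fields terminated by a newline, a `P` request to open an index, and `<indexid> <op> <vlen> <v1> ... <vn>` to look up rows. The helper names and the database, table, and column names are hypothetical, and the sketch only builds bytes; it does not open a socket.

```python
def hs_escape(value):
    """Encode one field per the HandlerSocket text protocol.

    None becomes a lone 0x00 byte (SQL NULL); bytes 0x00-0x0f are written
    as 0x01 followed by the byte shifted by 0x40; other bytes pass through.
    """
    if value is None:
        return b"\x00"
    raw = value if isinstance(value, bytes) else str(value).encode("utf-8")
    out = bytearray()
    for b in raw:
        if b <= 0x0F:
            out += bytes((0x01, b + 0x40))
        else:
            out.append(b)
    return bytes(out)

def hs_line(*fields):
    """Join escaped fields with tabs and terminate with a newline."""
    return b"\t".join(hs_escape(f) for f in fields) + b"\n"

def open_index(index_id, db, table, index, columns):
    # P <indexid> <dbname> <tablename> <indexname> <columns>
    return hs_line("P", index_id, db, table, index, ",".join(columns))

def find(index_id, op, *key_values):
    # <indexid> <op> <vlen> <v1> ... <vn>
    return hs_line(index_id, op, len(key_values), *key_values)

request = open_index(0, "test", "user", "PRIMARY", ["id", "name"]) + find(0, "=", 42)
print(request)
```

Because tabs and newlines inside values are escaped, a value can never break the framing of a request line.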
HandlerSocket plugin Client for Java and Benchmarks Using It (moai kids)
The document describes benchmarks comparing the performance of the HandlerSocket plugin for MySQL with traditional JDBC. It presents a Java client for HandlerSocket that can perform CRUD operations more efficiently than JDBC, especially in bulk. The benchmarks show HandlerSocket achieving significantly higher query rates than JDBC for select, insert, update, and delete operations, particularly when operating on 100 rows at a time.
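A plausible reason bulk operations help is that many requests can share one network round trip. Assuming the server processes pipelined request lines in order and returns responses in the same order, a client can concatenate one insert request (`<indexid> + <vlen> <v1> ... <vn>`) per row and send each batch of 100 in a single write. The helper names below are hypothetical, and the sketch assumes field values need no protocol escaping.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def insert_line(index_id, values):
    # <indexid> + <vlen> <v1> ... <vn>  (fields assumed to need no escaping)
    fields = [str(index_id), "+", str(len(values))] + [str(v) for v in values]
    return ("\t".join(fields) + "\n").encode("utf-8")

def build_batches(index_id, rows, batch_size=100):
    """Concatenate one insert request per row into a single payload per
    batch, so each batch can go out in one write and its responses be
    read back in order."""
    return [b"".join(insert_line(index_id, row) for row in batch)
            for batch in chunked(rows, batch_size)]

rows = [(i, f"user{i}") for i in range(250)]
batches = build_batches(0, rows)
print(len(batches), [b.count(b"\n") for b in batches])  # 3 [100, 100, 50]
```

With 250 rows, this sends 3 payloads instead of 250 separate request/response exchanges.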
Yammer is an enterprise social networking service (SNS) launched in 2008. It allows employees to communicate internally, share files and information, and collaborate across geographic boundaries. Key facts: it won one of the top prizes at TechCrunch50 in 2008, and it is available as a web app, with mobile apps for iPhone and Android. It focuses on private social networking within companies.
This document discusses mobile internet and technology trends in China. It notes that mobile internet usage is more popular in China than in the US, with over 56% of Chinese internet users accessing the internet via mobile devices in 2010. Popular mobile services in China include QQ, an instant messenger with over 600 million users, and WeChat, a social networking platform. The document also references several other Chinese mobile internet and technology companies and trends.
This document discusses emoji analysis and frequency counting. It describes using Darts, a double-array trie library, to efficiently store and search emoji sequences and count their frequencies in a large corpus. Examples are given of frequent emoji sequences extracted from Twitter, along with their counts. Links are provided to resources about the Unicode emoji standard and the Darts library.
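Darts is a C++ library; this Python sketch substitutes a plain nested-dictionary trie for the double-array to show the same lookup idea: walk the text, take the longest known emoji sequence at each position, and count it. A double-array trie stores the same transitions in two flat integer arrays (base and check), which is what makes it compact and fast. The emoji inventory and sample tweet are hypothetical.

```python
from collections import Counter

_END = object()  # marker key: a complete known sequence ends at this node

def build_trie(sequences):
    """Insert each known emoji sequence into a nested-dict trie."""
    root = {}
    for seq in sequences:
        node = root
        for ch in seq:
            node = node.setdefault(ch, {})
        node[_END] = seq
    return root

def longest_match(root, text, start):
    """Return the longest known sequence beginning at text[start], or None."""
    node, best = root, None
    for pos in range(start, len(text)):
        node = node.get(text[pos])
        if node is None:
            break
        best = node.get(_END, best)
    return best

def count_sequences(root, text):
    """Scan left to right, counting the longest match at each position."""
    counts = Counter()
    i = 0
    while i < len(text):
        match = longest_match(root, text, i)
        if match:
            counts[match] += 1
            i += len(match)
        else:
            i += 1
    return counts

# Hypothetical emoji inventory: U+2764 HEAVY BLACK HEART, the same heart
# followed by U+FE0F VARIATION SELECTOR-16, U+1F44D THUMBS UP SIGN, and the
# regional-indicator pair J+P that renders as the Japanese flag.
known = ["\u2764", "\u2764\ufe0f", "\U0001F44D", "\U0001F1EF\U0001F1F5"]
trie = build_trie(known)
tweet = "nice \U0001F44D\U0001F44D love \u2764\ufe0f from \U0001F1EF\U0001F1F5 \u2764"
print(count_sequences(trie, tweet))
```

Longest-match scanning matters because multi-codepoint emoji (flags, variation-selector forms) share prefixes with shorter ones. Counting codepoints one at a time would split them.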
1. The document discusses n-grams and their use in natural language processing tasks.
2. It provides examples of n-grams of different sizes and describes how n-grams are used to calculate weights and similarity measures between words or phrases.
3. The document also mentions using n-grams with the Markov Clustering Algorithm to cluster similar words or phrases together.
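The summary does not say which weighting or similarity measure the document uses. As an assumption, this Python sketch uses character bigrams with the Dice and Jaccard coefficients, two common set-based measures. The function names are hypothetical.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def dice(a, b, n=2):
    """Dice coefficient over n-gram sets: 2 * |A & B| / (|A| + |B|)."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 1.0  # treat two strings too short for any n-gram as identical
    return 2 * len(x & y) / (len(x) + len(y))

def jaccard(a, b, n=2):
    """Jaccard coefficient over n-gram sets: |A & B| / |A | B|."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)

print(char_ngrams("night"))    # {'ni', 'ig', 'gh', 'ht'} (set order varies)
print(dice("night", "nacht"))  # 0.25: only 'ht' is shared, 2 * 1 / (4 + 4)
```

Pairwise similarities like these can serve as edge weights in the word graph that a clustering algorithm such as MCL takes as input.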
This document analyzes data from the language learning website Lang-8, which had over 155,000 users as of June 2010. It finds that the top native languages of users are Japanese, English, Mandarin, Korean, and Russian. The most popular languages of study are English, Japanese, Mandarin, Spanish, and Korean. The analysis also looks at age, gender ratios, and networks between native and target languages of users.
This document discusses the text-to-speech (TTS) and speech recognition APIs available on Android. It describes how TTS is implemented with the TextToSpeech API, including the onInit callback that signals when the engine is ready. Speech recognition uses RecognizerIntent to start the recognition activity and receive the results. Both APIs let apps interact with users by voice rather than only through touch.