“What has been missing from Big Data analytics so far is an equivalent of LAMP (Linux, Apache HTTP Server, MySQL and PHP). Fortunately, a LAMP-like stack is emerging for collecting, processing, and analyzing Big Data, and it includes the following:
Hadoop Distributed File System (HDFS) for storage
MapReduce for distributed processing of large data sets on compute clusters
HBase for fast read/write access to tabular data
Hive for SQL-like queries on large data sets as well as a columnar storage layout using RCFile
Flume for log file and streaming data collection, along with Sqoop for database imports
JDBC and ODBC drivers to allow tools written for relational databases to access data stored in Hive
Hue for user interfaces
Pig for dataflow and parallel computations
Oozie for workflow
Avro for serialization
ZooKeeper as a coordination service for distributed applications”
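To make the MapReduce entry in the list above a little more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the example. It only illustrates the programming model and does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for this example. On a real cluster, the framework would run the map and reduce steps in parallel over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a real MapReduce framework this grouping happens between the map
    and reduce phases, across machines. Here it is a local dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big stack"]))
```

Because each map call looks at only one line and each reduce call at only one key, both steps can be distributed across many machines, which is the property the stack above relies on for processing large data sets.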