- Tech know how online


Apache Hadoop is an Apache Software Foundation( ASF) framework for scalable, distributed software. Originally created by open source developer Doug Cutting, Hadoop became a top- level project of the Apache Software Foundation in January 2008, and in September 2010 Cutting was elected chair of the ASF Foundation.

Hadoop enables intensive computational processes with massive amounts of data, in the range of petabytes. Built on Java, Hadoop uses Google 's MapReduce algorithm with configurable classes for Map, Reduce and Combine phases. Hadoop is also used on Facebook, IBM and Yahoo web platforms. Companies that support Hadoop include EMC, Microsoft, SAP and Teradata.

The Hadoop Distributed File System( HDFS) is a distributed file system with high fault tolerance to hardware failures. It is used to store several 100 million files on multiple storage components and servers. It divides the files into fixed-length data blocks and distributes them among the connected hardware. A master node (NameNode) processes the incoming queries and organizes the storage of metadata of files in the slaves.

Hadoop includes several extensions, including the HBase database, a free implementation of Google BigTable for managing billions of rows within a Hadoop cluster. Hive enables data warehousing with Hadoop. This query language has SQL-basedsyntax and was developed by Facebook. Other extensions include Pig for developing MapReduce programs in the Pig Latin programming language, ZooKeeper for configuring distributed systems, and Chukwa for monitoring distributed systems in real time.

With Apache Spark, the Apache Software Foundation has developed a much faster open source framework.

Englisch: Hadoop
Updated at: 15.12.2017
#Words: 245
Links: Apache, advanced streaming format (file format) (ASF), framework, software (SW), level
Translations: DE

All rights reserved DATACOM Buchverlag GmbH © 2024