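Apache Spark is an open-source framework for analytical computing on large data sets across clusters of computers. It originated at the AMPLab of the University of California, Berkeley, and is now maintained by the Apache Software Foundation (ASF). For some in-memory workloads it is reported to run up to about a hundred times faster than Hadoop MapReduce; the gain depends heavily on the job and is much smaller for disk-bound work.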
Apache Spark is a unified framework that uses in-memory computing for comprehensive online analytical processing (OLAP). Its execution engine plans each job as a directed acyclic graph (DAG) of operations, so data flows from step to step without looping back. The Spark platform can access data from a wide variety of repositories, including the Hadoop Distributed File System (HDFS), NoSQL databases, and relational databases.
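The DAG idea can be illustrated with a small toy model in plain Python. This is only a sketch and not Spark's implementation or API: the names ToyDataset, Node, and _run are hypothetical and introduced here purely for illustration. It shows the two properties that matter: transformations only record a step that points back at its parent steps, and nothing is computed until an action such as collect() walks the recorded graph.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Node:
    """One step in the recorded graph (hypothetical, for illustration only)."""
    op: str                          # "source", "map", "filter", or "union"
    parents: Tuple["Node", ...] = ()  # steps this one depends on
    fn: Optional[Callable] = None    # function used by map/filter
    data: tuple = ()                 # items held by a source node


def _run(node: Node) -> list:
    """Evaluate a node by first evaluating the parents it depends on."""
    if node.op == "source":
        return list(node.data)
    if node.op == "map":
        return [node.fn(x) for x in _run(node.parents[0])]
    if node.op == "filter":
        return [x for x in _run(node.parents[0]) if node.fn(x)]
    if node.op == "union":
        return _run(node.parents[0]) + _run(node.parents[1])
    raise ValueError(f"unknown operation: {node.op}")


class ToyDataset:
    """Toy stand-in for a distributed dataset; it records steps lazily."""

    def __init__(self, node: Node):
        self.node = node

    @classmethod
    def from_items(cls, items: Iterable) -> "ToyDataset":
        return cls(Node(op="source", data=tuple(items)))

    # Transformations: each returns a new dataset and runs nothing yet.
    def map(self, fn: Callable) -> "ToyDataset":
        return ToyDataset(Node("map", (self.node,), fn))

    def filter(self, fn: Callable) -> "ToyDataset":
        return ToyDataset(Node("filter", (self.node,), fn))

    def union(self, other: "ToyDataset") -> "ToyDataset":
        return ToyDataset(Node("union", (self.node, other.node)))

    # Action: only here is the recorded graph actually executed.
    def collect(self) -> List:
        return _run(self.node)


# Build a small graph with two branches that merge in a union step.
numbers = ToyDataset.from_items(range(6))
doubled_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * 2)
odds = numbers.filter(lambda x: x % 2 == 1)
result = doubled_evens.union(odds)

print(result.collect())
```

Because every new step can only point at steps that already exist, the recorded graph cannot contain a cycle. A real engine would additionally split such a graph into stages and distribute the work across a cluster, which this sketch does not attempt.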
The Spark engine works in part as an application programming interface (API) and is supported by related tools for managing and analyzing data: Spark SQL for querying structured and relational data, the Machine Learning Library (MLlib) of learning algorithms, GraphX, a distributed framework for computing on graphs, and Spark Streaming for stream processing.