What is Spark? Why is there so much buzz around this technology? I hope this introductory tutorial will help answer some of these questions. Apache Spark is an open-source cluster computing system that provides high-level APIs in Java, Scala, Python, and R. It can access data from HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source, and it can run under the Standalone, YARN, and Mesos cluster managers.

The objective of this introductory guide is to provide a detailed overview of Spark: its history, its architecture, its deployment modes, and the RDD abstraction.

What is Spark?

Apache Spark is a general-purpose, lightning-fast cluster computing system. For in-memory workloads it can run up to 100 times faster than Hadoop MapReduce, and up to 10 times faster when the data is processed from disk.

It is written in Scala but provides rich APIs in Scala, Java, Python, and R.
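To give a feel for the Scala API, here is a minimal word-count sketch. The file path (input.txt) and the application name are illustrative placeholders, and local[*] simply runs Spark on your own machine rather than a cluster:

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // SparkSession is the entry point to Spark (it wraps SparkContext since Spark 2.x).
    val spark = SparkSession.builder()
      .appName("WordCount")   // illustrative application name
      .master("local[*]")     // run locally on all cores; a cluster URL would go here
      .getOrCreate()

    // Load a text file as an RDD and count word occurrences.
    val counts = spark.sparkContext
      .textFile("input.txt")  // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}

Note that transformations such as flatMap and reduceByKey are lazy; nothing is actually computed until an action like take is called.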

It can be integrated with Hadoop and can process existing Hadoop HDFS data.
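For example, assuming a reachable HDFS NameNode (the hdfs://namenode:8020 address below is a placeholder for your own cluster), existing HDFS files can be read with the same API, reusing the spark session from the previous sketch:

// Read an existing HDFS file into an RDD; the URI is a placeholder.
val hdfsLines = spark.sparkContext.textFile("hdfs://namenode:8020/data/logs.txt")
println(s"HDFS line count: ${hdfsLines.count()}")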

It is an open-source, wide-ranging data processing engine with expressive development APIs that let data workers run streaming, machine learning, or SQL workloads demanding repeated access to data sets. It is designed to perform both batch processing (processing previously collected jobs in a single batch) and stream processing (dealing with continuously arriving data). In short, it is a general-purpose cluster computing platform.
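To make the "repeated access to data sets" point concrete, Spark can pin a dataset in memory with cache(), so iterative workloads such as machine learning do not re-read it from disk on every pass. A minimal sketch, again reusing the spark session and a placeholder path:

// cache() keeps the RDD in memory after it is first computed.
val logs = spark.sparkContext.textFile("input.txt").cache()
val errorCount = logs.filter(_.contains("ERROR")).count() // first action materializes the cache
val warnCount  = logs.filter(_.contains("WARN")).count()  // second action reads from memory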
If you need more information, please visit https://www.w3schools.blog/

