Apache Spark: Promises and Challenges - DZone Big Data


If you're looking for a solution for processing huge chunks of data, there are plenty of options these days. Depending on your use case and the type of operations you want to perform on your data, you can choose from a variety of data-processing frameworks, such as Apache Samza, Apache Storm, and Apache Spark. Apache Spark is a full-fledged data-engineering toolkit that lets you operate on large datasets without worrying about the underlying infrastructure. It helps you with data ingestion, querying, processing, and machine learning, while providing an abstraction for building a distributed system. Spark is known for its speed, which is the result of an improved implementation of MapReduce that keeps intermediate data in memory rather than persisting it to disk between stages. Apache Spark provides libraries for three languages: Scala, Java, and Python.
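The in-memory MapReduce model described above can be sketched in plain Python. This is a conceptual illustration only, not Spark itself; in PySpark the same word count would be expressed as chained `flatMap`, `map`, and `reduceByKey` transformations on an RDD, with intermediate results cached in memory across the cluster:

```python
from functools import reduce

# A classic MapReduce word count, with every intermediate result
# held in memory -- the idea Spark builds on, instead of writing
# map output to disk between stages as classic Hadoop MapReduce does.

lines = ["spark keeps data in memory", "spark is fast"]

# Map phase: emit (word, 1) pairs.
# (PySpark equivalent: rdd.flatMap(str.split).map(lambda w: (w, 1)))
pairs = [(word, 1) for line in lines for word in line.split()]

# Reduce phase: sum the counts per key.
# (PySpark equivalent: .reduceByKey(lambda a, b: a + b))
def merge(counts, pair):
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

word_counts = reduce(merge, pairs, {})
print(word_counts["spark"])  # 2
```

Because the `pairs` list never leaves memory, chaining further transformations on it is cheap; Spark generalizes this by partitioning the data across machines and keeping partitions cached between operations.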