A Big Data Analysis Framework Using Apache Spark and Deep Learning

Gupta, Anand; Thakur, Hardeo; Shrivastava, Ritvik; Kumar, Pulkit; Nag, Sreyashi

Nov-25-2017–arXiv.org Machine Learning

Abstract: With the growing prevalence of Big Data, many advances have recently been made in this field. Frameworks such as Apache Hadoop and Apache Spark have gained considerable traction over the past decade and have become massively popular, especially in industry. It is becoming increasingly evident that effective big data analysis is key to solving artificial intelligence problems. Accordingly, a multi-algorithm machine learning library, MLlib, was implemented in the Spark framework. While this library supports multiple machine learning algorithms, there is still scope to use the Spark setup efficiently for highly time-intensive and computationally expensive procedures such as deep learning. In this paper, we propose a novel framework that combines the distributed computational abilities of Apache Spark with the advanced machine learning architecture of a deep multilayer perceptron (MLP), using the popular concept of Cascade Learning. We conduct an empirical analysis of our framework on two real-world datasets. The results are encouraging and corroborate our proposed framework, showing that it improves on traditional big data analysis methods that use either Spark or deep learning as individual elements.

I. INTRODUCTION

A. Overview

With the amount of data growing at an exponential rate, it is necessary to develop tools that are able to harness that data and extract value from it.
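The abstract names Cascade Learning but does not spell out how the Spark stage and the MLP are wired together. The sketch below assumes the common "cascade generalization" reading: a first-stage classifier is trained, its predicted class probability is appended to each record as an extra feature, and the deep MLP is then trained on the augmented records. In an actual Spark deployment the first stage might be an MLlib estimator such as `pyspark.ml.classification.LogisticRegression`; here both stages are simulated locally in pure Python so the example runs without a cluster. All names (`train_logistic`, `TinyMLP`, `cascade_fit`) and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

# Hypothetical sketch of cascade learning (assumed wiring, not the paper's code):
# a cheap stage-1 classifier's probability output is appended to the features
# fed to a stage-2 MLP. In Spark, stage 1 would be an MLlib model; here both
# stages run locally in pure Python so no cluster is needed.


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Stage 1: logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n

    def predict_proba(x):
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

    return predict_proba


class TinyMLP:
    """Stage 2: one tanh hidden layer, sigmoid output, SGD on log loss."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xk for w, xk in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        out = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)
        return h, out

    def predict_proba(self, x):
        return self._forward(x)[1]

    def fit(self, X, y, lr=0.1, epochs=300):
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, out = self._forward(x)
                d_out = out - t  # d(log loss)/d(output logit)
                for j, hj in enumerate(h):
                    # Backprop through tanh, using W2[j] before it is updated.
                    d_h = d_out * self.W2[j] * (1.0 - hj * hj)
                    self.W2[j] -= lr * d_out * hj
                    for k, xk in enumerate(x):
                        self.W1[j][k] -= lr * d_h * xk
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out
        return self


def cascade_fit(X, y):
    """Train stage 1, append its probability as a feature, then train stage 2."""
    stage1 = train_logistic(X, y)
    X_aug = [list(x) + [stage1(x)] for x in X]
    stage2 = TinyMLP(n_in=len(X_aug[0])).fit(X_aug, y)
    return lambda x: stage2.predict_proba(list(x) + [stage1(x)])


if __name__ == "__main__":
    # Invented toy data: two well-separated clusters.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.0],
         [1.0, 1.0], [0.9, 0.8], [0.8, 0.9], [1.0, 0.9]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    predict = cascade_fit(X, y)
    for x in X:
        print(x, round(predict(x), 3))
```

Appending the stage-1 probability rather than a hard 0/1 label keeps the first model's confidence available to the MLP; at prediction time the same stage-1 model must be applied to each new record before it reaches stage 2.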

artificial intelligence, data mining, machine learning

Country:
- North America > United States (0.68)

Genre:
- Research Report (0.83)
- Overview (0.68)

Industry:
- Information Technology > Security & Privacy (0.93)
- Health & Medicine > Therapeutic Area
  - Cardiology/Vascular Diseases (0.47)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (1.00)
  - Artificial Intelligence > Machine Learning
    - Neural Networks > Deep Learning (1.00)
