A Decade Later, Apache Spark Still Going Strong


Don't look now, but Apache Spark is about to turn 10 years old. The project began quietly at UC Berkeley in 2009 before being released as open source in 2010. For the past five years, Spark has been on an absolute tear, becoming one of the most widely used technologies in big data and AI. Let's take a look at Spark's remarkable run up to this point, and see where it might be headed next. Apache Spark is best known as the in-memory replacement for MapReduce, the disk-based computational engine at the heart of early Hadoop clusters.

GitHub digs into machine-learning repos, uncovers a lot of Python • DEVCLASS


GitHub has shed light on what is happening in the machine learning and data science development worlds by running its own data dig on repositories across its platform. The figures cover the period from January 1 to December 31, 2018, and track "contributions" such as pushing code, opening issues or pull requests, and commenting. Unsurprisingly, perhaps, Python is the most common language among machine learning repositories; it was also the third most common language on GitHub overall, as it has been since 2015. C++, JavaScript, Java and C# made up the rest of the top five languages for machine learning, while Julia came in at 6, Shell at 7, R at 8 and Scala at 10. "Julia, R, and Scala all appear in the top 10 for machine learning projects but not for GitHub overall," GitHub said. When it comes to the Python packages imported by machine learning or data science projects, numpy was the leader, imported by 74 per cent of them. The top five was rounded out by scipy (47 per cent), pandas (41 per cent), matplotlib (40 per cent) and scikit-learn (38 per cent).
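GitHub's package figures came from its dependency graph, not from parsing source files. Purely to illustrate what "imported by 74 per cent of projects" means, here is a minimal sketch that tallies which top-level packages each project's Python files import, then reports the share of projects importing each one. The function names and the toy `projects` data are hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
import ast
from collections import Counter


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by one Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):  # parse only; nothing is imported
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports such as "from . import utils" are skipped.
            names.add(node.module.split(".")[0])
    return names


def import_shares(projects: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of projects that import each package at least once."""
    counts = Counter()
    for sources in projects.values():
        imported = set()
        for src in sources:
            imported |= top_level_imports(src)
        counts.update(imported)  # each package counted once per project
    total = len(projects)
    return {pkg: 100.0 * n / total for pkg, n in counts.most_common()}


# Hypothetical toy input: project name -> contents of its .py files.
projects = {
    "repo_a": ["import numpy as np\nimport pandas as pd\n"],
    "repo_b": ["from sklearn.linear_model import LogisticRegression\nimport numpy\n"],
    "repo_c": ["import matplotlib.pyplot as plt\n"],
    "repo_d": ["from scipy import stats\nimport numpy as np\n"],
}
for pkg, share in import_shares(projects).items():
    print(f"{pkg}: {share:.0f}%")
```

Counting each package once per project, rather than once per import statement, is what makes the result a share of projects and not a raw import count.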

Top 8 programming languages every data scientist should master in 2019


Demand for data scientists is growing substantially in every industry. Every business that wants to grow needs to analyze the data it gathers, and data scientists need both the right tools and the right skill set to get better results from that information. According to a Forbes report, data science has been the best job in the US for the last three consecutive years. Also, according to an IBM study, demand for data scientists will increase by 28% by 2020, with nearly three million job openings for data science professionals.

The State of the Octoverse: Machine Learning - The GitHub Blog


In our 2018 Octoverse report, we noticed machine learning and data science were popular topics on GitHub. We decided to dig a little deeper into the state of machine learning and data science on GitHub. We pulled data on contributions between January 1, 2018 and December 31, 2018. Contributions could include pushing code, opening an issue or pull request, commenting on an issue or pull request, or reviewing a pull request. For the most imported packages, we used data from the dependency graph, which includes all public repositories and any private repositories that have opted in to the dependency graph.

Usage-Driven Groupings of Data Science and Machine Learning Programming Languages


An analysis of how more than 18,000 data professionals use 16 data science programming languages showed that the languages can be grouped into a smaller set of five groupings. That is, some programming languages tend to be used together, apart from other programming languages. A few of the groupings reflect specific types of applications or specific roles that data professionals could support, including analytics, general-purpose, and front-end work. Data scientists and machine learning engineers rely on programming languages to help them get insights from data, and a recent analysis showed that data professionals typically use around three programming languages.
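The source does not say which method produced the five groupings. As a hedged illustration of the idea that some languages "tend to be used together", the sketch below swaps in a simple technique: it measures how much two languages' user bases overlap (Jaccard similarity) and merges any pair at or above a threshold, i.e. single-linkage clustering implemented with union-find. The respondent data and the 0.5 threshold are made up for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of users of either language who use both."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_languages(respondents: list[set[str]], threshold: float) -> list[set[str]]:
    """Single-linkage grouping: languages whose user overlap >= threshold share a group."""
    users: dict[str, set[int]] = {}
    for i, langs in enumerate(respondents):
        for lang in langs:
            users.setdefault(lang, set()).add(i)

    parent = {lang: lang for lang in users}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sorted(users), 2):
        if jaccard(users[a], users[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for lang in users:
        groups.setdefault(find(lang), set()).add(lang)
    return sorted(groups.values(), key=sorted)


# Hypothetical survey answers: the set of languages each respondent uses.
respondents = [
    {"Python", "SQL"},
    {"Python", "SQL", "R"},
    {"R", "SQL"},
    {"JavaScript", "TypeScript"},
    {"JavaScript", "TypeScript", "Python"},
]
print(group_languages(respondents, threshold=0.5))
```

Raising the threshold splits groups apart and lowering it merges them, so the number of groupings depends on that choice as much as on the data.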

Machine Learning Systems - Programmer Books


Machine Learning Systems: Designs that scale is an example-rich guide that teaches you how to implement reactive design solutions in your machine learning systems to make them as reliable as a well-built web app. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. If you're building machine learning models to be used on a small scale, you don't need this book. But if you're a developer building a production-grade ML application that needs quick response times, reliability, and good user experience, this is the book for you. It collects principles and practices of machine learning systems that are dramatically easier to run and maintain and that are reliably better for users.

Predictive and Preventive Maintenance using IoT, Machine Learning & Apache Spark – BMC Blogs


Here we explain how to use Apache Spark and machine learning for a specific use case. This is the classic preventive maintenance problem, one of the most common business use cases for machine learning and IoT. We take the data for this analysis from Kaggle, a website dedicated to data science. It is sensor data from machines, specifically moisture, temperature, and pressure readings. The goal is to predict which machines need to be taken out of service for maintenance.
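The article builds its model with Spark; the sketch below is not its code. To show the shape of the problem in a self-contained way, it trains plain logistic regression by batch gradient descent on the three sensor features and outputs a maintenance probability per machine. The readings, labels, learning rate and epoch count are hypothetical. In PySpark, the comparable building blocks would be `pyspark.ml.feature.VectorAssembler` and `pyspark.ml.classification.LogisticRegression`.

```python
import math

FEATURES = ("moisture", "temperature", "pressure")


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_scaler(rows):
    """Per-feature (mean, std) so features on different scales train evenly."""
    stats = []
    for f in FEATURES:
        vals = [r[f] for r in rows]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        stats.append((mean, std))
    return stats


def to_vector(row, stats):
    return [(row[f] - m) / s for f, (m, s) in zip(FEATURES, stats)]


def train(rows, labels, lr=0.5, epochs=500):
    """Logistic regression via batch gradient descent; label 1 = needs maintenance."""
    stats = fit_scaler(rows)
    xs = [to_vector(r, stats) for r in rows]
    w, b, n = [0.0] * len(FEATURES), 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, stats


def failure_probability(row, model):
    w, b, stats = model
    x = to_vector(row, stats)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical readings; the last three machines later needed maintenance.
readings = [
    {"moisture": 30, "temperature": 60, "pressure": 100},
    {"moisture": 32, "temperature": 62, "pressure": 102},
    {"moisture": 29, "temperature": 58, "pressure": 98},
    {"moisture": 45, "temperature": 90, "pressure": 130},
    {"moisture": 47, "temperature": 95, "pressure": 135},
    {"moisture": 44, "temperature": 88, "pressure": 128},
]
labels = [0, 0, 0, 1, 1, 1]
model = train(readings, labels)
new_machine = {"moisture": 46, "temperature": 92, "pressure": 132}
print(f"P(needs maintenance) = {failure_probability(new_machine, model):.2f}")
```

The scaler statistics are fitted on the training readings and reused at prediction time, so a new machine is scored on the same scale the model was trained on.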

Evolutionary Algorithms on the JVM via Scala -- a minimal introduction


Unless you've just woken up from a several-year cryostasis, you're probably aware of the recent resurgence of machine learning and AI. This is yet another cycle of enthusiasm (historically interspersed with so-called AI winters), and this one is fueled mostly by interest in recommendation systems and by advances, in both algorithms and supporting hardware, in neural networks for machine vision and other purposes. It is therefore worthwhile to also consider other machine learning approaches that are not as blessed by the current hype. So, let's talk about evolution. The general term for any heuristic approach that is inspired by, or mimics, the process of evolution is evolutionary algorithms.
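The original introduction uses Scala on the JVM; as a language-neutral illustration, here is a minimal genetic algorithm in Python on the classic OneMax toy problem (maximize the number of 1 bits). It combines tournament selection, one-point crossover, bit-flip mutation and elitism. The problem, population size, mutation rate and generation count are illustrative assumptions, not the article's choices.

```python
import random


def onemax(bits: list[int]) -> int:
    """Toy fitness: the number of 1 bits; the optimum is all ones."""
    return sum(bits)


def tournament(pop: list[list[int]], rng: random.Random) -> list[int]:
    """Binary tournament selection: the fitter of two random individuals."""
    a, b = rng.sample(pop, 2)
    return a if onemax(a) >= onemax(b) else b


def evolve(n_bits: int = 20, pop_size: int = 30, generations: int = 200, seed: int = 0):
    rng = random.Random(seed)        # seeded so runs are reproducible
    mutation_rate = 1.0 / n_bits     # on average one flipped bit per child
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=onemax)
    history = [onemax(best)]         # best fitness per generation
    for _ in range(generations):
        if onemax(best) == n_bits:   # optimum reached
            break
        children = [best[:]]         # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randrange(1, n_bits)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=onemax)
        history.append(onemax(best))
    return best, history


best, history = evolve()
print(f"best fitness {onemax(best)}/{len(best)} after {len(history) - 1} generations")
```

Because the best individual is always copied into the next generation, the best fitness can never decrease; selection and crossover supply the pressure toward better solutions, and mutation keeps the search from stalling.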

What Does an Ideal Data Scientist's Profile Look Like?


If you are a data science job seeker, you are probably always wondering which skills to put on your resume to get interview calls; if you are looking to get into the field, you may have scratched your head many times over which technologies to learn to become an attractive candidate. Read on: I have the answer for you. First, we look at the skill requirements for different job titles. There was once a debate over whether Python or R is the language of choice in data science. Market demand now clearly tells us that Python is the leader.

OCAPIS: R package for Ordinal Classification And Preprocessing In Scala Machine Learning

Ordinal data are those in which a natural order exists between the labels. The classification and preprocessing of this type of data is attracting more and more interest in machine learning, due to its presence in many common problems. Traditionally, ordinal classification problems have been approached as nominal problems, which ignores the constraints implied by their natural order. In this paper, an innovative R package named ocapis (Ordinal Classification And Preprocessing In Scala) is introduced. Implemented mainly in Scala and available through GitHub, the library includes four learners and two preprocessing algorithms for ordinal and monotonic data. The main features of the package, along with installation and usage examples, are explained throughout the manuscript.
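The package's own learners aren't described here. To make the "natural order" point concrete, the sketch below shows one well-known way to exploit it, the Frank and Hall decomposition (not necessarily one of the ocapis algorithms). K ordered classes become K-1 binary questions of the form "is the label above class k?", and the binary probabilities are then recombined into class probabilities. The class names and probability values are hypothetical.

```python
def cumulative_targets(labels: list[str], classes: list[str]) -> list[list[int]]:
    """Encode each ordinal label as K-1 binary targets: [label > c_0, ..., label > c_{K-2}]."""
    rank = {c: i for i, c in enumerate(classes)}  # classes listed in ascending order
    return [[1 if rank[y] > k else 0 for k in range(len(classes) - 1)] for y in labels]


def class_probabilities(p_above: list[float], classes: list[str]) -> dict[str, float]:
    """Combine K-1 estimates P(label > c_k) into K class probabilities."""
    probs = {classes[0]: 1.0 - p_above[0]}
    for i in range(1, len(classes) - 1):
        probs[classes[i]] = p_above[i - 1] - p_above[i]
    probs[classes[-1]] = p_above[-1]
    return probs


classes = ["low", "medium", "high"]  # hypothetical ordered labels
print(cumulative_targets(["low", "high", "medium"], classes))  # → [[0, 0], [1, 1], [1, 0]]
# Suppose two separately trained binary models estimate P(> low) = 0.75, P(> medium) = 0.25.
probs = class_probabilities([0.75, 0.25], classes)
print(max(probs, key=probs.get))  # → medium
```

Unlike one-hot nominal encoding, each binary target here shares information with its neighbours, so "high" is treated as further from "low" than "medium" is. If the binary models disagree (for example, P(> medium) exceeding P(> low)), a middle estimate can turn negative, so in practice one may clip and renormalize.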