 Education


Ensemble Machine Learning in Python: Random Forest, AdaBoost

@machinelearnbot

In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.


Data Warehouse Concepts, Design, and Data Integration Coursera

@machinelearnbot

About this course: This is the second course in the Data Warehousing for Business Intelligence specialization. Ideally, the courses should be taken in sequence. In this course, you will learn exciting concepts and skills for designing data warehouses and creating data integration workflows. These are fundamental skills for data warehouse developers and administrators. You will gain hands-on experience with data warehouse design and use open source products for manipulating pivot tables and creating data integration workflows. You will also gain conceptual background about maturity models, architectures, multidimensional models, and management practices, providing an organizational perspective about data warehouse development.


Advanced Data Mining projects with R Udemy

@machinelearnbot

Advanced Data Mining Projects with R takes you one step ahead in understanding the most complex data mining algorithms and implementing them in the popular R language. A follow-up to our course Data Mining Projects in R, this course will teach you how to build your own recommendation engine. You will also implement dimensionality reduction and use it to build a real-world project. Going ahead, you will be introduced to the concept of neural networks and learn how to apply them for predictions, classifications, and forecasting. Finally, you will use ggplot2, plotly and aspects of geomapping to create your own data visualization projects. By the end of this course, you will be well-versed in all the advanced data mining techniques and how to implement them using R, in any real-world scenario.


The Blueprint for Developers to Get Started with Machine Learning - The New Stack

#artificialintelligence

Many developers (including myself) have included learning machine learning in their New Year's resolutions for 2018. Even after blocking an hour every day in the calendar, I am hardly able to make progress. The key reason for this is the confusion about where to start and how to get started. It is overwhelming for an average developer to get started with machine learning. There are many tutorials, MOOCs, free resources, and blogs covering this topic. But they are only adding to the confusion by making it look complex.


Gartner: Here are 4 critical lessons we've learned from early AI projects

#artificialintelligence

While the value of artificial intelligence (AI) is just beginning to emerge in the enterprise, some 46% of CIOs have plans to implement the technology in the future, according to a survey from research firm Gartner. "Despite huge levels of interest in AI technologies, current implementations remain at quite low levels," Gartner analyst Whit Andrews wrote in a press release. "However, there is potential for strong growth as CIOs begin piloting AI programmes through a combination of buy, build and outsource efforts." Early AI projects have demonstrated the value of the technology, but have also shown some of the challenges linked to its adoption, the release said. Here are four of the main lessons that the enterprise has learned from early AI initiatives.



500+ Times Faster Than Deep Learning (A Case Study Exploring Faster Methods for Text Mining StackOverflow)

arXiv.org Machine Learning

Deep learning methods are useful for high-dimensional data and are becoming widely used in many areas of software engineering. Deep learners utilize extensive computational power and can take a long time to train, making it difficult to widely validate, repeat, and improve their results. Further, they are not the best solution in all domains. For example, recent results show that for finding related Stack Overflow posts, a tuned SVM performs similarly to a deep learner, but is significantly faster to train. This paper extends that recent result by clustering the dataset, then tuning the learners within each cluster. This approach is over 500 times faster than deep learning (and over 900 times faster if we use all the cores on a standard laptop computer). Significantly, this faster approach generates classifiers nearly as good (within 2% F1 score) as the much slower deep learning method. Hence we recommend this faster method, since it is much easier to reproduce and utilizes far fewer CPU resources. More generally, we recommend that, before researchers release research results, they compare their supposedly sophisticated methods against simpler alternatives (e.g., applying simpler learners to build local models).
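The "cluster first, then build local models" idea can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's setup: plain k-means stands in for whatever clustering the authors use, a per-class centroid table stands in for the tuned SVM, the tuning step itself is omitted, and the toy points and the "pos"/"neg" labels are invented.

```python
from collections import defaultdict
from math import dist


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans(points, init_centers, iters=10):
    """Plain Lloyd's k-means with caller-supplied initial centers."""
    centers = list(init_centers)
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean_point(groups[i]) if groups[i] else centers[i]
                   for i in range(len(centers))]
    return centers


def fit_local_models(points, labels, centers):
    """Fit one small model per cluster: a table of per-class centroids."""
    by_cluster = defaultdict(lambda: defaultdict(list))
    for p, y in zip(points, labels):
        by_cluster[nearest(p, centers)][y].append(p)
    return {c: {y: mean_point(ps) for y, ps in classes.items()}
            for c, classes in by_cluster.items()}


def predict(point, centers, models):
    """Route the point to its cluster, then ask that cluster's local model."""
    local = models[nearest(point, centers)]
    return min(local, key=lambda y: dist(point, local[y]))


# Hypothetical toy data: the label pattern flips between the two clusters,
# so a single global centroid model could not separate the classes.
points = [(0, 0), (0, 1), (2, 0), (2, 1), (10, 0), (10, 1), (12, 0), (12, 1)]
labels = ["pos", "pos", "neg", "neg", "neg", "neg", "pos", "pos"]

centers = kmeans(points, init_centers=[(0, 0), (12, 1)])
models = fit_local_models(points, labels, centers)
print(predict((0.5, 0.5), centers, models))  # left cluster, near its "pos" centroid
print(predict((9.5, 0.5), centers, models))  # right cluster, near its "neg" centroid
```

Each local model only sees the points of its own cluster, which is what keeps training cheap; the same routing structure would apply if the per-cluster learner were replaced by a tuned SVM.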


Shamap: Shape-based Manifold Learning

arXiv.org Machine Learning

Fortunately, in many cases such data are essentially low dimensional; i.e., they stay on a low-dimensional manifold embedded in the high dimensional space [1]. This key observation suggests the possibility of dimensionality reduction to facilitate visualization and analysis of the data. Manifold learning is one of the mainstream nonlinear dimensionality reduction techniques [2]. Driven by major academic and practical motives, many algorithms [3-9] were developed to flatten an embedded manifold and reveal an intrinsic structure. One of the representative methods, Isomap, combines the Floyd-Warshall algorithm with multidimensional scaling (MDS [10]) to compress high-dimensional data.
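The Floyd-Warshall half of Isomap can be sketched as follows: connect each point to its nearest neighbors, then take shortest-path lengths through that graph as approximate distances along the manifold. This is a minimal illustration under stated assumptions: the k-nearest-neighbor graph construction, the value k=2, and the five toy points are chosen here for illustration, and the final MDS embedding step is left out.

```python
from math import dist, inf


def knn_graph(points, k):
    """Symmetric weighted graph linking each point to its k nearest neighbors."""
    n = len(points)
    graph = [[inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        for j in others[:k]:
            d = dist(points[i], points[j])
            graph[i][j] = graph[j][i] = d
    return graph


def floyd_warshall(graph):
    """All-pairs shortest-path lengths; inf marks unreachable pairs."""
    n = len(graph)
    d = [row[:] for row in graph]
    for m in range(n):          # allow paths through intermediate node m
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d


# Hypothetical toy data: five points along an L-shaped curve in the plane.
points = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
geo = floyd_warshall(knn_graph(points, k=2))

print(dist(points[0], points[4]))  # straight-line distance between the two ends
print(geo[0][4])                   # distance measured along the curve
```

The graph distance between the two ends follows the bend of the curve and is therefore longer than the straight-line distance; a full Isomap would pass the matrix `geo` to MDS to obtain the low-dimensional coordinates.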


Online Learning for Non-Stationary A/B Tests

arXiv.org Machine Learning

Whether it is a minor tweak, or a major new update, releasing a new version of a running system is a stressful time. While the release has typically gone through rounds of offline testing, real world testing often uncovers additional corner cases that may manifest themselves as bugs, inefficiencies, or overall poor performance. This is especially the case in machine learning applications, where models are typically trained to maximize a proxy objective, and a model that performs better on offline metrics is not guaranteed to work well in practice. The usual approach in such scenarios is to evaluate the new system through a series of closely monitored A/B tests. The new version is usually released to a small number of customers, and, if no concerns are found and metrics look good, the portion of traffic served by the new system is slowly increased. While A/B tests provide a sense of safety in that a detrimental change will be quickly observed and corrected (or rolled back), they are not a silver bullet. First, A/B tests are labor intensive--they are typically monitored manually, with an engineer, or a technician, checking the results of the test on a regular basis (for example, daily or weekly). Second, the evaluation is usually dependent on average metrics--e.g.


Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm

arXiv.org Artificial Intelligence

Learning to learn is a powerful paradigm for enabling models to learn from data more effectively and efficiently. A popular approach to meta-learning is to train a recurrent model to read in a training dataset as input and output the parameters of a learned model, or output predictions for new test inputs. Alternatively, a more recent approach to meta-learning aims to acquire deep representations that can be effectively fine-tuned, via standard gradient descent, to new tasks. In this paper, we consider the meta-learning problem from the perspective of universality, formalizing the notion of learning algorithm approximation and comparing the expressive power of the aforementioned recurrent models to the more recent approaches that embed gradient descent into the meta-learner. In particular, we seek to answer the following question: does deep representation combined with standard gradient descent have sufficient capacity to approximate any learning algorithm? We find that this is indeed true, and further find, in our experiments, that gradient-based meta-learning consistently leads to learning strategies that generalize more widely compared to those represented by recurrent models.
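The gradient-based flavor of meta-learning described above, learning an initialization that standard gradient descent can fine-tune quickly, can be shown on a deliberately tiny problem. This is a sketch under strong assumptions, not the paper's construction: the "model" is a single scalar w, each hypothetical task has loss (w - target)^2, adaptation is one gradient step, and the step sizes alpha and beta are arbitrary illustrative choices.

```python
def loss_grad(w, target):
    """Gradient of the per-task loss (w - target)**2 with respect to w."""
    return 2.0 * (w - target)


def inner_step(w, target, alpha):
    """Task adaptation: one step of ordinary gradient descent."""
    return w - alpha * loss_grad(w, target)


def meta_train(targets, alpha=0.1, beta=0.1, steps=200, w=0.0):
    """Learn an initialization w whose one-step adaptation does well on all tasks."""
    for _ in range(steps):
        meta_grad = 0.0
        for a in targets:
            w_adapted = inner_step(w, a, alpha)
            # Chain rule through the inner step: d(w_adapted)/dw = 1 - 2*alpha.
            meta_grad += loss_grad(w_adapted, a) * (1.0 - 2.0 * alpha)
        w -= beta * meta_grad / len(targets)
    return w


# Hypothetical training tasks, each defined only by its target value.
w_meta = meta_train([1.0, 3.0, 5.0])

# Fine-tune the learned initialization on an unseen task with one gradient step.
new_target = 4.0
adapted = inner_step(w_meta, new_target, alpha=0.1)
print(w_meta, adapted)
```

The outer loop differentiates through the inner gradient step, which is what separates this approach from training a recurrent model to output parameters directly. In this toy setting the learned initialization settles midway between the training targets, and a single step moves it toward the new task.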