 Performance Analysis


Catching Bugs Without Really Trying

#artificialintelligence

Finding and fixing bugs is critical to delivering quality software. New bugs are often introduced into a system when changes are uploaded to a software repository, whether those changes add new features or correct existing bugs. Mozilla plans to take advantage of research by Ubisoft that uses artificial intelligence (AI) and machine learning (ML) to automatically catch software bugs when source code is committed to a software repository. The software is called CLEVER, for Combining Levels of Bug Prevention and Resolution techniques.


8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset

#artificialintelligence

Has this happened to you? You are working on your dataset. You create a classification model and get 90% accuracy immediately. You dive a little deeper and discover that 90% of the data belongs to one class. This is an example of an imbalanced dataset and the frustrating results it can cause.
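The 90% figure in the snippet above can be reproduced with a short sketch. It assumes a hypothetical dataset of 90 negative and 10 positive labels and a "model" that always predicts the majority class; the function name `majority_baseline_accuracy` is illustrative and not taken from the article.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Hypothetical imbalanced dataset: 90 examples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10

baseline = majority_baseline_accuracy(labels)
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

Any classifier on such data should be compared against this baseline, since a model that never detects the minority class still reaches the same accuracy.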


What is Predictive Model Performance Evaluation - DIMENSIONLESS TECHNOLOGIES PVT.LTD.

#artificialintelligence

Evaluation metrics are tied to the machine learning task at hand. Classification, regression, ranking, clustering, topic modelling, and other tasks all have different metrics. Some metrics, such as precision and recall, are useful for multiple tasks. Classification, regression, and ranking are examples of supervised learning, which makes up the majority of machine learning applications. In this blog, we'll be focusing on the metrics for supervised learning models.
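As a minimal sketch of two of the metrics named above, precision and recall for a binary classifier can be computed directly from true and predicted labels. The helper `precision_recall` and the sample labels are assumptions made for illustration, not code from the blog.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 3 actual positives, 3 predicted positives, 2 of them correct.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision answers "of the items flagged positive, how many were right", while recall answers "of the actual positives, how many were found".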


Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

arXiv.org Machine Learning

Continuous integration is an indispensable step of modern software engineering practices to systematically manage the life cycles of system development. Developing a machine learning model is no different - it is an engineering process with a life cycle, including design, implementation, tuning, testing, and deployment. However, most, if not all, existing continuous integration engines do not support machine learning as first-class citizens. In this paper, we present ease.ml/ci, to the best of our knowledge the first continuous integration system for machine learning. The challenge of building ease.ml/ci is to provide rigorous guarantees, e.g., single accuracy point error tolerance with 0.999 reliability, with a practical amount of labeling effort, e.g., 2K labels per test. We design a domain specific language that allows users to specify integration conditions with reliability constraints, and develop simple novel optimizations that can lower the number of labels required by up to two orders of magnitude for test conditions popularly used in real production systems.
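To give a feel for why labeling effort is the bottleneck, the sketch below computes a textbook baseline: the number of labels a two-sided Hoeffding bound requires to estimate accuracy within a tolerance. This is not the estimator or the optimizations used by ease.ml/ci. Reading "single accuracy point error tolerance" as epsilon = 0.01 and "0.999 reliability" as delta = 0.001 is an assumption, and `hoeffding_labels` is a hypothetical helper name.

```python
import math


def hoeffding_labels(epsilon, delta):
    """Labels needed so the empirical accuracy is within epsilon of the
    true accuracy with probability at least 1 - delta, using the
    two-sided Hoeffding bound n >= ln(2 / delta) / (2 * epsilon^2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Assumed reading of the abstract's example: 1 accuracy point, 0.999 reliability.
n = hoeffding_labels(epsilon=0.01, delta=0.001)
print(f"labels required by the naive bound: {n}")
```

Under these assumptions the naive bound asks for tens of thousands of labels for a single test, which puts the 2K labels per test quoted in the abstract in context.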


On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

arXiv.org Machine Learning

We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform. Since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
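The two-step process described above can be sketched on a single machine as follows. This is an illustrative simplification, not the paper's distributed algorithm: the function names, the choice of 100 quantiles, the linear interpolation between quantiles, and the sample data are all assumptions.

```python
import bisect


def empirical_cdf(train_values, q=100):
    """Step 1: approximate the CDF from the q-quantiles of the training
    values, interpolating linearly between neighbouring quantiles."""
    sorted_vals = sorted(train_values)
    n = len(sorted_vals)
    # Quantile cut points at levels 0, 1/q, ..., 1.
    cuts = [sorted_vals[(k * (n - 1)) // q] for k in range(q + 1)]

    def cdf(x):
        if x <= cuts[0]:
            return 0.0
        if x >= cuts[-1]:
            return 1.0
        i = bisect.bisect_right(cuts, x) - 1  # cuts[i] <= x < cuts[i + 1]
        frac = (x - cuts[i]) / (cuts[i + 1] - cuts[i])
        return (i + frac) / q

    return cdf


def triangular_memberships(u, num_sets=5):
    """Step 2: membership degrees of u in [0, 1] for evenly spaced
    triangular fuzzy sets. Adjacent triangles cross at 0.5, so the
    degrees sum to 1 (a Ruspini strong partition)."""
    step = 1.0 / (num_sets - 1)
    return [max(0.0, 1.0 - abs(u - k * step) / step) for k in range(num_sets)]


# Hypothetical skewed attribute: squares of 1..1000.
train = [x ** 2 for x in range(1, 1001)]
cdf = empirical_cdf(train)

u = cdf(250_000)  # transformed value in [0, 1]
degrees = triangular_memberships(u)
print(u, degrees)
```

Because the partition is built in the transformed space, the fuzzy sets are evenly spaced there but follow the data density once mapped back through the quantile function.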


Machine Learning Series Day 3 (Naive Bayes) - Becoming Human: Artificial Intelligence Magazine

#artificialintelligence

Intuitively, the idea behind Naive Bayes is how you probably approach life. As in all my articles, I believe a simple, intuitive understanding of a model should come first, before diving into the mathematics and practical jargon. Let's say you're responsible for Thanksgiving dinner. You have cooked Thanksgiving dinner for the last ten years. Within those ten years, you have prepared three desserts: pumpkin pie, chocolate cheesecake, and white macadamia cookies.


Cross validation in sparse linear regression with piecewise continuous nonconvex penalties and its acceleration

arXiv.org Machine Learning

We investigate the signal reconstruction performance of sparse linear regression in the presence of noise when piecewise continuous nonconvex penalties are used. Among such penalties, we focus on the smoothly clipped absolute deviation (SCAD) penalty. The contributions of this study are three-fold. First, we present a theoretical analysis of typical reconstruction performance, using the replica method, under the assumption that each component of the design matrix is given as an independent and identically distributed (i.i.d.) Gaussian variable. This clarifies the superiority of the SCAD estimator compared with $\ell_1$ in a wide parameter range, although the nonconvex nature of the penalty tends to lead to solution multiplicity in certain regions. This multiplicity is shown to be connected to replica symmetry breaking in spin-glass theory, and associated phase diagrams are given. We also show that the global minimum of the mean square error between the estimator and the true signal is located in the replica symmetric phase. Second, we develop an approximate formula that efficiently computes the cross-validation error without actually conducting cross-validation, and which is also applicable to non-i.i.d. design matrices. It is shown that this formula is only applicable to the unique solution region and tends to be unstable in the multiple solution region. We implement instability detection procedures, which allow the approximate formula to stand alone and consequently enable us to draw phase diagrams for any specific dataset. Third, we propose an annealing procedure, called nonconvexity annealing, to obtain the solution path efficiently. Numerical simulations are conducted on simulated datasets to verify the consistency of the theoretical results and the efficiency of the approximate formula and nonconvexity annealing.


AI, Live Video And Your Smartphone Camera

#artificialintelligence

Badri is the Senior Vice President, Technology at Vonage - Video Engineering. As I speak with business leaders from around the world, I'm continually surprised by two important realities that seem to go unnoticed and that are poised to transform the way companies engage with their customers. First, while artificial intelligence (AI) remains a buzzword, many people are still unaware of how advanced algorithms have become. We're not talking about a collaborative filtering algorithm that predicts which Netflix shows you'll want to watch next. Today's algorithms are able to mimic human decision-making on tasks as complex as composing music and predicting what topics are of interest to your Congressional representatives.


Machine Learning Model for Early Sepsis Risk Stratification - Infectious Disease Advisor

#artificialintelligence

A new sepsis screening tool developed using machine learning was timelier and more discriminating than several benchmark screening tools, according to data published in the Annals of Emergency Medicine. The new tool, the Risk of Sepsis (RoS) score, was compared with benchmark sepsis-screening tools such as the systemic inflammatory response syndrome, sequential organ failure assessment, quick sequential organ failure assessment, modified early warning score, and national early warning score. Investigators used retrospective electronic health record data from adult patients from 49 urban community hospital emergency departments over a 22-month period to derive and test the model. A total of 2,759,529 records were obtained using the Rhee et al. [1] standard for clinical surveillance criteria as the definition of sepsis and the primary target for developing the model. The selection process consisted of 3 stages: (1) review of existing models for sepsis screening, (2) consultation with local subject matter experts, and (3) a supervised machine learning method called gradient boosting.