AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Machine Learning, Mar-1-2019

Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

Renggli, Cedric, Karlaš, Bojan, Ding, Bolin, Liu, Feng, Schawinski, Kevin, Wu, Wentao, Zhang, Ce

Continuous integration is an indispensable step of modern software engineering practices to systematically manage the life cycles of system development. Developing a machine learning model is no different: it is an engineering process with a life cycle, including design, implementation, tuning, testing, and deployment. However, most, if not all, existing continuous integration engines do not support machine learning as a first-class citizen. In this paper, we present ease.ml/ci, to the best of our knowledge the first continuous integration system for machine learning. The challenge of building ease.ml/ci is to provide rigorous guarantees, e.g., single accuracy point error tolerance with 0.999 reliability, with a practical amount of labeling effort, e.g., 2K labels per test. We design a domain-specific language that allows users to specify integration conditions with reliability constraints, and develop simple novel optimizations that can lower the number of labels required by up to two orders of magnitude for test conditions popularly used in real production systems.
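The trade-off between error tolerance, reliability, and labeling effort mentioned in this abstract can be illustrated with a textbook bound. The sketch below is not the ease.ml/ci estimator; it assumes i.i.d. test examples and a single, non-adaptive test, and applies the two-sided Hoeffding inequality. The function name `hoeffding_labels` is hypothetical.

```python
import math


def hoeffding_labels(epsilon: float, delta: float) -> int:
    """Number of labeled test examples n such that the measured accuracy
    is within `epsilon` of the true accuracy with probability >= 1 - delta.

    Derived from the two-sided Hoeffding bound for values in [0, 1]:
        P(|acc_hat - acc| > epsilon) <= 2 * exp(-2 * n * epsilon**2)
    Solving 2 * exp(-2 * n * epsilon**2) <= delta for n gives
        n >= ln(2 / delta) / (2 * epsilon**2)
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # One accuracy point of tolerance (0.01) at 0.999 reliability (delta = 0.001).
    print(hoeffding_labels(epsilon=0.01, delta=0.001))
```

Under these assumptions the baseline asks for roughly 38K labels for a one-point tolerance at 0.999 reliability, which gives a sense of why the abstract stresses optimizations that bring the labeling effort down to a practical level.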

artificial intelligence, machine learning, testset, (17 more...)

1903.00278

Country: Europe (0.14)

Genre:

Workflow (0.46)
Research Report (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Elkano, Mikel, Uriz, Mikel, Bustince, Humberto, Galar, Mikel

On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

arXiv.org Machine Learning, Feb-28-2019

We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied: (1) the original distribution is transformed into a standard uniform distribution by means of the probability integral transform (since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set); (2) a Ruspini strong fuzzy partition is constructed in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite this transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
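The two steps described in this abstract can be sketched for a single attribute as follows. This is a minimal single-machine illustration, not the authors' distributed algorithm: the function names (`quantile_points`, `pit`, `inverse_pit`, `memberships`) are hypothetical, and linear interpolation between quantile points is an assumption made here for simplicity.

```python
from bisect import bisect_right


def quantile_points(train_values, q):
    """Return q + 1 cut points approximating the q-quantiles of the data."""
    s = sorted(train_values)
    n = len(s)
    return [s[round(k * (n - 1) / q)] for k in range(q + 1)]


def pit(x, pts):
    """Approximate CDF value of x (probability integral transform) by
    linear interpolation between neighbouring quantile points."""
    q = len(pts) - 1
    if x <= pts[0]:
        return 0.0
    if x >= pts[-1]:
        return 1.0
    i = bisect_right(pts, x) - 1          # pts[i] <= x < pts[i + 1]
    lo, hi = pts[i], pts[i + 1]
    return (i + (x - lo) / (hi - lo)) / q


def inverse_pit(u, pts):
    """Approximate quantile function: map u in [0, 1] back to the original space."""
    q = len(pts) - 1
    pos = u * q
    i = min(int(pos), q - 1)
    return pts[i] + (pos - i) * (pts[i + 1] - pts[i])


def memberships(u, k):
    """Membership degrees of u in [0, 1] for k equally spaced triangular
    fuzzy sets (k >= 2). Neighbouring triangles cross at 0.5 and the degrees
    sum to 1, which is the Ruspini strong-partition property."""
    step = 1.0 / (k - 1)
    return [max(0.0, 1.0 - abs(u - j * step) / step) for j in range(k)]


if __name__ == "__main__":
    pts = quantile_points(range(101), q=4)   # cut points 0, 25, 50, 75, 100
    u = pit(12.5, pts)                       # position in the uniform space
    print(u, memberships(u, k=3))
    print(inverse_pit(0.5, pts))             # centre of the middle fuzzy set, original units
```

Because the partition is defined once in the uniform space, every attribute shares the same triangular sets; `inverse_pit` shows how a set's vertices could be mapped back to the attribute's own scale.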

artificial intelligence, data mining, machine learning, (19 more...)

doi: 10.1109/BigDataCongress.2018.00011

1903.00345

Country:

Europe > Spain > Navarre > Pamplona (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Secaucus (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.84)

#artificialintelligence, Feb-27-2019, 14:50:59 GMT

Machine Learning Series Day 3 (Naive Bayes) – Becoming Human: Artificial Intelligence Magazine

Intuitively, the idea behind Naive Bayes mirrors how you probably approach life. As in all my articles, I believe that a simple and intuitive understanding of a model should come first, before diving into the mathematics and practical jargon. Let's say you're responsible for Thanksgiving dinner. You have cooked Thanksgiving dinner for the last ten years. Within those ten years, you have prepared three desserts: pumpkin pie, chocolate cheesecake, and white macadamia cookies.

artificial intelligence magazine, machine learning series day 3, naive bayes, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.65)

Obuchi, Tomoyuki, Sakata, Ayaka

Cross validation in sparse linear regression with piecewise continuous nonconvex penalties and its acceleration

arXiv.org Machine Learning, Feb-27-2019

We investigate the signal reconstruction performance of sparse linear regression in the presence of noise when piecewise continuous nonconvex penalties are used. Among such penalties, we focus on the smoothly clipped absolute deviation (SCAD) penalty. The contributions of this study are three-fold. First, we present a theoretical analysis of the typical reconstruction performance, using the replica method, under the assumption that each component of the design matrix is given as an independent and identically distributed (i.i.d.) Gaussian variable. This clarifies the superiority of the SCAD estimator compared with $\ell_1$ in a wide parameter range, although the nonconvex nature of the penalty tends to lead to solution multiplicity in certain regions. This multiplicity is shown to be connected to replica symmetry breaking in spin-glass theory, and associated phase diagrams are given. We also show that the global minimum of the mean square error between the estimator and the true signal is located in the replica symmetric phase. Second, we develop an approximate formula that efficiently computes the cross-validation error without actually conducting cross-validation, and which is also applicable to non-i.i.d. design matrices. It is shown that this formula is only applicable to the unique solution region and tends to be unstable in the multiple solution region. We implement instability detection procedures, which allow the approximate formula to stand alone and consequently enable us to draw phase diagrams for any specific dataset. Third, we propose an annealing procedure, called nonconvexity annealing, to obtain the solution path efficiently. Numerical simulations on simulated datasets verify the consistency of the theoretical results and the efficiency of the approximate formula and nonconvexity annealing.
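For readers unfamiliar with the SCAD penalty named in this abstract, the sketch below evaluates it for a single coefficient in the standard Fan and Li parameterization with parameters `lam` > 0 and `a` > 2. This is an assumption: the abstract does not state the parameterization, and the paper's may differ. The function name `scad_penalty` is hypothetical.

```python
def scad_penalty(x: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty for one coefficient x (Fan and Li form, assumed here).

    - |x| <= lam:           lam * |x|                      (same as the l1 penalty)
    - lam < |x| <= a * lam: quadratic piece that flattens the slope
    - |x| > a * lam:        constant lam**2 * (a + 1) / 2  (large x is not penalised further)
    """
    if lam <= 0.0 or a <= 2.0:
        raise ValueError("requires lam > 0 and a > 2")
    t = abs(x)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 10.0):
        print(x, scad_penalty(x, lam=1.0))
```

The three pieces join continuously at |x| = lam and |x| = a * lam. The constant tail is what makes the penalty nonconvex, in contrast to $\ell_1$, which keeps growing linearly.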

artificial intelligence, machine learning, sparse linear regression, (16 more...)

1902.10375

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)

#artificialintelligence, Feb-26-2019, 02:40:04 GMT

AI, Live Video And Your Smartphone Camera

Badri is the Senior Vice President, Technology at Vonage - Video Engineering. As I speak with business leaders from around the world, I'm continually surprised by two important realities that seem to go unnoticed and that are poised to transform the way companies engage with their customers. First, while artificial intelligence (AI) remains a buzzword, many people are still unaware of how advanced algorithms have become. We're not talking about a collaborative filtering algorithm that predicts which Netflix shows you'll want to watch next. Today's algorithms are able to mimic human decision-making on tasks as complex as composing music and predicting what topics are of interest to your Congressional representatives.

algorithm, artificial intelligence, machine learning, (15 more...)

Industry:

Media (0.77)
Leisure & Entertainment (0.56)
Information Technology (0.51)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

#artificialintelligence, Feb-26-2019, 00:57:08 GMT

Machine Learning Model for Early Sepsis Risk Stratification - Infectious Disease Advisor

A new sepsis screening tool developed using machine learning was timelier and more discriminating than several benchmark screening tools, according to data published in the Annals of Emergency Medicine. The new tool, the Risk of Sepsis (RoS) score, was compared with benchmark sepsis-screening tools such as the systemic inflammatory response syndrome, sequential organ failure assessment, quick sequential organ failure assessment, modified early warning score, and national early warning score. Investigators used retrospective electronic health record data from adult patients from 49 urban community hospital emergency departments over a 22-month period to derive and test the model. A total of 2,759,529 records were obtained using the Rhee et al. (ref. 1) standard for clinical surveillance criteria as the definition of sepsis and the primary target for developing the model. The selection process consisted of 3 stages: (1) review of existing models for sepsis screening, (2) consultation with local subject matter experts, and (3) a supervised machine learning method called gradient boosting.

early sepsis risk stratification, screening tool, sequential organ failure assessment, (8 more...)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.80)

arXiv.org Machine Learning, Feb-26-2019

Saec: Similarity-Aware Embedding Compression in Recommendation Systems

Wu, Xiaorui, Xu, Hong, Zhang, Honglin, Chen, Huaming, Wang, Jian

Production recommendation systems rely on embedding methods to represent various features. An impeding challenge in practice is that the large embedding matrix incurs a substantial memory footprint in serving as the number of features grows over time. We propose a similarity-aware embedding matrix compression method called Saec to address this challenge. Saec clusters similar features within a field to reduce the embedding matrix size. Saec also adopts a fast clustering optimization based on feature frequency to drastically improve clustering time. We implement and evaluate Saec on Numerous, the production distributed machine learning system at Tencent, with 10 days' worth of feature data from the QQ mobile browser. Testbed experiments show that Saec reduces the number of embedding vectors by two orders of magnitude, compresses the embedding size by ~27x, and delivers the same AUC and log loss performance.

artificial intelligence, machine learning, vector, (17 more...)

1903.00103

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Zachariah, Dave, Stoica, Petre

Effect Inference from Two-Group Data with Sampling Bias

arXiv.org Machine Learning, Feb-26-2019

In many applications, different populations are compared using data that are sampled in a biased manner. Under sampling biases, standard methods that estimate the difference between the population means yield unreliable inferences. Here we develop an inference method that is resilient to sampling biases and is able to control the false positive errors under moderate bias levels in contrast to the standard approach. We demonstrate the method using synthetic and real biomarker data.

artificial intelligence, estimator, machine learning, (16 more...)

1902.09923

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.49)
Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.37)

Kate, Rohit J., Pearce, Noah, Mazumdar, Debesh, Nilakantan, Vani

Continual Prediction from EHR Data for Inpatient Acute Kidney Injury

arXiv.org Machine Learning, Feb-26-2019

Acute kidney injury (AKI) commonly occurs in hospitalized patients and can lead to serious medical complications. In order to optimally predict AKI before it develops at any time during a hospital stay, we present a novel framework in which AKI is continually predicted automatically from EHR data over the entire hospital stay instead of at only one particular time. The continual model predicts AKI every time a patient's AKI-relevant variable changes in the EHR. Thus the model is not only independent of a particular time for making predictions, but it can also leverage the latest values of all the AKI-relevant patient variables for making predictions. Using data from 44,691 hospital stays of duration longer than 24 hours, we evaluated our continual prediction model and compared it with the traditional one-time prediction models. Excluding hospital stays in which AKI occurred within 24 hours from admission, the one-time prediction model predicting at 24 hours from admission obtained an area under the ROC curve (AUC) of 0.653, while the continual prediction model obtained an AUC of 0.724. The one-time prediction model that predicts at 24 hours obviously cannot predict AKI incidences that occur within 24 hours of admission, which, when included in the evaluation, reduced its AUC to 0.57. In comparison, the continual prediction model had an AUC of 0.709. The continual prediction model also did better than all other one-time prediction models predicting at other fixed times. By being able to take into account the latest values of AKI-relevant patient variables and by not being limited to a particular time of prediction, the continual prediction model outperformed one-time prediction models in predicting AKI.
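The AUC figures quoted in this abstract measure how often a model ranks a positive case above a negative one. The sketch below computes that quantity directly from its pairwise definition. It is a generic illustration, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via its pairwise (Mann-Whitney) definition:
    the fraction of (positive, negative) pairs in which the positive example
    receives the higher score, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = AKI occurred during the stay, 0 = it did not.
    labels = [0, 0, 1, 1]
    scores = [0.10, 0.40, 0.35, 0.80]
    print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The quadratic loop is fine for small examples; sorting-based implementations are preferable for large datasets.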

artificial intelligence, data mining, machine learning, (17 more...)

1902.10228

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)