AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligence, Oct-4-2019, 08:00:10 GMT

Choosing a Machine Learning Model

The number of shiny models out there can be overwhelming, so people often fall back on the few they trust most and use them on every new problem. This can lead to sub-optimal results. Today we're going to learn how to quickly and efficiently narrow down the space of available models to find those that are most likely to perform best on your problem type. We'll also see how we can keep track of our models' performance using Weights and Biases and compare them. You can find the accompanying code here.

dataset, kaggle competition, machine learning model, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.49)

Yan, Hao, Paynabar, Kamran, Shi, Jianjun

AKM$^2$D : An Adaptive Framework for Online Sensing and Anomaly Quantification

arXiv.org Machine Learning, Oct-4-2019

In point-based sensing systems such as coordinate measuring machines (CMM) and laser ultrasonics, where complete sensing is impractical due to the high sensing time and cost, adaptive sensing through systematic exploration is vital for online inspection and anomaly quantification. Most existing sequential sampling methodologies focus on reducing the overall fitting error for the entire sampling space. However, in many anomaly quantification applications, the main goal is to accurately estimate sparse anomalous regions at the pixel level. In this paper, we develop a novel framework named Adaptive Kernelized Maximum-Minimum Distance (AKM$^2$D) to speed up the inspection and anomaly detection process through an intelligent sequential sampling scheme integrated with fast estimation and detection. The proposed method balances the sampling efforts between space-filling sampling (exploration) and focused sampling near the anomalous region (exploitation). The proposed methodology is validated by conducting simulations and a case study of anomaly detection in composite sheets using a guided wave test.
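The space-filling (exploration) half of such a scheme can be illustrated with a plain maximum-minimum distance rule: pick the candidate location whose nearest already-sampled point is farthest away. The sketch below is only that classical rule, not the kernelized or adaptive method of the paper; the function name `maximin_next_point` and the sample coordinates are hypothetical.

```python
import math

def maximin_next_point(sampled, candidates):
    """Return the candidate whose nearest already-sampled point is farthest away."""
    def distance_to_nearest_sample(candidate):
        return min(math.dist(candidate, s) for s in sampled)
    return max(candidates, key=distance_to_nearest_sample)

# Hypothetical 2-D sensing locations (illustrative values only).
sampled = [(0.0, 0.0), (1.0, 0.0)]
candidates = [(0.5, 0.0), (0.5, 1.0), (0.9, 0.1)]

next_point = maximin_next_point(sampled, candidates)
print(next_point)
```

An exploitation step would instead bias the choice toward locations near a suspected anomaly; how the two are balanced is specific to the paper and is not reproduced here.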

anomalous region, optimization problem, upstream oil & gas, (19 more...)

1910.02119

Country: North America > United States > Arizona (0.14)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Kempfert, Katherine C., Wang, Yishi, Chen, Cuixian, Wong, Samuel W. K.

A Comparison Study on Nonlinear Dimension Reduction Methods with Kernel Variations: Visualization, Optimization and Classification

arXiv.org Machine Learning, Oct-4-2019

Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are often employed when applying machine learning algorithms. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and their kernel variants (KPCA, KLDA) are among the most popular DR methods. Recently, Supervised Kernel Principal Component Analysis (SKPCA) has been shown to be another successful alternative. In this paper, brief reviews of these popular techniques are presented first. We then conduct a comparative performance study based on three simulated datasets, after which the performance of the techniques is evaluated through application to a pattern recognition problem in face image analysis. The gender classification problem is considered on MORPH-II and FG-NET, two popular longitudinal face aging databases. Several feature extraction methods are used, including biologically-inspired features (BIF), local binary patterns (LBP), histogram of oriented gradients (HOG), and the Active Appearance Model (AAM). After applications of DR methods, a linear support vector machine (SVM) is deployed with gender classification accuracy rates exceeding 95% on MORPH-II, competitive with benchmark results. A parallel computational approach is also proposed, attaining faster processing speeds and similar recognition rates on MORPH-II. Our computational approach can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction.

classification, morph-ii, skpca, (16 more...)

1910.02114

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > North Carolina > New Hanover County > Wilmington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Overview (0.87)
Research Report > New Finding (0.67)

Industry:

Health & Medicine (0.68)
Education (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.87)

Khosravi, Pasha, Choi, YooJung, Liang, Yitao, Vergari, Antonio, Broeck, Guy Van den

On Tractable Computation of Expected Predictions

Computing expected predictions has many interesting applications in areas such as fairness, handling missing values, and data analysis. Unfortunately, computing expectations of a discriminative model with respect to a probability distribution defined by an arbitrary generative model has been proven to be hard in general. In fact, the task is intractable even for simple models such as logistic regression and a naive Bayes distribution. In this paper, we identify a pair of generative and discriminative models that enables tractable computation of expectations of the latter with respect to the former, as well as moments of any order, in the case of regression. Specifically, we consider expressive probabilistic circuits with certain structural constraints that support tractable probabilistic inference. Moreover, we exploit the tractable computation of high-order moments to derive an algorithm to approximate the expectations, for classification scenarios in which exact computations are intractable. We evaluate the effectiveness of our exact and approximate algorithms in handling missing data at prediction time, where they prove to be competitive with standard imputation techniques on a variety of datasets. Finally, we illustrate how the expected prediction framework can be used to reason about the behaviour of discriminative models.
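To make the notion of an expected prediction concrete, here is a brute-force sketch: a logistic regression output is averaged over every completion of the missing binary features, weighted by an assumed fully factorized feature distribution. This is not the paper's circuit-based algorithm; the function names, weights, and marginals are hypothetical, and the enumeration costs 2^k evaluations for k missing features, which is the kind of blow-up tractable methods aim to avoid.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def expected_prediction(weights, bias, observed, marginals):
    """Average a logistic regression output over all completions of missing binary features.

    observed:  dict feature index -> 0/1 for known features
    marginals: dict feature index -> P(x_i = 1) for missing features,
               assumed independent of each other (an illustrative assumption)
    """
    missing = sorted(marginals)
    total = 0.0
    for values in itertools.product([0, 1], repeat=len(missing)):
        x = dict(observed)
        prob = 1.0
        for i, v in zip(missing, values):
            x[i] = v
            prob *= marginals[i] if v == 1 else 1.0 - marginals[i]
        z = bias + sum(w * x[i] for i, w in enumerate(weights))
        total += prob * sigmoid(z)
    return total

# Hypothetical model and input: feature 0 is observed, features 1 and 2 are missing.
weights = [2.0, -1.0, 0.5]
bias = -0.5
expected = expected_prediction(weights, bias, {0: 1}, {1: 0.3, 2: 0.8})

# Mean imputation plugs the marginals in directly; in general it gives a different number.
imputed = sigmoid(bias + 2.0 * 1 + -1.0 * 0.3 + 0.5 * 0.8)
print(expected, imputed)
```

The gap between `expected` and `imputed` comes from the nonlinearity of the sigmoid: the expectation of the output is not the output at the expected input.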

computation, prediction, vtree, (16 more...)

1910.02182

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Oceania > Australia > Tasmania (0.04)
North America > Canada (0.04)
Indian Ocean > Bass Strait (0.04)

Genre:

Research Report > New Finding (0.35)
Research Report > Experimental Study (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)
(2 more...)

Liu, Dianbo, Miller, Timothy A, Mandl, Kenneth D.

Confederated Machine Learning on Horizontally and Vertically Separated Medical Data for Large-Scale Health System Intelligence

Access to a large amount of high quality data is possibly the most important factor for success in advancing medicine with machine learning and data science. However, valuable healthcare data are usually distributed across isolated silos, and there are complex operational and regulatory concerns. Data on patient populations are often horizontally separated from each other across different practices and health systems. In addition, individual patient data are often vertically separated, by data type, across the patient's sites of care, service, and testing. We train a confederated learning model to stratify elderly patients by their risk of a fall in the next two years, using diagnoses, medication claims data and clinical lab test records of patients.

data type, learning, separation, (14 more...)

1910.02109

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > West Virginia (0.04)
(29 more...)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Nachum, Ofir, Jiang, Heinrich

Group-based Fair Learning Leads to Counter-intuitive Predictions

A number of machine learning (ML) methods have been proposed recently to maximize model predictive accuracy while enforcing notions of group parity or fairness across sub-populations. We propose a desirable property for these procedures, slack-consistency: For any individual, the predictions of the model should be monotonic with respect to allowed slack (i.e., maximum allowed group-parity violation). Such monotonicity can be useful for individuals to understand the impact of enforcing fairness on their predictions. Surprisingly, we find that standard ML methods for enforcing fairness violate this basic property. Moreover, this undesirable behavior arises in situations agnostic to the complexity of the underlying model or approximate optimizations, suggesting that the simple act of incorporating a constraint can lead to drastically unintended behavior in ML. We present a simple theoretical method for enforcing slack-consistency, while encouraging further discussions on the unintended behaviors potentially induced when enforcing group-based parity.
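The slack-consistency property lends itself to a simple empirical check: train the constrained model at several slack values, then verify that each individual's prediction moves in one direction only as slack grows. The sketch below assumes the predictions are already available; the helper names and the numbers are hypothetical, and it only tests the property rather than enforcing it.

```python
def is_monotonic(values, tol=1e-12):
    """True if the sequence is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    non_decreasing = all(b >= a - tol for a, b in pairs)
    non_increasing = all(b <= a + tol for a, b in pairs)
    return non_decreasing or non_increasing

def slack_consistent(predictions_by_slack):
    """predictions_by_slack maps a slack value to a list of per-individual predictions."""
    slacks = sorted(predictions_by_slack)
    n_individuals = len(predictions_by_slack[slacks[0]])
    return all(
        is_monotonic([predictions_by_slack[s][i] for s in slacks])
        for i in range(n_individuals)
    )

# Hypothetical scores for two individuals at three slack levels.
consistent = {0.0: [0.40, 0.70], 0.1: [0.45, 0.65], 0.2: [0.50, 0.60]}
inconsistent = {0.0: [0.40, 0.70], 0.1: [0.55, 0.65], 0.2: [0.50, 0.60]}

print(slack_consistent(consistent), slack_consistent(inconsistent))
```

In the second example the first individual's score rises and then falls as slack increases, which is the counter-intuitive behavior the abstract describes.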

constraint, prediction, threshold, (10 more...)

1910.02097

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Darabi, Sajad, Kachuee, Mohammad, Sarrafzadeh, Majid

Unsupervised Representation for EHR Signals and Codes as Patient Status Vector

Effective modeling of electronic health records presents many challenges, as they contain a large amount of irregularity, most of which is due to the varying procedures and diagnoses a patient may have. Despite the recent progress in machine learning, unsupervised learning remains largely open, especially in the healthcare domain. In this work, we present a two-step unsupervised representation learning scheme to summarize the multi-modal clinical time series consisting of signals and medical codes into a patient status vector. First, an auto-encoder step is used to reduce sparse medical codes and clinical time series into a distributed representation. Subsequently, the concatenation of the distributed representations is further fine-tuned using a forecasting task. We evaluate the usefulness of the representation on two downstream tasks: mortality and readmission. Our proposed method shows improved generalization performance for both short duration ICU visits and long duration ICU visits.

arxiv preprint arxiv, downstream task, representation, (11 more...)

1910.01803

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

#artificialintelligence, Oct-3-2019, 23:33:45 GMT

Predictive Analytics using Machine Learning

Below you will read in the training and test data which are already split for you to load separately. Then use unnest() from tidytext to create the tidy version with one word per record. Now that you have train and test data loaded and tidied, you can see how many songs exist per artist/author. Since the dataset has songs and book pages, I'll refer to them each as a document. The features that you will create are based on documents and their associated metadata, so it's important to understand this concept.
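The tutorial itself works in R with tidytext. As a language-neutral illustration of the "one word per record" idea, here is a minimal Python sketch that tokenizes each document into word-level rows and then counts documents per artist or author. The function names, field names, and sample documents are all hypothetical and do not come from the tutorial's dataset.

```python
import re
from collections import Counter

def one_word_per_record(documents):
    """Expand each document into one row per word, keeping its metadata."""
    rows = []
    for doc in documents:
        for word in re.findall(r"[a-z']+", doc["text"].lower()):
            rows.append({"doc_id": doc["doc_id"], "author": doc["author"], "word": word})
    return rows

def documents_per_author(rows):
    """Count distinct documents for each author in the tidy rows."""
    seen = {(row["author"], row["doc_id"]) for row in rows}
    return Counter(author for author, _ in seen)

# Hypothetical documents: two songs and one book page.
documents = [
    {"doc_id": 1, "author": "artist_a", "text": "Hello, hello again"},
    {"doc_id": 2, "author": "artist_a", "text": "Second song"},
    {"doc_id": 3, "author": "author_b", "text": "A book page"},
]

tidy_rows = one_word_per_record(documents)
doc_counts = documents_per_author(tidy_rows)
print(len(tidy_rows), dict(doc_counts))
```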

algorithm, dataset, tutorial, (16 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

#artificialintelligence, Oct-3-2019, 21:48:07 GMT

11 Important Model Evaluation Error Metrics Everyone should know

This article was originally published in February 2016 and updated in August 2019. The idea of building machine learning models works on a constructive feedback principle. You build a model, get feedback from metrics, make improvements and continue until you achieve a desirable accuracy. Evaluation metrics explain the performance of a model. An important aspect of evaluation metrics is their capability to discriminate among model results. I have seen plenty of analysts and aspiring data scientists not even bothering to check how robust their model is. Once they are finished building a model, they hurriedly map predicted values onto unseen data. This is an incorrect approach. Simply building a predictive model is not the goal. The goal is to create and select a model that gives high accuracy on out-of-sample data.
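As a small illustration of getting feedback from metrics on held-out data, the sketch below computes a confusion matrix and a few common classification metrics from true and predicted labels. It covers only a handful of metrics, not the article's full list; the function names and the label vectors are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical held-out labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

scores = classification_metrics(y_true, y_pred)
print(scores)
```

The labels passed in should come from data the model did not see during training; the same functions applied to training data would overstate performance.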

cross validation, decile, validation, (16 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)