AITopics

1912.04738

Country:

North America > United States > New York (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > California (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Whitaker, Tom, Beranger, Boris, Sisson, Scott A.

Logistic regression models for aggregated data

arXiv.org Machine Learning Dec-8-2019

Logistic regression models are a popular and effective method for predicting the probability of categorical responses. However, inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from symbolic data analysis to summarise the collection of predictor variables into histogram form, and perform inference on this summary dataset. We develop ideas based on composite likelihoods to derive an efficient one-versus-rest approximate composite likelihood model for histogram-based random variables, constructed from low-dimensional marginal histograms obtained from the full histogram. We demonstrate that this procedure can achieve classification rates comparable to those of the standard full-data multinomial analysis and of state-of-the-art subsampling algorithms for logistic regression, but at a substantially lower computational cost. Performance is explored through simulated examples and analyses of large supersymmetry and satellite crop classification datasets.
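The core idea of fitting on a summary instead of the raw data can be illustrated in its simplest form. The sketch below is a deliberately simplified stand-in, not the authors' one-versus-rest composite-likelihood model. It assumes a single predictor, bins it into an equal-width histogram, and fits an ordinary binomial logistic regression on the bin centres, weighted by per-bin class counts. The helper names (`bin_index`, `aggregate`, `fit_binned_logistic`), the bin range, and the learning-rate settings are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a scalar to one of n_bins equal-width bins on [lo, hi] (clamped)."""
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def aggregate(xs, ys, lo, hi, n_bins):
    """Summarise (x, y) pairs as (bin centre, count of y=1, count of y=0)."""
    counts = Counter()
    for x, y in zip(xs, ys):
        counts[(bin_index(x, lo, hi, n_bins), y)] += 1
    width = (hi - lo) / n_bins
    summary = []
    for b in range(n_bins):
        pos, neg = counts[(b, 1)], counts[(b, 0)]
        if pos + neg > 0:  # drop empty bins
            summary.append((lo + (b + 0.5) * width, pos, neg))
    return summary


def fit_binned_logistic(summary, lr=0.5, epochs=3000):
    """Gradient descent on the count-weighted log-loss; returns (intercept, slope).

    Each epoch loops over non-empty bins, not over raw observations.
    """
    a, b = 0.0, 0.0
    total = sum(pos + neg for _, pos, neg in summary)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for centre, pos, neg in summary:
            p = 1.0 / (1.0 + math.exp(-(a + b * centre)))
            # derivative of pos*(-log p) + neg*(-log(1 - p)) w.r.t. the linear predictor
            g = (pos + neg) * p - pos
            grad_a += g
            grad_b += g * centre
        a -= lr * grad_a / total
        b -= lr * grad_b / total
    return a, b
```

For example, 5000 simulated points with a true slope of 2 collapse to at most 30 bins. The fitting loop then scales with 30 rows instead of 5000, which is the kind of saving the abstract describes, though the paper's histogram likelihood is more elaborate than this weighting trick.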

dataset, histogram, separation

1912.03805

Country:

Oceania > Australia > Queensland (0.04)
Oceania > Australia > New South Wales (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

#artificialintelligence Dec-7-2019, 19:54:13 GMT

Global Big Data Conference

Industry has leveraged the benefits of machine learning for many years, and as adoption has grown, ML tools have evolved with it. Today, people can easily work with machine learning thanks to easy-to-use, user-friendly tools. Because gathering data and turning it into actionable insights has become largely automated, anyone with some technical knowledge and motivation can work with ML. These tools can handle the mundane work of collecting data, adding structure and consistency where possible, and then starting the computation.

global big data conference, splunk

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.38)

#artificialintelligence Dec-7-2019, 16:18:03 GMT

Explainability: Cracking open the black box, Part 1 - KDnuggets

Explainable AI (XAI) is a sub-field of AI that has been gaining ground recently. As a machine learning practitioner dealing with customers day in and day out, I can see why. I've been an analytics practitioner for more than 5 years, and I swear, the hardest part of a machine learning project is not creating the perfect model that beats all the benchmarks. It's convincing the customer of why and how it works. Humans have always been torn when faced with the unknown.

coefficient, linear regression, regression

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.31)

Industry: Transportation > Air (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Garg, Bhanu, Manwani, Naresh

Robust Deep Ordinal Regression Under Label Noise

arXiv.org Machine Learning Dec-7-2019

State-of-the-art ordinal regression methods rely on the correctness of the labels in the data. Real-world data may be susceptible to label noise, yet existing state-of-the-art algorithms do not take it into account: so far, no approach to ordinal regression addresses the label noise issue. We propose two novel noise models for ordinal regression. Further, we propose a general framework for robust ordinal regression learning. The proposed method is based on the unbiased-estimator approach and assumes knowledge of the noise model. We then give a deep learning implementation for two commonly used loss functions for ordinal regression. We prove that this approach gives a rank-consistent model, which is needed for a good ranking rule. We verify the proposed approach empirically and show that it is indeed robust to label noise. To the best of our knowledge, this is the first approach for learning robust deep ordinal regression models in the presence of label noise.
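The unbiased-estimator idea can be sketched in a few lines. The code below shows backward loss correction, a standard way to build an unbiased loss estimator when the noise transition matrix T is known, where T[i][j] = P(observed label j | true label i). It is not necessarily the exact construction used in the paper. The adjacent-rank flip model and all helper names are hypothetical, chosen only to give an ordinal flavour to the noise.

```python
def adjacent_flip_matrix(k, rho):
    """Hypothetical ordinal noise model: each true rank is observed as each
    neighbouring rank with probability rho, otherwise unchanged."""
    T = [[0.0] * k for _ in range(k)]
    for i in range(k):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < k]
        for j in neighbours:
            T[i][j] = rho
        T[i][i] = 1.0 - rho * len(neighbours)
    return T


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("noise matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def backward_corrected(T_inv, clean_losses):
    """Corrected loss for each possible *observed* label: T^{-1} @ losses."""
    k = len(clean_losses)
    return [sum(T_inv[i][j] * clean_losses[j] for j in range(k)) for i in range(k)]
```

As a concrete check, take a prediction of rank 1 out of 4 with absolute-error losses [1, 0, 1, 2]. Averaging the corrected losses over the noisy labels drawn from row y of T gives back the clean loss for true rank y. This is exactly the unbiasedness property: the sum over j of T[y][j] times (T^{-1} l)_j equals (T T^{-1} l)_y, which is l_y.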

imc, ordinal regression, regression

1912.03488

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > India > Telangana > Hyderabad (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)

#artificialintelligence Dec-6-2019, 07:18:29 GMT

Is Netflix Original Content getting worse?

Using the available data, I will build a simple logistic regression model to predict a show's status. The training set is small, but the model may still offer some insight into which features matter in Netflix's decision to renew or end a show. Since mean IMDb rating differs markedly between renewed and ended shows, a very simple, intuitive model would predict that higher-rated shows are renewed and lower-rated shows are ended. My model will take more features than just rating into account and will hopefully provide some insight into why Netflix management renews or ends shows. Given how small my dataset is and how simple the model is, these accuracy scores are pretty good!
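A bare-bones version of this comparison can be sketched in plain Python. It pits the rating-threshold rule against a logistic regression fitted on more than one feature. Everything specific here is hypothetical and not the author's data or model: the toy rows, the feature choice (IMDb rating centred at 7.0, plus number of seasons), the 7.0 cutoff, and the learning-rate settings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def rating_rule(rating, cutoff=7.0):
    """Baseline from the text: higher rating -> renewed (1), lower -> ended (0)."""
    return 1 if rating >= cutoff else 0


def features(rating, seasons):
    # Centre the rating so gradient descent is better conditioned.
    return [rating - 7.0, float(seasons)]


# Hypothetical toy rows: (IMDb rating, seasons, renewed?)
shows = [(8.1, 2, 1), (7.9, 3, 1), (7.6, 1, 1), (6.2, 1, 0), (5.8, 2, 0), (6.5, 1, 0)]
X = [features(r, s) for r, s, _ in shows]
y = [label for _, _, label in shows]
w, b = fit_logistic(X, y)
```

On six rows, both approaches separate the training data perfectly. The fitted weights of the logistic model show how much each extra feature shifts the odds, which the threshold rule cannot express. With this little data, those weights are fragile.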

higher value, imdb rating, netflix original content

Industry:

Media > Television (0.88)
Media > Film (0.88)
Information Technology > Services (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)

Tantipongpipat, Uthaipon, Waites, Chris, Boob, Digvijay, Siva, Amaresh Ankit, Cummings, Rachel

Differentially Private Mixed-Type Data Generation For Unsupervised Learning

arXiv.org Machine Learning Dec-6-2019

In this work we introduce the DP-auto-GAN framework for synthetic data generation, which combines the low-dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can take in raw sensitive data and privately train a model for generating synthetic data that preserves the statistical properties of the original data. The learned model can be used to generate arbitrary amounts of publicly available synthetic data, which can then be freely shared due to the post-processing guarantees of differential privacy. Our framework is applicable to unlabeled mixed-type data, which may include binary, categorical, and real-valued attributes. We implement this framework on both unlabeled binary data (MIMIC-III) and unlabeled mixed-type data (ADULT). We also introduce new metrics for evaluating the quality of synthetic mixed-type data, particularly in unsupervised settings.
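Differentially private training of neural networks commonly relies on DP-SGD-style updates. Each example's gradient is clipped to a fixed norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the result is averaged. The sketch below shows only that update step. It reflects an assumption about how such private training is typically done, not a description of DP-auto-GAN's exact procedure. It performs no privacy accounting, and the function and parameter names are hypothetical. Anything computed afterwards from the privately trained parameters, such as synthetic samples, inherits the guarantee through post-processing.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    return [g * (max_norm / norm) for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD-style update: per-example clipping, summing, Gaussian noise, averaging.

    The noise standard deviation is noise_multiplier * max_norm, matching the
    L2 sensitivity of the clipped sum. Privacy accounting is not included.
    """
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    batch_size = len(per_example_grads)
    return [p - lr * g / batch_size for p, g in zip(params, noisy)]
```

Setting `noise_multiplier=0.0` reduces the step to plain averaged, clipped gradient descent, which is a convenient sanity check. Real training would pair this step with a privacy accountant to translate the noise level and number of steps into an (epsilon, delta) budget.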

dataset, privacy, synthetic data

1912.0325

Country: North America > United States > Massachusetts (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Machine Learning Dec-6-2019

Influenza Modeling Based on Massive Feature Engineering and International Flow Deconvolution

Liu, Ziming, Wang, Yixuan, Han, Zizhao, Wu, Dian

In this article, we focus on the analysis of the potential factors driving the spread of influenza, and on possible policies to mitigate the adverse effects of the disease. Specifically, we first apply the discrete Fourier transform (DFT) to establish a yearly periodic regional structure in influenza activity, which allows us to safely restrict our analysis to yearly influenza behavior. Then we collect a large number of candidate region-wise indicators that may contribute to influenza mortality, such as consumption, immunization, sanitation, water quality, and other indicators from external data, with 1170 dimensions in total. We extract significant features from these high-dimensional indicators using a combination of data analysis techniques, including matrix completion, support vector machines (SVM), autoencoders, and principal component analysis (PCA). Furthermore, we model the international flow of migration and trade as a convolution on regional influenza activity, and solve the deconvolution problem as higher-order perturbations to the linear regression, thus separating regional and international factors related to influenza mortality. Finally, both the original model and the perturbed model are tested on regional examples to validate our models. On the policy side, we make a proposal, based on the connectivity data and the previously extracted significant features, to alleviate the impact of influenza and to propagate and carry out the policies efficiently. We conclude that environmental and economic features are significant factors in influenza mortality. The model can easily be adapted to other types of infectious diseases.
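The first step, using the DFT to detect a yearly period, can be illustrated on a synthetic weekly series. The sketch below computes DFT magnitudes directly and reports the period of the strongest non-zero frequency. The direct computation is an O(n²) loop, which is adequate for a few years of weekly data. The series, its length, and the helper names are hypothetical. Real surveillance data would be noisier and might need detrending first.

```python
import cmath
import math


def dft_magnitudes(x):
    """|X_k| for k = 0 .. n//2 via the direct DFT sum."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def dominant_period(x):
    """Period (in samples) of the largest DFT component, ignoring the mean (k = 0)."""
    mags = dft_magnitudes(x)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return len(x) / k


# Hypothetical 3 years of weekly activity: a yearly cycle plus a weaker 13-week cycle.
weeks = 156
series = [10 + 5 * math.cos(2 * math.pi * t / 52) + math.sin(2 * math.pi * t / 13)
          for t in range(weeks)]
print(dominant_period(series))  # 52.0
```

The yearly component lands in bin k = 3, since there are 3 cycles in 156 weeks. It has magnitude 5 × 156 / 2 = 390, well above the 13-week component at k = 12, so the reported period is 52 weeks.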

autoencoder, influenza, influenza activity

1912.02989

Country:

North America > United States (0.14)
Asia > Japan (0.05)
Africa > Nigeria (0.05)
Asia > China > Beijing > Beijing (0.05)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.37)

#artificialintelligence Dec-5-2019, 04:10:45 GMT

A Quasi-Newton Method Based Vertical Federated Learning Framework for Logistic Regression

Data privacy and security have become major concerns in building machine learning models from different data providers. Federated learning shows promise by keeping data local to each provider and exchanging encrypted information. This paper studies the vertical federated learning structure for logistic regression, where the datasets held by two parties share the same sample IDs but contain disjoint subsets of features. Existing frameworks adopt the first-order stochastic gradient descent algorithm, which requires a large number of communication rounds. To address the communication challenge, we propose a quasi-Newton method based vertical federated learning framework for logistic regression under the additively homomorphic encryption scheme.
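Writing out the gradient shows why vertically partitioned logistic regression needs communication at every step. Each party can compute its own share of the linear predictor locally. However, the residual sigmoid(z_A + z_B) - y depends on both shares, so the parties must exchange information before either can update its weights. The plaintext sketch below checks that the split computation reproduces the centralized gradient. It omits both the homomorphic encryption and the quasi-Newton update from the paper, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_scores(X_part, w_part):
    """Local share of the linear predictor: one dot product per sample."""
    return [sum(x * w for x, w in zip(row, w_part)) for row in X_part]


def local_gradient(X_part, residual):
    """Gradient of the mean log-loss with respect to one party's own weights."""
    n = len(residual)
    d = len(X_part[0])
    return [sum(r * row[j] for r, row in zip(residual, X_part)) / n for j in range(d)]


def vertical_gradients(X_a, w_a, X_b, w_b, y):
    """Parties A and B hold disjoint feature columns for the same sample IDs.

    Only the per-sample shares z_a, z_b (and then the residual) need to cross
    the party boundary; in an encrypted scheme they would be exchanged in
    encrypted form rather than in plaintext as here.
    """
    z_a = partial_scores(X_a, w_a)
    z_b = partial_scores(X_b, w_b)
    residual = [sigmoid(a + b) - yi for a, b, yi in zip(z_a, z_b, y)]
    return local_gradient(X_a, residual), local_gradient(X_b, residual)
```

Each gradient evaluation costs at least one round trip. A method that extracts more progress per round, such as a quasi-Newton step built from these same gradients, can therefore cut the number of communication rounds, which is the motivation stated in the abstract.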

logistic regression, quasi-newton method, vertical federated learning framework

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.67)

Ramosaj, Burim, Pauly, Markus

Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models

arXiv.org Machine Learning Dec-5-2019

Variable selection in sparse regression models is an important task, as applications ranging from biomedical research to econometrics have shown. Especially for higher-dimensional regression problems, where the link function between response and covariates cannot be directly detected, selecting informative variables is challenging. Under these circumstances, the Random Forest method is a helpful tool for predicting new outcomes while delivering measures for variable selection. One common approach is permutation importance. Given its intuitive idea and flexible usage, it is important to explore the circumstances under which permutation importance based on Random Forest correctly identifies informative covariates. To this end, we deliver theoretical guarantees for the validity of the permutation importance measure under specific assumptions and prove its (asymptotic) unbiasedness. An extensive simulation study verifies our findings.
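Permutation importance itself is simple to state: shuffle one covariate, re-evaluate the fitted model, and record how much the prediction error grows. The sketch below is model-agnostic and uses mean squared error on a given dataset. Breiman's Random Forest version instead evaluates on out-of-bag samples tree by tree, which is not reproduced here. The toy model (which ignores its second input), the data, and the helper names are hypothetical.

```python
import random


def mse(model, X, y):
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each column is shuffled independently of the rest."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical setup: y depends only on the first covariate.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * x0 for x0, _ in X]


def model(x):
    # Stand-in for a fitted Random Forest: it uses x[0] and ignores x[1].
    return 2.0 * x[0]


print(permutation_importance(model, X, y))
```

Because the stand-in model never reads the second covariate, shuffling it leaves predictions unchanged and its importance is exactly zero. The informative covariate receives a large positive score. The paper's question is when a Random Forest's version of this score behaves the same way, at least asymptotically.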

permutation importance, sample size, signal-to-noise ratio

1912.03306

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)