AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Utkin, Lev V., Konstantinov, Andrei V., Chukanov, Viacheslav S., Kots, Mikhail V., Meldo, Anna A.

An Adaptive Weighted Deep Forest Classifier

arXiv.org Machine Learning, Jan-4-2019

A modification of the confidence screening mechanism is proposed, based on adaptive weighting of every training instance at each cascade level of the Deep Forest. The idea underlying the modification is simple and stems from the confidence screening mechanism proposed by Pang et al., which simplifies the Deep Forest classifier by updating the training set at each level in accordance with the classification accuracy of every training instance. However, whereas the confidence screening mechanism just removes instances from the training and testing processes, the proposed modification is more flexible and assigns weights that take the classification accuracy into account. The modification is similar to AdaBoost to some extent. Numerical experiments illustrate good performance of the proposed modification in comparison with the original Deep Forest proposed by Zhou and Feng.
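The contrast between removing instances and weighting them can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the rule used here (weight proportional to one minus the true-class probability, with a small floor) is purely an assumption for illustration and is not the authors' method; the function names, threshold, and confidence values are likewise hypothetical.

```python
def screen_instances(confidences, threshold=0.9):
    """Confidence screening: keep only the indices of instances whose
    true-class probability is below the threshold (hard removal)."""
    return [i for i, c in enumerate(confidences) if c < threshold]


def adaptive_weights(confidences, floor=0.05):
    """Adaptive weighting (ASSUMED rule, not taken from the paper):
    weight each instance by how poorly it was classified, never below
    a small floor, then normalise so the weights sum to 1."""
    raw = [max(1.0 - c, floor) for c in confidences]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical true-class probabilities produced by one cascade level.
level_confidences = [0.98, 0.95, 0.60, 0.30]

kept = screen_instances(level_confidences)    # well-classified instances are dropped
weights = adaptive_weights(level_confidences)  # every instance stays, with a weight
print(kept)
print([round(w, 3) for w in weights])
```

Under screening, the two confidently classified instances vanish from the next level; under weighting, they remain but contribute little, which is the flexibility the abstract refers to.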

classification, gcforest, modification

1901.01334

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report > Experimental Study (0.94)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)

Chen, Wan-Ping Nicole, Chang, Yuan-chin Ivan

Fast Multi-Class Probabilistic Classifier by Sparse Non-parametric Density Estimation

arXiv.org Machine Learning, Jan-4-2019

Model interpretation is essential in many application scenarios, and building a classification model that is easy to interpret may provide useful information for further studies and improvement. It is common to encounter a lengthy set of variables in modern data analysis, especially when data are collected in some automatic way. Such datasets may not have been collected with a specific analysis target and usually contain redundant features, which contribute nothing to the current analysis task of interest. Variable selection is a common way to improve model interpretability and is widely used with parametric classification models. There is a lack of studies on variable selection in nonparametric classification models such as density estimation-based methods, and this is especially the case for multiple-class classification. In this study, we address multiple-class classification problems using the idea of sparse non-parametric density estimation and propose a method for identifying high-impact variables for each class. We present the asymptotic properties and the computation procedure for the proposed method, together with some suggested sample sizes. We also report numerical results using both synthesized and real data sets.

bandwidth, wan-ping nicole chen

1901.01

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

#artificialintelligence, Jan-3-2019, 23:21:40 GMT

Big Tech Deploys AI to Combat Hackers

Last year, Microsoft Corp.'s Azure security team detected suspicious activity in the cloud computing usage of a large retailer: One of the company's administrators, who usually logs on from New York, was trying to gain entry from Romania. A hacker had broken in. Microsoft quickly alerted its customer, and the attack was foiled before the intruder got too far. [...] Inc. and various startups are moving away from solely using older "rules-based" technology designed to respond to specific kinds of intrusion and deploying machine-learning algorithms that crunch massive amounts of data on logins, behavior and previous attacks to ferret out and stop hackers. "Machine learning is a very powerful technique for security--it's dynamic, while rules-based systems are very rigid," says Dawn Song, a professor at the University of California at Berkeley's Artificial Intelligence Research Lab. "It's a very manual intensive process to change them, whereas machine learning is automated, dynamic and you can retrain it easily."

artificial intelligence, customer, machine learning

Country:

North America > United States > New York (0.25)
North America > United States > California (0.25)
Europe > Italy > Sardinia (0.15)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

arXiv.org Machine Learning, Jan-3-2019

A Model for Learned Bloom Filters, and Optimizing by Sandwiching

Mitzenmacher, Michael

Recent work has suggested enhancing Bloom filters by using a pre-filter, based on applying machine learning to determine a function that models the data set the Bloom filter is meant to represent. Here we model such learned Bloom filters, with the following outcomes: (1) we clarify what guarantees can and cannot be associated with such a structure; (2) we show how to estimate what size the learning function must obtain in order to obtain improved performance; (3) we provide a simple method, sandwiching, for optimizing learned Bloom filters; and (4) we propose a design and analysis approach for a learned Bloomier filter, based on our modeling approach.
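As a rough illustration of the structure described in this abstract, the sketch below places a learned scoring function between an initial Bloom filter and a backup Bloom filter, which is how sandwiching is commonly described. Everything concrete here is an assumption: the class names, the filter sizes, the threshold, and especially `toy_model`, a hand-written stand-in for a trained classifier. The one property the sketch is meant to show is that stored keys are never rejected, because any key the model scores below the threshold is also inserted into the backup filter.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings, using salted SHA-256 hashes."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class SandwichedLearnedBloomFilter:
    """Initial filter -> learned model -> backup filter (hypothetical layout)."""

    def __init__(self, keys, model, threshold,
                 initial_bits=256, backup_bits=128, num_hashes=3):
        self.model = model
        self.threshold = threshold
        self.initial = BloomFilter(initial_bits, num_hashes)
        self.backup = BloomFilter(backup_bits, num_hashes)
        for key in keys:
            self.initial.add(key)
            # Keys the model would miss go to the backup filter,
            # so the structure as a whole has no false negatives.
            if model(key) < threshold:
                self.backup.add(key)

    def __contains__(self, item):
        if item not in self.initial:
            return False          # rejected before the model is consulted
        if self.model(item) >= self.threshold:
            return True           # model accepts (may be a false positive)
        return item in self.backup


def toy_model(url):
    """Stand-in for a trained classifier: scores URLs containing 'evil'."""
    return 1.0 if "evil" in url else 0.0


keys = ["evil-site.com", "evil-shop.net", "quiet-malware.org"]
sandwich = SandwichedLearnedBloomFilter(keys, toy_model, threshold=0.5)
print(all(key in sandwich for key in keys))
```

The initial filter screens out most non-keys before they reach the model, so fewer of the model's false positives get through; the backup filter only has to cover the keys the model scores too low.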

bloom filter, false positive rate, positive rate

1901.00902

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.56)

arXiv.org Machine Learning, Jan-3-2019

Sparse Learning in reproducing kernel Hilbert space

He, Xin, Wang, Junhui

Sparse learning aims to learn the sparse structure of the true target function from the collected data, which plays a crucial role in high-dimensional data analysis. This article proposes a unified and universal method for learning sparsity of M-estimators within a rich family of loss functions in a reproducing kernel Hilbert space (RKHS). The family of loss functions considered is very rich and includes most of those commonly used in the literature. More importantly, the proposed method is motivated by some nice properties of the induced RKHS, is computationally efficient for large-scale data, and can be further improved through parallel computing. The asymptotic estimation and selection consistencies of the proposed method are established for a general loss function under mild conditions. It works for a general loss function, admits a general dependence structure, allows for efficient computation, and comes with a theoretical guarantee. The superior performance of the proposed method is also supported by a variety of simulated examples and a real application to the human breast cancer study (GSE20194).

assumption, loss function, selection consistency

1901.00615

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

#artificialintelligence, Jan-2-2019, 12:28:37 GMT

Model evaluation, model selection, and algorithm selection in machine learning

A single-PDF version of Model Evaluation parts 1-4 is available on arXiv: https://arxiv.org/abs/1811.12808. This final article in the series Model evaluation, model selection, and algorithm selection in machine learning presents overviews of several statistical hypothesis testing approaches, with applications to machine learning model and algorithm comparisons. This includes statistical tests based on target predictions for independent test sets (the downsides of using a single test set for model comparisons were discussed in previous articles) as well as methods for algorithm comparisons by fitting and evaluating models via cross-validation. Lastly, this article will introduce nested cross-validation, which has become a common and recommended method of choice for algorithm comparisons for small to moderately-sized datasets. Then, at the end of this article, I provide a list of my personal suggestions concerning model evaluation, selection, and algorithm selection, summarizing the several techniques covered in this series of articles. There are several different statistical hypothesis testing frameworks that are being used in practice to compare the performance of classification models, including conventional methods such as difference of two proportions (here, the proportions are the estimated generalization accuracies from a test set), for which we can construct 95% confidence intervals based on the concept of the Normal Approximation to the Binomial that was covered in Part I. Performing a z-score test for two population proportions is inarguably the most straightforward way to compare two models (but certainly not the best!): In a nutshell, if the 95% confidence intervals of the accuracies of two models do not overlap, we can reject the null hypothesis that the performance of both classifiers is equal at a significance level of α = 0.05 (or 5% probability).
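The comparison sketched at the end of this excerpt can be written out in a few lines. The code below computes normal-approximation confidence intervals for two test-set accuracies and a z statistic for the difference of two proportions. It uses the standard pooled-variance form of the test, which is an assumption and may differ in detail from the formula in the article; the accuracies and test-set sizes are made-up numbers, not results from the series.

```python
import math


def accuracy_ci(acc, n, z=1.96):
    """Normal-approximation confidence interval for an accuracy
    estimated on n test examples (z=1.96 gives roughly 95%)."""
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc - half_width, acc + half_width


def two_proportion_z_test(acc1, n1, acc2, n2):
    """z statistic and two-sided p-value for H0: both accuracies are equal,
    using the pooled estimate of the common proportion."""
    pooled = (acc1 * n1 + acc2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (acc1 - acc2) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical accuracies of two models on independent test sets of 500 examples.
ci_a = accuracy_ci(0.90, 500)
ci_b = accuracy_ci(0.80, 500)
intervals_overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

z_stat, p_value = two_proportion_z_test(0.90, 500, 0.80, 500)
print(ci_a, ci_b, intervals_overlap)
print(z_stat, p_value)
```

Non-overlapping intervals imply rejection at the 5% level, but the converse does not hold: two intervals can overlap slightly while the z-test still rejects, so the overlap check is the more conservative of the two.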

artificial intelligence, hypothesis, machine learning

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

#artificialintelligence, Jan-1-2019, 23:29:16 GMT

AI and ML choices can dramatically improve data security

As networks have advanced in complexity, so have the tools and tactics of cybercriminals. Organizations increase their cybersecurity budgets and teams, yet breaches keep occurring. In the fight for stronger security, vendors are offering up AI and machine learning as a Holy Grail. But do these technologies actually deliver? Frequent headlines make it clear that cybercriminals are currently winning battles regularly.

artificial intelligence, machine learning, security team

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)

Wang, Yan, Ni, Xuelei Sherry, Stone, Brian

An Automatic Interaction Detection Hybrid Model for Bankcard Response Classification

arXiv.org Machine Learning, Jan-1-2019

In this paper, we propose a hybrid bankcard response model, which integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. Then, in the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation for the proposed hybrid model is that adding variable interactions may improve the performance of logistic regression. To demonstrate its effectiveness, the proposed hybrid model is evaluated on a real credit customer response data set. As the results reveal, by identifying potential interactions among independent variables, the proposed hybrid approach outperforms logistic regression without a search for interactions in terms of classification accuracy, the area under the receiver operating characteristic (ROC) curve, and Kolmogorov-Smirnov (KS) statistics. Furthermore, CHAID analysis for interaction detection is much more computationally efficient than a stepwise search, and some identified interactions are shown to have statistically significant predictive power on the target variable. Last but not least, the customer profile created from the CHAID tree provides a reasonable interpretation of the interactions, which is required by regulations of the credit industry. Hence, this study provides an alternative for handling bankcard classification tasks.
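The second stage of the hybrid model, feeding detected interactions into logistic regression as extra inputs, can be illustrated with a small sketch. It does not implement CHAID: the interaction pair is simply assumed to have been found in stage one, the data are a four-row toy set rather than the credit data from the paper, and the helper names and the plain gradient-descent fit are illustrative choices.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def predict_proba(weights, x):
    """weights[0] is the bias; the rest align with the features in x."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.5, epochs=5000):
    """Logistic regression fitted by batch gradient descent from zero weights."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def accuracy(weights, rows, labels):
    hits = sum(int((predict_proba(weights, x) >= 0.5) == bool(y))
               for x, y in zip(rows, labels))
    return hits / len(rows)


def add_interactions(rows, pairs):
    """Append one product feature x[i] * x[j] per (i, j) pair."""
    return [list(x) + [x[i] * x[j] for i, j in pairs] for x in rows]


# Toy data in which the response depends only on the interaction of two inputs.
rows = [[0, 0], [1, 0], [0, 1], [1, 1]]
labels = [0, 1, 1, 0]
detected_pairs = [(0, 1)]  # ASSUMED output of a CHAID-style first stage

acc_plain = accuracy(fit_logistic(rows, labels), rows, labels)
rows_hybrid = add_interactions(rows, detected_pairs)
acc_hybrid = accuracy(fit_logistic(rows_hybrid, labels), rows_hybrid, labels)
print(acc_plain, acc_hybrid)
```

On this toy set the main-effects model cannot separate the classes, while the model with the product term can, which is the motivation the abstract gives for adding interactions.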

chaid analysis, interaction, logistic regression

1901.00251

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.14)
North America > United States > Georgia > Cobb County > Kennesaw (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)