AITopics

2006.07911

Country:

Asia > China > Hong Kong (0.04)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > Michigan (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Credit (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

#artificialintelligenceJun-13-2020, 22:05:59 GMT

Top 12 R Packages For Machine Learning In 2020

R is one of the most prevalent programming languages for statistical analysis and computing. Researchers in the field of data science and statistical computing have been using this language for a few years now because of its number of intuitive features. These features include running code without a compiler, open-source, robust visualisation library, and other such. This article lists down the top 12 R packages for machine learning one must know in 2020. About: The Classification And REgression Training or caret package is a set of functions that seeks to streamline the method for creating predictive models.

artificial intelligence, interface, machine learning, (11 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.32)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)

#artificialintelligenceJun-13-2020, 00:24:11 GMT

Step-By-Step Guide On How To Build Linear Regression In R (With Code)

It will also provide information about missing values or outliers if any. For more information and functions which you can use read beginner's guide to exploratory data analysis. Both missing values and outliers are of concern for Machine Learning models as they tend to push the result towards extreme values.

artificial intelligence, machine learning, regression, (17 more...)

#artificialintelligence

Country: North America > United States (0.05)

Genre: Research Report > Experimental Study (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)

Ferrando, Cecilia, Wang, Shufan, Sheldon, Daniel

General-Purpose Differentially-Private Confidence Intervals

arXiv.org Machine LearningJun-13-2020

One of the most common statistical goals is to estimate a population parameter and quantify uncertainty by constructing a confidence interval. However, the field of differential privacy lacks easy-to-use and general methods for doing so. We partially fill this gap by developing two broadly applicable methods for private confidence-interval construction. The first is based on asymptotics: for two widely used model classes, exponential families and linear regression, a simple private estimator has the same asymptotic normal distribution as the corresponding non-private estimator, so confidence intervals can be constructed using quantiles of the normal distribution. These are computationally cheap and accurate for large data sets, but do not have good coverage for small data sets. The second approach is based on the parametric bootstrap. It applies "out of the box" to a wide class of private estimators and has good coverage at small sample sizes, but with increased computational cost. Both methods are based on post-processing the private estimator and do not consume additional privacy budget.

artificial intelligence, confidence interval, machine learning, (17 more...)

2006.07749

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)

Kohns, David, Szendrei, Tibor

Horseshoe Prior Bayesian Quantile Regression

arXiv.org Machine LearningJun-13-2020

This paper extends the horseshoe prior of Carvalho et al. (2010) to the Bayesian quantile regression (HS-BQR) and provides a fast sampling algorithm that speeds up computation significantly in high dimensions. The performance of the HS-BQR is tested on large scale Monte Carlo simulations and an empirical application relevant to macroeoncomics. The Monte Carlo design considers several sparsity structures (sparse, dense, block) and error structures (i.i.d. errors and heteroskedastic errors). A number of LASSO based estimators (frequentist and Bayesian) are pitted against the HS-BQR to better gauge the performance of the method on the different designs. The HS-BQR yields just as good, or better performance than the other estimators considered when evaluated using coefficient bias and forecast error. We find that the HS-BQR is particularly potent in sparse designs and when estimating extreme quantiles. The simulations also highlight how the high dimensional quantile estimators fail to correctly identify the quantile function of the variables when both location and scale effects are present. In the empirical application, in which we evaluate forecast densities of US inflation, the HS-BQR provides well calibrated forecast densities whose individual quantiles, have the highest pseudo R squared, highlighting its potential for Value-at-Risk estimation.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

2006.07655

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Banking & Finance > Economy (0.93)

Technology:

Information Technology > Modeling & Simulation (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Wundervald, Bruna, Parnell, Andrew, Domijan, Katarina

Generalizing Gain Penalization for Feature Selection in Tree-based Models

We develop a new approach for feature selection via gain penalization in tree-based models. First, we show that previous methods do not perform sufficient regularization and often exhibit sub-optimal out-of-sample performance, especially when correlated features are present. Instead, we develop a new gain penalization idea that exhibits a general local-global regularization for tree-based models. The new method allows for more flexibility in the choice of feature-specific importance weights. We validate our method on both simulated and real data and implement itas an extension of the popular R package ranger.

artificial intelligence, machine learning, random forest, (18 more...)

2006.07515

Country:

North America > United States (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Chahuara, Pedro, Grislain, Nicolas, Jauvion, Grégoire, Renders, Jean-Michel

Real-Time Optimization Of Web Publisher RTB Revenues

This paper describes an engine to optimize web publisher revenues from second-price auctions. These auctions are widely used to sell online ad spaces in a mechanism called real-time bidding (RTB). Optimization within these auctions is crucial for web publishers, because setting appropriate reserve prices can significantly increase revenue. We consider a practical real-world setting where the only available information before an auction occurs consists of a user identifier and an ad placement identifier. The real-world challenges we had to tackle consist mainly of tracking the dependencies on both the user and placement in an highly non-stationary environment and of dealing with censored bid observations. These challenges led us to make the following design choices: (i) we adopted a relatively simple non-parametric regression model of auction revenue based on an incremental time-weighted matrix factorization which implicitly builds adaptive users' and placements' profiles; (ii) we jointly used a non-parametric model to estimate the first and second bids' distribution when they are censored, based on an on-line extension of the Aalen's Additive model. Our engine is a component of a deployed system handling hundreds of web publishers across the world, serving billions of ads a day to hundreds of millions of visitors. The engine is able to predict, for each auction, an optimal reserve price in approximately one millisecond and yields a significant revenue increase for the web publishers.

artificial intelligence, machine learning, real time system, (20 more...)

doi: 10.1145/3097983.3098150

2006.07083

Country:

North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Industry:

Marketing (0.87)
Information Technology > Services (0.66)
Law > Civil Rights & Constitutional Law (0.60)

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales

Kato, Masahiro

This study addresses the problem of off-policy evaluation (OPE) from dependent samples obtained via the bandit algorithm. The goal of OPE is to evaluate a new policy using historical data obtained from behavior policies generated by the bandit algorithm. Because the bandit algorithm updates the policy based on past observations, the samples are not independent and identically distributed (i.i.d.). However, several existing methods for OPE do not take this issue into account and are based on the assumption that samples are i.i.d. In this study, we address this problem by constructing an estimator from a standardized martingale difference sequence. To standardize the sequence, we consider using evaluation data or sample splitting with a two-step estimation. This technique produces an estimator with asymptotic normality without restricting a class of behavior policies. In an experiment, the proposed estimator performs better than existing methods, which assume that the behavior policy converges to a time-invariant policy.

artificial intelligence, data mining, machine learning, (14 more...)

2006.06982

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Coyle, Jeremy R., Hejazi, Nima S., Malenica, Ivana, Phillips, Rachael V., Arnold, Benjamin F., Mertens, Andrew, Benjamin-Chung, Jade, Cai, Weixin, Dayal, Sonali, Colford, John M. Jr., Hubbard, Alan E., van der Laan, Mark J.

Targeting Learning: Robust Statistics for Reproducible Research

Targeted Learning is a subfield of statistics that unifies advances in causal inference, machine learning and statistical theory to help answer scientifically impactful questions with statistical confidence. Targeted Learning is driven by complex problems in data science and has been implemented in a diversity of real-world scenarios: observational studies with missing treatments and outcomes, personalized interventions, longitudinal settings with time-varying treatment regimes, survival analysis, adaptive randomized trials, mediation analysis, and networks of connected subjects. In contrast to the (mis)application of restrictive modeling strategies that dominate the current practice of statistics, Targeted Learning establishes a principled standard for statistical estimation and inference (i.e., confidence intervals and p-values). This multiply robust approach is accompanied by a guiding roadmap and a burgeoning software ecosystem, both of which provide guidance on the construction of estimators optimized to best answer the motivating question. The roadmap of Targeted Learning emphasizes tailoring statistical procedures so as to minimize their assumptions, carefully grounding them only in the scientific knowledge available. The end result is a framework that honestly reflects the uncertainty in both the background knowledge and the available data in order to draw reliable conclusions from statistical analyses -- ultimately enhancing the reproducibility and rigor of scientific findings.

artificial intelligence, estimator, machine learning, (17 more...)

2006.07333

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
(2 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)

arXiv.org Machine LearningJun-11-2020

Symbolic Regression using Mixed-Integer Nonlinear Optimization

Austel, Vernon, Cornelio, Cristina, Dash, Sanjeeb, Goncalves, Joao, Horesh, Lior, Josephson, Tyler, Megiddo, Nimrod

The Symbolic Regression (SR) problem, where the goal is to find a regression function that does not have a pre-specified form but is any function that can be composed of a list of operators, is a hard problem in machine learning, both theoretically and computationally. Genetic programming based methods, that heuristically search over a very large space of functions, are the most commonly used methods to tackle SR problems. An alternative mathematical programming approach, proposed in the last decade, is to express the optimal symbolic expression as the solution of a system of nonlinear equations over continuous and discrete variables that minimizes a certain objective, and to solve this system via a global solver for mixed-integer nonlinear programming problems. Algorithms based on the latter approach are often very slow. We propose a hybrid algorithm that combines mixed-integer nonlinear optimization with explicit enumeration and incorporates constraints from dimensional analysis. We show that our algorithm is competitive, for some synthetic data sets, with a state-of-the-art SR software and a recent physics-inspired method called AI Feynman.

artificial intelligence, expression, machine learning, (16 more...)

2006.06813

Country:

North America > United States > Minnesota (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)