AITopics

Many existing fairness criteria for machine learning involve equalizing or achieving some metric across protected groups such as race or gender groups. However, practitioners trying to audit or enforce such group-based criteria can easily face the problem of noisy or biased protected group information. We study this important practical problem in two ways. First, we study the consequences of naïvely only relying on noisy protected groups: we provide an upper bound on the fairness violations on the true groups $G$ when the fairness criteria are satisfied on noisy groups $\hat{G}$. Second, we introduce two new approaches using robust optimization that, unlike the naïve approach of only relying on $\hat{G}$, are guaranteed to satisfy fairness criteria on the true protected groups $G$ while minimizing a training objective. We provide theoretical guarantees that one such approach converges to an optimal feasible solution. Using two case studies, we empirically show that the robust approaches achieve better true group fairness guarantees than the naïve approach.

constraint, fairness criteria, violation

2002.09343

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Gerace, Federica, Loureiro, Bruno, Krzakala, Florent, Mézard, Marc, Zdeborová, Lenka

Generalisation error in learning with random features and the hidden manifold model

We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and, using the replica method from statistical physics, provide a closed-form expression for the asymptotic generalisation performance in these problems, valid in both the under- and over-parametrised regimes and for a broad choice of generalised linear model loss functions. In particular, we show how to obtain analytically the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold, we illustrate the superiority of orthogonal over random Gaussian projections in learning with random features, and we discuss the role played by correlations in the data generated by the hidden manifold model. Beyond the interest in these particular problems, the theoretical formalism introduced in this manuscript provides a path to further extensions to more complex tasks.

arxiv preprint arxiv, generalisation error, neural network

2002.09339

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Brazil (0.04)
North America > United States > New York (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Kersting, Hans, Krämer, Nicholas, Schiegg, Martin, Daniel, Christian, Tiemann, Michael, Hennig, Philipp

Differentiable Likelihoods for Fast Inversion of 'Likelihood-Free' Dynamical Systems

Likelihood-free (a.k.a. simulation-based) inference problems are inverse problems with expensive, or intractable, forward models. ODE inverse problems are commonly treated as likelihood-free, as their forward map has to be numerically approximated by an ODE solver. This, however, is not a fundamental constraint but just a lack of functionality in classic ODE solvers, which do not return a likelihood but a point estimate. To address this shortcoming, we employ Gaussian ODE filtering (a probabilistic numerical method for ODEs) to construct a local Gaussian approximation to the likelihood. This approximation yields tractable estimators for the gradient and Hessian of the (log-)likelihood. Inserting these estimators into existing gradient-based optimization and sampling methods yields new solvers for ODE inverse problems. We demonstrate that these methods outperform standard likelihood-free approaches on three benchmark systems.

estimator, likelihood, likelihood-free

2002.09301

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > Panama (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Mathematics of Computing (0.88)

Transformer Hawkes Process

Zuo, Simiao, Jiang, Haoming, Li, Zichong, Zhao, Tuo, Zha, Hongyuan

Event sequence data are naturally observed in our daily life. Through social media such as Twitter and Facebook, we share our experiences and respond to other users' information (Yang et al., 2011). On these websites, each user has a sequence of events such as tweets and interactions. Hundreds of millions of users generate large numbers of tweets, which are essentially sequences of events at different time stamps. Besides social media, event data also exist in domains such as financial transactions (Bacry et al., 2015) and personalized healthcare (Wang et al., 2018). For example, in electronic medical records, the tests and diagnoses of each patient can be treated as a sequence of events. Unlike other sequential data such as time series, event sequences tend to be asynchronous (Ross et al., 1996), which means that the time intervals between events are just as important as their order in describing their dynamics. Also, depending on specific application requirements, event data show sophisticated dependencies on their history. Point processes are a powerful tool for modeling sequences of discrete events in continuous time, and the technique has been widely applied.
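
To make the point-process idea concrete, the following is a minimal sketch of the standard textbook definitions, not the model proposed in the paper. The symbols (history $\mathcal{H}_t$, base rate $\mu$, kernel $\phi$, parameters $\alpha$, $\beta$) are assumptions introduced here for illustration and do not appear in the abstract. A temporal point process is usually described by its conditional intensity, and the classical Hawkes process named in the title lets every past event raise that intensity:

```latex
% Conditional intensity of a temporal point process given the event history H_t
\lambda(t \mid \mathcal{H}_t)
  = \lim_{\Delta t \to 0}
    \frac{\Pr\{\text{event in } [t,\, t + \Delta t) \mid \mathcal{H}_t\}}{\Delta t}

% Classical (linear) Hawkes process: each past event at time t_j adds \phi(t - t_j)
\lambda(t \mid \mathcal{H}_t) = \mu + \sum_{j \,:\, t_j < t} \phi(t - t_j),
\qquad \mu \ge 0, \quad \phi(s) \ge 0

% A common kernel choice (assumed here): exponential decay
\phi(s) = \alpha \, e^{-\beta s}, \qquad \alpha, \beta > 0
```

Because the intensity depends on the elapsed times $t - t_j$ and not only on the order of events, this form captures the asynchrony described above.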

dependency, point process, sequence

2002.09291

Country:

Asia > China (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Nguyen, Binh T., Chevalier, Jérôme-Alexis, Thirion, Bertrand, Arlot, Sylvain

Aggregation of Multiple Knockoffs

We develop an extension of the Knockoff Inference procedure, introduced by Barber and Candes (2015). This new method, called Aggregation of Multiple Knockoffs (AKO), addresses the instability inherent to the random nature of Knockoff-based inference. Specifically, AKO improves both the stability and power compared with the original Knockoff algorithm while still maintaining guarantees for False Discovery Rate control. We provide a new inference procedure, prove its core properties, and demonstrate its benefits in a set of experiments on synthetic and real datasets.

aggregation, knockoff, procedure

2002.09269

Country:

Europe > France (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.85)

Industry:

Health & Medicine > Health Care Technology (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

NeuroQuery: comprehensive meta-analysis of human brain mapping

Dockès, Jérôme, Poldrack, Russell, Primet, Romain, Gözükan, Hande, Yarkoni, Tal, Suchanek, Fabian, Thirion, Bertrand, Varoquaux, Gaël

Reaching a global view of brain organization requires assembling evidence on widely different mental processes and mechanisms. The variety of human neuroscience concepts and terminology poses a fundamental challenge to relating brain imaging results across the scientific literature. Existing meta-analysis methods perform statistical tests on sets of publications associated with a particular concept. Thus, large-scale meta-analyses only tackle single terms that occur frequently. We propose a new paradigm, focusing on prediction rather than inference. Our multivariate model predicts the spatial distribution of neurological observations, given text describing an experiment, cognitive process, or disease. This approach handles text of arbitrary length and terms that are too rare for standard meta-analysis. We capture the relationships and neural correlates of 7 547 neuroscience terms across 13 459 neuroimaging publications. The resulting meta-analytic tool, neuroquery.org, can ground hypothesis generation and data-analysis priors on a comprehensive view of published findings on the brain.

neuroquery, neurosynth, prediction

2002.09261

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.45)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Neurology > Attention Deficit/Hyperactivity Disorder (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.64)

Beregi-Kovács, Marcell, Baran, Ágnes, Hajdu, András

Efficient Learning of Model Weights via Changing Features During Training

In this paper, we propose a machine learning model that dynamically changes its features during training. Our main motivation is to update the model only slightly during training by replacing less descriptive features with new ones from a large pool. The main benefit is that, contrary to common practice, we do not start training a new model from scratch but keep the already learned weights. This procedure allows a large feature pool to be scanned, which, together with keeping the model complexity fixed, leads to an increase in model accuracy within the same training time. The efficiency of our approach is demonstrated in several classic machine learning scenarios, including linear regression and neural network-based training. As a specific analysis geared towards signal processing, we have successfully tested our approach on the MNIST database for digit classification, considering single-pixel and pixel-pair intensities as possible features.
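
The idea of swapping features while keeping learned weights can be illustrated with a small sketch. This is not the authors' algorithm: the function name `train_with_feature_swaps`, the rule "smallest absolute weight means least descriptive", the random choice of the replacement feature, and all hyperparameters are assumptions made here purely for illustration, using plain gradient-descent linear regression on synthetic data.

```python
import numpy as np


def train_with_feature_swaps(pool, y, n_active=5, n_rounds=10, steps=50, lr=0.05, seed=0):
    """Linear regression on a small active subset of a large feature pool.

    Between rounds, one active feature is swapped for an unused pool feature
    while all other learned weights are kept (hypothetical illustration).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_pool = pool.shape
    active = rng.choice(n_pool, size=n_active, replace=False)
    w = np.zeros(n_active)
    for round_idx in range(n_rounds):
        if round_idx > 0:
            # Assumption: the feature with the smallest |weight| is "least descriptive".
            worst = int(np.argmin(np.abs(w)))
            unused = np.setdiff1d(np.arange(n_pool), active)
            active[worst] = rng.choice(unused)
            w[worst] = 0.0  # only the swapped-in feature starts from zero
        X = pool[:, active]
        for _ in range(steps):
            # Gradient of the mean squared error (up to a constant factor)
            grad = X.T @ (X @ w - y) / n_samples
            w -= lr * grad
    return active, w


# Synthetic data: only pool features 3 and 17 carry signal.
rng = np.random.default_rng(1)
pool = rng.standard_normal((200, 30))
y = 2.0 * pool[:, 3] - 1.0 * pool[:, 17] + 0.1 * rng.standard_normal(200)

active, w = train_with_feature_swaps(pool, y)
mse = float(np.mean((pool[:, active] @ w - y) ** 2))
print(sorted(active.tolist()), round(mse, 3))
```

The number of active features stays constant throughout, so model complexity does not grow, and no round restarts training from scratch.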

algorithm, classification, selection

2002.09249

Country: Europe > Hungary > Hajdú-Bihar County > Debrecen (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.38)

Huesmann, Karim, Klemm, Soeren, Linsen, Lars, Risse, Benjamin

Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization

Overfitting is one of the most common problems when training deep neural networks on comparatively small datasets. Here, we demonstrate that neural network activation sparsity is a reliable indicator of overfitting, which we use to propose novel targeted sparsity visualization and regularization strategies. Based on these strategies, we are able to understand and counteract overfitting caused by activation sparsity and filter correlation in a targeted layer-by-layer manner. Our results demonstrate that targeted sparsity regularization can efficiently be used to regularize well-known datasets and architectures with a significant increase in image classification performance while outperforming both dropout and batch normalization. Ultimately, our study reveals novel insights into the contradicting concepts of activation sparsity and network capacity by demonstrating that targeted sparsity regularization enables salient and discriminative feature learning while exploiting the full capacity of deep models without suffering from overfitting, even when trained excessively.

entropy, regularization, sparsity

2002.09237

Country: Europe > Germany > North Rhine-Westphalia > Münster Region > Münster (0.04)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Muandet, Krikamol, Jitkrittum, Wittawat, Kübler, Jonas

Kernel Conditional Moment Test via Maximum Moment Restriction

We propose a new family of specification tests called kernel conditional moment (KCM) tests. Our tests are built on conditional moment embeddings (CMME), a novel representation of conditional moment restrictions in a reproducing kernel Hilbert space (RKHS). After transforming the conditional moment restrictions into a continuum of unconditional counterparts, the test statistic is defined as the maximum moment restriction within the unit ball of the RKHS. We show that the CMME fully characterizes the original conditional moment restrictions, leading to consistency in both hypothesis testing and parameter estimation. The proposed test also has an analytic expression that is easy to compute as well as closed-form asymptotic distributions. Our empirical studies show that the KCM test has a promising finite-sample performance compared to existing tests.

estimation, restriction, theorem 3

2002.09225

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Franceschi, Jean-Yves, Delasalles, Edouard, Chen, Mickaël, Lamprier, Sylvain, Gallinari, Patrick

Stochastic Latent Residual Video Prediction

Designing video prediction models that account for the inherent uncertainty of the future is challenging. Most works in the literature are based on stochastic image-autoregressive recurrent networks, which raises several performance and applicability issues. An alternative is to use fully latent temporal models which untie frame synthesis and temporal dynamics. However, no such model for stochastic video prediction has been proposed in the literature yet, due to design and training difficulties. In this paper, we overcome these difficulties by introducing a novel stochastic temporal model whose dynamics are governed in a latent space by a residual update rule. This first-order scheme is motivated by discretization schemes of differential equations. It naturally models video dynamics as it allows our simpler, more interpretable, latent model to outperform prior state-of-the-art methods on challenging datasets.
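
The link between a residual update rule and the discretization of a differential equation can be written out as a short worked step. This is a sketch under assumed notation: the latent state $y_t$, the learned function $f_\theta$, the step size $\Delta t$, and the stochastic variable $z_{t+1}$ are symbols introduced here and are not given in the abstract.

```latex
% Latent dynamics viewed as an ordinary differential equation
\frac{\mathrm{d}y}{\mathrm{d}t} = f_\theta\bigl(y(t)\bigr)

% Forward (explicit) Euler step, a first-order discretization
y(t + \Delta t) \approx y(t) + \Delta t \, f_\theta\bigl(y(t)\bigr)

% With unit step size \Delta t = 1 this is a residual update
y_{t+1} = y_t + f_\theta(y_t)

% Assumed stochastic variant: the residual also depends on a random variable z_{t+1}
y_{t+1} = y_t + f_\theta(y_t, z_{t+1})
```

The residual form is first-order in the sense that the next latent state depends only on the current one plus a learned increment.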

dataset, international conference, prediction

2002.09219

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)