Jordon, James
TAPAS: a Toolbox for Adversarial Privacy Auditing of Synthetic Data
Houssiau, Florimond, Jordon, James, Cohen, Samuel N., Daniel, Owen, Elliott, Andrew, Geddes, James, Mole, Callum, Rangel-Smith, Camila, Szpruch, Lukasz
Personal data collected at scale promises to improve decision-making and accelerate innovation. However, sharing and using such data raises serious privacy concerns. A promising solution is to produce synthetic data: artificial records to share instead of real data. Since synthetic records are not linked to real persons, this intuitively prevents classical re-identification attacks. However, this alone is insufficient to protect privacy. We present TAPAS, a toolbox of attacks for evaluating the privacy of synthetic data under a wide range of scenarios. These attacks include generalizations of prior work as well as novel attacks. We also introduce a general framework for reasoning about privacy threats to synthetic data and showcase TAPAS on several examples.
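To make the kind of attack such a toolbox evaluates concrete, here is a minimal sketch of one common attack family: a distance-to-closest-record membership inference attack scored with AUC. The function name, the toy data, and the scoring rule are illustrative assumptions, not the TAPAS API.

```python
# Sketch of a distance-to-closest-record membership inference attack on synthetic data.
import numpy as np
from sklearn.metrics import roc_auc_score

def dcr_membership_scores(targets: np.ndarray, synthetic: np.ndarray) -> np.ndarray:
    """Score each target record by the negative distance to its closest synthetic record.

    Higher scores mean the attacker is more confident the record was in the
    generator's training data.
    """
    diffs = targets[:, None, :] - synthetic[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return -dists.min(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 5))      # records used to fit the generator
    holdout = rng.normal(size=(200, 5))    # records the generator never saw
    # Stand-in "synthetic" data that leaks by staying close to its training records.
    synthetic = train + rng.normal(scale=0.1, size=train.shape)

    targets = np.vstack([train, holdout])
    membership = np.concatenate([np.ones(len(train)), np.zeros(len(holdout))])

    scores = dcr_membership_scores(targets, synthetic)
    # An AUC near 0.5 would indicate the synthetic data leaks little membership signal.
    print("membership-inference AUC:", roc_auc_score(membership, scores))
```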
To Impute or not to Impute? -- Missing Data in Treatment Effect Estimation
Berrevoets, Jeroen, Imrie, Fergus, Kyono, Trent, Jordon, James, van der Schaar, Mihaela
Missing data is a systemic problem in practical scenarios that causes noise and bias when estimating treatment effects. This makes treatment effect estimation from data with missingness a particularly tricky endeavour. A key reason for this is that standard assumptions on missingness are rendered insufficient by the presence of an additional variable, treatment, besides the individual and the outcome. Having a treatment variable introduces additional complexity with respect to why some variables are missing that is not fully explored by previous work. In our work we identify a new missingness mechanism, which we term mixed confounded missingness (MCM), where some missingness determines treatment selection and other missingness is determined by treatment selection. Given MCM, we show that naively imputing all data leads to poorly performing treatment effect models, as the act of imputation effectively removes information necessary to provide unbiased estimates. However, performing no imputation at all also leads to biased estimates, as missingness determined by treatment divides the population into distinct subpopulations, and estimates across these subpopulations will be biased. Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not. We empirically demonstrate how various learners benefit from selective imputation compared to other solutions for missing data.
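A minimal sketch of the selective-imputation idea follows, under the assumed split below: columns whose missingness confounds treatment assignment are imputed, while columns whose missingness is itself caused by treatment are left unimputed and flagged with indicators. The column names, the mean-imputation strategy, and the split itself are hypothetical illustrations, not the paper's procedure.

```python
# Sketch: impute only the confounding-missingness columns, keep treatment-induced
# missingness as explicit indicators.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

def selectively_impute(df, impute_cols, keep_missing_cols):
    out = df.copy()
    # Impute variables whose missingness drives treatment selection.
    imputer = SimpleImputer(strategy="mean")
    out[impute_cols] = imputer.fit_transform(out[impute_cols])
    # Preserve treatment-induced missingness as indicators instead of imputing it away.
    for col in keep_missing_cols:
        out[f"{col}_observed"] = out[col].notna().astype(int)
        out[col] = out[col].fillna(0.0)  # neutral fill; the indicator carries the signal
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "blood_pressure": rng.normal(120, 10, 6),  # missingness confounds treatment (assumed)
        "lab_result": rng.normal(1.0, 0.2, 6),     # only measured after treatment (assumed)
    })
    df.loc[[0, 3], "blood_pressure"] = np.nan
    df.loc[[1, 4], "lab_result"] = np.nan
    print(selectively_impute(df, ["blood_pressure"], ["lab_result"]))
```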
Hide-and-Seek Privacy Challenge
Jordon, James, Jarrett, Daniel, Yoon, Jinsung, Barnes, Tavian, Elbers, Paul, Thoral, Patrick, Ercole, Ari, Zhang, Cheng, Belgrave, Danielle, van der Schaar, Mihaela
The clinical time-series setting poses a unique combination of challenges to data modeling and sharing. Due to the high dimensionality of clinical time series, adequate de-identification to preserve privacy while retaining data utility is difficult to achieve using common de-identification techniques. An innovative approach to this problem is synthetic data generation. From a technical perspective, a good generative model for time-series data should preserve temporal dynamics, in the sense that new sequences respect the original relationships between high-dimensional variables across time. From the privacy perspective, the model should prevent patient re-identification by limiting vulnerability to membership inference attacks. The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to simultaneously accelerate progress in tackling both problems. In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset: the AmsterdamUMCdb dataset. Ultimately, we seek to advance generative techniques for dense and high-dimensional temporal data streams that are (1) clinically meaningful in terms of fidelity and predictivity, as well as (2) capable of minimizing membership privacy risks in terms of the concrete notion of patient re-identification.
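As a rough illustration of what a "seeker" does, here is a naive re-identification baseline under assumptions not in the challenge description: it flattens each candidate time series, measures its distance to the closest synthetic series, and guesses that the closest candidates were in the hider's training set. The shapes and data are toy stand-ins, not the AmsterdamUMCdb data or any competition baseline.

```python
# Sketch of a naive nearest-neighbour "seeker" for membership guessing on time series.
import numpy as np

def seeker_guesses(candidates: np.ndarray, synthetic: np.ndarray, n_guesses: int) -> np.ndarray:
    """candidates, synthetic: arrays of shape (n_series, n_timesteps, n_features)."""
    cand_flat = candidates.reshape(len(candidates), -1)
    syn_flat = synthetic.reshape(len(synthetic), -1)
    # Distance from each candidate series to its nearest synthetic series.
    dists = np.linalg.norm(cand_flat[:, None, :] - syn_flat[None, :, :], axis=-1).min(axis=1)
    # Guess that the candidates closest to the synthetic data were training members.
    return np.argsort(dists)[:n_guesses]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic = rng.normal(size=(50, 20, 4))
    candidates = rng.normal(size=(100, 20, 4))
    print(seeker_guesses(candidates, synthetic, n_guesses=10))
```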
ASAC: Active Sensing using Actor-Critic models
Yoon, Jinsung, Jordon, James, van der Schaar, Mihaela
Deciding what and when to observe is critical when making observations is costly. In a medical setting where observations can be made sequentially, making these observations (or not) should be an active choice. We refer to this as the active sensing problem. In this paper, we propose a novel deep learning framework, which we call ASAC (Active Sensing using Actor-Critic models), to address this problem. ASAC consists of two networks: a selector network and a predictor network. The selector network uses previously selected observations to determine what should be observed in the future. The predictor network uses the observations selected by the selector network to predict a label, providing feedback to the selector network (well-selected variables should be predictive of the label). The goal of the selector network is then to select variables that balance the cost of observing them against their predictive power; we wish to preserve the conditional label distribution. During training, we use the actor-critic models to allow the loss of the selector to be "back-propagated" through the sampling process. The selector network "acts" by selecting future observations to make. The predictor network acts as a "critic" by feeding predictive errors for the selected variables back to the selector network. In our experiments, we show that ASAC significantly outperforms state-of-the-art methods on two real-world medical datasets.
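The following is a heavily simplified, single-step sketch of the selector/predictor interplay: the selector samples which features to observe, the predictor predicts the label from the masked features, and the selector is updated with a REINFORCE-style score-function gradient that trades prediction loss against a per-feature observation cost. This stands in for ASAC's recurrent, actor-critic formulation; the cost value, network sizes, and toy data are assumptions.

```python
# Simplified selector/predictor sketch with a score-function gradient through the sampling.
import torch
import torch.nn as nn

n_features, cost = 10, 0.05
# Simplification: the selector sees the full feature vector here; in ASAC it
# conditions only on previously selected observations.
selector = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_features))
predictor = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(list(selector.parameters()) + list(predictor.parameters()), lr=1e-3)

x = torch.randn(256, n_features)
y = (x[:, 0] + x[:, 1] > 0).float().unsqueeze(1)   # toy label depending on two features

for _ in range(200):
    probs = torch.sigmoid(selector(x))              # per-feature selection probabilities
    mask = torch.bernoulli(probs)                   # sampled observation decisions
    pred = predictor(x * mask)                      # predictor only sees selected features
    pred_loss = nn.functional.binary_cross_entropy_with_logits(pred, y, reduction="none")
    # Selector "reward": low prediction loss, penalised by the cost of each observation.
    reward = -(pred_loss.detach() + cost * mask.sum(dim=1, keepdim=True))
    log_prob = (mask * torch.log(probs + 1e-8)
                + (1 - mask) * torch.log(1 - probs + 1e-8)).sum(dim=1, keepdim=True)
    loss = pred_loss.mean() - (reward * log_prob).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("features kept on average:", mask.mean().item())
```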
Lifelong Bayesian Optimization
Zhang, Yao, Jordon, James, Alaa, Ahmed M., van der Schaar, Mihaela
Automatic Machine Learning (Auto-ML) systems tackle the problem of automating the design of prediction models or pipelines for data science. In this paper, we present Lifelong Bayesian Optimization (LBO), an online, multitask Bayesian optimization (BO) algorithm designed to solve the problem of model selection for datasets arriving and evolving over time. To be suitable for this lifelong setting, an algorithm needs to scale with the ever-increasing size of the dataset and should be able to leverage past optimizations when learning the current best model. We cast the problem of model selection as a black-box function optimization problem. In LBO, we exploit the correlation between functions by using components of previously learned functions to speed up the learning process for newly arriving datasets. Experiments on real and synthetic data show that LBO outperforms standard BO algorithms applied repeatedly to the data.
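To ground the idea of leveraging past optimizations, here is a minimal warm-start baseline: a Gaussian-process surrogate is seeded with (hyperparameter, score) pairs from a previous, similar task before running expected-improvement BO on the new one. Simply pooling evaluations assumes the two tasks are very similar; LBO instead models correlation between the tasks, so this is only an illustrative baseline, not the LBO algorithm.

```python
# Warm-started Bayesian optimisation sketch with an expected-improvement acquisition.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(gp, X, best):
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def objective_old(x): return -(x - 0.4) ** 2   # score surface on the old dataset (toy)
def objective_new(x): return -(x - 0.5) ** 2   # correlated surface on the new dataset (toy)

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 200).reshape(-1, 1)

# Past evaluations from the previous task, reused to warm-start the surrogate.
X_hist = rng.uniform(0, 1, size=(8, 1))
y_hist = objective_old(X_hist).ravel()

X_obs, y_obs = X_hist.copy(), y_hist.copy()
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(5):                              # a few BO steps on the new dataset
    gp.fit(X_obs, y_obs)
    x_next = grid[np.argmax(expected_improvement(gp, grid, y_obs.max()))]
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective_new(x_next)[0])

print("best hyperparameter found:", X_obs[np.argmax(y_obs)])
```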
Measuring the Quality of Synthetic Data for Use in Competitions
Jordon, James, Yoon, Jinsung, van der Schaar, Mihaela
Machine learning has the potential to assist many communities in using the large datasets that are becoming more and more available. Unfortunately, much of that potential is not being realized because it would require sharing data in a way that compromises privacy. In order to overcome this hurdle, several methods have been proposed that generate synthetic data while preserving the privacy of the real data. In this paper we consider a key characteristic that synthetic data should have in order to be useful for machine learning researchers: the relative performance of two algorithms when trained and tested on the synthetic dataset should be the same as their relative performance when trained and tested on the original dataset.
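The following sketch makes that characteristic operational in a simple way: several classifiers are trained and tested on the real data and on the synthetic data, and the agreement between the two performance rankings is measured with Spearman rank correlation. The rank-correlation score, the classifiers chosen, and the noisy-copy "synthetic" data are illustrative assumptions; the paper's exact metric may differ.

```python
# Sketch: compare algorithm rankings obtained on real vs synthetic data.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def algorithm_scores(X_train, y_train, X_test, y_test):
    models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(),
              KNeighborsClassifier(), GaussianNB()]
    return [m.fit(X_train, y_train).score(X_test, y_test) for m in models]

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_real_tr, X_real_te, y_real_tr, y_real_te = train_test_split(X, y, random_state=0)

# Stand-in "synthetic" data: a noisy copy of the real training set.
rng = np.random.default_rng(0)
X_syn = X_real_tr + rng.normal(scale=0.3, size=X_real_tr.shape)
X_syn_tr, X_syn_te, y_syn_tr, y_syn_te = train_test_split(X_syn, y_real_tr, random_state=0)

real_scores = algorithm_scores(X_real_tr, y_real_tr, X_real_te, y_real_te)
syn_scores = algorithm_scores(X_syn_tr, y_syn_tr, X_syn_te, y_syn_te)
# High rank correlation means the synthetic data preserves relative algorithm performance.
corr, _ = spearmanr(real_scores, syn_scores)
print("rank agreement:", corr)
```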
GAIN: Missing Data Imputation using Generative Adversarial Nets
Yoon, Jinsung, Jordon, James, van der Schaar, Mihaela
We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.
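Below is a minimal PyTorch sketch of the generator/discriminator/hint structure described above. The layer sizes, hint rate, reconstruction weight, and toy data are illustrative assumptions; this is not the authors' reference implementation.

```python
# GAIN-style sketch: generator imputes, discriminator guesses which entries were imputed,
# and a hint vector reveals part of the true mask to the discriminator.
import torch
import torch.nn as nn

d = 8                                               # number of features
G = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, d))
D = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, d))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

x = torch.rand(512, d)                              # toy "complete" data in [0, 1]
mask = (torch.rand_like(x) > 0.3).float()           # 1 = observed, 0 = missing
hint_rate = 0.9

for _ in range(200):
    noise = torch.rand_like(x)
    x_obs = mask * x + (1 - mask) * noise           # observed entries, noise elsewhere
    imputed = torch.sigmoid(G(torch.cat([x_obs, mask], dim=1)))
    completed = mask * x + (1 - mask) * imputed     # keep observed values, fill the rest

    # Hint vector: reveal the true mask for a random subset of components.
    hint_mask = (torch.rand_like(mask) < hint_rate).float()
    hint = mask * hint_mask + 0.5 * (1 - hint_mask)

    # Discriminator tries to recover which components were observed vs imputed.
    d_logits = D(torch.cat([completed.detach(), hint], dim=1))
    opt_d.zero_grad(); bce(d_logits, mask).backward(); opt_d.step()

    # Generator: fool D on the imputed components and reconstruct the observed ones.
    g_logits = D(torch.cat([completed, hint], dim=1))
    g_adv = bce(g_logits * (1 - mask), 1 - mask)
    g_rec = ((mask * (x - imputed)) ** 2).mean()
    opt_g.zero_grad(); (g_adv + 10.0 * g_rec).backward(); opt_g.step()

print("imputation MSE on missing entries:",
      (((1 - mask) * (x - imputed.detach())) ** 2).sum().item() / (1 - mask).sum().item())
```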
RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using Generative Adversarial Networks
Yoon, Jinsung, Jordon, James, van der Schaar, Mihaela
Training complex machine learning models for prediction often requires a large amount of data that is not always readily available. Leveraging external datasets from related but different sources is therefore an important task if good predictive models are to be built for deployment in settings where data can be rare. In this paper we propose a novel approach to the problem in which we use multiple GAN architectures to learn to translate from one dataset to another, thereby allowing us to effectively enlarge the target dataset and learn better predictive models than if we simply used the target dataset alone. We show the utility of such an approach, demonstrating that our method improves prediction performance on the target domain over using just the target dataset, and also show that our framework outperforms several other benchmarks on a collection of real-world medical datasets.
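The following is a minimal structural sketch, in PyTorch, of the dataset-to-dataset translation idea: each source has an encoder into a shared latent space and a decoder back out, so records can be translated between datasets and used to augment the target one. The adversarial and cycle-consistency losses used to train these maps are omitted, and the dataset names and dimensions are hypothetical.

```python
# Structural sketch of multi-dataset translation through a shared latent space.
import torch
import torch.nn as nn

latent_dim = 16
feature_dims = {"hospital_a": 12, "hospital_b": 20}   # hypothetical source/target schemas

encoders = nn.ModuleDict({
    name: nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, latent_dim))
    for name, d in feature_dims.items()
})
decoders = nn.ModuleDict({
    name: nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, d))
    for name, d in feature_dims.items()
})

def translate(x: torch.Tensor, source: str, target: str) -> torch.Tensor:
    """Map records from the source dataset into the target dataset's feature space."""
    return decoders[target](encoders[source](x))

# Augment the small target dataset with translated records from the other source.
x_a = torch.randn(100, feature_dims["hospital_a"])
x_b = torch.randn(30, feature_dims["hospital_b"])
x_b_augmented = torch.cat([x_b, translate(x_a, "hospital_a", "hospital_b")], dim=0)
print(x_b_augmented.shape)   # (130, 20): enlarged target dataset
```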
Deep-Treat: Learning Optimal Personalized Treatments From Observational Data Using Neural Networks
Atan, Onur (University of California Los Angeles) | Jordon, James (University of Oxford) | van der Schaar, Mihaela (University of Oxford)
We propose a novel approach for constructing effective treatment policies when the observed data is biased and lacks counterfactual information. Learning in settings where the observed data does not contain all possible outcomes for all treatments is difficult, since the observed data is typically biased due to existing clinical guidelines. This is an important problem in the medical domain, as collecting unbiased data is expensive and so learning from the wealth of existing biased data is a worthwhile task. Our approach separates the problem into two stages: first we reduce the bias by learning a representation map using a novel auto-encoder network, which allows us to control the trade-off between bias reduction and information loss; then we construct effective treatment policies on the transformed data using a novel feedforward network. Separating the problem into these two stages creates an algorithm that can be adapted to the problem at hand: the bias-reduction step can be performed as a preprocessing step for other algorithms. We compare our algorithm against state-of-the-art algorithms on two semi-synthetic datasets and demonstrate that our algorithm achieves a significant improvement in performance.
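Below is a minimal two-stage sketch in PyTorch mirroring the structure described above: stage one learns an auto-encoder representation of the covariates (the paper's specific bias-reduction penalty, which trades bias against information loss, is omitted), and stage two fits a network on the learned representation to recommend treatments. The stage-two objective here is a simple direct-method stand-in (per-treatment outcome regression with an argmax recommendation), not the paper's policy network or objective; all dimensions and data are toy assumptions.

```python
# Two-stage sketch: (1) representation learning, (2) treatment recommendation on the representation.
import torch
import torch.nn as nn

n, d, latent, n_treatments = 512, 20, 8, 3
x = torch.randn(n, d)
treatment = torch.randint(0, n_treatments, (n,))   # observed (biased) treatment assignments
y = torch.randn(n)                                  # observed outcome under the assigned treatment

encoder = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, latent))
decoder = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, d))

# Stage 1: representation learning (reconstruction only in this sketch).
opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(200):
    recon = decoder(encoder(x))
    loss = ((recon - x) ** 2).mean()
    opt_ae.zero_grad(); loss.backward(); opt_ae.step()

# Stage 2: per-treatment outcome estimates on the frozen representation; recommend
# the treatment with the highest predicted outcome.
z = encoder(x).detach()
outcome_net = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, n_treatments))
opt = torch.optim.Adam(outcome_net.parameters(), lr=1e-3)
for _ in range(200):
    pred = outcome_net(z)                                       # predicted outcome per treatment
    pred_observed = pred.gather(1, treatment.unsqueeze(1)).squeeze(1)
    loss = ((pred_observed - y) ** 2).mean()                    # only the factual outcome is supervised
    opt.zero_grad(); loss.backward(); opt.step()

print("recommended treatments:", outcome_net(z).argmax(dim=1)[:10])
```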