
Collaborating Authors

 Osborne, Michael


Bayesian Quadrature for Neural Ensemble Search

arXiv.org Artificial Intelligence

Ensembling can improve the performance of Neural Networks, but existing approaches struggle when the architecture likelihood surface has dispersed, narrow peaks. Furthermore, existing methods construct equally weighted ensembles, and this is likely to be vulnerable to the failure modes of the weaker architectures. By viewing ensembling as approximately marginalising over architectures we construct ensembles using the tools of Bayesian Quadrature -- tools which are well suited to the exploration of likelihood surfaces with dispersed, narrow peaks. Additionally, the resulting ensembles consist of architectures weighted commensurate with their performance. We show empirically -- in terms of test likelihood, accuracy, and expected calibration error -- that our method outperforms state-of-the-art baselines, and verify via ablation studies that its components do so independently.
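
As a rough illustration of the weighting idea described above (a minimal sketch, not the paper's implementation), the Python snippet below combines per-architecture class probabilities with non-negative quadrature weights; both arrays are hypothetical placeholders standing in for a trained ensemble and a Bayesian Quadrature routine.

import numpy as np

# Hypothetical per-architecture class probabilities for one test input
# (rows: ensemble members, columns: classes) and non-negative quadrature
# weights returned by some Bayesian Quadrature routine over architectures.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2],
                  [0.1, 0.8, 0.1]])
bq_weights = np.array([0.6, 0.3, 0.1])

# Normalise the weights and form the weighted ensemble predictive,
# approximating the marginal p(y|x) = sum_a w_a p(y|x, a).
w = bq_weights / bq_weights.sum()
ensemble_pred = w @ probs
print(ensemble_pred)  # weighted mixture over classes, sums to 1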


Neural Architecture Search using Bayesian Optimisation with Weisfeiler-Lehman Kernel

arXiv.org Machine Learning

Bayesian optimisation (BO) has been widely used for hyperparameter optimisation, but its application in neural architecture search (NAS) is limited due to the non-continuous, high-dimensional and graph-like search spaces. Current approaches either rely on encoding schemes, which are not scalable to large architectures and ignore the implicit topological structure of architectures, or use graph neural networks, which require additional hyperparameter tuning and a large amount of observed data that is particularly expensive to obtain in NAS. We propose a neat BO approach for NAS, which combines the Weisfeiler-Lehman graph kernel with a Gaussian process surrogate to capture the topological structure of architectures, without having to explicitly define a Gaussian process over high-dimensional vector spaces. We also harness the interpretable features learnt via the graph kernel to guide the generation of new architectures. We demonstrate empirically that our surrogate model is scalable to large architectures and highly data-efficient; competing methods require 3 to 20 times more observations to achieve prediction performance as good as ours. We finally show that our method outperforms existing NAS approaches, achieving state-of-the-art results on NAS datasets.
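
To make the surrogate idea concrete, the sketch below implements a minimal Weisfeiler-Lehman subtree kernel over labelled architecture graphs and uses it as the covariance of a Gaussian process posterior mean over observed validation accuracies. The graph encoding, toy graphs and noise level are our own illustrative assumptions, not the authors' implementation.

import numpy as np
from collections import Counter

def wl_features(adj, labels, n_iter=2):
    """Counts of Weisfeiler-Lehman subtree patterns for one labelled graph.
    adj: dict node -> list of neighbours; labels: dict node -> operation label."""
    feats = Counter(labels.values())
    current = dict(labels)
    for _ in range(n_iter):
        new = {}
        for v in adj:
            neigh = sorted(current[u] for u in adj[v])
            new[v] = current[v] + "|" + ",".join(neigh)  # relabel by neighbourhood
        current = new
        feats.update(current.values())
    return feats

def wl_kernel(g1, g2, n_iter=2):
    f1, f2 = wl_features(*g1, n_iter), wl_features(*g2, n_iter)
    return sum(f1[k] * f2[k] for k in f1)  # dot product of subtree-pattern counts

# Toy architecture graphs: nodes labelled by operation type.
gA = ({0: [1], 1: [0, 2], 2: [1]}, {0: "conv", 1: "relu", 2: "conv"})
gB = ({0: [1], 1: [0, 2], 2: [1]}, {0: "conv", 1: "pool", 2: "conv"})
gC = ({0: [1], 1: [0, 2], 2: [1]}, {0: "conv", 1: "relu", 2: "pool"})

# GP posterior mean over observed validation accuracies, using the WL kernel
# as the GP covariance (a simplified sketch of a WL-kernel GP surrogate).
train, y = [gA, gB], np.array([0.92, 0.88])
K = np.array([[wl_kernel(a, b) for b in train] for a in train], dtype=float)
k_star = np.array([wl_kernel(gC, b) for b in train], dtype=float)
noise = 1e-3
mean = k_star @ np.linalg.solve(K + noise * np.eye(len(train)), y)
print(mean)  # predicted accuracy for the candidate architecture gC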


A Maximum Entropy approach to Massive Graph Spectra

arXiv.org Machine Learning

Graph spectral techniques for measuring graph similarity, or for learning the cluster number, require kernel smoothing. The kernel function and bandwidth are typically chosen in an ad-hoc manner and heavily affect the resulting output. We prove that kernel smoothing biases the moments of the spectral density. We propose an information-theoretically optimal approach to learn a smooth graph spectral density, which fully respects the moment information. Our method's computational cost is linear in the number of edges, and hence it can be applied to large networks with millions of nodes. We apply our method to the problems of graph similarity and cluster number learning, where we outperform comparable iterative spectral approaches on synthetic and real graphs.
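
The moment information mentioned above is the kind of quantity that can be estimated using only sparse matrix-vector products; the sketch below shows Hutchinson's stochastic trace estimator for spectral moments (illustrative only; the paper's estimator and matrix normalisation may differ).

import numpy as np

def spectral_moments(A, order=4, n_probe=30, rng=np.random.default_rng(0)):
    """Estimate the first `order` spectral moments E[lambda^k] of a symmetric
    matrix A via Hutchinson's estimator: tr(A^k) ~ mean over probes of z^T A^k z
    for Rademacher vectors z, so the k-th moment is tr(A^k) / n."""
    n = A.shape[0]
    moments = np.zeros(order)
    for _ in range(n_probe):
        z = rng.choice([-1.0, 1.0], size=n)
        v = z.copy()
        for k in range(order):
            v = A @ v                      # v = A^(k+1) z: only matrix-vector products
            moments[k] += z @ v
    return moments / (n_probe * n)

# Toy symmetric "network" matrix; in practice A would be a sparse (normalised)
# adjacency or Laplacian, and each A @ v costs O(number of edges).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(spectral_moments(A))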


Radial Bayesian Neural Networks: Robust Variational Inference In Big Models

arXiv.org Machine Learning

We propose Radial Bayesian Neural Networks: a variational distribution for mean field variational inference (MFVI) in Bayesian neural networks that is simple to implement, scalable to large models, and robust to hyperparameter selection. We hypothesize that standard MFVI fails in large models because of a property of the high-dimensional Gaussians used as posteriors. As variances grow, samples come almost entirely from a 'soap-bubble' far from the mean. We show that the ad-hoc tweaks used previously in the literature to get MFVI to work served to stop such variances growing. Designing a new posterior distribution, we avoid this pathology in a theoretically principled way. Our distribution improves accuracy and uncertainty over standard MFVI, while scaling to large data where most other VI and MCMC methods struggle. We benchmark Radial BNNs in a real-world task of diabetic retinopathy diagnosis from fundus images, a task with ~100x larger input dimensionality and model size compared to previous demonstrations of MFVI.
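
A minimal sketch of the sampling idea (our own illustration, not the authors' code): draw a uniformly random direction by normalising a Gaussian sample, then scale it by a single one-dimensional Gaussian radius, so that samples no longer concentrate on the high-dimensional 'soap-bubble' shell.

import numpy as np

def radial_sample(mu, sigma, rng=np.random.default_rng(0)):
    """One weight sample from a radial posterior: a uniformly random direction
    (a normalised Gaussian draw) scaled by a scalar Gaussian radius. Unlike a
    full high-dimensional Gaussian, the sample's distance from the mean does
    not concentrate in a thin shell as the dimension grows."""
    eps = rng.standard_normal(mu.shape)
    direction = eps / np.linalg.norm(eps)
    r = np.abs(rng.standard_normal())      # scalar radius, |N(0, 1)|
    return mu + sigma * direction * r

mu, sigma = np.zeros(10_000), np.ones(10_000)
print(np.linalg.norm(radial_sample(mu, sigma) - mu))  # ~ |N(0,1)|, not ~ sqrt(10_000)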


MEMe: An Accurate Maximum Entropy Method for Efficient Approximations in Large-Scale Machine Learning

arXiv.org Machine Learning

Making high-quality inference on large, feature-rich datasets under a constrained computational budget is arguably the primary goal of the learning community. This, however, comes with significant challenges. On the one hand, the exact computation of linear-algebraic quantities may be prohibitively expensive, such as that of the log determinant. On the other hand, an analytic expression for the quantity of interest may not exist at all, as is the case for the entropy of a Gaussian mixture model, and approximate methods are often both inefficient and inaccurate.
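
For context, the log-determinant mentioned above can be written in terms of the matrix's spectral density, which is the kind of identity a moment-matched density estimate can be plugged into (a standard identity, not a statement of the paper's specific algorithm):

\[ \log\det A \;=\; \sum_{i=1}^{n} \log\lambda_i \;=\; n \int \log(\lambda)\, p(\lambda)\, \mathrm{d}\lambda, \qquad p(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \delta(\lambda - \lambda_i), \]

so an estimate \(\hat p\) constrained to match the moments \(m_k = \tfrac{1}{n}\operatorname{tr}(A^k)\) yields the approximation \(\log\det A \approx n \int \log(\lambda)\, \hat p(\lambda)\, \mathrm{d}\lambda\).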


On the Limitations of Representing Functions on Sets

arXiv.org Machine Learning

Recent work on the representation of functions on sets has considered the use of summation in a latent space to enforce permutation invariance. In particular, it has been conjectured that the dimension of this latent space may remain fixed as the cardinality of the sets under consideration increases. However, we demonstrate that the analysis leading to this conjecture requires mappings which are highly discontinuous and argue that this is only of limited practical use. Motivated by this observation, we prove that an implementation of this model via continuous mappings (as provided by e.g. neural networks or Gaussian processes) actually imposes a constraint on the dimensionality of the latent space. Practical universal function representation for set inputs can only be achieved with a latent dimension at least the size of the maximum number of input elements.
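
For reference, the sum-decomposable model discussed above has the form (notation ours):

\[ f(X) \;=\; \rho\Big(\sum_{x \in X} \phi(x)\Big), \qquad \phi : \mathcal{X} \to \mathbb{R}^{N}, \quad \rho : \mathbb{R}^{N} \to \mathbb{R}, \]

and the abstract's conclusion is that, when \(\phi\) and \(\rho\) are required to be continuous, universal representation of functions on sets of size up to \(M\) needs a latent dimension \(N \ge M\).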


Batch Selection for Parallelisation of Bayesian Quadrature

arXiv.org Machine Learning

Integration over non-negative integrands is a central problem in machine learning (e.g. for model averaging, (hyper-)parameter marginalisation, and computing posterior predictive distributions). Bayesian Quadrature is a probabilistic numerical integration technique that performs promisingly when compared to traditional Markov Chain Monte Carlo methods. However, in contrast to easily-parallelised MCMC methods, Bayesian Quadrature methods have, thus far, been essentially serial in nature, selecting a single point to sample at each step of the algorithm. We deliver methods to select batches of points at each step, based upon those recently presented in the Batch Bayesian Optimisation literature. Such parallelisation significantly reduces computation time, especially when the integrand is expensive to sample.
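
One generic batch construction from the batch Bayesian Optimisation literature is greedy selection with local penalisation; the toy sketch below illustrates that style of batch selection (the acquisition function and penaliser here are hypothetical placeholders, not the paper's specific method).

import numpy as np

def select_batch(candidates, acquisition, batch_size=4, lengthscale=0.1):
    """Greedy batch selection by local penalisation: repeatedly take the best
    remaining candidate, then down-weight the acquisition near it so the batch
    spreads out rather than clustering on one peak."""
    acq = acquisition.copy()
    batch = []
    for _ in range(batch_size):
        i = int(np.argmax(acq))
        batch.append(candidates[i])
        penalty = np.exp(-0.5 * ((candidates - candidates[i]) / lengthscale) ** 2)
        acq = acq * (1.0 - penalty)       # suppress near-duplicate selections
    return np.array(batch)

# Toy 1-D example: candidates on a grid, acquisition with two peaks.
xs = np.linspace(0.0, 1.0, 201)
acq = np.exp(-((xs - 0.3) / 0.05) ** 2) + 0.8 * np.exp(-((xs - 0.7) / 0.05) ** 2)
print(select_batch(xs, acq))              # points drawn from both peaks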


Intersectionality: Multiple Group Fairness in Expectation Constraints

arXiv.org Artificial Intelligence

Group fairness is an important concern for machine learning researchers, developers, and regulators. However, the strictness to which models must be constrained to be considered fair is still under debate. The focus of this work is on constraining the expected outcome of subpopulations in kernel regression and, in particular, decision tree regression, with application to random forests, boosted trees and other ensemble models. While individual constraints were previously addressed, this work addresses concerns about incorporating multiple constraints simultaneously. The proposed solution does not affect the order of computational or memory complexity of the decision trees and is easily integrated into models post training.
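
In the regression setting described above, constraints of this kind can be written as linear equalities on the model's average output per subgroup, for example requiring each subgroup's average prediction to equal a common value \(c\), such as the population average (notation ours, not necessarily the paper's):

\[ \frac{1}{|G_i|} \sum_{x \in G_i} f(x) \;=\; c \qquad \text{for all groups } G_1, \dots, G_m, \]

i.e. each (possibly intersecting) subgroup contributes one linear constraint on the predictions, which is what allows several constraints to be imposed simultaneously.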


Equality Constrained Decision Trees: For the Algorithmic Enforcement of Group Fairness

arXiv.org Artificial Intelligence

Fairness, through its many forms and definitions, has become an important issue facing the machine learning community. In this work, we consider how to incorporate group fairness constraints in kernel regression methods. More specifically, we focus on examining the incorporation of these constraints in decision tree regression when cast as a form of kernel regression, with direct applications to random forests and boosted trees amongst other widespread popular inference techniques. We show that the order of memory and computational complexity is preserved for such models, and we bound the expected perturbation to the model in terms of the number of leaves of the trees. Importantly, the approach works on trained models and hence can be easily applied to models in current use.
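
The sketch below illustrates the post-training flavour of such a correction on a toy regression tree: the smallest Euclidean change to the leaf values that makes two groups' average predictions equal. All names and numbers are hypothetical, and this is a simplified stand-in for the paper's kernel-regression treatment, not its exact algorithm.

import numpy as np

# Toy trained regression tree, summarised by its leaf values and, for each
# training sample, its leaf index and protected-group label (0 or 1).
leaf_values = np.array([1.0, 2.0, 4.0])            # one prediction per leaf
leaf_of = np.array([0, 0, 1, 1, 2, 2])             # leaf index of each sample
group = np.array([0, 1, 0, 1, 0, 0])               # protected group of each sample

def equalise_group_means(leaf_values, leaf_of, group):
    """Smallest (Euclidean) change to the leaf values such that the two groups'
    average predictions become equal: project the leaf-value vector onto the
    single linear constraint c^T v = 0, where c encodes group shares per leaf."""
    n_leaves = leaf_values.size
    a = np.array([np.mean(leaf_of[group == 0] == j) for j in range(n_leaves)])
    b = np.array([np.mean(leaf_of[group == 1] == j) for j in range(n_leaves)])
    c = a - b
    return leaf_values - (c @ leaf_values) / (c @ c) * c

v_fair = equalise_group_means(leaf_values, leaf_of, group)
pred = v_fair[leaf_of]
print(pred[group == 0].mean(), pred[group == 1].mean())   # now equal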


Entropic Spectral Learning in Large Scale Networks

arXiv.org Machine Learning

We present a novel algorithm for learning the spectral density of large scale networks using stochastic trace estimation and the method of maximum entropy. The complexity of the algorithm is linear in the number of non-zero elements of the matrix, offering a computational advantage over other algorithms. We apply our algorithm to the problem of community detection in large networks. We show state-of-the-art performance on both synthetic and real datasets.
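
For context, the maximum-entropy step referred to above is the standard constrained problem (notation ours):

\[ \max_{p} \; -\int p(\lambda)\log p(\lambda)\,\mathrm{d}\lambda \quad \text{s.t.} \quad \int \lambda^{k} p(\lambda)\,\mathrm{d}\lambda = \mu_k, \; k = 0, \dots, K, \]

whose solution takes the exponential-family form \( p(\lambda) = \exp\!\big(-1 - \sum_{k=0}^{K} \alpha_k \lambda^{k}\big) \), with the multipliers \(\alpha_k\) chosen so that the moment constraints \(\mu_k\) (estimated here by stochastic trace estimation) are satisfied.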