Goto

Collaborating Authors

 Instructional Material


Accelerated Training for Matrix-norm Regularization: A Boosting Approach

Neural Information Processing Systems

Sparse learning models typically combine a smooth loss with a nonsmooth penalty, such as trace norm. Although recent developments in sparse approximation have offered promising solution methods, current approaches either apply only to matrix-norm constrained problems or provide suboptimal convergence rates. In this paper, we propose a boosting method for regularized learning that guarantees \epsilon accuracy within O(1/\epsilon) iterations. Performance is further accelerated by interlacing boosting with fixed-rank local optimization---exploiting a simpler local objective than previous work. The proposed method yields state-of-the-art performance on large-scale problems.


I would love this to be like an assistant, not the teacher: a voice of the customer perspective of what distance learning students want from an Artificial Intelligence Digital Assistant

arXiv.org Artificial Intelligence

With the release of Generative AI systems such as ChatGPT, an increasing interest in using Artificial Intelligence (AI) has been observed across domains, including higher education. While emerging statistics show the popularity of using AI amongst undergraduate students, little is yet known about students' perceptions regarding AI including self-reported benefits and concerns from their actual usage, in particular in distance learning contexts. Using a two-step, mixed-methods approach, we examined the perceptions of ten online and distance learning students from diverse disciplines regarding the design of a hypothetical AI Digital Assistant (AIDA). In the first step, we captured students' perceptions via interviews, while the second step supported the triangulation of data by enabling students to share, compare, and contrast perceptions with those of peers. All participants agreed on the usefulness of such an AI tool while studying and reported benefits from using it for real-time assistance and query resolution, support for academic tasks, personalisation and accessibility, together with emotional and social support. Students' concerns related to the ethical and social implications of implementing AIDA, data privacy and data use, operational challenges, academic integrity and misuse, and the future of education. Implications for the design of AI-tailored systems are also discussed.


A Regression Mixture Model to understand the effect of the Covid-19 pandemic on Public Transport Ridership

arXiv.org Artificial Intelligence

The Covid-19 pandemic drastically changed urban mobility, both during the height of the pandemic with government lockdowns, but also in the longer term with the adoption of working-from-home policies. To understand its effects on rail public transport ridership, we propose a dedicated Regression Mixture Model able to perform both the clustering of public transport stations and the segmentation of time periods, while ignoring variations due to additional variables such as the official lockdowns or non-working days. Each cluster is thus defined by a series of segments in which the effect of the exogenous variables is constant. As each segment within a cluster has its own regression coefficients to model the impact of the covariates, we analyze how these coefficients evolve to understand the changes in the cluster. We present the regression mixture model and the parameter estimation using the EM algorithm, before demonstrating the benefits of the model on both simulated and real data. Thanks to a five-year dataset of the ridership in the Paris public transport system, we analyze the impact of the pandemic, not only in terms of the number of travelers but also on the weekly commute. We further analyze the specific changes that the pandemic caused inside each cluster.


FedKit: Enabling Cross-Platform Federated Learning for Android and iOS

arXiv.org Artificial Intelligence

We present FedKit, a federated learning (FL) system tailored for cross-platform FL research on Android and iOS devices. FedKit pipelines cross-platform FL development by enabling model conversion, hardware-accelerated training, and cross-platform model aggregation. Our FL workflow supports flexible machine learning operations (MLOps) in production, facilitating continuous model delivery and training. We have deployed FedKit in a real-world use case for health data analysis on university campuses, demonstrating its effectiveness. FedKit is open-source at https://github.com/FedCampus/FedKit.


Black Box: a new podcast series about AI and us – trailer

The Guardian

In rural Norway, a young woman's boyfriend forgets who she is overnight. In Detroit, a man is arrested for a crime, but he was never there. In a Spanish town, disturbing pictures of young girls have appeared, but no one knows who is behind them. In this new series from the Guardian, we'll explore what it is that connects all these stories: the collision between people and artificial intelligence.


What's coming up at #AAAI2024?

AIHub

The 38th AAAI Conference on Artificial Intelligence (AAAI2024) will take place in Vancouver, and runs from Tuesday 20 February to Tuesday 27 February. Find out about some of the main events that are taking place throughout the conference. The following distinguished invited speakers will be presenting at this year's conference. We will be holding a science communication training session on Wednesday 21 February. This will comprise a one-hour talk (starting at 2pm), and a two-hour informal drop-in session (starting at 3pm).


QuRating: Selecting High-Quality Data for Training Language Models

arXiv.org Artificial Intelligence

Selecting high-quality pre-training data is important for creating capable language models, but existing methods rely on simple heuristics. We introduce QuRating, a method for selecting pre-training data that captures the abstract qualities of texts which humans intuitively perceive. In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value. We find that LLMs are able to discern these qualities and observe that they are better at making pairwise judgments of texts than at rating the quality of a text directly. We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B training corpus with quality ratings for each of the four criteria. In our experiments, we select 30B tokens according to the different quality ratings and train 1.3B-parameter language models on the selected data. We find that it is important to balance quality and diversity, as selecting only the highest-rated documents leads to poor results. When we sample using quality ratings as logits over documents, our models achieve lower perplexity and stronger in-context learning performance than baselines. Beyond data selection, we use the quality ratings to construct a training curriculum which improves performance without changing the training dataset. We extensively analyze the quality ratings and discuss their characteristics, biases, and wider implications.


What to Do When Your Discrete Optimization Is the Size of a Neural Network?

arXiv.org Artificial Intelligence

Oftentimes, machine learning applications using neural networks involve solving discrete optimization problems, such as in pruning, parameter-isolation-based continual learning and training of binary networks. Still, these discrete problems are combinatorial in nature and are also not amenable to gradient-based optimization. Additionally, classical approaches used in discrete settings do not scale well to large neural networks, forcing scientists and empiricists to rely on alternative methods. Among these, two main distinct sources of top-down information can be used to lead the model to good solutions: (1) extrapolating gradient information from points outside of the solution set (2) comparing evaluations between members of a subset of the valid solutions. We take continuation path (CP) methods to represent using purely the former and Monte Carlo (MC) methods to represent the latter, while also noting that some hybrid methods combine the two. The main goal of this work is to compare both approaches. For that purpose, we first overview the two classes while also discussing some of their drawbacks analytically. Then, on the experimental section, we compare their performance, starting with smaller microworld experiments, which allow more fine-grained control of problem variables, and gradually moving towards larger problems, including neural network regression and neural network pruning for image classification, where we additionally compare against magnitude-based pruning.


Ising on the Graph: Task-specific Graph Subsampling via the Ising Model

arXiv.org Artificial Intelligence

Reducing a graph while preserving its overall structure is an important problem with many applications. Typically, the reduction approaches either remove edges (sparsification) or merge nodes (coarsening) in an unsupervised way with no specific downstream task in mind. In this paper, we present an approach for subsampling graph structures using an Ising model defined on either the nodes or edges and learning the external magnetic field of the Ising model using a graph neural network. Our approach is task-specific as it can learn how to reduce a graph for a specific downstream task in an end-to-end fashion. The utilized loss function of the task does not even have to be differentiable.


MCMC-driven learning

arXiv.org Artificial Intelligence

This paper is intended to appear as a chapter for the Handbook of Markov Chain Monte Carlo. The goal of this chapter is to unify various problems at the intersection of Markov chain Monte Carlo (MCMC) and machine learning$\unicode{x2014}$which includes black-box variational inference, adaptive MCMC, normalizing flow construction and transport-assisted MCMC, surrogate-likelihood MCMC, coreset construction for MCMC with big data, Markov chain gradient descent, Markovian score climbing, and more$\unicode{x2014}$within one common framework. By doing so, the theory and methods developed for each may be translated and generalized.