
Collaborating Authors: Bao


Federated Learning on Stochastic Neural Networks

Tang, Jingqiao, Bausback, Ryan, Bao, Feng, Archibald, Richard

arXiv.org Artificial Intelligence

Original Manuscript Submitted: 05/05/2025; Final Draft Received: mm/dd/yyyy

Federated learning is a machine learning paradigm that leverages edge computing on client devices to optimize models while maintaining user privacy by ensuring that local data remains on the device. However, since all data is collected by clients, federated learning is susceptible to latent noise in local datasets: factors such as limited measurement capabilities or human error may introduce inaccuracies in client data. To address this challenge, we propose using a stochastic neural network as the local model within the federated learning framework. Stochastic neural networks not only facilitate estimation of the true underlying states of the data but also enable quantification of the latent noise. We refer to our federated learning approach, which incorporates stochastic neural networks as local models, as Federated Stochastic Neural Networks. We present numerical experiments demonstrating the performance and effectiveness of our method, particularly in handling non-independent and identically distributed data.

KEY WORDS: Machine Learning, Federated Learning, Neural Network

1. INTRODUCTION

The fundamental principles of federated learning can be traced back to earlier advancements in distributed computing and privacy-preserving machine learning techniques. Before federated learning was introduced in McMahan et al. (2016), distributed machine learning primarily focused on executing training processes in parallel across multiple nodes within a data center. Notable frameworks, such as MapReduce (Dean and Ghemawat (2004)) and AllReduce, were designed to aggregate data from different computational units, perform global aggregation using predefined operators, and subsequently redistribute the outcomes to all participating units.
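
The abstract describes the aggregation protocol only at a high level. As a point of reference, here is a minimal sketch of the FedAvg-style communication round introduced in McMahan et al. (2016), which this work builds on: clients train locally and the server takes a data-weighted average. The `local_train` placeholder and all names and constants are illustrative stand-ins, not the authors' stochastic-neural-network implementation.

```python
# Minimal sketch of a FedAvg-style round (McMahan et al. 2016). The
# paper's local stochastic neural network update is abstracted behind
# `local_train`; everything here is illustrative, not the authors' code.
import numpy as np

def local_train(global_weights, data, lr=0.1, epochs=1):
    """Placeholder local update standing in for SGD on the local model."""
    w = global_weights.copy()
    for _ in range(epochs):
        # Fake a gradient that pulls toward the client's (noisy) data mean.
        grad = w - data.mean(axis=0)
        w -= lr * grad
    return w

def fedavg_round(global_weights, client_datasets):
    """One communication round: clients train locally, server averages."""
    sizes = np.array([len(d) for d in client_datasets], dtype=float)
    local_models = [local_train(global_weights, d) for d in client_datasets]
    # Weight each client's model by its share of the total data.
    shares = sizes / sizes.sum()
    return sum(s * m for s, m in zip(shares, local_models))

rng = np.random.default_rng(0)
clients = [rng.normal(loc=1.0, scale=s, size=(50, 3)) for s in (0.1, 0.5, 1.0)]
w = np.zeros(3)
for _ in range(20):
    w = fedavg_round(w, clients)
print(w)  # converges toward the shared mean despite per-client noise
```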


Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD

Knolle, Moritz, Dorfman, Robert, Ziller, Alexander, Rueckert, Daniel, Kaissis, Georgios

arXiv.org Artificial Intelligence

Differentially private SGD (DP-SGD) holds the promise of enabling the safe and responsible application of machine learning to sensitive datasets. However, DP-SGD only provides a biased, noisy estimate of a mini-batch gradient. This renders optimisation steps less effective and limits model utility as a result. With this work, we show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD. Here, we propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias. We show how to efficiently compute quantities needed for BAM to scale to large neural networks and highlight similarities to closely related methods such as Sharpness-Aware Minimisation. Finally, we provide empirical evidence that BAM not only reduces bias but also substantially improves privacy-utility trade-offs on the CIFAR-10, CIFAR-100, and ImageNet-32 datasets.
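
To make the "biased, noisy estimate" concrete, the sketch below implements the standard DP-SGD private gradient oracle: clip each per-sample gradient to a fixed norm, average, then add Gaussian noise. The clipping step is the bias source the abstract ties to per-sample gradient norms; BAM itself is not implemented here, and all names and constants are illustrative.

```python
# Minimal sketch of the DP-SGD private gradient oracle. Clipping scales
# down any per-sample gradient whose norm exceeds clip_norm, which is
# exactly what biases the estimate; the added Gaussian noise is unbiased.
import numpy as np

def private_gradient(per_sample_grads, clip_norm=1.0, noise_mult=1.0, rng=None):
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Per-sample clipping: the source of estimator bias.
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / norms)
    mean = clipped.mean(axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_sample_grads),
                       size=mean.shape)
    return mean + noise

rng = np.random.default_rng(0)
grads = rng.normal(0, 5.0, size=(256, 10))   # large per-sample norms
true_mean = grads.mean(axis=0)
priv = private_gradient(grads, clip_norm=1.0, rng=rng)
# The gap between `priv` and `true_mean` is dominated by clipping bias:
print(np.linalg.norm(priv - true_mean))
```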


Electronic Second Skins Are the Wearables of the Future

WIRED

The skin is the largest organ in our body, and also the most complex. Peer at it under a microscope and you'll see thousands of nerve endings that keep the brain connected to the outside world and allow us to feel touch, pressure, and pain. But when Zhenan Bao looks at it, she sees something else. For Bao, a chemical engineer focused on making polymers, the skin is not only a sensory organ, but also a material. One that, in her words, is flexible, but also stretchable, self-healing, and biodegradable.


PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

Dong, Xiaoyi, Bao, Jianmin, Zhang, Ting, Chen, Dongdong, Zhang, Weiming, Yuan, Lu, Chen, Dong, Wen, Fang, Yu, Nenghai, Guo, Baining

arXiv.org Artificial Intelligence

This paper explores a better prediction target for BERT pre-training of vision transformers. We observe that current prediction targets disagree with human perception judgment. This contradiction motivates us to learn a perceptual prediction target. We argue that perceptually similar images should stay close to each other in the prediction target space. We surprisingly find one simple yet effective idea: enforcing perceptual similarity during the dVAE training. Moreover, we adopt a self-supervised transformer model for deep feature extraction and show that it works well for calculating perceptual similarity. We demonstrate that such learned visual tokens indeed exhibit better semantic meanings, and help pre-training achieve superior transfer performance in various downstream tasks. For example, we achieve 84.5% Top-1 accuracy on ImageNet-1K with a ViT-B backbone, outperforming the competitive method BEiT by +1.3% under the same pre-training epochs. Our approach also achieves significant improvements on object detection and segmentation on COCO and semantic segmentation on ADE20K. Equipped with a larger ViT-H backbone, we achieve the state-of-the-art ImageNet accuracy (88.3%) among methods using only ImageNet-1K data.
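
The abstract's central move is adding a perceptual-similarity term to the dVAE training loss, measured in the feature space of a deep network. The sketch below shows that loss structure with a randomly initialised stand-in extractor; PeCo uses features from a self-supervised vision transformer, and the extractor, loss weight `lam`, and all shapes here are assumptions for illustration only.

```python
# Sketch of a perceptual dVAE loss: pixel reconstruction plus a distance
# between deep features of the original and the reconstruction. The
# feature net below is a toy stand-in for PeCo's self-supervised ViT.
import torch
import torch.nn.functional as F

feature_net = torch.nn.Sequential(   # stand-in for a pretrained extractor
    torch.nn.Conv2d(3, 16, 3, stride=2, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 32, 3, stride=2, padding=1),
).eval()
for p in feature_net.parameters():
    p.requires_grad_(False)          # the extractor is fixed, not trained

def perceptual_dvae_loss(original, reconstruction, lam=1.0):
    # Pixel-level reconstruction term, as in a standard dVAE.
    pixel_loss = F.mse_loss(reconstruction, original)
    # Perceptual term: distance between deep features of the two images.
    with torch.no_grad():
        target_feats = feature_net(original)
    recon_feats = feature_net(reconstruction)
    perceptual_loss = F.mse_loss(recon_feats, target_feats)
    return pixel_loss + lam * perceptual_loss

x = torch.rand(2, 3, 64, 64)
x_hat = torch.rand(2, 3, 64, 64, requires_grad=True)
loss = perceptual_dvae_loss(x, x_hat)
loss.backward()  # gradients flow through both the pixel and feature terms
```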


Teaching robots to touch

#artificialintelligence

Fork in hand, a robot arm skewers a strawberry from above and delivers it to Tyler Schrenk's mouth. Sitting in his wheelchair, Schrenk nudges his neck forward to take a bite. Next, the arm goes for a slice of banana, then a carrot. It performs each motion by itself, on Schrenk's spoken command. For Schrenk, who became paralysed from the neck down after a diving accident in 2012, such a device would make a huge difference in his daily life if it were in his home. "Getting used to someone else feeding me was one of the strangest things I had to transition to," he says.


Bao

AAAI Conferences

Combining Answer Set Programming (ASP) and Constraint Logic Programming (CLP) can create a more powerful language for knowledge representation and reasoning. The language AC(C) is designed to integrate ASP and CLP. Compared with existing integrations of ASP and CSP, AC(C) allows representing user-defined constraints. Such integration provides great power for applications requiring logical reasoning involving constraints, e.g., temporal planning. In AC(C), user-defined and primitive constraints can be solved by a CLP inference engine, while the logical reasoning over those constraints and regular logic literals is handled by an ASP inference engine (i.e., a solver). My PhD work includes improving the language AC(C), implementing a faster inference engine for it, and investigating how effectively the new system can solve a challenging application: temporal planning.
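
As a rough picture of the division of labour described above, the toy sketch below enumerates candidate answer sets with a hand-rolled "logic engine" and filters each candidate through a separate numeric-constraint check standing in for the CLP engine. The atoms, rule, and deadline constraint are invented for illustration and bear no relation to a real AC(C) program.

```python
# Toy illustration of the two-engine split: logic rules decide which
# atom combinations are admissible; a separate constraint check (the
# CLP stand-in) prunes those that violate numeric constraints.
from itertools import product

atoms = ["use_truck", "use_plane"]           # choice atoms

def logic_ok(model):
    # ASP-side rule: exactly one transport mode must be chosen.
    return model["use_truck"] != model["use_plane"]

def constraints_ok(model):
    # CLP-side user-defined constraint: arrival within a 5-hour deadline.
    travel_time = 10 if model["use_truck"] else 2
    loading_time = 1 if model["use_plane"] else 0
    return travel_time + loading_time <= 5

answer_sets = []
for values in product([False, True], repeat=len(atoms)):
    model = dict(zip(atoms, values))
    if logic_ok(model) and constraints_ok(model):
        answer_sets.append(model)

print(answer_sets)   # [{'use_truck': False, 'use_plane': True}]
```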


Wooden robot arm is powered by plastic muscles

New Scientist

A polymer that changes shape when heated can lift objects 5000 times its own weight, with potential applications in robotics. Shape-memory polymers flip between their normal state, where molecules are flexible and disordered, and their deformed state, where the molecules bind after being stretched. Once in the stretched, deformed state, the polymer can be unstretched – resuming its "normal" state – by applying heat or light. However, traditional shape-memory polymers don't store significant amounts of energy while being stretched – meaning they don't release much energy while unstretching, which limits their use in tasks that involve lifting or moving objects. Zhenan Bao at Stanford University in California and her colleagues have now produced a shape-memory polymer that does store and release appreciable amounts of energy.


Nested sampling with any prior you like

Alsing, Justin, Handley, Will

arXiv.org Machine Learning

Nested sampling is an important tool for conducting Bayesian analysis in Astronomy and other fields, both for sampling complicated posterior distributions for parameter inference, and for computing marginal likelihoods for model comparison. One technical obstacle to using nested sampling in practice is the requirement (for most common implementations) that prior distributions be provided in the form of transformations from the unit hyper-cube to the target prior density. For many applications - particularly when using the posterior from one experiment as the prior for another - such a transformation is not readily available. In this letter we show that parametric bijectors trained on samples from a desired prior density provide a general-purpose method for constructing transformations from the uniform base density to a target prior, enabling the practical use of nested sampling under arbitrary priors. We demonstrate the use of trained bijectors in conjunction with nested sampling on a number of examples from cosmology.
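
The recipe can be illustrated with a simple stand-in for the paper's parametric bijectors: fit a per-dimension quantile map to samples from the desired prior, which yields exactly what nested sampling implementations require, a transformation from the unit hypercube to the target prior. Because it acts per dimension, this toy ignores the cross-parameter correlations that trained bijectors capture; the use of QuantileTransformer and the synthetic prior samples are assumptions for illustration.

```python
# Sample-based prior transform: map unit-hypercube points to a prior
# defined only by samples (e.g. a previous experiment's posterior).
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(1)
# Pretend these are posterior samples from an earlier analysis, to be
# reused as the prior for the next one.
prior_samples = rng.normal(loc=[0.3, 2.0], scale=[0.1, 0.5], size=(5000, 2))

qt = QuantileTransformer(output_distribution="uniform", n_quantiles=1000)
qt.fit(prior_samples)   # learns the per-dimension CDF of the prior

def prior_transform(u):
    """Map a point u in the unit hypercube to the target prior."""
    return qt.inverse_transform(np.atleast_2d(u))[0]

u = rng.uniform(size=2)
theta = prior_transform(u)   # a draw distributed per the sample-based prior
print(u, "->", theta)
```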


Data systems that learn to be better

Robohub

Big data has gotten really, really big: By 2025, all the world's data will add up to an estimated 175 trillion gigabytes. For a visual, if you stored that amount of data on DVDs, it would stack up tall enough to circle the Earth 222 times. One of the biggest challenges in computing is handling this onslaught of information while still being able to efficiently store and process it. A team from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) believes that the answer rests with something called "instance-optimized systems." Traditional storage and database systems are designed to work for a wide range of applications because of how long it can take to build them -- months or, often, several years.

