

Generalizing Bayesian Optimization with Decision-theoretic Entropies
Willie Neiswanger

Neural Information Processing Systems

Bayesian optimization (BO) is a popular method for efficiently inferring optima of an expensive black-box function via a sequence of queries. Existing information-theoretic BO procedures aim to make queries that most reduce the uncertainty about optima, where the uncertainty is captured by Shannon entropy. However, an optimal measure of uncertainty would, ideally, factor in how we intend to use the inferred quantity in some downstream procedure. In this paper, we instead consider a generalization of Shannon entropy from work in statistical decision theory [13, 39], which contains a broad class of uncertainty measures parameterized by a problem-specific loss function corresponding to a downstream task. We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures such as knowledge gradient, expected improvement, and entropy search. We then show how alternative choices for the loss yield a flexible family of acquisition functions that can be customized for use in novel optimization settings.
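For context, the decision-theoretic entropy of [13, 39] is the Bayes risk of the best action under a task-specific loss. A minimal LaTeX statement in our own notation (the action space \mathcal{A} and loss \ell are placeholder symbols, not necessarily the paper's):

H_\ell(p) \;=\; \inf_{a \in \mathcal{A}} \; \mathbb{E}_{x \sim p}\bigl[\ell(x, a)\bigr]

Taking \mathcal{A} to be the set of densities and \ell(x, a) = -\log a(x), the infimum is attained at a = p, recovering Shannon entropy H(p) = \mathbb{E}_{x \sim p}[-\log p(x)]; other losses yield the task-tailored uncertainty measures the abstract describes.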


Invariant and Transportable Representations for Anti-Causal Domain Shifts
and Victor Veitch, Department of Computer Science and Department of Statistics, University of Chicago

Neural Information Processing Systems

Real-world classification problems must contend with domain shift, the (potential) mismatch between the domain where a model is deployed and the domain(s) where the training data were gathered. Methods for handling such problems must specify what structure is common between the domains and what varies. A natural assumption is that causal (structural) relationships are invariant across all domains. It is then tempting to learn a predictor for the label Y that depends only on its causal parents. However, many real-world problems are "anti-causal" in the sense that Y is a cause of the covariates X; in this case, Y has no causal parents and the naive causal invariance is useless.



A. The Embeddings

Neural Information Processing Systems

In this section, we briefly introduce the four kinds of embeddings that make up the fusion embedding. The goal of the position embedding module is to calibrate the position of each time point in the sequence, so that the self-attention mechanism can recognize the relative positions of different time points in the input sequence. We design the token embedding module to enrich the features of each time point by fusing features from adjacent time points within a certain interval. The role of the spatial embedding is to locate and encode the spatial locations of different nodes, so that each node at a different location possesses a unique spatial embedding. This enables the model to identify nodes in different spatial and temporal planes after the dimensionality is compressed in later computation.
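A minimal PyTorch sketch of how such a fusion embedding might combine the three embeddings described above (module names, tensor shapes, and the convolutional token embedding are our assumptions; the fourth embedding is not described in this excerpt):

import torch
import torch.nn as nn

class FusionEmbedding(nn.Module):
    # Sketch of a fusion embedding combining the three embeddings named
    # above; names and shapes are assumptions, not the paper's design.
    def __init__(self, d_model, num_nodes, max_len=5000, kernel_size=3):
        super().__init__()
        # Position embedding: one learnable vector per time step, so
        # self-attention can recover relative order.
        self.position = nn.Embedding(max_len, d_model)
        # Token embedding: a 1-D convolution fuses each time point with
        # its neighbors within a fixed interval (the kernel width).
        self.token = nn.Conv1d(1, d_model, kernel_size, padding=kernel_size // 2)
        # Spatial embedding: one learnable vector per node location.
        self.spatial = nn.Embedding(num_nodes, d_model)

    def forward(self, x, node_ids):
        # x: (batch, seq_len) univariate series; node_ids: (batch,)
        b, t = x.shape
        tok = self.token(x.unsqueeze(1)).transpose(1, 2)       # (b, t, d)
        pos = self.position(torch.arange(t, device=x.device))  # (t, d)
        spa = self.spatial(node_ids).unsqueeze(1)              # (b, 1, d)
        return tok + pos.unsqueeze(0) + spa                    # (b, t, d)

Summing the three terms gives every time point a representation that encodes its temporal order, its local context, and its node identity at once.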


Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Neural Information Processing Systems

Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step size, and we propose insightful step-size-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.
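To make the SGDA update concrete, here is a minimal NumPy sketch on a toy strongly-convex-strongly-concave game. The objective, noise model, and step size are our assumptions; the code illustrates only the plain constant-step-size behavior the abstract analyzes (convergence to a neighborhood of the solution), not SCO or the paper's analysis:

import numpy as np

rng = np.random.default_rng(0)

# Toy game:  min_x max_y  f(x, y) = 0.5 * x**2 + x * y - 0.5 * y**2,
# with additive noise on each gradient query.
def stochastic_grads(x, y):
    gx = x + y + 0.1 * rng.standard_normal()  # noisy df/dx
    gy = x - y + 0.1 * rng.standard_normal()  # noisy df/dy
    return gx, gy

x, y = 5.0, -3.0
eta = 0.1  # constant step size
for _ in range(2000):
    gx, gy = stochastic_grads(x, y)
    x -= eta * gx  # descent step for the min player
    y += eta * gy  # ascent step for the max player
print(f"final iterate near the solution (0, 0): x={x:.3f}, y={y:.3f}")

With a constant step size the iterates settle into a noise ball around the saddle point (0, 0), which is exactly the "neighborhood of the solution" behavior that the proposed step-size-switching rules are designed to remove.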


Explicit Regularisation in Gaussian Noise Injections

Neural Information Processing Systems

We study the regularisation induced in neural networks by Gaussian noise injections (GNIs). Though such injections have been extensively studied when applied to data, there have been few studies on understanding the regularising effect they induce when applied to network activations. Here we derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise, and show that it penalises functions with high-frequency components in the Fourier domain, particularly in layers closer to a neural network's output. We show analytically and empirically that such regularisation produces calibrated classifiers with large classification margins.
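As a minimal sketch of the mechanism being analysed, the following PyTorch block injects Gaussian noise into hidden activations at training time only; the layer sizes and sigma are our assumptions, and the explicit regulariser the abstract derives is not computed here:

import torch
import torch.nn as nn

class GNIBlock(nn.Module):
    # Linear-ReLU layer whose activations receive additive Gaussian
    # noise during training only (a sketch; sigma is an assumption).
    def __init__(self, d_in, d_out, sigma=0.1):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out)
        self.sigma = sigma

    def forward(self, h):
        h = torch.relu(self.fc(h))
        if self.training:
            # Injecting noise on activations is the GNI whose
            # marginalisation yields the Fourier-domain penalty.
            h = h + self.sigma * torch.randn_like(h)
        return h

net = nn.Sequential(GNIBlock(32, 64), GNIBlock(64, 64), nn.Linear(64, 10))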


Double Bubble, Toil and Trouble: Enhancing Certified Robustness through Transitivity
Andrew C. Cullen, Paul Montague, Sarah M. Erfani

Neural Information Processing Systems

In response to subtle adversarial examples flipping classifications of neural network models, recent research has promoted certified robustness as a solution. There, invariance of predictions to all norm-bounded attacks is achieved through randomised smoothing of network inputs. Today's state-of-the-art certifications make optimal use of the class output scores at the input instance under test: no better radius of certification (under the L
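For background on the certified radius the abstract refers to, here is a minimal NumPy/SciPy sketch of a standard randomised-smoothing L2 certificate in the style of Cohen et al. The sigma, sample count, and crude Hoeffding confidence bound are our assumptions, and this is the baseline certification, not this paper's transitivity-based enhancement:

import numpy as np
from scipy.stats import norm

def certified_radius(classify, x, sigma=0.25, n=1000, alpha=0.001):
    # classify: maps a batch of inputs to integer class labels.
    noise = sigma * np.random.randn(n, *x.shape)
    labels = classify(x[None, ...] + noise)    # predictions under noise
    top = np.bincount(labels).argmax()         # majority class
    p_hat = np.mean(labels == top)
    # Crude Hoeffding lower confidence bound on the top-class probability.
    p_lo = p_hat - np.sqrt(np.log(1.0 / alpha) / (2.0 * n))
    if p_lo <= 0.5:
        return top, 0.0                        # abstain: no certificate
    return top, sigma * norm.ppf(p_lo)         # certified L2 radius

The certificate grows with the smoothed classifier's confidence under noise, which is why certifications that make better use of the class output scores yield larger radii.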


Supplementary Material: A. Dataset Detail

Neural Information Processing Systems

Since DSLR and Webcam do not have many examples, we conduct experiments on the D to A, W to A, A to C (Caltech), D to C, and W to C shifts. The setting is the same as in (11). The second benchmark dataset is OfficeHome (OH) (12), which contains four domains and 65 classes. The third dataset is VisDA (9), which contains 12 classes shared across two domains, synthetic and real images. The synthetic domain consists of 152,397 synthetic 2D renderings of 3D objects, and the real domain consists of 55,388 real images.


Few-shot Image Generation with Elastic Weight Consolidation: Supplementary Material

Neural Information Processing Systems

In this supplementary material, we present more few-shot generation results, evaluated extensively on different artistic domains where only a few examples are available in practice. The goal is to illustrate the effectiveness of the proposed method in generating diverse, high-quality results without overfitting to the few given examples. Figure 1 shows the generations of the source and target domains obtained by feeding the same latent code into the source and adapted models. It clearly shows that while the adaptation renders the new appearance of the target domain, other attributes, such as pose, glasses, and hairstyle, are well inherited and preserved from the source domain. For each target domain, we use only 10 examples for the adaptation and present 100 new results.
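For reference, elastic weight consolidation as commonly applied to few-shot adaptation penalizes deviation from the source weights in proportion to their estimated importance. In standard notation (our paraphrase; \lambda, the Fisher information F_i, and the source weights \theta_S are not taken from this paper):

\mathcal{L}(\theta) \;=\; \mathcal{L}_{\text{adapt}}(\theta) \;+\; \lambda \sum_i F_i \bigl(\theta_i - \theta_{S,i}\bigr)^2

Weights with large F_i are anchored to the source model, which is what preserves attributes such as pose and hairstyle while the remaining capacity adapts to the target appearance.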


Everything Unveiled at Google I/O 2025

Mashable

See all the highlights from Google's annual 2025 Developers Conference in Mountain View, California. Check out the latest updates from Android XR to Gemini Live, and more.