All strings generated by the CFG can be broken down into a (non-unique) tree of production rules with the non-terminal starting symbol S at its head. Although each individual production rule is a simple replacement operation, the combination of many such rules can specify a string space with complex syntactical constraints. However, when sampling strings from the grammar, we found this simple sampling strategy to produce long and repetitive strings. In fact, these tasks are considerably more challenging than the common benchmarks used to test standard BO frameworks. We tried SE kernels with both individual and tied length scales across latent dimensions; however, this did not have a significant effect on performance, possibly due to difficulties in estimating many kernel parameters in these low-data BO problems. This ranking matches the relative performance of the BO routines based on these surrogate models (Figure 7). Figure 7.d visualizes the intrinsic representation of an SSK when kernel parameters are purposely chosen to provide a bad fit.
Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels
Intersection over Union (IoU) losses are surrogates that directly optimize the Jaccard index. Leveraging IoU losses as part of the loss function has demonstrated superior performance in semantic segmentation tasks compared to optimizing pixel-wise losses such as the cross-entropy loss alone. However, we identify a lack of flexibility in these losses to support vital training techniques like label smoothing, knowledge distillation, and semi-supervised learning, mainly due to their inability to process soft labels. To address this, we introduce Jaccard Metric Losses (JMLs), which are identical to the soft Jaccard loss in standard settings with hard labels but are fully compatible with soft labels. We apply JMLs to three prominent use cases of soft labels: label smoothing, knowledge distillation and semi-supervised learning, and demonstrate their potential to enhance model accuracy and calibration. Our experiments show consistent improvements over the cross-entropy loss across 4 semantic segmentation datasets (Cityscapes, PASCAL VOC, ADE20K, DeepGlobe Land) and 13 architectures, including classic CNNs and recent vision transformers. Remarkably, our straightforward approach significantly outperforms state-of-the-art knowledge distillation and semi-supervised learning methods.
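A minimal sketch of the idea behind soft-label-compatible IoU losses, using the product-form soft Jaccard relaxation (an illustration of the principle, not necessarily the paper's exact JML formulation):

```python
import numpy as np

def soft_jaccard_loss(probs, labels, eps=1e-8):
    """Soft Jaccard (IoU) loss for one class.

    probs:  predicted per-pixel probabilities, shape (N,)
    labels: targets in [0, 1]; hard (0/1) or soft (e.g. smoothed)

    With hard labels this is the usual soft Jaccard loss; because it
    uses only products and sums, it also accepts soft labels directly.
    """
    intersection = np.sum(probs * labels)
    union = np.sum(probs) + np.sum(labels) - intersection
    return 1.0 - intersection / (union + eps)

# hard labels: behaves like the standard soft Jaccard loss
hard = soft_jaccard_loss(np.array([0.9, 0.1, 0.8]), np.array([1.0, 0.0, 1.0]))
# soft labels (e.g. from label smoothing) are handled identically
soft = soft_jaccard_loss(np.array([0.9, 0.1, 0.8]), np.array([0.95, 0.05, 0.95]))
```

A cross-entropy loss accepts soft labels for free; the point above is that a naive hard-label Jaccard surrogate does not, which is the gap JMLs close.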
When Does Optimizing a Proper Loss Yield Calibration?
Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition being that for such losses, the global optimum is to predict the ground-truth probabilities, which is indeed calibrated. However, typical machine learning models are trained to approximately minimize loss over restricted families of predictors, that are unlikely to contain the ground truth. Under what circumstances does optimizing proper loss over a restricted family yield calibrated models? What precise calibration guarantees does it give? In this work, we provide a rigorous answer to these questions. We replace the global optimality with a local optimality condition stipulating that the (proper) loss of the predictor cannot be reduced much by post-processing its predictions with a certain family of Lipschitz functions. We show that any predictor with this local optimality satisfies smooth calibration as defined in [Kakade and Foster, 2008, Błasiok et al., 2023]. Local optimality is plausibly satisfied by well-trained DNNs, which suggests an explanation for why they are calibrated from proper loss minimization alone. Finally, we show that the connection between local optimality and calibration error goes both ways: nearly calibrated predictors are also nearly locally optimal.
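The local-optimality condition in this abstract can be probed empirically: take a finite family of Lipschitz shift functions and measure how much post-processing the predictions reduces a proper loss. A hedged sketch (the shift family and step sizes are illustrative choices, not the paper's construction):

```python
import numpy as np

def loss(preds, labels):
    # squared loss: a proper loss for binary outcomes
    return np.mean((preds - labels) ** 2)

def post_process_gain(preds, labels, etas, alphas=(0.05, 0.1, 0.2)):
    """Largest squared-loss reduction achievable by post-processing
    v -> clip(v + alpha * eta(v)) over a small family of Lipschitz
    shifts eta -- a finite proxy for the Lipschitz post-processing
    family in the local-optimality condition."""
    base = loss(preds, labels)
    best = 0.0
    for eta in etas:
        for a in alphas:
            shifted = np.clip(preds + a * eta(preds), 0.0, 1.0)
            best = max(best, base - loss(shifted, labels))
    return best

rng = np.random.default_rng(0)
v = rng.uniform(size=5000)
y = (rng.uniform(size=5000) < v).astype(float)  # calibrated: P(y=1|v) = v
etas = [lambda t: np.ones_like(t), lambda t: -np.ones_like(t),
        lambda t: t - 0.5, lambda t: 0.5 - t]
gain = post_process_gain(v, y, etas)  # near 0 for this calibrated predictor
```

A systematically biased predictor (e.g. `v - 0.2`) yields a much larger gain, matching the abstract's claim that the connection runs both ways.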
Optimizing over trained GNNs via symmetry breaking
Optimization over trained machine learning models has applications including: verification, minimizing neural acquisition functions, and integrating a trained surrogate into a larger decision-making problem. This paper formulates and solves optimization problems constrained by trained graph neural networks (GNNs). To circumvent the symmetry issue caused by graph isomorphism, we propose two types of symmetry-breaking constraints: one indexing a node 0 and one indexing the remaining nodes by lexicographically ordering their neighbor sets. To guarantee that adding these constraints will not remove all symmetric solutions, we construct a graph indexing algorithm and prove that the resulting graph indexing satisfies the proposed symmetry-breaking constraints. For the classical GNN architectures considered in this paper, optimizing over a GNN with a fixed graph is equivalent to optimizing over a dense neural network. Thus, we study the case where the input graph is not fixed, implying that each edge is a decision variable, and develop two mixed-integer optimization formulations. To test our symmetry-breaking strategies and optimization formulations, we consider an application in molecular design.
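A toy sketch of the lexicographic idea: designate one node as node 0 and require the remaining nodes' neighbor sets to be ordered lexicographically (a simplified condition for illustration, not the paper's exact constraint set or indexing algorithm):

```python
import itertools

def neighbor_key(adj, v, order):
    # position-encoded neighbor set of v under a candidate ordering
    pos = {u: i for i, u in enumerate(order)}
    return sorted(pos[u] for u in adj[v])

def satisfies_symmetry_breaking(adj, order):
    """Check a simplified lexicographic symmetry-breaking condition:
    the nodes after node 0 (under `order`) must have non-decreasing
    neighbor keys. Illustrative only."""
    keys = [neighbor_key(adj, v, order) for v in order[1:]]
    return all(keys[i] <= keys[i + 1] for i in range(len(keys) - 1))

def find_valid_indexing(adj):
    # brute force over orderings; a stand-in for the paper's graph
    # indexing algorithm, which constructs a valid indexing directly
    for order in itertools.permutations(adj):
        if satisfies_symmetry_breaking(adj, order):
            return order
    return None

# path graph a-b-c: at least one indexing satisfies the constraint,
# so adding it does not remove every isomorphic copy of the graph
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
order = find_valid_indexing(adj)
```

The guarantee the paper proves plays the role of the existence check here: for every graph, some indexing survives the constraints.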
Is Sequence Information All You Need for Bayesian Optimization of Antibodies?
Ober, Sebastian W., McCarter, Calvin, Raghu, Aniruddh, Li, Yucen Lily, Amin, Alan N., Wilson, Andrew Gordon, Elliott, Hunter
Bayesian optimization is a natural candidate for the engineering of antibody therapeutic properties, which is often iterative and expensive. However, finding the optimal choice of surrogate model for optimization over the highly structured antibody space is difficult, and may differ depending on the property being optimized. Moreover, to the best of our knowledge, no prior works have attempted to incorporate structural information into antibody Bayesian optimization. In this work, we explore different approaches to incorporating structural information into Bayesian optimization, and compare them to a variety of sequence-only approaches on two different antibody properties, binding affinity and stability. In addition, we propose the use of a protein language model-based ``soft constraint,'' which helps guide the optimization to promising regions of the space. We find that certain types of structural information improve data efficiency in early optimization rounds for stability, but have equivalent peak performance. Moreover, when incorporating the protein language model soft constraint we find that the data efficiency gap is diminished for affinity and eliminated for stability, resulting in sequence-only methods that match the performance of structure-based methods, raising questions about the necessity of structure in Bayesian optimization for antibodies.
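A minimal sketch of how a language-model-based soft constraint could weight an acquisition function (the sigmoid gate, threshold `tau`, and temperature are hypothetical details for illustration, not the paper's formulation):

```python
import numpy as np

def constrained_acquisition(acq, loglik, tau=0.0, temp=1.0):
    """Weight a Bayesian-optimization acquisition value by a soft
    'naturalness' gate derived from a protein language model (PLM)
    log-likelihood: candidates the PLM scores below the threshold tau
    are smoothly down-weighted rather than hard-rejected."""
    gate = 1.0 / (1.0 + np.exp(-(np.asarray(loglik) - tau) / temp))
    return np.asarray(acq) * gate
```

The soft gate keeps the search differentiable-friendly and avoids discarding borderline sequences outright, which a hard likelihood cutoff would.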
A. Dynamic Programs for SSK Evaluations and Gradients

We now detail recursive calculation strategies for calculating k_n(a, b) and its gradients with O(nl
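As a concrete reference point for the recursions described above, the standard dynamic program for the string subsequence kernel (following Lodhi et al., 2002) can be sketched as follows; the gradient recursions with respect to the decay parameter are omitted:

```python
import numpy as np

def ssk(s, t, n, lam=0.5):
    """String subsequence kernel k_n(s, t): a decay-weighted count of
    (non-contiguous) length-n subsequences shared by s and t, computed
    with the standard O(n * |s| * |t|) dynamic program of Lodhi et al.
    (2002). lam in (0, 1] penalizes gappy occurrences."""
    ls, lt = len(s), len(t)
    # Kp[i, p, q] holds the auxiliary term K'_i on prefixes s[:p], t[:q]
    Kp = np.zeros((n, ls + 1, lt + 1))
    Kp[0, :, :] = 1.0
    for i in range(1, n):
        for p in range(i, ls + 1):
            acc = 0.0  # running sum over match positions in t (the K'' term)
            for q in range(i, lt + 1):
                acc *= lam
                if s[p - 1] == t[q - 1]:
                    acc += lam * lam * Kp[i - 1, p - 1, q - 1]
                Kp[i, p, q] = lam * Kp[i, p - 1, q] + acc
    # combine length-(n-1) prefix matches into the final kernel value
    k = 0.0
    for p in range(n, ls + 1):
        for q in range(n, lt + 1):
            if s[p - 1] == t[q - 1]:
                k += lam * lam * Kp[n - 1, p - 1, q - 1]
    return k
```

With lam = 0.5, ssk("cat", "car", 2) reduces to lam**4, the weight of the single shared length-2 subsequence "ca".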
A recursive strategy can efficiently calculate the contribution of a particular substring by pre-calculating the contributions of the smaller sub-strings contained within the target string.

Context-free grammars (CFGs) are 4-tuples G = (V, Σ, R, S) consisting of: a set of non-terminal symbols V; a set of terminal symbols Σ (also known as an alphabet); a set of production rules R; and a non-terminal starting symbol S from which all strings are generated. The CFG for the symbolic regression task of Section 5.3 is given by the following rules:

S → S '+' T | S '*' T | S '/' T | T
T → '(' S ')' | 'sin(' S ')' | 'exp(' S ')' | 'x' | '1' | '2' | '3'

We now provide implementation details for our GA acquisition function optimizers. The GA begins with a randomly sampled population and ends once the best string in the population stops improving between iterations (Algorithm 1). Although seemingly simple, our synthetic string optimization tasks of Section 5.1 are deceptively challenging. We now provide comprehensive experimental results across the synthetic string optimization tasks.
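The rule-expansion sampling strategy described above can be sketched as a short recursive sampler (a hypothetical encoding of the grammar; the depth cap is an illustrative termination device, not necessarily the paper's mechanism):

```python
import random

# One possible encoding of the symbolic-regression grammar: each
# non-terminal maps to its list of alternative right-hand sides.
GRAMMAR = {
    "S": [["S", "+", "T"], ["S", "*", "T"], ["S", "/", "T"], ["T"]],
    "T": [["(", "S", ")"], ["sin(", "S", ")"], ["exp(", "S", ")"],
          ["x"], ["1"], ["2"], ["3"]],
}

def sample_string(symbol="S", max_depth=10, rng=random):
    """Sample a string by recursively expanding production rules.
    Uniform rule choice tends to produce long, repetitive strings
    (the issue noted above); capping the depth forces termination."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    rules = GRAMMAR[symbol]
    if max_depth <= 0:
        # fall back to the shortest right-hand side to terminate
        rules = [min(rules, key=len)]
    rhs = rng.choice(rules)
    return "".join(sample_string(s, max_depth - 1, rng) for s in rhs)
```

Every sampled string corresponds to a derivation tree rooted at S, matching the tree decomposition described earlier.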
Improving Model Classification by Optimizing the Training Dataset
Tukan, Morad, Mualem, Loay, Netzer, Eitan, Sigalat, Liran
In the era of data-centric AI, the ability to curate high-quality training data is as crucial as model design. Coresets offer a principled approach to data reduction, enabling efficient learning on large datasets through importance sampling. However, conventional sensitivity-based coreset construction often falls short in optimizing for classification performance metrics, e.g., $F1$ score, focusing instead on loss approximation. In this work, we present a systematic framework for tuning the coreset generation process to enhance downstream classification quality. Our method introduces new tunable parameters--including deterministic sampling, class-wise allocation, and refinement via active sampling--beyond traditional sensitivity scores. Through extensive experiments on diverse datasets and classifiers, we demonstrate that tuned coresets can significantly outperform both vanilla coresets and full dataset training on key classification metrics, offering an effective path towards better and more efficient model training.
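An illustrative sketch of the sensitivity-based importance sampling this framework builds on (the loss-proportional sensitivity proxy here is a simplification of proper sensitivity bounds):

```python
import numpy as np

def sensitivity_coreset(X, losses, m, rng=None):
    """Importance-sample a weighted coreset of size m.

    Uses per-point 'sensitivity' proxies (here: each point's share of
    the total loss). Sampling probability is proportional to
    sensitivity; each sampled point gets weight 1 / (m * p_i) so that
    weighted sums over the coreset stay unbiased estimates of sums
    over the full dataset.
    """
    rng = rng or np.random.default_rng()
    sens = losses / losses.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=sens)
    weights = 1.0 / (m * sens[idx])
    return X[idx], weights

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))
losses = np.abs(X[:, 0]) + 0.1          # hypothetical per-point losses
C, w = sensitivity_coreset(X, losses, m=500, rng=rng)
# weighted coreset loss matches the full-data total loss
est = np.sum(w * (np.abs(C[:, 0]) + 0.1))
```

The estimator is exact here because the sampling distribution is proportional to the very quantity being summed; for other metrics (e.g. $F1$) no such proportionality holds, which is the gap the tunable parameters above target.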
Quick-Draw Bandits: Quickly Optimizing in Nonstationary Environments with Extremely Many Arms
Everett, Derek, Lu, Fred, Raff, Edward, Camacho, Fernando, Holt, James
Canonical algorithms for multi-armed bandits typically assume a stationary reward environment where the size of the action space (number of arms) is small. More recently developed methods typically relax only one of these assumptions: existing non-stationary bandit policies are designed for a small number of arms, while Lipschitz, linear, and Gaussian process bandit policies are designed to handle a large (or infinite) number of arms in stationary reward environments under constraints on the reward function. In this manuscript, we propose a novel policy to learn reward environments over a continuous space using Gaussian interpolation. We show that our method efficiently learns continuous Lipschitz reward functions with $\mathcal{O}^*(\sqrt{T})$ cumulative regret. Furthermore, our method naturally extends to non-stationary problems with a simple modification. We finally demonstrate that our method is computationally favorable (100-10000x faster) and experimentally outperforms sliding Gaussian process policies on datasets with non-stationarity and an extremely large number of arms.
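A hedged sketch of a bandit policy built on Gaussian-kernel interpolation with a recency discount; the exact interpolation and exploration terms here are illustrative stand-ins, not the paper's policy:

```python
import numpy as np

def gaussian_ucb_policy(arms, hist_x, hist_r, h=0.1, beta=2.0, gamma=1.0):
    """Pick the next arm on a continuous space via Gaussian-kernel
    interpolation of past rewards plus an exploration bonus.
    gamma < 1 discounts old observations, a simple modification for
    non-stationary rewards."""
    if len(hist_x) == 0:
        return arms[np.random.randint(len(arms))]
    x = np.asarray(hist_x)          # past arm locations, shape (t,)
    r = np.asarray(hist_r)          # past rewards, shape (t,)
    age = np.arange(len(x))[::-1]   # 0 = most recent observation
    w = gamma ** age                # recency discount
    K = np.exp(-((arms[:, None] - x[None, :]) ** 2) / (2 * h * h)) * w
    dens = K.sum(axis=1)
    mean = (K @ r) / np.maximum(dens, 1e-12)   # kernel-smoothed reward
    bonus = beta / np.sqrt(1.0 + dens)         # explore sparse regions
    return arms[np.argmax(mean + bonus)]

arms = np.linspace(0.0, 1.0, 101)
# with flat rewards observed only at the endpoints, the bonus steers
# the policy toward the least-explored region (the middle)
choice = gaussian_ucb_policy(arms, [0.0] * 20 + [1.0] * 20, [0.5] * 40)
```

Each decision is a single vectorized pass over the history, which is the kind of cheap update behind the speed comparison against full Gaussian process posteriors.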
Optimizing over Multiple Distributions under Generalized Quasar-Convexity Condition
We study a typical optimization model where the optimization variable is composed of multiple probability distributions. Though the model appears frequently in practice, such as in policy problems, it lacks specific analysis in the general setting. For this optimization problem, we propose a new structural condition/landscape description named generalized quasar-convexity (GQC) beyond the realms of convexity. In contrast to original quasar-convexity \citep{hinder2020near}, GQC allows an individual quasar-convex parameter $\gamma_i$ for each variable block $i$, where a smaller $\gamma_i$ implies less block-convexity. To minimize the objective function, we consider a generalized oracle termed the internal function, which includes the standard gradient oracle as a special case.
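For orientation, original quasar-convexity and a schematic per-block generalization can be written as follows (the second display is our paraphrase of the description above, not the paper's exact statement):

```latex
% Quasar-convexity (Hinder et al., 2020): f is \gamma-quasar-convex
% with respect to a minimizer x^* if, for all x,
f(x^*) \;\ge\; f(x) + \frac{1}{\gamma}\,\langle \nabla f(x),\, x^* - x \rangle .
% A per-block generalization in the spirit of GQC assigns each
% variable block x_i its own parameter \gamma_i:
f(x^*) \;\ge\; f(x) + \sum_i \frac{1}{\gamma_i}\,\langle \nabla_{x_i} f(x),\, x_i^* - x_i \rangle .
```

Under this reading, the condition degrades gracefully: blocks with larger $\gamma_i$ contribute weaker lower bounds, consistent with "a smaller $\gamma_i$ implies less block-convexity" above.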
Optimizing the Privacy-Utility Balance using Synthetic Data and Configurable Perturbation Pipelines
Sharma, Anantha, Devabhaktuni, Swetha, Mohan, Eklove
The Banking, Financial Services, and Insurance (BFSI) sector operates on vast volumes of highly sensitive customer data, creating an enduring tension between the drive for data-driven insights and the imperative to comply with strict privacy and security regulations such as GDPR [1] and CCPA [2]. Traditional anonymization methods like masking, aggregation, k-anonymity, L-diversity, and T-closeness often degrade data quality to the point where sophisticated analytics, fraud detection, risk modeling, and machine learning applications suffer significant performance drops. Moreover, these legacy approaches can remain vulnerable to linkage and inference attacks, undermining both privacy guarantees and competitive innovation in financial institutions. The need for advanced techniques that can create privacy-preserving datasets without sacrificing analytical utility is paramount. In response, advanced techniques for creating privacy-preserving datasets have emerged, broadly categorized as purely synthetic data generation and advanced data perturbation. Purely synthetic data, often created using deep generative models (like GANs), aims to capture the statistical patterns of real data without any one-to-one mapping to real individuals. Advanced data perturbation applies carefully calibrated noise, transformations, and privacy-enhancing techniques like differential privacy to original datasets, seeking to obscure sensitive information while retaining analytical value. These methods can include context-aware transformations, where the nature of the data and its intended use inform the perturbation strategy, ensuring that the resulting dataset remains useful for specific tasks. However, the challenge remains to balance privacy and utility effectively. Traditional methods often fail to provide sufficient privacy guarantees or result in datasets that are too noisy for practical use.
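One standard building block of such calibrated-noise perturbation is the Laplace mechanism from differential privacy; a minimal sketch (the per-record application to a hypothetical balances column is illustrative, not the paper's pipeline):

```python
import numpy as np

def laplace_perturb(values, epsilon, sensitivity=1.0, rng=None):
    """Classic Laplace mechanism: add noise with scale
    sensitivity / epsilon, the calibration used to achieve
    epsilon-differential privacy for queries with the given L1
    sensitivity. Smaller epsilon means more noise: stronger privacy,
    lower utility."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return values + rng.laplace(0.0, scale, size=np.shape(values))

rng = np.random.default_rng(0)
balances = rng.uniform(0, 10000, size=100000)   # hypothetical account data
private = laplace_perturb(balances, epsilon=1.0, sensitivity=1.0, rng=rng)
```

The epsilon knob makes the privacy-utility balance explicitly configurable, which is the trade-off the title refers to; a full pipeline would compose such mechanisms with context-aware transformations and track the cumulative privacy budget.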