Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy

Neural Information Processing Systems

Common explanations for shortcut learning assume that the shortcut improves prediction only under the training distribution. Thus, models trained in the typical way, by minimizing log-loss using gradient descent, which we call default-ERM, should utilize the shortcut. However, even when the stable feature determines the label in the training distribution and the shortcut provides no additional information, as in perception tasks, default-ERM exhibits shortcut learning. Why are such solutions preferred when the loss can be driven to zero using the stable feature alone? By studying a linear perception task, we show that default-ERM's preference for maximizing the margin, even without overparameterization, leads to models that depend more on the shortcut than on the stable feature.
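The margin-maximization bias the abstract points to can be seen in a toy 2D construction (our illustration, not the paper's exact setup): gradient descent on logistic loss over separable data converges in direction to the max-margin separator, which weights a large-scale "shortcut" coordinate more heavily even though the stable coordinate alone determines the label.

```python
# Toy sketch: the stable feature equals the label; the shortcut is
# 3 * label, so it is perfectly correlated in training but carries no
# extra information. The max-margin direction is (1, 3) / sqrt(10),
# i.e. 3x more weight on the shortcut, and gradient descent on
# logistic loss recovers exactly that direction.
import math

# Each point: (stable_feature, shortcut_feature), label in {+1, -1}.
data = [((1.0, 3.0), 1.0), ((-1.0, -3.0), -1.0)]

w = [0.0, 0.0]
lr = 0.1
for _ in range(20000):
    g = [0.0, 0.0]
    for (x, y) in data:
        z = y * (w[0] * x[0] + w[1] * x[1])
        p = 1.0 / (1.0 + math.exp(z))   # -d/dz log(1 + e^{-z})
        g[0] -= p * y * x[0]
        g[1] -= p * y * x[1]
    w = [w[0] - lr * g[0] / len(data), w[1] - lr * g[1] / len(data)]

norm = math.hypot(w[0], w[1])
direction = (w[0] / norm, w[1] / norm)
print(direction)  # close to (1, 3) / sqrt(10): shortcut gets 3x the weight
```

Here the loss could be driven to zero using the stable coordinate alone (any weight vector (c, 0) with c > 0 separates the data), yet the implicit bias of gradient descent prefers the direction that leans on the shortcut.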


Cal-DETR: Calibrated Detection Transformer

Neural Information Processing Systems

Despite impressive predictive performance on several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits their adoption and wider utilization in many safety-critical applications. There have been recent efforts toward calibrating DNNs; however, almost all of them focus on the classification task. Surprisingly, very little attention has been devoted to calibrating modern DNN-based object detectors, especially detection transformers, which have recently demonstrated promising detection performance and are influential in many decision-making systems. In this work, we address the problem by proposing a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR, and DINO.


Social media giant hit with scathing ad campaign amid anger over AI chatbots sexually exploiting kids

FOX News

A nonprofit parents coalition is calling on multiple congressional committees to launch an investigation into Meta for prioritizing engagement metrics that put children's safety at risk. The call is part of a three-pronged attack campaign by the American Parents Coalition (APC), launched Thursday. It includes a letter to lawmakers with calls for investigations, a new parental notification system to help parents stay informed on issues impacting their kids at Meta and beyond, and mobile billboards at Meta's D.C. and California headquarters calling out the company for failing to adequately prioritize protecting children. APC's campaign follows an April Wall Street Journal report that included an investigation looking into how the company's metrics focus has led to potential harms for children. "This is not the first time Meta has been caught making tech available to kids that exposes them to inappropriate content," APC Executive Director Alleigh Marre said. "Parents across America should be extremely wary of their children's online activity, especially when it involves emerging technology like AI digital companions."


US military would be unleashed on enemy drones on the homeland if bipartisan bill passes

FOX News

FIRST ON FOX: Dozens of drones that traipsed over Langley Air Force Base in late 2023 revealed an astonishing oversight: military officials did not believe they had the authority to shoot down the unmanned vehicles over the U.S. homeland. A new bipartisan bill, known as the COUNTER Act, seeks to rectify that, offering more bases the opportunity to become a "covered facility," or one that has the authority to shoot down drones that encroach on their airspace. The new bill has broad bipartisan and bicameral support, giving it a greater chance of becoming law. It's led by Armed Services Committee members Tom Cotton, R-Ark., and Kirsten Gillibrand, D-N.Y., in the Senate, and companion legislation is being introduced by August Pfluger, R-Texas, and Chrissy Houlahan, D-Pa., in the House. Currently, only half of the 360 domestic U.S. bases are considered "covered facilities" that are allowed to engage with unidentified drones.


PromptIR: Prompting for All-in-One Image Restoration

Neural Information Processing Systems

Image restoration involves recovering a high-quality clean image from its degraded version. Deep learning-based methods have significantly improved image restoration performance; however, they have limited generalization ability across different degradation types and levels. This restricts their real-world application, since it requires training individual models for each specific degradation and knowing the input degradation type in order to apply the relevant model. We present a prompt-based learning approach, PromptIR, for All-In-One image restoration that can effectively restore images from various types and levels of degradation. In particular, our method uses prompts to encode degradation-specific information, which is then used to dynamically guide the restoration network.
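The prompt-conditioning idea can be sketched schematically (a minimal illustration of the general pattern, not the released PromptIR code): a small pool of learnable prompt vectors is mixed by input-dependent weights, and the mixed prompt then modulates the feature map.

```python
# Minimal sketch of prompt-based conditioning. All parameter names here
# (prompt_pool, mix_proj) are our own illustration, not PromptIR's.
import math, random

random.seed(0)
D, K = 4, 3                       # feature dim, number of prompt components

# Hypothetical learnable parameters (random stand-ins for trained weights).
prompt_pool = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
mix_proj = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]

def softmax(v):
    m = max(v)                          # subtract max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def prompt_block(features):
    """Mix prompt components with weights predicted from the input features."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in mix_proj]
    weights = softmax(logits)           # degradation-aware mixing weights
    prompt = [sum(weights[k] * prompt_pool[k][d] for k in range(K))
              for d in range(D)]
    # Simplest possible interaction: add the prompt to the features
    # (the actual network fuses them with learned layers).
    return [f + p for f, p in zip(features, prompt)]

out = prompt_block([0.5, -1.0, 0.2, 0.9])
print(len(out))  # -> 4
```

Because the mixing weights are computed from the input itself, the same network can adapt its behavior to whichever degradation is present, without being told the degradation type.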


Data Augmentations for Improved (Large) Language Model Generalization

Neural Information Processing Systems

The reliance of text classifiers on spurious correlations can lead to poor generalization at deployment, raising concerns about their use in safety-critical domains such as healthcare. In this work, we propose to use counterfactual data augmentation, guided by knowledge of the causal structure of the data, to simulate interventions on spurious features and to learn more robust text classifiers. We show that this strategy is appropriate in prediction problems where the label is spuriously correlated with an attribute. Under the assumptions of such problems, we discuss the favorable sample complexity of counterfactual data augmentation, compared to importance re-weighting. Pragmatically, we match examples using auxiliary data, based on diff-in-diff methodology, and use a large language model (LLM) to represent a conditional probability of text.
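The effect of simulating an intervention on the spurious attribute can be shown with a tiny numerical sketch (our illustration, not the paper's LLM-based pipeline): for each example we add a copy whose spurious attribute is flipped while the label is kept, which drives the label-attribute correlation to zero.

```python
# Toy dataset where the spurious attribute perfectly tracks the label.
dataset = [
    {"text": "great film", "label": 1, "attr": 1},
    {"text": "loved it",   "label": 1, "attr": 1},
    {"text": "boring",     "label": 0, "attr": 0},
    {"text": "awful plot", "label": 0, "attr": 0},
]

def correlation(data):
    """Pearson correlation between label and spurious attribute."""
    n = len(data)
    my = sum(d["label"] for d in data) / n
    ma = sum(d["attr"] for d in data) / n
    cov = sum((d["label"] - my) * (d["attr"] - ma) for d in data) / n
    vy = sum((d["label"] - my) ** 2 for d in data) / n
    va = sum((d["attr"] - ma) ** 2 for d in data) / n
    return cov / (vy * va) ** 0.5

def augment(data):
    # In the paper, the counterfactual text is produced by an LLM
    # conditioned on a matched example; here we only flip the attribute
    # tag to show the distributional effect of the intervention.
    flipped = [{**d, "attr": 1 - d["attr"]} for d in data]
    return data + flipped

print(correlation(dataset))           # 1.0: attribute fully predicts label
print(correlation(augment(dataset)))  # 0.0: intervention breaks the link
```

A classifier trained on the augmented set can no longer reduce its loss by leaning on the attribute, which is the intuition behind the favorable sample complexity relative to importance re-weighting.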


Holistic Evaluation of Text-to-Image Models

Neural Information Processing Systems

The stunning qualitative improvement of text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on image-text alignment and image quality, we identify 12 aspects: text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark.


Unbiased learning of deep generative models with structured discrete representations

Neural Information Processing Systems

By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models, and flexible likelihoods for high-dimensional data from deep learning, but poses substantial optimization challenges. We propose novel algorithms for learning SVAEs, and are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables. Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization. To more rapidly learn accurate graphical model parameters, we derive a method for computing natural gradients without manual derivations, which avoids biases found in prior work.


DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

Neural Information Processing Systems

The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model. In our experiments, we use DoReMi on a 280M-parameter proxy model to set the domain weights for training an 8B-parameter model (30x larger) more efficiently. On The Pile, DoReMi improves perplexity across all domains, even when it downweights a domain.
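The reweighting step can be sketched in a few lines (an assumed schematic form following the Group DRO recipe the abstract describes, not the exact DoReMi update): domain weights are moved multiplicatively toward domains with high excess loss (proxy loss minus a reference loss), then renormalized and used as mixture proportions.

```python
# Schematic Group DRO-style domain reweighting. The step size and the
# clipping of excess loss at zero are our simplifying assumptions.
import math

def update_domain_weights(weights, proxy_losses, reference_losses, step=1.0):
    # Excess loss: how much worse the proxy model does than the reference.
    excess = [max(p - r, 0.0) for p, r in zip(proxy_losses, reference_losses)]
    # Exponentiated-gradient update: upweight high-excess-loss domains.
    unnorm = [w * math.exp(step * e) for w, e in zip(weights, excess)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Three domains; domain 1 has the largest excess loss, so it is upweighted.
w = [1 / 3, 1 / 3, 1 / 3]
for _ in range(5):
    w = update_domain_weights(w, proxy_losses=[1.0, 2.5, 1.2],
                              reference_losses=[0.9, 1.0, 1.1])
print(w)  # domain 1 now has the largest mixture proportion
```

The resulting weights depend only on losses measured during proxy training, which is why no knowledge of downstream tasks is needed.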


3D molecule generation by denoising voxel grids

Neural Information Processing Systems

We propose a new score-based approach to generate 3D molecules represented as atomic densities on regular grids. First, we train a denoising neural network that learns to map from a smooth distribution of noisy molecules to the distribution of real molecules. Then, we follow the neural empirical Bayes framework [Saremi and Hyvarinen, 2019] and generate molecules in two steps: (i) sample noisy density grids from a smooth distribution via underdamped Langevin Markov chain Monte Carlo, and (ii) recover the "clean" molecule by denoising the noisy grid in a single step. Our method, VoxMol, generates molecules in a fundamentally different way from the current state of the art (i.e., diffusion models applied to atom point clouds). It differs in terms of the data representation, the noise model, the network architecture, and the generative modeling algorithm. Our experiments show that VoxMol captures the distribution of drug-like molecules better than the state of the art, while being faster at generating samples.
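The two-step scheme can be illustrated on a 1D toy problem where everything is known in closed form (our sketch, not the VoxMol network): the "walk" runs Langevin MCMC on the smoothed density, and the "jump" applies the empirical Bayes identity xhat = y + sigma^2 * grad log p_sigma(y) in a single step.

```python
# Toy walk-jump sketch: clean data ~ N(0, 1), noise level sigma, so the
# smoothed density is N(0, 1 + sigma^2) and its score is known exactly.
# We use an overdamped Langevin chain for brevity (the paper uses an
# underdamped chain).
import math, random

random.seed(0)
sigma = 1.0

def score(y):
    # Closed-form score of the smoothed density N(0, 1 + sigma^2).
    return -y / (1.0 + sigma ** 2)

# Walk: Euler-discretized Langevin dynamics targeting p_sigma.
step = 0.1
y = 0.0
samples = []
for t in range(20000):
    y = y + step * score(y) + math.sqrt(2 * step) * random.gauss(0, 1)
    if t > 1000:                       # discard burn-in
        samples.append(y)

# Jump: single-step denoising via the empirical Bayes identity.
xhat = y + sigma ** 2 * score(y)       # here this equals y / 2, the posterior mean

var = sum(s * s for s in samples) / len(samples)
print(var)  # roughly 1 + sigma^2 = 2, the variance of the smoothed density
```

In VoxMol the analytic score is replaced by the trained denoising network's score estimate on voxel grids, but the walk-then-jump structure is the same.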