
Collaborating Authors: Mishra-Sharma, Siddharth


Auditing language models for hidden objectives

arXiv.org Artificial Intelligence

We study the feasibility of conducting alignment audits: investigations into whether models have undesired objectives. As a testbed, we train a language model with a hidden objective. Our training pipeline first teaches the model about exploitable errors in RLHF reward models (RMs), then trains the model to exploit some of these errors. We verify via out-of-distribution evaluations that the model generalizes to exhibit whatever behaviors it believes RMs rate highly, including ones not reinforced during training. We leverage this model to study alignment audits in two ways. First, we conduct a blind auditing game where four teams, unaware of the model's hidden objective or training, investigate it for concerning behaviors and their causes. Three teams successfully uncovered the model's hidden objective using techniques including interpretability with sparse autoencoders (SAEs), behavioral attacks, and training data analysis. Second, we conduct an unblinded follow-up study of eight techniques for auditing the model, analyzing their strengths and limitations. Overall, our work provides a concrete example of using alignment audits to discover a model's hidden objective and proposes a methodology for practicing and validating progress in alignment auditing.
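
As a concrete illustration of one of the auditing tools named above, here is a minimal sparse autoencoder of the kind used for interpretability, sketched in PyTorch; the dimensions and sparsity coefficient are illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decompose model activations into a larger set of sparsely active features."""
    def __init__(self, d_model, d_features):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))  # sparse feature activations
        return self.decoder(feats), feats

# Objective: reconstruct activations while penalizing feature density (L1).
sae = SparseAutoencoder(d_model=768, d_features=4096)  # sizes are illustrative
acts = torch.randn(32, 768)  # stand-in for residual-stream activations
recon, feats = sae(acts)
loss = (recon - acts).pow(2).mean() + 1e-3 * feats.abs().mean()
loss.backward()
```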


Inferring the Morphology of the Galactic Center Excess with Gaussian Processes

arXiv.org Machine Learning

Descriptions of the Galactic Center using Fermi gamma-ray data have so far modeled the Galactic Center Excess (GCE) as a template with fixed spatial morphology or as a linear combination of such templates. Although these templates are informed by various physical expectations, the morphology of the excess is a priori unknown. For the first time, we describe the GCE using a flexible, non-parametric machine learning model -- the Gaussian process (GP). We assess our model's performance on synthetic data, demonstrating that the model can recover the templates used to generate the data. We then fit the Fermi data with our model in a single energy bin from 2-20 GeV (leaving a spectral GP analysis of the GCE for future work) using a variety of template models of diffuse gamma-ray emission to quantify our fits' systematic uncertainties associated with diffuse emission modeling. We interpret our best-fit GP in terms of GCE templates consisting of an NFW squared template and a bulge component to determine which bulge models can best describe the fitted GP and to what extent the best-fit GP is described better by an NFW squared template versus a bulge template. The best-fit GP contains morphological features that are typically not associated with traditional GCE studies. These include a localized bright source at around $(\ell,b) = (20^{\circ}, 0^{\circ})$ and a diagonal arm extending northwest from the Galactic Center. In spite of these novel features, the fitted GP is explained best by a template-based model consisting of the bulge presented in Coleman et al. (2020) and a squared NFW component. Our results suggest that the physical interpretation of the GCE in terms of stellar bulge and NFW-like components is highly sensitive to the assumed morphologies, background models, and the region of the sky used for inference.
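
A toy sketch of the core ingredient, GP regression over sky coordinates, is given below; the paper fits photon counts with a more elaborate likelihood, so the conjugate-Gaussian setup, kernel, lengthscale, and noise level here are simplifying assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, amplitude=1.0, lengthscale=1.0):
    """Squared-exponential kernel on sky coordinates, here (l, b) pairs."""
    d2 = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return amplitude**2 * np.exp(-0.5 * d2 / lengthscale**2)

# Toy grid of Galactic coordinates with a synthetic, noisy emission residual.
rng = np.random.default_rng(0)
X = rng.uniform(-20, 20, size=(100, 2))  # (l, b) in degrees
y = np.exp(-np.sum(X**2, axis=1) / 50) + 0.05 * rng.normal(size=100)

# GP posterior mean at test points (noise variance sigma2 is illustrative).
sigma2 = 0.05**2
K = rbf_kernel(X, X, lengthscale=5.0) + sigma2 * np.eye(len(X))
Xs = rng.uniform(-20, 20, size=(50, 2))
Ks = rbf_kernel(Xs, X, lengthscale=5.0)
mean = Ks @ np.linalg.solve(K, y)  # posterior mean of the GP at Xs
```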


A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing

arXiv.org Artificial Intelligence

Efficiently processing structured point cloud data while preserving multiscale information is a key challenge across domains, from graphics to atomistic modeling. Using a curated dataset of simulated galaxy positions and properties, represented as point clouds, we benchmark the ability of graph neural networks to simultaneously capture local clustering environments and long-range correlations. Given the homogeneous and isotropic nature of the Universe, the data exhibits a high degree of symmetry. We therefore focus on evaluating the performance of Euclidean symmetry-preserving ($E(3)$-equivariant) graph neural networks, showing that they can outperform non-equivariant counterparts and domain-specific information extraction techniques in downstream performance as well as simulation efficiency. However, we find that current architectures fail to capture information from long-range correlations as effectively as domain-specific baselines, motivating future work on architectures better suited for extracting long-range information.
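
The following sketch shows one way a message-passing layer can be made $E(3)$-invariant, by conditioning messages only on pairwise distances; it is a generic illustration, not one of the benchmarked architectures, and the sizes and random edges are placeholders.

```python
import torch
import torch.nn as nn

class InvariantMessagePassing(nn.Module):
    """One message-passing step using only pairwise distances, which are
    invariant under E(3) (rotations, translations, and reflections)."""
    def __init__(self, d_hidden):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * d_hidden + 1, d_hidden), nn.SiLU())
        self.upd = nn.Sequential(nn.Linear(2 * d_hidden, d_hidden), nn.SiLU())

    def forward(self, h, pos, edges):
        src, dst = edges  # source and destination node indices, each (E,)
        dist = (pos[src] - pos[dst]).norm(dim=-1, keepdim=True)
        m = self.msg(torch.cat([h[src], h[dst], dist], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)  # sum messages per node
        return self.upd(torch.cat([h, agg], dim=-1))

# Toy usage: 100 galaxies with 16-dim features and random edges.
h = torch.randn(100, 16)
pos = torch.randn(100, 3)
edges = torch.randint(0, 100, (2, 400))
h_new = InvariantMessagePassing(16)(h, pos, edges)
```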


PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models

arXiv.org Artificial Intelligence

We present PAPERCLIP (Proposal Abstracts Provide an Effective Representation for Contrastive Language-Image Pre-training), a method which associates astronomical observations imaged by telescopes with natural language using a neural network model. The model is fine-tuned from a pre-trained Contrastive Language-Image Pre-training (CLIP) model using successful observing proposal abstracts and corresponding downstream observations, with the abstracts optionally summarized via guided generation using large language models (LLMs). Using observations from the Hubble Space Telescope (HST) as an example, we show that the fine-tuned model embodies a meaningful joint representation between observations and natural language through tests targeting image retrieval (i.e., finding the most relevant observations using natural language queries) and description retrieval (i.e., querying for astrophysical object classes and use cases most relevant to a given observation). Our study demonstrates the potential for using generalist foundation models rather than task-specific models for interacting with astronomical data by leveraging text as an interface.
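
The contrastive objective underlying CLIP-style fine-tuning can be written compactly; the sketch below shows the symmetric InfoNCE loss pairing each observation with its proposal abstract, with batch size, embedding dimension, and temperature as illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched observation/abstract pairs lie on the
    diagonal of the similarity matrix and are pushed above all mismatches."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (N, N) similarities
    labels = torch.arange(len(logits), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

# Toy batch: 8 observation embeddings paired with 8 abstract embeddings.
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```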


Data Compression and Inference in Cosmology with Self-Supervised Machine Learning

arXiv.org Artificial Intelligence

The influx of massive amounts of data from current and upcoming cosmological surveys necessitates compression schemes that can efficiently summarize the data with minimal loss of information. We introduce a method that leverages the paradigm of self-supervised machine learning in a novel manner to construct representative summaries of massive datasets using simulation-based augmentations. Deploying the method on hydrodynamical cosmological simulations, we show that it can deliver highly informative summaries, which can be used for a variety of downstream tasks, including precise and accurate parameter inference. We demonstrate how this paradigm can be used to construct summary representations that are insensitive to prescribed systematic effects, such as the influence of baryonic physics. Our results indicate that self-supervised machine learning techniques offer a promising new approach for compression of cosmological data as well as its analysis.
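
The abstract does not name the specific self-supervised objective, so the sketch below uses a VICReg-style loss as one representative choice for learning summaries that are invariant across simulation-based augmentations; the loss weights and dimensions are illustrative.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, lam=25.0, mu=25.0, nu=1.0):
    """VICReg-style objective on two augmentations of the same field:
    invariance (MSE) + variance (keep each dimension spread out)
    + covariance (decorrelate dimensions to avoid collapse)."""
    inv = F.mse_loss(z1, z2)
    def var_term(z):
        std = torch.sqrt(z.var(dim=0) + 1e-4)
        return torch.relu(1.0 - std).mean()
    def cov_term(z):
        z = z - z.mean(dim=0)
        n, d = z.shape
        cov = (z.t() @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum() / d
    return (lam * inv + mu * (var_term(z1) + var_term(z2))
            + nu * (cov_term(z1) + cov_term(z2)))

# Toy batch: 64 summary vectors from two augmented views of the same fields.
loss = vicreg_loss(torch.randn(64, 128), torch.randn(64, 128))
```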


A point cloud approach to generative modeling for galaxy surveys at the field level

arXiv.org Machine Learning

We introduce a diffusion-based generative model to describe the distribution of galaxies in our Universe directly as a collection of points in 3-D space (coordinates) optionally with associated attributes (e.g., velocities and masses), without resorting to binning or voxelization. The custom diffusion model can be used both for emulation, reproducing essential summary statistics of the galaxy distribution, as well as inference, by computing the conditional likelihood of a galaxy field. We demonstrate a first application to massive dark matter haloes in the Quijote simulation suite. This approach can be extended to enable a comprehensive analysis of cosmological data, circumventing limitations inherent to summary-statistic-based as well as neural simulation-based inference methods.
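
A minimal DDPM-style training step on a point set is sketched below; the per-point denoiser is a deliberately simple permutation-equivariant stand-in for the paper's custom model, and the noise schedule is a common default rather than the one used in the paper.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class PointDenoiser(nn.Module):
    """Per-point MLP conditioned on diffusion time. Shared weights across points
    make it permutation-equivariant; a real model would also let points
    interact (e.g., via attention)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4, 64), nn.SiLU(), nn.Linear(64, 3))
    def forward(self, x, t):
        return self.mlp(torch.cat([x, t.expand(len(x), 1)], dim=-1))

def diffusion_step(net, x0):
    """One DDPM training step: noise the clean point set, predict the noise."""
    t = torch.randint(0, T, (1,))
    noise = torch.randn_like(x0)
    ab = alphas_bar[t]
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * noise  # forward (noising) process
    pred = net(xt, t.float() / T)
    return (pred - noise).pow(2).mean()

loss = diffusion_step(PointDenoiser(), torch.randn(512, 3))  # 512 halo positions
```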


Hierarchical Neural Simulation-Based Inference Over Event Ensembles

arXiv.org Artificial Intelligence

When analyzing real-world data it is common to work with event ensembles, which comprise sets of observations that collectively constrain the parameters of an underlying model of interest. Such models often have a hierarchical structure, where "local" parameters impact individual events and "global" parameters influence the entire dataset. We introduce practical approaches for optimal dataset-wide probabilistic inference in cases where the likelihood is intractable, but simulations can be realized via forward modeling. We construct neural estimators for the likelihood(-ratio) or posterior and show that explicitly accounting for the model's hierarchical structure can lead to tighter parameter constraints. We ground our discussion using case studies from the physical sciences, focusing on examples from particle physics (particle collider data) and astrophysics (strong gravitational lensing observations).
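
The key structural idea, that conditionally independent events let the dataset-wide log likelihood ratio decompose into a sum of per-event terms, can be sketched as follows; the network sizes and input dimensions are placeholders.

```python
import torch
import torch.nn as nn

class EventRatioNet(nn.Module):
    """Classifier-based estimator of the per-event log likelihood ratio
    log r(x | theta) for global parameters theta."""
    def __init__(self, d_x, d_theta):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_x + d_theta, 64), nn.SiLU(),
                                 nn.Linear(64, 1))
    def forward(self, x, theta):
        return self.net(torch.cat([x, theta], dim=-1)).squeeze(-1)

def ensemble_log_ratio(net, events, theta):
    """Events are conditionally independent given the global parameters, so
    the dataset-wide log likelihood ratio is the sum of per-event log ratios."""
    theta_rep = theta.expand(len(events), -1)
    return net(events, theta_rep).sum()

net = EventRatioNet(d_x=5, d_theta=2)
log_r = ensemble_log_ratio(net, torch.randn(100, 5), torch.randn(1, 2))
```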


Uncovering dark matter density profiles in dwarf galaxies with graph neural networks

arXiv.org Artificial Intelligence

Dwarf galaxies are small, dark matter-dominated galaxies, some of which are embedded within the Milky Way. Their lack of baryonic matter (e.g., stars and gas) makes them perfect test beds for probing the properties of dark matter -- understanding the spatial dark matter distribution in these systems can be used to constrain microphysical dark matter interactions that influence the formation and evolution of structures in our Universe. We introduce a new method that leverages simulation-based inference and graph-based machine learning in order to infer the dark matter density profiles of dwarf galaxies from observable kinematics of stars gravitationally bound to these systems. Our approach aims to address some of the limitations of established methods based on dynamical Jeans modeling. We show that this novel method can place stronger constraints on dark matter profiles and, consequently, has the potential to weigh in on some of the ongoing puzzles associated with the small-scale structure of dark matter halos, such as the core-cusp discrepancy.
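
As one illustration of graph-based machine learning on stellar kinematics, the sketch below builds a k-nearest-neighbour graph over a toy star catalog; the paper's actual feature choices and graph construction may differ, and the edge index produced here could feed a message-passing network whose outputs parameterize the inferred density profile.

```python
import torch

def knn_graph(features, k=8):
    """Build a k-nearest-neighbour graph over stars, here from projected
    positions and line-of-sight velocities (a toy feature space)."""
    d = torch.cdist(features, features)      # pairwise feature distances
    d.fill_diagonal_(float("inf"))           # exclude self-edges
    nbrs = d.topk(k, largest=False).indices  # (N, k) nearest neighbours
    src = nbrs.reshape(-1)
    dst = torch.arange(len(features)).repeat_interleave(k)
    return torch.stack([src, dst])           # (2, N*k) edge index

# Toy catalog: 200 stars with (x, y, v_los).
edges = knn_graph(torch.randn(200, 3))
```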


Semi-parametric $\gamma$-ray modeling with Gaussian processes and variational inference

arXiv.org Machine Learning

Mismodeling the uncertain, diffuse emission of Galactic origin can seriously bias the characterization of astrophysical gamma-ray data, particularly in the region of the Inner Milky Way, where such emission can make up over 80% of the photon counts observed at ~GeV energies. We introduce a novel class of methods that use Gaussian processes and variational inference to build flexible background and signal models for gamma-ray analyses. The goal is to enable a more robust interpretation of the make-up of the gamma-ray sky, with a particular focus on characterizing potential signals of dark matter in the Galactic Center using data from the Fermi telescope.
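
A minimal mean-field version of the idea, a GP prior over a log-rate field with a Poisson likelihood optimized via a stochastic ELBO, is sketched below; the kernel, problem size, jitter, and optimizer settings are illustrative, and the paper's actual variational family is likely richer.

```python
import torch

def rbf(x1, x2, ls=1.0):
    """Squared-exponential kernel over pixel coordinates."""
    return torch.exp(-0.5 * torch.cdist(x1, x2) ** 2 / ls**2)

X = torch.rand(50, 2) * 10                      # toy pixel coordinates
counts = torch.poisson(torch.full((50,), 3.0))  # toy photon counts

K = rbf(X, X) + 1e-3 * torch.eye(50)            # jitter for stability
prior = torch.distributions.MultivariateNormal(torch.zeros(50), K)

# Variational posterior q(f) = N(m, diag(s^2)) over the log-rate field.
m = torch.zeros(50, requires_grad=True)
log_s = torch.zeros(50, requires_grad=True)
opt = torch.optim.Adam([m, log_s], lr=0.05)

for _ in range(200):
    q = torch.distributions.Normal(m, log_s.exp())
    f = q.rsample()  # reparameterized sample of log-rates
    log_lik = torch.distributions.Poisson(f.exp()).log_prob(counts).sum()
    elbo = log_lik + prior.log_prob(f) - q.log_prob(f).sum()
    opt.zero_grad()
    (-elbo).backward()
    opt.step()
```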


Mining for Dark Matter Substructure: Inferring subhalo population properties from strong lenses with machine learning

arXiv.org Machine Learning

The subtle and unique imprint of dark matter substructure on extended arcs in strong lensing systems contains a wealth of information about the properties and distribution of dark matter on small scales and, consequently, about the underlying particle physics. However, teasing out this effect poses a significant challenge, since the likelihood function for realistic simulations of population-level parameters is intractable. We apply recently developed simulation-based inference techniques to the problem of substructure inference in galaxy-galaxy strong lenses. By leveraging additional information extracted from the simulator, neural networks are efficiently trained to estimate likelihood ratios associated with population-level parameters characterizing substructure. Through proof-of-principle application to simulated data, we show that these methods can provide an efficient and principled way to simultaneously analyze an ensemble of strong lenses, and can be used to mine the large sample of lensing images deliverable by near-future surveys for signatures of dark matter substructure.
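
The simulation-based inference machinery referenced here can be illustrated with classifier-based neural ratio estimation: a network trained to distinguish jointly drawn (x, theta) pairs from shuffled pairs recovers the log likelihood ratio in its logit. The sketch below uses placeholder dimensions and a toy batch in place of actual lensing simulations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder: 10-dim lens-image summaries x, 2-dim substructure parameters theta.
net = nn.Sequential(nn.Linear(10 + 2, 64), nn.SiLU(), nn.Linear(64, 1))

def nre_step(x, theta):
    """One training step: separate matched (x, theta) pairs from mismatched ones.
    At optimum, the network's logit approximates log r(x | theta)."""
    theta_marg = theta[torch.randperm(len(theta))]  # break the pairing
    logits_joint = net(torch.cat([x, theta], dim=-1))
    logits_marg = net(torch.cat([x, theta_marg], dim=-1))
    logits = torch.cat([logits_joint, logits_marg])
    labels = torch.cat([torch.ones_like(logits_joint),
                        torch.zeros_like(logits_marg)])
    return F.binary_cross_entropy_with_logits(logits, labels)

# Toy batch of 64 simulated summaries and their generating parameters.
loss = nre_step(torch.randn(64, 10), torch.randn(64, 2))
```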