
Constructing efficient channels for ideal observers using the conjugate gradient method

arXiv.org Machine Learning

Purpose: Task-based assessment of image quality (IQ) is critically important for the design and optimization of medical imaging systems. Ideal observers, including the Bayesian Ideal Observer (IO) and the ideal linear observer, i.e., the Hotelling observer (HO), provide objective figures of merit (FOMs) that quantify system performance on signal detection tasks. However, the application of ideal observers to high-dimensional image data is often computationally intractable. Channel mechanisms provide an effective framework for dimensionality reduction that can facilitate the computation of ideal observers. This work presents a conjugate gradient (CG)-based method to construct efficient channels for approximating the IO and HO performance.
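The abstract does not spell out how the channels are constructed, so the following is only a sketch of the standard building block behind it. The Hotelling template w solves K w = Δs, where K is the image covariance and Δs is the mean signal difference, and the detectability is SNR² = Δsᵀ w. CG solves that system iteratively without forming the inverse of K. The toy 3-pixel covariance, the function names, and the pure-Python linear algebra are illustrative assumptions, not the authors' implementation.

```python
def matvec(K, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(k_ij * v_j for k_ij, v_j in zip(row, v)) for row in K]


def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(K, b, tol=1e-10, max_iter=None):
    """Solve K w = b for a symmetric positive-definite K."""
    n = len(b)
    max_iter = max_iter or 10 * n
    w = [0.0] * n
    r = list(b)            # residual b - K w, with w = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break          # residual norm is small enough
        Kp = matvec(K, p)
        alpha = rs_old / dot(p, Kp)
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return w


# Toy example (assumed values): 3-pixel covariance and signal difference.
K = [[2.0, 0.5, 0.0],
     [0.5, 2.0, 0.5],
     [0.0, 0.5, 2.0]]
delta_s = [1.0, 2.0, 1.0]

w_hot = conjugate_gradient(K, delta_s)   # Hotelling template
snr2 = dot(delta_s, w_hot)               # Hotelling detectability SNR^2
```

For image-sized problems, K would normally be applied as an operator (for example, through sample images) rather than stored as a dense matrix; only `matvec` would need to change.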


Integrating Bayesian Spectral Deconvolution and Expert Scientific Reasoning for Robust Peak Estimation

arXiv.org Machine Learning

Spectral deconvolution is essential for extracting peak structures that encode material properties and chemical structures, but conventional automated methods often fail when spectra contain high-intensity noise or unknown background components. In practice, scientists rarely interpret spectra in isolation. Instead, they identify physically meaningful peaks by relating spectral structures to auxiliary information such as physical-property values, chemical structures, and trends across related measurements. Here, we propose a Bayesian framework that integrates spectral deconvolution with a model of expert scientific reasoning. In this work, expert scientific reasoning refers to the practice of evaluating candidate spectral structures by their consistency with independently measured physical-property values, rather than to manual expert intervention during inference. We formalize this reasoning as a physical-property regression layer, implemented using Gaussian process regression, and couple it with Bayesian spectral deconvolution. By averaging the physical-property likelihood over posterior predictive spectra inferred from Bayesian spectral deconvolution, the proposed method selects spectral models according to the consistency between inferred spectral structures and physical-property information. We validate the framework using synthetic spectra with high-intensity noise or unknown backgrounds and infrared spectra of poly(lactic acid). The method recovers physically meaningful peak structures that conventional Bayesian spectral deconvolution misses or misidentifies from spectra alone, including weak peaks in poly(lactic acid) IR spectra related to measured degradation rates. These results demonstrate that integrating expert scientific reasoning with Bayesian spectral deconvolution enables robust peak estimation under conditions where spectrum-only inference is unreliable.
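The model-selection step described above, averaging a physical-property likelihood over posterior predictive spectra, can be sketched as a Monte Carlo average. This is a simplified illustration under stated assumptions: a fixed-variance Gaussian likelihood stands in for the Gaussian process regression layer, each candidate spectral model is assumed to have already produced one property prediction per posterior sample, and all names and numbers are hypothetical.

```python
import math


def log_mean_exp(log_values):
    """Numerically stable log of the mean of exp(log_values)."""
    m = max(log_values)
    total = sum(math.exp(v - m) for v in log_values)
    return m + math.log(total / len(log_values))


def gaussian_log_likelihood(y, mean, sigma):
    """Log density of y under N(mean, sigma^2)."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (y - mean) ** 2 / (2.0 * sigma ** 2))


def averaged_log_likelihood(y_obs, predicted_means, sigma):
    """Log of the likelihood averaged over posterior samples.

    predicted_means holds one property prediction per posterior
    spectral sample (assumed to be precomputed).
    """
    logs = [gaussian_log_likelihood(y_obs, m, sigma) for m in predicted_means]
    return log_mean_exp(logs)


def select_model(y_obs, predictions_by_model, sigma=0.1):
    """Pick the spectral model whose samples best explain the property."""
    scores = {
        name: averaged_log_likelihood(y_obs, preds, sigma)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical predictions from two candidate spectral models.
predictions = {
    "two_peaks": [0.95, 1.05, 1.00],
    "one_peak": [0.20, 0.30, 0.25],
}
best_model, model_scores = select_model(1.0, predictions)
```

Averaging in log space with `log_mean_exp` avoids underflow when individual likelihoods are very small, which is common when a candidate model is inconsistent with the measured property.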


I Work in Hollywood. Everyone Who Used to Make TV Is Now Secretly Training AI

WIRED

For screenwriters like me--and job seekers all over--AI gig work is the new waiting tables. In eight months, I've done 20 of these soul-crushing contracts for five different platforms. My name on the platform is ri611. I work as an AI trainer. I assess whether a chatbot's tone is natural or flat, affected or annoying. I identify patterns in pictures of furniture; search the internet for group photos of strangers whom I'll eliminate from the portrait, one by one. I trawl through bizarre videos so I can annotate and time-stamp the barking of a dog, the moment a stranger walks past a window, the precise millisecond a balloon pops. I generate anime sex scenes and decapitate young women, coax LLMs into giving me recipes for bombs made of household items, and generate invites to a reprise of January 6 at the White House, all as part of a red team whose purpose is to test safety precautions and probe weaknesses. I work for companies with names like Mercor and Outlier and Task-ify and Turing and Handshake and Micro1. In my "other" career, I am a Hollywood writer and showrunner. I create prime-time TV, usually featuring a middle-class white lady having the worst day of her life, with some salt-of-the-earth police interference to raise the stakes. You can find my shows on Paramount and Hulu and the BBC.


Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

Neural Information Processing Systems

Many fine-grained classification tasks, like rare animal identification, have limited training data, and consequently classifiers trained on these datasets often fail to generalize to variations in the domain like changes in weather or location. As such, we explore how natural language descriptions of the domains seen in training data can be used with large vision models trained on diverse pretraining datasets to generate useful variations of the training data. We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains and augment the training data via language-guided image editing. To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information. The resulting dataset is visually consistent with the original training data and offers significantly enhanced diversity. We show that ALIA surpasses traditional data augmentation and text-to-image generated data on fine-grained classification tasks, including cases of domain generalization and contextual bias. Code is available at https://github.com/lisadunlap/ALIA.



Distribution of Mentioned IDs (figure; axis: # of IDs)

Neural Information Processing Systems

For each image's list of candidate objects, we heuristically downsample to a set of "most interesting" regions by: 1) selecting the at-most k = 4 largest/most central people; 2) keeping the most central/large objects; 3) over-sampling rarer objects according to prior frequency of detection in the LVIS vocabulary; 4) limiting the number of objects of a single type per-image; and 5) downsampling overlapping region proposals to encourage broader coverage of the pixel area of the image.
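A partial sketch of this heuristic is shown below, covering only the cap on people (step 1, with the limit of 4 from the text), the per-type limit (step 4), and overlap suppression (step 5). Ranking by box area alone, the per-type limit of 2, the IoU threshold of 0.5, and the candidate format are assumptions made for illustration; centrality scoring and frequency-based over-sampling of rare LVIS classes (steps 2 and 3) are omitted.

```python
from collections import defaultdict


def area(box):
    """Area of an (x0, y0, x1, y1) box; zero if the box is empty."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_box = (max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3]))
    inter = area(inter_box)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def select_regions(candidates, k_people=4, max_per_type=2, max_iou=0.5):
    """Greedily keep large, non-overlapping regions with per-type caps.

    candidates: list of {"label": str, "box": (x0, y0, x1, y1)} dicts
    (an assumed format, not the one used in the paper).
    """
    ranked = sorted(candidates, key=lambda c: area(c["box"]), reverse=True)
    kept = []
    per_type = defaultdict(int)
    for cand in ranked:
        limit = k_people if cand["label"] == "person" else max_per_type
        if per_type[cand["label"]] >= limit:
            continue  # per-type cap reached
        if any(iou(cand["box"], k["box"]) > max_iou for k in kept):
            continue  # overlaps too much with an already kept region
        kept.append(cand)
        per_type[cand["label"]] += 1
    return kept


# Hypothetical detections: six people and two heavily overlapping dogs.
people = [{"label": "person", "box": (i * 10, 0, i * 10 + 5 + i, 10)}
          for i in range(6)]
dogs = [{"label": "dog", "box": (100, 0, 110, 10)},
        {"label": "dog", "box": (101, 0, 111, 10)}]
selected = select_regions(people + dogs)
```

Processing candidates in descending area order means that when two proposals overlap, the larger one is kept, which is one simple way to favor broad pixel coverage.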




Object-Centric Slot Diffusion

Neural Information Processing Systems

The recent success of transformer-based image generative models in object-centric learning highlights the importance of powerful image generators for handling complex scenes. However, despite the high expressiveness of diffusion models in image generation, their integration into object-centric learning remains largely unexplored. In this paper, we explore the feasibility and potential of integrating diffusion models into object-centric learning and investigate the pros and cons of this approach. We introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes: it is the first object-centric learning model to replace conventional slot decoders with a latent diffusion model conditioned on object slots, and it is also the first unsupervised compositional conditional diffusion model that operates without the need for supervised annotations like text. Through experiments on various object-centric tasks, including the first application of the FFHQ dataset in this field, we demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders, particularly in more complex scenes, and exhibits superior unsupervised compositional generation quality. In addition, we conduct a preliminary investigation into the integration of pre-trained diffusion models in LSD and demonstrate its effectiveness in real-world image segmentation and generation.


COHESIV: Contrastive Object and Hand Embeddings for Segmentation In Video

Neural Information Processing Systems

In this paper we learn to segment hands and hand-held objects from motion. Our system takes a single RGB image and hand location as input to segment the hand and hand-held object. For learning, we generate responsibility maps that show how well a hand's motion explains other pixels' motion in video. We use these responsibility maps as pseudo-labels to train a weakly-supervised neural network using an attention-based similarity loss and contrastive loss. Our system outperforms alternate methods, achieving good performance on the 100DOH, EPIC-KITCHENS, and HO3D datasets.