Goto

Collaborating Authors

 ambiguous image


SyncSDE: A Probabilistic Framework for Diffusion Synchronization

Lee, Hyunjun, Lee, Hyunsoo, Han, Sookwan

arXiv.org Machine Learning

There have been many attempts to leverage multiple diffusion models for collaborative generation, extending beyond the original domain. A prominent approach involves synchronizing multiple diffusion trajectories by mixing the estimated scores to artificially correlate the generation processes. However, existing methods rely on naive heuristics, such as averaging, without considering task specificity. These approaches do not clarify why such methods work and often fail when a heuristic suitable for one task is blindly applied to others. In this paper, we present a probabilistic framework for analyzing why diffusion synchronization works and reveal where heuristics should be focused - modeling correlations between multiple trajectories and adapting them to each specific task. We further identify optimal correlation models per task, achieving better results than previous approaches that apply a single heuristic across all tasks without justification.


Ambiguous Images With Human Judgments for Robust Visual Event Classification

Neural Information Processing Systems

Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambiguous images and use it to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos. All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments. We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models.


Reviews: A Probabilistic U-Net for Segmentation of Ambiguous Images

Neural Information Processing Systems

Post rebuttal: Authors have responded well to the issues raised, and I champion publication of this work. Main idea: Use a conditional variational auto-encoder to produce well-calibrated segmentation hypotheses for a given input. Strengths: The application is well motivated and experiments are convincing and state of the art. Possibly in response, the manuscript is a little vague in its positioning relative to prior work. While relevant prior work is cited, the reader is left with some ambiguity and, if not familiar with this prior work, might be misled to think that there is methodological innovation beyond the specifics of architecture and application.


An ideal observer model for identifying the reference frame of objects

Neural Information Processing Systems

The object people perceive in an image can depend on its orientation relative to the scene it is in (its reference frame). For example, the images of the symbols and + differ by a 45 degree rotation. Although real scenes have multiple images and reference frames, psychologists have focused on scenes with only one reference frame. We propose an ideal observer model based on nonparametric Bayesian statistics for inferring the number of reference frames in a scene and their parameters. When an ambiguous image could be assigned to two conflicting reference frames, the model predicts two factors should influence the reference frame inferred for the image: The image should be more likely to share the reference frame of the closer object (proximity) and it should be more likely to share the reference frame containing the most objects (alignment). We confirm people use both cues using a novel methodology that allows for easy testing of human reference frame inference.


Pruning Distorted Images in MNIST Handwritten Digits

R, Amarnath, Kumar, Vinay V

arXiv.org Artificial Intelligence

Recognizing handwritten digits is a challenging task primarily due to the diversity of writing styles and the presence of noisy images. The widely used MNIST dataset, which is commonly employed as a benchmark for this task, includes distorted digits with irregular shapes, incomplete strokes, and varying skew in both the training and testing datasets. Consequently, these factors contribute to reduced accuracy in digit recognition. To overcome this challenge, we propose a two-stage deep learning approach. In the first stage, we create a simple neural network to identify distorted digits within the training set. This model serves to detect and filter out such distorted and ambiguous images. In the second stage, we exclude these identified images from the training dataset and proceed to retrain the model using the filtered dataset. This process aims to improve the classification accuracy and confidence levels while mitigating issues of underfitting and overfitting. Our experimental results demonstrate the effectiveness of the proposed approach, achieving an accuracy rate of over 99.5% on the testing dataset. In our future work, we intend to explore the scalability of this approach and investigate techniques to further enhance accuracy by reducing the size of the training data. NTRODUCTION Handwritten digit recognition is a complex task that finds applications in various fields, including computer vision and machine learning. It involves the identification and classification of digits written by hand, enabling tasks such as character recognition and digit analysis.


Machado

AAAI Conferences

This work explores the creation of ambiguous images, i.e., images that may induce multistable perception, by evolutionary means. Ambiguous images are created using a general purpose approach, composed of an expression-based evolutionary engine and a set of object detectors, which are trained in advance using Machine Learning techniques. Images are evolved using Genetic Programming and object detectors are used to classify them. The information gathered during classification is used to assign fitness. In a first stage, the system is used to evolve images that resemble a single object. In a second stage, the discovery of ambiguous images is promoted by combining pairs of object detectors. The analysis of the results highlights the ability of the system to evolve ambiguous images and the differences between computational and human ambiguous images.


A Probabilistic U-Net for Segmentation of Ambiguous Images

Kohl, Simon, Romera-Paredes, Bernardino, Meyer, Clemens, Fauw, Jeffrey De, Ledsam, Joseph R., Maier-Hein, Klaus, Eslami, S. M. Ali, Rezende, Danilo Jimenez, Ronneberger, Olaf

Neural Information Processing Systems

Many real-world vision problems suffer from inherent ambiguities. In clinical applications for example, it might not be clear from a CT scan alone which particular region is cancer tissue. Therefore a group of graders typically produces a set of diverse but plausible segmentations. We consider the task of learning a distribution over segmentations given an input. To this end we propose a generative segmentation model based on a combination of a U-Net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible hypotheses.


Sampling Prediction-Matching Examples in Neural Networks: A Probabilistic Programming Approach

Booth, Serena, Shah, Ankit, Zhou, Yilun, Shah, Julie

arXiv.org Machine Learning

Though neural network models demonstrate impressive performance, we do not understand exactly how these black-box models make individual predictions. This drawback has led to substantial research devoted to understand these models in areas such as robustness, interpretability, and generalization ability. In this paper, we consider the problem of exploring the prediction level sets of a classifier using probabilistic programming. We define a prediction level set to be the set of examples for which the predictor has the same specified prediction confidence with respect to some arbitrary data distribution. Notably, our sampling-based method does not require the classifier to be differentiable, making it compatible with arbitrary classifiers. As a specific instantiation, if we take the classifier to be a neural network and the data distribution to be that of the training data, we can obtain examples that will result in specified predictions by the neural network. We demonstrate this technique with experiments on a synthetic dataset and MNIST. Such level sets in classification may facilitate human understanding of classification behaviors.


Evolving Ambiguous Images

Machado, Penousal (University of Coimbra) | Vinhas, Adriano (University of Coimbra) | Correia, João (University of Coimbra) | Ekárt, Aniko (Aston University)

AAAI Conferences

This work explores the creation of ambiguous images, i.e., images that may induce multistable perception, by evolutionary means. Ambiguous images are created using a general purpose approach, composed of an expression-based evolutionary engine and a set of object detectors, which are trained in advance using Machine Learning techniques. Images are evolved using Genetic Programming and object detectors are used to classify them. The information gathered during classification is used to assign fitness. In a first stage, the system is used to evolve images that resemble a single object. In a second stage, the discovery of ambiguous images is promoted by combining pairs of object detectors. The analysis of the results highlights the ability of the system to evolve ambiguous images and the differences between computational and human ambiguous images.


An ideal observer model for identifying the reference frame of objects

Austerweil, Joseph L., Friesen, Abram L., Griffiths, Thomas L.

Neural Information Processing Systems

The object people perceive in an image can depend on its orientation relative to the scene it is in (its reference frame). For example, the images of the symbols $\times$ and $+$ differ by a 45 degree rotation. Although real scenes have multiple images and reference frames, psychologists have focused on scenes with only one reference frame. We propose an ideal observer model based on nonparametric Bayesian statistics for inferring the number of reference frames in a scene and their parameters. When an ambiguous image could be assigned to two conflicting reference frames, the model predicts two factors should influence the reference frame inferred for the image: The image should be more likely to share the reference frame of the closer object ({\em proximity}) and it should be more likely to share the reference frame containing the most objects ({\em alignment}). We confirm people use both cues using a novel methodology that allows for easy testing of human reference frame inference.