histogram
Conditional Diffusion Sampling
Castro-Macías, Francisco M., Morales-Álvarez, Pablo, Syed, Saifuddin, Hernández-Lobato, Daniel, Molina, Rafael, Hernández-Lobato, José Miguel
Sampling from unnormalized multimodal distributions with limited density evaluations remains a fundamental challenge in machine learning and natural sciences. Successful approaches construct a bridge between a tractable reference and the target distribution. Parallel Tempering (PT) serves as the gold standard, while recent diffusion-based approaches offer a continuous alternative at the cost of neural training. In this work, we introduce Conditional Diffusion Sampling (CDS), a framework that combines these two paradigms. To this end, we derive Conditional Interpolants, a class of stochastic processes whose transport dynamics are governed by an exact, closed-form stochastic differential equation (SDE), requiring no neural approximation. Although these dynamics require sampling from a non-trivial initialization distribution, we show both theoretically and empirically that the cost of this initialization diminishes for sufficiently short diffusion times. CDS leverages this by a two-stage procedure: (1) PT is used to efficiently sample the initial distribution, and then (2) samples are transported via the transport SDE. This combination couples the robust global exploration of PT with efficient local transport. Experiments suggest that CDS has the potential to achieve a superior trade-off between sample quality and density evaluation cost compared to state-of-the-art samplers.
Conformal Prediction using Conditional Histograms
This paper develops a conformal method to compute prediction intervals for nonparametric regression that can automatically adapt to skewed data. Leveraging black-box machine learning algorithms to estimate the conditional distribution of the outcome using histograms, it translates their output into the shortest prediction intervals with approximate conditional coverage. The resulting prediction intervals provably have marginal coverage in finite samples, while asymptotically achieving conditional coverage and optimal length if the black-box model is consistent. Numerical experiments with simulated and real data demonstrate improved performance compared to state-of-the-art alternatives, including conformalized quantile regression and other distributional conformal prediction approaches.
cca79c22037280d066fbd8bc35ac2e72-Supplemental-Datasets_and_Benchmarks.pdf
A.1 Shower shape variables452 We extend the list of shower shape variables described in Sec. Marginals of each point feature by considering the set all the points from all454 the point clouds together.455 Layer Energy Ei. Denotes the total energy deposited in layer i of the shower. Total energy across all layers of the shower. The layer lateral widths can be interpreted as the spread around the center of energy in the lateral462 plane in respective dimensions.
SUPA: ALightweight Diagnostic Simulator for Machine Learning in Particle Physics
Deep learning methods have gained popularity in high energy physics for fast modeling of particle showers in detectors. Detailed simulation frameworks such as the gold standard GEANT4 are computationally intensive, and current deep generative architectures work on discretized, lower resolution versions of the detailed simulation. The development of models that work at higher spatial resolutions is currently hindered by the complexity of the full simulation data, and by the lack of simpler, more interpretable benchmarks. Our contribution is SUPA, the SUrrogate PArticle propagation simulator, an algorithm and software package for generating data by simulating simplified particle propagation, scattering and shower development in matter. The generation is extremely fast and easy to use compared to GEANT4, but still exhibits the key characteristics and challenges of the detailed simulation. The proposed simulator generates thousands of particle showers per second on a desktop machine, a speed up of up to 6 orders of magnitudes over GEANT4, and stores detailed geometric information about the shower propagation. SUPA provides much greater flexibility for setting initial conditions and defining multiple benchmarks for the development of models. Moreover, interpreting particle showers as point clouds creates a connection to geometric machine learning and provides challenging and fundamentally new datasets for the field.
In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning
When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential privacy guarantee, it clearly violates privacy norms and expectations, and would not be possible at all without the noise inserted to obtain differential privacy. In fact, counter-intuitively, the attack becomes easier as we add more noise to provide stronger differential privacy. We hope this encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.
Rebuttal for " Revisiting the Evaluation of Image Synthesis with GANs " Anonymous Author(s) Affiliation Address email
Our presentation is organized for following reasons: In Section 2.3, we present the228 details of generative models, evaluated datasets, and analysis approaches (including our visualization229 tool, histogram matching attack, and human evaluation). They are independent of each other, thus230 we discuss them in parallel in the main paper. In Section 3.1, we investigate the feature extractors231 by first identifying their attention on visual semantics, followed by investigating their robustness to232 the histogram matching attack. Finally, we filter extractors that define similar representation spaces.233 These studies are gradually deepening, thus they are organized in a progressive manner.
Supplementary Material Automatic Unsupervised Outlier Model Selection
Model set Mis composed by pairing outlier detection algorithms to distinct hyperparameter choices. Table 2 provides a comprehensive description of models, including 302 unique models composed by 8 popular outlier detection (OD) algorithms. All models and parameters are based on the Python Outlier Detection Toolbox (PyOD)5. B.1 Complete List of Meta-features We summarize the meta-features used by METAOD in Table 3. When applicable, we provide the formula for computing the meta-feature(s) and corresponding variants.