Bilevel Learning via Inexact Stochastic Gradient Descent
Salehi, Mohammad Sadegh, Mukherjee, Subhadip, Roberts, Lindon, Ehrhardt, Matthias J.
Bilevel optimization is a central tool in machine learning for high-dimensional hyperparameter tuning. Its applications are vast; for instance, in imaging it can be used for learning data-adaptive regularizers and optimizing forward operators in variational regularization. These problems are large in many ways: a lot of data is usually available to train a large number of parameters, calling for stochastic gradient-based algorithms. However, exact gradients with respect to parameters (so-called hypergradients) are not available, and their precision is usually linearly related to computational cost. Hence, algorithms must solve the problem efficiently without unnecessary precision. The design of such methods is still not fully understood, especially regarding how accuracy requirements and step size schedules affect theoretical guarantees and practical performance. Existing approaches introduce stochasticity at both the upper level (e.g., in sampling or mini-batch estimates) and the lower level (e.g., in solving the inner problem) to improve generalization, but they typically fix the number of lower-level iterations, which conflicts with asymptotic convergence assumptions. In this work, we advance the theory of inexact stochastic bilevel optimization. We prove convergence and establish rates under decaying accuracy and step size schedules, showing that with optimal configurations convergence occurs at an $\mathcal{O}(k^{-1/4})$ rate in expectation. Experiments on image denoising and inpainting with convex ridge regularizers and input-convex networks confirm our analysis: decreasing step sizes improve stability, accuracy scheduling is more critical than step size strategy, and adaptive preconditioning (e.g., Adam) further boosts performance. These results bridge theory and practice, providing convergence guarantees and practical guidance for large-scale imaging problems.
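A minimal sketch of the scheme this abstract describes may help fix ideas: the lower-level problem is solved only up to a tolerance eps_k that decays with the iteration counter k, and the resulting inexact hypergradient drives a stochastic upper-level step with a decaying step size alpha_k. The quadratic toy problem, the function names, and the particular schedule exponents below are illustrative assumptions, not the paper's optimal configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
y_target = rng.normal(size=20)   # illustrative upper-level data

def solve_lower_level(theta, eps):
    """Inexactly minimize g(y; theta) = 0.5||y - theta||^2 by gradient
    descent, stopping once the gradient norm drops below eps."""
    y = np.zeros_like(theta)
    while np.linalg.norm(y - theta) > eps:   # gradient of g is y - theta
        y -= 0.5 * (y - theta)
    return y

theta = rng.normal(size=20)
for k in range(1, 2001):
    alpha_k = 0.5 * k ** -0.5                # decaying step-size schedule
    eps_k = k ** -1.0                        # decaying accuracy schedule
    y = solve_lower_level(theta, eps_k)      # inexact lower-level solve
    batch = rng.choice(20, size=5, replace=False)  # upper-level mini-batch
    # For this toy problem dy*/dtheta = I, so the hypergradient of the
    # upper-level loss 0.5||y*(theta) - y_target||^2 is simply y - y_target.
    theta[batch] -= alpha_k * (y[batch] - y_target[batch])

final_y = solve_lower_level(theta, 1e-8)
print("upper-level loss:", 0.5 * np.sum((final_y - y_target) ** 2))
```

In the paper's setting the hypergradient requires differentiating through the lower-level solution, which is where the accuracy/cost trade-off arises; the toy above sidesteps that only to keep the sketch short.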
Pivotal CLTs for Pseudolikelihood via Conditional Centering in Dependent Random Fields
Data from dependent random field models often exhibits significant deviations from classical Gaussian approximations. A natural class of statistics to analyze in such models are conditionally centered averages (see [30, 63, 52]), where one recenters each observation by its conditional mean given all other observations. Crucially, such conditionally centered CLTs are closely tied to maximum pseudolikelihood estimators (MPLEs) through the MPLE score (see [64, 60, 41]). This connection is practically important because in many graphical/Markov random field models (such as Ising models, exponential random graph models (ERGMs), etc.), computing the MLE is impeded by an intractable normalizing constant, whereas pseudolikelihood replaces the joint likelihood with a product of tractable conditional models, scales to large networks, and is widely used in practice. However, most existing theory for conditionally centered statistics and for MPLE focuses on local dependence -- e.g., bounded degree or sparse neighborhoods -- and does not cover realistic dense regimes in which every node may have many connections (scaling with the size of the network). This paper bridges that gap by developing a general limit theory for conditionally centered statistics under weak and verifiable assumptions. Our results accommodate both sparse and dense interactions, as well as regular and irregular network connections. In particular, we deliver valid studentized inference for pseudolikelihood in network/Markov random field settings. As examples, we obtain new CLTs for conditionally centered averages and pseudolikelihood estimators in Ising models (with pairwise and tensor interactions) and exponential random graph models, without imposing sparsity, regularity, or high-temperature restrictions.
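To make the central objects concrete, here is a small sketch for a dense (Curie-Weiss-type) Ising model: the conditional mean of a spin given all others is tanh(beta * s_i) with local field s_i = sum_j J_ij x_j, a conditionally centered average recenters each spin by that conditional mean, and the MPLE solves the pseudolikelihood score equation. The coupling matrix, sampler length, sqrt(n) scaling, and bisection bracket are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta_true = 500, 1.2                 # dense, low-temperature regime
J = np.ones((n, n)) / n                 # Curie-Weiss couplings: every node connected
np.fill_diagonal(J, 0.0)

# Crude Gibbs sampler for P(x) ~ exp(beta * sum_{i<j} J_ij x_i x_j), x_i in {-1,+1}
x = rng.choice([-1.0, 1.0], size=n)
for _ in range(50 * n):
    i = rng.integers(n)
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta_true * (J[i] @ x)))
    x[i] = 1.0 if rng.random() < p_up else -1.0

s = J @ x                               # local fields s_i = sum_j J_ij x_j

# Conditionally centered average: recenter each x_i by E[x_i | rest] = tanh(beta*s_i)
cc_avg = np.sum(x - np.tanh(beta_true * s)) / np.sqrt(n)

# MPLE score in beta; it is strictly decreasing, so bisection finds its root
def score(beta):
    return np.sum(s * (x - np.tanh(beta * s)))

lo, hi = 0.0, 3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)

print("conditionally centered average:", cc_avg)
print("MPLE estimate of beta:", 0.5 * (lo + hi), "(true:", beta_true, ")")
```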
ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations
Swain, Subhankar, Rizwan, Naquee, Deb, Nayandeep, Solanki, Vishwajeet Singh, S, Vishwa Gangadhar, Mukherjee, Animesh
The 2025 Global Risks Report identifies state-based armed conflict and societal polarisation among the most pressing global threats, with social media playing a central role in amplifying toxic discourse. Memes, as a widely used mode of online communication, often serve as vehicles for spreading harmful content. However, limitations in data accessibility and the high cost of dataset curation hinder the development of robust meme moderation systems. To address this challenge, we introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of this dataset is that it is enriched with auxiliary metadata of socially relevant tags, enhancing the context of each meme. In addition, since in-the-wild memes often do not come with tags, we propose a tag generation module that produces socially grounded tags. Experimental results show that incorporating these tags substantially enhances the performance of state-of-the-art VLMs on meme detection tasks. Our contributions offer a novel and scalable foundation for improved content moderation in multimodal online environments.
- Government > Military (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
AI-enhanced semantic feature norms for 786 concepts
Suresh, Siddharth, Mukherjee, Kushin, Giallanza, Tyler, Yu, Xizheng, Patil, Mia, Cohen, Jonathan D., Rogers, Timothy T.
Semantic feature norms have been foundational in the study of human conceptual knowledge, yet traditional methods face trade-offs between concept/feature coverage and verifiability of quality due to the labor-intensive nature of norming studies. Here, we introduce a novel approach that augments a dataset of human-generated feature norms with responses from large language models (LLMs) while verifying the quality of norms against reliable human judgments. We find that our AI-enhanced feature norm dataset, NOVA: Norms Optimized Via AI, shows much higher feature density and overlap among concepts while outperforming a comparable human-only norm dataset and word-embedding models in predicting people's semantic similarity judgments. Taken together, we demonstrate that human conceptual knowledge is richer than captured in previous norm datasets and show that, with proper validation, LLMs can serve as powerful tools for cognitive science research.
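A minimal sketch of the validation loop the abstract describes, with everything below made up for illustration (cosine similarity over feature counts is one plausible predictor, not necessarily the authors' choice; NOVA itself covers 786 concepts and validates against reliable human judgments):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical mini feature-norm matrix: rows = concepts, columns = features
# (has_fur, has_tail, has_wheels, is_pet, is_fast); entries = number of
# annotators (human or LLM) who listed that feature for the concept.
concepts = ["dog", "cat", "car"]
norms = np.array([
    [9.0, 8.0, 0.0, 7.0, 1.0],   # dog
    [8.0, 9.0, 0.0, 8.0, 0.0],   # cat
    [0.0, 0.0, 9.0, 0.0, 8.0],   # car
])

def cosine_similarities(m):
    """Pairwise cosine similarity between concept feature vectors."""
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return unit @ unit.T

pred = cosine_similarities(norms)
iu = np.triu_indices(len(concepts), k=1)        # unique concept pairs

# Hypothetical human similarity ratings for (dog,cat), (dog,car), (cat,car)
human = np.array([0.85, 0.10, 0.08])
rho, _ = spearmanr(pred[iu], human)
print("Spearman correlation with human judgments:", rho)
```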
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.94)
Jennifer Doudna on the Brave New World Being Ushered In by Gene Editing
In 2012, the biochemist Jennifer Doudna and her colleague Emmanuelle Charpentier developed a method for using RNA-guided proteins to edit specific sections of DNA. Their innovation--for which the two won the Nobel Prize in Chemistry, in 2020--is known as the CRISPR-Cas9 gene-editing system. CRISPR has since been used to alter plants (to, for instance, produce greater yields), insects (to prevent them from carrying certain diseases), and people (to treat sickle-cell disease). The technology's promise can sound like science fiction: it might help us adapt to a radically different climate, grow organs for those in need, or reprogram a cancer patient's own cells to target tumors. But there are also worries about its possible side effects, both biological and social.
- Health & Medicine > Therapeutic Area > Genetic Disease (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.70)
RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance
Mukherjee, Avideep, Banerjee, Soumya, Rai, Piyush, Namboodiri, Vinay P.
Diffusion-based models demonstrate impressive generation capabilities. However, they also have a massive number of parameters, resulting in enormous model sizes that make them unsuitable for deployment on resource-constrained devices. Block-wise generation can be a promising alternative for designing compact (parameter-efficient) deep generative models, since the model can generate one block at a time instead of generating the whole image at once. However, block-wise generation is also considerably challenging because ensuring coherence across generated blocks is non-trivial. To this end, we design a retrieval-augmented generation (RAG) approach and leverage the corresponding blocks of the images retrieved by the RAG module to condition the training and generation stages of a block-wise denoising diffusion model. Our conditioning schemes ensure coherence across the different blocks during training and, consequently, during generation. While we showcase our approach using the latent diffusion model (LDM) as the base model, it can be used with other variants of denoising diffusion models. We validate that the proposed approach resolves the coherence problem through extensive experiments demonstrating its effectiveness in terms of compact model size and excellent generation quality.
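A schematic sketch of the block-wise RAG loop, under loudly stated assumptions: retrieval is a nearest-neighbor lookup keyed on already-generated blocks, and denoise_block is a toy stand-in for a conditional reverse diffusion process (a real implementation would run the LDM sampler conditioned on the retrieved block).

```python
import numpy as np

rng = np.random.default_rng(2)
B, D = 4, 16                            # image split into B blocks of D pixels
database = rng.random((100, B, D))      # retrieval database of training images

def retrieve(generated_blocks):
    """Find the database image whose corresponding blocks are closest to the
    blocks generated so far; its remaining blocks guide the next block."""
    if not generated_blocks:
        return database[rng.integers(len(database))]
    done = len(generated_blocks)
    g = np.stack(generated_blocks)                       # (done, D)
    d = np.linalg.norm(database[:, :done] - g, axis=(1, 2))
    return database[np.argmin(d)]

def denoise_block(x_t, guide, steps=50):
    """Toy stand-in for a conditional denoising diffusion sampler: an
    iterative update pulling noise toward the retrieved guide block."""
    for _ in range(steps):
        x_t = x_t + 0.1 * (guide - x_t) + 0.01 * rng.normal(size=x_t.shape)
    return x_t

blocks = []
for b in range(B):                      # generate one block at a time
    neighbor = retrieve(blocks)         # RAG step: condition on retrieved image
    x = rng.normal(size=D)              # start the block from pure noise
    blocks.append(denoise_block(x, neighbor[b]))
image = np.concatenate(blocks)
print("generated image shape:", image.shape)
```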
LADDER: Revisiting the Cosmic Distance Ladder with Deep Learning Approaches and Exploring its Applications
Shah, Rahul, Saha, Soumadeep, Mukherjee, Purba, Garain, Utpal, Pal, Supratik
ABSTRACT
We investigate the prospect of reconstructing the "cosmic distance ladder" of the Universe using a novel deep learning framework called LADDER - Learning Algorithm for Deep Distance Estimation and Reconstruction. LADDER is trained on the apparent magnitude data from the Pantheon Type Ia supernovae compilation, incorporating the full covariance information among data points, to produce predictions along with corresponding errors. After employing several validation tests with a number of deep learning models, we pick LADDER as the best performing one. We then demonstrate applications of our method in the cosmological context, including serving as a model-independent tool for consistency checks of other datasets such as baryon acoustic oscillations, calibration of high-redshift datasets such as gamma-ray bursts, and use as a model-independent mock catalog generator for future probes.

INTRODUCTION
Knowledge of accurate distances to astronomical entities at various redshifts is essential for deducing the expansion history of the Universe. Observationally, however, this task is not simple, since no single standardizable measure of distance exists at all scales of cosmological interest. Hence one has to resort to a progressive method of calibrating distances, called the "cosmic distance ladder" method, using overlapping regions of potentially different standardizable objects as "rungs of the ladder". The conventional distance ladder method (Riess & Breuval 2023) starts with direct geometric distance measurements and progresses to calibrating Cepheid variables (Freedman & Madore 2023) or Tip of the Red Giant Branch (TRGB) stars (Freedman et al. 2020), and finally Type Ia supernovae (SNIa).
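Two pieces of the pipeline lend themselves to a short sketch: the distance-modulus relation mu = m - M = 5 log10(d_L / 10 pc) linking apparent magnitudes to luminosity distances, and a Gaussian loss using the full covariance among data points, as the training described above does. The numbers below are invented for illustration; only M of roughly -19.3 mag for SNIa is a standard fiducial value.

```python
import numpy as np

def distance_modulus(d_L_mpc):
    """mu = m - M = 5 log10(d_L / 10 pc), with d_L given in megaparsecs."""
    return 5.0 * np.log10(d_L_mpc * 1e6 / 10.0)

def covariance_weighted_loss(m_obs, m_pred, cov):
    """Gaussian chi^2-style loss using the full covariance among data
    points, the kind of information LADDER's training incorporates."""
    r = m_obs - m_pred
    return 0.5 * r @ np.linalg.solve(cov, r)

# Toy check: three supernovae with correlated magnitude errors (made up)
m_obs = np.array([22.1, 23.0, 24.2])
m_pred = distance_modulus(np.array([2500.0, 3800.0, 6500.0])) - 19.3
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.05, 0.01],
                [0.00, 0.01, 0.04]])
print("covariance-weighted loss:", covariance_weighted_loss(m_obs, m_pred, cov))
```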
PINSAT: Parallelized Interleaving of Graph Search and Trajectory Optimization for Kinodynamic Motion Planning
Natarajan, Ramkumar, Mukherjee, Shohin, Choset, Howie, Likhachev, Maxim
Trajectory optimization is a widely used technique in robot motion planning that lets the dynamics and constraints of the system shape and synthesize complex behaviors. Several previous works have shown its benefits in high-dimensional continuous state spaces and under differential constraints. However, long time horizons and planning around obstacles in non-convex spaces pose challenges in guaranteeing convergence or finding optimal solutions. As a result, discrete graph search planners and sampling-based planners are preferred when facing obstacle-cluttered environments. A recently developed algorithm called INSAT effectively combines graph search in the low-dimensional subspace and trajectory optimization in the full-dimensional space for global kinodynamic planning over long horizons. Although INSAT successfully reasoned about and solved complex planning problems, the numerous expensive calls to an optimizer resulted in large planning times, thereby limiting its practical use. Inspired by recent work on edge-based parallel graph search, we present PINSAT, which introduces systematic parallelization in INSAT to achieve lower planning times and higher success rates, while maintaining significantly lower costs than relevant baselines. We demonstrate PINSAT by evaluating it on 6-DoF kinodynamic manipulation planning with obstacles.
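A sketch of the parallelization idea, with everything below an illustrative assumption rather than the PINSAT implementation (which parallelizes edge evaluations asynchronously across the whole search): a best-first search over a toy graph where the expensive per-edge trajectory optimizations of each expansion are dispatched to a thread pool instead of being evaluated one at a time.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Toy low-dimensional graph: node -> list of successor nodes
edges = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}

def optimize_trajectory(u, v):
    """Stand-in for the expensive full-dimensional trajectory optimization
    run per edge; returns (edge cost, feasible). Purely illustrative."""
    return abs(v - u) + 0.1 * v, True

def pinsat_like_search(start, goal, workers=4):
    """Best-first search that dispatches the per-edge optimizations of each
    expansion to a thread pool instead of evaluating them sequentially."""
    dist = {start: 0.0}
    frontier = [(0.0, start)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            d, u = heapq.heappop(frontier)
            if u == goal:
                return d
            if d > dist.get(u, float("inf")):
                continue                     # stale queue entry
            results = pool.map(lambda v, u=u: (v, optimize_trajectory(u, v)),
                               edges[u])
            for v, (cost, feasible) in results:
                if feasible and d + cost < dist.get(v, float("inf")):
                    dist[v] = d + cost
                    heapq.heappush(frontier, (d + cost, v))
    return float("inf")

print("best path cost:", pinsat_like_search(0, 5))
```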
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Lai, Chengen, Song, Shengli, Meng, Shiqi, Li, Jingyang, Yan, Sitong, Hu, Guangneng
Natural language explanation in visual question answering (VQA-NLE) aims to explain the decision-making process of models by generating natural language sentences, increasing users' trust in black-box systems. Existing post-hoc methods have achieved significant progress in producing plausible explanations. However, such post-hoc explanations are not always aligned with human logical inference, suffering from three issues: 1) deductive unsatisfiability, where the generated explanations do not logically lead to the answer; 2) factual inconsistency, where the model falsifies its counterfactual explanation for answers without considering the facts in images; and 3) semantic perturbation insensitivity, where the model cannot recognize the semantic changes caused by small perturbations. These problems reduce the faithfulness of the explanations that models generate. To address the above issues, we propose a novel self-supervised \textbf{M}ulti-level \textbf{C}ontrastive \textbf{L}earning based natural language \textbf{E}xplanation model (MCLE) for VQA with semantic-level, image-level, and instance-level factual and counterfactual samples. MCLE extracts discriminative features and aligns the feature spaces of explanations with the visual question and answer to generate more consistent explanations. We conduct extensive experiments, ablation analysis, and a case study to demonstrate the effectiveness of our method on two VQA-NLE benchmarks.
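The backbone of a multi-level contrastive objective like the one described can be sketched with a standard InfoNCE loss applied once per level (semantic, image, instance), treating factual samples as positives and counterfactual samples as negatives. The loss form, tensor shapes, and temperature below are assumptions for illustration, not the authors' exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """One contrastive term: pull each anchor embedding toward its factual
    (positive) sample and away from counterfactual (negative) samples.
    Shapes: anchor, positive (B, D); negatives (B, N, D)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_logit = (anchor * positive).sum(-1, keepdim=True)        # (B, 1)
    neg_logits = torch.einsum("bd,bnd->bn", anchor, negatives)   # (B, N)
    logits = torch.cat([pos_logit, neg_logits], dim=1) / temperature
    labels = torch.zeros(len(anchor), dtype=torch.long)          # positive at index 0
    return F.cross_entropy(logits, labels)

# Toy usage: one loss per level (semantic, image, instance), then summed.
B, N, D = 8, 4, 64
levels = [(torch.randn(B, D), torch.randn(B, D), torch.randn(B, N, D))
          for _ in range(3)]
total = sum(info_nce(a, p, neg) for a, p, neg in levels)
print("total multi-level contrastive loss:", float(total))
```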
Provably Convergent Plug-and-Play Quasi-Newton Methods
Tan, Hong Ye, Mukherjee, Subhadip, Tang, Junqi, Schönlieb, Carola-Bibiane
Plug-and-Play (PnP) methods are a class of efficient iterative methods that aim to combine data fidelity terms and deep denoisers using classical optimization algorithms, such as ISTA or ADMM, with applications in inverse problems and imaging. Provable PnP methods are a subclass of PnP methods with convergence guarantees, such as fixed point convergence or convergence to critical points of some energy function. Many existing provable PnP methods impose heavy restrictions on the denoiser or fidelity function, such as non-expansiveness or strict convexity, respectively. In this work, we propose a novel algorithmic approach incorporating quasi-Newton steps into a provable PnP framework based on proximal denoisers, resulting in greatly accelerated convergence while retaining light assumptions on the denoiser. By characterizing the denoiser as the proximal operator of a weakly convex function, we show that the fixed points of the proposed quasi-Newton PnP algorithm are critical points of a weakly convex function. Numerical experiments on image deblurring and super-resolution demonstrate 2--8x faster convergence as compared to other provable PnP methods with similar reconstruction quality.
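A self-contained sketch of the PnP pattern described, with loud caveats: the denoiser below is plain soft-thresholding (the prox of an l1 term) standing in for the learned proximal denoiser, and the quasi-Newton ingredient is reduced to a diagonal secant-based scaling of the gradient step; the paper's actual update and convergence guarantees are more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
A = rng.normal(size=(n, n)) / np.sqrt(n)    # toy forward operator (stand-in for a blur)
x_true = np.zeros(n)
x_true[::7] = 1.0                           # sparse ground truth
b = A @ x_true + 0.01 * rng.normal(size=n)  # noisy measurements

def grad_f(x):
    """Gradient of the data-fidelity term f(x) = 0.5 * ||Ax - b||^2."""
    return A.T @ (A @ x - b)

def denoiser(x, tau=0.05):
    """Toy proximal denoiser: soft-thresholding, the prox of tau*||.||_1.
    The paper characterizes a learned denoiser as the prox of a weakly
    convex function; this stand-in keeps the sketch self-contained."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def pnp_quasi_newton(x, gamma=0.2, iters=300):
    """PnP iteration x <- D(x - gamma * H^{-1} grad f(x)), where H is a
    crude diagonal secant-based Hessian approximation."""
    h = np.ones_like(x)                      # diagonal Hessian approximation
    g_old = grad_f(x)
    for _ in range(iters):
        x_new = denoiser(x - gamma * g_old / h)
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g_old      # secant pair
        h = np.clip(np.abs(y) / np.maximum(np.abs(s), 1e-8), 1.0, 10.0)
        x, g_old = x_new, g_new
    return x

x_rec = pnp_quasi_newton(np.zeros(n))
print("relative error:", np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true))
```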