Approximating Real-Time Recurrent Learning with Random Kronecker Factors

Asier Mujika, Florian Meier, Angelika Steger

Neural Information Processing Systems

We also confirm these theoretical results experimentally. Further, we show empirically that the KF-RTRL algorithm captures long-term dependencies and almost matches the performance of TBPTT on real-world tasks, by training Recurrent Highway Networks on a synthetic string memorization task and on the Penn TreeBank task, respectively.


Empirical Risk Minimization in Non-interactive Local Differential Privacy Revisited

Di Wang, Marco Gaboardi, Jinhui Xu

Neural Information Processing Systems

In this paper, we revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. In the case of constant or low dimensions (p ≪ n), we first show that if the loss function is (∞, T)-smooth, we can avoid a dependence of the sample complexity, to achieve error α, on the exponential of the dimensionality p with base 1/α (i.e., α^{-p}), which answers a question in [19].


Fully Unconstrained Online Learning

Neural Information Processing Systems

We provide a technique for online convex optimization that obtains regret G‖w‖√(T·log(‖w‖G√T)) + ‖w‖² + G² on G-Lipschitz losses for any comparison point w, without knowing either G or ‖w‖.



Coupling Generative Modeling and an Autoencoder with the Causal Bridge

Meng, Ruolin, Chung, Ming-Yu, Brahma, Dhanajit, Henao, Ricardo, Carin, Lawrence

arXiv.org Machine Learning

We consider inferring the causal effect of a treatment (intervention) on an outcome of interest in situations where there is potentially an unobserved confounder influencing both the treatment and the outcome. This is achievable by assuming access to two separate sets of control (proxy) measurements associated with treatment and outcomes, which are used to estimate treatment effects through a function termed the causal bridge (CB). We present a new theoretical perspective, associated assumptions for when estimating treatment effects with the CB is feasible, and a bound on the average error of the treatment effect when the CB assumptions are violated. From this new perspective, we then demonstrate how coupling the CB with an autoencoder architecture allows for the sharing of statistical strength between observed quantities (proxies, treatment, and outcomes), thus improving the quality of the CB estimates. Experiments on synthetic and real-world data demonstrate the effectiveness of the proposed approach in relation to the state-of-the-art methodology for proxy measurements.


What If the Input is Expanded in OOD Detection?

Neural Information Processing Systems

Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes, which is important for the reliable deployment of machine learning models in the open world. Various scoring functions have been proposed to distinguish OOD data from in-distribution (ID) data. However, existing methods generally focus on excavating the discriminative information from a single input, which implicitly limits the representation dimension. In this work, we introduce a novel perspective, i.e., employing different common corruptions on the input space, to expand that dimension. We reveal an interesting phenomenon termed confidence mutation, where the confidence of OOD data can decrease significantly under the corruptions, while the ID data shows a higher confidence expectation considering the resistance of semantic features. Based on this observation, we formalize a new scoring method, namely, Confidence aVerage (CoVer), which can capture the dynamic differences by simply averaging the scores obtained from different corrupted inputs and the original ones, making the OOD and ID distributions more separable in detection tasks.
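The abstract describes CoVer as averaging a confidence score over the original input and its corrupted variants. A minimal numpy sketch of that averaging idea, assuming a model that returns class logits and a list of corruption functions (the score here is the standard maximum softmax probability; the paper's exact score and corruption set may differ):

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def msp_score(logits):
    # Maximum softmax probability: a common single-input OOD confidence score.
    return softmax(logits).max(axis=-1)

def cover_score(model, x, corruptions):
    # Average the confidence over the original input and its corrupted
    # variants. Under "confidence mutation", OOD confidence drops sharply
    # with corruption while ID confidence resists, so the averaged score
    # separates the two distributions better than a single-input score.
    scores = [msp_score(model(x))]
    for corrupt in corruptions:
        scores.append(msp_score(model(corrupt(x))))
    return np.mean(scores, axis=0)
```

A higher `cover_score` then indicates ID-like behavior, and thresholding it yields the detector.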


LUSD: Localized Update Score Distillation for Text-Guided Image Editing

Chinchuthakun, Worameth, Saengja, Tossaporn, Tritrong, Nontawat, Rewatbowornwong, Pitchaporn, Khungurn, Pramook, Suwajanakorn, Supasorn

arXiv.org Artificial Intelligence

While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, and by 58-64% overall.
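The abstract names two modifications, attention-based spatial regularization and gradient filtering-normalization, without giving their exact form. A hypothetical numpy sketch of what such a gradient update step could look like (the attention mask, percentile filter, and RMS normalization are all illustrative assumptions, not the paper's specification):

```python
import numpy as np

def regularize_gradient(grad, attn, percentile=90, eps=1e-8):
    # Attention-based spatial regularization (assumed form): damp the
    # gradient outside the regions the target prompt attends to.
    g = grad * (attn / (attn.max() + eps))
    # Gradient filtering (assumed form): zero out entries below a
    # magnitude percentile, discarding diffuse low-magnitude updates.
    thresh = np.percentile(np.abs(g), percentile)
    g = np.where(np.abs(g) >= thresh, g, 0.0)
    # Normalization (assumed form): rescale to unit RMS so the update
    # magnitude no longer depends on input-specific gradient scale.
    rms = np.sqrt(np.mean(g ** 2)) + eps
    return g / rms
```

The intent of each step mirrors the abstract: the mask localizes edits to preserve the background, while filtering and normalization reduce the input-specific variation in gradient magnitude that makes hyperparameter tuning brittle.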


Exploring the Word Sense Disambiguation Capabilities of Large Language Models

Basile, Pierpaolo, Siciliani, Lucia, Musacchio, Elio, Semeraro, Giovanni

arXiv.org Artificial Intelligence

Word Sense Disambiguation (WSD) is a historical task in computational linguistics that has received much attention over the years. However, with the advent of Large Language Models (LLMs), interest in this task (in its classical definition) has decreased. In this study, we evaluate the performance of various LLMs on the WSD task. We extend a previous benchmark (XL-WSD) to re-design two subtasks suitable for LLMs: 1) given a word in a sentence, the LLM must generate the correct definition; 2) given a word in a sentence and a set of predefined meanings, the LLM must select the correct one. The extended benchmark is built using XL-WSD and BabelNet. The results indicate that LLMs perform well in zero-shot learning but cannot surpass current state-of-the-art methods. However, a fine-tuned model with a medium number of parameters outperforms all other models, including the state-of-the-art.
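The two subtasks can be pictured as prompt templates. The wording below is hypothetical, not the benchmark's actual prompts; it only illustrates the generation vs. selection split the abstract describes:

```python
def generation_prompt(sentence, word):
    # Subtask 1: the LLM must generate the correct definition of the
    # target word as used in the sentence.
    return (f'In the sentence "{sentence}", what is the meaning of the '
            f'word "{word}"? Answer with a short definition.')

def selection_prompt(sentence, word, senses):
    # Subtask 2: the LLM must select the correct sense from a set of
    # predefined meanings (e.g. glosses drawn from BabelNet).
    options = "\n".join(f"{i + 1}. {gloss}" for i, gloss in enumerate(senses))
    return (f'In the sentence "{sentence}", which of the following '
            f'meanings does the word "{word}" have?\n{options}\n'
            f'Answer with the number of the correct meaning.')
```

Subtask 1 is scored against the gold definition, while subtask 2 reduces to classification over the candidate senses, which is why zero-shot LLMs can be evaluated directly on both.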


ObjectMover: Generative Object Movement with Video Prior

Yu, Xin, Wang, Tianyu, Kim, Soo Ye, Guerrero, Paul, Chen, Xi, Liu, Qing, Lin, Zhe, Qi, Xiaojuan

arXiv.org Artificial Intelligence

Simple as it seems, moving an object to another location within an image is, in fact, a challenging image-editing task that requires re-harmonizing the lighting, adjusting the pose based on perspective, accurately filling occluded regions, and ensuring coherent synchronization of shadows and reflections while maintaining the object identity. In this paper, we present ObjectMover, a generative model that can perform object movement in highly challenging scenes. Our key insight is that we model this task as a sequence-to-sequence problem and fine-tune a video generation model to leverage its knowledge of consistent object generation across video frames. We show that with this approach, our model is able to adjust to complex real-world scenarios, handling extreme lighting harmonization and object effect movement. As large-scale data for object movement are unavailable, we construct a data generation pipeline using a modern game engine to synthesize high-quality data pairs. We further propose a multi-task learning strategy that enables training on real-world video data to improve the model generalization. Through extensive experiments, we demonstrate that ObjectMover achieves outstanding results and adapts well to real-world scenarios.