AITopics | Belgium

Collaborating Authors

Belgium

ColdGANs: Taming Language GANs with Cautious Sampling Strategies Thomas Scialom, Paul-Alexis Dray

Neural Information Processing SystemsMay-22-2025, 10:43:19 GMT

Training regimes based on Maximum Likelihood Estimation (MLE) suffer from known limitations, often leading to poorly generated text sequences. At the root of these limitations is the mismatch between training and inference, i.e. the so-called exposure bias, exacerbated by considering only the reference texts as correct, while in practice several alternative formulations could be as good. Generative Adversarial Networks (GANs) can mitigate those limitations but the discrete nature of text has hindered their application to language generation: the approaches proposed so far, based on Reinforcement Learning, have been shown to underperform MLE. Departing from previous works, we analyze the exploration step in GANs applied to text generation, and show how classical sampling results in unstable training. We propose to consider alternative exploration strategies in a GAN framework that we name ColdGANs, where we force the sampling to be close to the distribution modes to get smoother learning dynamics. For the first time, to the best of our knowledge, the proposed language GANs compare favorably to MLE, and obtain improvements over the state-of-the-art on three generative tasks, namely unconditional text generation, question generation, and abstractive summarization.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Europe > France (0.14)
North America > Canada (0.14)
Europe > Denmark (0.14)
Europe > Belgium (0.14)

Genre: Research Report (0.68)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Add feedback

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Neural Information Processing SystemsMar-27-2025, 14:33:11 GMT

Multimodal learning has developed very fast in recent years. However, during the multimodal training process, the model tends to rely on only one modality based on which it could learn faster, thus leading to inadequate use of other modalities. Existing methods to balance the training process always have some limitations on the loss functions, optimizers and the number of modalities and only consider modulating the magnitude of the gradients while ignoring the directions of the gradients. To solve these problems, in this paper, we present a novel method to balance multimodal learning with Classifier-Guided Gradient Modulation (CGGM), considering both the magnitude and directions of the gradients. We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS 2021, covering classification, regression and segmentation tasks. The results show that CGGM outperforms all the baselines and other stateof-the-art methods consistently, demonstrating its effectiveness and versatility. Our code is available at https://github.com/zrguo/CGGM.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Italy (0.14)
Europe > Belgium (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Label Free Language Model Routing

Neural Information Processing SystemsMar-27-2025, 12:57:45 GMT

Large language models (LLMs) are increasingly used in applications where LLM inputs may span many different tasks. Recent work has found that the choice of LLM is consequential, and different LLMs may be good for different input samples. Prior approaches have thus explored how engineers might select an LLM to use for each sample (i.e.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.14)
North America > United States (0.14)
Europe > Belgium (0.14)
Asia > Japan (0.14)

Genre: Research Report (1.00)

Industry:

Semiconductors & Electronics (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

a9a67d9309a28372dde3de2a1c837390-Paper-Conference.pdf

Neural Information Processing SystemsMar-27-2025, 11:23:20 GMT

artificial intelligence, natural language, representation, (16 more...)

Neural Information Processing Systems

Country:

Asia > China (0.14)
North America > Canada (0.14)
Europe > Belgium (0.14)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Differentiable Structure Learning with Partial Orders Taiyu Ban Lyuzhou Chen Xiangyu Wang

Neural Information Processing SystemsMar-27-2025, 09:47:54 GMT

Differentiable structure learning is a novel line of causal discovery research that transforms the combinatorial optimization of structural models into a continuous optimization problem. However, the field has lacked feasible methods to integrate partial order constraints, a critical prior information typically used in real-world scenarios, into the differentiable structure learning framework. The main difficulty lies in adapting these constraints, typically suited for the space of total orderings, to the continuous optimization context of structure learning in the graph space. To bridge this gap, this paper formalizes a set of equivalent constraints that map partial orders onto graph spaces and introduces a plug-and-play module for their efficient application. This module preserves the equivalent effect of partial order constraints in the graph space, backed by theoretical validations of correctness and completeness. It significantly enhances the quality of recovered structures while maintaining good efficiency, which learns better structures using 90% fewer samples than the data-based method on a real-world dataset. This result, together with a comprehensive evaluation on synthetic cases, demonstrates our method's ability to effectively improve differentiable structure learning with partial orders.

artificial intelligence, constraint, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Add feedback

Attention Temperature Matters in ViT-Based Cross-Domain Few-Shot Learning Yixiong Zou Ran Ma Yuhua Li Ruixuan Li

Neural Information Processing SystemsMar-27-2025, 09:18:56 GMT

Cross-domain few-shot learning (CDFSL) is proposed to transfer knowledge from large-scale source-domain datasets to downstream target-domain datasets with only a few training samples. However, Vision Transformer (ViT), as a strong backbone network to achieve many top performances, is still under-explored in the CDFSL task in its transferability against large domain gaps. In this paper, we find an interesting phenomenon of ViT in the CDFSL task: by simply multiplying a temperature (even as small as 0) to the attention in ViT blocks, the target-domain performance consistently increases, even though the attention map is downgraded to a uniform map.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Belgium (0.14)
Asia > China (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

c2f71567cd53464161cab3336e8fc865-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsMar-27-2025, 06:52:42 GMT

Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte.

data mining, large language model, machine learning, (23 more...)

Neural Information Processing Systems

Country:

Asia > China (0.14)
North America > Canada (0.14)
Europe > Belgium (0.14)

Genre: Workflow (0.67)

Industry:

Information Technology > Software (0.93)
Law (0.92)
Information Technology > Services (0.68)

Technology:

Information Technology > Software (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(8 more...)

Add feedback

A Fast Convoluted Story: Scaling Probabilistic Inference for Integer Arithmetic

Neural Information Processing SystemsMar-27-2025, 03:58:47 GMT

As illustrated by the success of integer linear programming, linear integer arithmetic is a powerful tool for modelling combinatorial problems. Furthermore, the probabilistic extension of linear programming has been used to formulate problems in neurosymbolic AI. However, two key problems persist that prevent the adoption of neurosymbolic techniques beyond toy problems. First, probabilistic inference is inherently hard, #P-hard to be precise. Second, the discrete nature of integers renders the construction of meaningful gradients challenging, which is problematic for learning. In order to mitigate these issues, we formulate linear arithmetic over integer-valued random variables as tensor manipulations that can be implemented in a straightforward fashion using modern deep learning libraries. At the core of our formulation lies the observation that the addition of two integer-valued random variables can be performed by adapting the fast Fourier transform to probabilities in the log-domain. By relying on tensor operations we obtain a differentiable data structure, which unlocks, virtually for free, gradient-based learning. In our experimental validation we show that tensorising probabilistic linear integer arithmetic and leveraging the fast Fourier transform allows us to push the state of the art by several orders of magnitude in terms of inference and learning times.

artificial intelligence, data quality, machine learning, (20 more...)

Neural Information Processing Systems

Country: Europe > Belgium > Flanders (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry: Energy (0.92)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Neural Characteristic Activation Analysis and Geometric Parameterization for ReLU Networks

Neural Information Processing SystemsMar-26-2025, 23:55:45 GMT

We introduce a novel approach for analyzing the training dynamics of ReLU networks by examining the characteristic activation boundaries of individual ReLU neurons. Our proposed analysis reveals a critical instability in common neural network parameterizations and normalizations during stochastic optimization, which impedes fast convergence and hurts generalization performance. Addressing this, we propose Geometric Parameterization (GmP), a novel neural network parameterization technique that effectively separates the radial and angular components of weights in the hyperspherical coordinate system. We show theoretically that GmP resolves the aforementioned instability issue. We report empirical results on various models and benchmarks to verify GmP's advantages of optimization stability, convergence speed and generalization performance.

artificial intelligence, deep learning, machine learning, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Belgium (0.14)

Genre: Research Report > Experimental Study (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Batched Energy-Entropy acquisition for Bayesian Optimization Felix Teufel 12

Neural Information Processing SystemsMar-26-2025, 17:45:38 GMT

Bayesian optimization (BO) is an attractive machine learning framework for performing sample-efficient global optimization of black-box functions. The optimization process is guided by an acquisition function that selects points to acquire in each round of BO. In batched BO, when multiple points are acquired in parallel, commonly used acquisition functions are often high-dimensional and intractable, leading to the use of sampling-based alternatives. We propose a statistical physics inspired acquisition function for BO with Gaussian processes that can natively handle batches. Batched Energy-Entropy acquisition for BO (BEEBO) enables tight control of the explore-exploit trade-off of the optimization process and generalizes to heteroskedastic black-box problems. We demonstrate the applicability of BEEBO on a range of problems, showing competitive performance to existing methods.

artificial intelligence, machine learning, optimization, (16 more...)

Neural Information Processing Systems

Country: