Appendix for Softmax Deep Double Deterministic Policy Gradients
Pan, Ling
We demonstrate the smoothing effect of SD3 on the optimization landscape in this section. The experimental setup is the same as in the comparative study in Section 4.1 of the main text; experimental details can be found in Section B.2. The performance comparison of SD3 and TD3 is shown in Figure 1(a), where SD3 significantly outperforms TD3, demonstrating the smoothing effect of SD3 over TD3. Hyperparameters of DDPG and SD2 are summarized in Table 1. We assume that the actor is a local maximizer with respect to the critic.
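The smoothing comes from replacing the hard max over action values with a Boltzmann-weighted average. The following NumPy sketch of that softmax operator is illustrative only, not the paper's code; the inverse-temperature parameter name `beta` is our own choice.

```python
import numpy as np

def softmax_value(q_values, beta):
    """Boltzmann-weighted average of action values:
    softmax_beta(Q) = sum_a exp(beta*Q(a)) * Q(a) / sum_a exp(beta*Q(a))."""
    w = np.exp(beta * (q_values - q_values.max()))  # shift for numerical stability
    return float(np.sum(w * q_values) / np.sum(w))
```

As beta approaches 0 this recovers the uniform mean of the action values; as beta grows it approaches the hard max, which is why the operator smoothly interpolates toward (and smooths) the greedy value used by TD3.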
ShiQ: Bringing back Bellman to LLMs
Clavier, Pierre, Grinsztajn, Nathan, Avalos, Raphael, Flet-Berliac, Yannis, Ergun, Irem, Domingues, Omar D., Tarassov, Eugene, Pietquin, Olivier, Richemond, Pierre H., Strub, Florian, Geist, Matthieu
The fine-tuning of pre-trained large language models (LLMs) using reinforcement learning (RL) is generally formulated as direct policy optimization. This approach was naturally favored as it efficiently improves a pretrained LLM, seen as an initial policy. Another RL paradigm, Q-learning, has received far less attention in the LLM community despite demonstrating major success in various non-LLM RL tasks. In particular, Q-learning's effectiveness comes from its sample efficiency and ability to learn offline, which is particularly valuable given the high computational cost of sampling with LLMs. However, naively applying a Q-learning-style update to the model's logits is ineffective due to the specificity of LLMs. Our core contribution is to derive theoretically grounded loss functions from Bellman equations to adapt Q-learning methods to LLMs. To do so, we carefully adapt insights from the RL literature to account for LLM-specific characteristics, ensuring that the logits become reliable Q-value estimates. We then use this loss to build a practical algorithm, ShiQ for Shifted-Q, that supports off-policy, token-wise learning while remaining simple to implement. Finally, we evaluate ShiQ on both synthetic data and real-world benchmarks, e.g., UltraFeedback and BFCL-V3, demonstrating its effectiveness in both single-turn and multi-turn LLM settings.
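Interpreting logits as Q-values under a softmax policy leads to a soft Bellman backup, which can be sketched as follows. This is a generic soft Q-learning target, not ShiQ's actual loss; the function names and the temperature parameter `tau` are our own illustrative choices.

```python
import numpy as np

def soft_value(logits, tau=1.0):
    # V(s) = tau * logsumexp(logits / tau): the soft-max state value implied
    # by reading the model's logits as Q-values under a softmax policy.
    z = logits / tau
    m = z.max()  # shift for numerical stability
    return tau * (m + np.log(np.sum(np.exp(z - m))))

def soft_bellman_target(reward, next_logits, gamma=1.0, tau=1.0):
    # One-step regression target: Q(s, a) should match r + gamma * V(s').
    return reward + gamma * soft_value(next_logits, tau)
```

A token-wise loss in this spirit would regress each token's logit toward such a target, which is what allows off-policy learning from logged completions.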
Safety and optimality in learning-based control at low computational cost
Baumann, Dominik, Kowalczyk, Krzysztof, Rojas, Cristian R., Tiels, Koen, Wachel, Pawel
Applying machine learning methods to physical systems that are supposed to act in the real world requires providing safety guarantees. However, methods that include such guarantees often come at a high computational cost, making them inapplicable to large datasets and embedded devices with low computational power. In this paper, we propose CoLSafe, a computationally lightweight safe learning algorithm whose computational complexity grows sublinearly with the number of data points. We derive both safety and optimality guarantees and showcase the effectiveness of our algorithm on a seven-degree-of-freedom robot arm.
Gaussian Process Upper Confidence Bound Achieves Nearly-Optimal Regret in Noise-Free Gaussian Process Bandits
We study the noise-free Gaussian process (GP) bandits problem, in which the learner seeks to minimize regret through noise-free observations of a black-box objective function lying in a known reproducing kernel Hilbert space (RKHS). Gaussian process upper confidence bound (GP-UCB) is a well-known GP-bandits algorithm whose query points are adaptively chosen based on a GP-based upper confidence bound score. Although several existing works have reported the practical success of GP-UCB, current theoretical results indicate suboptimal performance; empirically, however, GP-UCB tends to perform well compared with other nearly optimal noise-free algorithms that rely on non-adaptive sampling schemes for query points. This paper resolves this gap between theoretical and empirical performance by showing a nearly optimal regret upper bound for noise-free GP-UCB. Specifically, our analysis shows the first constant cumulative regret in the noise-free setting for the squared exponential kernel and the Matérn kernel with some degree of smoothness.
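The GP-UCB selection rule picks the candidate maximizing the posterior mean plus a scaled posterior standard deviation. A minimal noise-free sketch with an RBF kernel follows; the kernel length-scale `ell`, the exploration weight `beta`, and the tiny `jitter` for numerical conditioning are our illustrative choices, not values from the paper.

```python
import numpy as np

def rbf_kernel(a, b, ell=0.3):
    # Squared exponential kernel on 1-D inputs; k(x, x) = 1.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(candidates, x_obs, y_obs, ell=0.3, jitter=1e-10):
    # Noise-free GP posterior mean and standard deviation on a candidate grid.
    K = rbf_kernel(x_obs, x_obs, ell) + jitter * np.eye(len(x_obs))
    k = rbf_kernel(candidates, x_obs, ell)
    mu = k @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, k.T)
    var = np.clip(1.0 - np.sum(k * v.T, axis=1), 0.0, None)
    return mu, np.sqrt(var)

def gp_ucb_select(candidates, x_obs, y_obs, beta=2.0):
    # UCB score: mu(x) + sqrt(beta) * sigma(x); query its argmax.
    mu, sigma = gp_posterior(candidates, x_obs, y_obs)
    return candidates[np.argmax(mu + np.sqrt(beta) * sigma)]
```

In the noise-free setting the posterior variance vanishes at observed points, so the UCB score naturally steers queries away from locations that are already known exactly.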
Mitigating Popularity Bias in Collaborative Filtering through Fair Sampling
Liu, Jiahao, Li, Dongsheng, Gu, Hansu, Zhang, Peng, Lu, Tun, Shang, Li, Gu, Ning
Recommender systems often suffer from popularity bias, where frequently interacted items are overrepresented in recommendations. This bias stems from propensity factors influencing training data, leading to imbalanced exposure. In this paper, we introduce a Fair Sampling (FS) approach to address this issue by ensuring that both users and items are selected with equal probability as positive and negative instances. Unlike traditional inverse propensity score (IPS) methods, FS does not require propensity estimation, eliminating errors associated with inaccurate calculations. Our theoretical analysis demonstrates that FS effectively neutralizes the influence of propensity factors, achieving unbiased learning. Experimental results validate that FS outperforms state-of-the-art methods in both point-wise and pair-wise recommendation tasks, enhancing recommendation fairness without sacrificing accuracy. The implementation is available at https://anonymous.4open.science/r/Fair-Sampling.
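One plausible reading of the equal-probability selection described above is to sample the positive pair item-first, so that every item is equally likely to appear as a positive regardless of its popularity. The sketch below follows that reading only as an assumption; the `interactions` structure and function name are our own, not the paper's implementation.

```python
import random

def fair_sample(interactions, all_items, rng=random):
    # interactions: dict mapping user -> set of items the user interacted with.
    # Sampling the positive item uniformly first (then a user who interacted
    # with it) removes the popularity skew of user-first sampling.
    item_to_users = {}
    for u, items in interactions.items():
        for i in items:
            item_to_users.setdefault(i, []).append(u)
    pos_item = rng.choice(sorted(item_to_users))
    user = rng.choice(sorted(item_to_users[pos_item]))
    # Negative item drawn uniformly from the items the user has not seen
    # (assumes at least one such item exists).
    neg_item = rng.choice([i for i in all_items if i not in interactions[user]])
    return user, pos_item, neg_item
```

Under user-first sampling, an item's chance of being a positive scales with its interaction count; item-first sampling makes that chance uniform, which is the intuition behind neutralizing the propensity factors.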
Variational Bayesian inference of hidden stochastic processes with unknown parameters
Atitey, Komlan, Loskot, Pavel, Mihaylova, Lyudmila
Estimating hidden processes from non-linear noisy observations is particularly difficult when the parameters of these processes are not known. This paper adopts a machine learning approach to devise variational Bayesian inference for such scenarios. In particular, a random process generated by the autoregressive moving average (ARMA) linear model is inferred from non-linear noisy observations. The posterior distribution of the hidden states is approximated by a set of weighted particles generated by a sequential Monte Carlo (SMC) algorithm involving sequential importance sampling with resampling (SISR). The numerical efficiency and estimation accuracy of the proposed inference method are evaluated by computer simulations. Furthermore, the proposed method is demonstrated on the practical problem of estimating missing values in gene expression time series, assuming a vector autoregressive (VAR) data model.
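The SISR loop (propagate, weight by the observation likelihood, resample) can be sketched for a hidden AR(1) state observed through a non-linear map. Everything here is illustrative: the quadratic observation model, the noise levels, and the AR coefficient are our assumptions, not the paper's setup.

```python
import numpy as np

def sisr_filter(y, n_particles=500, phi=0.9, q=1.0, r=0.5, seed=0):
    """Minimal SISR particle filter for x_t = phi*x_{t-1} + v_t, v_t ~ N(0, q),
    observed through y_t = x_t**2 / 2 + w_t, w_t ~ N(0, r**2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)          # initial particle cloud
    means = []
    for yt in y:
        x = phi * x + rng.normal(0.0, np.sqrt(q), n_particles)  # propagate
        w = np.exp(-0.5 * ((yt - x**2 / 2) / r) ** 2)           # importance weights
        w /= w.sum()
        means.append(float(np.sum(w * x)))                       # posterior mean
        x = rng.choice(x, size=n_particles, p=w)                 # resample
    return np.array(means)
```

Resampling at every step keeps the particle cloud from degenerating to a few dominant weights; note that with a quadratic observation map the posterior is bimodal in the sign of the state, so the posterior mean alone can be ambiguous.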