AITopics | time 0

Collaborating Authors

time 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Appendix A: Variational Paragraph Embedder. A.1 Selection of substitution rate p

Neural Information Processing Systems | Apr-30-2026, 10:10:09 GMT

Figure 4: Impact of the proportion of injected noise for learning Paragraph Embeddings on the XSum dataset. PPL_int and the PPL of the generation obtained from training PLANNER on the corresponding z at different noise levels.

We observed when the value of p is within (0, 0.7), there … Performing a grid search on each task using diffusion models is an expensive process. However, it has been observed that an increase in the value of p leads to a deviation between the two. This could be attributed to a higher conversion error that occurs when p is excessively large.

A.2 Selection of number of latent codes k

The parameter k determines the number of latent codes used to represent a paragraph and therefore controls the compression level. Latent codes with smaller values of k are easier to model using the diffusion model, but may struggle to accurately preserve all the information in the original text. Additionally, smaller values of k offer computational efficiency, as the sequence length for the diffusion model is k. To determine the best set of latent codes, we conducted experiments using three different methods: 1) selecting the first k hidden vectors, 2) selecting the last k hidden vectors, and 3) selecting interleaving hidden vectors, one for every L/k hidden vectors. The results of the ablation study are presented in Table 5. Based on our findings, we observed no significant difference among the different choices, so we opted for option 1). Furthermore, we discovered that increasing the value of k does not lead to a dramatic improvement in performance. To balance efficiency and performance, in most of our study we use only k = 16.

Setup            BLEU_clean  BLEU_robust
First k (k=16)   79.59       43.17

A.3 Reconstruction, denoising and interpolation examples

In Table 6, we present examples that demonstrate the adeptness of the trained Variational Paragraph Embedder in providing clean and denoised reconstructions. Additionally, we showcase interpolation results (Tables 7 and 8) derived from two random sentences in the hotel review dataset. The interpolated paragraph is usually coherent and incorporates inputs from both sentences, characterizing the distributional smoothness of the latent space.

Reconstructed text: complaints: after two nights stay, i asked the maid to clean our room (empty the wastebasket & make the bed).

Denoising reconstruction (hotel review), noise level 0.3
Original text: * * * check out the bathroom picture * * * i was in nyc by myself to watch some friends participate in the us olympic marathon trials.
Corrupted text: * * [unused697] check exams the bathroom picture * * slams i was in nyc mead myself yankee 2016 some scotch ruin in the outfielder olympicnca trials.
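The three ways of choosing k latent codes described in A.2 can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function name `select_latent_codes`, the strategy labels, and the use of an integer stride of L // k for the interleaving option are all hypothetical choices made here.

```python
def select_latent_codes(hidden, k, strategy="first"):
    """Pick k of the L hidden vectors to serve as latent codes.

    `hidden` is any sequence of length L (one item per hidden vector).
    The strategy names are illustrative labels, not taken from the paper.
    """
    length = len(hidden)
    if not 1 <= k <= length:
        raise ValueError("k must satisfy 1 <= k <= L")
    if strategy == "first":
        # Option 1: the first k hidden vectors.
        return list(hidden[:k])
    if strategy == "last":
        # Option 2: the last k hidden vectors.
        return list(hidden[-k:])
    if strategy == "interleave":
        # Option 3: one vector for every L // k positions (assumed integer stride).
        stride = length // k
        return list(hidden[::stride][:k])
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    positions = list(range(64))  # stand-in for 64 hidden vectors
    for name in ("first", "last", "interleave"):
        print(name, select_latent_codes(positions, 16, name))
```

Whichever option is used, the diffusion model sees a sequence of length k, which is why smaller k is cheaper to model.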

artificial intelligence, hotel, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia (0.93)
North America > United States > Maryland > Prince George's County (0.28)

Genre: Research Report > New Finding (0.86)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Consumer Products & Services (1.00)
Health & Medicine (0.93)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Appendix A: Variational Paragraph Embedder. A.1 Selection of substitution rate p

Neural Information Processing Systems | Feb-18-2026, 03:40:38 GMT

Figure 4: Impact of the proportion of injected noise for learning Paragraph Embeddings on the XSum dataset. (Figure 4). The results of the ablation study are presented in Table 5. Embedder in providing clean and denoised reconstructions. In general, it has been observed that generations progress in a coarse-to-fine manner. The early time step, which is close to 1, tends to be less fluent and generic. This was the nicest stay we have ever had. Turtle Bay was a great resort. This was the nicest stay we have ever had.

artificial intelligence, hotel, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Oceania > Australia (0.04)
North America > United States > Virginia (0.04)
(12 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Consumer Products & Services (1.00)
Law > Criminal Law (0.93)
(7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


d1b4076ae067dd23bad5ac2693547a01-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 06:30:39 GMT

artificial intelligence, machine learning, time 0, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Solving Interpretable Kernel Dimension Reduction

Neural Information Processing Systems | Feb-12-2026, 09:11:12 GMT

Kernel dimensionality reduction (KDR) algorithms find a low-dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships.

artificial intelligence, kernel, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)


Notes 1: A special event x0 is sometimes given at time 0 to mark the beginning of the sequence; the model then generates the rest of the sequence conditioned on x0

Neural Information Processing Systems | Feb-8-2026, 02:26:33 GMT

NHP is a thoughtfully designed framework that has been demonstrated effective on temporal data, but our method can also be used for other models with parametric intensity functions. In this section, we prove the claim in section 2.2 that argmax_θ J_LL(θ) = Θ … When we take the expectation under p, each summand gets weighted by the probability that x[0,t) and x[t,t+dt) would take on the values in that summand. Therefore, we have G_θ(t, x[0,t)) < 0 since the distributions in equation (9) are distinct for the given history x[0,t). This lemma says: if θ and θ are meaningfully different in that they predict different intensities at time t for some history, then they actually do so for a set of histories of non-zero measure, making this difference visible in the objective functions like J_LL(θ) (see above) and J_NC(θ) (see Appendix B). We use d to denote the maximal difference between the intensities over (t′, t″), i.e., d … If x[0,t) doesn't have any event, then its probability p(x[0,t)) = exp( … Suppose that t1 has been shifted by R. Recall that we need order-(1dt)I many such histories.

artificial intelligence, jnnc, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)


Feature Learning for Interpretable, Performant Decision Trees: Supplementary Material. 1 Experiment Specification

Neural Information Processing Systems | Oct-9-2025, 08:10:59 GMT

Here we cover the full specification of the experiments. Some details were omitted from the main text. If there were separate training and test sets, they were combined before creating the random 10-fold split. All attributes are normalized to mean 0 and standard deviation 1. Additional details for each model type follow.
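The preprocessing described above (pooling the data, normalizing every attribute to mean 0 and standard deviation 1, and drawing a random 10-fold split) could look roughly like the sketch below. The function names, the fixed seed, the use of the population standard deviation, and normalizing over the pooled data before splitting are assumptions made for illustration; the excerpt does not specify them.

```python
import random
import statistics


def standardize_columns(rows):
    """Rescale each attribute (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation is an assumption; constant columns keep scale 1.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


def ten_fold_indices(n_rows, seed=0, folds=10):
    """Shuffle row indices and deal them into `folds` nearly equal test folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::folds] for i in range(folds)]


if __name__ == "__main__":
    train = [[1.0, 10.0], [2.0, 20.0]]
    test = [[3.0, 30.0]]
    pooled = standardize_columns(train + test)  # combine before splitting
    for fold_id, test_idx in enumerate(ten_fold_indices(len(pooled))):
        train_idx = [i for i in range(len(pooled)) if i not in test_idx]
        print(fold_id, test_idx, train_idx)
```

Each fold serves once as the held-out set, with the remaining nine folds used for training.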

artificial intelligence, machine learning, time 0, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.40)


Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Fu, Tianyu, Min, Zihan, Zhang, Hanling, Yan, Jichao, Dai, Guohao, Ouyang, Wanli, Wang, Yu

arXiv.org Artificial Intelligence | Oct-6-2025

Multi-LLM systems harness the complementary strengths of diverse Large Language Models, achieving performance and efficiency gains unattainable by a single model. In existing designs, LLMs communicate through text, forcing internal representations to be transformed into output token sequences. This process both loses rich semantic information and incurs token-by-token generation latency. Motivated by these limitations, we ask: Can LLMs communicate beyond text? Oracle experiments show that enriching the KV-Cache semantics can improve response quality without increasing cache size, supporting KV-Cache as an effective medium for inter-model communication. Thus, we propose Cache-to-Cache (C2C), a new paradigm for direct semantic communication between LLMs. C2C uses a neural network to project and fuse the source model's KV-cache with that of the target model to enable direct semantic transfer. A learnable gating mechanism selects the target layers that benefit from cache communication. Compared with text communication, C2C utilizes the deep, specialized semantics from both models, while avoiding explicit intermediate text generation. Experiments show that C2C achieves 8.5-10.5% higher average accuracy than individual models. It further outperforms the text communication paradigm by approximately 3.0-5.0%, while delivering an average 2.0x speedup in latency. Our code is available at https://github.com/thu-nics/C2C.
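To make the idea of projecting and gating a cache more concrete, here is a deliberately tiny sketch. It is not the C2C architecture: the abstract only says that a neural network projects and fuses the source KV-cache into the target's and that a learnable gate selects layers. The linear projection, the additive fusion, the hard threshold on a gate logit, and all names below are assumptions chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def fuse_layer(target_kv, source_kv, projection, gate_logit):
    """Fuse one layer's source cache entry into the target's entry.

    projection maps the source dimension to the target dimension (assumed linear).
    gate_logit > 0 means this target layer receives the source information.
    The output has the same size as target_kv, so the cache does not grow.
    """
    gate = 1.0 if gate_logit > 0 else 0.0  # hard per-layer gate (assumption)
    projected = matvec(projection, source_kv)
    return [t + gate * p for t, p in zip(target_kv, projected)]


if __name__ == "__main__":
    target = [1.0, 2.0]            # toy target-layer cache entry (dim 2)
    source = [1.0, 0.0, 1.0]       # toy source-layer cache entry (dim 3)
    proj = [[1.0, 0.0, 0.0],
            [0.0, 0.0, 2.0]]       # hypothetical 2x3 projection weights
    print(fuse_layer(target, source, proj, gate_logit=1.0))   # gate open
    print(fuse_layer(target, source, proj, gate_logit=-1.0))  # gate closed
```

With the gate closed the target entry passes through unchanged, which mirrors the idea that only some layers benefit from cache communication.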

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.03215

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes

Gyger, Tim, Furrer, Reinhard, Sigrist, Fabio

arXiv.org Machine Learning | Jul-8-2025

Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that reduce computational costs for training and prediction by several orders of magnitude compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives. All methods are implemented in the open-source C++ library GPBoost with high-level Python and R interfaces.

approximation, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

2507.05064

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Unlearning Works Better Than You Think: Local Reinforcement-Based Selection of Auxiliary Objectives

Bendahi, Abderrahim, Fradin, Adrien, Lerasle, Matthieu

arXiv.org Machine Learning | Apr-19-2025

We introduce Local Reinforcement-Based Selection of Auxiliary Objectives (LRSAO), a novel approach that selects auxiliary objectives using reinforcement learning (RL) to support the optimization process of an evolutionary algorithm (EA), as in the EA+RL framework, and furthermore incorporates the ability to unlearn previously used objectives. By modifying the reward mechanism to penalize moves that do not increase the fitness value and relying on the local auxiliary objectives, LRSAO dynamically adapts its selection strategy to optimize performance according to the landscape and unlearns previous objectives when necessary. We analyze and evaluate LRSAO on the black-box complexity version of the non-monotonic Jump function, with gap parameter $\ell$, where each auxiliary objective is beneficial at specific stages of optimization. The Jump function is hard to optimize for evolutionary-based algorithms, and the best-known complexity for reinforcement-based selection on Jump was $O(n^2 \log(n) / \ell)$. Our approach improves this to a complexity of $\Theta(n^2 / \ell^2 + n \log(n))$. This demonstrates the efficiency and adaptability of LRSAO and highlights its potential to outperform traditional methods in complex optimization scenarios.
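To get a feel for the gap between the two bounds quoted above, the following sketch evaluates the leading terms for a sample problem size. It is only arithmetic on the stated expressions: hidden constants are dropped, the natural logarithm is assumed, and the function names and the example values n = 1000, ℓ = 10 are chosen here purely for illustration.

```python
import math


def previous_bound(n, gap):
    """Leading term of the earlier bound, n^2 * log(n) / l (constants dropped)."""
    return n ** 2 * math.log(n) / gap


def lrsao_bound(n, gap):
    """Leading terms of the LRSAO bound, n^2 / l^2 + n * log(n) (constants dropped)."""
    return n ** 2 / gap ** 2 + n * math.log(n)


if __name__ == "__main__":
    n, gap = 1000, 10  # illustrative values, not taken from the paper
    old, new = previous_bound(n, gap), lrsao_bound(n, gap)
    print(f"previous: {old:.0f}  LRSAO: {new:.0f}  ratio: {old / new:.1f}")
```

Because asymptotic notation hides constants, the printed ratio indicates only how the expressions scale, not actual running times.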

evolutionary algorithm, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2504.14418

Country:

Europe > Spain > Andalusia > Málaga Province > Málaga (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > France (0.04)
(4 more...)

Genre: Research Report (0.84)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

