
e8258e5140317ff36c7f8225a3bf9590-Supplemental.pdf

Neural Information Processing Systems

The original MuZero did not use sticky actions (Machado et al., 2017), a 25% chance that the selected action is ignored and the previous action is repeated instead, for its Atari experiments. All experiments in this work used a network architecture based on the one introduced by MuZero (Schrittwieser et al., 2020). To implement the network, we used the modules provided by the Haiku neural network library (Hennigan et al., 2020). We did not observe any benefit from using a Gaussian mixture, so in all our experiments we instead used a single Gaussian with diagonal covariance. All experiments used the Adam optimiser (Kingma & Ba, 2015) with decoupled weight decay (Loshchilov & Hutter, 2017) for training.
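The sticky-actions protocol described above can be sketched as a thin wrapper around an environment's step function. This is an illustrative sketch under assumed names (`sticky_action_env`, `step_fn`); it is not MuZero's or the ALE's actual API.

```python
import random

def sticky_action_env(step_fn, stickiness=0.25, seed=0):
    """Wrap an environment step function with sticky actions
    (Machado et al., 2017): with probability `stickiness`, the
    agent's chosen action is ignored and the previously executed
    action is repeated instead."""
    rng = random.Random(seed)
    prev_action = [None]  # mutable cell holding the last executed action

    def step(action):
        if prev_action[0] is not None and rng.random() < stickiness:
            action = prev_action[0]  # repeat last executed action
        prev_action[0] = action
        return step_fn(action)

    return step
```

Setting `stickiness=0.0` recovers the deterministic behaviour the original MuZero was evaluated under.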




1663fba7b56da1e96bed6e30546a07b0-Supplemental-Conference.pdf

Neural Information Processing Systems

Thus, the assumption of the policy being conditionally independent of zω given z_i^α corresponds well to the assumption of agents only using local information (rather than joint information) in MARL to inform their policy/decision-making. Note that we found that cyclically annealing [82] the β term in our variational lower bound from 0 to the values specified in Table 5 helps to avoid KL-vanishing. A.2.4 Computational Details For MARL trajectory data generation, we used an internal CPU cluster for both the 3-agent hill-climbing and 2-agent coordination domains, using TPUs only for the multiagent MuJoCo data generation. Given a characteristic of interest (e.g., the level of dispersion of agents), we define a training set consisting of joint latents zω and class labels y (e.g., classes corresponding to different intervals of team returns). Using these definitions, we can gauge the representational power of zω by learning a mapping g: ν̂_c(zω) → y. In practice, g is a simple model (e.g., a shallow network or linear projection) so as to gauge the expressivity of the latent space.
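The probe g described above can be sketched as a linear softmax classifier over the latents: keeping g deliberately simple means its accuracy reflects the expressivity of the latent space rather than the probe's own capacity. This is an illustrative sketch, not the paper's implementation; `train_linear_probe` and its hyperparameters are assumptions.

```python
import numpy as np

def train_linear_probe(z, y, n_classes, lr=0.1, steps=500, seed=0):
    """Fit a linear softmax probe g: z_omega -> y by gradient descent
    on the cross-entropy loss."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 0.01, (z.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = z @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(z)                 # d(loss)/d(logits)
        W -= lr * (z.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, z, y):
    """Fraction of latents whose predicted class matches the label."""
    return float(((z @ W + b).argmax(axis=1) == y).mean())
```

High probe accuracy on held-out latents suggests the characteristic of interest is linearly decodable from zω.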


Mixture of Lookup Key-Value Experts

Wang, Zongcheng

arXiv.org Artificial Intelligence

Recent research has developed several LLM architectures suitable for inference on end-user devices, such as the Mixture of Lookup Experts (MoLE)~\parencite{jie_mixture_2025}. A key feature of MoLE is that each token id is associated with a dedicated group of experts. For a given input, only the experts corresponding to the input token id will be activated. Since the communication overhead of loading this small number of activated experts into RAM during inference is negligible, expert parameters can be offloaded to storage, making MoLE suitable for resource-constrained devices. However, MoLE's context-independent expert selection mechanism, based solely on input ids, may limit model performance. To address this, we propose the \textbf{M}ixture \textbf{o}f \textbf{L}ookup \textbf{K}ey-\textbf{V}alue Experts (\textbf{MoLKV}) model. In MoLKV, each expert is structured as a key-value pair. For a given input, the input-derived query interacts with the cached key-value experts from the current sequence, generating a context-aware expert output. This context-aware mechanism alleviates the limitation of MoLE, and experimental results demonstrate that MoLKV achieves significantly lower validation loss in small-scale evaluations.
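The context-aware lookup described in the abstract can be sketched as follows: each token id selects one cached key-value expert, and the query at position t attends causally over the experts of tokens 0..t. All names and shapes here are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def molkv_layer(hidden, token_ids, expert_keys, expert_values):
    """Sketch of a MoLKV-style layer: one key-value expert pair per
    token id; the input-derived query interacts with the cached
    experts of the current sequence (causal attention)."""
    T, d = hidden.shape
    keys = expert_keys[token_ids]      # (T, d): key expert per token id
    values = expert_values[token_ids]  # (T, d): value expert per token id
    out = np.zeros_like(hidden)
    for t in range(T):
        scores = keys[: t + 1] @ hidden[t] / np.sqrt(d)
        scores -= scores.max()                       # numerical stability
        w = np.exp(scores)
        w /= w.sum()
        out[t] = w @ values[: t + 1]                 # context-aware expert output
    return out
```

Because only the experts for token ids appearing in the sequence are touched, the rest can stay offloaded to storage, as in MoLE.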


MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning

Lopez-Piqueres, Javier, Deshpande, Pranav, Ray, Archan, Villani, Mattia J., Pistoia, Marco, Kumar, Niraj

arXiv.org Artificial Intelligence

We present MetaTT, a Tensor Train (TT) adapter framework for fine-tuning of pre-trained transformers. MetaTT enables flexible and parameter-efficient model adaptation by using a single shared TT to factorize transformer sub-modules. This factorization indexes key structural dimensions, including layer and matrix type, and can optionally incorporate heads and tasks. This design allows MetaTT's parameter count to scale with the sum, rather than the product, of the modes, resulting in a substantially more compact adapter. Our benchmarks compare MetaTT with LoRA along with recent state-of-the-art matrix- and tensor-decomposition-based fine-tuning methods. We observe that when tested on single-task standard language modeling benchmarks, MetaTT achieves a competitive parameter-efficiency-to-accuracy tradeoff. We further demonstrate that MetaTT performs competitively when compared to state-of-the-art methods on multi-task learning. Finally, we leverage the TT ansatz to design a rank-adaptive optimizer inspired by the DMRG method from many-body physics. Our results demonstrate that integrating this approach with AdamW enhances optimization performance for a specified target rank.
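The sum-versus-product scaling can be made concrete with a minimal sketch: a single TT factorizes all adapter matrices A[l, m, :, :], indexed by layer l and matrix type m (e.g., Q/K/V/O), and contracting the cores at a given (l, m) recovers that adapter's update. Core shapes, ranks, and function names below are assumptions for illustration, not MetaTT's implementation.

```python
import numpy as np

def make_tt_adapter(L, M, d_in, d_out, rank, seed=0):
    """One shared tensor train over the 4-mode tensor A[l, m, i, j].
    Parameter count scales with the sum of the modes (L + M + d_in
    + d_out, each times rank factors), not with their product."""
    rng = np.random.default_rng(seed)
    r = rank
    return [rng.normal(0, 0.1, (1, L, r)),      # layer mode
            rng.normal(0, 0.1, (r, M, r)),      # matrix-type mode
            rng.normal(0, 0.1, (r, d_in, r)),   # input-dimension mode
            rng.normal(0, 0.1, (r, d_out, 1))]  # output-dimension mode

def adapter_matrix(cores, layer, mtype):
    """Contract the shared cores at (layer, mtype) into one
    d_in x d_out adapter update."""
    v = cores[0][0, layer, :] @ cores[1][:, mtype, :]  # (r,)
    return np.einsum("a,aib,bj->ij", v, cores[2], cores[3][:, :, 0])
```

For, say, 12 layers, 4 matrix types, hidden size 768 and rank 8, the shared TT holds far fewer parameters than one independent rank-8 LoRA pair per (layer, matrix-type) slot.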


DeepCausalMMM: A Deep Learning Framework for Marketing Mix Modeling with Causal Inference

Tirumala, Aditya Puttaparthi

arXiv.org Machine Learning

Marketing Mix Modeling (MMM) is a statistical technique used to estimate the impact of marketing activities on business outcomes such as sales, revenue, or customer visits. Traditional MMM approaches often rely on linear regression or Bayesian hierarchical models that assume independence between marketing channels and struggle to capture complex temporal dynamics and non-linear saturation effects [@Chan2017; @Hanssens2005; @Ng2021Bayesian]. **DeepCausalMMM** is a Python package that addresses these limitations by combining deep learning, causal inference, and advanced marketing science. The package uses Gated Recurrent Units (GRUs) to automatically learn temporal patterns such as adstock (carryover effects) and lag, while simultaneously learning statistical dependencies and potential causal structures between marketing channels through Directed Acyclic Graph (DAG) learning [@Zheng2018NOTEARS; @Gong2024CausalMMM]. Additionally, it implements Hill equation-based saturation curves to model diminishing returns and optimize budget allocation. Key features include: (1) a data-driven design where hyperparameters and transformations (e.g., adstock decay, saturation curves) are learned or estimated from data with sensible defaults, rather than requiring fixed heuristics or manual specification, (2) multi-region modeling with both shared and region-specific parameters, (3) robust statistical methods including Huber loss and advanced regularization, (4) comprehensive response curve analysis for understanding channel saturation.
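The two marketing-science transformations named above have simple closed forms that are worth seeing concretely: geometric adstock for carryover, and the Hill equation for saturation. These are illustrative reference implementations, not the package's code (DeepCausalMMM learns such temporal patterns via GRUs rather than fixing a decay heuristic).

```python
import numpy as np

def geometric_adstock(spend, decay):
    """Carryover effect: effect[t] = spend[t] + decay * effect[t-1],
    so past spend keeps contributing with geometric decay."""
    out = np.zeros_like(spend, dtype=float)
    carry = 0.0
    for t, s in enumerate(spend):
        carry = s + decay * carry
        out[t] = carry
    return out

def hill_saturation(x, half_sat, slope):
    """Hill-equation response curve: diminishing returns as spend
    grows; the response equals 0.5 exactly at x == half_sat."""
    x = np.asarray(x, dtype=float)
    return x**slope / (x**slope + half_sat**slope)
```

Composing the two (saturation applied to adstocked spend) gives the standard channel response curve used for budget-allocation analysis.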


DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code

Agrawal, Shriyansh, Lau, Aidan, Shah, Sanyam, R, Ahan M, Zhu, Kevin, Dev, Sunishchal, Sharma, Vasu

arXiv.org Artificial Intelligence

The prevalence of Large Language Models (LLMs) for generating multilingual text and source code has only increased the imperative for machine-generated content detectors to be accurate and efficient across domains. Current detectors, predominantly utilizing zero-shot methods such as Fast-DetectGPT or GPTZero, either incur high computational cost or lack sufficient accuracy, often trading one for the other, leaving room for further improvement. To address these gaps, we propose fine-tuning encoder-only Small Language Models (SLMs), in particular the pre-trained RoBERTa and CodeBERTa models, on specialized datasets of source code and natural language, showing that for the task of binary classification, SLMs outperform LLM-based detectors by a wide margin while using a fraction of the compute. Our encoders achieve AUROC $= 0.97$ to $0.99$ and macro-F1 $0.89$ to $0.94$ while reducing latency by $8$-$12\times$ and peak VRAM by $3$-$5\times$ at $512$-token inputs. Under cross-generator shifts and adversarial transformations (paraphrase, back-translation; code formatting/renaming), performance retains $\geq 92\%$ of clean AUROC. We release training and evaluation scripts with seeds and configs; a reproducibility checklist is also included.