AITopics | dst


dst

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction

Fang, Yu-Hsueh, Lee, Chia-Yen

arXiv.org Machine Learning, May-4-2026

Online Conformal Prediction (CP) struggles to balance temporal adaptability and structural stability. Feedback-driven methods (e.g., Adaptive Conformal Inference (ACI)) suffer from systemic marginal under-coverage and high interval variance during abrupt shifts, while temporally discounted Bayesian CP suffers from severe structural lag and uncalibrated interval bloat. We propose State-Adaptive Bayesian Conformal Prediction (SA-BCP) to achieve optimal spatio-temporal decoupling. By gating long-term temporal inertia with spatial kernel-density evidence, SA-BCP proactively expands intervals for recognized historical regimes while maintaining tight efficiency during stable states. We rigorously prove this mechanism's optimality, identifying a minimax bias-variance tradeoff governed by an evidence threshold $K$. Extensive benchmarks on volatile financial datasets (2016–2026), including AMD, Gold, and GBP/USD, demonstrate that SA-BCP consistently minimizes the strictly proper Winkler score across diverse confidence levels. Specifically, SA-BCP resolves the systematic under-coverage inherent to ACI variants while simultaneously reducing the uncalibrated interval bloat of Bayesian CP by 10% to 37% under high-confidence requests. By elegantly navigating this tradeoff, SA-BCP achieves an optimal balance between conditional reliability and predictive efficiency.
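For orientation, the two yardsticks named in the abstract can be sketched briefly: the Winkler interval score used for evaluation, and the standard ACI update of the working miscoverage level. This is a minimal sketch of those well-known formulas only; it does not implement SA-BCP, whose gating rule and evidence threshold $K$ are not given here. The function names and the default step size `gamma` are illustrative assumptions.

```python
def winkler_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Winkler interval score for a (1 - alpha) prediction interval; lower is better."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    score = upper - lower  # width term rewards tight intervals
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # penalty for missing below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # penalty for missing above the interval
    return score


def aci_step(alpha_t: float, target_alpha: float, covered: bool, gamma: float = 0.005) -> float:
    """One Adaptive Conformal Inference update of the working miscoverage level.

    alpha_{t+1} = alpha_t + gamma * (target_alpha - err_t), where err_t = 1 on a miss.
    A miss lowers alpha_t, which widens the next interval; a hit nudges it up.
    """
    err_t = 0.0 if covered else 1.0
    return alpha_t + gamma * (target_alpha - err_t)


print(winkler_score(0.0, 2.0, 3.0, alpha=0.1))  # 22.0: width 2 plus (2 / 0.1) * 1
```

The 2/alpha penalty is what makes the score strictly proper: at high confidence (small alpha) a miss is expensive, yet every unit of unnecessary width still costs, so neither under-coverage nor interval bloat can be hidden.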

artificial intelligence, machine learning, sa-bcp

arXiv.org Machine Learning

2605.00432

Country: Asia > Taiwan (0.14)

Genre: Research Report (0.50)

Industry: Banking & Finance (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


Navigating Extremes: Dynamic Sparsity in Large Output Spaces

Neural Information Processing Systems, Mar-22-2026, 15:04:11 GMT

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a much more memory-efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights. In this paper, we leverage recent advances in semi-structured sparse training to apply DST in the domain of classification with large output spaces, where memory efficiency is paramount. With a label space of possibly millions of candidates, the classification layer alone will consume several gigabytes of memory. Switching from a dense layer to a fixed fan-in sparse layer updated with sparse evolutionary training (SET), however, severely hampers training convergence, especially at the largest label spaces. We find that the gradients fed back from the classifier into the text encoder make it much more difficult to learn good input representations, despite using a dense encoder. By employing an intermediate layer or adding an auxiliary training objective, we recover most of the generalisation performance of the dense model. Overall, we demonstrate the applicability of DST in a challenging domain, characterized by a highly skewed label distribution, that lies outside of DST's typical benchmark datasets, and enable end-to-end training with millions of labels on commodity hardware.
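To make the memory argument and the fixed fan-in idea concrete, here is a minimal NumPy sketch. It assumes a hypothetical classifier with 3,000,000 labels over 768-dimensional float32 encoder outputs, and a toy layer in which every output unit reads exactly `fan_in` inputs, updated by a SET-style prune-and-regrow step (drop the weights closest to zero, regrow the same number at random). All names, the prune fraction, and the initialization scale are assumptions; this shows only the connectivity pattern, not the paper's semi-structured GPU implementation.

```python
import numpy as np


def dense_classifier_gb(num_labels: int, hidden_dim: int, bytes_per_param: int = 4) -> float:
    """Memory (GB) of a dense num_labels x hidden_dim weight matrix."""
    return num_labels * hidden_dim * bytes_per_param / 1e9


class FixedFanInSparseLayer:
    """Hypothetical layer in which every output unit reads exactly `fan_in` inputs."""

    def __init__(self, in_dim: int, out_dim: int, fan_in: int, rng: np.random.Generator):
        if fan_in > in_dim:
            raise ValueError("fan_in cannot exceed in_dim")
        self.in_dim = in_dim
        self.fan_in = fan_in
        self.rng = rng
        # indices[j] lists the distinct input features wired to output unit j
        self.indices = np.stack(
            [rng.choice(in_dim, size=fan_in, replace=False) for _ in range(out_dim)]
        )
        self.weights = self._init(out_dim * fan_in).reshape(out_dim, fan_in)

    def _init(self, n: int) -> np.ndarray:
        # assumed initialization scale; SET draws new weights at random
        return self.rng.normal(0.0, 1.0 / np.sqrt(self.fan_in), size=n)

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, in_dim) -> gathered: (batch, out_dim, fan_in)
        gathered = x[:, self.indices]
        return np.einsum("bof,of->bo", gathered, self.weights)

    def set_update(self, prune_frac: float = 0.3) -> None:
        """SET-style step: in each row, drop the weights closest to zero and regrow
        the same number of connections at random inputs not currently kept in that
        row, so the fan-in stays fixed."""
        n_swap = int(prune_frac * self.fan_in)
        if n_swap == 0:
            return
        for j in range(self.indices.shape[0]):
            order = np.argsort(np.abs(self.weights[j]))
            drop, keep = order[:n_swap], order[n_swap:]
            kept_inputs = set(self.indices[j, keep].tolist())
            candidates = np.array([i for i in range(self.in_dim) if i not in kept_inputs])
            self.indices[j, drop] = self.rng.choice(candidates, size=n_swap, replace=False)
            self.weights[j, drop] = self._init(n_swap)


print(f"{dense_classifier_gb(3_000_000, 768):.3f} GB")  # 9.216 GB for the dense layer alone
```

Under these assumed sizes a dense layer needs about 9.2 GB for weights alone, whereas a fan-in of 32 would store only 3,000,000 x 32 weights plus their indices. Keeping the fan-in constant per row is what makes the layer map onto regular, semi-structured kernels instead of arbitrary sparse matrices.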

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)


Debiased Self-Training for Semi-Supervised Learning

Neural Information Processing Systems, Feb-12-2026, 01:46:47 GMT

Despite its popularity, self-training is widely believed to be unreliable and often leads to training instability.

artificial intelligence, machine learning, pseudo label

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Asia > China > Guangxi Province > Nanning (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)


f5f3b8d720f34ebebceb7765e447268b-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 03:37:26 GMT

For all $k \in \mathbb{N}^+$ and $k \le d_{st}(s, \phi^{-1}(g)) = n$, let $\tilde{g} = \phi(s_k)$, and let $\tilde{\tau} = (s_0, s_1, s_2, \ldots, s_k)$ be the $k$-step sub-trajectory of $\tau$ from $s_0$ to $s_k$. Using the triangle inequality, we can prove that the sub-trajectory $\tilde{\tau} = (s_0, s_1, s_2, \ldots, s_k)$ is also a shortest trajectory from $s_0 = s$ to $s_k$: assume that this is not true and there exists a shorter trajectory from $s_0$ to $s_k$. Using Theorem 1, we have that for each subgoal $g_{kt}$, $t = 0, 1, \ldots, T-1$, there exists a subgoal $\tilde{g}_{kt} \in \mathcal{G}_A(s_{kt}, k)$ that can induce the same low-level $k$-step action sequence as $g_{kt}$. When the temporal distance between two states in one trajectory is not larger than $k$, the corresponding element in the adjacency matrix is set to 1, indicating adjacency. The main differences between our method and theirs are: 1) we use trajectories sampled by multiple policies to construct training samples, while they only use trajectories sampled by one specific policy; 2) we use an adjacency matrix to explicitly aggregate the adjacency information and sample training pairs based on the adjacency matrix, while they directly sample training pairs from trajectories.
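The labeling rule above (two states within $k$ steps of each other in some trajectory are marked adjacent, with trajectories pooled across policies, and training pairs then drawn from the aggregated matrix) can be sketched as follows. This is a minimal illustration assuming discrete integer state ids; the function names are hypothetical, and treating adjacency as symmetric is an assumption not stated in the excerpt.

```python
import numpy as np


def build_adjacency(trajectories, num_states: int, k: int) -> np.ndarray:
    """A[i, j] = 1 when states i and j occur in the same trajectory at most k steps apart."""
    adj = np.zeros((num_states, num_states), dtype=np.int8)
    for traj in trajectories:  # trajectories may come from several different policies
        for t, s_i in enumerate(traj):
            for s_j in traj[t : t + k + 1]:  # temporal distance 0..k
                adj[s_i, s_j] = 1
                adj[s_j, s_i] = 1  # assumption: treat adjacency as symmetric
    return adj


def sample_training_pairs(adj: np.ndarray, n: int, rng: np.random.Generator):
    """Draw n random state pairs and read their labels from the aggregated matrix."""
    i = rng.integers(0, adj.shape[0], size=n)
    j = rng.integers(0, adj.shape[0], size=n)
    return i, j, adj[i, j]
```

Aggregating first means a pair observed as adjacent under any policy keeps label 1, whereas sampling pairs directly from a single trajectory would only ever see the adjacency that one rollout happened to exhibit.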

adjacency matrix, artificial intelligence, trajectory

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.95)


Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning

Neural Information Processing Systems, Feb-11-2026, 03:37:13 GMT

Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques.

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)


The U.S. tried permanent daylight saving time--and hated it

In 1974, America set its clocks forward for good in the name of energy savings. Between January and September in 1974, President Richard Nixon made daylight saving time permanent for a brief period. As fall approaches, so too does the end of daylight saving time (DST). On November 2nd, the hour between 1 a.m. and 2 a.m. will happen twice.

dst, experiment, permanent daylight

Popular Science

Country:

Europe > Germany (0.05)
Europe > United Kingdom (0.05)
North America > United States > Alaska (0.05)

Genre: Research Report > New Finding (0.35)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence (0.49)


Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models

Bhatia, Gagan, Sripada, Somayajulu G, Allan, Kevin, Azcona, Jacobo

arXiv.org Artificial Intelligence, Oct-10-2025

Large Language Models (LLMs) are prone to hallucination, the generation of plausible yet factually incorrect statements. This work investigates the intrinsic, architectural origins of this failure mode through three primary contributions. First, to enable the reliable tracing of internal semantic failures, we propose Distributional Semantics Tracing (DST), a unified framework that integrates established interpretability techniques to produce a causal map of a model's reasoning, treating meaning as a function of context (distributional semantics). Second, we pinpoint the model's layer at which a hallucination becomes inevitable, identifying a specific commitment layer where a model's internal representations irreversibly diverge from factuality. Third, we identify the underlying mechanism for these failures. We observe a conflict between distinct computational pathways, which we interpret using the lens of dual-process theory: a fast, heuristic associative pathway (akin to System 1) and a slow, deliberate, contextual pathway (akin to System 2), leading to predictable failure modes such as Reasoning Shortcut Hijacks. Our framework's ability to quantify the coherence of the contextual pathway reveals a strong negative correlation ($\rho = -0.863$) with hallucination rates, implying that these failures are predictable consequences of internal semantic weakness. The result is a mechanistic account of how, when, and why hallucinations occur within the Transformer architecture.
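The abstract does not state which coefficient $\rho$ denotes. Assuming it is Spearman's rank correlation (the usual reading of $\rho$), the sketch below shows how such a value could be computed between per-item coherence scores and hallucination rates: rank both lists (averaging tied ranks), then take the Pearson correlation of the ranks. The function names are hypothetical, and nothing here reproduces the DST framework's coherence measure.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """1-based ranks, with tied values sharing the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1  # extend the run of tied values
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Read this way, a value such as -0.863 says that items whose contextual pathway scores as more coherent tend, quite consistently, to be ranked lower in hallucination rate; it is a monotone association, not a linear fit.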

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2510.06107

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Hybrid Dialogue State Tracking for Persian Chatbots: A Language Model-Based Approach

Aghabagher, Samin Mahdipour, Momtazi, Saeedeh

arXiv.org Artificial Intelligence, Oct-2-2025

Dialogue State Tracking (DST) is an essential element of conversational AI, with the objective of deeply understanding the conversation context and steering it toward answering user requests. Due to high demand for open-domain and multi-turn chatbots, traditional rule-based DST is not efficient enough, since it cannot provide the adaptability and coherence required for human-like experiences in complex conversations. This study proposes a hybrid DST model that combines rule-based methods with language models and other learned components: BERT for slot filling and intent detection, XGBoost for intent validation, GPT for DST, and online agents for real-time answer generation. The model is evaluated on a comprehensive Persian multi-turn dialogue dataset and demonstrates significantly improved accuracy and coherence over existing methods for Persian-based chatbots. The results show how effectively a hybrid approach can improve DST capabilities, paving the way for conversational AI systems that are more customized, adaptable, and human-like.
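A dialogue state is commonly kept as slot-value pairs that are carried forward and updated each turn. The sketch below is a hypothetical illustration of the hybrid idea, not the authors' pipeline: a pluggable model-based filler (a stand-in for learned components such as BERT or GPT, which are not called here) proposes slots, and regex rules then fill or override them. The slot names, patterns, and the rules-override-model precedence are all assumptions.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical slot rules; a real system would define these per domain and language.
RULES = {
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "party_size": re.compile(r"\bfor (\d+) (?:people|persons)\b"),
}

# Stand-in type for a learned slot filler; not a real library API.
ModelFiller = Callable[[str], Dict[str, str]]


def rule_based_slots(utterance: str) -> Dict[str, str]:
    """Extract the slot values that the regex rules recognise in this turn."""
    found: Dict[str, str] = {}
    for slot, pattern in RULES.items():
        match = pattern.search(utterance)
        if match:
            found[slot] = match.group(1)
    return found


def update_state(state: Dict[str, str], utterance: str,
                 model_filler: Optional[ModelFiller] = None) -> Dict[str, str]:
    """Carry the previous state forward, then overwrite slots mentioned in this turn."""
    new_state = dict(state)  # slots not mentioned this turn keep their old values
    if model_filler is not None:
        new_state.update(model_filler(utterance))
    new_state.update(rule_based_slots(utterance))  # assumption: rules override the model
    return new_state


state = update_state({}, "Book a table for 4 people on 2025-10-02")
state = update_state(state, "actually for 2 people")
print(state)  # {'date': '2025-10-02', 'party_size': '2'}
```

Letting precise rules override the model is one plausible precedence for high-confidence patterns such as dates; a system could equally invert it, or use a validator (the abstract's XGBoost step plays a comparable role for intents) to arbitrate.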

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2510.01052

Country:

Europe (0.46)
North America > Canada (0.28)
Asia > Middle East (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Forecasting Multivariate Urban Data via Decomposition and Spatio-Temporal Graph Analysis

Sohrabbeig, Amirhossein, Ardakanian, Omid, Musilek, Petr

arXiv.org Artificial Intelligence, Aug-28-2025

Long-term forecasting of multivariate urban data poses a significant challenge due to the complex spatiotemporal dependencies inherent in such datasets. This paper presents DST, a novel multivariate time-series forecasting model that integrates graph attention and temporal convolution within a Graph Neural Network (GNN) to effectively capture spatial and temporal dependencies, respectively. To enhance model performance, we apply a decomposition-based preprocessing step that isolates trend, seasonal, and residual components of the time series, enabling the learning of distinct graph structures for different time-series components. Extensive experiments on real-world urban datasets, including electricity demand, weather metrics, carbon intensity, and air pollution, demonstrate the effectiveness of DST across a range of forecast horizons, from several days to one month. Specifically, our approach achieves an average improvement of 2.89% to 9.10% in long-term forecasting accuracy over state-of-the-art time-series forecasting models.
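The abstract does not say which decomposition the preprocessing step uses. As a stand-in, the sketch below applies classical additive decomposition to a single series: a centered moving-average trend, a seasonal profile from per-phase averages of the detrended series, and the remainder as residual. The function name, the edge-padding choice, and the use of this particular decomposition are assumptions; in a DST-like setup each component would then get its own learned graph.

```python
import numpy as np


def additive_decompose(series, period: int):
    """Split a 1-D series into trend + seasonal + residual (classical additive decomposition)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    # Trend: centered moving average; even periods use the 2 x m MA with half-weight ends.
    if period % 2 == 1:
        weights = np.ones(period) / period
    else:
        weights = np.ones(period + 1)
        weights[0] = weights[-1] = 0.5
        weights /= period
    half = len(weights) // 2
    padded = np.pad(x, half, mode="edge")  # assumption: repeat edge values at the boundaries
    trend = np.convolve(padded, weights, mode="valid")  # same length as x
    # Seasonal: mean detrended value at each phase, centred so one period sums to zero.
    detrended = x - trend
    phase_means = np.array([detrended[p::period].mean() for p in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.tile(phase_means, n // period + 1)[:n]
    residual = x - trend - seasonal
    return trend, seasonal, residual
```

By construction the three components add back to the input exactly, so a model can learn separate structure for each and sum the component forecasts; the moving average is only exact away from the edges, where the padding assumption takes over.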

data mining, forecasting, machine learning

arXiv.org Artificial Intelligence

2505.22474

Country: North America > Canada > Alberta (0.28)

Genre: