AITopics | stt

Collaborating Authors

stt

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning Spatiotemporal Tubes for Temporal Reach-Avoid-Stay Tasks using Physics-Informed Neural Networks

Basu, Ahan, Das, Ratnangshu, Jagtap, Pushpak

arXiv.org Artificial IntelligenceDec-10-2025

This paper presents a Spatiotemporal Tube (STT)-based control framework for general control-affine MIMO nonlinear pure-feedback systems with unknown dynamics to satisfy prescribed time reach-avoid-stay tasks under external disturbances. The STT is defined as a time-varying ball, whose center and radius are jointly approximated by a Physics-Informed Neural Network (PINN). The constraints governing the STT are first formulated as loss functions of the PINN, and a training algorithm is proposed to minimize the overall violation. The PINN being trained on certain collocation points, we propose a Lipschitz-based validity condition to formally verify that the learned PINN satisfies the conditions over the continuous time horizon. Building on the learned STT representation, an approximation-free closed-form controller is defined to guarantee satisfaction of the T-RAS specification. Finally, the effectiveness and scalability of the framework are validated through two case studies involving a mobile robot and an aerial vehicle navigating through cluttered environments.

artificial intelligence, machine learning, specification, (17 more...)

arXiv.org Artificial Intelligence

2512.08248

Country: Asia > India (0.29)

Genre: Research Report (0.40)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Real-Time Spatiotemporal Tubes for Dynamic Unsafe Sets

Das, Ratnangshu, Upadhyay, Siddhartha, Jagtap, Pushpak

arXiv.org Artificial IntelligenceDec-9-2025

This paper presents a real-time control framework for nonlinear pure-feedback systems with unknown dynamics to satisfy reach-avoid-stay tasks within a prescribed time in dynamic environments. To achieve this, we introduce a real-time spatiotemporal tube (STT) framework. An STT is defined as a time-varying ball in the state space whose center and radius adapt online using only real-time sensory input. A closed-form, approximation-free control law is then derived to constrain the system output within the STT, ensuring safety and task satisfaction. We provide formal guarantees for obstacle avoidance and on-time task completion. The effectiveness and scalability of the framework are demonstrated through simulations and hardware experiments on a mobile robot and an aerial vehicle, navigating in cluttered dynamic environments.

artificial intelligence, obstacle, real time system, (19 more...)

arXiv.org Artificial Intelligence

2512.06151

Country: Asia > India (0.29)

Genre: Research Report (0.50)

Industry: Government (0.46)

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.49)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.46)

Add feedback

Spatiotemporal Tubes for Differential Drive Robots with Model Uncertainty

Das, Ratnangshu, Basu, Ahan, Verginis, Christos, Jagtap, Pushpak

arXiv.org Artificial IntelligenceDec-8-2025

This paper presents a Spatiotemporal Tube (STT)-based control framework for differential-drive mobile robots with dynamic uncertainties and external disturbances, guaranteeing the satisfaction of Temporal Reach-Avoid-Stay (T-RAS) specifications. The approach employs circular STT, characterized by smoothly time-varying center and radius, to define dynamic safe corridors that guide the robot from the start region to the goal while avoiding obstacles. In particular, we first develop a sampling-based synthesis algorithm to construct a feasible STT that satisfies the prescribed timing and safety constraints with formal guarantees. To ensure that the robot remains confined within this tube, we then design analytically a closed-form, approximation-free control law. The resulting controller is computationally efficient, robust to disturbances and {model uncertainties}, and requires no model approximations or online optimization. The proposed framework is validated through simulation studies on a differential-drive robot and benchmarked against state-of-the-art methods, demonstrating superior robustness, accuracy, and computational efficiency.

artificial intelligence, robot, specification, (17 more...)

arXiv.org Artificial Intelligence

2512.05495

Country: Asia > India (0.29)

Genre: Research Report > Promising Solution (0.34)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

Subjective Depth and Timescale Transformers: Learning Where and When to Compute

Wieser, Frederico, Benfeghoul, Martin, Ammar, Haitham Bou, Wang, Jun, Fountas, Zafeirios

arXiv.org Artificial IntelligenceNov-27-2025

The rigid, uniform allocation of computation in standard Transformer (TF) architectures can limit their efficiency and scalability, particularly for large-scale models and long sequences. Addressing this, we introduce Subjective Depth Transformers (SDT) and Subjective Timescale Transformers (STT), two distinct architectures that leverage Bayesian surprise signals to dynamically route computation, learning where and when to compute within decoder-only TFs. SDT augments a decoder-only stack with alternating Decision and Dynamic layers: a Decision layer computes a full block 'posterior' and a lightweight 'prior,' while a Dynamic layer employs fixed-capacity Top-K routing based on Bayesian surprise (Expected and Unexpected Change), maintaining a static compute graph. STT extends this conditional computation to the temporal domain: a transition network predicts residual updates, forming a temporal 'change hypothesis' that informs a router to dynamically execute or bypass TF blocks for each token, managing KV-cache contributions. Both architectures exhibit the predicted shift from novelty to prediction driven gating over training, suggesting alignment with surprise based principles. While operating at reduced capacity, they offer preliminary insights into the compute-accuracy trade-offs of conditional computation. The proposed architectures establish a flexible framework for efficiency, reducing self-attention computation by 75% and KV-cache requirements by 50% within each compute skipping layer, setting a pathway for more efficient models.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.21408

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SNAP: Low-Latency Test-Time Adaptation with Sparse Updates

Cha, Hyeongheon, Kim, Dong Min, Chung, Hye Won, Gong, Taesik, Lee, Sung-Ju

arXiv.org Artificial IntelligenceNov-20-2025

Test-Time Adaptation (TTA) adjusts models using unlabeled test data to handle dynamic distribution shifts. However, existing methods rely on frequent adaptation and high computational cost, making them unsuitable for resource-constrained edge environments. To address this, we propose SNAP, a sparse TTA framework that reduces adaptation frequency and data usage while preserving accuracy. SNAP maintains competitive accuracy even when adapting based on only 1% of the incoming data stream, demonstrating its robustness under infrequent updates. Our method introduces two key components: (i) Class and Domain Representative Memory (CnDRM), which identifies and stores a small set of samples that are representative of both class and domain characteristics to support efficient adaptation with limited data; and (ii) Inference-only Batch-aware Memory Normalization (IoBMN), which dynamically adjusts normalization statistics at inference time by leveraging these representative samples, enabling efficient alignment to shifting target domains. Integrated with five state-of-the-art TTA algorithms, SNAP reduces latency by up to 93.12%, while keeping the accuracy drop below 3.3%, even across adaptation rates ranging from 1% to 50%. This demonstrates its strong potential for practical use on edge devices serving latency-sensitive applications. The source code is available at https://github.com/chahh9808/SNAP.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.15276

Country: Europe > Switzerland (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Hardware (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
(2 more...)

Add feedback

Incorporating Social Awareness into Control of Unknown Multi-Agent Systems: A Real-Time Spatiotemporal Tubes Approach

Upadhyay, Siddhartha, Das, Ratnangshu, Jagtap, Pushpak

arXiv.org Artificial IntelligenceOct-30-2025

This paper presents a decentralized control framework that incorporates social awareness into multi-agent systems with unknown dynamics to achieve prescribed-time reach-avoid-stay tasks in dynamic environments. Each agent is assigned a social awareness index that quantifies its level of cooperation or self-interest, allowing heterogeneous social behaviors within the system. Building on the spatiotemporal tube (STT) framework, we propose a real-time STT framework that synthesizes tubes online for each agent while capturing its social interactions with others. A closed-form, approximation-free control law is derived to ensure that each agent remains within its evolving STT, thereby avoiding dynamic obstacles while also preventing inter-agent collisions in a socially aware manner, and reaching the target within a prescribed time. The proposed approach provides formal guarantees on safety and timing, and is computationally lightweight, model-free, and robust to unknown disturbances. The effectiveness and scalability of the framework are validated through simulation and hardware experiments on a 2D omnidirectional

agent, artificial intelligence, ctober 30, (14 more...)

arXiv.org Artificial Intelligence

2510.25597

Country: Asia > India (0.28)

Genre: Research Report (0.40)

Industry: Transportation (0.94)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Smooth Spatiotemporal Tube Synthesis for Prescribed-Time Reach-Avoid-Stay Control

Upadhyay, Siddhartha, Das, Ratnangshu, Jagtap, Pushpak

arXiv.org Artificial IntelligenceOct-14-2025

In this work, we address the issue of controller synthesis for a control-affine nonlinear system to meet prescribed time reach-avoid-stay specifications. Our goal is to improve upon previous methods based on spatiotemporal tubes (STTs) by eliminating the need for circumvent functions, which often lead to abrupt tube modifications and high control effort. We propose an adaptive framework that constructs smooth STTs around static unsafe sets, enabling continuous avoidance while guiding the system toward the target within the prescribed time. A closed-form, approximation-free control law is derived to ensure the system trajectory remains within the tube and satisfies the RAS task. The effectiveness of the proposed approach is demonstrated through a case study, showing a significant reduction in control effort compared to prior methods.

artificial intelligence, control effort, specification, (17 more...)

arXiv.org Artificial Intelligence

2510.11583

Country: Asia > India (0.29)

Genre: Research Report (0.50)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)

Add feedback

Which one Performs Better? Wav2Vec or Whisper? Applying both in Badini Kurdish Speech to Text (BKSTT)

Adnan, Renas, Hassani, Hossein

arXiv.org Artificial IntelligenceAug-14-2025

Speech-to-text (STT) systems have a wide range of applications. They are available in many languages, albeit at different quality levels. Although Kurdish is considered a less-resourced language from a processing perspective, SST is available for some of the Kurdish dialects, for instance, Sorani (Central Kurdish). However, that is not applied to other Kurdish dialects, Badini and Hawrami, for example. This research is an attempt to address this gap. Bandin, approximately, has two million speakers, and STT systems can help their community use mobile and computer-based technologies while giving their dialect more global visibility. We aim to create a language model based on Badini's speech and evaluate its performance. To cover a conversational aspect, have a proper confidence level of grammatical accuracy, and ready transcriptions, we chose Badini kids' stories, eight books including 78 stories, as the textual input. Six narrators narrated the books, which resulted in approximately 17 hours of recording. We cleaned, segmented, and tokenized the input. The preprocessing produced nearly 15 hours of speech, including 19193 segments and 25221 words. We used Wav2Vec2-Large-XLSR-53 and Whisper-small to develop the language models. The experiments indicate that the transcriptions process based on the Wav2Vec2-Large-XLSR-53 model provides a significantly more accurate and readable output than the Whisper-small model, with 90.38% and 65.45% readability, and 82.67% and 53.17% accuracy, respectively.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.09957

Country: Asia > Middle East > Iraq > Kurdistan Region (0.47)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Advancing STT for Low-Resource Real-World Speech

D'Intino, Flavio, Hutter, Hans-Peter

arXiv.org Artificial IntelligenceJun-11-2025

Swiss German is a low-resource language represented by diverse dialects that differ significantly from Standard German and from each other, lacking a standardized written form. As a result, transcribing Swiss German involves translating into Standard German. Existing datasets have been collected in controlled environments, yielding effective speech-to-text (STT) models, but these models struggle with spontaneous conversational speech. This paper, therefore, introduces the new SRB-300 dataset, a 300-hour annotated speech corpus featuring real-world long-audio recordings from 39 Swiss German radio and TV stations. It captures spontaneous speech across all major Swiss dialects recorded in various realistic environments and overcomes the limitation of prior sentence-level corpora. We fine-tuned multiple OpenAI Whisper models on the SRB-300 dataset, achieving notable enhancements over previous zero-shot performance metrics. Improvements in word error rate (WER) ranged from 19% to 33%, while BLEU scores increased between 8% and 40%. The best fine-tuned model, large-v3, achieved a WER of 17.1% and a BLEU score of 74.8. This advancement is crucial for developing effective and robust STT systems for Swiss German and other low-resource languages in real-world contexts.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-93429-2_20

2506.08836

Country:

Europe (1.00)
North America > United States (0.93)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Media (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge

Chan, Chi-Min, Xu, Chunpu, Ji, Jiaming, Ye, Zhen, Wen, Pengcheng, Jiang, Chunyang, Yang, Yaodong, Xue, Wei, Han, Sirui, Guo, Yike

arXiv.org Artificial IntelligenceMay-20-2025

The current focus of AI research is shifting from emphasizing model training towards enhancing evaluation quality, a transition that is crucial for driving further advancements in AI systems. Traditional evaluation methods typically rely on reward models assigning scalar preference scores to outputs. Although effective, such approaches lack interpretability, leaving users often uncertain about why a reward model rates a particular response as high or low. The advent of LLM-as-a-Judge provides a more scalable and interpretable method of supervision, offering insights into the decision-making process. Moreover, with the emergence of large reasoning models, which consume more tokens for deeper thinking and answer refinement, scaling test-time computation in the LLM-as-a-Judge paradigm presents an avenue for further boosting performance and providing more interpretability through reasoning traces. In this paper, we introduce $\textbf{J1-7B}$, which is first supervised fine-tuned on reflection-enhanced datasets collected via rejection-sampling and subsequently trained using Reinforcement Learning (RL) with verifiable rewards. At inference time, we apply Simple Test-Time Scaling (STTS) strategies for additional performance improvement. Experimental results demonstrate that $\textbf{J1-7B}$ surpasses the previous state-of-the-art LLM-as-a-Judge by $ \textbf{4.8}$\% and exhibits a $ \textbf{5.1}$\% stronger scaling trend under STTS. Additionally, we present three key findings: (1) Existing LLM-as-a-Judge does not inherently exhibit such scaling trend. (2) Model simply fine-tuned on reflection-enhanced datasets continues to demonstrate similarly weak scaling behavior. (3) Significant scaling trend emerges primarily during the RL phase, suggesting that effective STTS capability is acquired predominantly through RL training.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.11875

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback