AITopics | dtype

Collaborating Authors

dtype

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

Neural Information Processing SystemsFeb-18-2026, 03:01:15 GMT

A wide array of sequence models are built on a framework modeled after Transformers, comprising alternating sequence mixer and channel mixer layers.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

5b4e9aa703d0bfa11041debaa2d1b633-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 20:44:19 GMT

For the results shown in Figure 5 (bottom), the function codes and signatures were trained for 10 epochs on 8192 samples with Shampoo [25]. We again used 6 random seeds, and for each set of trainable parameters, we grid-searched the learning rate.

artificial intelligence, machine learning, validation performance, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Add feedback

EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering

Xie, Tianxin, Yang, Shan, Li, Chenxing, Yu, Dong, Liu, Li

arXiv.org Artificial IntelligenceOct-28-2025

Text-to-speech (TTS) has shown great progress in recent years. However, most existing TTS systems offer only coarse and rigid emotion control, typically via discrete emotion labels or a carefully crafted and detailed emotional text prompt, making fine-grained emotion manipulation either inaccessible or unstable. These models also require extensive, high-quality datasets for training. To address these limitations, we propose EmoSteer-TTS, a novel training-free approach, to achieve fine-grained speech emotion control (conversion, interpolation, erasure) by activation steering. We first empirically observe that modifying a subset of the internal activations within a flow matching-based TTS model can effectively alter the emotional tone of synthesized speech. Building on this insight, we then develop a training-free and efficient algorithm, including activation extraction, emotional token searching, and inference-time steering, which can be seamlessly integrated into a wide range of pretrained models (e.g., F5-TTS, CosyVoice2, and E2-TTS). In addition, to derive effective steering vectors, we construct a curated emotional speech dataset with diverse speakers. Extensive experiments demonstrate that EmoSteer-TTS enables fine-grained, interpretable, and continuous control over speech emotion, outperforming the state-of-the-art (SOTA). To the best of our knowledge, this is the first method that achieves training-free and continuous fine-grained emotion control in TTS. Demo samples are available at https://emosteer-tts-demo.pages.dev/.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2508.03543

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Surrey > Guildford (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.49)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
(2 more...)

Add feedback

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

Neural Information Processing SystemsOct-10-2025, 16:29:36 GMT

A wide array of sequence models are built on a framework modeled after Transformers, comprising alternating sequence mixer and channel mixer layers.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Towards a Large Physics Benchmark

Barman, Kristian G., Caron, Sascha, Hasibi, Faegheh, Shalugin, Eugene, Marcet, Yoris, Otte, Johannes, de Regt, Henk W., Moody, Merijn

arXiv.org Artificial IntelligenceJul-30-2025

We introduce a benchmark framework developed by and for the scientific community to evaluate, monitor and steer large language model development in fundamental physics. Building on philosophical concepts of scientific understanding and creativity, we develop a scoring system in which each question is scored by an expert for its correctness, difficulty, and surprise. The questions are of three forms: (i) multiple-choice questions for conceptual understanding, (ii) analytical problems requiring mathematical derivation, and (iii) openended tasks requiring complex problem solving. Our current dataset contains diverse set of examples, including a machine learning challenge to classify high-energy physics events, such as the four top quark signal. To ensure continued relevance, we propose a living benchmark, where physicists contribute questions, for instance alongside new publications. We invite contributions via: http://www.physicsbenchmarks.org/. We hope that this benchmark will enable a targeted AI development that can make a meaningful contribution to fundamental physics research.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2507.21695

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Fast and Simplex: 2-Simplicial Attention in Triton

Roy, Aurko, Chou, Timothy, Duvvuri, Sai Surya, Chen, Sijia, Yu, Jiecao, Wang, Xiaodong, Zaheer, Manzil, Anil, Rohan

arXiv.org Artificial IntelligenceJul-4-2025

Recent work has shown that training loss scales as a power law with both model size and the number of tokens, and that achieving compute-optimal models requires scaling model size and token count together. However, these scaling laws assume an infinite supply of data and apply primarily in compute-bound settings. As modern large language models increasingly rely on massive internet-scale datasets, the assumption that they are compute-bound is becoming less valid. This shift highlights the need for architectures that prioritize token efficiency. In this work, we investigate the use of the 2-simplicial Transformer, an architecture that generalizes standard dot-product attention to trilinear functions through an efficient Triton kernel implementation. We demonstrate that the 2-simplicial Transformer achieves better token efficiency than standard Transformers: for a fixed token budget, similarly sized models outperform their dot-product counterparts on tasks involving mathematics, coding, reasoning, and logic. We quantify these gains by demonstrating that $2$-simplicial attention changes the exponent in the scaling laws for knowledge and reasoning tasks compared to dot product attention.

dtype, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.02754

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Mateo County > Menlo Park (0.05)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

One RL to See Them All: Visual Triple Unified Reinforcement Learning

Ma, Yan, Du, Linge, Shen, Xuyang, Chen, Shaoxiang, Li, Pengfei, Ren, Qibing, Ma, Lizhuang, Dai, Yuchao, Liu, Pengfei, Yan, Junjie

arXiv.org Artificial IntelligenceJun-3-2025

Reinforcement learning (RL) has significantly advanced the reasoning capabilities of vision-language models (VLMs). However, the use of RL beyond reasoning tasks remains largely unexplored, especially for perceptionintensive tasks like object detection and grounding. We propose V-Triune, a Visual Triple Unified Reinforcement Learning system that enables VLMs to jointly learn visual reasoning and perception tasks within a single training pipeline. V-Triune comprises triple complementary components: Sample-Level Data Formatting (to unify diverse task inputs), Verifier-Level Reward Computation (to deliver custom rewards via specialized verifiers) , and Source-Level Metric Monitoring (to diagnose problems at the data-source level). We further introduce a novel Dynamic IoU reward, which provides adaptive, progressive, and definite feedback for perception tasks handled by V-Triune. Our approach is instantiated within off-the-shelf RL training framework using open-source 7B and 32B backbone models. The resulting model, dubbed Orsta (One RL to See Them All), demonstrates consistent improvements across both reasoning and perception tasks. This broad capability is significantly shaped by its training on a diverse dataset, constructed around four representative visual reasoning tasks (Math, Puzzle, Chart, and Science) and four visual perception tasks (Grounding, Detection, Counting, and OCR). Subsequently, Orsta achieves substantial gains on MEGA-Bench Core, with improvements ranging from +2.1 to an impressive +14.1 across its various 7B and 32B model variants, with performance benefits extending to a wide range of downstream tasks. These results highlight the effectiveness and scalability of our unified RL approach for VLMs. The V-Triune system, along with the Orsta models, is publicly available at https://github.com/MiniMax-AI.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2505.18129

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.51)

Add feedback

TileLang: A Composable Tiled Programming Model for AI Systems

Wang, Lei, Cheng, Yu, Shi, Yining, Tang, Zhengju, Mo, Zhiwen, Xie, Wenhao, Ma, Lingxiao, Xia, Yuqing, Xue, Jilong, Yang, Fan, Yang, Zhi

arXiv.org Artificial IntelligenceApr-29-2025

Modern AI workloads rely heavily on optimized computing kernels for both training and inference. These AI kernels follow well-defined data-flow patterns, such as moving tiles between DRAM and SRAM and performing a sequence of computations on those tiles. However, writing high-performance kernels remains complex despite the clarity of these patterns. Achieving peak performance requires careful, hardware-centric optimizations to fully leverage modern accelerators. While domain-specific compilers attempt to reduce the burden of writing high-performance kernels, they often struggle with usability and expressiveness gaps. In this paper, we present TileLang, a generalized tiled programming model for more efficient AI Kernel programming. TileLang decouples scheduling space (thread binding, layout, tensorize and pipeline) from dataflow, and encapsulated them as a set of customization annotations and primitives. This approach allows users to focus on the kernel's data-flow itself, while leaving most other optimizations to compilers. We conduct comprehensive experiments on commonly-used devices, across numerous experiments, our evaluation shows that TileLang can achieve state-of-the-art performance in key kernels, demonstrating that its unified block-and-thread paradigm and transparent scheduling capabilities deliver both the power and flexibility demanded by modern AI system development.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.17577

Country: