AITopics

Country: Europe (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Neural Information Processing SystemsFeb-12-2026, 19:23:40 GMT

84b20b1f5a0d103f5710bb67a043cd78-AuthorFeedback.pdf

history, information, trueskill, (15 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.30)

Neural Information Processing SystemsFeb-10-2026, 11:14:03 GMT

SupplementaryMaterialfor TowardsEfficient3DObjectDetectionwith KnowledgeDistillation

Mimic [11] and GID-F [6]); the other is assigning a one-hot maskmfeat ormcls to calculate imitation loss on some critical regions (e.g.

artificial intelligence, cp-pillar-v0, detector, (17 more...)

Technology: Information Technology > Artificial Intelligence (0.96)

arXiv.org Artificial IntelligenceNov-18-2025

Enhancing Reinforcement Learning in 3D Environments through Semantic Segmentation: A Case Study in ViZDoom

Huang, Hugo

Reinforcement learning (RL) in 3D environments with high-dimensional sensory input poses two major challenges: (1) the high memory consumption induced by memory buffers required to stabilise learning, and (2) the complexity of learning in partially observable Markov Decision Processes (POMDPs). This project addresses these challenges by proposing two novel input representations: SS-only and RGB+SS, both employing semantic segmentation on RGB colour images. Experiments were conducted in deathmatches of ViZDoom, utilizing perfect segmentation results for controlled evaluation. Our results showed that SS-only was able to reduce the memory consumption of memory buffers by at least 66.6%, and up to 98.6% when a vectorisable lossless compression technique with minimal overhead such as run-length encoding is applied. Meanwhile, RGB+SS significantly enhances RL agents' performance with the additional semantic information provided. Furthermore, we explored density-based heatmapping as a tool to visualise RL agents' movement patterns and evaluate their suitability for data collection. A brief comparison with a previous approach highlights how our method overcame common pitfalls in applying semantic segmentation in 3D environments like ViZDoom.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2511.11703

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Kautsar, Muhammad Dehan Al, Koto, Fajri

Parallel Tokenizers: Rethinking Vocabulary Design for Cross-Lingual Transfer

arXiv.org Artificial IntelligenceOct-8-2025

Tokenization defines the foundation of multilingual language models by determining how words are represented and shared across languages. However, existing methods often fail to support effective cross-lingual transfer because semantically equivalent words are assigned distinct embeddings. For example, "I eat rice" in English and "Ina cin shinkafa" in Hausa are typically mapped to different vocabulary indices, preventing shared representations and limiting cross-lingual generalization. We introduce parallel tokenizers. This new framework trains tokenizers monolingually and then aligns their vocabularies exhaustively using bilingual dictionaries or word-to-word translation, ensuring consistent indices for semantically equivalent words. This alignment enforces a shared semantic space across languages while naturally improving fertility balance. To assess their effectiveness, we pretrain a transformer encoder from scratch on thirteen low-resource languages and evaluate it on sentiment analysis, hate speech detection, emotion classification, and sentence embedding similarity. Across all tasks, models trained with parallel tokenizers outperform conventional multilingual baselines, confirming that rethinking tokenization is essential for advancing multilingual representation learning--especially in low-resource settings.

computational linguistic, large language model, machine learning, (21 more...)

2510.06128

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

Neural Information Processing SystemsOct-3-2025, 03:21:58 GMT

dataset release, tournament evaluation, architectural design, input representation, and other insights

We want to thank the reviewers for their helpful comments. The dataset will be made available to any interested researchers. We agree with R3 that there are a lot of non-trivial modeling choices in our architecture. We call the first one unit-based and the latter token-based. We apologize for writing some of the claims without referring to the evidence, like "orders from the last movement Our input representation is a result of both empirical findings and domain knowledge.

artificial intelligence, input representation, tournament evaluation, (16 more...)

Industry: Construction & Engineering (0.42)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.30)

Djork-Arné Clevert, Andreas Mayr, Thomas Unterthiner, Sepp Hochreiter

Rectified Factor Networks

Neural Information Processing SystemsOct-2-2025, 15:02:42 GMT

We propose rectified factor networks (RFNs) to efficiently construct very sparse, non-linear, high-dimensional representations of the input. RFN models identify rare and small events in the input, have a low interference between code units, have a small reconstruction error, and explain the data covariance structure. RFN learning is a generalized alternating minimization algorithm derived from the posterior regularization method which enforces non-negative and normalized posterior means.

artificial intelligence, machine learning, representation, (20 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts (0.04)
Europe > Austria > Upper Austria > Linz (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Brändle, Sven, Aczel, Till, Plesner, Andreas, Wattenhofer, Roger

From MNIST to ImageNet: Understanding the Scalability Boundaries of Differentiable Logic Gate Networks

arXiv.org Artificial IntelligenceOct-1-2025

Differentiable Logic Gate Networks (DLGNs) are a very fast and energy-efficient alternative to conventional feed-forward networks. With learnable combinations of logical gates, DLGNs enable fast inference by hardware-friendly execution. Since the concept of DLGNs has only recently gained attention, these networks are still in their developmental infancy, including the design and scalability of their output layer. To date, this architecture has primarily been tested on datasets with up to ten classes. This work examines the behavior of DLGNs on large multi-class datasets. We investigate its general expressiveness, its scalability, and evaluate alternative output strategies. Using both synthetic and real-world datasets, we provide key insights into the importance of temperature tuning and its impact on output layer performance. We evaluate conditions under which the Group-Sum layer performs well and how it can be applied to large-scale classification of up to 2000 classes. Figure 1: DLGNs (blue) consistently outperform MLPs (red) across classification tasks with up to 2000 classes. The result illustrates the potential of logic-gate-based architectures to remain effective when applied to large-scale classification problems. Deep artificial neural networks have improved immensely in the last few years, exhibiting impressive performance across a wide range of tasks (Golroudbari & Sabour, 2023; Noor & Ige, 2024; Ekun-dayo & Ezugwu, 2025). However, these improvements come with rapidly growing computational costs (Thompson et al., 2020; Rosenfeld, 2021; Tripp et al., 2024). This constrains their deployment in many real-world environments, particularly on edge devices and mobile phones (Zhang et al., 2020; Zheng, 2025).

artificial intelligence, deep learning, machine learning, (16 more...)

2509.25933

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.48)

Park, Joonyong, Saito, Daisuke, Minematsu, Nobuaki

MixedG2P-T5: G2P-free Speech Synthesis for Mixed-script texts using Speech Self-Supervised Learning and Language Model

arXiv.org Artificial IntelligenceSep-3-2025

--This study presents a novel approach to voice synthesis that can substitute the traditional grapheme-to-phoneme (G2P) conversion by using a deep learning-based model that generates discrete tokens directly from speech. Utilizing a pre-trained voice SSL model, we train a T5 encoder to produce pseudo-language labels from mixed-script texts (e.g., containing Kanji and Kana). This method eliminates the need for manual phonetic transcription, reducing costs and enhancing scalability, especially for large non-transcribed audio datasets. Our model matches the performance of conventional G2P-based text-to-speech systems and is capable of synthesizing speech that retains natural linguistic and paralinguistic features, such as accents and intonations. Speech synthesis refers to the technology by which machines automatically generate speech audio signals and is commonly known as text-to-speech (TTS).

machine learning, natural language, predictor, (19 more...)

2509.01391

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Genre: Research Report (1.00)

Industry: Law > Civil Rights & Constitutional Law (0.41)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Artificial IntelligenceMay-27-2025

Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework

Han, William, Duan, Chaojing, Cen, Zhepeng, Yao, Yihang, Song, Xiaoyu, Mhaskar, Atharva, Leong, Dylan, Rosenberg, Michael A., Liu, Emerson, Zhao, Ding

Recent advances have increasingly applied large language models (LLMs) to electrocardiogram (ECG) interpretation, giving rise to Electrocardiogram-Language Models (ELMs). Conditioned on an ECG and a textual query, an ELM autoregressively generates a free-form textual response. Unlike traditional classification-based systems, ELMs emulate expert cardiac electrophysiologists by issuing diagnoses, analyzing waveform morphology, identifying contributing factors, and proposing patient-specific action plans. To realize this potential, researchers are curating instruction-tuning datasets that pair ECGs with textual dialogues and are training ELMs on these resources. Yet before scaling ELMs further, there is a fundamental question yet to be explored: What is the most effective ECG input representation? In recent works, three candidate representations have emerged-raw time-series signals, rendered images, and discretized symbolic sequences. We present the first comprehensive benchmark of these modalities across 6 public datasets and 5 evaluation metrics. We find symbolic representations achieve the greatest number of statistically significant wins over both signal and image inputs. We further ablate the LLM backbone, ECG duration, and token budget, and we evaluate robustness to signal perturbations. We hope that our findings offer clear guidance for selecting input representations when developing the next generation of ELMs.

large language model, machine learning, natural language, (19 more...)

2505.18847

Country: Asia (0.28)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)