LingGym: How Far Are LLMs from Thinking Like Field Linguists?

Yang, Changbing, Ma, Franklin, Shi, Freda, Zhu, Jian

arXiv.org Artificial Intelligence

This paper introduces LingGym, a new benchmark that evaluates LLMs' capacity for meta-linguistic reasoning using Interlinear Glossed Text (IGT) and grammatical descriptions extracted from 18 typologically diverse reference grammars. Unlike previous work that focuses on specific downstream tasks, we assess whether LLMs can generalize linguistic inference across low-resource languages and structures not seen during training. We present a controlled evaluation task: Word-Gloss Inference, in which the model must infer a missing word and gloss from context using varying levels of linguistic information (e.g., glosses, grammatical explanations, translations). Our results show that incorporating structured linguistic cues leads to consistent improvements in reasoning performance across all models. This work highlights both the promise and current limitations of using LLMs for typologically informed linguistic analysis and low-resource language documentation.
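
As a rough illustration of the Word-Gloss Inference setup described above, the sketch below builds a masked prompt from a toy IGT example with configurable levels of linguistic information. The field names, masking convention, and prompt wording are hypothetical, not the benchmark's actual format.

```python
# Hypothetical sketch of a Word-Gloss Inference prompt built from an IGT
# example; the schema and [MASK] convention are illustrative, not LingGym's
# actual format.

def build_prompt(igt, mask_index, include_glosses=True,
                 include_translation=True, grammar_note=None):
    """Mask one word/gloss pair and ask the model to infer it from context."""
    words = igt["words"][:]
    glosses = igt["glosses"][:]
    target = (words[mask_index], glosses[mask_index])
    words[mask_index] = "[MASK]"
    glosses[mask_index] = "[MASK]"

    lines = ["Words:   " + " ".join(words)]
    if include_glosses:
        lines.append("Glosses: " + " ".join(glosses))
    if include_translation:
        lines.append("Translation: " + igt["translation"])
    if grammar_note:  # optional grammatical explanation from the grammar
        lines.append("Grammar note: " + grammar_note)
    lines.append("Infer the masked word and its gloss.")
    return "\n".join(lines), target

# Toy Swahili IGT example.
igt = {
    "words": ["ni-na-soma", "kitabu"],
    "glosses": ["1SG-PRS-read", "book"],
    "translation": "I am reading a book.",
}
prompt, answer = build_prompt(igt, mask_index=0)
print(prompt)
```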


Self-Supervised Learning Strategies for a Platform to Test the Toxicity of New Chemicals and Materials

Lautenschlager, Thomas, Friederich, Nils, Sitcheu, Angelo Jovin Yamachui, Nau, Katja, Hayot, Gaëlle, Dickmeis, Thomas, Mikut, Ralf

arXiv.org Artificial Intelligence

High-throughput toxicity testing offers a fast and cost-effective way to test large numbers of compounds. A key component of such systems is automated evaluation via machine learning models. In this paper, we address critical challenges in this domain and demonstrate how representations learned via self-supervised learning can effectively identify toxicant-induced changes. We provide a proof-of-concept that utilizes the publicly available EmbryoNet dataset, which contains ten zebrafish embryo phenotypes elicited by various chemical compounds targeting different processes in early embryonic development. Our analysis shows that the representations learned through self-supervised learning are suitable for effectively distinguishing between the modes-of-action of different compounds. Finally, we discuss the integration of machine learning models into a physical toxicity testing device in the context of the TOXBOX project.
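
One common self-supervised recipe for learning such representations is SimCLR-style contrastive pretraining; the sketch below shows the core NT-Xent loss under that assumption. The encoder and the embryo-image pipeline are stand-ins, not the paper's actual method.

```python
# Minimal SimCLR-style contrastive loss (NT-Xent) sketch; the actual SSL
# strategy used on the EmbryoNet images may differ.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, D) projections of two augmented views of the same images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D) unit vectors
    sim = z @ z.t() / temperature                        # scaled cosine similarity
    n = z1.shape[0]
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
    # Positives: view i matches view i+N and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)  # stand-in encoder outputs
loss = nt_xent(z1, z2)
```

The learned embeddings can then be probed with a simple classifier to check whether they separate the modes-of-action of different compounds.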


A Reference LPF Methods on AlpacaFarm

Neural Information Processing Systems

A.1 Methods that directly learn from pairwise feedback. Having defined and validated the pairwise feedback simulator and evaluations in AlpacaFarm, we describe reference methods that directly learn from pairwise feedback (LPF). To start, we describe the step of training the surrogate reward model, which we adapt in AlpacaFarm as a two-step method. In Appendix F, we include our preliminary study of multi-round expert iteration. (Figure 5 caption: our simulated annotators are cheap and match well with human annotators.)
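
Training a surrogate reward model from pairwise feedback typically uses the Bradley-Terry objective: push the preferred response's score above the rejected one's. The sketch below shows that loss; the reward model itself is a stand-in, not AlpacaFarm's implementation.

```python
# Sketch of the standard pairwise (Bradley-Terry) reward-modeling loss used
# when learning from pairwise feedback; illustrative, not AlpacaFarm's code.
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_chosen, r_rejected):
    """Maximize the log-probability that the preferred response scores higher."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scalar scores a reward model might assign to chosen/rejected responses.
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.4, 0.9, -0.5])
loss = pairwise_reward_loss(r_chosen, r_rejected)
```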


The best gadgets and gear we saw at IFA 2025 in Berlin

Popular Science

Under the spidery lattice of Berlin's Funkturm radio tower, the IFA 2025 consumer electronics expo spills through every lair and level of the Messe grounds. These century-old show floors began as broadcasting industry exhibition halls and never stopped mutating. I have a diagram to guide me, but it might as well be a Zelda minimap leading me through this boss fight masquerading as a building. It's architecture you operate: escalators are levers, skybridges are secret passages, and the many doors are literal puzzle switches unlocking the next gear reveal.


Towards Skeletal and Signer Noise Reduction in Sign Language Production via Quaternion-Based Pose Encoding and Contrastive Learning

Fauré, Guilhem, Sadeghi, Mostafa, Bigeard, Sam, Ouni, Slim

arXiv.org Artificial Intelligence

One of the main challenges in neural sign language production (SLP) lies in the high intra-class variability of signs, arising from signer morphology and stylistic variety in the training data. To improve robustness to such variations, we propose two enhancements to the standard Progressive Transformers (PT) architecture (Saunders et al., 2020). First, we encode poses using bone rotations in quaternion space and train with a geodesic loss to improve the accuracy and clarity of angular joint movements. Second, we introduce a contrastive loss to structure decoder embeddings by semantic similarity, using either gloss overlap or SBERT-based sentence similarity, aiming to filter out anatomical and stylistic features that do not convey relevant semantic information. On the Phoenix14T dataset, the contrastive loss alone yields a 16% improvement in Probability of Correct Keypoint over the PT baseline. When combined with quaternion-based pose encoding, the model achieves a 6% reduction in Mean Bone Angle Error. These results point to the benefit of incorporating skeletal structure modeling and semantically guided contrastive objectives over sign pose representations into the training of Transformer-based SLP models.
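
A geodesic loss between unit quaternions measures the rotation angle separating predicted and target bone rotations. The sketch below shows one plausible formulation, not necessarily the paper's exact one.

```python
# Sketch of a geodesic loss between unit quaternions for bone rotations;
# a plausible formulation under stated assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def geodesic_loss(q_pred, q_true, eps=1e-7):
    """q_pred, q_true: (..., 4) unit quaternions; returns mean rotation angle.

    Taking |<q1, q2>| handles the double cover: q and -q encode the same
    rotation. Clamping keeps acos differentiable at the boundaries.
    """
    dot = (q_pred * q_true).sum(dim=-1).abs().clamp(-1 + eps, 1 - eps)
    return (2.0 * torch.acos(dot)).mean()

q_true = F.normalize(torch.randn(16, 4), dim=-1)            # toy target rotations
q_pred = F.normalize(q_true + 0.05 * torch.randn(16, 4), dim=-1)
loss = geodesic_loss(q_pred, q_true)
```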


Evaluation of a Sign Language Avatar on Comprehensibility, User Experience & Acceptability

Wasserroth, Fenya, Avramidis, Eleftherios, Czehmann, Vera, Kojic, Tanja, Nunnari, Fabrizio, Möller, Sebastian

arXiv.org Artificial Intelligence

This paper presents an investigation into the impact of adding adjustment features to an existing sign language (SL) avatar on a Microsoft HoloLens 2 device. Through a detailed analysis of interactions of expert German Sign Language (DGS) users with both adjustable and non-adjustable avatars in a specific use case, this study identifies the key factors influencing the comprehensibility, the user experience (UX), and the acceptability of such a system. Despite user preference for adjustable settings, no significant improvements in UX or comprehensibility were observed, which remained at low levels, amid missing SL elements (mouthings and facial expressions) and implementation issues (indistinct hand shapes, lack of feedback and menu positioning). Hedonic quality was rated higher than pragmatic quality, indicating that users found the system more emotionally or aesthetically pleasing than functionally useful. Stress levels were higher for the adjustable avatar, reflecting lower performance, greater effort and more frustration. Additionally, concerns were raised about whether the HoloLens adjustment gestures are intuitive and easy to familiarise oneself with. While acceptability of the concept of adjustability was generally positive, it was strongly dependent on usability and animation quality. This study highlights that personalisation alone is insufficient, and that SL avatars must be comprehensible by default. Key recommendations include enhancing mouthing and facial animation, improving interaction interfaces, and applying participatory design.


The TUB Sign Language Corpus Collection

Avramidis, Eleftherios, Czehmann, Vera, Deckert, Fabian, Hufe, Lorenz, Lipski, Aljoscha, Villalobos, Yuni Amaloa Quintero, Rhee, Tae Kwon, Shi, Mengqian, Stölting, Lennart, Nunnari, Fabrizio, Möller, Sebastian

arXiv.org Artificial Intelligence

We present a collection of parallel corpora of 12 sign languages in video format, together with subtitles in the dominant spoken languages of the corresponding countries. The entire collection includes more than 1,300 hours in 4,381 video files, accompanied by 1.3M subtitles containing 14M tokens. Most notably, it includes the first consistent parallel corpora for 8 Latin American sign languages, whereas the size of the German Sign Language corpora is ten times the size of the previously available corpora. The collection was created by gathering and processing videos of multiple sign languages from various online sources, mainly broadcast material of news shows, governmental bodies, and educational channels. The preparation involved several stages, including data collection, informing the content creators and seeking usage approvals, scraping, and cropping. The paper provides statistics on the collection and an overview of the methods used to collect the data.


No AI Without PI! Object-Centric Process Mining as the Enabler for Generative, Predictive, and Prescriptive Artificial Intelligence

van der Aalst, Wil M. P.

arXiv.org Artificial Intelligence

The uptake of Artificial Intelligence (AI) impacts the way we work, interact, do business, and conduct research. However, organizations struggle to apply AI successfully in industrial settings where the focus is on end-to-end operational processes. Here, we consider generative, predictive, and prescriptive AI and elaborate on the challenges of diagnosing and improving such processes. We show that AI needs to be grounded using Object-Centric Process Mining (OCPM). Process-related data are structured and organization-specific and, unlike text, processes are often highly dynamic. OCPM is the missing link connecting data and processes and enables different forms of AI. We use the term Process Intelligence (PI) to refer to the amalgamation of process-centric data-driven techniques able to deal with a variety of object and event types, enabling AI in an organizational context. This paper explains why AI requires PI to improve operational processes and highlights opportunities for successfully combining OCPM and generative, predictive, and prescriptive AI.
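
The key data structure here is an object-centric event log, where each event can refer to several objects of different types rather than a single case ID. The sketch below illustrates this in the spirit of the OCEL standard; the schema is deliberately simplified.

```python
# Illustrative object-centric event log: each event references multiple
# objects of different types (simplified, OCEL-inspired schema).
events = [
    {"activity": "place order",  "timestamp": "2024-05-01T09:00",
     "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
    {"activity": "pick item",    "timestamp": "2024-05-01T10:30",
     "objects": {"item": ["i1"], "package": ["p1"]}},
    {"activity": "pick item",    "timestamp": "2024-05-01T11:00",
     "objects": {"item": ["i2"], "package": ["p1"]}},
    {"activity": "send package", "timestamp": "2024-05-02T08:00",
     "objects": {"order": ["o1"], "package": ["p1"]}},
]

# Flattening on one object type recovers a classical, per-case view.
def trace_of(obj_id, obj_type):
    return [e["activity"] for e in events
            if obj_id in e["objects"].get(obj_type, [])]

print(trace_of("p1", "package"))  # ['pick item', 'pick item', 'send package']
```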


The Importance of Facial Features in Vision-based Sign Language Recognition: Eyes, Mouth or Full Face?

Pham, Dinh Nam, Avramidis, Eleftherios

arXiv.org Artificial Intelligence

Non-manual facial features play a crucial role in sign language communication, yet their importance in automatic sign language recognition (ASLR) remains underexplored. While prior studies have shown that incorporating facial features can improve recognition, related work often relies on hand-crafted feature extraction and fails to go beyond the comparison of manual features versus the combination of manual and facial features. In this work, we systematically investigate the contribution of distinct facial regions (eyes, mouth, and full face) using two different deep learning models (a CNN-based model and a transformer-based model) trained on an SLR dataset of isolated signs with randomly selected classes. Through quantitative performance and qualitative saliency map evaluation, we reveal that the mouth is the most important non-manual facial feature, significantly improving accuracy. Our findings highlight the necessity of incorporating facial features in ASLR.
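
One simple way to produce the kind of saliency maps used for such qualitative inspection is plain input-gradient saliency; the sketch below shows that method with a placeholder classifier, not the paper's actual models or saliency technique.

```python
# Minimal gradient-based saliency sketch for inspecting which facial regions
# drive a sign classifier; the model and input are stand-ins.
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10).eval()  # placeholder classifier
x = torch.randn(1, 3, 224, 224, requires_grad=True)         # stand-in face crop

logits = model(x)
logits[0, logits.argmax()].backward()         # gradient of the top-class score
saliency = x.grad.abs().max(dim=1).values     # (1, 224, 224) per-pixel heatmap
```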