The Forgotten Code: Validating a Century-Old Translation System with AI

arXiv.org Artificial Intelligence

A pioneering rule-based mechanical translation system (a precursor of modern RBMTs) was first presented in December 1929 by its inventor, Federico Pucci, who later published the full method in a book titled "Il traduttore meccanico ed il metodo per corrispondersi fra Europei conoscendo ciascuno solo la propria lingua: Parte I" (The Mechanical Translator and the Method for Europeans to Correspond with One Another, Each Knowing Only Their Own Language: Part I), in Salerno, Italy, in 1931. This study illustrates how AI breathes new life into the system of international keys and ideograms that Pucci devised to translate from and into any Romance language (at least as a first step). The methodology involves having the AIs retranslate, following Pucci's method, the two text excerpts originally translated in 1931 and clearly documented in his publication: a passage from Dante's La Vita Nuova, translated from Italian into French, and a passage from Voltaire's Zadig, translated from French into Italian. The result is notable: the two texts, translated 94 years apart using the same method (by Pucci in 1931 and by AIs in 2025), show a low average difference, with only minor variations observed. With Pucci's system thus validated, it became feasible to have the AIs reproduce the excerpts in English, Spanish, and German according to his method. The results were consistent, and Pucci, via artificial intelligence, was then tasked with translating more modern and technical texts. This revives, nearly a century later, an invention that had remained almost entirely unknown and had never been applied beyond its creator, bringing it to wider attention and opening it to possible experimentation. Such a demonstration would not only affirm Pucci's historical status but also place him among the precursors and intellectual contributors to machine translation, whose work merits examination alongside figures such as Troyanskij, Booth, and Weaver, with possible consequences for how the history of the field is understood.


Hermes 4 Technical Report

arXiv.org Artificial Intelligence

We present Hermes 4, a family of hybrid reasoning models that combine structured, multi-turn reasoning with broad instruction-following ability. We describe the challenges encountered during data curation, synthesis, training, and evaluation, and outline the solutions employed to address these challenges at scale. We comprehensively evaluate across mathematical reasoning, coding, knowledge, comprehension, and alignment benchmarks, and we report both quantitative performance and qualitative behavioral analysis. To support open research, all model weights are published publicly at https://huggingface.co/collections/NousResearch/hermes-4-collection-68a731bfd452e20816725728


Towards Temporal Knowledge-Base Creation for Fine-Grained Opinion Analysis with Language Models

arXiv.org Artificial Intelligence

We propose a scalable method for constructing a temporal opinion knowledge base with large language models (LLMs) as automated annotators. Despite the demonstrated utility of time-series opinion analysis of text for downstream applications such as forecasting and trend analysis, existing methodologies underexploit this potential due to the absence of temporally grounded fine-grained annotations. Our approach addresses this gap by integrating well-established opinion mining formulations into a declarative LLM annotation pipeline, enabling structured opinion extraction without manual prompt engineering. We define three data models grounded in sentiment and opinion mining literature, serving as schemas for structured representation. We perform rigorous quantitative evaluation of our pipeline using human-annotated test samples. We carry out the final annotations using two separate LLMs, and inter-annotator agreement is computed label-wise across the fine-grained opinion dimensions, analogous to human annotation protocols. The resulting knowledge base encapsulates time-aligned, structured opinions and is compatible with applications in Retrieval-Augmented Generation (RAG), temporal question answering, and timeline summarisation.
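The abstract says agreement between the two LLM annotators is computed label-wise but does not name the statistic. A minimal sketch, assuming Cohen's kappa (a common choice for exactly two annotators) and a hypothetical record format in which each annotation is a dict of opinion dimensions such as `polarity` and `intensity`; the function names are illustrative.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's own label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere, so kappa is 0/0;
        # this sketch returns 1.0 because observed agreement is perfect.
        return 1.0
    return (observed - expected) / (1 - expected)

def labelwise_kappa(annotations_a, annotations_b):
    """Kappa computed separately for every field (opinion dimension) of the records."""
    fields = annotations_a[0].keys()
    return {
        field: cohens_kappa([r[field] for r in annotations_a],
                            [r[field] for r in annotations_b])
        for field in fields
    }
```

Computing one kappa per field, rather than one over whole records, keeps a disagreement on one dimension from masking agreement on the others.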


Extrapolated Markov Chain Oversampling Method for Imbalanced Text Classification

arXiv.org Artificial Intelligence

Text classification is the task of automatically assigning text documents their correct labels from a predefined set of categories. In real-life (text) classification tasks, observations and misclassification costs are often unevenly distributed between the classes, a situation known as the problem of imbalanced data. Synthetic oversampling is a popular approach to imbalanced classification: the idea is to generate synthetic observations in the minority class to balance the classes in the training set. Many general-purpose oversampling methods can be applied to text data; however, imbalanced text data poses a number of distinctive difficulties that stem from the nature of text compared with other domains. One such factor is that as the sample size of text increases, the sample vocabulary (i.e., the feature space) is likely to grow as well. We introduce a novel Markov chain-based text oversampling method. The transition probabilities are estimated from the minority class but also partly from the majority class, thus allowing the minority feature space to expand during oversampling. We evaluate our approach against prominent oversampling methods and show that it produces highly competitive results in several real data examples, especially when the imbalance is severe.
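A minimal sketch of the idea, assuming a first-order (word bigram) Markov chain in which each transition distribution is a convex mixture of the minority-class and majority-class estimates, controlled by a hypothetical mixing weight `alpha`. The paper's actual "extrapolated" scheme may weight or restrict the majority contribution differently; all names here are illustrative.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # sentinel tokens marking document boundaries

def bigram_counts(docs):
    """Count word-to-next-word transitions in whitespace-tokenised documents."""
    counts = defaultdict(Counter)
    for doc in docs:
        tokens = [START] + doc.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def mixed_transitions(minority, majority, alpha=0.2):
    """Return {state: {next_word: prob}} with
    P(next | state) proportional to (1 - alpha) * P_min + alpha * P_maj.

    `alpha` (hypothetical, in [0, 1]) lets majority-class successors enter
    the minority chain, which enlarges the minority vocabulary.
    """
    min_c, maj_c = bigram_counts(minority), bigram_counts(majority)
    probs = {}
    for state in set(min_c) | set(maj_c):
        dist = Counter()
        for counts, weight in ((min_c.get(state), 1 - alpha), (maj_c.get(state), alpha)):
            if counts and weight > 0:
                total = sum(counts.values())
                for word, c in counts.items():
                    dist[word] += weight * c / total
        z = sum(dist.values())
        if z > 0:  # states with no positively weighted evidence are unreachable
            probs[state] = {word: p / z for word, p in dist.items()}
    return probs

def sample_document(probs, rng, max_len=50):
    """Walk the chain from START until END is drawn or max_len words are emitted."""
    words, state = [], START
    while len(words) < max_len:
        successors = probs[state]
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        if state == END:
            break
        words.append(state)
    return " ".join(words)

def oversample(minority, majority, n_new, alpha=0.2, seed=0):
    """Generate n_new synthetic minority-class documents."""
    probs = mixed_transitions(minority, majority, alpha)
    rng = random.Random(seed)
    return [sample_document(probs, rng) for _ in range(n_new)]
```

With `alpha=0` the chain can only recombine minority-class words; any positive `alpha` lets majority-class successors (for example, a word that follows `item` only in majority documents) appear in synthetic minority documents, which is the feature-space expansion the abstract describes.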


Towards Multi-Aspect Diversification of News Recommendations Using Neuro-Symbolic AI for Individual and Societal Benefit

arXiv.org Artificial Intelligence

News recommendation is complex, and diversity plays a vital role in it. So far, the literature has predominantly focused on specific aspects of news diversity, such as viewpoints. In this paper, we introduce multi-aspect diversification in four distinct recommendation modes and outline the nuanced challenges in diversifying lists, sequences, summaries, and interactions. Our proposed research direction combines symbolic and subsymbolic artificial intelligence, leveraging both knowledge graphs and rule learning. We plan to evaluate our models with user studies that capture not only users' behavior but also their perceived experience. Our vision of balanced news consumption points to further positive effects for users (e.g., increased serendipity) and society (e.g., decreased polarization).


Doctoral Thesis: Geometric Deep Learning For Camera Pose Prediction, Registration, Depth Estimation, and 3D Reconstruction

arXiv.org Artificial Intelligence

Modern deep learning developments create new opportunities for 3D mapping, scene reconstruction pipelines, and virtual reality development. Despite advances in 3D deep learning, training deep learning models directly on 3D data remains challenging because of the high dimensionality of 3D data and the scarcity of labeled datasets. Structure-from-motion (SfM) and Simultaneous Localization and Mapping (SLAM) perform robustly in structured indoor environments but often struggle with ambiguous features in unstructured environments. These techniques also have difficulty generating detailed geometric representations suitable for downstream tasks such as rendering and semantic analysis. These limitations call for 3D representation methods that combine traditional geometric techniques with deep learning to produce robust, geometry-aware deep learning models. The dissertation addresses fundamental challenges in 3D vision by developing geometric deep learning methods tailored to essential tasks such as camera pose estimation, point cloud registration, depth prediction, and 3D reconstruction. Integrating geometric priors or constraints, such as depth information, surface normals, and equivariance, into deep learning models enhances both the accuracy and the robustness of geometric representations. This study systematically investigates these key components of 3D vision (camera pose estimation, point cloud registration, depth estimation, and high-fidelity 3D reconstruction) and demonstrates the methods' effectiveness in real-world applications such as digital cultural heritage preservation and immersive VR/AR environments.


Journalists' Perceptions of Artificial Intelligence and Disinformation Risks

arXiv.org Artificial Intelligence

This study examines journalists' perceptions of the impact of artificial intelligence (AI) on disinformation, a growing concern in journalism due to the rapid expansion of generative AI and its influence on news production and media organizations. Using a quantitative approach, a structured survey was administered to 504 journalists in the Basque Country, identified through official media directories and with the support of the Basque Association of Journalists. This survey, conducted online and via telephone between May and June 2024, included questions on sociodemographic and professional variables, as well as attitudes toward AI's impact on journalism. The results indicate that a large majority of journalists (89.88%) believe AI will considerably or significantly increase the risks of disinformation, and this perception is consistent across genders and media types, but more pronounced among those with greater professional experience. Statistical analyses reveal a significant association between years of experience and perceived risk, and between AI use and risk perception. The main risks identified are the difficulty in detecting false content and deepfakes, and the risk of obtaining inaccurate or erroneous data. Co-occurrence analysis shows that these risks are often perceived as interconnected. These findings highlight the complex and multifaceted concerns of journalists regarding AI's role in the information ecosystem.
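The abstract does not describe how the co-occurrence analysis was computed. A minimal sketch, assuming each respondent selected a set of risks and that a co-occurrence means two risks chosen by the same respondent; the risk labels are hypothetical shorthand for the categories named above.

```python
from collections import Counter
from itertools import combinations

def risk_cooccurrence(responses):
    """Count how often each unordered pair of risks is selected together.

    `responses` holds one entry per respondent, each an iterable of selected
    risk labels. Pairs are stored in sorted order so (a, b) and (b, a) count
    as the same pair; duplicate selections by one respondent are ignored.
    """
    pairs = Counter()
    for selected in responses:
        for a, b in combinations(sorted(set(selected)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical labels for the risk categories named in the abstract.
survey = [
    ["deepfakes", "inaccurate_data"],
    ["false_content", "deepfakes", "inaccurate_data"],
    ["inaccurate_data"],
]
print(risk_cooccurrence(survey).most_common(1))
# prints [(('deepfakes', 'inaccurate_data'), 2)]
```

Raw pair counts favor frequently chosen risks; normalizing by each risk's individual frequency would be a natural next step, but that choice is not stated in the abstract.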


Music Genre Classification Using Machine Learning Techniques

arXiv.org Artificial Intelligence

This paper presents a comparative analysis of machine learning methodologies for automatic music genre classification. We evaluate the performance of classical classifiers, including Support Vector Machines (SVM) and ensemble methods, trained on a comprehensive set of hand-crafted audio features, against a Convolutional Neural Network (CNN) operating on Mel spectrograms. The study is conducted on the widely-used GTZAN dataset. Our findings demonstrate a noteworthy result: the SVM, leveraging domain-specific feature engineering, achieves superior classification accuracy compared to the end-to-end CNN model. We attribute this outcome to the data-constrained nature of the benchmark dataset, where the strong inductive bias of engineered features provides a regularization effect that mitigates the risk of overfitting inherent in high-capacity deep learning models. This work underscores the enduring relevance of traditional feature extraction in practical audio processing tasks and provides a critical perspective on the universal applicability of deep learning, especially for moderately sized datasets.


Bridging Thoughts and Words: Graph-Based Intent-Semantic Joint Learning for Fake News Detection

arXiv.org Artificial Intelligence

Fake news detection is an important and challenging task for defending online information integrity. Existing state-of-the-art approaches typically extract semantic clues from news, such as writing patterns involving emotional words and stylistic features. However, detectors tuned solely to such semantic clues can easily latch onto surface-level patterns, which can shift rapidly in dynamic environments, limiting their performance in the evolving news landscape. To address this issue, this paper investigates a novel perspective that incorporates news intent into fake news detection, bridging intents and semantics. The core insight is that considering news intents gives a deeper understanding of the thinking behind news deception than the surface patterns within words alone. To achieve this goal, we propose Graph-based Intent-Semantic Joint Modeling (InSide) for fake news detection, which models deception clues from both semantic and intent signals via graph-based joint learning. Specifically, InSide reformulates news semantic and intent signals into heterogeneous graph structures, enabling long-range context interaction through entity guidance and capturing both holistic and implementation-level intent via coarse-to-fine intent modeling. To better align semantics and intents, we further develop a dynamic pathway-based graph alignment strategy that establishes a common space for effective message passing and aggregation across these signals. Extensive experiments on four benchmark datasets demonstrate the superiority of the proposed InSide over state-of-the-art methods.


From Discord to Harmony: Decomposed Consonance-based Training for Improved Audio Chord Estimation

arXiv.org Artificial Intelligence

Audio Chord Estimation (ACE) plays a pivotal role in music information research and has attracted attention for over two decades due to its relevance to music transcription and analysis. Despite notable advances, challenges persist in the task, particularly concerning the unique characteristics of harmonic content, and they have left the performance of existing systems at a glass ceiling. These challenges include annotator subjectivity, where differing interpretations among annotators lead to inconsistencies, and class imbalance within chord datasets, where certain chord classes are over-represented relative to others, complicating model training and evaluation. As a first contribution, this paper evaluates inter-annotator agreement in chord annotations using metrics that go beyond traditional binary measures. In addition, we propose a consonance-informed distance metric that reflects the perceptual similarity between harmonic annotations. Our analysis suggests that consonance-based distance metrics capture musically meaningful agreement between annotations more effectively. Building on these findings, we introduce a novel conformer-based ACE model that integrates consonance concepts through consonance-based label smoothing. The proposed model also addresses class imbalance by separately estimating root, bass, and all note activations, enabling chord labels to be reconstructed from the decomposed outputs.
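The abstract does not specify how consonance enters the label smoothing. One plausible reading, sketched here purely as an assumption, applies it to the root output: keep most of the probability mass on the true root pitch class and spread the remainder over the other pitch classes in proportion to how consonant their interval with the true root is. The `CONSONANCE` table is a hypothetical, illustrative ranking (fourths and fifths highest; seconds, sevenths, and the tritone lowest), not values from the paper.

```python
import math

# Hypothetical consonance score for each interval, in semitones above the
# true root (index 0 = unison). Illustrative values only. Unison is 0 because
# the true class already receives the (1 - eps) share directly.
CONSONANCE = [0.0, 0.1, 0.3, 0.6, 0.7, 0.9, 0.1, 0.9, 0.7, 0.6, 0.3, 0.1]

def consonance_smoothed_target(root, eps=0.1):
    """Soft target over the 12 pitch classes for a true root pitch class."""
    weights = [CONSONANCE[(pc - root) % 12] for pc in range(12)]
    total = sum(weights)
    target = [eps * w / total for w in weights]  # eps spread by consonance
    target[root] += 1.0 - eps                    # bulk of the mass on the true root
    return target

def soft_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    return -sum(t * (x - log_z) for t, x in zip(target, logits))
```

Compared with uniform label smoothing, this penalizes confusing a root with its fifth less than confusing it with a semitone neighbor.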