 markup


Structured Document Translation via Format Reinforcement Learning

Song, Haiyue, Eschbach-Dymanus, Johannes, Kaing, Hour, Honda, Sumire, Tanaka, Hideki, Buschbeck, Bianka, Utiyama, Masao

arXiv.org Artificial Intelligence

Recent works on structured text translation remain limited to the sentence level, as they struggle to effectively handle the complex document-level XML or HTML structures. To address this, we propose Format Reinforcement Learning (FormatRL), which employs Group Relative Policy Optimization on top of a supervised fine-tuning model to directly optimize novel structure-aware rewards: 1) TreeSim, which measures structural similarity between predicted and reference XML trees and 2) Node-chrF, which measures translation quality at the level of XML nodes. Additionally, we apply StrucAUC, a fine-grained metric distinguishing between minor errors and major structural failures. Experiments on the SAP software-documentation benchmark demonstrate improvements across six metrics and an analysis further shows how different reward functions contribute to improvements in both structural and translation quality.
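
To make the idea of a structure-aware reward concrete, here is a minimal sketch of a crude structural similarity between a predicted and a reference XML document. It is not the paper's TreeSim, whose definition the abstract does not give. It assumes a simple stand-in: the multiset Jaccard overlap of root-to-node tag paths, with a score of 0 for predictions that are not well-formed. The function names `tag_paths` and `path_overlap` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return a multiset (Counter) of root-to-node tag paths for an XML string."""
    root = ET.fromstring(xml_text)
    paths = Counter()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths[path] += 1
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def path_overlap(pred_xml, ref_xml):
    """Hypothetical structure score in [0, 1]: multiset Jaccard over tag paths.

    Assumption: a prediction that fails to parse counts as a total
    structural failure and scores 0.0.
    """
    try:
        pred = tag_paths(pred_xml)
    except ET.ParseError:
        return 0.0
    ref = tag_paths(ref_xml)
    inter = sum((pred & ref).values())  # element-wise minimum of counts
    union = sum((pred | ref).values())  # element-wise maximum of counts
    return inter / union if union else 1.0
```

A score like this ignores the text inside the nodes entirely, which is why a reward of this kind would need to be paired with a translation-quality signal such as the node-level chrF the abstract mentions.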


Reverse Browser: Vector-Image-to-Code Generator

Toth-Czifra, Zoltan

arXiv.org Artificial Intelligence

Automating the conversion of user interface design into code (image-to-code or image-to-UI) is an active area of software engineering research. However, the state-of-the-art solutions do not achieve high fidelity to the original design, as evidenced by benchmarks. In this work, I approach the problem differently: I use vector images instead of bitmaps as model input. I create several large datasets for training machine learning models. I evaluate the available array of Image Quality Assessment (IQA) algorithms and introduce a new, multi-scale metric. I then train a large open-weights model and discuss its limitations.


Improving French Synthetic Speech Quality via SSML Prosody Control

Ouali, Nassima Ould, Sani, Awais Hussain, Bueno, Ruben, Dauvet, Jonah, Horstmann, Tim Luka, Moulines, Eric

arXiv.org Artificial Intelligence

Despite recent advances, synthetic voices often lack expressiveness due to limited prosody control in commercial text-to-speech (TTS) systems. We introduce the first end-to-end pipeline that inserts Speech Synthesis Markup Language (SSML) tags into French text to control pitch, speaking rate, volume, and pause duration. We employ a cascaded architecture with two QLoRA-fine-tuned Qwen 2.5-7B models: one predicts phrase-break positions and the other performs regression on prosodic targets, generating commercial TTS-compatible SSML markup. Evaluated on a 14-hour French podcast corpus, our method achieves 99.2% F1 for break placement and reduces mean absolute error on pitch, rate, and volume by 25-40% compared with prompting-only large language models (LLMs) and a BiLSTM baseline. In perceptual evaluation involving 18 participants across over 9 hours of synthesized audio, SSML-enhanced speech generated by our pipeline significantly improves naturalness, with the mean opinion score increasing from 3.20 to 3.87 (p < 0.005). Additionally, 15 of 18 listeners preferred our enhanced synthesis. These results demonstrate substantial progress in bridging the expressiveness gap between synthetic and natural French speech. Our code is publicly available at https://github.com/hi-paris/Prosody-Control-French-TTS.
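
As an illustration of the kind of markup such a pipeline emits, the sketch below turns per-phrase prosody predictions into SSML using the standard `<prosody>` and `<break>` elements. The input format (a list of dicts with `text`, `pitch_pct`, `rate_pct`, `volume_db`, and `break_ms` keys) and the function name `to_ssml` are assumptions made for this example, not the authors' data format; see their repository for the actual implementation.

```python
from xml.sax.saxutils import escape


def to_ssml(phrases):
    """Render phrases with predicted prosody values as an SSML string.

    Each phrase is a dict with hypothetical keys:
      text       - the phrase text
      pitch_pct  - relative pitch change in percent (int, may be negative)
      rate_pct   - speaking rate as a percentage of the default (int)
      volume_db  - relative volume change in decibels (int)
      break_ms   - pause after the phrase in milliseconds (0 for none)
    """
    parts = ["<speak>"]
    for p in phrases:
        parts.append(
            '<prosody pitch="{:+d}%" rate="{:d}%" volume="{:+d}dB">{}</prosody>'.format(
                p["pitch_pct"], p["rate_pct"], p["volume_db"], escape(p["text"])
            )
        )
        if p["break_ms"] > 0:
            parts.append('<break time="{:d}ms"/>'.format(p["break_ms"]))
    parts.append("</speak>")
    return "".join(parts)


example = to_ssml([
    {"text": "Bonjour & bienvenue", "pitch_pct": 5, "rate_pct": 90,
     "volume_db": 2, "break_ms": 300},
])
```

Escaping the phrase text matters because the result must remain well-formed XML; commercial TTS engines differ in which attribute values they accept, so the value formats here should be checked against the target engine.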


AlphaDent: A dataset for automated tooth pathology detection

Sosnin, Evgeniy I., Vasilev, Yuriy L., Solovyev, Roman A., Stempkovskiy, Aleksandr L., Telpukhov, Dmitry V., Vasilev, Artem A., Amerikanov, Aleksandr A., Romanov, Aleksandr Y.

arXiv.org Artificial Intelligence

In this article, we present AlphaDent, a new dataset for dental research. The dataset is based on DSLR camera photographs of the teeth of 295 patients and contains over 1200 images. It is labeled for the instance segmentation problem and is divided into 9 classes. The article provides a detailed description of the dataset and the labeling format, as well as the details of an experiment on training a neural network for instance segmentation using this dataset. The results obtained show high quality of predictions. The dataset is published under an open license, and the training/inference code and model weights are also available under open licenses.


Newegg has RTX 5090 cards in stock at base price right now

PCWorld

It's been seven months since Nvidia launched its flagship RTX 5090 card to a hungry audience of PC gamers… and people building AI data centers… and a bunch of scalpers trying to bilk them all. In that time, I've yet to see one actually available to purchase at the alleged base price of two thousand dollarydoos. As of just before 11 AM Eastern US time, Newegg has one for the base price. Specifically this one, the Zotac Gaming Solid model, a basic triple-fan design which apparently has the reference PCB with no overclock. As the good Lord intended.


Investigation finds Match Group failed to act on reports of sexual assault

Engadget

A new investigation from The Markup claims the parent company of Tinder, Hinge, OKCupid and other dating apps turns a blind eye to allegedly abusive users on its platforms. The 18-month investigation found instances in which users who were repeatedly reported for drugging or assaulting their dates remained on the apps. One such case involves a Colorado-based cardiologist named Stephen Matthews. Over several years, multiple women on Match's platforms reported him for drugging or raping them. Despite these reports, his Tinder profile was at one point given Standout status, reserved for popular profiles and often requiring in-app currency to interact with.


Rape under wraps: how Tinder, Hinge and their corporate owner chose profits over safety

The Guardian

The Dating Apps Reporting Project is an 18-month investigation. It was produced in partnership with the Pulitzer Center's AI Accountability Network and the Markup, now a part of CalMatters, and co-published with the Guardian and the 19th. When a young woman in Denver met up with a smiling cardiologist she matched with on the dating app Hinge, she had no way of knowing that the company behind the app had already received reports from two other women who had accused him of rape. She met the 34-year-old doctor with green eyes and thinning hair at Highland Tap & Burger, a sports bar in a trendy neighborhood. It went well enough that she accepted an invitation to go back to his apartment. As she emerged from his bathroom, he handed her a tequila soda. What transpired over the next 24 hours, according to court testimony, reads like every person's dating app nightmare. After sipping the drink, the woman started to lose control. She fell to the ground, and the man started to film her. He put her in a headlock, kissing her forehead; she struggled to free herself but managed to grab her things and leave. He followed her out the door, holding her shoes and trying to force her back inside, but she was able to call an Uber, vomiting in the car on the way home. She woke up at home, soaking wet on her bathroom floor, the key to her house still in her door. She continued vomiting for hours.


Investigating on RLHF methodology

Kutalev, Alexey, Markoff, Sergei

arXiv.org Artificial Intelligence

In this article, we investigate the alignment of Large Language Models according to human preferences. We discuss the features of training a Preference Model, which simulates human preferences, and the methods and details we found essential for achieving the best results. We also discuss using Reinforcement Learning to fine-tune Large Language Models and describe the challenges we faced and the ways to overcome them. Additionally, we present our experience with the Direct Preference Optimization method, which enables us to align a Large Language Model with human preferences without creating a separate Preference Model. As our contribution, we introduce the approach for collecting a preference dataset through perplexity filtering, which makes the process of creating such a dataset for a specific Language Model much easier and more cost-effective.
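
For readers unfamiliar with Direct Preference Optimization, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities under the policy and a frozen reference model. It illustrates the general method only and says nothing about the authors' training setup; the function name, argument names, and the default `beta` are assumptions for this example.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Arguments are sequence log-probabilities (floats). The loss is
    -log(sigmoid(beta * margin)), where margin is how much more the policy
    favours the chosen response over the rejected one, relative to the
    reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # Numerically stable softplus(-x) = log(1 + exp(-x)) = -log(sigmoid(x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; it falls toward zero as the policy shifts probability toward the chosen response.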


Word-wise intonation model for cross-language TTS systems

Tomilov, A. A., Gromova, A. Y., Svischev, A. N.

arXiv.org Artificial Intelligence

In this paper we propose a word-wise intonation model for the Russian language and show how it can be generalized to other languages. The proposed model is suitable for automatic data markup and for extended application in text-to-speech systems. It can also be used for intonation contour modeling, either with rule-based algorithms or by predicting contours with language models. The key idea is a partial elimination of the variability connected with different placements of the stressed syllable in a word. This is achieved by applying pitch simplification together with dynamic time warping clustering. The proposed model could be used as a tool for intonation research or as a backbone for prosody description in text-to-speech systems. As an advantage of the model, we show its relation to existing intonation systems as well as the possibility of using language models for prosody prediction. Finally, we demonstrate some practical evidence of the system's robustness to parameter variations.
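
Dynamic time warping is what lets contours of different lengths, or with the stressed syllable in different positions, be compared at all. The sketch below is a textbook DTW distance between two pitch contours given as lists of numbers. It covers only the distance computation, not the clustering or pitch simplification described in the abstract, and the function name and the absolute-difference local cost are assumptions for this example.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost and allows the usual three
    moves (match, insertion, deletion). Runs in O(len(a) * len(b)).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Two contours with the same shape but a stretched middle section, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align at zero cost, which is the property that makes DTW a natural distance for clustering word-level pitch contours.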


Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web

Lin, Kate, Alrashed, Tarfah, Noy, Natasha

arXiv.org Artificial Intelligence

The Web today has millions of datasets, and the number of datasets continues to grow at a rapid pace. These datasets are not standalone entities; rather, they are intricately connected through complex relationships. Semantic relationships between datasets provide critical insights for research and decision-making processes. In this paper, we study dataset relationships from the perspective of users who discover, use, and share datasets on the Web: what relationships are important for different tasks? What contextual information might users want to know? We first present a comprehensive taxonomy of relationships between datasets on the Web and map these relationships to user tasks performed during dataset discovery. We develop a series of methods to identify these relationships and compare their performance on a large corpus of datasets generated from Web pages with schema.org markup. We demonstrate that machine-learning based methods that use dataset metadata achieve multi-class classification accuracy of 90%. Finally, we highlight gaps in available semantic markup for datasets and discuss how incorporating comprehensive semantics can facilitate the identification of dataset relationships. By providing a comprehensive overview of dataset relationships at scale, this paper sets a benchmark for future research.