Quantifying Climate Policy Action and Its Links to Development Outcomes: A Cross-National Data-Driven Analysis

Dutta, Aditi

arXiv.org Artificial Intelligence

Addressing climate change effectively requires more than cataloguing the number of policies in place; it calls for tools that can reveal their thematic priorities and their tangible impacts on development outcomes. Existing assessments often rely on qualitative descriptions or composite indices, which can mask crucial differences between key domains such as mitigation, adaptation, disaster risk management, and loss and damage. To bridge this gap, we develop a quantitative indicator of climate policy orientation by applying a multilingual transformer-based language model to official national policy documents, achieving a classification accuracy of 0.90 (F1-score). Linking these indicators with World Bank development data in panel regressions reveals that mitigation policies are associated with higher GDP and GNI; disaster risk management correlates with greater GNI and debt but reduced foreign direct investment; adaptation and loss and damage show limited measurable effects. This integrated NLP-econometric framework enables comparable, theme-specific analysis of climate governance, offering a scalable method to monitor progress, evaluate trade-offs, and align policy emphasis with development goals.
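The panel-regression step that links policy indicators to development outcomes can be illustrated with a minimal within (fixed-effects) estimator. This is a sketch on made-up data: `panel_fe_ols`, the toy numbers, and the single-regressor setup are our illustration, not the paper's actual specification.

```python
import numpy as np

def panel_fe_ols(y, x, entity):
    """Within (fixed-effects) estimator for a single regressor:
    demean y and x within each entity, then run OLS on the demeaned data.
    Demeaning absorbs entity-specific intercepts (e.g., country effects)."""
    y = np.asarray(y, dtype=float).copy()
    x = np.asarray(x, dtype=float).copy()
    entity = np.asarray(entity)
    for e in np.unique(entity):
        m = entity == e
        y[m] -= y[m].mean()
        x[m] -= x[m].mean()
    return (x @ y) / (x @ x)  # OLS slope on demeaned data

# Toy panel: two countries with different intercepts but a common slope of 2.
entity = ["A", "A", "B", "B"]
x = [0.0, 1.0, 0.0, 1.0]           # e.g., a hypothetical policy indicator
y = [1.0, 3.0, 10.0, 12.0]         # e.g., a hypothetical outcome variable
print(panel_fe_ols(y, x, entity))  # 2.0
```

The within transformation is what lets such regressions attribute outcome differences to policy variation within a country rather than to fixed cross-country differences.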


Zadie Smith on Politics, Turning Fifty, and Mind Control

The New Yorker

The author's new essay collection, "Dead and Alive," addresses debates on representation in literature, feminism, and how our phones have radicalized us. Since Zadie Smith published her début novel, "White Teeth," twenty-five years ago, she has been a bold and original voice in literature. But those who aren't familiar with Smith's work outside of fiction are missing out. As an essayist, in The New Yorker and other publications, Smith writes with great nuance about culture, technology, gentrification, and politics. "There's really not a topic that wouldn't benefit from her insight," David Remnick says. He spoke with Smith about her new collection of essays, "Dead and Alive."


StressTransfer: Stress-Aware Speech-to-Speech Translation with Emphasis Preservation

Chen, Xi, Song, Yuchen, Nakamura, Satoshi

arXiv.org Artificial Intelligence

EmphST-Bench. To guide algorithm exploration and evaluate the performance of our model, we design an evaluation pipeline for the emphasis-preserving speech-to-speech translation system. Given the lack of ready-to-use benchmarks for this important task, we leverage LLMs to translate the test set from the StressTest [21] corpus into the target language and then filter the results via human experts. This process creates a high-quality benchmark dataset, EmphST-Bench, with manually verified emphasis alignments between source and target utterances, ensuring reliable assessment of cross-lingual emphasis preservation. The human filtering step focuses on correcting any discrepancies in semantic equivalence, contrastive focus, and emotional intensity, resulting in a robust evaluation set that closely mirrors real-world linguistic nuances. EmphST-Bench consists of 218 carefully selected parallel samples from English (source) to Chinese (target), providing a standardized resource for evaluating stress-aware S2ST systems. We report the statistics of EmphST-Bench in Table 1.


Intentional Gesture: Deliver Your Intentions with Gestures for Speech

Liu, Pinxin, Liu, Haiyang, Song, Luchuan, Corso, Jason J., Xu, Chenliang

arXiv.org Artificial Intelligence

When humans speak, gestures help convey communicative intentions, such as adding emphasis or describing concepts. However, current co-speech gesture generation methods rely solely on superficial linguistic cues (e.g., speech audio or text transcripts), neglecting to understand and leverage the communicative intention that underpins human gestures. This results in outputs that are rhythmically synchronized with speech but are semantically shallow. To address this gap, we introduce Intentional-Gesture, a novel framework that casts gesture generation as an intention-reasoning task grounded in high-level communicative functions. First, we curate the InG dataset by augmenting BEAT-2 with gesture-intention annotations (i.e., text sentences summarizing intentions), which are automatically annotated using large vision-language models. Next, we introduce the Intentional Gesture Motion Tokenizer to leverage these intention annotations. It injects high-level communicative functions (e.g., intentions) into tokenized motion representations to enable intention-aware gesture synthesis that is both temporally aligned and semantically meaningful, achieving new state-of-the-art performance on the BEAT-2 benchmark. Our framework offers a modular foundation for expressive gesture generation in digital humans and embodied AI. Project Page: https://andypinxinliu.github.io/Intentional-Gesture


Human-AI Collaboration Increases Efficiency in Regulatory Writing

Eser, Umut, Gozin, Yael, Stallons, L. Jay, Caroline, Ari, Preusse, Martin, Rice, Brandon, Wright, Scott, Robertson, Andrew

arXiv.org Artificial Intelligence

Background: Investigational New Drug (IND) application preparation is time-intensive and expertise-dependent, slowing early clinical development. Objective: To evaluate whether a large language model (LLM) platform (AutoIND) can reduce first-draft composition time while maintaining document quality in regulatory submissions. Methods: Drafting times for IND nonclinical written summaries (eCTD modules 2.6.2, 2.6.4, 2.6.6) generated by AutoIND were directly recorded. For comparison, manual drafting times for IND summaries previously cleared by the U.S. FDA were estimated from the experience of regulatory writers ($\geq$6 years) and used as industry-standard benchmarks. Quality was assessed by a blinded regulatory writing assessor using seven pre-specified categories: correctness, completeness, conciseness, consistency, clarity, redundancy, and emphasis. Each sub-criterion was scored 0-3 and normalized to a percentage. A critical regulatory error was defined as any misrepresentation or omission likely to alter regulatory interpretation (e.g., incorrect NOAEL, omission of mandatory GLP dose-formulation analysis). Results: AutoIND reduced initial drafting time by $\sim$97% (from $\sim$100 h to 3.7 h for 18,870 pages/61 reports in IND-1; and to 2.6 h for 11,425 pages/58 reports in IND-2). Quality scores were 69.6\% and 77.9\% for IND-1 and IND-2. No critical regulatory errors were detected, but deficiencies in emphasis, conciseness, and clarity were noted. Conclusions: AutoIND can dramatically accelerate IND drafting, but expert regulatory writers remain essential to mature outputs to submission-ready quality. Systematic deficiencies identified provide a roadmap for targeted model improvements.


You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties

Tuttösí, Paige, Yeung, H. Henny, Wang, Yue, Aucouturier, Jean-Julien, Lim, Angelica

arXiv.org Artificial Intelligence

We present the first text-to-speech (TTS) system tailored to second language (L2) speakers. We use duration differences between American English tense (longer) and lax (shorter) vowels to create a "clarity mode" for Matcha-TTS. Our perception studies showed that French-L1, English-L2 listeners made fewer transcription errors (a reduction of at least 9.15%) when using our clarity mode, and found it more encouraging and respectful than overall slowed-down speech. Remarkably, listeners were not aware of these effects: despite the decreased word error rate in clarity mode, listeners still believed that slowing all target words was the most intelligible, suggesting that actual intelligibility does not correlate with perceived intelligibility. Additionally, we found that Whisper-ASR did not use the same cues as L2 speakers to differentiate difficult vowels and is not sufficient to assess the intelligibility of TTS systems for these individuals.


Emphasis Sensitivity in Speech Representations

Cassini, Shaun, Hain, Thomas, Ragni, Anton

arXiv.org Artificial Intelligence

This work investigates whether modern speech models are sensitive to prosodic emphasis - whether they encode emphasized and neutral words in systematically different ways. Prior work typically relies on isolated acoustic correlates (e.g., pitch, duration) or label prediction, both of which miss the relational structure of emphasis. This paper proposes a residual-based framework, defining emphasis as the difference between paired neutral and emphasized word representations. Analysis on self-supervised speech models shows that these residuals correlate strongly with duration changes and perform poorly at word identity prediction, indicating a structured, relational encoding of prosodic emphasis. In ASR fine-tuned models, residuals occupy a subspace up to 50% more compact than in pre-trained models, further suggesting that emphasis is encoded as a consistent, low-dimensional transformation that becomes more structured with task-specific learning.
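The residual framework and the compactness measurement can be sketched in a few lines. This is our illustration, not the paper's exact procedure: the function names, the 90% variance cutoff, and the toy vectors are assumptions.

```python
import numpy as np

def emphasis_residuals(neutral, emphasized):
    """Residual = emphasized minus neutral representation for each paired word."""
    return np.asarray(emphasized, dtype=float) - np.asarray(neutral, dtype=float)

def subspace_dims(residuals, var_threshold=0.9):
    """Number of principal directions needed to explain var_threshold of the
    residual variance: a rough compactness measure for the residual subspace."""
    centered = residuals - residuals.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    var = s**2 / (s**2).sum()
    return int(np.searchsorted(np.cumsum(var), var_threshold) + 1)

# Toy example: residuals that all point along one direction occupy a
# one-dimensional subspace, i.e., a maximally compact emphasis encoding.
neutral = np.zeros((4, 3))
emphasized = np.array([[1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]], dtype=float)
print(subspace_dims(emphasis_residuals(neutral, emphasized)))  # 1
```

Under this measure, the paper's finding would correspond to ASR fine-tuned models needing noticeably fewer principal directions than pre-trained models to cover the same residual variance.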


Thank you for pointing out the need for more emphasis on our setting, such as the consideration of only

Neural Information Processing Systems

We thank the reviewers for their valuable feedback. We will emphasize this in the abstract and more clearly throughout the paper. Shanmugam et al. (2015) prove minimax bounds, which both involve the size of largest maximal clique. R2 points out the paper might appear dense and not accessible to non-expert audiences. In fact, any apparent simplicity in the proofs is partially the result of careful definitions.


Robust Semi-Supervised CT Radiomics for Lung Cancer Prognosis: Cost-Effective Learning with Limited Labels and SHAP Interpretation

Salmanpour, Mohammad R., Pouria, Amir Hossein, Falahati, Sonia, Taeb, Shahram, Mehrnia, Somayeh Sadat, Maghsudi, Mehdi, Jouzdani, Ali Fathi, Oveisi, Mehrdad, Hacihaliloglu, Ilker, Rahmim, Arman

arXiv.org Artificial Intelligence

Background: CT imaging is vital for lung cancer management, offering detailed visualization for AI-based prognosis. However, supervised learning (SL) models require large labeled datasets, limiting their real-world application in settings with scarce annotations. Methods: We analyzed CT scans from 977 patients across 12 datasets, extracting 1218 radiomics features using Laplacian of Gaussian and wavelet filters via PyRadiomics. Dimensionality reduction was applied with 56 feature selection and extraction algorithms, and 27 classifiers were benchmarked. A semi-supervised learning (SSL) framework with pseudo-labeling utilized 478 unlabeled and 499 labeled cases. Model sensitivity was tested in three scenarios: varying labeled data in SL, increasing unlabeled data in SSL, and scaling both from 10 percent to 100 percent. SHAP analysis was used to interpret predictions. Cross-validation and external testing in two cohorts were performed. Results: SSL outperformed SL, improving overall survival prediction by up to 17 percent. The top SSL model, Random Forest plus XGBoost classifier, achieved 0.90 accuracy in cross-validation and 0.88 externally. SHAP analysis revealed enhanced feature discriminability in both SSL and SL, especially for Class 1 (survival greater than 4 years). SSL showed strong performance with only 10 percent labeled data, with more stable results compared to SL and lower variance across external testing, highlighting SSL's robustness and cost-effectiveness. Conclusion: We introduced a cost-effective, stable, and interpretable SSL framework for CT-based survival prediction in lung cancer, improving performance, generalizability, and clinical readiness by integrating SHAP explainability and leveraging unlabeled data.
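The pseudo-labeling idea behind the SSL framework can be sketched with a nearest-centroid classifier standing in for the paper's Random Forest plus XGBoost pipeline. Everything here (the function names, the distance-margin confidence rule, the toy clusters) is our simplified illustration.

```python
import numpy as np

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(X, classes, centroids):
    """Assign each sample to the class of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)], d

def pseudo_label_round(X_lab, y_lab, X_unlab, margin=0.5):
    """One pseudo-labeling round: fit on labeled data, predict the unlabeled
    pool, keep only confident predictions (large distance margin between the
    two nearest centroids), and refit on the enlarged training set."""
    classes, cents = fit_centroids(X_lab, y_lab)
    pred, d = predict(X_unlab, classes, cents)
    dsort = np.sort(d, axis=1)
    confident = (dsort[:, 1] - dsort[:, 0]) > margin
    X_aug = np.vstack([X_lab, X_unlab[confident]])
    y_aug = np.concatenate([y_lab, pred[confident]])
    return fit_centroids(X_aug, y_aug)

# Toy data: two well-separated clusters, very few labels, more unlabeled points.
X_lab = np.array([[0.0, 0.0], [5.0, 5.0]])
y_lab = np.array([0, 1])
X_unlab = np.array([[0.2, -0.1], [4.8, 5.1], [0.1, 0.3], [5.2, 4.9]])
classes, cents = pseudo_label_round(X_lab, y_lab, X_unlab)
pred, _ = predict(np.array([[0.0, 0.5], [5.0, 4.5]]), classes, cents)
print(pred)  # [0 1]
```

The confidence filter is the crux of the technique: adding only high-margin pseudo-labels is what lets the enlarged training set improve, rather than corrupt, the refit model.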


Dynamik: Syntactically-Driven Dynamic Font Sizing for Emphasis of Key Information

Nishida, Naoto, Ishiguro, Yoshio, Rekimoto, Jun, Yamashita, Naomi

arXiv.org Artificial Intelligence

In today's globalized world, there are increasing opportunities for individuals to communicate using a common non-native language (lingua franca). Non-native speakers often have opportunities to listen to foreign languages, but may not comprehend them as fully as native speakers do. To aid real-time comprehension, live transcription of subtitles is frequently used in everyday life (e.g., during Zoom conversations, watching YouTube videos, or on social networking sites). However, simultaneously reading subtitles while listening can increase cognitive load. In this study, we propose Dynamik, a system that reduces cognitive load during reading by decreasing the size of less important words and enlarging important ones, thereby enhancing sentence contrast. Our results indicate that Dynamik can reduce certain aspects of cognitive load, specifically perceived performance and effort, and enhance the sense of comprehension, particularly among users with low English proficiency. We further discuss our method's applicability to other languages, potential improvements, and further research directions.
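The core size-mapping idea can be sketched as below. The function `font_sizes`, the base and spread values, and the importance scores are our illustration (Dynamik derives importance syntactically, which this sketch does not reproduce).

```python
def font_sizes(importance, base=16.0, spread=6.0):
    """Map per-word importance scores in [0, 1] to font sizes in points:
    a score of 0.5 keeps the base size, higher scores enlarge the word,
    lower scores shrink it, increasing visual contrast within a sentence."""
    return [round(base + spread * (s - 0.5), 1) for s in importance]

# A key content word is rendered larger than the surrounding function words.
scores = [0.2, 0.9, 0.2]   # e.g., hypothetical scores for "the MEETING is"
print(font_sizes(scores))  # [14.2, 18.4, 14.2]
```

Keeping the mapping linear and centered on a base size means average-importance text looks unchanged, so the contrast manipulation stays unobtrusive.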