tse
We may not have flying cars, but we have flying umbrellas
Inventor John Tse has gone high-tech to keep raindrops from falling on your head. Breakthroughs, discoveries, and DIY tips sent six days a week. You wouldn't think it, but for years people have looked at the humble umbrella and seen more than just a way to keep dry during a rainstorm. They see it as a challenge. Are there ways to use it we never thought of before?
PMA-Diffusion: A Physics-guided Mask-Aware Diffusion Framework for TSE from Sparse Observations
Liu, Lindong, Jin, Zhixiong, Choi, Seongjin
High-resolution highway traffic state information is essential for Intelligent Transportation Systems, but typical traffic data acquired from loop detectors and probe vehicles are often too sparse and noisy to capture the detailed dynamics of traffic flow. We propose PMA-Diffusion, a physics-guided mask-aware diffusion framework that reconstructs unobserved highway speed fields from sparse, incomplete observations. Our approach trains a diffusion prior directly on sparsely observed speed fields using two mask-aware training strategies: Single-Mask and Double-Mask. At the inference phase, the physics-guided posterior sampler alternates reverse-diffusion updates, observation projection, and physics-guided projection based on adaptive anisotropic smoothing to reconstruct the missing speed fields. The proposed framework is tested on the I-24 MOTION dataset with varying visibility ratios. Even under severe sparsity, with only 5% visibility, PMA-Diffusion outperforms other baselines across three reconstruction error metrics. Furthermore, PMA-diffusion trained with sparse observation nearly matches the performance of the baseline model trained on fully observed speed fields. The results indicate that combining mask-aware diffusion priors with a physics-guided posterior sampler provides a reliable and flexible solution for traffic state estimation under realistic sensing sparsity.
Multilingual Target-Stance Extraction
Social media enables data-driven analysis of public opinion on contested issues. Target-Stance Extraction (TSE) is the task of identifying the target discussed in a document and the document's stance towards that target. Many works classify stance towards a given target in a multilingual setting, but all prior work in TSE is English-only. This work introduces the first multilingual TSE benchmark, spanning Catalan, Estonian, French, Italian, Mandarin, and Spanish corpora. It manages to extend the original TSE pipeline to a multilingual setting without requiring separate models for each language. Our model pipeline achieves a modest F1 score of 12.78, underscoring the increased difficulty of the multilingual task relative to English-only setups and highlighting target prediction as the primary bottleneck. We are also the first to demonstrate the sensitivity of TSE's F1 score to different target verbalizations. Together these serve as a much-needed baseline for resources, algorithms, and evaluation criteria in multilingual TSE.
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
Wang, Wen, Fang, Bozhen, Jing, Chenchen, Shen, Yongliang, Shen, Yangyi, Wang, Qiuyu, Ouyang, Hao, Chen, Hao, Shen, Chunhua
Diffusion large language models (dLLMs) generate text through iterative denoising, yet current decoding strategies discard rich intermediate predictions in favor of the final output. Our work here reveals a critical phenomenon, temporal oscillation, where correct answers often emerge in the middle process, but are overwritten in later denoising steps. To address this issue, we introduce two complementary methods that exploit temporal consistency: 1) Temporal Self-Consistency Voting, a training-free, test-time decoding strategy that aggregates predictions across denoising steps to select the most consistent output; and 2) a post-training method termed Temporal Consistency Reinforcement, which uses Temporal Semantic Entropy (TSE), a measure of semantic stability across intermediate predictions, as a reward signal to encourage stable generations. Empirical results across multiple benchmarks demonstrate the effectiveness of our approach. Using the negative TSE reward alone, we observe a remarkable average improvement of 24.7% on the Countdown dataset over an existing dLLM. Combined with the accuracy reward, we achieve absolute gains of 2.0% on GSM8K, 4.3% on MATH500, 6.6% on SVAMP, and 25.3% on Countdown, respectively. Our findings underscore the untapped potential of temporal dynamics in dLLMs and offer two simple yet effective tools to harness them.
Japanese AI developer Alt goes bust after accounting fraud
Japanese artificial intelligence developer Alt, which revealed accounting irregularities recently, has filed for bankruptcy protection with the Tokyo District Court under the civil rehabilitation law. The court accepted the application, according to the Tokyo-based company's announcement Wednesday. Alt left debts totaling about 2.4 billion ( 16.1 million) and aims for its rehabilitation by finding a sponsor entity that will take over its operations. Alt's line of business includes a service to create meeting summaries using AI. The company went public on the Tokyo Stock Exchange's Growth section for startups in October 2024, 10 years after its establishment in 2014.
Unimodal Strategies in Density-Based Clustering
Nir, Oron, Tenenbaum, Jay, Shamir, Ariel
Density-based clustering methods often surpass centroid-based counterparts, when addressing data with noise or arbitrary data distributions common in real-world problems. In this study, we reveal a key property intrinsic to density-based clustering methods regarding the relation between the number of clusters and the neighborhood radius of core points - we empirically show that it is nearly unimodal, and support this claim theoretically in a specific setting. We leverage this property to devise new strategies for finding appropriate values for the radius more efficiently based on the Ternary Search algorithm. This is especially important for large scale data that is high-dimensional, where parameter tuning is computationally intensive. We validate our methodology through extensive applications across a range of high-dimensional, large-scale NLP, Audio, and Computer Vision tasks, demonstrating its practical effectiveness and robustness. This work not only offers a significant advancement in parameter control for density-based clustering but also broadens the understanding regarding the relations between their guiding parameters. Our code is available at https://github.com/oronnir/UnimodalStrategies.
Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset
Liu, Rui, Gao, Pu, Xi, Jiatian, Sisman, Berrak, Busso, Carlos, Li, Haizhou
Text-based speech editing (TSE) modifies speech using only text, eliminating re-recording. However, existing TSE methods, mainly focus on the content accuracy and acoustic consistency of synthetic speech segments, and often overlook the emotional shifts or inconsistency issues introduced by text changes. To address this issue, we propose EmoCorrector, a novel post-correction scheme for TSE. EmoCorrector leverages Retrieval-Augmented Generation (RAG) by extracting the edited text's emotional features, retrieving speech samples with matching emotions, and synthesizing speech that aligns with the desired emotion while preserving the speaker's identity and quality. To support the training and evaluation of emotional consistency modeling in TSE, we pioneer the benchmarking Emotion Correction Dataset for TSE (ECD-TSE). The prominent aspect of ECD-TSE is its inclusion of $<$text, speech$>$ paired data featuring diverse text variations and a range of emotional expressions. Subjective and objective experiments and comprehensive analysis on ECD-TSE confirm that EmoCorrector significantly enhances the expression of intended emotion while addressing emotion inconsistency limitations in current TSE methods. Code and audio examples are available at https://github.com/AI-S2-Lab/EmoCorrector.
Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning
Zhang, Jinghan, Mo, Fengran, Wang, Xiting, Liu, Kunpeng
Recent advances in large language models (LLMs) have demonstrated their potential in handling complex reasoning tasks, which are usually achieved by constructing a thought chain to guide the model to solve the problem with multi-step thinking. However, existing methods often remain confined to previously explored solution spaces and thus overlook the critical blind spot within LLMs' cognitive range. To address these issues, we design the Thought Space Explorer (TSE), a novel framework to expand and optimize thought structures to guide LLMs to explore their blind spots of thinking. By generating new reasoning steps and branches based on the original thought structure with various designed strategies, TSE broadens the thought space and alleviates the impact of blind spots for LLM reasoning. Experimental results on multiple levels of reasoning tasks demonstrate the efficacy of TSE. We also conduct extensive analysis to understand how structured and expansive thought can contribute to unleashing the potential of LLM reasoning capabilities.
Investigation of Speaker Representation for Target-Speaker Speech Processing
Ashihara, Takanori, Moriya, Takafumi, Horiguchi, Shota, Peng, Junyi, Ochiai, Tsubasa, Delcroix, Marc, Matsuura, Kohei, Sato, Hiroshi
Target-speaker speech processing (TS) tasks, such as target-speaker automatic speech recognition (TS-ASR), target speech extraction (TSE), and personal voice activity detection (p-VAD), are important for extracting information about a desired speaker's speech even when it is corrupted by interfering speakers. While most studies have focused on training schemes or system architectures for each specific task, the auxiliary network for embedding target-speaker cues has not been investigated comprehensively in a unified cross-task evaluation. Therefore, this paper aims to address a fundamental question: what is the preferred speaker embedding for TS tasks? To this end, for the TS-ASR, TSE, and p-VAD tasks, we compare pre-trained speaker encoders (i.e., self-supervised or speaker recognition models) that compute speaker embeddings from pre-recorded enrollment speech of the target speaker with ideal speaker embeddings derived directly from the target speaker's identity in the form of a one-hot vector. To further understand the properties of ideal speaker embedding, we optimize it using a gradient-based approach to improve performance on the TS task. Our analysis reveals that speaker verification performance is somewhat unrelated to TS task performances, the one-hot vector outperforms enrollment-based ones, and the optimal embedding depends on the input mixture.
Can Large Language Models Address Open-Target Stance Detection?
Akash, Abu Ubaida, Fahmy, Ahmed, Trabelsi, Amine
Stance detection (SD) identifies a text's position towards a target, typically labeled as favor, against, or none. We introduce Open-Target Stance Detection (OTSD), the most realistic task where targets are neither seen during training nor provided as input. We evaluate Large Language Models (LLMs) GPT-4o, GPT-3.5, Llama-3, and Mistral, comparing their performance to the only existing work, Target-Stance Extraction (TSE), which benefits from predefined targets. Unlike TSE, OTSD removes the dependency of a predefined list, making target generation and evaluation more challenging. We also provide a metric for evaluating target quality that correlates well with human judgment. Our experiments reveal that LLMs outperform TSE in target generation when the real target is explicitly and not explicitly mentioned in the text. Likewise, for stance detection, LLMs excel in explicit cases with comparable performance in non-explicit in general.