flicker


Why you 'see' things in the dark, according to an ophthalmologist

Popular Science

Science explains why we see flickers of light and patterns in the darkness. Our eyes sometimes really do play tricks on us at night. In 1999, Daniel Myrick and Eduardo Sánchez shot one of the definitive horror films of the era, The Blair Witch Project, on a budget of roughly $60,000. The film is a study in omission, in the conspicuous absence of the visual effects characteristic of the genre. In lieu of baroque prosthetic gore and over-the-top CGI effects, the movie leans into silence and darkness for much of its 81-minute run time.


Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents

Xie, Wenda, Guo, Chao, Wang, Yanqing, Jing, Junle, Lv, Yisheng, Wang, Fei-Yue

arXiv.org Artificial Intelligence

Although LLMs have been widely adopted for creative content generation, a single-pass process often struggles to produce high-quality long narratives. How to effectively revise and improve long narrative scripts like scriptwriters do remains a significant challenge, as it demands a comprehensive understanding of the entire context to identify global structural issues and local detailed flaws, as well as coordinating revisions at multiple granularities and locations. Direct modifications by LLMs typically introduce inconsistencies between local edits and the overall narrative requirements. To address these issues, we propose Dramaturge, a task- and feature-oriented divide-and-conquer approach powered by hierarchical multiple LLM agents. It consists of a Global Review stage to grasp the overall storyline and structural issues, a Scene-level Review stage to pinpoint detailed scene and sentence flaws, and a Hierarchical Coordinated Revision stage that coordinates and integrates structural and detailed improvements throughout the script. The top-down task flow ensures that high-level strategies guide local modifications, maintaining contextual consistency. The review and revision workflow follows a coarse-to-fine iterative process, continuing through multiple rounds until no further substantive improvements can be made. Comprehensive experiments show that Dramaturge significantly outperforms all baselines in terms of script-level overall quality and scene-level details. Our approach is plug-and-play and can be easily integrated into existing methods to improve the generated scripts.
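The iterative workflow the abstract describes can be summarized as a simple control loop. This is a minimal sketch of the top-down, coarse-to-fine structure only; the three agent functions are hypothetical stand-ins, not the paper's implementation.

```python
# Sketch of a coarse-to-fine review/revise loop in the spirit of Dramaturge.
# global_review, scene_review, and revise are assumed callables (stand-ins
# for the paper's LLM agent stages), not real APIs.

def refine_script(script, global_review, scene_review, revise, max_rounds=5):
    for _ in range(max_rounds):
        global_notes = global_review(script)   # storyline / structural issues
        scene_notes = scene_review(script)     # scene- and sentence-level flaws
        # high-level notes guide the local edits (top-down coordination)
        revised = revise(script, global_notes, scene_notes)
        if revised == script:                  # no substantive improvement left
            break
        script = revised
    return script
```

The convergence check (stop when a round changes nothing) mirrors the paper's "continue until no further substantive improvements can be made."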



Continuous Rating as Reliable Human Evaluation of Simultaneous Speech Translation

Javorský, Dávid, Macháček, Dominik, Bojar, Ondřej

arXiv.org Artificial Intelligence

Simultaneous speech translation (SST) can be evaluated on simulated online events where human evaluators watch subtitled videos and continuously express their satisfaction by pressing buttons (so-called Continuous Rating). Continuous Rating is easy to collect, but little is known about its reliability, or its relation to SST users' comprehension of a foreign-language document. In this paper, we contrast Continuous Rating with factual questionnaires on judges with different levels of source language knowledge. Our results show that Continuous Rating is an easy and reliable SST quality assessment if the judges have at least limited knowledge of the source language. Our study indicates users' preferences on subtitle layout and presentation style and, most importantly, provides significant evidence that users with advanced source language knowledge prefer low latency over fewer re-translations.



The First Comprehensive Dataset with Multiple Distortion Types for Visual Just-Noticeable Differences

Liu, Yaxuan, Jin, Jian, Xue, Yuan, Lin, Weisi

arXiv.org Artificial Intelligence

Recently, with the development of deep learning, a number of Just Noticeable Difference (JND) datasets have been built for JND modeling. However, all the existing JND datasets only label the JND points based on the level of compression distortion. Hence, JND models learned from such datasets can only be used for image/video compression. As is known, JND is a major characteristic of the human visual system (HVS), which reflects the maximum visual distortion that the HVS can tolerate. Hence, generalized JND modeling should take more kinds of distortion types into account. To benefit JND modeling, this work establishes a generalized JND dataset with a coarse-to-fine JND selection, which contains 106 source images and 1,642 JND maps, covering 25 distortion types. To this end, we propose a coarse JND candidate selection scheme to select the distorted images from the existing Image Quality Assessment (IQA) datasets as JND candidates instead of generating JND maps ourselves. Then, a fine JND selection is carried out on the JND candidates with a crowdsourced subjective assessment.


Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation

Amrhein, Chantal, Haddow, Barry

arXiv.org Artificial Intelligence

For real-life applications, it is crucial that end-to-end spoken language translation models perform well on continuous audio, without relying on human-supplied segmentation. For online spoken language translation, where models need to start translating before the full utterance is spoken, most previous work has ignored the segmentation problem. In this paper, we compare various methods for improving models' robustness towards segmentation errors and different segmentation strategies in both offline and online settings and report results on translation quality, flicker and delay. Our findings on five different language pairs show that a simple fixed-window audio segmentation can perform surprisingly well given the right conditions.
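Fixed-window segmentation as described here simply cuts the continuous audio into equal-length chunks, often with a small overlap so words cut at a boundary appear in two windows. The sketch below illustrates the idea on sample indices; the window and overlap sizes are assumptions for illustration, not the paper's settings.

```python
# Illustrative fixed-window audio segmentation (not the paper's code).
# Returns (start, end) sample ranges covering the whole signal, with
# consecutive windows overlapping by overlap_s seconds.

def fixed_window_segments(num_samples, sample_rate, window_s=10.0, overlap_s=1.0):
    """Return (start, end) sample indices for fixed-size, overlapping windows."""
    window = int(window_s * sample_rate)
    step = int((window_s - overlap_s) * sample_rate)
    segments = []
    start = 0
    while start < num_samples:
        segments.append((start, min(start + window, num_samples)))
        if start + window >= num_samples:
            break  # last window reached the end of the signal
        start += step
    return segments

# e.g. 35 s of 16 kHz audio, 10 s windows, 1 s overlap -> 4 windows
print(fixed_window_segments(35 * 16000, 16000))
```

The appeal of this strategy is that it needs no voice-activity detection or learned segmenter, which is what makes the paper's finding (that it can perform surprisingly well) notable.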


Simultaneous Translation for Unsegmented Input: A Sliding Window Approach

Sen, Sukanta, Bojar, Ondřej, Haddow, Barry

arXiv.org Artificial Intelligence

In the cascaded approach to spoken language translation (SLT), the ASR output is typically punctuated and segmented into sentences before being passed to MT, since the latter is typically trained on written text. However, erroneous segmentation, due to poor sentence-final punctuation by the ASR system, leads to degradation in translation quality, especially in the simultaneous (online) setting where the input is continuously updated. To reduce the influence of automatic segmentation, we present a sliding window approach to translate raw ASR outputs (online or offline) without needing to rely on an automatic segmenter. We train translation models using parallel windows (instead of parallel sentences) extracted from the original training data. At test time, we translate at the window level and join the translated windows using a simple approach to generate the final translation. Experiments on English-to-German and English-to-Czech show that our approach improves by 1.3–2.0 BLEU points over the usual ASR-segmenter pipeline, and the fixed-length window considerably reduces flicker compared to a baseline retranslation-based online SLT system.
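One simple way to join translations of overlapping windows, sketched below, is to merge each new window's translation onto the running output at the longest token overlap. This is an assumed illustration of the general idea, not the paper's joining method.

```python
# Hedged sketch: merge translated windows by finding the longest token
# sequence that ends the running output and begins the next translation.

def join_windows(windows):
    out = []
    for w in windows:
        tokens = w.split()
        best = 0
        # try the longest possible overlap first, shrink until it matches
        for k in range(min(len(out), len(tokens)), 0, -1):
            if out[-k:] == tokens[:k]:
                best = k
                break
        out.extend(tokens[best:])  # append only the non-overlapping tail
    return " ".join(out)

print(join_windows(["the cat sat on", "sat on the mat today"]))
# → "the cat sat on the mat today"
```

Because each window's translation is final once joined, the output never retracts previously shown tokens, which is why a windowed approach reduces flicker relative to retranslation.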


MeetDot: Videoconferencing with Live Translation Captions

Arkhangorodsky, Arkady, Chu, Christopher, Fang, Scot, Huang, Yiqi, Jiang, Denglin, Nagesh, Ajay, Zhang, Boliang, Knight, Kevin

arXiv.org Artificial Intelligence

We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, resulting in caption flicker. Additionally, our system has very strict latency requirements to achieve acceptable call quality. We implement several features to enhance the user experience and reduce cognitive load, such as smooth scrolling captions and reduced caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic evaluation metrics such as accuracy, latency and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic evaluation metric to measure end-to-end system performance. We plan to make our system open-source for research purposes.
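The "erasure" metric mentioned above quantifies flicker under re-translation: when a caption update rewrites what was already on screen, the rewritten tokens count as erased. Below is a hedged sketch of one such metric, inspired by the re-translation literature; MeetDot's exact definition may differ.

```python
# Assumed erasure-style flicker metric: for each caption update, count the
# trailing tokens of the previous output that did not survive as a prefix
# of the new output.

def erasure(prev, new):
    p, n = prev.split(), new.split()
    keep = 0
    for a, b in zip(p, n):
        if a != b:
            break
        keep += 1
    return len(p) - keep  # tokens the viewer saw disappear

# "sits" is later revised to "sat ...": one token erased in total
updates = ["the", "the cat", "the cat sits", "the cat sat on the mat"]
print(sum(erasure(a, b) for a, b in zip(updates, updates[1:])))
# → 1
```

A purely append-only caption stream scores zero erasure, so minimizing this metric directly targets the flicker that re-translation introduces.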


"Unconditional Belief in Heat," by Anna Journey

The New Yorker

I would've stabbed the man's hand had he not jerked it away--this is what I usually say toward the end of the story I've told for almost twenty years: I'm a junior in college, towelling my wet hair as I walk from my bathroom through the hall, headed to my bedroom, at two in the morning. I see you, motherfucker, and the hand jerks back. When I call 911 and reach, incredibly, a busy signal, I phone Ed instead, who will drive over, remove his old A.C. unit, and take it to his new place. I would've stabbed the hand that tried to steal my A.C.