dev
Prompt Optimization as a State-Space Search Problem
Language Models are extremely susceptible to performance collapse with even small changes to input prompt strings. Libraries such as DSpy (from Stanford NLP) avoid this problem through demonstration-based prompt optimisation. Inspired by this, I propose an alternative approach that treats prompt optimisation as a classical state-space search problem. I model the prompt space as a graph where nodes represent prompt states and edges correspond to deliberate transformations such as shortening, adding examples, or re- ordering content. Using beam search and random walk algorithms, I systematically explore this space, evaluating candidates on development sets and pruning unpromising branches. Across five NLP tasks (sentiment classification, question answering, summarisation, reason- ing, and natural language inference), I find that even shallow search configurations (beam width=2, depth=2) improve upon seed prompts on development sets. For instance, beam search achieves development accuracy gains from 0.40 to 0.80 on reasoning tasks, though test set improvements are more modest (0.20 to 0.50), indicating overfitting to the develop- ment heuristic. Analysis of successful optimisation paths reveals that transformations that make prompts concise appear most frequently, while verbosity operators are never selected. My results validate prompt optimization as a search problem and suggest that with greater computational resources and improved evaluation metrics, deeper exploration could yield more robust prompts that generalize beyond development sets. Code and implementation are available at [https://github.com/MaanasTaneja/PromptOptimiser].
- North America > United States > California (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Austria (0.04)
- Asia > China > Beijing > Beijing (0.04)
Steiner Traveling Salesman Problem with Quantum Annealing
Ciacco, Alessia, Guerriero, Francesca, Osaba, Eneko
The Steiner Traveling Salesman Problem (STSP) is a variant of the classical Traveling Salesman Problem. The STSP involves incorporating steiner nodes, which are extra nodes not originally part of the required visit set but that can be added to the route to enhance the overall solution and minimize the total travel cost. Given the NP-hard nature of the STSP, we propose a quantum approach to address it. Specifically, we employ quantum annealing using D-Wave's hardware to explore its potential for solving this problem. To enhance computational feasibility, we develop a preprocessing method that effectively reduces the network size. Our experimental results demonstrate that this reduction technique significantly decreases the problem complexity, making the Quadratic Unconstrained Binary Optimization formulation, the standard input for quantum annealers, better suited for existing quantum hardware. Furthermore, the results highlight the potential of quantum annealing as a promising and innovative approach for solving the STSP.
- Europe > Spain > Andalusia > Málaga Province > Málaga (0.05)
- Europe > Italy > Calabria (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Dissecting Transformers: A CLEAR Perspective towards Green AI
Jain, Hemang, Goyal, Shailender, Pandey, Divyansh, Vaidhyanathan, Karthik
The rapid adoption of Large Language Models (LLMs) has raised significant environmental concerns. Unlike the one-time cost of training, LLM inference occurs continuously at a global scale and now dominates the AI energy footprint. Yet, most sustainability studies report only coarse, model-level metrics due to the lack of fine-grained measurement methods, treating energy efficiency more as an afterthought than as a primary objective. We present the first fine-grained empirical analysis of inference energy across core components of transformer architecture. We propose a novel methodology, Component-Level Energy Assessment via Repeated sampling (CLEAR), to overcome temporal mismatch between microsecond scale component execution and monitoring of millisecond (ms) scale energy sensors. Using CLEAR, we evaluate 15 models spanning four distinct architecture types and consistently keep component-wise energy variance below 9.5\% while capturing more than 90\% of the model's total energy as individual components. Our empirical analysis reveals that Attention blocks consume significantly more energy per floating-point operation (FLOP), indicating that energy consumption is not proportionally aligned with FLOP counts. This shows that FLOPs alone fail to capture the true energy cost at a component level. Our findings establish detailed component-level energy baselines and provide insight as an initial step to build energy-efficient transformer models through component-level optimizations.
- North America > United States (0.28)
- Asia > India > Telangana > Hyderabad (0.04)
Five years later, has sci-fi cult hit Devs aged well?
March 2020 was an inauspicious time, I think we can agree. This may be why Devs, an eight-part sci-fi series by Alex Garland that debuted as the world went into lockdown, didn't attract as large an audience as it could have – we certainly had other things to worry about. I was, I confess, one of the many people who missed it. There are lots of reasons why I have recently rectified that: Garland was on my mind after watching 28 Years Later, for which he wrote the screenplay, and the cold, dark world of Devs was also the perfect antidote to the heatwave this column was written under. But the main reason is that five strange years have passed since the show aired, and I was intrigued to see how it looked, at half a decade's remove.
- Media > Film (0.36)
- Leisure & Entertainment (0.36)
DESign: Dynamic Context-Aware Convolution and Efficient Subnet Regularization for Continuous Sign Language Recognition
Liu, Sheng, Yu, Yiheng, Feng, Yuan, Xu, Min, Jin, Zhelun, Jiang, Yining, Yuan, Tiantian
Although dynamic convolutions are ideal for this task, they mainly focus on spatial modeling and fail to capture the temporal dynamics and contextual dependencies. T o address this, we propose DESign, a novel framework that incorporates Dynamic Context-A ware Convolution (DCAC) and Subnet Regularization Connectionist T emporal Classification (SR-CTC). DCAC dynamically captures the inter-frame motion cues that constitute signs and uniquely adapts convolutional weights in a fine-grained manner based on contextual information, enabling the model to better generalize across diverse signing behaviors and boost recognition accuracy. Furthermore, we observe that existing methods still rely on only a limited number of frames for parameter updates during training, indicating that CTC learning overfits to a dominant path. T o address this, SR-CTC regularizes training by applying supervision to subnetworks, encouraging the model to explore diverse CTC alignment paths and effectively preventing overfitting. A classifier-sharing strategy in SR-CTC further strengthens multi-scale consistency. Notably, SR-CTC introduces no inference overhead and can be seamlessly integrated into existing CSLR models to boost performance. Extensive ablations and visualizations further validate the effectiveness of the proposed methods. Results on mainstream CSLR datasets (i.e., PHOENIX14, PHOENIX14-T, CSL-Daily) demonstrate that DESign achieves state-of-the-art performance.
- Europe > United Kingdom > England (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Europe > France (0.04)
Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025
Macháček, Dominik, Polák, Peter
This paper describes Charles University submission to the Simultaneous Speech Translation Task of the IWSLT 2025. We cover all four language pairs with a direct or cascade approach. The backbone of our systems is the offline Whisper speech model, which we use for both translation and transcription in simultaneous mode with the state-of-the-art simultaneous policy AlignAtt. We further improve the performance by prompting to inject in-domain terminology, and we accommodate context. Our cascaded systems further use EuroLLM for unbounded simultaneous translation. Compared to the Organizers' baseline, our systems improve by 2 BLEU points on Czech to English and 13-22 BLEU points on English to German, Chinese and Japanese on the development sets. Additionally, we also propose a new enhanced measure of speech recognition latency.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Czechia (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (3 more...)
RETUYT-INCO at BEA 2025 Shared Task: How Far Can Lightweight Models Go in AI-powered Tutor Evaluation?
Góngora, Santiago, Sastre, Ignacio, Robaina, Santiago, Remersaro, Ignacio, Chiruzzo, Luis, Rosá, Aiala
In this paper, we present the RETUYT-INCO participation at the BEA 2025 shared task. Our participation was characterized by the decision of using relatively small models, with fewer than 1B parameters. This self-imposed restriction tries to represent the conditions in which many research labs or institutions are in the Global South, where computational power is not easily accessible due to its prohibitive cost. Even under this restrictive self-imposed setting, our models managed to stay competitive with the rest of teams that participated in the shared task. According to the $exact\ F_1$ scores published by the organizers, the performance gaps between our models and the winners were as follows: $6.46$ in Track 1; $10.24$ in Track 2; $7.85$ in Track 3; $9.56$ in Track 4; and $13.13$ in Track 5. Considering that the minimum difference with a winner team is $6.46$ points -- and the maximum difference is $13.13$ -- according to the $exact\ F_1$ score, we find that models with a size smaller than 1B parameters are competitive for these tasks, all of which can be run on computers with a low-budget GPU or even without a GPU.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (4 more...)
An approach to melodic segmentation and classification based on filtering with the Haar-wavelet
Velarde, Gissel, Weyde, Tillman, Meredith, David
We present a novel method of classification and segmentation of melodies in symbolic representation. The method is based on filtering pitch as a signal over time with the Haar-wavelet, and we evaluate it on two tasks. The filtered signal corresponds to a single-scale signal ws from the continuous Haar wavelet transform. The melodies are first segmented using local maxima or zero-crossings of w_s. The segments of w_s are then classified using the k-nearest neighbour algorithm with Euclidian and city-block distances. The method proves more effective than using unfiltered pitch signals and Gestalt-based segmentation when used to recognize the parent works of segments from Bach's Two-Part Inventions (BWV 772-786). When used to classify 360 Dutch folk tunes into 26 tune families, the performance of the method is comparable to the use of pitch signals, but not as good as that of string-matching methods based on multiple features.
- Europe > Denmark > North Jutland > Aalborg (0.40)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (14 more...)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)