Thus Spake Long-Context Large Language Model
Liu, Xiaoran, Li, Ruixiao, Huang, Mianqiu, Liu, Zhigeng, Song, Yuerong, Guo, Qipeng, He, Siyang, Wang, Qiqi, Li, Linlin, Liu, Qun, Zhou, Yaqian, Huang, Xuanjing, Qiu, Xipeng
Long context is an important topic in Natural Language Processing (NLP), running through the development of NLP architectures, and offers immense opportunities for Large Language Models (LLMs), giving them a lifelong learning potential akin to that of humans. Unfortunately, the pursuit of a long context is accompanied by numerous obstacles. Nevertheless, long context remains a core competitive advantage for LLMs. In the past two years, the context length of LLMs has achieved a breakthrough extension to millions of tokens. Moreover, research on long-context LLMs has expanded from length extrapolation to a comprehensive focus on architecture, infrastructure, training, and evaluation technologies. Inspired by the symphonic poem Thus Spake Zarathustra, we draw an analogy between the journey of extending the context of LLMs and humans' attempts to transcend their mortality. In this survey, we illustrate how LLMs struggle between the tremendous need for a longer context and the equal need to accept that it is ultimately finite. To this end, we give a global picture of the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation, showcasing the full spectrum of long-context technologies. At the end of this survey, we present 10 unanswered questions currently faced by long-context LLMs. We hope this survey can serve as a systematic introduction to research on long-context LLMs.
Adversarial Training for Defense Against Label Poisoning Attacks
Bal, Melis Ilayda, Cevher, Volkan, Muehlebach, Michael
As machine learning models grow in complexity and increasingly rely on publicly sourced data, such as the human-annotated labels used in training large language models, they become more vulnerable to label poisoning attacks. These attacks, in which adversaries subtly alter the labels within a training dataset, can severely degrade model performance, posing significant risks in critical applications. In this paper, we propose FLORAL, a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats. Utilizing a bilevel optimization framework, we cast the training process as a non-zero-sum Stackelberg game between an attacker, who strategically poisons critical training labels, and the model, which seeks to recover from such attacks. Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training. We provide a theoretical analysis of our algorithm's convergence properties and empirically evaluate FLORAL's effectiveness across diverse classification tasks. Compared to robust baselines and foundation models such as RoBERTa, FLORAL consistently achieves higher robust accuracy under increasing attacker budgets. These results underscore the potential of FLORAL to enhance the resilience of machine learning models against label poisoning threats, thereby ensuring robust classification in adversarial settings.
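To make the threat model concrete, here is a minimal sketch of a budgeted label-flipping attack on binary labels in {-1, +1}. The selection rule (flip the labels of the points whose classifier scores are closest to the decision boundary) is a hypothetical heuristic chosen for illustration; the abstract does not specify how FLORAL's attacker picks its "critical" labels. The function name and arguments are likewise assumptions.

```python
from typing import List, Sequence


def flip_low_margin_labels(
    labels: Sequence[int], scores: Sequence[float], budget: int
) -> List[int]:
    """Return a copy of `labels` with up to `budget` labels flipped.

    Assumption: the attacker targets the points whose decision scores
    have the smallest absolute value, i.e. those nearest the boundary.
    This is an illustrative heuristic, not the attack from the paper.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if budget < 0:
        raise ValueError("budget must be non-negative")

    # Rank indices by distance to the decision boundary (ascending).
    order = sorted(range(len(labels)), key=lambda i: abs(scores[i]))

    poisoned = list(labels)
    for i in order[:budget]:
        poisoned[i] = -poisoned[i]  # flip +1 <-> -1
    return poisoned


clean = [1, -1, 1, -1]
scores = [2.0, -0.1, 0.3, -1.5]  # hypothetical classifier outputs f(x)
print(flip_low_margin_labels(clean, scores, budget=2))  # [1, 1, -1, -1]
```

In an adversarial training loop, the defender would retrain on the poisoned labels and the attacker would then re-select its targets against the updated model; the sketch covers only the attacker's single step.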
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
Fayad, Ibrahim, Zimmer, Max, Schwartz, Martin, Ciais, Philippe, Gieseke, Fabian, Belouze, Gabriel, Brood, Sarah, De Truchis, Aurelien, d'Aspremont, Alexandre
Significant efforts have been directed towards adapting self-supervised multimodal learning for Earth observation applications. However, existing methods produce coarse patch-sized embeddings, limiting their effectiveness and integration with other modalities like LiDAR. To close this gap, we present DUNIA, an approach to learn pixel-sized embeddings through cross-modal alignment between images and full-waveform LiDAR data. As the model is trained in a contrastive manner, the embeddings can be directly leveraged in the context of a variety of environmental monitoring tasks in a zero-shot setting. In our experiments, we demonstrate the effectiveness of the embeddings for seven such tasks (canopy height mapping, fractional canopy cover, land cover mapping, tree species identification, plant area index, crop type classification, and per-pixel waveform-based vertical structure mapping). The results show that the embeddings, along with zero-shot classifiers, often outperform specialized supervised models, even in low data regimes. In the fine-tuning setting, we show strong low-shot capabilities with performances near or better than state-of-the-art on five out of six tasks.
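As an illustration of what training "in a contrastive manner" can look like, the sketch below computes an InfoNCE-style loss between paired pixel and waveform embeddings. The abstract does not name DUNIA's loss, so treating it as InfoNCE is an assumption, as are the function name, the temperature value, and the use of cosine similarity.

```python
import math
from typing import List, Sequence


def info_nce(
    pixel_emb: Sequence[Sequence[float]],
    waveform_emb: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Mean InfoNCE loss where pixel_emb[i] and waveform_emb[i] are positives.

    Assumes all embeddings are non-zero vectors of equal dimension.
    """

    def normalize(v: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    p = [normalize(v) for v in pixel_emb]
    w = [normalize(v) for v in waveform_emb]

    total = 0.0
    for i, p_i in enumerate(p):
        # Cosine similarity of pixel i to every waveform, scaled by temperature.
        logits = [sum(a * b for a, b in zip(p_i, w_j)) / temperature for w_j in w]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive pair
    return total / len(p)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)  # True
```

The loss is small when each pixel embedding is most similar to its own waveform embedding and large when it is more similar to another pixel's waveform, which is the property that pulls the two modalities into a shared space.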
NUTSHELL: A Dataset for Abstract Generation from Scientific Talks
Züfle, Maike, Papi, Sara, Savoldi, Beatrice, Gaido, Marco, Bentivogli, Luisa, Niehues, Jan
Scientific communication is receiving increasing attention in natural language processing, especially to help researchers access, summarize, and generate content. One emerging application in this area is Speech-to-Abstract Generation (SAG), which aims to automatically generate abstracts from recorded scientific presentations. SAG enables researchers to efficiently engage with conference talks, but progress has been limited by a lack of large-scale datasets. To address this gap, we introduce NUTSHELL, a novel multimodal dataset of *ACL conference talks paired with their corresponding abstracts. We establish strong baselines for SAG and evaluate the quality of generated abstracts using both automatic metrics and human judgments. Our results highlight the challenges of SAG and demonstrate the benefits of training on NUTSHELL. By releasing NUTSHELL under an open license (CC-BY 4.0), we aim to advance research in SAG and foster the development of improved models and evaluation methods.
HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings
Aavang, Rasmus, Rizzi, Giovanni, Bøggild, Rasmus, Iolov, Alexandre, Zhang, Mike, Bjerva, Johannes
The U.S. Securities and Exchange Commission (SEC) requires public companies to file financial reports in which numbers are tagged using the machine-readable inline eXtensible Business Reporting Language (iXBRL) standard. However, the highly complex and highly granular taxonomy defined by iXBRL limits label transferability across domains. In this paper, we introduce the Hierarchical Financial Key Performance Indicator (HiFi-KPI) dataset, designed to facilitate numerical KPI extraction at specified levels of granularity from unstructured financial text. Our approach organizes a 218,126-label hierarchy using a taxonomy-based grouping method, investigating which taxonomy layer provides the most meaningful structure. HiFi-KPI comprises ~1.8M paragraphs and ~5M entities, each linked to a label in the iXBRL-specific calculation and presentation taxonomies. We provide baselines using encoder-based approaches and structured extraction using Large Language Models (LLMs). To simplify LLM inference and evaluation, we additionally release HiFi-KPI Lite, a manually curated subset with four expert-mapped labels. We publicly release all artifacts.
Joint Value Estimation and Bidding in Repeated First-Price Auctions
Wen, Yuxiao, Han, Yanjun, Zhou, Zhengyuan
We study regret minimization in repeated first-price auctions (FPAs), where a bidder observes only the realized outcome after each auction -- win or loss. This setup reflects practical scenarios in online display advertising where the actual value of an impression depends on the difference between two potential outcomes, such as clicks or conversion rates, when the auction is won versus lost. We analyze three outcome models: (1) adversarial outcomes without features, (2) linear potential outcomes with features, and (3) linear treatment effects in features. For each setting, we propose algorithms that jointly estimate private values and optimize bidding strategies, achieving near-optimal regret bounds. Notably, our framework enjoys a unique feature that the treatments are also actively chosen, and hence eliminates the need for the overlap condition commonly required in causal inference.
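The payoff structure of a first-price auction, and one common way to measure regret, can be sketched as follows. The winner pays its own bid, so the per-round utility is the value minus the bid on a win and zero on a loss. The benchmark used here (the best fixed bid in hindsight over a finite grid), the tie-breaking rule, and all function names are assumptions for illustration; the abstract does not state the exact benchmark, and in the paper's setting the value is not known to the bidder but must be estimated from observed outcomes.

```python
from typing import Sequence


def fpa_utility(value: float, bid: float, competing_bid: float) -> float:
    """Utility in one first-price auction: the winner pays its own bid.

    Assumption: ties are broken in the bidder's favour.
    """
    return value - bid if bid >= competing_bid else 0.0


def regret_vs_best_fixed_bid(
    values: Sequence[float],
    bids: Sequence[float],
    competing_bids: Sequence[float],
    grid: Sequence[float],
) -> float:
    """Regret of `bids` against the best single bid on `grid` in hindsight."""
    earned = sum(
        fpa_utility(v, b, m) for v, b, m in zip(values, bids, competing_bids)
    )
    best_fixed = max(
        sum(fpa_utility(v, g, m) for v, m in zip(values, competing_bids))
        for g in grid
    )
    return best_fixed - earned


values = [1.0, 1.0, 1.0]          # hypothetical per-round values
competing = [0.2, 0.5, 0.8]       # highest competing bid in each round
grid = [0.0, 0.2, 0.5, 0.8]

# Overbidding at 0.8 wins every round but leaves little surplus.
print(regret_vs_best_fixed_bid(values, [0.8] * 3, competing, grid))
# Bidding 0.5 matches the best fixed bid on this grid.
print(regret_vs_best_fixed_bid(values, [0.5] * 3, competing, grid))  # 0.0
```

The example shows the basic tension: a higher bid wins more often but earns less per win, which is why the bid has to be tuned to both the value estimate and the competition.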
Achieving Fair PCA Using Joint Eigenvalue Decomposition
Rathore, Vidhi, Manwani, Naresh
Principal Component Analysis (PCA) is a widely used method for dimensionality reduction, but it often overlooks fairness, especially when working with data that includes demographic characteristics. This can lead to biased representations that disproportionately affect certain groups. To address this issue, our approach incorporates Joint Eigenvalue Decomposition (JEVD), a technique that enables the simultaneous diagonalization of multiple matrices, ensuring both fair and efficient representations. We formally show that the optimal solution of JEVD leads to a fair PCA solution. By integrating JEVD with PCA, we strike an optimal balance between preserving data structure and promoting fairness across diverse groups. We demonstrate that our method outperforms existing baseline approaches in fairness and representational quality on various datasets. It retains the core advantages of PCA while ensuring that sensitive demographic attributes do not create disparities in the reduced representation.
Trump right to engage Putin on peace talks, says minister
US President Donald Trump was right to re-establish links with Russian leader Vladimir Putin to set up peace talks to end the war in Ukraine, a senior Labour minister has said. Education Secretary Bridget Phillipson said there could be "no negotiated peace without Russia" and that Trump's approach had brought "Russians to the table". The US president has faced a backlash for excluding Ukraine from talks after his aides met Russian officials in Saudi Arabia this week. Trump has also suggested Ukraine may be a bystander, saying it has "no cards" in the deal. Prime Minister Sir Keir Starmer will meet Trump in Washington this week and press for Ukraine to be "at the heart" of any peace talks.
Facilitating Emergency Vehicle Passage in Congested Urban Areas Using Multi-agent Deep Reinforcement Learning
Emergency Response Time (ERT) is crucial for urban safety, measuring cities' ability to handle medical, fire, and crime emergencies. In NYC, medical ERT increased 72% from 7.89 minutes in 2014 to 14.27 minutes in 2024, with half of delays due to Emergency Vehicle (EMV) travel times. Each minute's delay in stroke response costs 2 million brain cells, while cardiac arrest survival drops 7-10% per minute. This dissertation advances EMV facilitation through three contributions. First, EMVLight, a decentralized multi-agent reinforcement learning framework, integrates EMV routing with traffic signal pre-emption. It achieved 42.6% faster EMV travel times and 23.5% improvement for other vehicles. Second, the Dynamic Queue-Jump Lane system uses Multi-Agent Proximal Policy Optimization for coordinated lane-clearing in mixed autonomous and human-driven traffic, reducing EMV travel times by 40%. Third, an equity study of NYC Emergency Medical Services revealed disparities across boroughs: Staten Island faces delays due to sparse signalized intersections, while Manhattan struggles with congestion. Solutions include optimized EMS stations and improved intersection designs. These contributions enhance EMV mobility and emergency service equity, offering insights for policymakers and urban planners to develop safer, more efficient transportation systems.
Geometric Properties and Graph-Based Optimization of Neural Networks: Addressing Non-Linearity, Dimensionality, and Scalability
Wienczkowski, Michael, Desta, Addisu, Ugochukwu, Paschal
Chronological Overview Table of Key Advancements in Graph-Based Neural Networks
V. PROBLEM STATEMENT
The key issue addressed in this research is the limited understanding of the geometric properties of neural networks, which affects both their interpretability and efficiency. The complexity of the network's geometry influences its learning process, impacting both optimization and generalization. This problem is significant because better geometric interpretations of neural networks can lead to improvements in various tasks, such as classification, optimization, and shape representation. A central challenge is the lack of understanding of the structure of data manifolds that influence how neural networks perform complex tasks. The geometric structures governing neural networks include the relationships between network layers, activation functions, and data manifolds, which directly impact performance in tasks like classification and optimization. The association between neural networks and geometric structures remains under-explored, and improving this understanding could result in more effective algorithms for managing complex data and optimizing performance. Additionally, the graph structure of neural networks plays a crucial role in their predictive performance, yet there is limited knowledge of how this structure influences accuracy. Optimizing the graph structure of neural networks could enhance their efficiency and generalizability across different datasets, which is also important for future hardware advancements. Ultimately, improving the geometric and structural comprehension of neural networks can lead to more robust and versatile models capable of performing across diverse tasks and platforms.