AITopics

Long context is an important topic in Natural Language Processing (NLP), running through the development of NLP architectures, and offers immense opportunities for Large Language Models (LLMs), giving them a lifelong learning potential akin to that of humans. Unfortunately, the pursuit of a long context is accompanied by numerous obstacles. Nevertheless, long context remains a core competitive advantage for LLMs. In the past two years, the context length of LLMs has achieved a breakthrough extension to millions of tokens. Moreover, the research on long-context LLMs has expanded from length extrapolation to a comprehensive focus on architecture, infrastructure, training, and evaluation technologies. Inspired by the symphonic poem Thus Spake Zarathustra, we draw an analogy between the journey of extending the context of LLMs and the attempts of humans to transcend their mortality. In this survey, we illustrate how LLMs struggle between the tremendous need for a longer context and the equal need to accept that it is ultimately finite. To achieve this, we give a global picture of the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation, showcasing the full spectrum of long-context technologies. At the end of this survey, we present 10 unanswered questions currently faced by long-context LLMs. We hope this survey can serve as a systematic introduction to the research on long-context LLMs.

downstream performance, extending context window, language model inference, (17 more...)

2502.17129

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(5 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)
Research Report > New Finding (0.45)

Industry:

Energy (0.92)
Education > Educational Setting > Continuing Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bal, Melis Ilayda, Cevher, Volkan, Muehlebach, Michael

Adversarial Training for Defense Against Label Poisoning Attacks

As machine learning models grow in complexity and increasingly rely on publicly sourced data, such as the human-annotated labels used in training large language models, they become more vulnerable to label poisoning attacks. These attacks, in which adversaries subtly alter the labels within a training dataset, can severely degrade model performance, posing significant risks in critical applications. In this paper, we propose FLORAL, a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats. Utilizing a bilevel optimization framework, we cast the training process as a non-zero-sum Stackelberg game between an attacker, who strategically poisons critical training labels, and the model, which seeks to recover from such attacks. Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training. We provide a theoretical analysis of our algorithm's convergence properties and empirically evaluate FLORAL's effectiveness across diverse classification tasks. Compared to robust baselines and foundation models such as RoBERTa, FLORAL consistently achieves higher robust accuracy under increasing attacker budgets. These results underscore the potential of FLORAL to enhance the resilience of machine learning models against label poisoning threats, thereby ensuring robust classification in adversarial settings.

accuracy, adv, dataset, (15 more...)

2502.17121

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications

Fayad, Ibrahim, Zimmer, Max, Schwartz, Martin, Ciais, Philippe, Gieseke, Fabian, Belouze, Gabriel, Brood, Sarah, De Truchis, Aurelien, d'Aspremont, Alexandre

Significant efforts have been directed towards adapting self-supervised multimodal learning for Earth observation applications. However, existing methods produce coarse patch-sized embeddings, limiting their effectiveness and integration with other modalities like LiDAR. To close this gap, we present DUNIA, an approach to learn pixel-sized embeddings through cross-modal alignment between images and full-waveform LiDAR data. As the model is trained in a contrastive manner, the embeddings can be directly leveraged in the context of a variety of environmental monitoring tasks in a zero-shot setting. In our experiments, we demonstrate the effectiveness of the embeddings for seven such tasks (canopy height mapping, fractional canopy cover, land cover mapping, tree species identification, plant area index, crop type classification, and per-pixel waveform-based vertical structure mapping). The results show that the embeddings, along with zero-shot classifiers, often outperform specialized supervised models, even in low data regimes. In the fine-tuning setting, we show strong low-shot capabilities with performances near or better than state-of-the-art on five out of six tasks.

cross-modal alignment, pixel-sized embedding, waveform, (11 more...)

2502.17066

Country:

Europe > France (0.05)
South America > Brazil > Mato Grosso (0.04)
Europe > Portugal > Braga > Braga (0.04)
(2 more...)

Genre: Research Report > New Finding (0.86)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

NUTSHELL: A Dataset for Abstract Generation from Scientific Talks

Züfle, Maike, Papi, Sara, Savoldi, Beatrice, Gaido, Marco, Bentivogli, Luisa, Niehues, Jan

Scientific communication is receiving increasing attention in natural language processing, especially to help researchers access, summarize, and generate content. One emerging application in this area is Speech-to-Abstract Generation (SAG), which aims to automatically generate abstracts from recorded scientific presentations. SAG enables researchers to efficiently engage with conference talks, but progress has been limited by a lack of large-scale datasets. To address this gap, we introduce NUTSHELL, a novel multimodal dataset of *ACL conference talks paired with their corresponding abstracts. We establish strong baselines for SAG and evaluate the quality of generated abstracts using both automatic metrics and human judgments. Our results highlight the challenges of SAG and demonstrate the benefits of training on NUTSHELL. By releasing NUTSHELL under an open license (CC-BY 4.0), we aim to advance research in SAG and foster the development of improved models and evaluation methods.

computational linguistic, dataset, evaluation, (16 more...)

2502.16942

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(11 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings

Aavang, Rasmus, Rizzi, Giovanni, Bøggild, Rasmus, Iolov, Alexandre, Zhang, Mike, Bjerva, Johannes

The U.S. Securities and Exchange Commission (SEC) requires that public companies file financial reports tagging numbers with the machine-readable inline eXtensible Business Reporting Language (iXBRL) standard. However, the highly complex and highly granular taxonomy defined by iXBRL limits label transferability across domains. In this paper, we introduce the Hierarchical Financial Key Performance Indicator (HiFi-KPI) dataset, designed to facilitate numerical KPI extraction at specified levels of granularity from unstructured financial text. Our approach organizes a 218,126-label hierarchy using a taxonomy-based grouping method, investigating which taxonomy layer provides the most meaningful structure. HiFi-KPI comprises ~1.8M paragraphs and ~5M entities, each linked to a label in the iXBRL-specific calculation and presentation taxonomies. We provide baselines using encoder-based approaches and structured extraction using Large Language Models (LLMs). To simplify LLM inference and evaluation, we additionally release HiFi-KPI Lite, a manually curated subset with four expert-mapped labels. We publicly release all artifacts.
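The idea of extracting KPIs "at specified levels of granularity" can be illustrated with a minimal sketch. It assumes each fine-grained label is stored as a path from the top of the hierarchy down to the label itself. That representation, the label names, and the helper functions are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def ancestor_at_level(label_path, level):
    """Return the ancestor of a label at the given depth (1 = top of the hierarchy).

    Assumes `label_path` is a non-empty tuple. Labels shallower than `level`
    map to themselves.
    """
    return label_path[min(level, len(label_path)) - 1]

def regroup(tagged_entities, level):
    """Count entities per coarse label after cutting the hierarchy at `level`."""
    return Counter(ancestor_at_level(path, level) for _, path in tagged_entities)

# Hypothetical tagged entities: (text span, path from root to fine-grained label).
entities = [
    ("$4.2 billion", ("Revenues", "ProductRevenue", "HardwareRevenue")),
    ("$1.1 billion", ("Revenues", "ServiceRevenue")),
    ("$310 million", ("Expenses", "OperatingExpenses", "ResearchAndDevelopment")),
]

coarse = regroup(entities, level=1)   # two "Revenues" entities, one "Expenses"
middle = regroup(entities, level=2)   # one entity per second-level label
```

Cutting at a shallower level yields fewer, broader classes, which is the trade-off the abstract describes when it asks which taxonomy layer gives the most meaningful structure.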

dataset, taxonomy, zhang, (16 more...)

2502.15411

Country:

North America > United States (0.89)
South America (0.04)
North America > Central America (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Trading (0.88)
Law > Business Law (0.68)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Wen, Yuxiao, Han, Yanjun, Zhou, Zhengyuan

Joint Value Estimation and Bidding in Repeated First-Price Auctions

arXiv.org Machine Learning, Feb-24-2025

We study regret minimization in repeated first-price auctions (FPAs), where a bidder observes only the realized outcome after each auction -- win or loss. This setup reflects practical scenarios in online display advertising where the actual value of an impression depends on the difference between two potential outcomes, such as clicks or conversion rates, when the auction is won versus lost. We analyze three outcome models: (1) adversarial outcomes without features, (2) linear potential outcomes with features, and (3) linear treatment effects in features. For each setting, we propose algorithms that jointly estimate private values and optimize bidding strategies, achieving near-optimal regret bounds. Notably, our framework enjoys a unique feature that the treatments are also actively chosen, and hence eliminates the need for the overlap condition commonly required in causal inference.
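For readers unfamiliar with the setting, the following is a minimal sketch of the first-price payoff and of regret against the best fixed bid in hindsight. It is deliberately simplified: it assumes the bidder's value is known and the competing bids are fully observed, which is exactly what the paper's setting does not assume. It is not one of the proposed algorithms, and all function names and numbers are illustrative.

```python
def first_price_utility(value, bid, highest_other_bid):
    """Payoff in a first-price auction: the winner pays its own bid."""
    return value - bid if bid >= highest_other_bid else 0.0

def regret_of_fixed_bid(value, bid, other_bids, candidate_bids):
    """Regret of always bidding `bid` versus the best fixed candidate in hindsight."""
    def total(b):
        return sum(first_price_utility(value, b, m) for m in other_bids)
    best = max(total(b) for b in candidate_bids)
    return best - total(bid)

# Illustrative data: highest competing bid in each of five auctions.
other_bids = [0.2, 0.5, 0.3, 0.7, 0.4]
candidates = [i / 10 for i in range(11)]  # bids on a grid from 0.0 to 1.0

# Bidding 0.9 wins every auction but keeps little surplus; bidding 0.5
# wins four of five auctions at a much better margin.
regret = regret_of_fixed_bid(1.0, 0.9, other_bids, candidates)  # about 1.5
```

The sketch shows the tension a bidder faces: a higher bid wins more often but earns less per win, so the bid has to be tuned to both the value and the competition.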

assumption 2, auction, potential outcome, (15 more...)

2502.17292

Country:

North America > United States > New York (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > Strength High (0.67)

Industry: Information Technology > Services (0.87)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Rathore, Vidhi, Manwani, Naresh

Achieving Fair PCA Using Joint Eigenvalue Decomposition

arXiv.org Machine Learning, Feb-24-2025

Principal Component Analysis (PCA) is a widely used method for dimensionality reduction, but it often overlooks fairness, especially when working with data that includes demographic characteristics. This can lead to biased representations that disproportionately affect certain groups. To address this issue, our approach incorporates Joint Eigenvalue Decomposition (JEVD), a technique that enables the simultaneous diagonalization of multiple matrices, ensuring both fair and efficient representations. We formally show that the optimal solution of JEVD leads to a fair PCA solution. By integrating JEVD with PCA, we strike an optimal balance between preserving data structure and promoting fairness across diverse groups. We demonstrate that our method outperforms existing baseline approaches in fairness and representational quality on various datasets. It retains the core advantages of PCA while ensuring that sensitive demographic attributes do not create disparities in the reduced representation.
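As background, standard PCA (without any fairness constraint) finds the leading eigenvectors of the data covariance matrix. The sketch below computes the top principal component with power iteration in plain Python. It is not the JEVD-based method of the paper, and the function names and toy data are assumptions made for illustration.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_principal_component(cov, iterations=200):
    """Leading eigenvector of a covariance matrix via power iteration."""
    d = len(cov)
    # Assumes this start vector is not orthogonal to the leading eigenvector.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy 2-D data lying close to the line y = x.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.0)]
component = top_principal_component(covariance_matrix(points))
```

For this toy data the component points roughly along the diagonal. A fair variant would additionally constrain the projection so that group-specific structure is treated comparably, which is the part the abstract attributes to JEVD.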

algorithm, matrix, pca, (11 more...)

2502.16933

Country:

South America > Peru (0.04)
South America > Colombia (0.04)
North America > United States > Michigan (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine (1.00)
Law (0.68)

Technology:

Information Technology > Data Science > Data Mining (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

BBC News, Feb-23-2025, 12:21:12 GMT

Trump right to engage Putin on peace talks, says minister

US President Donald Trump was right to re-establish links with Russian leader Vladimir Putin to set up peace talks to end the war in Ukraine, a senior Labour minister has said. Education Secretary Bridget Phillipson said there could be "no negotiated peace without Russia" and that Trump's approach had brought "Russians to the table". The US president has faced a backlash for excluding Ukraine from talks after his aides met Russian officials in Saudi Arabia this week. Trump has also suggested Ukraine may be a bystander, saying it has "no cards" in the deal. Prime Minister Sir Keir Starmer will meet Trump in Washington this week and press for Ukraine to be "at the heart" of any peace talks.

artificial intelligence, peace talk, trump, (17 more...)

Country:

North America > United States (1.00)
Europe > Ukraine (1.00)
Asia > Russia (1.00)
(17 more...)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Regional Government > Europe Government > Russia Government (1.00)
Government > Regional Government > Asia Government > Russia Government (1.00)

Technology: Information Technology > Artificial Intelligence (0.32)

arXiv.org Artificial Intelligence, Feb-23-2025

Facilitating Emergency Vehicle Passage in Congested Urban Areas Using Multi-agent Deep Reinforcement Learning

Su, Haoran

Emergency Response Time (ERT) is crucial for urban safety, measuring cities' ability to handle medical, fire, and crime emergencies. In NYC, medical ERT increased 72% from 7.89 minutes in 2014 to 14.27 minutes in 2024, with half of delays due to Emergency Vehicle (EMV) travel times. Each minute's delay in stroke response costs 2 million brain cells, while cardiac arrest survival drops 7-10% per minute. This dissertation advances EMV facilitation through three contributions. First, EMVLight, a decentralized multi-agent reinforcement learning framework, integrates EMV routing with traffic signal pre-emption. It achieved 42.6% faster EMV travel times and 23.5% improvement for other vehicles. Second, the Dynamic Queue-Jump Lane system uses Multi-Agent Proximal Policy Optimization for coordinated lane-clearing in mixed autonomous and human-driven traffic, reducing EMV travel times by 40%. Third, an equity study of NYC Emergency Medical Services revealed disparities across boroughs: Staten Island faces delays due to sparse signalized intersections, while Manhattan struggles with congestion. Solutions include optimized EMS stations and improved intersection designs. These contributions enhance EMV mobility and emergency service equity, offering insights for policymakers and urban planners to develop safer, more efficient transportation systems.
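The relative change between the two quoted response times can be checked with a short calculation. The helper below is illustrative only. With the endpoints given in the abstract (7.89 and 14.27 minutes), the relative increase comes out at roughly 81%, so the quoted 72% presumably rests on a different baseline or definition that the abstract does not state.

```python
def percent_change(old, new):
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0

# Endpoints quoted in the abstract, in minutes.
increase = percent_change(7.89, 14.27)  # about 80.9
```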

machine learning, natural language, reinforcement learning, (21 more...)

2502.16449

Country:

Asia > China (0.46)
North America > United States > Michigan (0.27)
North America > United States > New York > Richmond County > New York City (0.24)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)
Research Report > Experimental Study (0.67)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Health & Medicine > Therapeutic Area (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Wienczkowski, Michael, Desta, Addisu, Ugochukwu, Paschal

Geometric Properties and Graph-Based Optimization of Neural Networks: Addressing Non-Linearity, Dimensionality, and Scalability

arXiv.org Artificial Intelligence, Feb-23-2025

[Table: Chronological overview of key advancements in graph-based neural networks]

V. PROBLEM STATEMENT

The key issue addressed in this research is the limited understanding of the geometric properties of neural networks, which affects both their interpretability and efficiency. The complexity of the network's geometry influences its learning process, impacting both optimization and generalization. This problem is significant because better geometric interpretations of neural networks can lead to improvements in various tasks, such as classification, optimization, and shape representation. A central challenge is the lack of understanding of the structure of data manifolds that influence how neural networks perform complex tasks. The geometric structures governing neural networks include the relationships between network layers, activation functions, and data manifolds, which directly impact performance in tasks like classification and optimization. The association between neural networks and geometric structures remains under-explored, and improving this understanding could result in more effective algorithms for managing complex data and optimizing performance. Additionally, the graph structure of neural networks plays a crucial role in their predictive performance, yet there is limited knowledge of how this structure influences accuracy. Optimizing the graph structure of neural networks could enhance their efficiency and generalizability across different datasets, which is also important for future hardware advancements. Ultimately, improving the geometric and structural comprehension of neural networks can lead to more robust and versatile models capable of performing across diverse tasks and platforms.

efficiency, graph, neural network, (15 more...)

2503.05761

Country:

North America > United States > Mississippi > Oktibbeha County > Starkville (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)