AITopics | Atlantic Ocean


Atlantic Ocean

From Neva to A Highland Song, the Baftas are a reminder of how creative games can be

The Guardian, Mar-12-2025, 15:00:46 GMT

It's easy to feel a bit beset by doom these days. The other week, I watched the heinous AI-generated "Trump Gaza" video and was so appalled that I impulse-bought a kayaking guide book. It felt like the only sane response was to take to the water and paddle away. Video games are a reliable antidote to existential doom, but layoffs, corporate homogenisation and AI slop are all encroaching on my safe haven, making it more difficult to get a brief reprieve from what's happening in the outside world. Thank God, then, for the Bafta games awards nominations, which reliably remind me that video games are pretty great, actually.

artificial intelligence, category, monster hunter, (15 more...)

The Guardian

Country:

Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.25)
North America > United States > Indiana (0.05)
Europe > United Kingdom (0.05)
(2 more...)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Games (1.00)


Ukraine launches biggest drone attack on Moscow, killing 2, as US talks begin

FOX News, Mar-11-2025, 11:17:05 GMT

Atlantic Council senior fellow Ariel Cohen and Heritage Foundation senior fellow Charles 'Cully' Stimson discuss the state of the war amid White House tensions with President Zelenskyy. Ukraine launched its largest-ever drone attack on Moscow on Tuesday as a senior delegation met with Secretary of State Marco Rubio and National Security Advisor Mike Waltz in Saudi Arabia for talks about ending the war with Russia. A total of 337 drones were shot down Tuesday over Russia, including 91 in the Moscow area and 126 in the Kursk region bordering Ukraine, Reuters reported, citing Russia's defense ministry. Moscow-based meat producer Miratorg said two of its employees were killed by falling debris, while 18 other people – including three children – were injured after residential buildings were struck, officials told Reuters. Images taken in Russia showed damage to cars and apartment buildings in the wake of the attack, which temporarily shut down Moscow's four airports.

artificial intelligence, drone attack, moscow, (10 more...)

FOX News

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (1.00)
Asia > Russia (1.00)
North America > United States (0.73)
(4 more...)

Industry:

Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.73)
Government > Regional Government > Europe Government > Ukraine Government (0.54)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)


ToolFuzz -- Automated Agent Tool Testing

Milev, Ivan, Balunović, Mislav, Baader, Maximilian, Vechev, Martin

arXiv.org Artificial Intelligence, Mar-11-2025

Large Language Model (LLM) Agents leverage the advanced reasoning capabilities of LLMs in real-world applications. To interface with an environment, these agents often rely on tools, such as web search or database APIs. As the agent provides the LLM with tool documentation alongside the user query, the completeness and correctness of this documentation is critical. However, tool documentation is often over-, under-, or ill-specified, impeding the agent's accuracy. Standard software testing approaches struggle to identify these errors as they are expressed in natural language. Thus, despite its importance, there currently exists no automated method to test the tool documentation for agents. To address this issue, we present ToolFuzz, the first method for automated testing of tool documentation. ToolFuzz is designed to discover two types of errors: (1) user queries leading to tool runtime errors and (2) user queries that lead to incorrect agent responses. ToolFuzz can generate a large and diverse set of natural inputs, effectively finding tool description errors at a low false positive rate. Further, we present two straightforward prompt-engineering approaches. We evaluate all three tool testing approaches on 32 common LangChain tools and 35 newly created custom tools and 2 novel benchmarks to further strengthen the assessment. We find that many publicly available tools suffer from underspecification. Specifically, we show that ToolFuzz identifies 20x more erroneous inputs compared to the prompt-engineering approaches, making it a key component for building reliable AI agents.

agent, documentation, query, (14 more...)

arXiv.org Artificial Intelligence

2503.04479

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Santa Clara County > San Jose (0.14)
(9 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (1.00)
Retail (0.93)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)


Analysis of Learning-based Offshore Wind Power Prediction Models with Various Feature Combinations

Fang, Linhan, Jiang, Fan, Toms, Ann Mary, Li, Xingpeng

arXiv.org Artificial Intelligence, Mar-10-2025

Accurate wind speed prediction is crucial for designing and selecting sites for offshore wind farms. This paper investigates the effectiveness of various machine learning models in predicting offshore wind power for a site near the Gulf of Mexico by analyzing meteorological data. After collecting and preprocessing meteorological data, nine different input feature combinations were designed to assess their impact on wind power predictions at multiple heights. The results show that using wind speed as the output feature improves prediction accuracy by approximately 10% compared to using wind power as the output. In addition, the improvement of multi-feature input compared with single-feature input is not obvious mainly due to the poor correlation among key features and limited generalization ability of models. These findings underscore the importance of selecting appropriate output features and highlight considerations for using machine learning in wind power forecasting, offering insights that could guide future wind power prediction models and conversion techniques.

prediction, wind power, wind speed, (11 more...)

arXiv.org Artificial Intelligence

2503.13493

Country:

North America > Mexico (0.35)
Atlantic Ocean > Gulf of Mexico (0.25)
North America > United States > Texas > Harris County > Houston (0.05)
(9 more...)

Genre: Research Report > New Finding (0.69)

Industry: Energy > Renewable > Wind (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)


CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation

Sui, Runqi

arXiv.org Artificial Intelligence, Mar-10-2025

Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by integrating external knowledge bases. However, this integration introduces a new security threat: adversaries can exploit the retrieval mechanism to inject malicious content into the knowledge base, thereby influencing the generated responses. Based on this attack vector, we propose CtrlRAG, a novel attack method designed for RAG systems in the black-box setting, which aligns with real-world scenarios. Unlike existing attack methods, CtrlRAG introduces a perturbation mechanism using a Masked Language Model (MLM) to dynamically optimize malicious content in response to changes in the retrieved context. Experimental results demonstrate that CtrlRAG outperforms three baseline methods in both Emotional Manipulation and Hallucination Amplification objectives. Furthermore, we evaluate three existing defense mechanisms, revealing their limited effectiveness against CtrlRAG and underscoring the urgent need for more robust defenses.

arxiv preprint arxiv, malicious text, rag system, (15 more...)

arXiv.org Artificial Intelligence

2503.0695

Country:

Africa > Ghana (0.28)
North America > United States (0.14)
Asia > South Korea (0.05)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments

Tang, Wenjie, Zhou, Yuan, Xu, Erqiang, Cheng, Keyan, Li, Minne, Xiao, Liquan

arXiv.org Artificial Intelligence, Mar-7-2025

Large Language Model (LLM) based agents have been increasingly popular in solving complex and dynamic tasks, which requires proper evaluation systems to assess their capabilities. Nevertheless, existing benchmarks usually either focus on single-objective tasks or use overly broad assessing metrics, failing to provide a comprehensive inspection of the actual capabilities of LLM-based agents in complicated decision-making tasks. To address these issues, we introduce DSGBench, a more rigorous evaluation platform for strategic decision-making. Firstly, it incorporates six complex strategic games which serve as ideal testbeds due to their long-term and multi-dimensional decision-making demands and flexibility in customizing tasks of various difficulty levels or multiple targets. Secondly, DSGBench employs a fine-grained evaluation scoring system which examines the decision-making capabilities by looking into the performance in five specific dimensions and offering a comprehensive assessment in a well-designed way. Furthermore, DSGBench also incorporates an automated decision-tracking mechanism which enables in-depth analysis of agent behaviour patterns and the changes in their strategies. We demonstrate the advances of DSGBench by applying it to multiple popular LLM-based agents and our results suggest that DSGBench provides valuable insights in choosing LLM-based agents as well as improving their future development. DSGBench is available at https://github.com/DeciBrain-Group/DSGBench.

agent, llm-based agent, opponent, (16 more...)

arXiv.org Artificial Intelligence

2503.06047

Country:

Asia > Russia (0.05)
Europe > United Kingdom > England (0.05)
Europe > Germany (0.05)
(12 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Government > Military (1.00)
Leisure & Entertainment > Sports (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)


NaijaNLP: A Survey of Nigerian Low-Resource Languages

Inuwa-Dutse, Isa

arXiv.org Artificial Intelligence, Mar-6-2025

With over 500 languages in Nigeria, three languages -- Hausa, Yorùbá and Igbo -- spoken by over 175 million people, account for about 60% of the spoken languages. However, these languages are categorised as low-resource due to insufficient resources to support tasks in computational linguistics. Several research efforts and initiatives have been presented; however, a coherent understanding of the state of Natural Language Processing (NLP), from grammatical formalisation to linguistic resources that support complex tasks such as language understanding and generation, is lacking. This study presents the first comprehensive review of advancements in low-resource NLP (LR-NLP) research across the three major Nigerian languages (NaijaNLP). We quantitatively assess the available linguistic resources and identify key challenges. Although a growing body of literature addresses various NLP downstream tasks in Hausa, Igbo, and Yorùbá, only about 25.1% of the reviewed studies contribute new linguistic resources. This finding highlights a persistent reliance on repurposing existing data rather than generating novel, high-quality resources. Additionally, language-specific challenges, such as the accurate representation of diacritics, remain under-explored. To advance NaijaNLP and LR-NLP more broadly, we emphasise the need for intensified efforts in resource enrichment, comprehensive annotation, and the development of open collaborative initiatives.

arxiv preprint arxiv, dataset, naijanlp, (14 more...)

arXiv.org Artificial Intelligence

2502.19784

Country:

Africa > Niger (0.14)
Africa > Cameroon (0.14)
Africa > Nigeria > Jigawa State > Dutse (0.05)
(29 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.46)
Media > News (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(3 more...)


Open-Source Large Language Models as Multilingual Crowdworkers: Synthesizing Open-Domain Dialogues in Several Languages With No Examples in Targets and No Machine Translation

Njifenjou, Ahmed, Sucal, Virgile, Jabaian, Bassam, Lefèvre, Fabrice

arXiv.org Artificial Intelligence, Mar-5-2025

The prevailing paradigm in the domain of Open-Domain Dialogue agents predominantly focuses on the English language, encompassing both models and datasets. Furthermore, the financial and temporal investments required for crowdsourcing such datasets for finetuning are substantial, particularly when multiple languages are involved. Fortunately, advancements in Large Language Models (LLMs) have unveiled a plethora of possibilities across diverse tasks. Specifically, instruction-tuning has enabled LLMs to execute tasks based on natural language instructions, occasionally surpassing the performance of human crowdworkers. Additionally, these models possess the capability to function in various languages within a single thread. Consequently, to generate new samples in different languages, we propose leveraging these capabilities to replicate the data collection process. We introduce a pipeline for generating Open-Domain Dialogue data in multiple Target Languages using LLMs, with demonstrations provided in a unique Source Language. By eschewing explicit Machine Translation in this approach, we enhance the adherence to language-specific nuances. We apply this methodology to the PersonaChat dataset. To enhance the openness of generated dialogues and mimic real-life scenarios, we added the notion of speech events, corresponding to the type of conversation the speakers are involved in, and that of common ground, which represents the premises of a conversation.

computational linguistic, persona, personachat, (10 more...)

arXiv.org Artificial Intelligence

2503.03462

Country:

North America > United States > California (0.14)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > Canada > Ontario > Toronto (0.04)
(28 more...)

Genre:

Personal (0.67)
Research Report (0.63)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

Kuznetsov, Kristian, Kushnareva, Laida, Druzhinina, Polina, Razzhigaev, Anton, Voznyuk, Anastasia, Piontkovskaya, Irina, Burnaev, Evgeny, Barannikov, Serguei

arXiv.org Artificial Intelligence, Mar-5-2025

Artificial Text Detection (ATD) is becoming increasingly important with the rise of advanced Large Language Models (LLMs). Despite numerous efforts, no single algorithm performs consistently well across different types of unseen text or guarantees effective generalization to new LLMs. Interpretability plays a crucial role in achieving this goal. In this study, we enhance ATD interpretability by using Sparse Autoencoders (SAE) to extract features from the Gemma-2-2b residual stream. We identify both interpretable and efficient features, analyzing their semantics and relevance through domain- and model-specific statistics, a steering approach, and manual or LLM-based interpretation. Our methods offer valuable insights into how texts from various models differ from human-written content. We show that modern LLMs have a distinct writing style, especially in information-dense domains, even though they can produce human-like outputs with personalized prompts.

arxiv, dataset, strong strengthening, (13 more...)

arXiv.org Artificial Intelligence

2503.03601

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Africa (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(8 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Media (0.68)
Health & Medicine > Therapeutic Area (0.68)
Government > Regional Government > North America Government > United States Government (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems

Lapkovskis, Alfreds, Sedlak, Boris, Magnússon, Sindri, Dustdar, Schahram, Donta, Praveen Kumar

arXiv.org Artificial Intelligence, Mar-5-2025

Ensuring Service Level Objectives (SLOs) in large-scale architectures, such as Distributed Computing Continuum Systems (DCCS), is challenging due to their heterogeneous nature and varying service requirements across different devices and applications. Additionally, unpredictable workloads and resource limitations lead to fluctuating performance and violated SLOs. To improve SLO compliance in DCCS, one possibility is to apply machine learning; however, the design choices are often left to the developer. To that end, we provide a benchmark of Active Inference -- an emerging method from neuroscience -- against three established reinforcement learning algorithms (Deep Q-Network, Advantage Actor-Critic, and Proximal Policy Optimization). We consider a realistic DCCS use case: an edge device running a video conferencing application alongside a WebSocket server streaming videos. Using one of the respective algorithms, we continuously monitor key performance metrics, such as latency and bandwidth usage, to dynamically adjust parameters -- including the number of streams, frame rate, and resolution -- to optimize service quality and user experience. To test the algorithms' adaptability to constant system changes, we simulate dynamically changing SLOs and both instant and gradual data-shift scenarios, such as network bandwidth limitations and fluctuating device thermal states. Although the evaluated algorithms all showed advantages and limitations, our findings demonstrate that Active Inference is a promising approach for ensuring SLO compliance in DCCS, offering lower memory usage, stable CPU utilization, and fast convergence.

algorithm, configuration, slo compliance, (14 more...)

arXiv.org Artificial Intelligence

2503.03274

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Spain (0.04)
Europe > Austria > Vienna (0.04)
Atlantic Ocean > North Atlantic Ocean > Baltic Sea (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
