AITopics

2503.22119

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Spain > Valencian Community > Alicante Province > Alicante (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Banking & Finance > Real Estate (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)

Generating Synthetic Data with Formal Privacy Guarantees: State of the Art and the Road Ahead

Schlegel, Viktor, Bharath, Anil A, Zhao, Zilong, Yee, Kevin

Privacy-preserving synthetic data offers a promising solution to harness segregated data in high-stakes domains where information is compartmentalized for regulatory, privacy, or institutional reasons. This survey provides a comprehensive framework for understanding the landscape of privacy-preserving synthetic data, presenting the theoretical foundations of generative models and differential privacy followed by a review of state-of-the-art methods across tabular data, images, and text. Our synthesis of evaluation approaches highlights the fundamental trade-off between utility for down-stream tasks and privacy guarantees, while identifying critical research gaps: the lack of realistic benchmarks representing specialized domains and insufficient empirical evaluations required to contextualise formal guarantees. Through empirical analysis of four leading methods on five real-world datasets from specialized domains, we demonstrate significant performance degradation under realistic privacy constraints ($\epsilon \leq 4$), revealing a substantial gap between results reported on general domain benchmarks and performance on domain-specific data. %Our findings highlight key challenges including unaccounted privacy leakage, insufficient empirical verification of formal guarantees, and a critical deficit of realistic benchmarks. These challenges underscore the need for robust evaluation frameworks, standardized benchmarks for specialized domains, and improved techniques to address the unique requirements of privacy-sensitive fields such that this technology can deliver on its considerable potential.

knowledge management, large language model, machine learning, (22 more...)

2503.20846

Country:

Asia > Singapore (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(5 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.86)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)
(4 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Knowledge Management (1.00)
Information Technology > Information Management (1.00)
(9 more...)

Rothwell, Andrew, Moorkens, Joss, Svoboda, Tomas

Training in translation tools and technologies: Findings of the EMT survey 2023

This article reports on the third iteration of a survey of computerized tools and technologies taught as part of postgraduate translation training programmes. While the survey was carried out under the aegis of the EMT Network, more than half of responses are from outside that network. The results show the responsiveness of programmes to innovations in translation technology, with increased compulsory inclusion of machine translation, post-editing, and quality evaluation, and a rapid response to the release of generative tools. The flexibility required during the Covid-19 pandemic has also led to some lasting changes to programmes. While the range of tools being taught has continued to expand, programmes seem to be consolidating their core offering around cloud-based software with cost-free academic access. There has also been an increase in the embedding of professional contexts and workflows associated with translation technology. Generic file management and data security skills have increased in perceived importance, and legal and ethical issues related to translation data have also become more prominent. In terms of course delivery the shift away from conventional labs identified in EMT2017 has accelerated markedly, no doubt partly driven by the pandemic, accompanied by a dramatic expansion in the use of students' personal devices.

machine learning, natural language, programme, (20 more...)

2503.22735

Country:

Europe > United Kingdom (0.14)
Europe > Ireland (0.04)
Europe > Spain (0.04)
(25 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.48)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Information Technology > Security & Privacy (0.88)
Education > Educational Setting > Online (0.46)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.34)
Health & Medicine > Therapeutic Area > Immunology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Batanero, Eloy Anguiano, Fernández, Ángela, Barbero, Álvaro

Graph-Enhanced Model-Free Reinforcement Learning Agents for Efficient Power Grid Topological Control

The increasing complexity of power grid management, driven by the emergence of prosumers and the demand for cleaner energy solutions, has needed innovative approaches to ensure stability and efficiency. This paper presents a novel approach within the model-free framework of reinforcement learning, aimed at optimizing power network operations without prior expert knowledge. We introduce a masked topological action space, enabling agents to explore diverse strategies for cost reduction while maintaining reliable service using the state logic as a guide for choosing proper actions. Through extensive experimentation across 20 different scenarios in a simulated 5-substation environment, we demonstrate that our approach achieves a consistent reduction in power losses, while ensuring grid stability against potential blackouts. The results underscore the effectiveness of combining dynamic observation formalization with opponent-based training, showing a viable way for autonomous management solutions in modern energy systems or even for building a foundational model for this field.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2503.20688

Country:

Europe > Spain > Galicia > Madrid (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Promising Solution (0.86)
Overview > Innovation (0.68)
Research Report > New Finding (0.66)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Geographical hotspot prediction based on point cloud-voxel-community partition clustering

Tang, Yan

Existing solutions to the hotspot prediction problem in the field of geographic information remain at a relatively preliminary stage. This study presents a novel approach for detecting and predicting geographical hotspots, utilizing point cloud-voxel-community partition clustering. By analyzing high-dimensional data, we represent spatial information through point clouds, which are then subdivided into multiple voxels to enhance analytical efficiency. Our method identifies spatial voxels with similar characteristics through community partitioning, thereby revealing underlying patterns in hotspot distributions. Experimental results indicate that when applied to a dataset of archaeological sites in Turkey, our approach achieves a 19.31% increase in processing speed, with an accuracy loss of merely 6%, outperforming traditional clustering methods. This method not only provides a fresh perspective for hotspot prediction but also serves as an effective tool for high-dimensional data analysis.

artificial intelligence, hotspot prediction, machine learning, (14 more...)

2503.21084

Country:

Asia > Middle East > Republic of Türkiye (0.26)
Asia > China > Liaoning Province > Shenyang (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models

Guo, Siyuan, Liu, Huiwu, Chen, Xiaolong, Xie, Yuming, Zhang, Liang, Han, Tao, Chen, Hechang, Chang, Yi, Wang, Jun

In this work, we explore the potential of large language models (LLMs) for generating functional test scripts, which necessitates understanding the dynamically evolving code structure of the target software. To achieve this, we propose a case-based reasoning (CBR) system utilizing a 4R cycle (i.e., retrieve, reuse, revise, and retain), which maintains and leverages a case bank of test intent descriptions and corresponding test scripts to facilitate LLMs for test script generation. To improve user experience further, we introduce Re4, an optimization method for the CBR system, comprising reranking-based retrieval finetuning and reinforced reuse finetuning. Specifically, we first identify positive examples with high semantic and script similarity, providing reliable pseudo-labels for finetuning the retriever model without costly labeling. Then, we apply supervised finetuning, followed by a reinforcement learning finetuning stage, to align LLMs with our production scenarios, ensuring the faithful reuse of retrieved cases. Extensive experimental results on two product development units from Huawei Datacom demonstrate the superiority of the proposed CBR+Re4. Notably, we also show that the proposed Re4 method can help alleviate the repetitive generation issues with LLMs.

large language model, machine learning, natural language, (17 more...)

2503.20576

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Jilin Province > Changchun (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Research Report (1.00)
Overview (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Ferrag, Mohamed Amine, Tihanyi, Norbert, Debbah, Merouane

Reasoning Beyond Limits: Advances and Open Problems for LLMs

Recent generative reasoning breakthroughs have transformed how large language models (LLMs) tackle complex problems by dynamically retrieving and refining information while generating coherent, multi-step thought processes. Techniques such as inference-time scaling, reinforcement learning, supervised fine-tuning, and distillation have been successfully applied to models like DeepSeek-R1, OpenAI's o1 & o3, GPT-4o, Qwen-32B, and various Llama variants, resulting in enhanced reasoning capabilities. In this paper, we provide a comprehensive analysis of the top 27 LLM models released between 2023 and 2025 (including models such as Mistral AI Small 3 24B, DeepSeek-R1, Search-o1, QwQ-32B, and phi-4). Then, we present an extensive overview of training methodologies that spans general training approaches, mixture-of-experts (MoE) and architectural innovations, retrieval-augmented generation (RAG), chain-of-thought and self-improvement techniques, as well as test-time compute scaling, distillation, and reinforcement learning (RL) methods. Finally, we discuss the key challenges in advancing LLM capabilities, including improving multi-step reasoning without human supervision, overcoming limitations in chained tasks, balancing structured prompts with flexibility, and enhancing long-context retrieval and external tool integration.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2503.22732

Country:

North America > United States (0.27)
Asia > Middle East > UAE (0.04)
Africa > Middle East > Algeria > Guelma Province > Guelma (0.04)
(3 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.92)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Maratea, Antonio, Perna, Rita

Active Data Sampling and Generation for Bias Remediation

Adequate sampling space coverage is the keystone to effectively train trustworthy Machine Learning models. Unfortunately, real data do carry several inherent risks due to the many potential biases they exhibit when gathered without a proper random sampling over the reference population, and most of the times this is way too expensive or time consuming to be a viable option. Depending on how training data have been gathered, unmitigated biases can lead to harmful or discriminatory consequences that ultimately hinders large scale applicability of pre-trained models and undermine their truthfulness or fairness expectations. In this paper, a mixed active sampling and data generation strategy -- called samplation -- is proposed as a mean to compensate during fine-tuning of a pre-trained classifer the unfair classifications it produces, assuming that the training data come from a non-probabilistic sampling schema. Given a pre-trained classifier, first a fairness metric is evaluated on a test set, then new reservoirs of labeled data are generated and finally a number of reversely-biased artificial samples are generated for the fine-tuning of the model. Using as case study Deep Models for visual semantic role labeling, the proposed method has been able to fully cure a simulated gender bias starting from a 90/10 imbalance, with only a small percentage of new data and with a minor effect on accuracy.

artificial intelligence, big data and data science, machine learning, (13 more...)

2503.20414

Country: North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

A Survey on Event-driven 3D Reconstruction: Development under Different Categories

Xu, Chuanzhi, Zhou, Haoxian, Chen, Haodong, Chung, Vera, Qu, Qiang

--Event cameras have gained increasing attention for 3D reconstruction due to their high temporal resolution, low latency, and high dynamic range. They capture per-pixel brightness changes asynchronously, allowing accurate reconstruction under fast motion and challenging lighting conditions. In this survey, we provide a comprehensive review of event-driven 3D reconstruction methods, including stereo, monocular, and multimodal systems. We further categorize recent developments based on geometric, learning-based, and hybrid approaches. Emerging trends, such as neural radiance fields and 3D Gaussian splatting with event data, are also covered. The related works are structured chronologically to illustrate the innovations and progression within the field. T o support future research, we also highlight key research gaps and future research directions in dataset, experiment, evaluation, event representation, etc. Event cameras, also known as neuromorphic cameras, silicon retina, or dynamic vision sensors, are bio-inspired sensors that respond asynchronously to changes in brightness.

artificial intelligence, machine learning, reconstruction, (18 more...)

2503.19753

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial IntelligenceMar-25-2025

Red Teaming with Artificial Intelligence-Driven Cyberattacks: A Scoping Review

Al-Azzawi, Mays, Doan, Dung, Sipola, Tuomo, Hautamäki, Jari, Kokkonen, Tero

Institute of Information Technology Jamk University of Applied Sciences PO Box 207, FI-40101, Jyv askyl a, Finland Abstract The progress of artificial intelligence (AI) has made sophisticated methods available for cyberattacks and red team activities. The new methods can also accelerate the execution of the attacks. This review article examines the use of AI technologies in cyber-security attacks. It also tries to describe typical targets for such attacks. We employed a scoping review methodology to analyze articles and identify AI methods, targets, and models that red teams can utilize to simulate cybercrime. From the 470 records screened, 11 were included in the review. Various cyberattack methods were identified, targeting sensitive data, systems, social media profiles, passwords, and URLs. The application of AI in cybercrime to develop versatile attack models presents an increasing threat. Furthermore, AI-based techniques in red team use can provide new ways to address these issues. Keywords: Artificial intelligence, red team, red teaming, cyberattack, cybersecurity. 1 Introduction The possibility of artificial intelligence (AI) simulating human behavior has emerged as a significant cybersecurity threat.

artificial intelligence, machine learning, natural language, (17 more...)

2503.19626

Country:

Europe > Finland (0.24)
Oceania > Australia (0.04)
North America > United States > Kansas > Leavenworth County > Leavenworth (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)