AITopics | log file

log file

Exposing Weak Links in Multi-Agent Systems under Adversarial Prompting

Arora, Nirmit, Joel, Sathvik, Kavathekar, Ishan, Palak, Gandhi, Rohan, Pandya, Yash, Ganu, Tanuja, Kanade, Aditya, Nambi, Akshay

arXiv.org Artificial Intelligence, Nov-17-2025

LLM-based agents are increasingly deployed in multi-agent systems (MAS). As these systems move toward real-world applications, their security becomes paramount. Existing research largely evaluates single-agent security, leaving a critical gap in understanding the vulnerabilities introduced by multi-agent design. However, existing systems fall short due to the lack of unified frameworks and metrics focusing on unique rejection modes in MAS. We present SafeAgents, a unified and extensible framework for fine-grained security assessment of MAS. SafeAgents systematically exposes how design choices such as plan construction strategies, inter-agent context sharing, and fallback behaviors affect susceptibility to adversarial prompting. We introduce Dharma, a diagnostic measure that helps identify weak links within multi-agent pipelines. Using SafeAgents, we conduct a comprehensive study across five widely adopted multi-agent architectures (centralized, decentralized, and hybrid variants) on four datasets spanning web tasks, tool use, and code generation. Our findings reveal that common design patterns carry significant vulnerabilities. For example, centralized systems that delegate only atomic instructions to sub-agents obscure harmful objectives, reducing robustness. Our results highlight the need for security-aware design in MAS. Code is available at https://github.com/microsoft/SafeAgents

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.10949

Genre: Research Report > New Finding (0.86)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents

Naik, Akshat, Quinn, Patrick, Bosch, Guillermo, Gouné, Emma, Zabala, Francisco Javier Campos, Brown, Jason Ross, Young, Edward James

arXiv.org Artificial Intelligence, Oct-2-2025

As Large Language Model (LLM) agents become more widespread, associated misalignment risks increase. While prior research has studied agents' ability to produce harmful outputs or follow malicious instructions, it remains unclear how likely agents are to spontaneously pursue unintended goals in realistic deployments. In this work, we approach misalignment as a conflict between the internal goals pursued by the model and the goals intended by its deployer. We introduce AgentMisalignment, a benchmark suite designed to evaluate the propensity of LLM agents to misalign in realistic scenarios. Evaluations cover behaviours such as avoiding oversight, resisting shutdown, sandbagging, and power-seeking. Testing frontier models, we find that more capable agents tend to exhibit higher misalignment on average. We also systematically vary agent personalities through different system prompts and observe that persona characteristics can strongly and unpredictably influence misalignment, sometimes more than the choice of model itself. Our results reveal the limitations of current alignment methods for autonomous LLM agents and underscore the need to rethink misalignment in realistic deployment settings.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.04018

Country: Europe (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.92)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

SchemaCoder: Automatic Log Schema Extraction Coder with Residual Q-Tree Boosting

Wan, Lily Jiaxin, Ho, Chia-Tung, Liang, Rongjian, Yu, Cunxi, Chen, Deming, Ren, Haoxing

arXiv.org Artificial Intelligence, Aug-27-2025

Log schema extraction is the process of deriving human-readable templates from massive volumes of log data, which is essential yet notoriously labor-intensive. Recent studies have attempted to streamline this task by leveraging Large Language Models (LLMs) for automated schema extraction. However, existing methods invariably rely on predefined regular expressions, necessitating human domain expertise and severely limiting productivity gains. To fundamentally address this limitation, we introduce SchemaCoder, the first fully automated schema extraction framework applicable to a wide range of log file formats without requiring human customization within the flow. At its core, SchemaCoder features a novel Residual Question-Tree (Q-Tree) Boosting mechanism that iteratively refines schema extraction through targeted, adaptive queries driven by LLMs. Specifically, our method partitions logs into semantic chunks via context-bounded segmentation, selects representative patterns using embedding-based sampling, and generates schema code through hierarchical Q-Tree-driven LLM queries, iteratively refined by our textual-residual evolutionary optimizer and residual boosting. Experimental validation demonstrates SchemaCoder's superiority on the widely-used LogHub-2.0 benchmark, achieving an average improvement of 21.3% over state-of-the-art methods.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.18554

Country: North America > United States (0.68)

Genre: Research Report (0.84)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.87)

Supplementary Material: Learning Distilled Collaboration Graph for Multi-Agent Perception

Neural Information Processing Systems, Aug-18-2025, 23:02:55 GMT

Vehicles are spawned in CARLA via SUMO and managed by the Traffic Manager. We employ the dataset format of nuScenes and extend it to multi-agent scenarios, as shown in Fig. IV. Each log file can produce 100 scenes, and each scene includes 100 frames. The input BEV map's dimension is (c, w, h) = (13, 256, 256).

II.1 Architecture of student/teacher encoder

We describe the architecture of the encoder below.
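As a quick check of the sizes quoted in this excerpt, the following minimal sketch works out the frame count per log file and the number of cells in one BEV map. The constant and function names are hypothetical and chosen only for illustration; they do not come from the paper or its code.

```python
# Sizes quoted in the excerpt; all names below are illustrative assumptions.
SCENES_PER_LOG = 100
FRAMES_PER_SCENE = 100
BEV_SHAPE = (13, 256, 256)  # (c, w, h)


def frames_per_log(scenes: int = SCENES_PER_LOG, frames: int = FRAMES_PER_SCENE) -> int:
    """Total number of frames that one log file can yield."""
    return scenes * frames


def bev_elements(shape: tuple = BEV_SHAPE) -> int:
    """Number of scalar cells in a single BEV input map."""
    c, w, h = shape
    return c * w * h


if __name__ == "__main__":
    print(frames_per_log())  # 10000
    print(bev_elements())    # 851968
```

Under these figures, one log file corresponds to 10,000 frames, and each BEV input holds 851,968 values.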

artificial intelligence, batchnorm2d, vehicle, (14 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.07)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Automobiles & Trucks > Manufacturer (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.72)

"Give me the code" -- Log Analysis of First-Year CS Students' Interactions With GPT

Alves, Pedro, Cipriano, Bruno Pereira

arXiv.org Artificial Intelligence, Dec-1-2024

The impact of Large Language Models (LLMs) like GPT-3, GPT-4, and Bard in computer science (CS) education is expected to be profound. Students now have the power to generate code solutions for a wide array of programming assignments. For first-year students, this may be particularly problematic since the foundational skills are still in development and an over-reliance on generative AI tools can hinder their ability to grasp essential programming concepts. This paper analyzes the prompts used by 69 first-year undergraduate students to solve a certain programming problem within a project assignment, without giving them prior prompt training. We also present the rules of the exercise that motivated the prompts, designed to foster critical thinking skills during the interaction. Despite using unsophisticated prompting techniques, our findings suggest that the majority of students successfully leveraged GPT, incorporating the suggested solutions into their projects. Additionally, half of the students demonstrated the ability to exercise judgment in selecting from multiple GPT-generated solutions, showcasing the development of their critical thinking skills in evaluating AI-generated code.

chatgpt, interaction, student, (15 more...)

arXiv.org Artificial Intelligence

2411.17855

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Portugal (0.04)
Asia > India > Telangana > Hyderabad (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting > Higher Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

What makes a good BIM design: quantitative linking between design behavior and quality

Ni, Xiang-Rui, Pan, Peng, Lin, Jia-Rui

arXiv.org Artificial Intelligence, Nov-14-2024

In the Architecture, Engineering & Construction (AEC) industry, how design behaviors impact design quality remains unclear. This study proposes a novel approach, which, for the first time, identifies and quantitatively describes the relationship between design behaviors and quality of design based on Building Information Modeling (BIM). Real-time collection and log mining are integrated to collect raw data of design behaviors. Feature engineering and various machine learning models are then utilized for quantitative modeling and interpretation. Results confirm that a quantifiable relationship exists and can be learned by various models. The best-performing model, using Extremely Random Trees, achieved an R² value of 0.88 on the test set. Behavioral features related to the designer's skill level and changes of design intentions are identified as having significant impacts on design quality. These findings deepen our understanding of the design process and help produce BIM designs of better quality.

design behavior, design quality, designer, (15 more...)

arXiv.org Artificial Intelligence

2411.09481

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Portugal > Faro > Faro (0.04)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (0.93)

Industry:

Construction & Engineering (1.00)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Diagnosing Robotics Systems Issues with Large Language Models

Herrmann, Jordis Emilia, Gopinath, Aswath Mandakath, Norrlof, Mikael, Müller, Mark Niklas

arXiv.org Artificial Intelligence, Oct-6-2024

Quickly resolving issues reported in industrial applications is crucial to minimize economic impact. However, the required data analysis makes diagnosing the underlying root causes a challenging and time-consuming task, even for experts. In contrast, large language models (LLMs) excel at analyzing large amounts of data. Indeed, prior work in AI-Ops demonstrates their effectiveness in analyzing IT systems. Here, we extend this work to the challenging and largely unexplored domain of robotics systems. To this end, we create SYSDIAGBENCH, a proprietary system diagnostics benchmark for robotics, containing over 2500 reported issues. We leverage SYSDIAGBENCH to investigate the performance of LLMs for root cause analysis, considering a range of model sizes and adaptation techniques. Our results show that QLoRA finetuning can be sufficient to let a 7B-parameter model outperform GPT-4 in terms of diagnostic accuracy while being significantly more cost-effective. We validate our LLM-as-a-judge results with a human expert study and find that our best model achieves similar approval ratings as our reference labels.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.09084

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology (0.94)
Health & Medicine > Diagnostic Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Towards Explainable Evolution Strategies with Large Language Models

Baumann, Jill, Kramer, Oliver

arXiv.org Artificial Intelligence, Jul-11-2024

This paper introduces an approach that integrates self-adaptive Evolution Strategies (ES) with Large Language Models (LLMs) to enhance the explainability of complex optimization processes. By employing a self-adaptive ES equipped with a restart mechanism, we effectively navigate the challenging landscapes of benchmark functions, capturing detailed logs of the optimization journey, including fitness evolution, step-size adjustments, and restart events due to stagnation. An LLM is then utilized to process these logs, generating concise, user-friendly summaries that highlight key aspects such as convergence behavior, optimal fitness achievements, and encounters with local optima. Our case study on the Rastrigin function demonstrates how our approach makes the complexities of ES optimization transparent and accessible. Our findings highlight the potential of using LLMs to bridge the gap between advanced optimization algorithms and their interpretability.
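The logging side of such a setup can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a (1, λ)-ES with log-normal step-size self-adaptation, a simple patience-based restart rule, and a plain string template standing in for the LLM summary step. The function names (`run_es`, `summarize`) and all parameter defaults are hypothetical.

```python
import math
import random


def rastrigin(x):
    """Rastrigin benchmark function; global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def run_es(dim=5, lam=20, generations=200, patience=30, seed=0):
    """(1, lambda)-ES with self-adaptive step size and stagnation restarts.

    Returns (best_fitness, log); log is a list of event dictionaries.
    """
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(dim)  # common learning-rate heuristic (assumption)

    def fresh_parent():
        return [rng.uniform(-5.12, 5.12) for _ in range(dim)], 1.0

    parent, sigma = fresh_parent()
    best = rastrigin(parent)
    run_best, stall, log = best, 0, []

    for gen in range(generations):
        offspring = []
        for _ in range(lam):
            # Mutate the step size first, then the solution with the new step size.
            s = sigma * math.exp(tau * rng.gauss(0, 1))
            child = [xi + s * rng.gauss(0, 1) for xi in parent]
            offspring.append((rastrigin(child), child, s))
        # Comma selection: the best offspring replaces the parent.
        fit, parent, sigma = min(offspring, key=lambda t: t[0])
        best = min(best, fit)
        log.append({"gen": gen, "event": "step", "fitness": fit, "sigma": sigma})

        if fit < run_best - 1e-12:
            run_best, stall = fit, 0
        else:
            stall += 1
        if stall >= patience:
            # Stagnation: record the restart and begin again from a random point.
            log.append({"gen": gen, "event": "restart", "fitness": run_best})
            parent, sigma = fresh_parent()
            run_best, stall = rastrigin(parent), 0

    return best, log


def summarize(best, log):
    """Template summary; a stand-in for the LLM-generated explanation."""
    steps = sum(1 for e in log if e["event"] == "step")
    restarts = sum(1 for e in log if e["event"] == "restart")
    return f"{steps} generations, {restarts} restarts, best fitness {best:.3f}"


if __name__ == "__main__":
    best, log = run_es()
    print(summarize(best, log))
```

In the approach the abstract describes, the event log would be passed to an LLM rather than to a fixed template; the structured entries (fitness, step size, restart events) are the part this sketch aims to make concrete.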

fitness, fitness value, optimization process, (16 more...)

arXiv.org Artificial Intelligence

2407.08331

Country: Europe > Germany > Lower Saxony > Oldenburg (0.04)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5

Wang, Qiao, Rose, Ralph, Orita, Naho, Sugawara, Ayaka

arXiv.org Artificial Intelligence, Mar-4-2024

A common way of assessing language learners' mastery of vocabulary is via multiple-choice cloze (i.e., fill-in-the-blank) questions. But the creation of test items can be laborious for individual teachers or in large-scale language programs. In this paper, we evaluate a new method for automatically generating these types of questions using large language models (LLMs). The VocaTT (vocabulary teaching and training) engine is written in Python and comprises three basic steps: pre-processing target word lists, generating sentences and candidate word options using GPT, and finally selecting suitable word options. To test the efficiency of this system, 60 questions were generated targeting academic words. The generated items were reviewed by expert reviewers who judged the well-formedness of the sentences and word options, adding comments to items judged not well-formed. Results showed a 75% rate of well-formedness for sentences and 66.85% rate for suitable word options. This is a marked improvement over the generator used earlier in our research which did not take advantage of GPT's capabilities. Post-hoc qualitative analysis reveals several points for improvement in future work including cross-referencing part-of-speech tagging, better sentence validation, and improving GPT prompts.
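To make the item format concrete, here is a minimal sketch of assembling one multiple-choice cloze question once a sentence and candidate options are available. It is not the VocaTT engine: the GPT generation and option-selection steps are not reproduced, and the function name `make_cloze`, the blank marker, and the example words are all assumptions for illustration.

```python
import random
import re


def make_cloze(sentence, target, distractors, seed=0):
    """Blank out `target` in `sentence`; return (stem, options, answer_index)."""
    pattern = re.compile(rf"\b{re.escape(target)}\b", flags=re.IGNORECASE)
    if not pattern.search(sentence):
        raise ValueError(f"target {target!r} not found in sentence")
    # Replace only the first occurrence so the stem has a single blank.
    stem = pattern.sub("_____", sentence, count=1)
    options = [target] + list(distractors)
    random.Random(seed).shuffle(options)  # seeded for reproducible ordering
    return stem, options, options.index(target)


if __name__ == "__main__":
    stem, options, answer = make_cloze(
        "The results were consistent across trials.",
        "consistent",
        ["abstract", "random", "minor"],
    )
    print(stem)
    for i, opt in enumerate(options):
        print(f"  {chr(ord('A') + i)}. {opt}")
```

The returned index identifies the correct option after shuffling, which keeps the answer key separate from the presentation order.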

distractor, question stem, syntax, (17 more...)

arXiv.org Artificial Intelligence

2403.02078

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.88)

LogLead -- Fast and Integrated Log Loader, Enhancer, and Anomaly Detector

Mäntylä, Mika, Wang, Yuqing, Nyyssölä, Jesse

arXiv.org Artificial Intelligence, Jan-19-2024

This paper introduces LogLead, a tool designed for efficient log analysis benchmarking. LogLead combines three essential steps in log processing: loading, enhancing, and anomaly detection. The tool leverages Polars, a high-speed DataFrame library. We currently have Loaders for eight systems that are publicly available (HDFS, Hadoop, BGL, Thunderbird, Spirit, Liberty, TrainTicket, and GC Webshop). We have multiple enhancers with three parsers (Drain, Spell, LenMa), BERT embedding creation, and other log representation techniques like bag-of-words. LogLead integrates five supervised and four unsupervised machine learning algorithms for anomaly detection from scikit-learn. By integrating diverse datasets, log representation methods, and anomaly detectors, LogLead facilitates comprehensive benchmarking in log analysis research. We show that log loading from raw file to dataframe is over 10x faster with LogLead compared to past solutions. We demonstrate roughly 2x improvement in Drain parsing speed by off-loading log message normalization to LogLead. Our brief benchmarking on HDFS indicates that log representations extending beyond the bag-of-words approach offer limited additional benefits. Tool URL: https://github.com/EvoTestOps/LogLead

dataframe, log message, loglead, (14 more...)

arXiv.org Artificial Intelligence

2311.11809

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.68)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence (1.00)
