AITopics | loader



Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents

Yang, Zonghan, Wang, Shengjie, Fu, Kelin, He, Wenyang, Xiong, Weimin, Liu, Yibo, Miao, Yibo, Gao, Bofei, Wang, Yejie, Ma, Yingwei, Li, Yanhao, Liu, Yue, Hu, Zhenxing, Zhang, Kaitai, Wang, Shuyi, Chen, Huarong, Sung, Flood, Liu, Yang, Gao, Yang, Yang, Zhilin, Liu, Tianyu

arXiv.org Artificial Intelligence, Dec-9-2025

A contiguous chunk of lines to search for in the existing source code
4. The dividing line: =======
5. The lines to replace into the source code
6. The end of the replace block: >>>>>>> REPLACE

Here is an example:

'''python
### mathweb/flask/app.py
<<<<<<< SEARCH
from flask import Flask
=======
import math
from flask import Flask
>>>>>>> REPLACE
'''

Please note that the *SEARCH/REPLACE* edit REQUIRES PROPER INDENTATION. If you would like to add the line ' print(x)', you must fully write that out, with all those spaces before the code! Wrap the *SEARCH/REPLACE* edit in blocks '''python...'''. The summary of the key differences between the trajectories should be in the thinking part.
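To make the SEARCH/REPLACE format quoted above concrete, here is a minimal Python sketch of how one such block could be parsed and applied to a file's text. It is an illustration only, not the tooling used by the Kimi-Dev authors: the function names `parse_edit` and `apply_edit`, the sample file contents, and the rule that the search text must match exactly once are all assumptions made for this example.

```python
def parse_edit(block: str) -> tuple[str, str, str]:
    """Split one SEARCH/REPLACE block into (path, search_text, replace_text)."""
    path, search, replace = "", [], []
    state = "header"
    for line in block.splitlines():
        if state == "header":
            if line.startswith("### "):
                path = line[4:].strip()
            elif line.strip() == "<<<<<<< SEARCH":
                state = "search"
        elif state == "search":
            if line.strip() == "=======":
                state = "replace"
            else:
                search.append(line)  # keep indentation exactly as written
        elif state == "replace":
            if line.strip() == ">>>>>>> REPLACE":
                state = "done"
            else:
                replace.append(line)
    if state != "done":
        raise ValueError("incomplete SEARCH/REPLACE block")
    return path, "\n".join(search), "\n".join(replace)


def apply_edit(source: str, search: str, replace: str) -> str:
    """Apply the edit; requiring exactly one match is an assumption of this sketch."""
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return source.replace(search, replace, 1)


# The edit block is the example from the quoted prompt.
EDIT = """### mathweb/flask/app.py
<<<<<<< SEARCH
from flask import Flask
=======
import math
from flask import Flask
>>>>>>> REPLACE"""

# Hypothetical original file contents, invented for the demonstration.
original = "from flask import Flask\n\napp = Flask(__name__)\n"

path, search, replace = parse_edit(EDIT)
patched = apply_edit(original, search, replace)
print(path)
print(patched)
```

Because the match is a plain string comparison, a search chunk with the wrong leading whitespace simply fails to match, which is why the quoted prompt insists on proper indentation.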

large language model, machine learning, natural language, (23 more...)

arXiv:2509.23045

Country: Europe > Austria (0.27)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
(2 more...)


A GPU-Accelerated RAG-Based Telegram Assistant for Supporting Parallel Processing Students

Tel-Zur, Guy

arXiv.org Artificial Intelligence, Nov-18-2025

This project addresses a critical pedagogical need: offering students continuous, on-demand academic assistance beyond conventional reception hours. I present a domain-specific Retrieval-Augmented Generation (RAG) system powered by a quantized Mistral-7B Instruct model and deployed as a Telegram bot. The assistant enhances learning by delivering real-time, personalized responses aligned with the "Introduction to Parallel Processing" course materials. GPU acceleration significantly improves inference latency, enabling practical deployment on consumer hardware. This approach demonstrates how consumer GPUs can enable affordable, private, and effective AI tutoring for HPC education.

large language model, machine learning, natural language, (16 more...)

arXiv:2509.11947

Genre: Instructional Material > Course Syllabus & Notes (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)


R&D-Agent: An LLM-Agent Framework Towards Autonomous Data Science

Yang, Xu, Yang, Xiao, Fang, Shikai, Zhang, Yifei, Wang, Jian, Xian, Bowen, Li, Qizheng, Li, Jingyuan, Xu, Minrui, Li, Yuante, Pan, Haoran, Zhang, Yuge, Liu, Weiqing, Shen, Yelong, Chen, Weizhu, Bian, Jiang

arXiv.org Artificial Intelligence, Oct-2-2025

Recent advances in AI and ML have transformed data science, yet increasing complexity and expertise requirements continue to hinder progress. Although crowd-sourcing platforms alleviate some challenges, high-level machine learning engineering (MLE) tasks remain labor-intensive and iterative. We introduce R&D-Agent, a comprehensive, decoupled, and extensible framework that formalizes the MLE process. R&D-Agent divides the MLE workflow into two phases and six components, turning agent design for MLE from ad-hoc craftsmanship into a principled, testable process. Although several existing agents report promising gains on their chosen components, they can mostly be summarized as partial optimizations of our framework's simple baseline. Inspired by human experts, we designed efficient and effective agents within this framework that achieve state-of-the-art performance. Evaluated on MLE-Bench, the agent built on R&D-Agent ranks as the top-performing machine learning engineering agent, achieving a 35.1% any-medal rate and demonstrating the ability of the framework to speed up innovation and improve accuracy across a wide range of data science applications. We have open-sourced R&D-Agent on GitHub: https://github.com/microsoft/RD-Agent.

large language model, machine learning, natural language, (19 more...)

arXiv:2505.14738

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education (1.00)
Health & Medicine > Diagnostic Medicine (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)


Towards a Large Physics Benchmark

Barman, Kristian G., Caron, Sascha, Hasibi, Faegheh, Shalugin, Eugene, Marcet, Yoris, Otte, Johannes, de Regt, Henk W., Moody, Merijn

arXiv.org Artificial Intelligence, Jul-30-2025

We introduce a benchmark framework developed by and for the scientific community to evaluate, monitor and steer large language model development in fundamental physics. Building on philosophical concepts of scientific understanding and creativity, we develop a scoring system in which each question is scored by an expert for its correctness, difficulty, and surprise. The questions are of three forms: (i) multiple-choice questions for conceptual understanding, (ii) analytical problems requiring mathematical derivation, and (iii) open-ended tasks requiring complex problem solving. Our current dataset contains a diverse set of examples, including a machine learning challenge to classify high-energy physics events, such as the four top quark signal. To ensure continued relevance, we propose a living benchmark, where physicists contribute questions, for instance alongside new publications. We invite contributions via: http://www.physicsbenchmarks.org/. We hope that this benchmark will enable a targeted AI development that can make a meaningful contribution to fundamental physics research.

large language model, machine learning, natural language, (22 more...)

arXiv:2507.21695

Genre: Research Report (0.50)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning

Damie, Marc, Cyffers, Edwige

arXiv.org Artificial Intelligence, May-28-2025

Decentralized machine learning - where each client keeps its own data locally and uses its own computational resources to collaboratively train a model by exchanging peer-to-peer messages - is increasingly popular, as it enables better scalability and control over the data. A major challenge in this setting is that learning dynamics depend on the topology of the communication graph, which motivates the use of real graph datasets for benchmarking decentralized algorithms. Unfortunately, existing graph datasets are largely limited to for-profit social networks crawled at a fixed point in time and often collected at the user scale, where links are heavily influenced by the platform and its recommendation algorithms. The Fediverse, which includes several free and open-source decentralized social media platforms such as Mastodon, Misskey, and Lemmy, offers an interesting real-world alternative. We introduce Fedivertex, a new dataset of 182 graphs, covering seven social networks from the Fediverse, crawled weekly over 14 weeks. We release the dataset along with a Python package to facilitate its use, and illustrate its utility on several tasks, including a new defederation task, which captures a process of link deletion observed on these networks.
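The claim that learning dynamics depend on the communication graph can be illustrated with a toy experiment. The sketch below is not taken from the paper or from the Fedivertex Python package: it runs plain gossip averaging (each node repeatedly replaces its value with a weighted average of its neighbors' values) on two hand-built 16-node graphs, a ring and a complete graph, and counts the rounds needed to reach consensus. The function names, the Metropolis weighting rule, the graph sizes, and the tolerance are all choices made for this example.

```python
def metropolis_weights(neighbors):
    """Build a symmetric, doubly stochastic mixing matrix from an adjacency list."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            W[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        W[i][i] = 1.0 - sum(W[i])  # remaining mass stays on the node itself
    return W


def gossip_rounds(values, W, tol=1e-6, max_rounds=10_000):
    """Count averaging rounds until every node is within tol of the global mean."""
    n = len(values)
    mean = sum(values) / n
    x = list(values)
    for t in range(max_rounds + 1):
        if max(abs(v - mean) for v in x) < tol:
            return t
        # One gossip round: every node mixes its value with its neighbors' values.
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    raise RuntimeError("did not reach consensus within max_rounds")


n = 16
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
complete = [[j for j in range(n) if j != i] for i in range(n)]
start = [float(i) for i in range(n)]  # each node starts with a different value

ring_rounds = gossip_rounds(start, metropolis_weights(ring))
complete_rounds = gossip_rounds(start, metropolis_weights(complete))
print("ring:", ring_rounds, "rounds; complete:", complete_rounds, "rounds")
```

The sparse ring needs many more rounds than the complete graph to reach the same tolerance, which is the kind of topology effect that motivates benchmarking on real graphs rather than synthetic ones.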

artificial intelligence, machine learning, social media, (18 more...)

arXiv:2505.20882

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Services (0.84)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Application-oriented automatic hyperparameter optimization for spiking neural network prototyping

Fra, Vittorio

arXiv.org Artificial Intelligence, Feb-13-2025

Hyperparameter optimization (HPO) is of paramount importance in the development of high-performance, specialized artificial intelligence (AI) models, ranging from well-established machine learning (ML) solutions to the deep learning (DL) domain and the field of spiking neural networks (SNNs). The latter introduce further complexity due to the neuronal computational units and their additional hyperparameters, whose inadequate setting can dramatically impact the final model performance. At the cost of possibly reduced generalization capabilities, the most suitable strategy to fully exploit the power of SNNs is to adopt an application-oriented approach and perform extensive HPO experiments. To facilitate these operations, automatic pipelines are fundamental, and their configuration is crucial. In this document, the Neural Network Intelligence (NNI) toolkit is used as a reference framework to present one such solution, with a use case example providing evidence of the corresponding results. In addition, a summary of published works employing the presented pipeline is reported as a possible source of insights into application-oriented HPO experiments for SNN prototyping.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv:2502.12172

Country:

Europe > Switzerland (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Deep Learning Model Security: Threats and Defenses

Wang, Tianyang, Bi, Ziqian, Zhang, Yichao, Liu, Ming, Hsieh, Weiche, Feng, Pohsun, Yan, Lawrence K. Q., Wen, Yizhu, Peng, Benji, Liu, Junyu, Chen, Keyu, Zhang, Sen, Li, Ming, Jiang, Chuanqi, Song, Xinyuan, Yang, Junjie, Jing, Bowen, Ren, Jintao, Song, Junhao, Tseng, Hong-Ming, Chen, Silin, Wang, Yunze, Liang, Chia Xin, Xu, Jiawei, Pan, Xuanhe, Wang, Jinlang, Niu, Qian

arXiv.org Artificial Intelligence, Dec-15-2024

Deep learning has transformed AI applications but faces critical security challenges, including adversarial attacks, data poisoning, model theft, and privacy leakage. This survey examines these vulnerabilities, detailing their mechanisms and impact on model integrity and confidentiality. Practical implementations, including adversarial examples, label flipping, and backdoor attacks, are explored alongside defenses such as adversarial training, differential privacy, and federated learning, highlighting their strengths and limitations. Advanced methods like contrastive and self-supervised learning are presented for enhancing robustness. The survey concludes with future directions, emphasizing automated defenses, zero-trust architectures, and the security challenges of large AI models. A balanced approach to performance and security is essential for developing reliable deep learning systems.

artificial intelligence, machine learning, torch, (18 more...)

arXiv:2412.08969

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Hawaii (0.04)
(10 more...)

Genre:

Overview (1.00)
Workflow (0.94)
Instructional Material (0.92)
Research Report (0.81)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

Chen, Ziru, Chen, Shijie, Ning, Yuting, Zhang, Qianheng, Wang, Boshi, Yu, Botao, Li, Yifei, Liao, Zeyi, Wei, Chen, Lu, Zitong, Dey, Vishal, Xue, Mingyi, Baker, Frazier N., Burns, Benjamin, Adu-Ampratwum, Daniel, Huang, Xuhui, Ning, Xia, Gao, Song, Su, Yu, Sun, Huan

arXiv.org Artificial Intelligence, Oct-23-2024

The advancements of large language models (LLMs) have piqued growing interest in developing LLM-based language agents to automate scientific discovery end-to-end, which has sparked both excitement and skepticism about their true capabilities. In this work, we call for rigorous assessment of agents on individual tasks in a scientific workflow before making bold claims about end-to-end automation. To ensure the scientific authenticity and real-world relevance of our benchmark, we extract 102 tasks from 44 peer-reviewed publications in four disciplines and engage nine subject matter experts to validate them. We unify the target output for every task to a self-contained Python program file and employ an array of evaluation metrics to examine the generated programs, execution results, and costs. Each task goes through multiple rounds of manual validation by annotators and subject matter experts to ensure its annotation quality and scientific plausibility. We also propose two effective strategies to mitigate data contamination concerns. Using our benchmark, we evaluate five open-weight and proprietary LLMs, each with three frameworks: direct prompting, OpenHands CodeAct, and self-debug. Given three attempts for each task, the best-performing agent can only solve 32.4% of the tasks independently and 34.3% with expert-provided knowledge. In addition, we evaluate OpenAI o1 with direct prompting and self-debug, which demonstrates the effectiveness of increasing inference-time compute. Still, our results underscore the limitations of current language agents in generating code for data-driven discovery, let alone end-to-end automation for scientific research.
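The reported percentages can be translated back into task counts with a little arithmetic. The sketch below assumes that both rates are computed over all 102 tasks and that a task counts as solved if at least one of its three attempts succeeds; neither assumption is spelled out in the abstract, and the function names are hypothetical.

```python
TOTAL_TASKS = 102  # number of benchmark tasks stated in the abstract


def tasks_solved(rate_percent: float, total: int = TOTAL_TASKS) -> int:
    """Nearest whole number of tasks that matches a reported percentage."""
    return round(rate_percent / 100 * total)


def success_rate(attempts_per_task: list[list[bool]]) -> float:
    """Percentage of tasks with at least one successful attempt (assumed rule)."""
    solved = sum(any(attempts) for attempts in attempts_per_task)
    return 100 * solved / len(attempts_per_task)


independent = tasks_solved(32.4)     # 32.4% of 102 tasks
with_knowledge = tasks_solved(34.3)  # 34.3% of 102 tasks
print(independent, with_knowledge)   # prints: 33 35

# Round-trip check: the counts reproduce the reported one-decimal percentages.
print(round(100 * independent / TOTAL_TASKS, 1))     # prints: 32.4
print(round(100 * with_knowledge / TOTAL_TASKS, 1))  # prints: 34.3
```

Under these assumptions, expert-provided knowledge accounts for a difference of two tasks out of 102.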

large language model, machine learning, natural language, (19 more...)

arXiv:2410.0508

Country:

North America > United States > New York > New York County > New York City (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > Ohio (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Law (0.93)
Government > Regional Government > North America Government > United States Government (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Trirat, Patara, Jeong, Wonyong, Hwang, Sung Ju

arXiv.org Artificial Intelligence, Oct-3-2024

Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline, such as optimal model search and hyperparameter tuning. Existing AutoML systems often require technical expertise to set up complex tools, which is generally time-consuming and requires a large amount of human effort. Therefore, recent works have started exploiting large language models (LLMs) to lessen this burden and increase the usability of AutoML frameworks via a natural language interface, allowing non-expert users to build their data-driven solutions. These methods, however, are usually designed only for a particular process in the AI development pipeline and do not efficiently use the inherent capacity of the LLMs. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML, i.e., from data retrieval to model deployment. AutoML-Agent takes users' task descriptions, facilitates collaboration between specialized LLM agents, and delivers deployment-ready models. Unlike existing work, instead of devising a single plan, we introduce a retrieval-augmented planning strategy to enhance exploration and search for better plans. We also decompose each plan into sub-tasks (e.g., data preprocessing and neural network design), each of which is solved by a specialized agent that we build via prompting and that executes in parallel, making the search process more efficient. Moreover, we propose a multi-stage verification to verify executed results and guide the code generation LLM in implementing successful solutions. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process, yielding systems with good performance across diverse domains.

large language model, machine learning, natural language, (18 more...)

arXiv:2410.02958

Country: Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Food & Agriculture > Agriculture (0.67)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

Zhang, Songyang, Zhang, Chuyu, Hu, Yingfan, Shen, Haowen, Liu, Kuikun, Ma, Zerun, Zhou, Fengzhe, Zhang, Wenwei, He, Xuming, Lin, Dahua, Chen, Kai

arXiv.org Artificial Intelligence, Jul-15-2024

While LLM-based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their ability is challenging, thereby hindering a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation framework includes an evaluation dataset and two evaluation modes. The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions. The two evaluation modes assess LLMs' ability with and without human assistance. We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.

benchmark, cibench, llm, (15 more...)

arXiv:2407.10499

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
