AITopics | debug

Collaborating Authors

debug

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

019fa4fdf1c04cf73ba25aa2223769cd-Paper.pdf

Neural Information Processing SystemsApr-30-2026, 20:25:40 GMT

artificial intelligence, machine learning, maverick rating, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Media > Film (0.68)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

019fa4fdf1c04cf73ba25aa2223769cd-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 07:19:49 GMT

cfd ebug, maverick rating, recommendation accuracy, (14 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Queensland (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Media > Film (0.68)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Agent-Environment Alignment via Automated Interface Generation

Liu, Kaiming, Lei, Xuanyu, Wang, Ziyue, Li, Peng, Liu, Yang

arXiv.org Artificial IntelligenceMay-28-2025

Large language model (LLM) agents have shown impressive reasoning capabilities in interactive decision-making tasks. These agents interact with environment through intermediate interfaces, such as predefined action spaces and interaction rules, which mediate the perception and action. However, mismatches often happen between the internal expectations of the agent regarding the influence of its issued actions and the actual state transitions in the environment, a phenomenon referred to as \textbf{agent-environment misalignment}. While prior work has invested substantially in improving agent strategies and environment design, the critical role of the interface still remains underexplored. In this work, we empirically demonstrate that agent-environment misalignment poses a significant bottleneck to agent performance. To mitigate this issue, we propose \textbf{ALIGN}, an \underline{A}uto-A\underline{l}igned \underline{I}nterface \underline{G}e\underline{n}eration framework that alleviates the misalignment by enriching the interface. Specifically, the ALIGN-generated interface enhances both the static information of the environment and the step-wise observations returned to the agent. Implemented as a lightweight wrapper, this interface achieves the alignment without modifying either the agent logic or the environment code. Experiments across multiple domains including embodied tasks, web navigation and tool-use, show consistent performance improvements, with up to a 45.67\% success rate improvement observed in ALFWorld. Meanwhile, ALIGN-generated interface can generalize across different agent architectures and LLM backbones without interface regeneration. Code and experimental results are available at https://github.com/THUNLP-MT/ALIGN.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.21055

Country:

Asia > Middle East > UAE (0.45)
Europe > Austria (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

debug-gym: A Text-Based Environment for Interactive Debugging

Yuan, Xingdi, Moss, Morgane M, Feghali, Charbel El, Singh, Chinmay, Moldavskaya, Darya, MacPhee, Drew, Caccia, Lucas, Pereira, Matheus, Kim, Minseon, Sordoni, Alessandro, Côté, Marc-Alexandre

arXiv.org Artificial IntelligenceMar-27-2025

Large Language Models (LLMs) are increasingly relied upon for coding tasks, yet in most scenarios it is assumed that all relevant information can be either accessed in context or matches their training data. We posit that LLMs can benefit from the ability to interactively explore a codebase to gather the information relevant to their task. To achieve this, we present a textual environment, namely debug-gym, for developing LLM-based agents in an interactive coding setting. Our environment is lightweight and provides a preset of useful tools, such as a Python debugger (pdb), designed to facilitate an LLM-based agent's interactive debugging. Beyond coding and debugging tasks, this approach can be generalized to other tasks that would benefit from information-seeking behavior by an LLM agent.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.21557

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.67)
Leisure & Entertainment (0.46)
Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Add feedback

CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation

Jansen, Peter, Tafjord, Oyvind, Radensky, Marissa, Siangliulue, Pao, Hope, Tom, Mishra, Bhavana Dalvi, Majumder, Bodhisattwa Prasad, Weld, Daniel S., Clark, Peter

arXiv.org Artificial IntelligenceMar-20-2025

Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore variants of existing codebases or similarly constrained design spaces, and (2) they produce large volumes of research artifacts (such as automatically generated papers and code) that are typically evaluated using conference-style paper review with limited evaluation of code. In this work we introduce CodeScientist, a novel ASD system that frames ideation and experiment construction as a form of genetic search jointly over combinations of research articles and codeblocks defining common actions in a domain (like prompting a language model). We use this paradigm to conduct hundreds of automated experiments on machine-generated ideas broadly in the domain of agents and virtual environments, with the system returning 19 discoveries, 6 of which were judged as being both at least minimally sound and incrementally novel after a multi-faceted evaluation beyond that typically conducted in prior work, including external (conference-style) review, code review, and replication attempts. Moreover, the discoveries span new tasks, agents, metrics, and data, suggesting a qualitative shift from benchmark optimization to broader discoveries.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2503.22708

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Arizona (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(9 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Application-oriented automatic hyperparameter optimization for spiking neural network prototyping

Fra, Vittorio

arXiv.org Artificial IntelligenceFeb-13-2025

Hyperparameter optimization (HPO) is of paramount importance in the development of high-performance, specialized artificial intelligence (AI) models, ranging from well-established machine learning (ML) solutions to the deep learning (DL) domain and the field of spiking neural networks (SNNs). The latter introduce further complexity due to the neuronal computational units and their additional hyperparameters, whose inadequate setting can dramatically impact the final model performance. At the cost of possible reduced generalization capabilities, the most suitable strategy to fully disclose the power of SNNs is to adopt an application-oriented approach and perform extensive HPO experiments. To facilitate these operations, automatic pipelines are fundamental, and their configuration is crucial. In this document, the Neural Network Intelligence (NNI) toolkit is used as reference framework to present one such solution, with a use case example providing evidence of the corresponding results. In addition, a summary of published works employing the presented pipeline is reported as possible source of insights into application-oriented HPO experiments for SNN prototyping.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.12172

Country:

Europe > Switzerland (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization

Jiao, Pengkun, Zhao, Na, Chen, Jingjing, Jiang, Yu-Gang

arXiv.org Artificial IntelligenceNov-5-2024

Open-set single-source domain generalization aims to use a single-source domain to learn a robust model that can be generalized to unknown target domains with both domain shifts and label shifts. The scarcity of the source domain and the unknown data distribution of the target domain pose a great challenge for domain-invariant feature learning and unknown class recognition. In this paper, we propose a novel learning approach based on domain expansion and boundary growth to expand the scarce source samples and enlarge the boundaries across the known classes that indirectly broaden the boundary between the known and unknown classes. Specifically, we achieve domain expansion by employing both background suppression and style augmentation on the source data to synthesize new samples. Then we force the model to distill consistent knowledge from the synthesized samples so that the model can learn domain-invariant information. Furthermore, we realize boundary growth across classes by using edge maps as an additional modality of samples when training multi-binary classifiers. In this way, it enlarges the boundary between the inliers and outliers, and consequently improves the unknown class recognition during open-set generalization. Extensive experiments show that our approach can achieve significant improvements and reach state-of-the-art performance on several cross-domain image classification datasets.

classifier, domain generalization, unknown class, (14 more...)

arXiv.org Artificial Intelligence

2411.0292

Country:

Asia > Singapore (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

The AI-Powered Future of Coding Is Near

WIREDJul-18-2024, 16:00:00 GMT

I am by no means a skilled coder, but thanks to a free program called SWE-agent, I was just able to debug and fix a gnarly problem involving a misnamed file within different code repositories on the software-hosting site GitHub. I pointed SWE-agent at an issue on GitHub and watched as it went through the code and reasoned about what might be wrong. It correctly determined that the root cause of the bug was a line that pointed to the wrong location for a file, then navigated through the project, located the file, and amended the code so that everything ran properly. It's the kind of thing that an inexperienced developer (such as myself) might spend hours trying to debug. Many coders already use artificial intelligence to write software more quickly.

agent, swe-agent, swe-bench, (11 more...)

WIRED

Country: Asia > Singapore (0.06)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.38)

Add feedback

Unsupervised Evaluation of Code LLMs with Round-Trip Correctness

Allamanis, Miltiadis, Panthaplackel, Sheena, Yin, Pengcheng

arXiv.org Artificial IntelligenceFeb-13-2024

To evaluate code large language models (LLMs), research has relied on a few small manually curated benchmarks, such as HumanEval and MBPP, which represent a narrow part of the real-world software domains. In this work, we introduce round-trip correctness (RTC) as an alternative evaluation method. RTC allows Code LLM evaluation on a broader spectrum of real-world software domains without the need for costly human curation. RTC rests on the idea that we can ask a model to make a prediction (e.g., describe some code using natural language), feed that prediction back (e.g., synthesize code from the predicted description), and check if this round-trip leads to code that is semantically equivalent to the original input. We show how to employ RTC to evaluate code synthesis and editing. We find that RTC strongly correlates with model performance on existing narrow-domain code synthesis benchmarks while allowing us to expand to a much broader set of domains and tasks which was not previously possible without costly human annotations.

detection, github, unsupervised evaluation, (14 more...)

arXiv.org Artificial Intelligence

2402.08699

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

The Quality of Auto-Generated Code

#artificialintelligenceNov-16-2021, 14:07:45 GMT

Kevlin Henney and I were riffing on some ideas about GitHub Copilot, the tool for automatically generating code base on GPT-3's language model, trained on the body of code that's in GitHub. This article poses some questions and (perhaps) some answers, without trying to present any conclusions. First, we wondered about code quality. There are lots of ways to solve a given programming problem; but most of us have some ideas about what makes code "good" or "bad." Is it readable, is it well-organized?

auto-generated code, code quality, copilot, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Add feedback