literature review
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research
Hui Chen, Miao Xiong, Yujie Lu, Wei Han, Ailin Deng, Yufei He, Jiaying Wu, Yibo Li
Recent advancements in AI agents have demonstrated their growing potential to drive and support scientific discovery. In this work, we introduce MLR-Bench, a comprehensive benchmark for evaluating AI agents on open-ended machine learning research. MLR-Bench includes three key components: (1) 201 research tasks sourced from NeurIPS, ICLR, and ICML workshops covering diverse ML topics; (2) MLR-Judge, an automated evaluation framework combining LLM-based reviewers with carefully designed review rubrics to assess research quality; and (3) MLR-Agent, a modular agent scaffold capable of completing research tasks through four stages: idea generation, proposal formulation, experimentation, and paper writing. Our framework supports both stepwise assessment across these distinct research stages and end-to-end evaluation of the final research paper. We then use MLR-Bench to evaluate six frontier LLMs and an advanced coding agent, finding that while LLMs are effective at generating coherent ideas and well-structured papers, current coding agents frequently (e.g., in 80% of the cases) produce fabricated or invalidated experimental results, posing a major barrier to scientific reliability.
Invisible Load: Uncovering the Challenges of Neurodivergent Women in Software Engineering
Zaib, Munazza, Wang, Wei, Hidellaarachchi, Dulaji, Siddiqui, Isma Farah
Neurodivergent women in Software Engineering (SE) encounter distinctive challenges at the intersection of gender bias and neurological differences. To the best of our knowledge, no prior work in SE research has systematically examined this group, despite increasing recognition of neurodiversity in the workplace. Underdiagnosis, masking, and male-centric workplace cultures continue to exacerbate barriers that contribute to stress, burnout, and attrition. In response, we propose a hybrid methodological approach that integrates InclusiveMag's inclusivity framework with the GenderMag walkthrough process, tailored to the context of neurodivergent women in SE. The overarching design unfolds across three stages: scoping through literature review, deriving personas and analytic processes, and applying the method in collaborative workshops. We present a targeted literature review that synthesizes the challenges neurodivergent women face in SE into cognitive, social, organizational, structural, and career progression categories, including how under- or late diagnosis and masking intensify exclusion. These findings lay the groundwork for subsequent stages that will develop and apply inclusive analytic methods to support actionable change.
Interview with Frida Hartman: Studying bias in AI-based recruitment tools
In a new series of interviews, we're meeting some of the PhD students who were selected to take part in the Doctoral Consortium at the European Conference on Artificial Intelligence (ECAI-2025). In the second interview of the series, we caught up with Frida Hartman to find out how her PhD is going so far, and her plans for the next steps in her investigations. Frida, along with co-authors Mario Mirabile and Michele Dusi, was also the winner of the ECAI-2025 Diversity & Inclusion Competition, for work entitled . This award was presented at the closing ceremony of the conference. Could you start by giving us a quick introduction to yourself and the topic that you're working on?
Artificial Intelligence and Accounting Research: A Framework and Agenda
Stratopoulos, Theophanis C., Wang, Victor Xiaoqi
Recent advances in artificial intelligence, particularly generative AI (GenAI) and large language models (LLMs), are fundamentally transforming accounting research, creating both opportunities and competitive threats for scholars. This paper proposes a framework that classifies AI-accounting research along two dimensions: research focus (accounting-centric versus AI-centric) and methodological approach (AI-based versus traditional methods). We apply this framework to papers from the IJAIS special issue and recent AI-accounting research published in leading accounting journals to map existing studies and identify research opportunities. Using this same framework, we analyze how accounting researchers can leverage their expertise through strategic positioning and collaboration, revealing where accounting scholars' strengths create the most value. We further examine how GenAI and LLMs transform the research process itself, comparing the capabilities of human researchers and AI agents across the entire research workflow. This analysis reveals that while GenAI democratizes certain research capabilities, it simultaneously intensifies competition by raising expectations for higher-order contributions where human judgment, creativity, and theoretical depth remain valuable. These shifts call for reforming doctoral education to cultivate comparative advantages while building AI fluency.
Interview with Mario Mirabile: trust in multi-agent systems
In a new series of interviews, we're meeting some of the PhD students who were selected to take part in the Doctoral Consortium at the European Conference on Artificial Intelligence (ECAI 2025). During the conference in Bologna, we caught up with Mario Mirabile, who is studying for his PhD in trustworthy AI and multi-agent systems at the University of Santiago de Compostela and is a Research Fellow in human-AI interaction at the University of Bologna. Mario, along with co-authors Frida Hartman and Michele Dusi, was also the winner of the ECAI-2025 Diversity & Inclusion Competition, for work entitled . This award was presented at the closing ceremony of the conference. Could you start by giving us an introduction to the topic you are working on?