 method section




Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper

Miyai, Atsuyuki, Toyooka, Mashiro, Otonari, Takashi, Zhao, Zaiying, Aizawa, Kiyoharu

arXiv.org Artificial Intelligence

Understanding the current capabilities and risks of AI Scientist systems is essential for ensuring trustworthy and sustainable AI-driven scientific progress while preserving the integrity of the academic ecosystem. To this end, we develop Jr. AI Scientist, a state-of-the-art autonomous AI scientist system that mimics the core research workflow of a novice student researcher: given a baseline paper from a human mentor, it analyzes the paper's limitations, formulates novel hypotheses for improvement, iteratively conducts experiments until improvements are realized, and writes a paper with the results. Unlike previous approaches that assume full automation or operate on small-scale code, Jr. AI Scientist follows a well-defined research workflow and leverages modern coding agents to handle complex, multi-file implementations, leading to scientifically valuable contributions. In our experiments, Jr. AI Scientist successfully generated new research papers that build upon real NeurIPS, IJCV, and ICLR works by proposing and implementing novel methods. For evaluation, we conducted automated assessments using AI Reviewers, author-led evaluations, and submissions to Agents4Science, a venue dedicated to AI-driven scientific contributions. The findings demonstrate that Jr. AI Scientist generates papers receiving higher review scores than existing fully automated systems. Nevertheless, we identify important limitations from both the author evaluation and the Agents4Science reviews, indicating the potential risks of directly applying current AI Scientist systems and key challenges for future research. Finally, we comprehensively report various risks identified during development. We believe this study clarifies the current role and limitations of AI Scientist systems, offering insights into the areas that still require human expertise and the risks that may emerge as these systems evolve.





version of our paper, we shall clarify the details in Section 3 (R2), and make intuition in the methods section much

Neural Information Processing Systems

We thank the reviewers for the detailed comments, suggestions, and a positive assessment of our work. We will correct the color schemes in all figures (R1). We have also made the figure captions cleaner (R3). We have added a description of the setup to the paper. In Fig 5 (left), DisCor actually outperforms Unif(s,a) on these environments.


What Level of Automation is "Good Enough"? A Benchmark of Large Language Models for Meta-Analysis Data Extraction

Li, Lingbo, Mathrani, Anuradha, Susnjak, Teo

arXiv.org Artificial Intelligence

Automating data extraction from full-text randomised controlled trials (RCTs) for meta-analysis remains a significant challenge. This study evaluates the practical performance of three LLMs (Gemini-2.0-flash, Grok-3, GPT-4o-mini) across tasks involving statistical results, risk-of-bias assessments, and study-level characteristics in three medical domains: hypertension, diabetes, and orthopaedics. We tested four distinct prompting strategies (basic prompting, self-reflective prompting, model ensemble, and customised prompts) to determine how to improve extraction quality. All models demonstrate high precision but consistently suffer from poor recall by omitting key information. We found that customised prompts were the most effective, boosting recall by up to 15%. Based on this analysis, we propose a three-tiered set of guidelines for using LLMs in data extraction, matching data types to appropriate levels of automation based on task complexity and risk. Our study offers practical advice for automating data extraction in real-world meta-analyses, balancing LLM efficiency with expert oversight through targeted, task-specific automation.
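
The "high precision, poor recall" pattern described in this abstract can be illustrated with a minimal sketch. The field names and the helper `precision_recall` are hypothetical and chosen only for illustration; the abstract does not say how the authors matched extracted items against the reference, so exact set matching is an assumption here.

```python
def precision_recall(extracted, reference):
    """Compute precision and recall of extracted items against a reference set.

    Assumption: items are compared by exact match, which may differ from
    the matching rule used in the study.
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical fields a human expert would extract from one trial report.
reference_fields = {"sample_size", "mean_age", "dropout_rate", "primary_outcome"}
# Hypothetical model output: everything it returns is correct, but half is missing.
extracted_fields = {"sample_size", "mean_age"}

precision, recall = precision_recall(extracted_fields, reference_fields)
print(precision, recall)
```

In this toy case every extracted field is correct, so precision is perfect, while the omitted fields pull recall down. That is the omission-driven failure mode the abstract reports.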


Snapshot multi-spectral imaging through defocusing and a Fourier imager network

Yang, Xilin, Fanous, Michael John, Chen, Hanlong, Lee, Ryan, Costa, Paloma Casteleiro, Li, Yuhang, Huang, Luzhe, Zhang, Yijie, Ozcan, Aydogan

arXiv.org Artificial Intelligence

Multi-spectral imaging, which simultaneously captures the spatial and spectral information of a scene, is widely used across diverse fields, including remote sensing, biomedical imaging, and agricultural monitoring. Here, we introduce a snapshot multi-spectral imaging approach employing a standard monochrome image sensor with no additional spectral filters or customized components. Our system leverages the inherent chromatic aberration of wavelength-dependent defocusing as a natural source of physical encoding of multi-spectral information; this encoded image information is rapidly decoded via a deep learning-based multi-spectral Fourier Imager Network (mFIN). We experimentally tested our method with six illumination bands and demonstrated an overall accuracy of 92.98% for predicting the illumination channels at the input and achieved a robust multi-spectral image reconstruction on various test objects. This deep learning-powered framework achieves high-quality multi-spectral image reconstruction using snapshot image acquisition with a monochrome image sensor and could be useful for applications in biomedicine, industrial quality control, and agriculture, among others.


Reviews: Improving Simple Models with Confidence Profiles

Neural Information Processing Systems

The authors introduce ProfWeight - a method for transferring knowledge from a teacher model to a student model. A "confidence profile" (taken from classification layers placed throughout the network) is used to determine which training samples are easy and which are hard. The loss function for the student model is weighted to favor learning the easier samples. The authors test this method on CIFAR10 and a real-world dataset. Quality: The idea presented by this paper is interesting and well-motivated. The method and results could be presented with more clarity, and the paper could benefit from some additional empirical analysis.
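
The weighting idea summarized in this review can be sketched roughly as follows. This is not the authors' implementation: averaging the probe confidences into a single per-sample weight is an assumption made for illustration, and the function names `profile_weights` and `weighted_cross_entropy` are hypothetical.

```python
import math


def profile_weights(confidence_profiles):
    """Collapse each sample's confidence profile into one weight.

    Each profile lists the confidence that probes at successive layers of
    the teacher assign to the true label. Assumption: the mean of the
    profile is used as the weight, so samples the teacher finds easy early
    on receive larger weights.
    """
    return [sum(profile) / len(profile) for profile in confidence_profiles]


def weighted_cross_entropy(student_probs, labels, weights):
    """Weighted mean of per-sample negative log-likelihoods for the student."""
    total = 0.0
    for probs, label, weight in zip(student_probs, labels, weights):
        # Clamp the probability to avoid log(0).
        total += weight * -math.log(max(probs[label], 1e-12))
    return total / sum(weights)


# Hypothetical profiles: the first sample is easy, the second is hard.
profiles = [[0.6, 0.8, 1.0], [0.1, 0.2, 0.3]]
weights = profile_weights(profiles)

# Hypothetical student predictions over two classes, with true labels.
student_probs = [[0.7, 0.3], [0.4, 0.6]]
labels = [0, 1]
loss = weighted_cross_entropy(student_probs, labels, weights)
print(weights, loss)
```

The easy sample contributes more to the student's loss than the hard one, which matches the review's description of a loss weighted to favor easier samples.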