Problem Solving
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Wu, Jinyang, Feng, Mingkuan, Zhang, Shuai, Jin, Ruihan, Che, Feihu, Wen, Zengqi, Tao, Jianhua
Multimodal large language models (MLLMs) exhibit impressive capabilities but still face challenges in complex visual reasoning. While recent efforts attempt to enhance MLLMs' reasoning by incorporating OpenAI o1-like structured thinking through explicit search structures or teacher-guided distillation, they often struggle to balance performance and efficiency. A critical limitation is their heavy reliance on extensive data and search spaces, resulting in low-efficiency implicit insight extraction and data utilization. To address this, we propose AStar, an Automated Structured thinking paradigm for multimodal reasoning via Monte Carlo Tree Search (MCTS). AStar automatically derives high-level cognitive reasoning patterns from limited data using MCTS-powered hierarchical structures. Building on these explicit patterns, we design a unified reasoning framework that seamlessly integrates models' internal reasoning capabilities and external reasoning guidelines, enabling efficient inference with minimal tree iterations. This novel paradigm strikes a compelling balance between performance and efficiency. Extensive experiments demonstrate AStar's effectiveness, achieving superior accuracy (54.0%) on the MathVerse benchmark with a 7B backbone, surpassing GPT-4o (50.2%) while maintaining substantial data and computational efficiency.
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation
Eger, Steffen, Cao, Yong, D'Souza, Jennifer, Geiger, Andreas, Greisinger, Christian, Gross, Stephanie, Hou, Yufang, Krenn, Brigitte, Lauscher, Anne, Li, Yizhi, Lin, Chenghua, Moosavi, Nafise Sadat, Zhao, Wei, Miller, Tristan
With the advent of large multimodal language models, science is now at the threshold of an AI-based technological transformation. Recently, a plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. This includes all aspects of the research cycle, especially (1) searching for relevant literature; (2) generating research ideas and conducting experimentation; generating (3) text-based and (4) multimodal content (e.g., scientific figures and diagrams); and (5) AI-based automatic peer review. In this survey, we provide an in-depth overview of these exciting recent developments, which promise to fundamentally alter the scientific research process for good. Our survey covers the five aspects outlined above, indicating relevant datasets, methods and results (including evaluation) as well as limitations and scope for future research. Ethical concerns regarding shortcomings of these tools and potential for misuse (fake science, plagiarism, harms to research integrity) take a particularly prominent place in our discussion. We hope that our survey will not only become a reference guide for newcomers to the field but also a catalyst for new AI-based initiatives in the area of "AI4Science".
Review for NeurIPS paper: RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
Weaknesses: The paper only shows that searching a complicated search space on a proxy task may not work as well as a simple search on the real task without much approximation. It doesn't really tell us what happens if a complicated search space can be efficiently explored on the real task. In this sense, this paper is only a reflection of current practice, without providing a clear direction forward. In fact, the simplification in this paper (reducing the search space to the number of ops to apply and the shared magnitude of ops) seems like overkill. By doing that, it misses an opportunity to answer some interesting questions, such as: "Is assigning a different magnitude to different ops useful at all in auto data augmentation?"
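To make the reviewer's point concrete, the sketch below contrasts the reduced search space described above (a number of ops plus one shared magnitude) with the per-op-magnitude variant the review asks about. The op names, the magnitude levels, and all function names are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import random

# Hypothetical op names; a real pipeline would map these to image transforms.
OPS = ["identity", "rotate", "shear_x", "translate_y", "contrast", "solarize"]


def sample_policy(num_ops, magnitude, rng):
    """Reduced search space: pick num_ops ops uniformly, all sharing one magnitude."""
    return [(rng.choice(OPS), magnitude) for _ in range(num_ops)]


def sample_policy_per_op(num_ops, magnitudes, rng):
    """Variant raised in the review: every op carries its own magnitude."""
    chosen = [rng.choice(OPS) for _ in range(num_ops)]
    return [(name, magnitudes[name]) for name in chosen]


def grid_size_shared(num_ops_choices, magnitude_levels):
    """Grid points to search when one magnitude is shared by all ops."""
    return num_ops_choices * magnitude_levels


def grid_size_per_op(num_ops_choices, magnitude_levels, num_op_types):
    """Grid points to search when each op type has an independent magnitude."""
    return num_ops_choices * magnitude_levels ** num_op_types


rng = random.Random(0)
shared = sample_policy(num_ops=2, magnitude=9, rng=rng)
per_op = sample_policy_per_op(2, {name: i for i, name in enumerate(OPS)}, rng)

size_shared = grid_size_shared(3, 10)
size_per_op = grid_size_per_op(3, 10, len(OPS))
```

Under these assumed settings (3 choices for the number of ops, 10 magnitude levels, 6 op types), the shared-magnitude grid has 30 points while the per-op grid has 3,000,000, which illustrates why answering the reviewer's question would require a more efficient search than a plain grid.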
Review for NeurIPS paper: RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
This paper got mixed reviews. The original ratings are 6, 5, 5, 6. On the positive side, reviewers think the paper solves an important problem. Data augmentation is recognized to be an important step for improving machine learning model performance. However, existing auto data augmentation methods are typically very costly.
Iterate to Accelerate: A Unified Framework for Iterative Reasoning and Feedback Convergence
Iterative methods lie at the heart of numerous optimization and reasoning algorithms, ranging from classical mirror descent and dynamic programming to modern deep learning architectures that exhibit chain-of-thought reasoning. Traditional acceleration techniques, such as Nesterov's momentum, have shown that carefully designed iterative schemes can significantly improve convergence rates in convex settings. However, many practical applications operate in non-Euclidean spaces and are subject to state-dependent perturbations or even adversarial disturbances, motivating the need for a more general analysis. In this work, we develop a comprehensive framework that unifies a wide class of iterative reasoning processes using the language of Bregman divergences.
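For readers unfamiliar with the term, a Bregman divergence generated by a differentiable convex function phi is D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. The sketch below evaluates this standard definition for two common generators; it illustrates the definition only and does not reproduce the paper's framework. The helper names are assumptions.

```python
import math


def bregman(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


def half_sq_norm(x):
    """phi(x) = 0.5 * ||x||^2, which generates half the squared Euclidean distance."""
    return 0.5 * sum(v * v for v in x)


def half_sq_norm_grad(x):
    return list(x)


def neg_entropy(x):
    """phi(x) = sum x_i log x_i, which generates the KL divergence on the simplex."""
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]


x = [0.2, 0.8]
y = [0.5, 0.5]
d_euclid = bregman(half_sq_norm, half_sq_norm_grad, x, y)
d_kl = bregman(neg_entropy, neg_entropy_grad, x, y)
kl_direct = sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))
```

The first generator recovers the Euclidean geometry used by gradient descent, and the second recovers the geometry of entropic mirror descent, which is one way to see how a single divergence-based analysis can cover both Euclidean and non-Euclidean iterations.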
Adaptation of Task Goal States from Prior Knowledge
Costinescu, Andrei, Burschka, Darius
This paper presents a framework to define a task with freedom and variability in its goal state. A robot could use this to observe the execution of a task and target a different goal from the observed one; a goal that is still compatible with the task description but would be easier for the robot to execute. We define the model of an environment state and an environment variation, and present experiments on how to interactively create the variation from a single task demonstration and how to use this variation to create an execution plan for bringing any environment into the goal state.
Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging
Reasoning capabilities represent a critical frontier for large language models (LLMs), but developing them requires extensive proprietary datasets and computational resources. One way to supplement capabilities efficiently is model merging, which offers a promising alternative by combining multiple models without retraining. However, current merging approaches rely on manually designed strategies for merging hyperparameters, limiting the exploration of potential model combinations and requiring significant human effort. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies while reducing costs through multi-fidelity approximations. We support both single- and multi-objective optimization and introduce two novel search spaces: layerwise fusion (LFS) and depth-wise integration (DIS). Evaluating across a number of benchmarks, we find that the search autonomously finds 1) merges that further boost single-objective performance, even on tasks the model has already been finetuned on, and 2) merges that optimize multi-objective frontiers across tasks. Effective merges are found with limited compute, e.g., in fewer than 500 search steps.
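As a rough illustration of what a merging hyperparameter can look like, the sketch below interpolates two models' weights layer by layer, with one coefficient per layer. This is a plain linear interpolation written for illustration; it is not the paper's LFS or DIS search space, and the toy models, layer names, and function name are assumptions.

```python
def merge_layerwise(model_a, model_b, alphas):
    """Interpolate two models layer by layer.

    model_a, model_b: dict mapping layer name -> list of float weights.
    alphas: dict mapping layer name -> weight placed on model_a, in [0, 1].
    """
    merged = {}
    for name, weights_a in model_a.items():
        weights_b = model_b[name]
        alpha = alphas[name]
        merged[name] = [
            alpha * wa + (1.0 - alpha) * wb
            for wa, wb in zip(weights_a, weights_b)
        ]
    return merged


# Toy two-layer "models"; real checkpoints would hold tensors instead of lists.
model_a = {"layer0": [1.0, 2.0], "layer1": [0.0, 4.0]}
model_b = {"layer0": [3.0, 0.0], "layer1": [2.0, 0.0]}

# One coefficient per layer: these are the values an automated search would tune.
merged = merge_layerwise(model_a, model_b, {"layer0": 0.5, "layer1": 1.0})
```

With one coefficient per layer, the number of hyperparameters grows with model depth, which suggests why hand-tuning becomes impractical and an automated search is attractive.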
Review for NeurIPS paper: Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces
Additional Feedback: Algorithm 2. X_t is never defined. I assumed that X_t is defined by Equation 2, as in Algorithm 1. The authors mention the same computational budget for acquisition function optimization. What is the optimizer, though? Constrained optimization of the acquisition function inside H_t (Equation 3) does not seem trivial. It isn't mentioned anywhere how the acquisition function was optimized.
Review for NeurIPS paper: Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces
The paper was discussed after the rebuttal, which the reviewers found useful and actionable (e.g., regarding concerns about the confidence bound). The paper is recommended for acceptance. All reviewers have acknowledged that the paper is well motivated, well written and establishes a nice interplay between theory and a practical problem of interest.
The Logical Implication Steering Method for Conditional Interventions on Transformer Generation
The field of mechanistic interpretability in pre-trained transformer models has demonstrated substantial evidence supporting the "linear representation hypothesis", which is the idea that high-level concepts are encoded as vectors in the space of activations of a model. Studies also show that model generation behavior can be steered toward a given concept by adding the concept's vector to the corresponding activations. We show how to leverage these properties to build a form of logical implication into models, enabling transparent and interpretable adjustments that induce a chosen generation behavior in response to the presence of any given concept. Our method, Logical Implication Model Steering (LIMS), unlocks new hand-engineered reasoning capabilities by integrating neuro-symbolic logic into pre-trained transformer models.
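The two ingredients named in the abstract, detecting a concept as a direction in activation space and steering by adding a vector, can be combined into a toy "if concept, then behavior" rule. The sketch below is an illustrative assumption, not the LIMS method: the dot-product gate, the threshold, and all names are hypothetical, and the activations are plain Python lists rather than real model states.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def steer_if_present(activation, concept_vec, behavior_vec, threshold, strength=1.0):
    """Add behavior_vec to the activation only when the concept is detected.

    Detection here is a simple projection test (an assumption made for this
    sketch): the concept counts as present when <activation, concept_vec>
    exceeds the threshold.
    """
    if dot(activation, concept_vec) > threshold:
        return [a + strength * b for a, b in zip(activation, behavior_vec)]
    return list(activation)


concept = [1.0, 0.0]    # hypothetical direction encoding the triggering concept
behavior = [0.0, 1.0]   # hypothetical direction encoding the desired behavior

steered = steer_if_present([2.0, 0.0], concept, behavior, threshold=1.0)
unchanged = steer_if_present([0.5, 0.0], concept, behavior, threshold=1.0)
```

Because the intervention is an explicit rule over named vectors, it stays inspectable: the concept direction, the threshold, and the added vector can each be read off and adjusted independently.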