Search
Evolving Interpretable Visual Classifiers with Large Language Models
Chiquier, Mia, Mall, Utkarsh, Vondrick, Carl
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance. However, vision-language models, which compute similarity scores between images and class labels, are largely black-box, with limited interpretability, risk for bias, and inability to discover new visual concepts not written down. Moreover, in practical settings, the vocabulary for class names and attributes of specialized concepts will not be known, preventing these methods from performing well on images uncommon in large-scale vision-language datasets. To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition. We introduce an evolutionary search algorithm that uses a large language model and its in-context learning abilities to iteratively mutate a concept bottleneck of attributes for classification. Our method produces state-of-the-art, interpretable fine-grained classifiers. We outperform the latest baselines by 18.4% on five fine-grained iNaturalist datasets and by 22.2% on two KikiBouba datasets, despite the baselines having access to privileged information about class names.
Mitigating Hallucination in Abstractive Summarization with Domain-Conditional Mutual Information
Chae, Kyubyung, Choi, Jaepill, Jo, Yohan, Kim, Taesup
A primary challenge in abstractive summarization is hallucination -- the phenomenon where a model generates plausible text that is absent in the source text. We hypothesize that the domain (or topic) of the source text triggers the model to generate text that is highly probable in the domain, neglecting the details of the source text. To alleviate this model bias, we introduce a decoding strategy based on domain-conditional pointwise mutual information. This strategy adjusts the generation probability of each token by comparing it with the token's marginal probability within the domain of the source text. According to evaluation on the XSUM dataset, our method demonstrates improvement in terms of faithfulness and source relevance. The code is publicly available at \url{https://github.com/qqplot/dcpmi}.
TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models
Sun, Haojun, Tang, Chen, Wang, Zhi, Meng, Yuan, jiang, Jingyan, Ma, Xinzhu, Zhu, Wenwu
Diffusion models have emerged as preeminent contenders in the realm of generative models. Distinguished by their distinctive sequential generative processes, characterized by hundreds or even thousands of timesteps, diffusion models progressively reconstruct images from pure Gaussian noise, with each timestep necessitating full inference of the entire model. However, the substantial computational demands inherent to these models present challenges for deployment, quantization is thus widely used to lower the bit-width for reducing the storage and computing overheads. Current quantization methodologies primarily focus on model-side optimization, disregarding the temporal dimension, such as the length of the timestep sequence, thereby allowing redundant timesteps to continue consuming computational resources, leaving substantial scope for accelerating the generative process. In this paper, we introduce TMPQ-DM, which jointly optimizes timestep reduction and quantization to achieve a superior performance-efficiency trade-off, addressing both temporal and model optimization aspects. For timestep reduction, we devise a non-uniform grouping scheme tailored to the non-uniform nature of the denoising process, thereby mitigating the explosive combinations of timesteps. In terms of quantization, we adopt a fine-grained layer-wise approach to allocate varying bit-widths to different layers based on their respective contributions to the final generative performance, thus rectifying performance degradation observed in prior studies. To expedite the evaluation of fine-grained quantization, we further devise a super-network to serve as a precision solver by leveraging shared quantization results. These two design components are seamlessly integrated within our framework, enabling rapid joint exploration of the exponentially large decision space via a gradient-free evolutionary search algorithm.
Monte Carlo Search Algorithms Discovering Monte Carlo Tree Search Exploration Terms
Monte Carlo Tree Search and Monte Carlo Search have good results for many combinatorial problems. In this paper we propose to use Monte Carlo Search to design mathematical expressions that are used as exploration terms for Monte Carlo Tree Search algorithms. The optimized Monte Carlo Tree Search algorithms are PUCT and SHUSS. We automatically design the PUCT and the SHUSS root exploration terms. For small search budgets of 32 evaluations the discovered root exploration terms make both algorithms competitive with usual PUCT.
Integrating Hyperparameter Search into Model-Free AutoML with Context-Free Grammars
Vรกzquez, Hernรกn Ceferino, Sanchez, Jorge, Carrascosa, Rafael
Automated Machine Learning (AutoML) has become increasingly popular in recent years due to its ability to reduce the amount of time and expertise required to design and develop machine learning systems. This is very important for the practice of machine learning, as it allows building strong baselines quickly, improving the efficiency of the data scientists, and reducing the time to production. However, despite the advantages of AutoML, it faces several challenges, such as defining the solutions space and exploring it efficiently. Recently, some approaches have been shown to be able to do it using tree-based search algorithms and context-free grammars. In particular, GramML presents a model-free reinforcement learning approach that leverages pipeline configuration grammars and operates using Monte Carlo tree search. However, one of the limitations of GramML is that it uses default hyperparameters, limiting the search problem to finding optimal pipeline structures for the available data preprocessors and models. In this work, we propose an extension to GramML that supports larger search spaces including hyperparameter search. We evaluated the approach using an OpenML benchmark and found significant improvements compared to other state-of-the-art techniques.
Hierarchical Experience-informed Navigation for Multi-modal Quadrupedal Rebar Grid Traversal
Asselmeier, Max, Ivanova, Jane, Zhou, Ziyi, Vela, Patricio A., Zhao, Ye
This study focuses on a layered, experience-based, multi-modal contact planning framework for agile quadrupedal locomotion over a constrained rebar environment. To this end, our hierarchical planner incorporates locomotion-specific modules into the high-level contact sequence planner and solves kinodynamically-aware trajectory optimization as the low-level motion planner. Through quantitative analysis of the experience accumulation process and experimental validation of the kinodynamic feasibility of the generated locomotion trajectories, we demonstrate that the experience planning heuristic offers an effective way of providing candidate footholds for a legged contact planner. Additionally, we introduce a guiding torso path heuristic at the global planning level to enhance the navigation success rate in the presence of environmental obstacles. Our results indicate that the torso-path guided experience accumulation requires significantly fewer offline trials to successfully reach the goal compared to regular experience accumulation. Finally, our planning framework is validated in both dynamics simulations and real hardware implementations on a quadrupedal robot provided by Skymul Inc.
Box Facets and Cut Facets of Lifted Multicut Polytopes
Naumann, Lucas Fabian, Irmai, Jannik, Zhao, Shengxian, Andres, Bjoern
The lifted multicut problem is a combinatorial optimization problem whose feasible solutions relate one-to-one to the decompositions of a graph $G = (V, E)$. Given an augmentation $\widehat{G} = (V, E \cup F)$ of $G$ and given costs $c \in \mathbb{R}^{E \cup F}$, the objective is to minimize the sum of those $c_{uw}$ with $uw \in E \cup F$ for which $u$ and $w$ are in distinct components. For $F = \emptyset$, the problem specializes to the multicut problem, and for $E = \tbinom{V}{2}$ to the clique partitioning problem. We study a binary linear program formulation of the lifted multicut problem. More specifically, we contribute to the analysis of the associated lifted multicut polytopes: Firstly, we establish a necessary, sufficient and efficiently decidable condition for a lower box inequality to define a facet. Secondly, we show that deciding whether a cut inequality of the binary linear program defines a facet is NP-hard.
Minimax Optimal Goodness-of-Fit Testing with Kernel Stein Discrepancy
Hagrass, Omar, Sriperumbudur, Bharath, Balasubramanian, Krishnakumar
We explore the minimax optimality of goodness-of-fit tests on general domains using the kernelized Stein discrepancy (KSD). The KSD framework offers a flexible approach for goodness-of-fit testing, avoiding strong distributional assumptions, accommodating diverse data structures beyond Euclidean spaces, and relying only on partial knowledge of the reference distribution, while maintaining computational efficiency. We establish a general framework and an operator-theoretic representation of the KSD, encompassing many existing KSD tests in the literature, which vary depending on the domain. We reveal the characteristics and limitations of KSD and demonstrate its non-optimality under a certain alternative space, defined over general domains when considering $\chi^2$-divergence as the separation metric. To address this issue of non-optimality, we propose a modified, minimax optimal test by incorporating a spectral regularizer, thereby overcoming the shortcomings of standard KSD tests. Our results are established under a weak moment condition on the Stein kernel, which relaxes the bounded kernel assumption required by prior work in the analysis of kernel-based hypothesis testing. Additionally, we introduce an adaptive test capable of achieving minimax optimality up to a logarithmic factor by adapting to unknown parameters. Through numerical experiments, we illustrate the superior performance of our proposed tests across various domains compared to their unregularized counterparts.
RoT: Enhancing Large Language Models with Reflection on Search Trees
Hui, Wenyang, Jiang, Chengyue, Wang, Yan, Tu, Kewei
Large language models (LLMs) have demonstrated impressive capability in reasoning and planning when integrated with tree-search-based prompting methods. However, since these methods ignore the previous search experiences, they often make the same mistakes in the search process. To address this issue, we introduce Reflection on search Trees (RoT), an LLM reflection framework designed to improve the performance of tree-search-based prompting methods. It uses a strong LLM to summarize guidelines from previous tree search experiences to enhance the ability of a weak LLM. The guidelines are instructions about solving this task through tree search which can prevent the weak LLMs from making similar mistakes in the past search process. In addition, we proposed a novel state selection method, which identifies the critical information from historical search processes to help RoT generate more specific and meaningful guidelines. In our extensive experiments, we find that RoT significantly improves the performance of LLMs in reasoning or planning tasks with various tree-search-based prompting methods (e.g., BFS and MCTS). Non-tree-search-based prompting methods such as Chain-of-Thought (CoT) can also benefit from RoT guidelines since RoT can provide task-specific knowledge collected from the search experience.
Monte Carlo Tree Search with Boltzmann Exploration
Painter, Michael, Baioumy, Mohamed, Hawes, Nick, Lacerda, Bruno
Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the maximum entropy principle into an MCTS approach, utilising Boltzmann policies to sample actions, naturally encouraging more exploration. In this paper, we highlight a major limitation of MENTS: optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), that address these limitations and preserve the benefits of Boltzmann policies, such as allowing actions to be sampled faster by using the Alias method. Our empirical analysis shows that our algorithms show consistent high performance across several benchmark domains, including the game of Go.