Zhu, Zhigang
VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers
Qi, Jianing, Tang, Hao, Zhu, Zhigang
Recent advancements in test time compute, particularly through the use of verifier models, have significantly enhanced the reasoning capabilities of Large Language Models (LLMs). This generator-verifier approach closely resembles the actor-critic framework in reinforcement learning (RL). However, current verifier models in LLMs often rely on supervised fine-tuning without temporal difference learning such as Q-learning. This paper introduces VerifierQ, a novel approach that integrates Offline Q-learning into LLM verifier models. We address three key challenges in applying Q-learning to LLMs: (1) handling utterance-level Markov Decision Processes (MDPs), (2) managing large action spaces, and (3) mitigating overestimation bias. VerifierQ introduces a modified Bellman update for bounded Q-values, incorporates Implicit Q-learning (IQL) for efficient action space management, and integrates a novel Conservative Q-learning (CQL) formulation for balanced Q-value estimation. Our method enables parallel Q-value computation and improves training efficiency. While recent work has explored RL techniques like MCTS for generators, VerifierQ is among the first to investigate the verifier (critic) aspect in LLMs through Q-learning. This integration of RL principles into verifier models complements existing advancements in generator techniques, potentially enabling more robust and adaptive reasoning in LLMs. Experimental results on mathematical reasoning tasks demonstrate VerifierQ's superior performance compared to traditional supervised fine-tuning approaches, with improvements in efficiency, accuracy and robustness. By enhancing the synergy between generation and evaluation capabilities, VerifierQ contributes to the ongoing evolution of AI systems in addressing complex cognitive tasks across various domains.
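As a rough illustration of the Implicit Q-learning component mentioned in the abstract above, the sketch below computes the standard expectile regression loss used in IQL, which fits a value estimate toward an upper expectile of the Q-values without enumerating actions. It is not VerifierQ's actual formulation: the function name `expectile_loss`, the default `tau=0.7`, and the use of NumPy arrays are assumptions made only for this example.

```python
import numpy as np

def expectile_loss(q_values, v_values, tau=0.7):
    """Standard IQL-style expectile loss (illustrative, not the paper's code).

    Each squared difference (Q - V) is weighted by tau when Q >= V and by
    (1 - tau) when Q < V, so tau > 0.5 pulls V toward the larger Q-values.
    """
    diff = np.asarray(q_values, dtype=float) - np.asarray(v_values, dtype=float)
    # Asymmetric weight: |tau - 1(diff < 0)|
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))
```

With `tau=0.5` the weighting is symmetric and the loss reduces to half the mean squared error; larger values of `tau` penalize underestimating the Q-values more heavily.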
Robots in the Garden: Artificial Intelligence and Adaptive Landscapes
Zhang, Zihao, Epstein, Susan L., Breen, Casey, Xia, Sophia, Zhu, Zhigang, Volkmann, Christian
This paper introduces ELUA, the Ecological Laboratory for Urban Agriculture, a collaboration among landscape architects, architects and computer scientists who specialize in artificial intelligence, robotics and computer vision. ELUA has two gantry robots, one indoors and the other outside on the rooftop of a 6-story campus building. Each robot can seed, water, weed, and prune in its garden. To support responsive landscape research, ELUA also includes sensor arrays, an AI-powered camera, and an extensive network infrastructure. This project demonstrates a way to integrate artificial intelligence into an evolving urban ecosystem, and encourages landscape architects to develop an adaptive design framework where design becomes a long-term engagement with the environment.
Improving Dense Crowd Counting Convolutional Neural Networks using Inverse k-Nearest Neighbor Maps and Multiscale Upsampling
Olmschenk, Greg, Tang, Hao, Zhu, Zhigang
Gatherings of thousands to millions of people occur frequently for an enormous variety of events, and automated counting of these high density crowds is used for safety, management, and measuring significance of these events. In this work, we show that the regularly accepted labeling scheme of crowd density maps for training deep neural networks is less effective than our alternative inverse k-nearest neighbor (ikNN) maps, even when used directly in existing state-of-the-art network structures. We also provide a new network architecture MUD-ikNN, which uses multi-scale upsampling via transposed convolutions to take full advantage of the provided ikNN labeling. This upsampling combined with the ikNN maps further outperforms the existing state-of-the-art methods. The full label comparison emphasizes the importance of the labeling scheme, with the ikNN labeling being particularly effective. We demonstrate the accuracy of our MUD-ikNN network and the ikNN labeling scheme on a variety of datasets.
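To make the idea of an inverse k-nearest neighbor label map concrete, here is a minimal sketch. It assumes the map value at each pixel is 1 / (d + 1), where d is the mean Euclidean distance from that pixel to its k nearest annotated head positions; that formula, the function name `iknn_map`, and the (row, column) coordinate convention are assumptions for illustration, since the abstract does not spell out the definition. The sketch also assumes at least one head annotation is present.

```python
import numpy as np

def iknn_map(head_positions, height, width, k=1):
    """Illustrative inverse kNN map: 1 / (mean distance to k nearest heads + 1).

    head_positions: iterable of (row, col) head annotations (assumed non-empty).
    Returns an array of shape (height, width) with values in (0, 1].
    """
    heads = np.asarray(head_positions, dtype=float)            # (n, 2)
    rows, cols = np.mgrid[0:height, 0:width]
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    # Pairwise Euclidean distances between every pixel and every head: (H*W, n)
    dists = np.linalg.norm(pixels[:, None, :] - heads[None, :, :], axis=2)
    k = min(k, heads.shape[0])                                  # cannot exceed head count
    knn_dist = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    return (1.0 / (knn_dist + 1.0)).reshape(height, width)
```

Unlike a density map built from narrow Gaussian kernels, a map of this form is nonzero everywhere and peaks at each annotated head, so pixels far from any annotation still carry a gradient signal. The brute-force distance computation here is quadratic in memory and only suitable for small images.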
Generalizing semi-supervised generative adversarial networks to regression
Olmschenk, Greg, Zhu, Zhigang, Tang, Hao
In this work, we generalize semi-supervised generative adversarial networks (GANs) from classification problems to regression problems. In the last few years, the importance of improving the training of neural networks using semi-supervised training has been demonstrated for classification problems. With probabilistic classification being a subset of regression problems, this generalization opens up many new possibilities for the use of semi-supervised GANs as well as presenting an avenue for a deeper understanding of how they function. We first demonstrate the capabilities of semi-supervised regression GANs on a toy dataset which allows for a detailed understanding of how they operate in various circumstances. This toy dataset is used to provide a theoretical basis of the semi-supervised regression GAN. We then apply the semi-supervised regression GANs to the real-world application of age estimation from single images. We perform extensive tests of what accuracies can be achieved with significantly reduced annotated data. Through the combination of the theoretical example and real-world scenario, we demonstrate how semi-supervised GANs can be generalized to regression problems.