Goto

Collaborating Authors

 Materials


Nudging: Inference-time Alignment via Model Collaboration

arXiv.org Artificial Intelligence

Large language models (LLMs) require alignment, such as instruction-tuning or reinforcement learning from human feedback, to effectively and safely follow user instructions. This process necessitates training aligned versions for every model size in each model family, resulting in significant computational overhead. In this work, we propose nudging, a simple, plug-and-play, and training-free algorithm that aligns any base model at inference time using a small aligned model. Nudging is motivated by recent findings that alignment primarily alters the model's behavior on a small subset of stylistic tokens, such as "Sure" or "Thank". We find that base models are significantly more uncertain when generating these tokens. Leveraging this observation, nudging employs a small aligned model to generate nudging tokens to steer the large base model's output toward desired directions when the base model's uncertainty is high. We evaluate the effectiveness of nudging across 3 model families and 13 tasks, covering reasoning, general knowledge, instruction following, and safety benchmarks. Without any additional training, nudging a large base model with a 7x - 14x smaller aligned model achieves zero-shot performance comparable to, and sometimes surpassing, that of large aligned models. For example, nudging OLMo-7b with OLMo-1b-instruct, affecting less than 9% of tokens, achieves a 10% absolute improvement on GSM8K over OLMo-7b-instruct. Unlike prior inference-time tuning methods, nudging enables off-the-shelf collaboration between model families. For instance, nudging Gemma-2-27b with Llama-2-7b-chat outperforms Llama-2-70b-chat on various tasks. Overall, this work introduces a simple yet powerful approach to token-level model collaboration, offering a modular solution to LLM alignment. Our project website: https://fywalter.github.io/nudging/ .


MoIN: Mixture of Introvert Experts to Upcycle an LLM

arXiv.org Artificial Intelligence

The goal of this paper is to improve (upcycle) an existing large language model without the prohibitive requirements of continued pre-training of the full-model. The idea is to split the pre-training data into semantically relevant groups and train an expert on each subset. An expert takes the form of a lightweight adapter added on the top of a frozen base model. During inference, an incoming query is first routed to the most relevant expert which is then loaded onto the base model for the forward pass. Unlike typical Mixture of Experts (MoE) models, the experts in our method do not work with other experts for a single query. Hence, we dub them "introvert" experts. Freezing the base model and keeping the experts as lightweight adapters allows extreme parallelism during training and inference. Training of all experts can be done in parallel without any communication channels between them. Similarly, the inference can also be heavily parallelized by distributing experts on different GPUs and routing each request to the GPU containing its relevant expert. We implement a proof-of-concept version of this method and show the validity of our approach.


Technical Design Review of Duke Robotics Club's Oogway: An AUV for RoboSub 2024

arXiv.org Artificial Intelligence

The Duke Robotics Club is proud to present our robot for the 2024 RoboSub Competition: Oogway. Now in its second year, Oogway has been dramatically upgraded in both its capabilities and reliability. Oogway was built on the principle of independent, well-integrated, and reliable subsystems. Individual components and subsystems were tested and designed separately. Oogway's most advanced capabilities are a result of the tight integration between these subsystems. Such examples include a re-envisioned controls system, an entirely new electrical stack, advanced sonar integration, additional cameras and system monitoring, a new marker dropper, and a watertight capsule mechanism. These additions enabled Oogway to prequalify for Robosub 2024.


Many-body Expansion Based Machine Learning Models for Octahedral Transition Metal Complexes

arXiv.org Artificial Intelligence

Graph-based machine learning models for materials properties show great potential to accelerate virtual high-throughput screening of large chemical spaces. However, in their simplest forms, graph-based models do not include any 3D information and are unable to distinguish stereoisomers such as those arising from different orderings of ligands around a metal center in coordination complexes. In this work we present a modification to revised autocorrelation descriptors, our molecular graph featurization method for machine learning various spin state dependent properties of octahedral transition metal complexes (TMCs). Inspired by analytical semi-empirical models for TMCs, the new modeling strategy is based on the many-body expansion (MBE) and allows one to tune the captured stereoisomer information by changing the truncation order of the MBE. We present the necessary modifications to include this approach in two commonly used machine learning methods, kernel ridge regression and feed-forward neural networks. On a test set composed of all possible isomers of binary transition metal complexes, the best MBE models achieve mean absolute errors of 2.75 kcal/mol on spin-splitting energies and 0.26 eV on frontier orbital energy gaps, a 30-40% reduction in error compared to models based on our previous approach. We also observe improved generalization to previously unseen ligands where the best-performing models exhibit mean absolute errors of 4.00 kcal/mol (i.e., a 0.73 kcal/mol reduction) on the spin-splitting energies and 0.53 eV (i.e., a 0.10 eV reduction) on the frontier orbital energy gaps. Because the new approach incorporates insights from electronic structure theory, such as ligand additivity relationships, these models exhibit systematic generalization from homoleptic to heteroleptic complexes, allowing for efficient screening of TMC search spaces.


A Systematic Assessment of OpenAI o1-Preview for Higher Order Thinking in Education

arXiv.org Artificial Intelligence

As artificial intelligence (AI) continues to advance, it demonstrates capabilities comparable to human intelligence, with significant potential to transform education and workforce development. This study evaluates OpenAI o1-preview's ability to perform higher-order cognitive tasks across 14 dimensions, including critical thinking, systems thinking, computational thinking, design thinking, metacognition, data literacy, creative thinking, abstract reasoning, quantitative reasoning, logical reasoning, analogical reasoning, and scientific reasoning. We used validated instruments like the Ennis-Weir Critical Thinking Essay Test and the Biological Systems Thinking Test to compare the o1-preview's performance with human performance systematically. Our findings reveal that o1-preview outperforms humans in most categories, achieving 150% better results in systems thinking, computational thinking, data literacy, creative thinking, scientific reasoning, and abstract reasoning. However, compared to humans, it underperforms by around 25% in logical reasoning, critical thinking, and quantitative reasoning. In analogical reasoning, both o1-preview and humans achieved perfect scores. Despite these strengths, the o1-preview shows limitations in abstract reasoning, where human psychology students outperform it, highlighting the continued importance of human oversight in tasks requiring high-level abstraction. These results have significant educational implications, suggesting a shift toward developing human skills that complement AI, such as creativity, abstract reasoning, and critical thinking. This study emphasizes the transformative potential of AI in education and calls for a recalibration of educational goals, teaching methods, and curricula to align with an AI-driven world.


Black Boxes and Looking Glasses: Multilevel Symmetries, Reflection Planes, and Convex Optimization in Deep Networks

arXiv.org Machine Learning

We show that training deep neural networks (DNNs) with absolute value activation and arbitrary input dimension can be formulated as equivalent convex Lasso problems with novel features expressed using geometric algebra. This formulation reveals geometric structures encoding symmetry in neural networks. Using the equivalent Lasso form of DNNs, we formally prove a fundamental distinction between deep and shallow networks: deep networks inherently favor symmetric structures in their fitted functions, with greater depth enabling multilevel symmetries, i.e., symmetries within symmetries. Moreover, Lasso features represent distances to hyperplanes that are reflected across training points. These reflection hyperplanes are spanned by training data and are orthogonal to optimal weight vectors. Numerical experiments support theory and demonstrate theoretically predicted features when training networks using embeddings generated by Large Language Models. Recent advancements have demonstrated that deep neural networks are powerful models that can perform tasks including natural language processing, synthetic data and image generation, classification, and regression. However, research literature still lacks in intuitively understanding why deep networks are so powerful: what they "look for" in data, or in other words, how each layer extracts features. We are interested in the following question: Is there a fundamental difference in the nature of functions learned by deep networks, as opposed to shallow networks? We answer this question by transforming non-convex training problems into convex formulations and analyzing their structure.


Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation

Neural Information Processing Systems

Recently, utilizing reinforcement learning (RL) to generate molecules with desired properties has been highlighted as a promising strategy for drug design. Molecular docking program -- a physical simulation that estimates protein-small molecule binding affinity -- can be an ideal reward scoring function for RL, as it is a straightforward proxy of the therapeutic potential. Still, two imminent challenges exist for this task. First, the models often fail to generate chemically realistic and pharmacochemically acceptable molecules. Second, the docking score optimization is a difficult exploration problem that involves many local optima and less smooth surface with respect to molecular structure.


A Catalyst Framework for Minimax Optimization

Neural Information Processing Systems

We introduce a generic \emph{two-loop} scheme for smooth minimax optimization with strongly-convex-concave objectives. Despite its simplicity, this leads to a family of near-optimal algorithms with improved complexity over all existing methods designed for strongly-convex-concave minimax problems. Additionally, we obtain the first variance-reduced algorithms for this class of minimax problems with finite-sum structure and establish even faster convergence rate. Furthermore, when extended to the nonconvex-concave minimax optimization, our algorithm again achieves the state-of-the-art complexity for finding a stationary point. We carry out several numerical experiments showcasing the superiority of the Catalyst framework in practice.


Depth-First Proof-Number Search with Heuristic Edge Cost and Application to Chemical Synthesis Planning

Neural Information Processing Systems

Search techniques, such as Monte Carlo Tree Search (MCTS) and Proof-Number Search (PNS), are effective in playing and solving games. However, the understanding of their performance in industrial applications is still limited. We investigate MCTS and Depth-First Proof-Number (DFPN) Search, a PNS variant, in the domain of Retrosynthetic Analysis (RA). We find that DFPN's strengths, that justify its success in games, have limited value in RA, and that an enhanced MCTS variant by Segler et al. significantly outperforms DFPN. We address this disadvantage of DFPN in RA with a novel approach to combine DFPN with Heuristic Edge Initialization. Our new search algorithm DFPN-E outperforms the enhanced MCTS in search time by a factor of 3 on average, with comparable success rates.


OWPCP: A Deep Learning Model to Predict Octanol-Water Partition Coefficient

arXiv.org Artificial Intelligence

The physicochemical properties of chemical compounds have great importance in several areas, including pharmaceuticals, environmental and separation science. Among these are physicochemical properties such as the octanol-water partition coefficient, which has been considered an important index pointing out lipophilicity and hydrophilicity. It affects drug absorption and membrane permeability. Following Lipinski's rule of five, logP was identified as one of the key determinants of the stability of chemical entities and, as such, needed state-of-the-art methods for measuring lipophilicity. This paper presents a deep-learning model, OWPCP, developed to compute logP using Morgan fingerprints and MACCS keys as input features. It uses the interconnection of such molecular representations with logP values extracted from 26,254 compounds. The dataset was prepared to contain a wide range of chemical structures with differing molecular weights and polar surface area. Hyperparameter optimization was conducted using the Keras Tuner alongside the Hyperband algorithm to enhance the performance. OWPCP demonstrated outstanding performance compared to current computational methods, achieving an MAE=0.247 on the test set and outperforming all previous DL models. Remarkably, while one of the most accurate recent models is based on experimental data on retention time to make predictions, OWPCP manages computing logP efficiently without depending on these factors, being, therefore, very useful during early-stage drug discovery. Our model outperforms the best model, which leverages Retention Time, and our model does not require any experimental data. Further validation of the model performance was done across different functional groups, and it showed very high accuracy, especially for compounds that contain aliphatic OH groups. The results have indicated that OWPCP provides a reliable prediction of logP.