Collaborating Authors

 Hogg, Tad


Humanity's Last Exam

arXiv.org Artificial Intelligence

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 3,000 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.


Chemical Power for Swarms of Microscopic Robots in Blood Vessels

arXiv.org Artificial Intelligence

Microscopic robots in the bloodstream could obtain power from fuel cells using glucose and oxygen. Previous studies of small numbers of such robots operating near each other showed how robots compete with their neighbors for oxygen. However, proposed applications involve billions of such robots operating throughout the body. With such large numbers, the robots can have systemic effects on oxygen concentration. This paper evaluates these effects and their consequences for robot power generation, oxygen available to tissue and heating as such robots move with the blood. When robots consume oxygen as fast as it diffuses to their surfaces, available power decreases significantly as robots move from the lungs, through arteries to capillaries and veins. Tens of billions of robots can obtain hundreds of picowatts throughout the circuit, while a trillion robots significantly deplete oxygen in the veins. Robots can mitigate this depletion by limiting their oxygen consumption, either overall or in specific locations or situations.
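The "hundreds of picowatts" figure follows from a standard diffusion-limited uptake argument, which can be sketched with a back-of-envelope calculation. This is not the paper's full model; the diffusion coefficient, oxygen concentration, and energy yield per mole of oxygen below are illustrative assumptions.

```python
import math

# Hedged sketch: maximum power for a spherical robot whose oxygen uptake is
# diffusion-limited. All parameter values are illustrative assumptions, not
# the paper's fitted numbers.
D_O2 = 2e-9        # oxygen diffusion coefficient in plasma, m^2/s (assumed)
R = 1e-6           # robot radius, m (micron-scale, per the abstract)
C_O2 = 0.05        # free oxygen concentration, mol/m^3 (illustrative)
E_PER_O2 = 4.8e5   # usable energy per mole of O2 from glucose oxidation, J/mol (assumed)

def diffusion_limited_power(D, radius, conc, energy_per_mol):
    """Maximum O2 flux to a fully absorbing sphere is 4*pi*D*R*c
    (the Smoluchowski rate); multiply by the energy released per mole
    of O2 consumed to get available power."""
    flux = 4 * math.pi * D * radius * conc  # mol/s
    return flux * energy_per_mol            # W

p = diffusion_limited_power(D_O2, R, C_O2, E_PER_O2)
print(f"{p * 1e12:.0f} pW")  # on the order of hundreds of picowatts
```

With these assumed values the estimate lands in the hundreds-of-picowatts range quoted in the abstract; lower venous oxygen concentrations shrink it proportionally, which is the depletion effect the paper quantifies.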


Quantum-assisted associative adversarial network: Applying quantum annealing in deep learning

arXiv.org Machine Learning

We present an algorithm for learning a latent variable generative model via generative adversarial learning where the canonical uniform noise input is replaced by samples from a graphical model. This graphical model is learned by a Boltzmann machine which learns low-dimensional feature representation of data extracted by the discriminator. A quantum annealer, the D-Wave 2000Q, is used to sample from this model. This algorithm joins a growing family of algorithms that use a quantum annealing subroutine in deep learning, and provides a framework to test the advantages of quantum-assisted learning in GANs. Fully connected, symmetric bipartite and Chimera graph topologies are compared on a reduced stochastically binarized MNIST dataset, for both classical and quantum annealing sampling methods. The quantum-assisted associative adversarial network successfully learns a generative model of the MNIST dataset for all topologies, and is also applied to the LSUN dataset bedrooms class for the Chimera topology. Evaluated using the Fréchet inception distance and inception score, the quantum and classical versions of the algorithm are found to have equivalent performance for learning an implicit generative model of the MNIST dataset.
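The core idea, drawing the GAN generator's latent input from a learned Boltzmann machine instead of uniform noise, can be sketched classically. Here a tiny restricted Boltzmann machine with random, untrained weights stands in for the model the paper samples with a D-Wave annealer; the layer sizes, weight scale, and Gibbs-step count are illustrative assumptions.

```python
import numpy as np

# Minimal classical sketch: sample binary latent vectors from a small RBM by
# block Gibbs sampling, to be fed to a GAN generator in place of uniform
# noise. Weights are random here purely for illustration.
rng = np.random.default_rng(0)
n_visible, n_hidden = 16, 8
W = rng.normal(0.0, 0.1, (n_visible, n_hidden))
b_v = np.zeros(n_visible)
b_h = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_latent(n_samples, gibbs_steps=50):
    """Alternate hidden/visible conditional updates (block Gibbs sampling)
    and return the binary visible states as latent vectors."""
    v = (rng.random((n_samples, n_visible)) < 0.5).astype(float)
    for _ in range(gibbs_steps):
        h = (rng.random((n_samples, n_hidden)) < sigmoid(v @ W + b_h)).astype(float)
        v = (rng.random((n_samples, n_visible)) < sigmoid(h @ W.T + b_v)).astype(float)
    return v

z = sample_latent(4)
print(z.shape)  # (4, 16): four binary latent vectors
```

In the paper's setup the annealer replaces the Gibbs-sampling loop, and the Boltzmann machine is trained on features extracted by the discriminator rather than left random.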


Quantifying the Impact of Cognitive Biases in Question-Answering Systems

AAAI Conferences

Crowdsourcing can identify high-quality solutions to problems; however, individual decisions are constrained by cognitive biases. We investigate some of these biases in an experimental model of a question-answering system. We observe a strong position bias in favor of answers appearing earlier in a list of choices. This effect is enhanced by three cognitive factors: the attention an answer receives, its perceived popularity, and cognitive load, measured by the number of choices a user has to process. While separately weak, these effects synergistically amplify position bias and decouple users' choices of best answers from their intrinsic quality. We conclude by discussing how these findings could be applied to substantially improve how high-quality answers are identified in question-answering systems.


The DARPA Twitter Bot Challenge

arXiv.org Artificial Intelligence

A number of organizations ranging from terrorist groups such as ISIS to politicians and nation states reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate "influence bots" - realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook - before they get too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015 in which multiple teams supported by the DARPA Social Media in Strategic Communications program competed to identify a set of previously identified "influence bots" serving as ground truth on a specific topic within Twitter. Past work regarding influence bots often has difficulty supporting claims about accuracy, since there is limited ground truth (though some exceptions do exist [3,7]). However, with the exception of [3], no past work has looked specifically at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and describes the methods used by the three top-ranked teams.


Social Mechanics: An Empirically Grounded Science of Social Media

AAAI Conferences

What will social media sites of tomorrow look like? What behaviors will their interfaces enable? A major challenge for designing new sites that allow a broader range of user actions is the difficulty of extrapolating from experience with current sites without first distinguishing correlations from underlying causal mechanisms. The growing availability of data on user activities provides new opportunities to uncover correlations among user activity, contributed content and the structure of links among users. However, such correlations do not necessarily translate into predictive models. Instead, empirically grounded mechanistic models provide a stronger basis for establishing causal mechanisms and discovering the underlying statistical laws governing social behavior. We describe a statistical physics-based framework for modeling and analyzing social media and illustrate its application to the problems of prediction and inference. We hope these examples will inspire the research community to apply these methods in the search for empirically valid causal mechanisms behind the observed correlations.


Social Dynamics of Digg

AAAI Conferences

Online social media often highlight content that is highly rated by neighbors in a social network. For the news aggregator Digg, we use a stochastic model to distinguish the effect of the increased visibility from the network from how interesting content is to users. We find a wide range of interest, and distinguish stories primarily of interest to users in the network from those of more general interest to the user community. This distinction helps predict a story's eventual popularity from users' early reactions to the story.
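The distinction between visibility and intrinsic interest can be illustrated with a toy stochastic model in the spirit of the abstract (this is not the paper's actual model of Digg). Each user who sees a story votes with a probability reflecting its interestingness, but who sees it depends on visibility: fans in the submitter's network see it early, while the broader community sees it only after enough early votes promote it. All parameters below are illustrative assumptions.

```python
import random

# Toy sketch: vote dynamics driven by visibility (fans first, then the
# general community after promotion) times intrinsic interest. Thresholds
# and rates are illustrative, not fitted to Digg data.
random.seed(1)

def simulate_story(r_fans, r_general, n_fans=50, n_general=1000, promote_at=10):
    votes = 0
    # Fans of the submitter's network see the story first.
    for _ in range(n_fans):
        if random.random() < r_fans:
            votes += 1
    # Only stories with enough early votes become visible community-wide.
    if votes >= promote_at:
        for _ in range(n_general):
            if random.random() < r_general:
                votes += 1
    return votes

niche = simulate_story(r_fans=0.4, r_general=0.01)  # interesting mainly to the network
broad = simulate_story(r_fans=0.4, r_general=0.2)   # broad general interest
print(niche, broad)
```

Two stories with identical early (fan-driven) trajectories can diverge sharply once promoted, which is why early reactions combined with an interest estimate help predict eventual popularity.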


Distributed Control of Microscopic Robots in Biomedical Applications

arXiv.org Artificial Intelligence

Current developments in molecular electronics, motors and chemical sensors could enable constructing large numbers of devices able to sense, compute and act in micron-scale environments. Such microscopic machines, of sizes comparable to bacteria, could simultaneously monitor entire populations of cells individually in vivo. This paper reviews plausible capabilities for microscopic robots and the physical constraints due to operation in fluids at low Reynolds number, diffusion-limited sensing and thermal noise from Brownian motion. Simple distributed controls are then presented in the context of prototypical biomedical tasks, which require control decisions on millisecond time scales. The resulting behaviors illustrate trade-offs among speed, accuracy and resource use. A specific example is monitoring for patterns of chemicals in a flowing fluid released at chemically distinctive sites. Information collected from a large number of such devices allows estimating properties of cell-sized chemical sources in a macroscopic volume. The microscopic devices moving with the fluid flow in small blood vessels can detect chemicals released by tissues in response to localized injury or infection. We find the devices can readily discriminate a single cell-sized chemical source from the background chemical concentration, providing high-resolution sensing in both time and space. By contrast, such a source would be difficult to distinguish from background when diluted throughout the blood volume as obtained with a blood sample.
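The claim that a cell-sized source is easy to distinguish locally but hard to distinguish after dilution follows from counting statistics, which can be sketched with a diffusion-limited capture estimate. The concentrations, diffusion coefficient, and dwell time below are illustrative assumptions, not the paper's parameters.

```python
import math

# Hedged sketch of the detection argument: compare molecules captured while
# passing near a source against the count expected from background alone,
# using the diffusion-limited capture rate for an absorbing sphere.
def captured_count(conc, D, R, dwell_time):
    """Expected molecules absorbed by a sphere of radius R in time t,
    at the diffusion-limited rate 4*pi*D*R*c (moles/s), times Avogadro."""
    N_A = 6.022e23
    return 4 * math.pi * D * R * conc * dwell_time * N_A

D = 1e-9      # m^2/s, small-molecule diffusion coefficient (assumed)
R = 1e-6      # 1-micron device radius
t = 1e-3      # ~millisecond pass near the source (per the abstract's time scale)
c_src = 1e-4  # mol/m^3 local concentration near the source (illustrative)
c_bg = 1e-7   # mol/m^3 background concentration (illustrative)

n_src = captured_count(c_src, D, R, t)
n_bg = captured_count(c_bg, D, R, t)
# Poisson counting statistics: detectable when the excess exceeds a few
# standard deviations of the background count.
snr = (n_src - n_bg) / math.sqrt(max(n_bg, 1.0))
print(round(n_src), round(n_bg), round(snr))
```

With these assumed values the local signal exceeds background by orders of magnitude; diluting the same release throughout the blood volume drives the excess below the Poisson noise floor, which is the contrast with a blood sample that the abstract draws.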


A Dynamical Approach to Temporal Pattern Processing

Neural Information Processing Systems

W. Scott Stornetta, Stanford University, Physics Department, Stanford, CA 94305; Tad Hogg and B. A. Huberman, Xerox Palo Alto Research Center, Palo Alto, CA 94304

Recognizing patterns with temporal context is important for such tasks as speech recognition, motion detection and signature verification. We propose an architecture in which time serves as its own representation, and temporal context is encoded in the state of the nodes. We contrast this with the approach of replicating portions of the architecture to represent time. As one example of these ideas, we demonstrate an architecture with capacitive inputs serving as temporal feature detectors in an otherwise standard back-propagation model. Experiments involving motion detection and word discrimination serve to illustrate novel features of the system.
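The "capacitive input" idea can be sketched as a leaky integrator: each input unit holds an exponentially decaying trace of its past signal, so a single activation value encodes recent temporal context without replicating the network over time steps. The decay constant below is an illustrative assumption.

```python
# Sketch of a capacitive (RC-like) input unit: an exponentially decaying
# trace a[t] = decay * a[t-1] + x[t], feeding an otherwise standard
# feedforward network. The decay value is illustrative.
def capacitive_trace(signal, decay=0.8):
    a, out = 0.0, []
    for x in signal:
        a = decay * a + x
        out.append(a)
    return out

# A brief pulse leaves a fading trace that later time steps can still read:
trace = capacitive_trace([1.0, 0.0, 0.0, 0.0])
print(trace)  # approximately 1.0, 0.8, 0.64, 0.512
```

Because the trace is just another real-valued input, standard back-propagation applies unchanged; time enters through the input dynamics rather than the network topology.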
