Zheng, Wei-Long
Combining Induction and Transduction for Abstract Reasoning
Li, Wen-Ding, Hu, Keya, Larsen, Carter, Wu, Yuqing, Alford, Simon, Woo, Caleb, Dunn, Spencer M., Tang, Hao, Naim, Michelangelo, Nguyen, Dat, Zheng, Wei-Long, Tavares, Zenna, Pu, Yewen, Ellis, Kevin
When learning an input-output mapping from very few examples, is it better to first infer a latent function that explains the examples, or is it better to directly predict new test outputs, e.g. using a neural network? We study this question on ARC by training neural models for induction (inferring latent functions) and transduction (directly predicting the test output for a given test input). We train on synthetically generated variations of Python programs that solve ARC training tasks. We find inductive and transductive models solve different kinds of test problems, despite having the same training problems and sharing the same neural architecture: Inductive program synthesis excels at precise computations, and at composing multiple concepts, while transduction succeeds on fuzzier perceptual concepts. Ensembling them approaches human-level performance on ARC.
Focused State Recognition Using EEG with Eye Movement-Assisted Annotation
Li, Tian-Hua, Ma, Tian-Fang, Peng, Dan, Zheng, Wei-Long, Lu, Bao-Liang
With the rapid advancement in machine learning, the recognition and analysis of brain activity based on EEG and eye movement signals have attained a high level of sophistication. Utilizing deep learning models for learning EEG and eye movement features proves effective in classifying brain activities. A focused state indicates intense concentration on a task or thought. Distinguishing focused and unfocused states can be achieved through eye movement behaviors, reflecting variations in brain activities. By calculating binocular focusing point disparity in eye movement signals and integrating relevant EEG features, we propose an annotation method for focused states. The resulting comprehensive dataset, derived from raw data processed through a bio-acquisition device, includes both EEG features and focused labels annotated by eye movements. Extensive training and testing on several deep learning models, particularly the Transformer, yielded a 90.16% accuracy on the subject-dependent experiments. The validity of this approach was demonstrated, with cross-subject experiments, key frequency band and brain region analyses confirming its generalizability and providing physiological explanations.
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
Tang, Hao, Hu, Keya, Zhou, Jin Peng, Zhong, Sicheng, Zheng, Wei-Long, Si, Xujie, Ellis, Kevin
Iteratively improving and repairing source code with large language models (LLMs), known as refinement, has emerged as a popular way of generating programs that would be too complex to construct in one shot. Given a bank of test cases, together with a candidate program, an LLM can improve that program by being prompted with failed test cases. But it remains an open question how to best iteratively refine code, with prior work employing simple greedy or breadth-first strategies. We show here that refinement exposes an explore-exploit tradeoff: exploit by refining the program that passes the most test cases, or explore by refining a lesser considered program. We frame this as an arm-acquiring bandit problem, which we solve with Thompson Sampling. The resulting LLM-based program synthesis algorithm is broadly applicable: Across loop invariant synthesis, visual reasoning puzzles, and competition programming problems, we find that our new method can solve more problems using fewer language model calls.
Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals
Lan, Yu-Ting, Ren, Kan, Wang, Yansen, Zheng, Wei-Long, Li, Dongsheng, Lu, Bao-Liang, Qiu, Lili
Seeing is believing, however, the underlying mechanism of how human visual perceptions are intertwined with our cognitions is still a mystery. Thanks to the recent advances in both neuroscience and artificial intelligence, we have been able to record the visually evoked brain activities and mimic the visual perception ability through computational approaches. In this paper, we pay attention to visual stimuli reconstruction by reconstructing the observed images based on portably accessible brain signals, i.e., electroencephalography (EEG) data. Since EEG signals are dynamic in the time-series format and are notorious to be noisy, processing and extracting useful information requires more dedicated efforts; In this paper, we propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals. Specifically, we incorporate a novel multi-level perceptual information decoding to draw multi-grained outputs from the given EEG data. A latent diffusion model will then leverage the extracted information to reconstruct the high-resolution visual stimuli images. The experimental results have illustrated the effectiveness of image reconstruction and superior quantitative performance of our proposed method.
Combining Eye Movements and EEG to Enhance Emotion Recognition
Lu, Yifei (Shanghai Jiao Tong University) | Zheng, Wei-Long (Shanghai Jiao Tong University) | Li, Binbin (Shanghai Jiao Tong University) | Lu, Bao-Liang (Shanghai Jiao Tong University)
In this paper, we adopt a multimodal emotion recognition framework by combining eye movements and electroencephalography (EEG) to enhance emotion recognition. The main contributions of this paper are twofold. a) We investigate sixteen eye movements related to emotions and identify the intrinsic patterns of these eye movements for three emotional states: positive, neutral and negative. b) We examine various modality fusion strategies for integrating users external subconscious behaviors and internal cognitive states and reveal that the characteristics of eye movements and EEG are complementary to emotion recognition. Experiment results demonstrate that modality fusion could significantly improve emotion recognition accuracy in comparison with single modality. The best accuracy achieved by fuzzy integral fusion strategy is 87.59%, whereas the accuracies of solely using eye movements and EEG data are 77.80% and 78.51%, respectively.