Goto

Collaborating Authors

 Education


ChatGPT trounces humans in entrance exams for top Japan university, study finds

The Japan Times

AI models surpassed the highest score recorded for a human test taker in this year's University of Tokyo entrance exam, a new study shows. If an artificial intelligence model such as ChatGPT had taken the entrance exams for Japan's top university in 2026, it would have been assessed as top of the class and admitted for scoring higher than any human test takers, a study by AI startup LifePrompt has found. The research used three major AI models -- ChatGPT 5.2 Thinking by OpenAI, Gemini 3 Pro Preview by Google and Claude Opus 4.5 by Anthropic -- and had them take the actual entrance exam used by the University of Tokyo in February 2026 to assess candidates for courses set to start in April. The university's category 3 science exam, often taken by those who want to enter the institution's medical school, is considered the most difficult exam to pass in Japan. In a time of both misinformation and too much information, quality journalism is more crucial than ever.






Knowledge Distillation Performs Partial Variance Reduction

Neural Information Processing Systems

Knowledge distillation is a popular approach for enhancing the performance of "student" models, with lower representational capacity, by taking advantage of more powerful "teacher" models. Despite its apparent simplicity and widespread use, the underlying mechanics behind knowledge distillation (KD) are still not fully understood. In this work, we shed new light on the inner workings of this method, by examining it from an optimization perspective. We show that, in the context of linear and deep linear models, KD can be interpreted as a novel type of stochastic variance reduction mechanism. We provide a detailed convergence analysis of the resulting dynamics, which hold under standard assumptions for both strongly-convex and non-convex losses, showing that KD acts as a form of partial variance reduction, which can reduce the stochastic gradient noise, but may not eliminate it completely, depending on the properties of the "teacher" model. Our analysis puts further emphasis on the need for careful parametrization of KD, in particular w.r.t. the weighting of the distillation loss, and is validated empirically on both linear models and deep neural networks.


Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift

Neural Information Processing Systems

Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper and the sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.



Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks Supplementary Materials

Neural Information Processing Systems

The source code of Minigrid and Miniworld can be found at https://github.com/ To run the experiments, we have implemented the following functionalities: 1. implemented the human trajectory saving for MiniGrid-FourRooms-v0 (copied the ManualControlclass from Minigrid and added 38 lines of code, which are mostly calling data saving functions); 2. implemented the human trajectory saving for MiniWorld-FourRooms-v0 (copied the ManualControlclass from Miniworld and added 45 lines of code, which are mostly calling data saving functions); 3. implemented data saving and plotting for MiniGrid-FourRooms-v0 (33 lines of code, mostly for Matplotlib); 4. implemented data saving and plotting for MiniWorld-FourRooms-v0 (33 lines of code, mostly for Matplotlib). In total, the implementation of this new functionality required 149 lines of code. The source code is hosted on GitHub. We bear all the responsibility in case of violation of rights.


Vision Model: Frozen, GIT, CoCa, VCAudio Model: WavCaps AC

Neural Information Processing Systems

Vision and text have been fully explored in contemporary video-text foundational models, while other modalities such as audio and subtitles in videos have not received sufficient attention. In this paper, we resort to establish connections between multi-modality video tracks, including Vision, Audio, and Subtitle, and Text by exploring an automatically generated large-scale omni-modality video caption dataset called VAST-27M. Specifically, we first collect 27 million opendomain video clips and separately train a vision and an audio captioner to generate vision and audio captions. Then, we employ an off-the-shelf Large Language Model (LLM) to integrate the generated captions, together with subtitles and instructional prompts into omni-modality captions. Based on the proposed VAST-27M dataset, we train an omni-modality video-text foundational model named VAST, which can perceive and process vision, audio, and subtitle modalities from video, and better support various tasks including vision-text, audio-text, and multi-modal video-text tasks (retrieval, captioning and QA). Extensive experiments have been conducted to demonstrate the effectiveness of our proposed VAST-27M corpus and VAST foundation model. VAST achieves 22 new state-of-the-art results on various cross-modality benchmarks.