

A Simple and Adaptive Learning Rate for FTRL in Online Learning with Minimax Regret of Θ(T^{2/3}) and its Application to Best-of-Both-Worlds

Neural Information Processing Systems

Follow-the-Regularized-Leader (FTRL) is a powerful framework for various online learning problems. By designing its regularizer and learning rate to be adaptive to past observations, FTRL is known to work adaptively to various properties of an underlying environment.
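To make the setup concrete, here is a minimal sketch of FTRL for online linear optimization over a Euclidean ball, using a quadratic regularizer with an AdaGrad-style adaptive learning rate. This is our own illustration, not the paper's algorithm; the function name and parameter choices are assumptions.

```python
import numpy as np

def ftrl_iterate(grads, radius=1.0):
    """FTRL over the ball ||x|| <= radius with regularizer
    psi_t(x) = ||x||^2 / (2 * eta_t) and the adaptive learning rate
    eta_t = radius / sqrt(sum_s ||g_s||^2).

    The unconstrained minimizer of sum_s <g_s, x> + psi_t(x) is
    -eta_t * G_t, which is then projected back onto the feasible ball.
    """
    d = len(grads[0])
    G = np.zeros(d)          # running sum of observed gradients
    sq = 0.0                 # running sum of squared gradient norms
    iterates = []
    for g in grads:
        G += g
        sq += float(np.dot(g, g))
        eta = radius / np.sqrt(sq)
        x = -eta * G
        norm = np.linalg.norm(x)
        if norm > radius:    # project onto the feasible ball
            x *= radius / norm
        iterates.append(x)
    return iterates

xs = ftrl_iterate([np.array([1.0, 0.0]), np.array([0.5, -0.5])])
```

Because eta shrinks as squared gradient norms accumulate, the iterates automatically become more conservative in noisy rounds, which is the kind of data-adaptivity the abstract refers to.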


Table 3 List of key terms for reinforcement learning

Neural Information Processing Systems

C.3 Liquidation Analysis and Trade Execution

Reproducing [8], we build a simulated environment of stock prices according to the Almgren and Chriss model. We then implement the multi-agent DRL algorithms for both competing and cooperative liquidation strategies.
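For context, a minimal simulation of the Almgren and Chriss arithmetic price dynamics might look like the following sketch: each step the mid-price diffuses with volatility sigma and is pushed down by the permanent impact of the shares sold. The function name, parameter values, and the uniform (TWAP) schedule are our assumptions, not those of [8].

```python
import numpy as np

def simulate_almgren_chriss(S0, shares, n_steps, sigma=0.3, gamma=2.5e-7,
                            tau=1.0, seed=0):
    """Simulate a mid-price path under Almgren-Chriss arithmetic dynamics.

    Each step the price moves by sigma * sqrt(tau) * noise and drops by
    the permanent impact gamma * n_k of the n_k shares sold that step.
    A uniform (TWAP) liquidation schedule is used for simplicity.
    """
    rng = np.random.default_rng(seed)
    n_k = shares / n_steps                # shares sold per step (TWAP)
    prices = [S0]
    for _ in range(n_steps):
        S = prices[-1]
        S = S + sigma * np.sqrt(tau) * rng.standard_normal() - gamma * n_k
        prices.append(S)
    return np.array(prices)

path = simulate_almgren_chriss(S0=50.0, shares=1_000_000, n_steps=60)
```

A liquidation agent's state and reward would be computed on top of such a path; competing agents would add their own impact terms to the same price process.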


Appendix

Neural Information Processing Systems

This is the appendix of our work: 'Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data'. In this appendix, we provide more details of the proposed SFGC in terms of related works, potential application scenarios, dataset statistics, method analysis, and experimental settings with some additional results. Dataset Distillation (Condensation) aims to synthesize a small, typical dataset that distills the most important knowledge from a given large target dataset, such that the synthesized small dataset can serve as an effective substitute for the large target dataset in various scenarios [30, 49], e.g., model training and inference, architecture search, and continual learning. Typically, DD [59] and DC-KRR [39] adopted the meta-learning framework to solve bi-level distillation objectives by calculating meta-gradients. In contrast, DC [77], DM [76], and MTT [4] designed surrogate functions to avoid unrolled optimization through gradient matching, feature distribution matching, and training trajectory matching, respectively, where the core idea is to effectively mimic the large target dataset with the synthesized small dataset.
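As a toy illustration of the gradient-matching idea behind DC (our own sketch, not any paper's code), the snippet below nudges a handful of synthetic points so that the model gradient they induce approaches the gradient induced by the full dataset. All names are ours, and a finite-difference gradient stands in for the backpropagation real methods use.

```python
import numpy as np

def model_grad(X, y, w):
    """Gradient of the mean-squared-error loss of a linear model w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

def matching_loss(X_real, y_real, X_syn, y_syn, w):
    """Squared distance between gradients induced by real and synthetic data."""
    diff = model_grad(X_real, y_real, w) - model_grad(X_syn, y_syn, w)
    return float(np.sum(diff ** 2))

def condense_step(X_real, y_real, X_syn, y_syn, w, lr=0.01, eps=1e-6):
    """One gradient-matching update of the synthetic inputs X_syn.

    Real condensation methods backprop through the matching loss; a
    finite-difference gradient keeps this sketch dependency-free."""
    base = matching_loss(X_real, y_real, X_syn, y_syn, w)
    grad = np.zeros_like(X_syn)
    for idx in np.ndindex(*X_syn.shape):
        X_pert = X_syn.copy()
        X_pert[idx] += eps
        grad[idx] = (matching_loss(X_real, y_real, X_pert, y_syn, w) - base) / eps
    return X_syn - lr * grad

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X_real = rng.standard_normal((50, 3))
y_real = X_real @ w_true
X_syn = rng.standard_normal((4, 3))       # 4 synthetic points stand in for 50
y_syn = X_syn @ w_true
w = rng.standard_normal(3)
before = matching_loss(X_real, y_real, X_syn, y_syn, w)
X_syn = condense_step(X_real, y_real, X_syn, y_syn, w)
after = matching_loss(X_real, y_real, X_syn, y_syn, w)
```

Iterating this step over many randomly initialized models is what lets the condensed set mimic training on the full dataset.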


Make Continual Learning Stronger via C-Flat Ang Bian, Wei Li

Neural Information Processing Systems

Balancing learning 'sensitivity-stability' between new-task training and memory preservation is critical in CL to resolve catastrophic forgetting. Improving model generalization within each learning phase is one way to help CL overcome the gap in the joint knowledge space. Zeroth-order loss-landscape sharpness-aware minimization is a strong training regime that improves model generalization in transfer learning compared with optimizers such as SGD. It has also been introduced into CL to improve memory representation or learning efficiency. However, zeroth-order sharpness alone can favor sharper over flatter minima in certain scenarios, leading to a sensitive minimum rather than a global optimum. To further enhance learning stability, we propose a Continual Flatness (C-Flat) method featuring a flatter loss landscape tailored for CL. C-Flat can be invoked with only one line of code and is plug-and-play with any CL method. A general framework of C-Flat applied to all CL categories, together with a thorough comparison against loss-minimum optimizers and flat-minima-based CL approaches, is presented in this paper, showing that our method can boost CL performance in almost all cases. Code is available at https://github.com/WanNaa/C-Flat.
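The sharpness-aware regime the abstract builds on can be sketched as a generic SAM-style two-step update. This is our illustration of the underlying idea, not C-Flat's actual implementation; the names and the toy loss are assumptions.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM-style update: first ascend to the worst nearby point
    w + rho * g / ||g||, then apply the gradient computed *there* to the
    original weights, biasing the trajectory toward flat minima."""
    g = grad_fn(w)
    g_norm = np.linalg.norm(g) + 1e-12       # avoid division by zero
    w_adv = w + rho * g / g_norm             # ascent to a sharp neighbor
    return w - lr * grad_fn(w_adv)           # descend with the sharp gradient

# Demo on a toy quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, lambda x: x)
```

Wrapping a base optimizer's step in a function like this is what makes such methods "one line of code" to adopt from the caller's point of view.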


M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models

Neural Information Processing Systems

Despite the existence of various benchmarks for evaluating natural language processing models, we argue that human exams are a more suitable means of evaluating general intelligence for large language models (LLMs), as they inherently demand a much wider range of abilities such as language understanding, domain knowledge, and problem-solving skills. To this end, we introduce M3Exam, a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. M3Exam exhibits three unique characteristics: (1) multilingualism, encompassing questions from multiple countries that require strong multilingual proficiency and cultural knowledge; (2) multimodality, accounting for the multimodal nature of many exam questions to test the model's multimodal understanding capability; and (3) multilevel structure, featuring exams from three critical educational periods to comprehensively assess a model's proficiency at different levels. In total, M3Exam contains 12,317 questions in 9 diverse languages with three educational levels, where about 23% of the questions require processing images for successful solving. We assess the performance of top-performing LLMs on M3Exam and find that current models, including GPT-4, still struggle with multilingual text, particularly in low-resource and non-Latin script languages. Multimodal LLMs also perform poorly with complex multimodal questions. We believe that M3Exam can be a valuable resource for comprehensively evaluating LLMs by examining their multilingual and multimodal abilities and tracking their development.


Develop valuable data visualization skills and learn to code for only $50

Popular Science

If you feel like tech advances have passed you by because you've never learned to code or use AI, you could not be more wrong. Thank goodness it's no longer necessary to return to school to develop new skills. You can now learn valuable data wrangling skills and learn how to code with the Microsoft Visual Studio Professional 2022 The Premium Learn to Code Certification Bundle. It should be no surprise that Microsoft Visual Studio Professional 2022 has a perfect 5-star rating on Microsoft Choice Software. The Live Share feature makes collaboration seamless, CodeLens provides deep insights into your code, and IntelliCode tops it all off by allowing you to type less while coding more.


DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors Joseph Ortiz

Neural Information Processing Systems

Learning from previously collected data via behavioral cloning or offline reinforcement learning (RL) is a powerful recipe for scaling generalist agents by avoiding the need for expensive online learning. Despite strong generalization in some respects, agents are often remarkably brittle to minor visual variations in control-irrelevant factors such as the background or camera viewpoint. In this paper, we present the DeepMind Control Vision Benchmark (DMC-VB), a dataset collected in the DeepMind Control Suite to evaluate the robustness of offline RL agents for solving continuous control tasks from visual input in the presence of visual distractors. In contrast to prior works, our dataset (a) combines locomotion and navigation tasks of varying difficulties, (b) includes static and dynamic visual variations, (c) considers data generated by policies with different skill levels, (d) systematically returns pairs of state and pixel observation, (e) is an order of magnitude larger, and (f) includes tasks with hidden goals. Accompanying our dataset, we propose three benchmarks to evaluate representation learning methods for pretraining, and carry out experiments on several recently proposed methods. First, we find that pretrained representations do not help policy learning on DMC-VB, and we highlight a large representation gap between policies learned on pixel observations and on states. Second, we demonstrate that when expert data is limited, policy learning can benefit from representations pretrained on (a) suboptimal data, and (b) tasks with stochastic hidden goals.


Hybrid Policy Optimization from Imperfect Demonstrations

Neural Information Processing Systems

Exploration is one of the main challenges in Reinforcement Learning (RL), especially in environments with sparse rewards. Learning from Demonstrations (LfD) is a promising approach to solving this problem by leveraging expert demonstrations. However, expert demonstrations of high quality are usually costly or even impossible to collect in real-world applications. In this work, we propose a novel RL algorithm called HYbrid Policy Optimization (HYPO), which uses a small number of imperfect demonstrations to accelerate an agent's online learning process. The key idea is to train an offline guider policy using imitation learning in order to instruct an online agent policy to explore efficiently. Through mutual updates of the guider policy and the agent policy, the agent can leverage suboptimal demonstrations for efficient exploration while avoiding the conservative policy caused by imperfect demonstrations. Empirical results show that HYPO significantly outperforms several baselines in various challenging tasks, such as MuJoCo with sparse rewards, Google Research Football, and the AirSim drone simulation.
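One deliberately simplified way to picture the guider-agent interplay is an action selector that follows the guider with a probability that decays over training. This is our own sketch of the general idea, not HYPO's actual update rule; every name and the decay schedule are assumptions.

```python
import numpy as np

def hybrid_action(guider_policy, agent_policy, obs, step, decay=1e-3, seed=None):
    """Choose an action from the guider with a probability that decays as
    training progresses, otherwise from the online agent.

    Early on, the demonstration-trained guider steers exploration; later
    the agent's own policy takes over, so imperfect demonstrations do not
    pin the agent to a conservative policy."""
    rng = np.random.default_rng(seed)
    p_guider = np.exp(-decay * step)     # exponentially decaying schedule
    if rng.random() < p_guider:
        return guider_policy(obs)
    return agent_policy(obs)
```

In a full method, both policies would also be trained against each other's experience rather than merely mixed at action-selection time.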


Building Socio-culturally Inclusive Stereotype Resources with Community Engagement

Neural Information Processing Systems

With rapid development and deployment of generative language models in global settings, there is an urgent need to also scale our measurements of harm, not just in the number and types of harms covered, but also how well they account for local cultural contexts, including marginalized identities and the social biases experienced by them. Current evaluation paradigms are limited in their abilities to address this, as they are not representative of diverse, locally situated but global, sociocultural perspectives. Our evaluation resources must be enhanced and calibrated by including people and experiences from different cultures and societies worldwide, to prevent gross underestimations or skewed measurements of harm. In this work, we demonstrate a socio-culturally aware expansion of evaluation resources in the Indian societal context, specifically for the harm of stereotyping. We devise a community engaged effort to build a resource that contains stereotypes for axes of disparity uniquely present in India. The resultant resource increases the number of stereotypes known for and in the Indian context by over 1000 stereotypes across many unique identities. We also demonstrate the utility and effectiveness of such expanded resources for evaluations of language models. CONTENT WARNING: This paper contains examples of stereotypes that may be offensive.


Med-Real2Sim: Non-Invasive Medical Digital Twins using Physics-Informed Self-Supervised Learning David Ouyang, Anthony Philippakis

Neural Information Processing Systems

A digital twin is a virtual replica of a real-world physical phenomenon that uses mathematical modeling to characterize and simulate its defining features. By constructing digital twins for disease processes, we can perform in-silico simulations that mimic patients' health conditions and counterfactual outcomes under hypothetical interventions in a virtual setting. This eliminates the need for invasive procedures or uncertain treatment decisions. In this paper, we propose a method to identify digital twin model parameters using only noninvasive patient health data. We approach the digital twin modeling as a composite inverse problem, and observe that its structure resembles pretraining and finetuning in self-supervised learning (SSL). Leveraging this, we introduce a physics-informed SSL algorithm that initially pretrains a neural network on the pretext task of learning a differentiable simulator of a physiological process. Subsequently, the model is trained to reconstruct physiological measurements from noninvasive modalities while being constrained by the physical equations learned in pretraining. We apply our method to identify digital twins of cardiac hemodynamics using noninvasive echocardiogram videos, and demonstrate its utility in unsupervised disease detection and in-silico clinical trials.
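The parameter-identification step can be illustrated with a deliberately tiny inverse problem: fit the parameters of a known forward model to observed measurements by gradient descent. This toy (an exponential decay curve, with names and values of our choosing) stands in for the paper's cardiac hemodynamics model and is not its actual method.

```python
import numpy as np

def simulator(a, b, t):
    """Toy 'physiological' forward model: an exponential decay curve."""
    return a * np.exp(-b * t)

def fit_twin(t, y_obs, lr=0.05, steps=500):
    """Identify the twin's parameters (a, b) from observed measurements by
    gradient descent on the squared reconstruction error."""
    a, b = 1.0, 0.1                        # deliberately wrong initial guess
    for _ in range(steps):
        r = simulator(a, b, t) - y_obs     # residual against observations
        # Analytic gradients of 0.5 * mean(r^2) w.r.t. a and b.
        ga = np.mean(r * np.exp(-b * t))
        gb = np.mean(r * (-a * t * np.exp(-b * t)))
        a -= lr * ga
        b -= lr * gb
    return a, b

t = np.linspace(0.0, 5.0, 100)
y_obs = simulator(2.0, 0.7, t)             # 'measurements' from the true twin
a_hat, b_hat = fit_twin(t, y_obs)
```

In the paper's setting, the hand-written simulator is replaced by a pretrained differentiable surrogate, and the fit is constrained by the physical equations learned during pretraining.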