Instructional Material
Online Decision Mediation
Jarrett, Daniel, Hüyük, Alihan, van der Schaar, Mihaela
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior: At each time, the algorithm observes an action chosen by a fallible agent, and decides whether to *accept* that agent's decision, *intervene* with an alternative, or *request* the expert's opinion. For instance, in clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances, thus real-world decision support is often limited to monitoring and forecasting. Instead, such an intermediary would strike a prudent balance between the former (purely prescriptive) and latter (purely descriptive) approaches, while providing an efficient interface between human mistakes and expert feedback. In this work, we first formalize the sequential problem of *online decision mediation* -- that is, of simultaneously learning and evaluating mediator policies from scratch with *abstentive feedback*: In each round, deferring to the oracle obviates the risk of error, but incurs an upfront penalty, and reveals the otherwise hidden expert action as a new training data point. Second, we motivate and propose a solution that seeks to trade off (immediate) loss terms against (future) improvements in generalization error; in doing so, we identify why conventional bandit algorithms may fail. Finally, through experiments and sensitivities on a variety of datasets, we illustrate consistent gains over applicable benchmarks on performance measures with respect to the mediator policy, the learned model, and the decision-making system as a whole.
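As a rough illustration of the accept / intervene / request interface described above, the following is a minimal sketch and not the authors' proposed method. It swaps in a simple evidence-count rule for the loss-versus-generalization trade-off the abstract describes. The names `ThresholdMediator`, `min_evidence`, `run`, and `request_cost` are hypothetical, as are the unit error loss and the discrete contexts.

```python
from collections import Counter, defaultdict

ACCEPT, INTERVENE, REQUEST = "accept", "intervene", "request"


class ThresholdMediator:
    """Toy mediator (illustrative assumption, not the paper's algorithm).

    It stores the expert actions revealed so far for each context and
    only acts on its own prediction once it has seen enough of them.
    """

    def __init__(self, min_evidence=3):
        self.min_evidence = min_evidence
        self.counts = defaultdict(Counter)  # context -> expert action counts

    def decide(self, context, human_action):
        seen = self.counts[context]
        if sum(seen.values()) < self.min_evidence:
            # Too little expert data for this context: defer to the expert.
            return REQUEST, None
        predicted = seen.most_common(1)[0][0]
        if predicted == human_action:
            return ACCEPT, human_action
        # The model disagrees with the human: substitute its prediction.
        return INTERVENE, predicted

    def update(self, context, expert_action):
        # Called only after a request, when the expert action is revealed.
        self.counts[context][expert_action] += 1


def run(mediator, rounds, request_cost=0.5):
    """Simulate rounds of (context, human_action, expert_action).

    The expert action is used for scoring in this simulation; the mediator
    itself only sees it after a request (abstentive feedback).
    """
    total = 0.0
    for context, human_action, expert_action in rounds:
        decision, action = mediator.decide(context, human_action)
        if decision == REQUEST:
            total += request_cost          # upfront penalty, no error risk
            mediator.update(context, expert_action)
        elif action != expert_action:
            total += 1.0                   # unit loss for a wrong action
    return total
```

In this sketch a request costs `request_cost` but adds a training point, while accepting or intervening is free unless the chosen action differs from the hidden expert action. A fixed evidence threshold is the crudest possible stand-in for deciding when the future benefit of a new data point outweighs the immediate penalty.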
GPT-4 Vision on Medical Image Classification -- A Case Study on COVID-19 Dataset
Chen, Ruibo, Xiong, Tianyi, Wu, Yihan, Liu, Guodong, Hu, Zhengmian, Chen, Lichang, Chen, Yanshuo, Liu, Chenxi, Huang, Heng
In the intricate landscape of modern healthcare, medical image classification emerges as a pivotal task, driving crucial decisions in diagnosis, treatment planning, and patient management. This process involves the systematic categorization of various types of medical imagery--including X-rays, CT scans, MRIs, and ultrasound--into distinct classes that assist healthcare professionals in identifying anomalies, understanding physiological phenomena, and detecting diseases at early stages. The reliability and precision of image classification are paramount, given that these determinations form the bedrock upon which medical practitioners build their diagnostic and therapeutic strategies, directly impacting patient outcomes. With an increasing influx of complex imaging data and a growing need for rapid, accurate interpretation, the medical sector faces significant pressure to evolve beyond traditional analysis methods, necessitating innovative solutions that enhance the efficiency and accuracy of image classification. The advent of large foundation models in artificial intelligence has ushered in a transformative era of computational capabilities. These models, characterized by their extensive scale, diverse training datasets, and impressive adaptability, have demonstrated profound impacts across various domains.
Expanding the Set of Pragmatic Considerations in Conversational AI
Seals, S. M., Shalin, Valerie L.
Despite considerable performance improvements, current conversational AI systems often fail to meet user expectations. We discuss several pragmatic limitations of current conversational AI systems. We illustrate pragmatic limitations with examples that are syntactically appropriate, but have clear pragmatic deficiencies. We label our complaints as "Turing Test Triggers" (TTTs) as they indicate where current conversational AI systems fall short compared to human behavior. We develop a taxonomy of pragmatic considerations intended to identify what pragmatic competencies a conversational AI system requires and discuss implications for the design and evaluation of conversational AI systems.
Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning
Chaudhary, Aditi, Sampath, Arun, Sheshadri, Ashwin, Anastasopoulos, Antonios, Neubig, Graham
One of the challenges in language teaching is how best to organize rules regarding syntax, semantics, or phonology in a meaningful manner. This requires content creators not only to have pedagogical skills, but also to have a deep understanding of that language. While comprehensive materials to develop such curricula are available in English and some broadly spoken languages, for many other languages, teachers need to manually create them in response to their students' needs. This is challenging because i) it requires that such experts be accessible and have the necessary resources, and ii) describing all the intricacies of a language is time-consuming and prone to omission. In this work, we aim to facilitate this process by automatically discovering and visualizing grammar descriptions. We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary). We apply this method for teaching two Indian languages, Kannada and Marathi, which, unlike English, do not have well-developed resources for second language learning. To assess the perceived utility of the extracted material, we enlist language educators from schools in North America to perform a manual evaluation; they find that the materials have potential to be used for lesson preparation and learner evaluation.
Examining the Potential and Pitfalls of ChatGPT in Science and Engineering Problem-Solving
Wang, Karen D., Burkholder, Eric, Wieman, Carl, Salehi, Shima, Haber, Nick
The study explores the capabilities of OpenAI's ChatGPT in solving different types of physics problems. ChatGPT (with GPT-4) was queried to solve a total of 40 problems from a college-level engineering physics course. These problems ranged from well-specified problems, where all data required for solving the problem were provided, to under-specified, real-world problems where not all necessary data were given. Our findings show that ChatGPT could successfully solve 62.5% of the well-specified problems, but its accuracy drops to 8.3% for under-specified problems. Analysis of the model's incorrect solutions revealed three distinct failure modes: 1) failure to construct accurate models of the physical world, 2) failure to make reasonable assumptions about missing data, and 3) calculation errors. The study offers implications for how to leverage LLM-augmented instructional materials to enhance STEM education. The insights also contribute to the broader discourse on AI's strengths and limitations, serving both educators aiming to leverage the technology and researchers investigating human-AI collaboration frameworks for problem-solving and decision-making.
Learning Space-Time Continuous Neural PDEs from Partially Observed States
Iakovlev, Valerii, Heinonen, Markus, Lähdesmäki, Harri
We introduce a novel grid-independent model for learning partial differential equations (PDEs) from noisy and partial observations on irregular spatiotemporal grids. We propose a space-time continuous latent neural PDE model with an efficient probabilistic framework and a novel encoder design for improved data efficiency and grid independence. The latent state dynamics are governed by a PDE model that combines the collocation method and the method of lines. We employ amortized variational inference for approximate posterior estimation and utilize a multiple shooting technique for enhanced training speed and stability. Our model demonstrates state-of-the-art performance on complex synthetic and real-world datasets, overcoming limitations of previous approaches and effectively handling partially-observed data. The proposed model outperforms recent methods, showing its potential to advance data-driven PDE modeling and enabling robust, grid-independent modeling of complex partially-observed dynamic processes.
CEIL: Generalized Contextual Imitation Learning
Liu, Jinxin, He, Li, Kang, Yachen, Zhuang, Zifeng, Wang, Donglin, Xu, Huazhe
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight information matching, we derive CEIL by explicitly learning a hindsight embedding function together with a contextual policy using the hindsight embeddings. To achieve the expert matching objective for IL, we advocate for optimizing a contextual variable such that it biases the contextual policy towards mimicking expert behaviors. Beyond the typical learning from demonstrations (LfD) setting, CEIL is a generalist that can be effectively applied to multiple settings including: 1) learning from observations (LfO), 2) offline IL, 3) cross-domain IL (mismatched experts), and 4) one-shot IL settings. Empirically, we evaluate CEIL on the popular MuJoCo tasks (online) and the D4RL dataset (offline). Compared to prior state-of-the-art baselines, we show that CEIL is more sample-efficient in most online IL tasks and achieves better or competitive performances in offline tasks.
No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions
Jin, Tiancheng, Liu, Junyan, Rouyer, Chloé, Chang, William, Wei, Chen-Yu, Luo, Haipeng
Existing online learning algorithms for adversarial Markov Decision Processes achieve ${O}(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\widetilde{{O}}(\sqrt{T} + C^{\textsf{P}})$ regret where $C^{\textsf{P}}$ measures how adversarial the transition functions are and can be at most ${O}(T)$. While this algorithm itself requires knowledge of $C^{\textsf{P}}$, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintain the same regret bound, but also simultaneously adapt to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieve $\widetilde{{O}}(U + \sqrt{UC^{\textsf{L}}} + C^{\textsf{P}})$ regret, where $U$ is some standard gap-dependent coefficient and $C^{\textsf{L}}$ is the amount of corruption on losses.
A Multilingual Virtual Guide for Self-Attachment Technique
Law, Alicia Jiayun, Hu, Ruoyu, Alazraki, Lisa, Gopalan, Anandha, Polydorou, Neophytos, Edalat, Abbas
In this work, we propose a computational framework that leverages existing out-of-language data to create a conversational agent for the delivery of Self-Attachment Technique (SAT) in Mandarin. Our framework does not require large-scale human translations, yet it achieves a comparable performance whilst also maintaining safety and reliability. We propose two different methods of augmenting available response data through empathetic rewriting. We evaluate our chatbot against a previous, English-only SAT chatbot through non-clinical human trials (N=42), each lasting five days, and quantitatively show that we are able to attain a comparable level of performance to the English SAT chatbot. We provide qualitative analysis on the limitations of our study and suggestions with the aim of guiding future improvements.