Nonlinear Bayesian Update via Ensemble Kernel Regression with Clustering and Subsampling
Lee, Yoonsang
A nonlinear Bayesian update for a prior ensemble is proposed to extend traditional ensemble Kalman filtering to settings characterized by non-Gaussian priors and nonlinear measurement operators. In this framework, the observed component is first denoised via a standard Kalman update, while the unobserved component is estimated using a nonlinear regression approach based on kernel density estimation. The method incorporates a subsampling strategy to ensure stability and, when necessary, employs unsupervised clustering to refine the conditional estimate. Numerical experiments on Lorenz systems and a PDE-constrained inverse problem illustrate that the proposed nonlinear update can reduce estimation errors compared to standard linear updates, especially in highly nonlinear scenarios.
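A minimal sketch of the kernel-regression step described above, assuming a Gaussian kernel and a Nadaraya-Watson-style conditional estimate of the unobserved components given the Kalman-updated observed components; the function names, the rule-of-thumb bandwidth, and the absence of the subsampling/clustering refinements are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def nonlinear_ensemble_update(y_ens, x_ens, y_updated, bandwidth=None):
    """Estimate unobserved components conditioned on Kalman-updated observed components.

    y_ens     : (N, dy) prior ensemble of observed components
    x_ens     : (N, dx) prior ensemble of unobserved components
    y_updated : (N, dy) observed components after the (linear) Kalman update
    Returns   : (N, dx) updated unobserved components via kernel regression.
    """
    N = y_ens.shape[0]
    if bandwidth is None:
        # Simple rule-of-thumb bandwidth; a stand-in for a proper KDE bandwidth choice.
        bandwidth = np.std(y_ens, axis=0).mean() * N ** (-1.0 / 5.0)
    x_updated = np.empty_like(x_ens)
    for i, y_star in enumerate(y_updated):
        # Gaussian kernel weights of each prior member relative to the updated observation.
        d2 = np.sum((y_ens - y_star) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)
        w /= w.sum() + 1e-12
        # Nadaraya-Watson estimate of E[x | y = y_star] from the prior ensemble.
        x_updated[i] = w @ x_ens
    return x_updated
```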
Entropy stable conservative flux form neural networks
Liu, Lizuo, Li, Tongtong, Gelb, Anne, Lee, Yoonsang
We propose an entropy-stable conservative flux form neural network (CFN) that integrates classical numerical conservation laws into a data-driven framework using the entropy-stable, second-order, and non-oscillatory Kurganov-Tadmor (KT) scheme. The proposed entropy-stable CFN uses slope limiting as a denoising mechanism, ensuring accurate predictions in both noisy and sparse observation environments, as well as in both smooth and discontinuous regions. Numerical experiments demonstrate that the entropy-stable CFN achieves both stability and conservation while maintaining accuracy over extended time domains. Furthermore, it successfully predicts shock propagation speeds in long-term simulations, without oracle knowledge of later-time profiles in the training data.
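A minimal sketch of a conservative flux-form update with a learned numerical flux and minmod slope limiting, in the spirit of the KT-based construction described above; the network interface (a map from stacked left/right interface states to a scalar flux), the periodic boundary, and the explicit Euler step are illustrative assumptions, not the paper's architecture.

```python
import torch

def minmod(a, b):
    # Minmod slope limiter: zero at sign changes, otherwise the smaller-magnitude slope.
    return 0.5 * (torch.sign(a) + torch.sign(b)) * torch.minimum(a.abs(), b.abs())

def conservative_step(u, flux_net, dx, dt):
    """One conservative update u^{n+1}_i = u^n_i - dt/dx * (F_{i+1/2} - F_{i-1/2}).

    u        : (batch, n_cells) cell averages, periodic boundary assumed here
    flux_net : network mapping limited left/right interface states (last dim 2) to a flux
    """
    # Limited slopes from neighboring differences (periodic shifts).
    du_plus = torch.roll(u, -1, dims=-1) - u
    du_minus = u - torch.roll(u, 1, dims=-1)
    slope = minmod(du_plus, du_minus)
    # Reconstructed states at the i+1/2 interfaces.
    u_left = u + 0.5 * slope
    u_right = torch.roll(u - 0.5 * slope, -1, dims=-1)
    # Learned numerical flux at each interface.
    F = flux_net(torch.stack([u_left, u_right], dim=-1)).squeeze(-1)
    return u - dt / dx * (F - torch.roll(F, 1, dims=-1))
```

Because the update is written in flux form, the scheme conserves the total of `u` up to boundary effects regardless of what the network predicts, which is the structural property the CFN exploits.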
RARe: Retrieval Augmented Retrieval with In-Context Examples
Tejaswi, Atula, Lee, Yoonsang, Sanghavi, Sujay, Choi, Eunsol
We investigate whether in-context examples, widely used in decoder-only language models (LLMs), can improve embedding model performance in retrieval tasks. Unlike in LLMs, naively prepending in-context examples (query-document pairs) to the target query at inference time does not work out of the box. We introduce a simple approach to enable retrievers to use in-context examples. Our approach, RARe, finetunes a pre-trained model with in-context examples whose query is semantically similar to the target query. This can be applied to adapt various base architectures (i.e., decoder-only language models, retriever models) and consistently achieves performance gains of up to +2.72% nDCG across various open-domain retrieval datasets (BeIR, RAR-b). In particular, we find RARe exhibits stronger out-of-domain generalization compared to models using queries without in-context examples, similar to what is seen for in-context learning in LLMs. We further provide analysis on the design choices of in-context example augmentation and lay the foundation for future work in this space.

In-context learning (ICL) (Brown et al., 2020) has emerged as a powerful paradigm enabling diverse applications without parameter updates in large language models (LLMs). By conditioning on input-output examples that demonstrate a specific task, LLMs can generate predictions while maintaining fixed parameters. While in-context learning has been extensively studied for LLMs (Xu et al., 2023; Min et al., 2022a; Dong et al., 2024), its potential for retriever models remains unexplored.
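A minimal sketch of how in-context examples could be prepended to a target query before encoding with a retriever; the prompt template, separator strings, and the commented encoder call are illustrative assumptions, not RARe's exact format or checkpoint.

```python
def format_query_with_icl(target_query, icl_examples):
    """Prepend semantically similar (query, document) pairs to the target query.

    icl_examples : list of (example_query, example_document) pairs, assumed to be
                   selected by similarity to the target query.
    """
    parts = []
    for q, d in icl_examples:
        parts.append(f"query: {q}\ndocument: {d}")
    parts.append(f"query: {target_query}")
    return "\n\n".join(parts)

# Usage: encode the augmented query with any embedding model, e.g.
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("some-retriever-checkpoint")  # hypothetical checkpoint name
# q_emb = model.encode(format_query_with_icl(query, examples))
```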
Disentangling Questions from Query Generation for Task-Adaptive Retrieval
Lee, Yoonsang, Kim, Minsoo, Hwang, Seung-won
This paper studies the problem of information retrieval, with the goal of adapting to unseen tasks. Existing work generates synthetic queries from domain-specific documents to jointly train the retriever. However, the conventional query generator assumes the query is a question, thus failing to accommodate general search intents. A more lenient approach incorporates task-adaptive elements, such as few-shot learning with a 137B LLM. In this paper, we challenge the trend of equating queries with questions, and instead conceptualize the query generation task as a "compilation" of high-level intent into a task-adaptive query. Specifically, we propose EGG, a query generator that better adapts to the wide search intents expressed in the BeIR benchmark. Our method outperforms baselines and existing models on four tasks with underexplored intents, while utilizing a query generator 47 times smaller than the previous state-of-the-art. Our findings reveal that instructing the LM with explicit search intent is a key aspect of modeling an effective query generator.
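A minimal sketch of instructing an LM with an explicit search intent when generating a synthetic query from a document, which is the key idea the abstract highlights; the prompt wording, field names, and few-shot format are illustrative assumptions, not EGG's actual template.

```python
def build_query_generation_prompt(document, search_intent, few_shot_pairs=()):
    """Condition the generator on the task's search intent rather than assuming questions.

    search_intent  : short description of what a query looks like for this task,
                     e.g. "a claim to be verified" or "an argument that the document counters".
    few_shot_pairs : optional (document, query) examples for the same intent.
    """
    lines = [f"Search intent: {search_intent}", ""]
    for doc, query in few_shot_pairs:
        lines += [f"Document: {doc}", f"Query: {query}", ""]
    lines += [f"Document: {document}", "Query:"]
    return "\n".join(lines)

# The resulting prompt can be passed to an instruction-following LM to produce a
# task-adaptive synthetic query for retriever training.
```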
AmbigDocs: Reasoning across Documents on Different Entities under the Same Name
Lee, Yoonsang, Ye, Xi, Choi, Eunsol
Different entities with the same name can be difficult to distinguish. Handling confusing entity mentions is a crucial skill for language models (LMs). For example, given the question "Where was Michael Jordan educated?" and a set of documents discussing different people named Michael Jordan, can LMs distinguish entity mentions to generate a cohesive answer to the question? To test this ability, we introduce a new benchmark, AmbigDocs. By leveraging Wikipedia's disambiguation pages, we identify sets of documents belonging to different entities that share an ambiguous name. From these documents, we generate questions containing an ambiguous name and their corresponding sets of answers. Our analysis reveals that current state-of-the-art models often yield ambiguous answers or incorrectly merge information belonging to different entities. We establish an ontology categorizing four types of incomplete answers, along with automatic evaluation metrics to identify such categories. We lay the foundation for future work on reasoning across multiple documents with ambiguous entities.
Stochastic approach for elliptic problems in perforated domains
Han, Jihun, Lee, Yoonsang
A wide range of applications in science and engineering involve a PDE model in a domain with perforations, such as perforated metals or air filters. Solving such perforated-domain problems poses computational challenges related to resolving the scales imposed by the geometries of the perforations. We propose a neural network-based, mesh-free approach for perforated domain problems. The method is robust and efficient in capturing various configuration scales, including the averaged macroscopic behavior of the solution, which has a multiscale nature induced by small perforations. The new approach incorporates the derivative-free loss method, which uses a stochastic representation or the Feynman-Kac formulation. In particular, we implement the Neumann boundary condition for the derivative-free loss method to handle the interface between the domain and the perforations. A suite of stringent numerical tests is provided to support the proposed method's efficacy in handling various perforation scales.
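A minimal sketch of a derivative-free, Feynman-Kac-style training loss for the interior of the domain, using the relation u(x) ≈ E[u(x + sqrt(2Δt) ξ)] − Δt f(x) for the Poisson problem Δu = f; the network interface, sampling choices, and the bootstrapped target are illustrative assumptions, and the paper's Neumann interface treatment at the perforations is not reproduced here.

```python
import torch

def derivative_free_loss(u_net, x, f, dt=1e-3, n_samples=32):
    """Derivative-free residual at interior collocation points x, for Delta u = f.

    Uses the stochastic (Feynman-Kac-type) relation
        u(x) ~ E[ u(x + sqrt(2*dt) * xi) ] - dt * f(x),   xi ~ N(0, I),
    so no spatial derivatives of the network are required.

    u_net : network mapping (N, d) points to (N, 1) values
    x     : (B, d) collocation points
    f     : callable mapping (B, d) points to the source term
    """
    xi = torch.randn(n_samples, *x.shape, device=x.device)
    neighbors = x.unsqueeze(0) + (2.0 * dt) ** 0.5 * xi          # (S, B, d)
    with torch.no_grad():
        # Bootstrapped target: evaluate the current network without gradients.
        target = u_net(neighbors.reshape(-1, x.shape[-1])).reshape(n_samples, -1).mean(0)
    target = target - dt * f(x).reshape(-1)
    return ((u_net(x).reshape(-1) - target) ** 2).mean()
```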
Adaptive Tracking of a Single-Rigid-Body Character in Various Environments
Kwon, Taesoo, Gu, Taehong, Ahn, Jaewon, Lee, Yoonsang
Since the introduction of DeepMimic [Peng et al. 2018], subsequent research has focused on expanding the repertoire of simulated motions across various scenarios. In this study, we propose an alternative approach to this goal: a deep reinforcement learning method based on the simulation of a single-rigid-body character. Using the centroidal dynamics model (CDM) to express the full-body character as a single rigid body (SRB) and training a policy to track a reference motion, we obtain a policy capable of adapting to various unobserved environmental changes and controller transitions without requiring any additional learning. Due to the reduced dimensions of the state and action spaces, the learning process is sample-efficient. The final full-body motion is kinematically generated in a physically plausible way, based on the state of the simulated SRB character. The SRB simulation is formulated as a quadratic programming (QP) problem, and the policy outputs an action that allows the SRB character to follow the reference motion. We demonstrate that our policy, efficiently trained within 30 minutes on an ultraportable laptop, can cope with environments that have not been experienced during learning, such as running on uneven terrain or pushing a box, as well as with transitions between learned policies, without any additional learning.
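A generic sketch of a tracking QP for a single rigid body, shown only to illustrate the kind of formulation the abstract refers to; the objective terms, weights, and constraint set of the paper's controller are not reproduced, so this is a representative form rather than the actual method.

```latex
% Generic single-rigid-body tracking QP (illustrative; not the paper's exact formulation).
\begin{aligned}
\min_{\ddot{\mathbf{q}},\,\mathbf{f}_c}\quad
  & \bigl\| \ddot{\mathbf{q}} - \ddot{\mathbf{q}}^{\mathrm{ref}} \bigr\|_{W_q}^2
  + \bigl\| \mathbf{f}_c \bigr\|_{W_f}^2 \\
\text{s.t.}\quad
  & \mathbf{M}\,\ddot{\mathbf{q}} + \mathbf{h}
    = \textstyle\sum_{i} \mathbf{J}_i^{\top} \mathbf{f}_{c,i}
    && \text{(Newton--Euler dynamics of the SRB)} \\
  & \mathbf{f}_{c,i} \in \mathcal{F}_i
    && \text{(friction-cone constraint at each contact)}
\end{aligned}
```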
Crafting In-context Examples according to LMs' Parametric Knowledge
Lee, Yoonsang, Atreya, Pranav, Ye, Xi, Choi, Eunsol
In-context learning has been applied to knowledge-rich tasks such as question answering. In such scenarios, in-context examples are used to trigger a behaviour in the language model: namely, it should surface information stored in its parametric knowledge. We study the construction of in-context example sets, with a focus on the model's parametric knowledge of those examples. We identify 'known' examples, which the model can correctly answer from its parametric knowledge, and 'unknown' ones. Our experiments show that prompting with 'unknown' examples decreases performance, potentially because it encourages hallucination rather than retrieval from parametric knowledge. Constructing an in-context example set that presents both known and unknown information performs the best across diverse settings. We perform analysis on three multi-answer question answering datasets, which allows us to further study answer set ordering strategies based on the LM's knowledge about each answer. Together, our study sheds light on how to best construct in-context example sets for knowledge-rich tasks.
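A minimal sketch of splitting candidate in-context examples into 'known' and 'unknown' by checking whether the model's greedy answer matches a gold answer; the callable interface and the simple containment-based match are illustrative assumptions, not the paper's evaluation protocol.

```python
def split_known_unknown(model_answer_fn, examples):
    """Partition QA examples by whether the model already answers them correctly.

    model_answer_fn : callable(question) -> model's greedy answer string (assumed interface)
    examples        : list of dicts with "question" and "answers" (list of gold strings)
    """
    def is_correct(prediction, golds):
        # Simple containment-based match; a real evaluation may use normalization or exact match.
        pred = prediction.lower()
        return any(g.lower() in pred for g in golds)

    known, unknown = [], []
    for ex in examples:
        prediction = model_answer_fn(ex["question"])
        (known if is_correct(prediction, ex["answers"]) else unknown).append(ex)
    return known, unknown

# In-context example sets can then mix 'known' and 'unknown' items, the combination the
# abstract reports to perform best across settings.
```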
Learning In-between Imagery Dynamics via Physical Latent Spaces
Han, Jihun, Lee, Yoonsang, Gelb, Anne
Understanding image dynamics from a set of complex measurement data is important in many applications, from the diagnosis or monitoring of a disease by analyzing a series of medical (e.g., MRI or ultrasound) images [28], to the interpretation of a sequence of satellite images used to study climate change, natural disasters, or environmental conditions [2]. Here an "image" refers to a high-dimensional data frame that contains complex and condensed information within each pixel, with the pixels also being spatially correlated. To understand the underlying dynamics between sequential images, therefore, it is essential to simultaneously decipher the intertwined relationship among their spatial and temporal features. A common approach for understanding such spatio-temporal dynamics involves the employment of physical models such as differential equations (DEs). By using the observed data to estimate the parameters in the corresponding DEs, it is possible to gain physical insights regarding their evolution [12, 20]. However, directly applying such techniques to image dynamics is of limited use due to the intricate description that would be required by a suitable prior model, the highly nonlinear relationship among pixels, and the computational complexities arising from the high dimensionality of the images.
An analysis of the derivative-free loss method for solving PDEs
Han, Jihun, Lee, Yoonsang
Neural networks are well known for their flexibility in representing complicated functions in a high-dimensional space [3, 9]. In recent years, this strong property has naturally led to their use in representing solutions of partial differential equations (PDEs). Physics-informed neural networks [16] and the Deep Galerkin method [17] use the strong form of the PDE to define the training loss, while the Deep Ritz method [4] uses a weak (or variational) formulation of PDEs to train the network. Also, a class of methods uses a stochastic representation of PDEs to train the neural network [5, 8]. All these methods have shown successful results in a wide range of problems in science and engineering, particularly for high-dimensional problems where standard numerical PDE methods have limitations [5, 17, 2]. The goal of the current study is an analysis of the derivative-free loss method (DFLM; [8]). DFLM employs a stochastic representation of the solution for a certain class of PDEs, averaging stochastic samples as a generalized Feynman-Kac formulation. The loss formulation of DFLM directly guides a neural network to learn the point-to-neighborhood relationships of the solution. DFLM adopts bootstrapping in the context of reinforcement learning, where the neural network's target function is computed based on its current state through the point-to-neighborhood relation.
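As a concrete illustration of the point-to-neighborhood relation mentioned above, a Feynman-Kac-type relation for a Poisson-type equation Δu = f can be written as below; the generalized formulation used by DFLM may differ in form, so this is only a representative sketch. The derivative-free loss then penalizes the mismatch between the two sides, with the right-hand side evaluated on a bootstrapped (target) copy of the network.

```latex
% Representative point-to-neighborhood relation for \Delta u = f (illustrative form).
u(\mathbf{x})
  \;=\; \mathbb{E}\!\left[\, u\!\left(\mathbf{x} + \sqrt{2\Delta t}\,\boldsymbol{\xi}\right) \right]
  \;-\; \Delta t\, f(\mathbf{x}) \;+\; \mathcal{O}(\Delta t^{2}),
\qquad \boldsymbol{\xi} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
```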