
Collaborating Authors

Park, Jaehyun


Active Learning for Continual Learning: Keeping the Past Alive in the Present

arXiv.org Artificial Intelligence

Continual learning (CL) enables deep neural networks to adapt to ever-changing data distributions. In practice, annotation can be costly, motivating active continual learning (ACL), which applies active learning (AL) to CL scenarios where it is preferable to reduce labeling cost by selecting only the most informative subset of samples. However, conventional AL strategies are not suitable for ACL: they focus solely on acquiring new knowledge, leading to catastrophic forgetting of previously learned tasks. ACL therefore requires a new AL strategy that balances preventing catastrophic forgetting with the ability to quickly learn new tasks. In this paper, we propose AccuACL, Accumulated informativeness-based Active Continual Learning, which makes novel use of the Fisher information matrix as a sample-selection criterion, derived from a theoretical analysis of the Fisher-optimality preservation properties within the ACL framework, while also addressing the scalability issue of Fisher-information-based AL. Extensive experiments demonstrate that AccuACL significantly outperforms AL baselines across various CL algorithms, improving average accuracy and forgetting by 23.8% and 17.0%, respectively, on average. Continual learning (CL), a learning scenario in which models adapt continuously to evolving data distributions, is essential in our dynamic world (Thrun, 1995). Numerous CL methods have been proposed with the common goal of preserving past knowledge while acquiring new knowledge across CL tasks (Abraham and Robins, 2005; Kim et al., 2023b; Mermillod et al., 2013). While most studies in CL assume that the evolving data distributions are fully labeled, this is rarely the case in practice.
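To make the selection criterion concrete, here is a minimal PyTorch sketch of Fisher-information-based scoring for active learning: each unlabeled sample is scored by the trace of its diagonal empirical Fisher, and the highest-scoring samples are queried. This illustrates the general family of criteria only; the accumulated-informativeness criterion and the Fisher-optimality analysis of AccuACL are not reproduced here, and `model` and `pool` are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def fisher_score(model, x):
    """Trace of the diagonal empirical Fisher for one unlabeled sample,
    using the model's own prediction as a pseudo-label."""
    model.zero_grad()
    log_probs = F.log_softmax(model(x.unsqueeze(0)), dim=-1)
    pseudo_label = log_probs.argmax(dim=-1)
    F.nll_loss(log_probs, pseudo_label).backward()
    return sum((p.grad ** 2).sum().item()
               for p in model.parameters() if p.grad is not None)

def select_batch(model, pool, k):
    """Query the k pool samples with the largest Fisher scores."""
    scores = sorted(((fisher_score(model, x), i) for i, x in enumerate(pool)),
                    reverse=True)
    return [i for _, i in scores[:k]]
```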


DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation

arXiv.org Artificial Intelligence

We propose Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation (DIAR), a novel offline reinforcement learning (offline RL) framework. We address two key challenges in offline RL: out-of-distribution samples and long-horizon problems. We leverage diffusion models to learn state-action sequence distributions and incorporate value functions for more balanced and adaptive decision-making. DIAR introduces an Adaptive Revaluation mechanism that dynamically adjusts decision lengths by comparing current and future state values, enabling flexible long-term decision-making. Furthermore, we address Q-value overestimation by combining Q-network learning with a value function guided by a diffusion model. The diffusion model generates diverse latent trajectories, enhancing policy robustness and generalization. In tasks such as Maze2D, AntMaze, and Kitchen, DIAR consistently outperforms state-of-the-art algorithms in long-horizon, sparse-reward environments.
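The Adaptive Revaluation idea can be sketched as a simple control loop: commit to a planned action subsequence only while the value function expects an improvement, and otherwise cut the decision length and replan. This is a hedged sketch under assumed interfaces (`env.step`, `value_fn`, `plan`), not the actual DIAR implementation.

```python
def execute_with_revaluation(env, state, value_fn, plan, max_len=16):
    """Follow a planned subsequence, replanning with a shorter horizon
    whenever the value function stops predicting improvement."""
    for action in plan(state, max_len):   # e.g., decoded from diffusion latents
        next_state, _reward, done = env.step(action)
        if done:
            return next_state
        if value_fn(next_state) <= value_fn(state):
            # Future value no longer beats the current value: revalue and replan.
            return execute_with_revaluation(env, next_state, value_fn, plan,
                                            max(max_len // 2, 1))
        state = next_state
    return state
```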


Diffusion-Based Offline RL for Improved Decision-Making in Augmented ARC Task

arXiv.org Artificial Intelligence

Effective long-term strategies enable AI systems to navigate complex environments by making sequential decisions over extended horizons. Similarly, reinforcement learning (RL) agents optimize decisions across sequences to maximize rewards, even without immediate feedback. To verify that Latent Diffusion-Constrained Q-learning (LDCQ), a prominent diffusion-based offline RL method, demonstrates strong reasoning abilities in multi-step decision-making, we evaluate its performance on the Abstraction and Reasoning Corpus (ARC). However, applying offline RL to enhance strategic reasoning on ARC tasks is challenging because the ARC training set lacks sufficient experience data. To address this limitation, we introduce an augmented offline RL dataset for ARC, called Synthesized Offline Learning Data for Abstraction and Reasoning (SOLAR), along with the SOLAR-Generator, which generates diverse trajectory data based on predefined rules. SOLAR enables the application of offline RL methods by offering sufficient experience data. We synthesized SOLAR for a simple task and used it to train an agent with LDCQ. Our experiments demonstrate the effectiveness of the offline RL approach on a simple ARC task, showing the agent's ability to make multi-step sequential decisions and correctly identify answer states. These results highlight the potential of offline RL to enhance AI's strategic reasoning capabilities.
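As an illustration of rule-based trajectory synthesis in the spirit of the SOLAR-Generator, the sketch below rolls a hand-written solver over random grids and records (state, action, reward) steps ending in the answer state. The rule shown (repaint the grid with its most frequent color) is a hypothetical stand-in, not an actual SOLAR task.

```python
import random
from collections import Counter

def generate_episode(size=3, colors=3):
    """Synthesize one (state, action, reward) trajectory for a toy grid task."""
    grid = [[random.randrange(colors) for _ in range(size)] for _ in range(size)]
    target = Counter(c for row in grid for c in row).most_common(1)[0][0]
    episode = []
    for r in range(size):
        for c in range(size):
            if grid[r][c] != target:
                snapshot = [row[:] for row in grid]  # state before acting
                episode.append((snapshot, ("paint", r, c, target), 0.0))
                grid[r][c] = target
    episode.append(([row[:] for row in grid], ("submit",), 1.0))  # answer state
    return episode
```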


Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation

arXiv.org Artificial Intelligence

While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling within a neural network framework. Our model leverages physical properties and fundamental frequencies as inputs, outputting string states across time and space that solve the partial differential equation characterizing the nonlinear string. Empirical evaluations demonstrate that the proposed architecture achieves superior accuracy in string motion simulation compared to existing baseline architectures. The code and demo are available online.
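The modal-synthesis core of such a model can be written compactly: string displacement is a sum of damped sinusoidal modes with spatial sine shapes. In the paper the modal parameters are produced by a network from physical properties; in this minimal numpy sketch they are fixed by hand (harmonic frequencies, 1/n amplitudes), so it illustrates only the classical synthesis step, not the learned components.

```python
import numpy as np

def string_motion(x, t, f0=110.0, n_modes=8, decay=2.0):
    """Displacement u(x, t) for normalized position x in [0, 1] and time t (s)."""
    u = np.zeros((len(t), len(x)))
    for n in range(1, n_modes + 1):
        amp = 1.0 / n                         # simple amplitude rolloff
        shape = np.sin(n * np.pi * x)         # spatial mode shape
        osc = np.exp(-decay * n * t) * np.sin(2 * np.pi * n * f0 * t)
        u += amp * np.outer(osc, shape)       # (time, space) contribution
    return u

x = np.linspace(0.0, 1.0, 64)       # positions along the string
t = np.linspace(0.0, 0.5, 22050)    # half a second at 44.1 kHz
motion = string_motion(x, t)        # string states over time and space
```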


Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation

arXiv.org Artificial Intelligence

We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average-reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an $\tilde{\mathcal{O}}(dD\sqrt{T})$ regret, where $d$ is the dimension of the feature mapping, $D$ is the diameter of the underlying MDP, and $T$ is the horizon. The second algorithm \texttt{OVIFH-MNL} is computationally more efficient and applies to the more general class of weakly communicating MDPs, for which we show a regret guarantee of $\tilde{\mathcal{O}}(d^{2/5} \mathrm{sp}(v^*)T^{4/5})$ where $\mathrm{sp}(v^*)$ is the span of the associated optimal bias function. We also prove a lower bound of $\Omega(d\sqrt{DT})$ for learning communicating MDPs with MNL transitions of diameter at most $D$. Furthermore, we show a regret lower bound of $\Omega(dH^{3/2}\sqrt{K})$ for learning $H$-horizon episodic MDPs with MNL function approximation where $K$ is the number of episodes, which improves upon the best-known lower bound for the finite-horizon setting.
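For readers unfamiliar with the transition model, the MNL assumption says that the probability of each candidate next state is a softmax over linear scores of $d$-dimensional features. A minimal numpy sketch with toy features and parameters:

```python
import numpy as np

def mnl_transition_probs(phi, theta):
    """P(s' | s, a) under the MNL model.
    phi: (num_candidate_next_states, d) features; theta: (d,) parameters."""
    scores = phi @ theta
    scores -= scores.max()          # shift for numerical stability
    expo = np.exp(scores)
    return expo / expo.sum()

phi = np.random.default_rng(0).normal(size=(4, 3))  # 4 candidates, d = 3
theta = np.array([0.5, -0.2, 1.0])
print(mnl_transition_probs(phi, theta))             # nonnegative, sums to 1
```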


Understanding Contrastive Learning Through the Lens of Margins

arXiv.org Artificial Intelligence

Contrastive learning, along with its variations, has been a highly effective self-supervised learning method across diverse domains. It measures the distance between representations using cosine similarity and uses cross-entropy for representation learning. Within the same framework of cosine-similarity-based representation learning, margins have played a significant role in enhancing face and speaker recognition. Interestingly, despite the shared similarity metric and objective function, contrastive learning has not actively adopted margins, and the effect of margins in contrastive learning has so far been explained only through decision boundaries. In this work, we propose a new perspective for understanding the role of margins based on gradient analysis. From this perspective, we analyze how margins affect the gradients of contrastive learning, decompose the effect into more elemental components, analyze each separately, and suggest directions for improving contrastive learning. Our experimental results demonstrate that emphasizing positive samples and scaling gradients according to positive-sample angles and logits are the keys to improving the generalization performance of contrastive learning on both seen and unseen datasets; other factors yield only marginal gains.
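To make the setting concrete, the sketch below adds an additive angular margin (in the style of face-recognition losses) to the positive logit of an InfoNCE-style contrastive loss in PyTorch. It illustrates the family of margin losses the paper analyzes, not its proposed gradient-level modifications.

```python
import torch
import torch.nn.functional as F

def margin_info_nce(z1, z2, margin=0.1, temperature=0.1):
    """InfoNCE with an additive angular margin on the positive pair.
    z1, z2: (batch, dim) embeddings of two views of the same batch."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t()                      # pairwise cosine similarities
    pos = torch.diagonal(logits).clamp(-1 + 1e-7, 1 - 1e-7)
    pos_margined = torch.cos(torch.arccos(pos) + margin)  # widen positive angle
    idx = torch.arange(len(z1))
    logits = logits.clone()
    logits[idx, idx] = pos_margined
    return F.cross_entropy(logits / temperature, idx)
```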


Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer

arXiv.org Artificial Intelligence

In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provides insights for AGI progression. Yet, our work reveals the need for advanced data collection tools, robust training datasets, and refined model structures. This study highlights potential improvements for Decision Transformers and propels future AGI research.
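As a rough illustration of the imitation-learning setup, human solution traces can be flattened into return-to-go, state, and action tokens for a Decision Transformer. The sketch below is a hypothetical framing; the actual tokenization and object-centric state encoding of the study are not reproduced.

```python
def to_dt_sequence(trace):
    """trace: list of (state, action, reward) tuples from one human solution.
    Returns the interleaved (return-to-go, state, action) token stream a
    Decision Transformer is trained on."""
    rtg = sum(r for _, _, r in trace)   # return-to-go starts at the total return
    sequence = []
    for state, action, reward in trace:
        sequence += [("rtg", rtg), ("state", state), ("action", action)]
        rtg -= reward
    return sequence
```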


ECO: Enabling Energy-Neutral IoT Devices through Runtime Allocation of Harvested Energy

arXiv.org Artificial Intelligence

Energy harvesting offers an attractive and promising mechanism to power low-energy devices. However, harvesting alone is insufficient for energy-neutral operation, which would eliminate tedious battery charging and replacement. Achieving energy-neutral operation is challenging because uncertainty in the harvested energy undermines quality-of-service requirements. To address this challenge, we present a rollout-based runtime energy-allocation framework that optimizes the utility of the target device under energy constraints. The proposed framework uses an efficient iterative algorithm to compute initial energy allocations at the beginning of each day. The initial allocations are then corrected at every interval to compensate for deviations from the expected energy-harvesting pattern. We evaluate this framework using solar and motion energy-harvesting modalities and American Time Use Survey data from 4772 different users. Compared to state-of-the-art techniques, the proposed framework achieves 34.6% higher utility, even under energy-limited scenarios. Moreover, measurements on a wearable device prototype show that the framework incurs less than 0.1% energy overhead relative to iterative approaches, with a negligible loss in utility.
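The allocate-then-correct pattern described above can be sketched in a few lines: allocate the day's expected harvest in proportion to expected demand, then at each interval spread the deviation between expected and actual harvest over the remaining intervals. The real framework optimizes a utility function with rollout; this hedged sketch shows only the correction mechanism, with `demand_weights` as an assumed input.

```python
def initial_allocation(expected_harvest, demand_weights):
    """Split the day's expected harvest proportionally to expected demand."""
    total = sum(demand_weights)
    return [expected_harvest * w / total for w in demand_weights]

def correct(allocations, interval, expected, actual):
    """Spread this interval's harvesting surplus or deficit over the rest."""
    remaining = len(allocations) - interval - 1
    if remaining <= 0:
        return allocations
    delta = (actual - expected) / remaining
    return allocations[:interval + 1] + [a + delta for a in allocations[interval + 1:]]
```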