
Collaborating Authors

Lee, Junghyun


Probability-Flow ODE in Infinite-Dimensional Function Spaces

arXiv.org Machine Learning

Diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020; Song et al., 2021b; Kingma et al., 2021) are a class of generative models that add noise to real data to train a score network, then approximate the time-reversed process (Föllmer and Wakolbinger, 1986; Anderson, 1982) to sequentially generate samples from the true data distribution. These models have shown remarkable empirical success in numerous domains such as image generation (Song et al., 2021b,a), video generation (Luo et al., 2023), medical data processing (Song et al., 2022; Chung and Ye, 2022; Akrout et al., 2023), and audio generation (Kong et al., 2020). However, "classical" diffusion models formulated on finite-dimensional Euclidean spaces are of limited use for function generation problems: they can only generate function values realized on a fixed discretization of the function's domain (Li et al., 2020), and they cannot capture functional properties of the data such as integrability or smoothness (Kerrigan et al., 2023). Motivated by this limitation, a line of work has extended finite-dimensional diffusion models to infinite-dimensional Hilbert spaces; for instance, Hagemann et al. (2023); Kerrigan et al. (2023); Lim et al. (2023a,b); Pidstrigach et al. (2023); Phillips et al. (2022); Baldassari et al. (2023). Kerrigan et al. (2023) proposes a discrete-time model that serves as an analog of Ho et al. (2020) in infinite-dimensional space, and Hagemann et al. (2023) introduces a finite-dimensional approximation of infinite-dimensional SDEs and utilizes the time-reversal formula in finite-dimensional spaces. Lim et al. (2023a); Franzese et al. (2023); Pidstrigach et al. (2023) propose continuous-time models by extending the SDE framework of Song et al. (2021b) to infinite dimensions based on semigroup theory (see Da Prato and Zabczyk, 2014); however, they consider only a relatively simple class of SDEs, such as Langevin-type SDEs or SDEs whose diffusion coefficients are constant in time. Later, Lim et al. (2023b) proved a general time-reversal formula that encompasses various choices of SDEs, such as VPSDE, VESDE, sub-VPSDE (Song et al., 2021b), and variance scheduling (Nichol and Dhariwal, 2021).
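
For reference, in the finite-dimensional setting of Song et al. (2021b) the probability-flow ODE is the deterministic process that shares the marginals of the forward noising SDE; the paper's subject is the analog of this object when the state lives in an infinite-dimensional Hilbert space.

```latex
% Forward noising SDE (finite-dimensional):  dx_t = f(x_t, t)\,dt + g(t)\,dW_t
% Probability-flow ODE with the same marginals p_t (Song et al., 2021b):
\frac{\mathrm{d}x_t}{\mathrm{d}t} \;=\; f(x_t, t) \;-\; \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x_t)
```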


FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory -- a common occurrence in real-world environments like search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, emulating being in-domain when inferenced OOD. The results show that FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-\`a-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
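
As a rough illustration of the mechanism described above, here is a minimal sketch of entity-wise observation dropout; the function name, tensor layout, and masking rule are assumptions made for illustration, not FlickerFusion's actual implementation.

```python
import numpy as np

def flicker_dropout(obs, n_train_entities, rng=None):
    """Stochastically drop entity slices from an observation.

    obs: (n_entities, feat_dim) array, one row per observed entity.
    When more entities appear at inference than were seen in training
    (OOD), randomly mask rows each step so the effective entity count
    matches the in-domain regime; masked entities "flicker" out.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = obs.shape[0]
    if n <= n_train_entities:
        return obs  # already in-domain; nothing to drop
    keep = rng.choice(n, size=n_train_entities, replace=False)
    masked = np.zeros_like(obs)
    masked[keep] = obs[keep]
    return masked
```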


Querying Easily Flip-flopped Samples for Deep Active Learning

arXiv.org Artificial Intelligence

Active learning is a machine learning paradigm that aims to improve model performance by strategically selecting and querying unlabeled data. One effective selection strategy is to base queries on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. A sample's distance to the decision boundary is a natural measure of predictive uncertainty, but it is often intractable to compute, especially for the complex decision boundaries formed in multiclass classification tasks. To address this issue, this paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label, along with an estimator for LDM that is proven to be asymptotically consistent under mild assumptions. The estimator is computationally efficient and can be easily implemented for deep learning models using parameter perturbation. LDM-based active learning then queries the unlabeled data with the smallest LDM. Experimental results show that our LDM-based active learning algorithm obtains state-of-the-art overall performance on all considered datasets and deep architectures.
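
The abstract's recipe (parameter perturbation, smallest disagreement probability) suggests an estimator along the following lines; the Gaussian perturbation, the single noise scale, and the pool-based disagreement estimate are assumptions of this sketch, not the paper's exact procedure.

```python
import copy
import torch

@torch.no_grad()
def estimate_ldm(model, x, pool, sigma=0.05, n_samples=50):
    """Sketch of an LDM estimate for a single candidate input x.

    Draw Gaussian-perturbed copies of the model; among the copies
    whose prediction at x flips, measure how often each disagrees
    with the base model over an unlabeled pool, and keep the smallest
    such disagreement mass. Inputs with small LDM lie close to the
    decision boundary and are queried first.
    """
    base_at_x = model(x).argmax(dim=-1)
    base_on_pool = model(pool).argmax(dim=-1)
    ldm = 1.0
    for _ in range(n_samples):
        h = copy.deepcopy(model)
        for p in h.parameters():
            p.add_(sigma * torch.randn_like(p))  # perturb every weight
        if h(x).argmax(dim=-1).item() != base_at_x.item():
            disagree = (h(pool).argmax(dim=-1) != base_on_pool).float().mean()
            ldm = min(ldm, disagree.item())
    return ldm
```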


Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study

arXiv.org Machine Learning

Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effect on the training trajectory remains elusive. In this work, we empirically show that momentum gradient descent with a large learning rate and learning-rate warmup displays large catapults, driving the iterates toward flatter minima than those found by plain gradient descent. We then provide empirical evidence and theoretical intuition that the large catapult is caused by momentum "amplifying" the self-stabilization effect (Damian et al., 2023).
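
A minimal sketch of the training setup under study, assuming a linear warmup schedule (the paper's exact schedule may differ): heavy-ball momentum gradient descent with learning-rate warmup.

```python
import numpy as np

def momentum_gd_with_warmup(grad_fn, w0, lr_max, warmup_steps,
                            beta=0.9, n_steps=1000):
    """Heavy-ball momentum GD with linear learning-rate warmup.

    grad_fn: callable returning the gradient at w.
    The large-catapult phenomenon is reported for large lr_max combined
    with warmup: the loss spikes and the iterates are thrown toward a
    flatter region before training re-converges.
    """
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for t in range(n_steps):
        lr = lr_max * min(1.0, (t + 1) / warmup_steps)  # linear warmup
        v = beta * v + grad_fn(w)                        # heavy-ball update
        w = w - lr * v
    return w
```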


Flooding with Absorption: An Efficient Protocol for Heterogeneous Bandits over Complex Networks

arXiv.org Machine Learning

Multi-armed bandits are extensively used to model sequential decision-making, making them ubiquitous in many real-life applications such as online recommender systems and wireless networking. We consider a multi-agent setting where each agent solves its own bandit instance endowed with a different set of arms. The agents' goal is to minimize their group regret while collaborating via some communication protocol over a given network. Previous literature on this problem considered arm heterogeneity and networked agents only separately. In this work, we introduce a setting that encompasses both features. For this novel setting, we first provide a rigorous regret analysis for a standard flooding protocol combined with the classic UCB policy. Then, to mitigate the high communication costs incurred by flooding in complex networks, we propose a new protocol called Flooding with Absorption (FwA). We provide a theoretical analysis of the resulting regret bound and discuss the advantages of using FwA over flooding. Lastly, we experimentally verify, in various scenarios including dynamic networks, that FwA leads to significantly lower communication costs with only minimal loss in regret performance compared to other network protocols.
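
One plausible reading of the protocol, sketched below: a message concerning a particular arm is flooded through the network but is absorbed (delivered yet not re-forwarded) by any agent whose own arm set contains that arm, which caps redundant traffic. This is an illustrative reconstruction under stated assumptions, not the paper's pseudocode.

```python
from collections import deque

def flood_with_absorption(graph, source, arm, arm_sets):
    """Flood a message about `arm` from `source` over `graph`.

    graph: dict node -> iterable of neighbor nodes.
    arm_sets: dict node -> set of arms that agent can pull.
    An agent holding `arm` absorbs the message: it receives it but
    does not forward it further. Returns the nodes that received it.
    """
    received = {source}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        if u != source and arm in arm_sets[u]:
            continue  # absorbed: delivered here, but not re-forwarded
        for v in graph[u]:
            if v not in received:
                received.add(v)
                frontier.append(v)
    return received
```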


Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

arXiv.org Machine Learning

Fair Principal Component Analysis (PCA) is a problem setting where we aim to perform PCA while making the resulting representation fair, in the sense that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems: theoretically, there has been no statistical foundation for fair PCA in terms of learnability; practically, limited memory prevents us from using existing approaches, as they explicitly rely on full access to the entire data. On the theoretical side, we rigorously formulate fair PCA using a new notion called probably approximately fair and optimal (PAFO) learnability. On the practical side, motivated by recent advances in streaming algorithms for addressing memory limitations, we propose a new setting called fair streaming PCA along with a memory-efficient algorithm, the fair noisy power method (FNPM). We then provide its statistical guarantee in terms of PAFO-learnability, the first of its kind in the fair PCA literature. Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.
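
To make the streaming flavor concrete, here is a loose sketch of a noisy-power-method-style update with a naive fairness step (deflating the group mean-difference direction). FNPM's actual update and fairness mechanism are not specified in the abstract, so everything below is an illustrative assumption.

```python
import numpy as np

def fair_streaming_pca_sketch(stream, d, k, n_iters=10):
    """Streaming power-method iteration with a crude fairness step.

    stream(t) yields a minibatch (X, s): samples X of shape (n, d) and
    binary sensitive attributes s of shape (n,); we assume both groups
    appear in every minibatch. Memory is O(dk): only the current basis
    and one minibatch are held at a time.
    """
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for t in range(n_iters):
        X, s = stream(t)
        Y = X.T @ (X @ Q) / len(X)            # stochastic covariance action
        gap = X[s == 1].mean(axis=0) - X[s == 0].mean(axis=0)
        g = gap / (np.linalg.norm(gap) + 1e-12)
        Y -= np.outer(g, g @ Y)               # deflate group-gap direction
        Q, _ = np.linalg.qr(Y)                # re-orthonormalize the basis
    return Q                                   # estimated fair top-k subspace
```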


Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion

arXiv.org Machine Learning

The logistic bandit is a ubiquitous framework for modeling users' choices, e.g., click vs. no click in advertisement recommender systems. We observe that prior works overlook or neglect the dependency on $S \geq \lVert \theta_\star \rVert_2$, where $\theta_\star \in \mathbb{R}^d$ is the unknown parameter vector; this is particularly problematic when $S$ is large, e.g., $S \geq d$. In this work, we improve the dependency on $S$ via a novel approach called regret-to-confidence-set conversion (R2CS), which allows us to construct a convex confidence set based only on the existence of an online learning algorithm with a regret guarantee. Using R2CS, we obtain a strict improvement in the regret bound with respect to $S$ in logistic bandits while retaining computational feasibility and the dependence on other factors such as $d$ and $T$. We apply our new confidence set to the regret analyses of logistic bandits with a new martingale concentration step that circumvents an additional factor of $S$. We then extend this analysis to multinomial logistic bandits and obtain similar improvements in the regret, showing the efficacy of R2CS. While we apply R2CS to the (multinomial) logistic model, R2CS is a generic approach for developing confidence sets and can be used for various models, which may be of independent interest.
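
For context, the binary logistic bandit model that the norm bound $S$ refers to: at each round the learner picks an arm $x_t \in \mathcal{X} \subset \mathbb{R}^d$ and observes a Bernoulli reward.

```latex
% Binary logistic bandit; S upper-bounds the unknown parameter norm.
y_t \mid x_t \sim \mathrm{Bernoulli}\big(\mu(x_t^\top \theta_\star)\big),
\qquad \mu(z) = \frac{1}{1 + e^{-z}},
\qquad \lVert \theta_\star \rVert_2 \leq S .
```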


Optimizing Layerwise Polynomial Approximation for Efficient Private Inference on Fully Homomorphic Encryption: A Dynamic Programming Approach

arXiv.org Artificial Intelligence

Recent research has explored the implementation of privacy-preserving deep neural networks using only fully homomorphic encryption. However, its practicality has been limited by prolonged inference times. When using a pre-trained model without retraining, a major contributor to these prolonged inference times is the high-degree polynomial approximation of activation functions such as ReLU. The high-degree approximation consumes a substantial amount of homomorphic computational resources, resulting in slower inference. Unlike previous works, which approximate activation functions uniformly and conservatively, this paper presents a layerwise degree optimization of activation functions that aggressively reduces inference time while maintaining classification accuracy, by taking the characteristics of each layer into account. Instead of the minimax approximation commonly used in state-of-the-art private inference models, we employ a weighted least-squares approximation based on the input distributions of the activation functions. We then obtain layerwise optimized degrees for the activation functions through a dynamic programming algorithm that considers how each layer's approximation error affects the classification accuracy of the deep neural network. Furthermore, we propose modulating the ciphertext moduli-chain layerwise to further reduce inference time. With these layerwise optimization methods, we reduce inference times for the ResNet-20 and ResNet-32 models by 3.44 times and 3.16 times, respectively, compared to prior implementations employing uniform-degree polynomials and a consistent ciphertext modulus.
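
A small sketch of the weighted least-squares step, assuming a Gaussian stand-in for a layer's input distribution and a Chebyshev basis (both are assumptions of this sketch; the paper's weights and basis may differ).

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def wls_relu_poly(degree, mu=0.0, sigma=1.0, lo=-6.0, hi=6.0, n_grid=2001):
    """Weighted least-squares polynomial approximation of ReLU.

    The squared error is weighted by a Gaussian density standing in for
    the layer's activation-input distribution, so approximation accuracy
    is spent where inputs actually land. Returns Chebyshev coefficients.
    """
    x = np.linspace(lo, hi, n_grid)
    w = np.exp(-0.5 * ((x - mu) / sigma) ** 2)   # unnormalized Gaussian weight
    relu = np.maximum(x, 0.0)
    # chebfit minimizes sum(w_i^2 * err_i^2), so pass sqrt of the weight.
    return C.chebfit(x, relu, degree, w=np.sqrt(w))

# Lower degrees cost fewer homomorphic multiplications; a dynamic program
# over layers would trade each layer's degree against end-to-end accuracy.
approx_at_half = C.chebval(0.5, wls_relu_poly(degree=7))
```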


Nearly Optimal Latent State Decoding in Block MDPs

arXiv.org Artificial Intelligence

We investigate the problems of model estimation and reward-free learning in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are first interested in estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and present an algorithm approaching this fundamental limit. In turn, our algorithm also provides estimates of all the components of the MDP. We then study the problem of learning near-optimal policies in the reward-free framework. Based on our efficient model estimation algorithm, we show that we can infer a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible rate. Interestingly, our analysis provides necessary and sufficient conditions under which exploiting the block structure yields improvements in the sample complexity for identifying near-optimal policies. When these conditions are met, the sample complexity in the minimax reward-free setting is improved by a multiplicative factor $n$, where $n$ is the number of possible contexts.
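
For reference, the Block MDP structure underlying the decoding problem: each rich context is emitted by exactly one latent state, so the emission supports are disjoint ("blocks") and a decoding function is well defined.

```latex
% Block MDP: few latent states s in S, rich contexts x in X.
% Disjoint emission supports make the decoder f : X -> S well defined,
% with f(x) = s whenever x ~ q(. | s).
x_h \sim q(\cdot \mid s_h), \qquad
\operatorname{supp} q(\cdot \mid s) \cap \operatorname{supp} q(\cdot \mid s') = \emptyset
\quad (s \neq s').
```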


Unsupervised CT Metal Artifact Learning using Attention-guided beta-CycleGAN

arXiv.org Machine Learning

Metal artifact reduction (MAR) is one of the most important research topics in computed tomography (CT). With the advance of deep learning for image reconstruction, various deep learning methods have also been suggested for metal artifact removal, among which supervised learning methods are the most popular. However, matched non-metal and metal image pairs are difficult to obtain in real CT acquisition. Recently, a promising unsupervised learning method for MAR was proposed using feature disentanglement, but the resulting network architecture is complicated and has difficulty handling large clinical images. To address this, we propose a much simpler and more effective unsupervised MAR method for CT. The proposed method is based on a novel beta-CycleGAN architecture derived from optimal transport theory for appropriate feature-space disentanglement. Another important contribution is to show that an attention mechanism is the key element for effectively removing metal artifacts. Specifically, by adding convolutional block attention module (CBAM) layers with a proper disentanglement parameter, experimental results confirm that we obtain improved MAR that preserves the detailed texture of the original image.
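
The CBAM building block mentioned above is standard (Woo et al., 2018): channel attention followed by spatial attention. A compact PyTorch sketch of the module is below; it illustrates the attention mechanism only, not the paper's full beta-CycleGAN generator.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module (Woo et al., 2018).

    Channel attention (shared MLP over average- and max-pooled features)
    followed by spatial attention (conv over pooled channel maps).
    """
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                   # avg-pooled branch
        mx = self.mlp(x.amax(dim=(2, 3)))                    # max-pooled branch
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)     # channel attention
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(pooled))       # spatial attention
```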