Lee, Minjae
State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
Kang, Wonjun, Galim, Kevin, Zeng, Yuchen, Lee, Minjae, Koo, Hyung Il, Cho, Nam Ik
State Space Models (SSMs) have emerged as efficient alternatives to Transformers, mitigating their quadratic computational cost. However, the application of Parameter-Efficient Fine-Tuning (PEFT) methods to SSMs remains largely unexplored. In particular, prompt-based methods like Prompt Tuning and Prefix-Tuning, which are widely used in Transformers, do not perform well on SSMs. To address this, we propose state-based methods as a superior alternative to prompt-based methods. This new family of methods naturally stems from the architectural characteristics of SSMs. State-based methods adjust state-related features directly instead of depending on external prompts. Furthermore, we introduce a novel state-based PEFT method: State-offset Tuning. At every timestep, our method directly affects the state at the current step, leading to more effective adaptation. Through extensive experiments across diverse datasets, we demonstrate the effectiveness of our method. Code is available at https://github.com/furiosa-ai/ssm-state-tuning.
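The idea of adjusting state-related features directly can be illustrated with a toy diagonal SSM recurrence. This is a minimal sketch: the scalar-input scan and the single learnable vector `s_offset` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ssm_scan(x, A, B, C, s_offset=None):
    """Toy diagonal SSM: h_t = A*h_{t-1} + B*x_t (+ s_offset), y_t = C.h_t.

    `s_offset` is a learnable vector added to the state at every
    timestep -- an illustrative stand-in for state-offset tuning,
    where the pretrained dynamics (A, B, C) stay frozen.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A * h + B * x_t           # frozen pretrained recurrence
        if s_offset is not None:
            h = h + s_offset          # the only trained parameters
        ys.append(C @ h)
    return np.array(ys)

rng = np.random.default_rng(0)
A = np.full(8, 0.9); B = rng.normal(size=8); C = rng.normal(size=8)
x = rng.normal(size=16)
y_base = ssm_scan(x, A, B, C)                       # frozen model
y_tuned = ssm_scan(x, A, B, C, s_offset=0.1 * np.ones(8))
```

Because the offset enters the recurrence at every step rather than only as a prefix, it influences the state throughout the sequence, which is the intuition behind preferring state-based over prompt-based adaptation.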
VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
Zeng, Thomas, Zhang, Shuibai, Wu, Shutong, Classen, Christian, Chae, Daewon, Ewer, Ethan, Lee, Minjae, Kim, Heeju, Kang, Wonjun, Kunde, Jackson, Fan, Ying, Kim, Jungtaek, Koo, Hyung Il, Ramchandran, Kannan, Papailiopoulos, Dimitris, Lee, Kangwook
Process Reward Models (PRMs) have proven effective at enhancing mathematical reasoning for Large Language Models (LLMs) by leveraging increased inference-time computation. However, they are predominantly trained on mathematical data, and their generalizability to non-mathematical domains has not been rigorously studied. In response, this work first shows that current PRMs have poor performance in other domains. To address this limitation, we introduce VersaPRM, a multi-domain PRM trained on synthetic reasoning data generated using our [...]

In particular, Outcome Reward Models (ORMs) are used to provide supervision based solely on the correctness of the final outcome. However, ORMs fail to address errors in intermediate steps, limiting their effectiveness for complex, multi-step reasoning tasks (Luo et al., 2024; Lightman et al., 2024; Sun et al., 2024). Because ORMs suffer from this limitation, Process Reward Models (PRMs) have been proposed to offer fine-grained, step-by-step feedback on the correctness of each reasoning step (Lightman et al., 2024; Uesato et al., 2022). PRMs have proven highly effective during inference, improving the reranking of generated solutions and guiding LLMs through search-based algorithms (Wan et al., 2024; Wang et al., 2024a).
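The reranking role a PRM plays at inference time can be sketched as follows. The `score_step` scorer is a hypothetical stand-in for a trained PRM, and min-aggregation (scoring a chain by its weakest step) is one common choice, not necessarily the paper's.

```python
def rerank_with_prm(candidates, score_step, aggregate=min):
    """Rerank candidate reasoning chains by per-step PRM scores.

    `candidates`: list of chains, each a list of reasoning-step strings.
    `score_step`: stand-in for a trained PRM returning a correctness
    probability per step; `aggregate=min` scores a chain by its
    weakest step.
    """
    scored = [(aggregate(score_step(s) for s in chain), chain)
              for chain in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[0][1]  # best chain under the PRM

# Dummy scorer: penalize steps flagged as wrong.
scorer = lambda step: 0.2 if "wrong" in step else 0.9
chains = [["step a", "wrong step", "answer"],
          ["step a", "step b", "answer"]]
best = rerank_with_prm(chains, scorer)
```

This is exactly where an ORM falls short: it would score only the final "answer" line, while the PRM catches the faulty intermediate step.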
LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System
Kwon, Hyucksung, Koo, Kyungmo, Kim, Janghyeon, Lee, Woongkyu, Lee, Minjae, Lee, Hyungdeok, Jung, Yousub, Park, Jaehan, Song, Yosub, Yang, Byeongsu, Choi, Haerang, Kim, Guhyun, Won, Jongsoon, Shin, Woojae, Kim, Changhyun, Shin, Gyeongcheol, Kwon, Yongkee, Kim, Ilkon, Lim, Euicheol, Kim, John, Choi, Jungwook
The expansion of large language models (LLMs) to hundreds of billions of parameters presents significant challenges to computational resources, particularly data movement and memory bandwidth. Long-context LLMs, which process sequences of tens of thousands of tokens, further increase the demand on the memory system, as attention-layer complexity and key-value cache size grow with the context length. Processing-in-Memory (PIM) maximizes memory bandwidth by moving compute to the data and can address the memory bandwidth challenge; however, PIM does not necessarily scale to long-context LLM acceleration because of limited per-module memory capacity, the inflexibility of fixed-functional-unit PIM architectures, and static memory management. In this work, we propose LoL-PIM, a multi-node PIM architecture that accelerates long-context LLMs through hardware-software co-design. In particular, we show how pipeline parallelism can be exploited across multiple PIM modules, and we propose a direct PIM access (DPA) controller (i.e., a DMA for PIM) that enables dynamic PIM memory management, yielding efficient PIM utilization across a diverse range of context lengths. We developed an MLIR-based compiler for LoL-PIM by extending a commercial PIM-based compiler; the software modifications were implemented and evaluated, while the hardware changes were modeled in a simulator. Our evaluations demonstrate that LoL-PIM significantly improves throughput and reduces latency for long-context LLM inference, outperforming both multi-GPU and GPU-PIM systems (up to 8.54x and 16.0x speedup, respectively), thereby enabling more efficient deployment of LLMs in real-world applications.
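Why long contexts stress the memory system can be seen with a back-of-the-envelope KV-cache calculation; the 7B-class model configuration below (32 layers, 32 KV heads, head dimension 128, fp16) is illustrative, not LoL-PIM's target model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, dtype_bytes=2):
    """Per-sequence KV-cache size: 2 (K and V) * layers * heads * dim * tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * dtype_bytes

# Illustrative 7B-class config in fp16: cache grows linearly with context.
for ctx in (4_096, 32_768):
    gib = kv_cache_bytes(32, 32, 128, ctx) / 2**30
    print(f"{ctx:>6} tokens -> {gib:.1f} GiB of KV cache per sequence")
```

At 32K tokens a single sequence's cache already exceeds the capacity of a typical PIM module, which is why dynamic memory management across multiple modules matters.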
Just Say the Name: Online Continual Learning with Category Names Only via Data Generation
Seo, Minhyuk, Misra, Diganta, Cho, Seongwon, Lee, Minjae, Choi, Jonghyun
In real-world scenarios, extensive manual annotation for continual learning is impractical due to prohibitive costs. Although prior work, influenced by large-scale webly supervised training, suggests leveraging web-scraped data in continual learning, doing so poses challenges such as data imbalance, usage restrictions, and privacy concerns. To address the risks of continual webly supervised training, we present an online continual learning framework: Generative Name only Continual Learning (G-NoCL). The proposed G-NoCL uses a set of generators G along with the learner. When encountering new concepts (i.e., classes), G-NoCL employs the novel sample-complexity-guided data ensembling technique DIverSity and COmplexity enhancing ensemBlER (DISCOBER) to optimally sample training data from the generated data. Through extensive experimentation, we demonstrate the superior performance of DISCOBER on G-NoCL online CL benchmarks, covering both In-Distribution (ID) and Out-of-Distribution (OOD) generalization evaluations, compared to naive generator ensembling, web-supervised, and manually annotated data.
Learning Equi-angular Representations for Online Continual Learning
Seo, Minhyuk, Koh, Hyunseo, Jeung, Wonje, Lee, Minjae, Kim, San, Lee, Hankook, Cho, Sungjun, Choi, Sungik, Kim, Hyunwoo, Choi, Jonghyun
Online continual learning suffers from underfitted solutions due to insufficient training during prompt model updates (e.g., single-epoch training). To address this challenge, we propose an efficient online continual learning method that exploits the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space, so that a model continuously learned with a single epoch can better fit the streamed data, via preparatory data training and residual correction in the representation space. With an extensive set of empirical validations using CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios, such as disjoint and Gaussian-scheduled continuous (i.e., boundary-free) data setups.
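The simplex ETF geometry the method induces can be constructed and checked directly. This is a minimal sketch of the target structure only; the training procedure (preparatory data, residual correction) is not shown.

```python
import numpy as np

def simplex_etf(K):
    """Columns form a simplex equiangular tight frame: K unit-norm
    vectors whose pairwise cosine similarity is the maximally
    separated -1/(K-1) -- the class-prototype geometry that neural
    collapse converges to."""
    I, ones = np.eye(K), np.ones((K, K))
    return np.sqrt(K / (K - 1)) * (I - ones / K)

M = simplex_etf(4)
G = M.T @ M          # Gram matrix: 1 on the diagonal, -1/3 off-diagonal
```

Fixing the classifier to such prototypes removes the need for the classifier itself to converge, which is one reason the structure helps under single-epoch training.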
Just Flip: Flipped Observation Generation and Optimization for Neural Radiance Fields to Cover Unobserved View
Lee, Minjae, Kang, Kyeongsu, Yu, Hyeonwoo
With the advent of Neural Radiance Fields (NeRF), representing 3D scenes through multiple observations has shown remarkable improvements in performance. Since this cutting-edge technique can obtain high-resolution renderings by interpolating dense 3D environments, various approaches have been proposed to apply NeRF to spatial understanding in robot perception. However, previous works struggle to represent unobserved scenes or views along unexplored robot trajectories, as they do not account for 3D reconstruction without observation information. To overcome this problem, we propose a method that generates flipped observations to cover the missing observations along unexplored robot trajectories. To achieve this, we propose a data augmentation method for NeRF-based 3D reconstruction that flips observed images and estimates the corresponding flipped camera 6-DOF poses. Our technique exploits the geometric symmetry of objects, making it simple yet fast and powerful, and thus suitable for robotic applications where real-time performance is important. We demonstrate that our method significantly improves three representative perceptual quality measures on the NeRF synthetic dataset.
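The flipped-observation idea can be sketched by mirroring a camera pose across an assumed symmetry plane, here x = 0 in world coordinates. This choice of plane and the identity/rotation examples are illustrative assumptions; the paper's flipped-pose estimation is more involved.

```python
import numpy as np

# Reflection across the plane x = 0 (the assumed object symmetry plane).
F = np.diag([-1.0, 1.0, 1.0])

def flip_pose(R, t):
    """Reflect a camera-to-world pose (R, t) across x = 0.

    Conjugating the rotation by F (rather than just left-multiplying)
    keeps det(R) = +1, so the result is still a valid right-handed
    rotation to pair with the horizontally flipped image.
    """
    return F @ R @ F, F @ t

R = np.eye(3)
t = np.array([1.0, 0.0, 2.0])
R_f, t_f = flip_pose(R, t)   # mirrored camera position: x negated
```

Each flipped image plus its mirrored pose then acts as a synthetic observation from the otherwise unvisited side of the scene.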
Necessity Feature Correspondence Estimation for Large-scale Global Place Recognition and Relocalization
Kang, Kyeongsu, Lee, Minjae, Yu, Hyeonwoo
Global place recognition and 3D relocalization are among the most important components of loop closure detection for 3D LiDAR Simultaneous Localization and Mapping (SLAM). To find an accurate global 6-DoF transform via feature matching, various end-to-end architectures have been proposed. However, existing methods do not consider false feature correspondences, so unnecessary features also become involved in global place recognition and relocalization. In this paper, we introduce a robust correspondence estimation method that removes unnecessary features and highlights necessary features simultaneously. To focus on the necessary features and ignore the unnecessary ones, we use the geometric correlation between two scenes represented as 3D LiDAR point clouds. We introduce a correspondence auxiliary loss that finds key correlations based on the point-align algorithm and enables end-to-end training of the proposed networks with robust correspondence estimation. Since the ground, with its many plane patches, acts as an outlier during correspondence estimation, we also propose a preprocessing step that accounts for negative correspondences by removing dominant plane patches. Evaluation results on a dynamic urban driving dataset show that our proposed method improves the performance of both global place recognition and relocalization. We show that robust feature correspondence estimation is one of the important factors in place recognition and relocalization.
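The dominant-plane-removal preprocessing can be sketched with a basic RANSAC plane fit. The threshold, iteration count, and synthetic scene below are illustrative choices, not the paper's settings.

```python
import numpy as np

def remove_dominant_plane(points, n_iters=200, thresh=0.1, seed=0):
    """Fit the dominant plane with RANSAC and drop its inliers.

    Illustrative stand-in for ground removal before correspondence
    estimation; `thresh` (meters) and `n_iters` are arbitrary choices.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:
            continue                      # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        inliers = np.abs((points - p0) @ n) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return points[~best_inliers]          # keep only off-plane points

# Synthetic scene: a flat ground patch plus a few elevated points.
rng = np.random.default_rng(1)
ground = np.c_[rng.uniform(-5, 5, (500, 2)), np.zeros(500)]
objects = np.c_[rng.uniform(-5, 5, (50, 2)), rng.uniform(1, 3, 50)]
kept = remove_dominant_plane(np.vstack([ground, objects]))
```

Removing the ground's plane patches up front keeps them from dominating as negative correspondences during matching.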
Leveraging Skill-to-Skill Supervision for Knowledge Tracing
Kim, Hyeondey, Nam, Jinwoo, Lee, Minjae, Jegal, Yun, Song, Kyungwoo
Knowledge tracing plays a pivotal role in intelligent tutoring systems. The task aims to predict the probability of students answering specific questions correctly. To do so, knowledge tracing systems should trace the knowledge state of students by utilizing their problem-solving history and knowledge about the problems. Recent advances in knowledge tracing models have enabled better exploitation of problem-solving history. However, knowledge about the problems themselves has not been studied as thoroughly as students' answering histories. Knowledge tracing algorithms that incorporate such knowledge directly are important in settings with limited data or cold starts. Therefore, we consider the problem of incorporating skill-to-skill relations into knowledge tracing. In this work, we introduce expert-labeled skill-to-skill relationships. Moreover, we provide novel methods for constructing a knowledge tracing model that leverages human experts' insight into the relationships between skills. The results of an extensive experimental analysis show that our method outperforms a baseline Transformer model. Furthermore, we find that the extent of our model's superiority is greater in situations with limited data, allowing a smooth cold start for our model.
Encoder-decoder multimodal speaker change detection
Jung, Jee-weon, Seo, Soonshin, Heo, Hee-Soo, Kim, Geonmin, Kim, You Jin, Kwon, Young-ki, Lee, Minjae, Lee, Bong-Jin
The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies have addressed the SCD task using audio inputs only, with limited performance. Recently, multimodal SCD (MMSCD) models, which utilise the text modality in addition to audio, have shown improved performance. In this study, the proposed model is built upon two main proposals: a novel mechanism for modality fusion and the adoption of an encoder-decoder architecture. Unlike previous MMSCD works, which extract speaker embeddings from extremely short audio segments aligned to a single word, we use a speaker embedding extracted from 1.5 s of audio. A Transformer decoder layer further improves the performance of an encoder-only MMSCD model. The proposed model achieves state-of-the-art results among studies that report SCD performance and is also on par with recent work that combines SCD with automatic speech recognition via human transcription.
PillarAcc: Sparse PointPillars Accelerator for Real-Time Point Cloud 3D Object Detection on Edge Devices
Lee, Minjae, Kim, Hyungmin, Park, Seongmin, Yoon, Minyong, Lee, Janghwan, Choi, Junwon, Kang, Mingu, Choi, Jungwook
PointPillars, a widely adopted bird's-eye view (BEV) encoding, aggregates 3D point cloud data into 2D pillars for high-accuracy 3D object detection. However, most state-of-the-art methods employing PointPillars overlook the inherent sparsity of pillar encoding, missing opportunities for significant computational reduction. In this study, we propose a groundbreaking algorithm-hardware co-design that accelerates sparse convolution processing and maximizes sparsity utilization in pillar-based 3D object detection networks. We investigate sparsification opportunities using an advanced pillar-pruning method, achieving an optimal balance between accuracy and sparsity. We introduce PillarAcc, a state-of-the-art sparsity support mechanism that enhances sparse pillar convolution through linear-complexity input-output mapping generation and conflict-free gather-scatter memory access. Additionally, we propose dataflow optimization techniques that dynamically adjust the pillar processing schedule for optimal hardware utilization under diverse sparsity conditions. We evaluate PillarAcc on various cutting-edge 3D object detection networks and benchmarks, achieving remarkable speedup and energy savings compared to representative edge platforms and demonstrating a record-breaking PointPillars speed of 500 FPS with minimal compromise in accuracy.

[Figure 1: Challenges in PointPillars acceleration and improvements by this work: (a) up to three orders of magnitude increase in frames per second at equivalent accuracy with the proposed PillarAcc; (b) degraded sparsity across layers with convolution (conv.) vs. maintained sparsity (this work); (c) significant sparsity mapping overhead in a conventional system (conv.) vs. reduced mapping overhead and enhanced computing efficiency (this work).]
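The linear-complexity input-output mapping can be illustrated in software with a hash-based "rulebook" for submanifold sparse convolution. This is a minimal sketch of the mapping idea only, not PillarAcc's hardware mechanism; the coordinate list and 3x3 kernel below are illustrative.

```python
def build_rulebook(coords, kernel_offsets):
    """Map active pillar coordinates to (input_idx, output_idx) pairs
    per kernel offset, in O(N * |kernel|) via a hash lookup -- the
    linear-complexity alternative to a dense sliding-window search.
    """
    index = {c: i for i, c in enumerate(coords)}   # coord -> pillar index
    rulebook = {off: [] for off in kernel_offsets}
    for out_idx, (x, y) in enumerate(coords):
        for dx, dy in kernel_offsets:
            in_idx = index.get((x + dx, y + dy))
            if in_idx is not None:                 # neighbor pillar active
                rulebook[(dx, dy)].append((in_idx, out_idx))
    return rulebook

# Active pillars on a BEV grid; 3x3 kernel offsets.
coords = [(0, 0), (0, 1), (5, 5)]
offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
rb = build_rulebook(coords, offsets)
```

Each rulebook entry drives one gather (read input feature), multiply (per-offset weight), and scatter (accumulate into output), so only active-pillar pairs are ever touched.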