AITopics | choi

Collaborating Authors

choi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

QA: What's the Answer Right Now?

Neural Information Processing SystemsFeb-16-2026, 01:58:42 GMT

We build strong baseline models upon large pretrained language models, including GPT -3 and T5. Our benchmark is an ongoing effort, and this paper presents real-time evaluation results over the past year.

large language model, machine learning, question answering, (21 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom (0.46)
North America > United States > California (0.14)
Oceania > Australia (0.04)
(9 more...)

Genre: Research Report (0.93)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.95)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.49)

Add feedback

79ec2a4246feb2126ecf43c4a4418002-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 11:20:11 GMT

Weformulate the decoding process asanoptimization problem which allows for multiple attributesweaimtocontrol tobeeasilyincorporated asdifferentiable constraints to the optimization. By relaxing this discrete optimization to a continuous one, we make use of Lagrangian multipliers and gradient-descent based techniques to generate the desired text.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Add feedback

Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport

Neural Information Processing SystemsDec-26-2025, 06:37:45 GMT

Optimal Transport (OT) problem investigates a transport map that bridges two distributions while minimizing a given cost function. In this regard, OT between tractable prior distribution and data has been utilized for generative modeling tasks. However, OT-based methods are susceptible to outliers and face optimization challenges during training. In this paper, we propose a novel generative model based on the semi-dual formulation of Unbalanced Optimal Transport (UOT).

generative modeling, semi-dual formulation, unbalanced optimal transport, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.62)

Add feedback

Meta-Learning with Adaptive Hyperparameters

Neural Information Processing SystemsDec-24-2025, 20:51:52 GMT

Despite its popularity, several recent works question the effectiveness of MAML when test tasks are different from training tasks, thus suggesting various task-conditioned methodology to improve the initialization. Instead of searching for better task-aware initialization, we focus on a complementary factor in MAML framework, inner-loop optimization (or fast adaptation). Consequently, we propose a new weight update rule that greatly enhances the fast adaptation process. Specifically, we introduce a small meta-network that can adaptively generate per-step hyperparameters: learning rate and weight decay coefficients. The experimental results validate that the Adaptive Learning of hyperparameters for Fast Adaptation (ALFA) is the equally important ingredient that was often neglected in the recent few-shot learning approaches. Surprisingly, fast adaptation from random initialization with ALFA can already outperform MAML.

adaptive hyperparameter, meta-learning, name change, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers

Neural Information Processing SystemsDec-24-2025, 06:48:41 GMT

Mixup is a commonly adopted data augmentation technique for image classification. Recent advances in mixup methods primarily focus on mixing based on saliency. However, many saliency detectors require intense computation and are especially burdensome for parameter-heavy transformer models. To this end, we propose TokenMixup, an efficient attention-guided token-level data augmentation method that aims to maximize the saliency of a mixed set of tokens. TokenMixup provides 15 faster saliency-aware data augmentation compared to gradient-based methods. Moreover, we introduce a variant of TokenMixup which mixes tokens within a single instance, thereby enabling multi-scale feature augmentation. Experiments show that our methods significantly improve the baseline models' performance on CIFAR and ImageNet-1K, while being more efficient than previous methods. We also reach state-of-the-art performance on CIFAR-100 among from-scratch transformer models.

efficient attention-guided token-level data augmentation, name change, tokenmixup, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

PMA-Diffusion: A Physics-guided Mask-Aware Diffusion Framework for TSE from Sparse Observations

Liu, Lindong, Jin, Zhixiong, Choi, Seongjin

arXiv.org Artificial IntelligenceDec-9-2025

High-resolution highway traffic state information is essential for Intelligent Transportation Systems, but typical traffic data acquired from loop detectors and probe vehicles are often too sparse and noisy to capture the detailed dynamics of traffic flow. We propose PMA-Diffusion, a physics-guided mask-aware diffusion framework that reconstructs unobserved highway speed fields from sparse, incomplete observations. Our approach trains a diffusion prior directly on sparsely observed speed fields using two mask-aware training strategies: Single-Mask and Double-Mask. At the inference phase, the physics-guided posterior sampler alternates reverse-diffusion updates, observation projection, and physics-guided projection based on adaptive anisotropic smoothing to reconstruct the missing speed fields. The proposed framework is tested on the I-24 MOTION dataset with varying visibility ratios. Even under severe sparsity, with only 5% visibility, PMA-Diffusion outperforms other baselines across three reconstruction error metrics. Furthermore, PMA-diffusion trained with sparse observation nearly matches the performance of the baseline model trained on fully observed speed fields. The results indicate that combining mask-aware diffusion priors with a physics-guided posterior sampler provides a reliable and flexible solution for traffic state estimation under realistic sensing sparsity.

data mining, data quality, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2512.06183

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Add feedback

Weaver: Kronecker Product Approximations of Spatiotemporal Attention for Traffic Network Forecasting

Cheong, Christopher, Davis, Gary, Choi, Seongjin

arXiv.org Artificial IntelligenceDec-1-2025

Spatiotemporal forecasting on transportation networks is a complex task that requires understanding how traffic nodes interact within a dynamic, evolving system dictated by traffic flow dynamics and social behavioral patterns. The importance of transportation networks and ITS for modern mobility and commerce necessitates forecasting models that are not only accurate but also interpretable, efficient, and robust under structural or temporal perturbations. Recent approaches, particularly Transformer-based architectures, have improved predictive performance but often at the cost of high computational overhead and diminished architectural interpretability. In this work, we introduce Weaver, a novel attention-based model that applies Kronecker product approximations (KPA) to decompose the PN X PN spatiotemporal attention of O(P^2N^2) complexity into local P X P temporal and N X N spatial attention maps. This Kronecker attention map enables our Parallel-Kronecker Matrix-Vector product (P2-KMV) for efficient spatiotemporal message passing with O(P^2N + N^2P) complexity. To capture real-world traffic dynamics, we address the importance of negative edges in modeling traffic behavior by introducing Valence Attention using the continuous Tanimoto coefficient (CTC), which provides properties conducive to precise latent graph generation and training stability. To fully utilize the model's learning capacity, we introduce the Traffic Phase Dictionary for self-conditioning. Evaluations on PEMS-BAY and METR-LA show that Weaver achieves competitive performance across model categories while training more efficiently.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.08888

Country:

Europe (0.92)
North America > United States > California (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre:

Overview (0.92)
Research Report > New Finding (0.45)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Information Technology (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

QA: What's the Answer Right Now? Jungo Kasai Keisuke Sakaguchi Y oichi T akahashi Ronan Le Bras

Neural Information Processing SystemsOct-9-2025, 02:20:55 GMT

large language model, machine learning, question answering, (21 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom (0.46)
North America > United States > California (0.14)
Oceania > Australia (0.04)
(9 more...)

Genre: Research Report (0.93)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.95)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.49)

Add feedback

Consistent Kernel Change-Point Detection under m-Dependence for Text Segmentation

Diaz-Rodriguez, Jairo, Jia, Mumin

arXiv.org Machine LearningOct-7-2025

Kernel change-point detection (KCPD) has become a widely used tool for identifying structural changes in complex data. While existing theory establishes consistency under independence assumptions, real-world sequential data such as text exhibits strong dependencies. We establish new guarantees for KCPD under $m$-dependent data: specifically, we prove consistency in the number of detected change points and weak consistency in their locations under mild additional assumptions. We perform an LLM-based simulation that generates synthetic $m$-dependent text to validate the asymptotics. To complement these results, we present the first comprehensive empirical study of KCPD for text segmentation with modern embeddings. Across diverse text datasets, KCPD with text embeddings outperforms baselines in standard text segmentation metrics. We demonstrate through a case study on Taylor Swift's tweets that KCPD not only provides strong theoretical and simulated reliability but also practical effectiveness for text segmentation tasks.

kcpd, segmentation, text segmentation, (16 more...)

arXiv.org Machine Learning

2510.03437

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Mexico > Doña Ana County > Las Cruces (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(13 more...)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
(2 more...)

Add feedback

Expanding Foundational Language Capabilities in Open-Source LLMs through a Korean Case Study

Lim, Junghwan, Jo, Gangwon, Lee, Sungmin, Park, Jiyoung, Kim, Dongseok, Kim, Jihwan, Lee, Junhyeok, Cheung, Wai Ting, Choi, Dahye, Choi, Kibong, Huh, Jaeyeon, Kim, Beomgyu, Kim, Jangwoong, Kim, Taehyun, Lee, Haesol, Lee, Jeesoo, Oh, Dongpin, Song, Changseok, Suh, Daewon

arXiv.org Artificial IntelligenceSep-5-2025

We introduce Llama-3-Motif, a language model consisting of 102 billion parameters, specifically designed to enhance Korean capabilities while retaining strong performance in English. Developed on the Llama 3 architecture, Llama-3-Motif employs advanced training techniques, including LlamaPro and Masked Structure Growth, to effectively scale the model without altering its core Transformer architecture. Using the MoAI platform for efficient training across hyperscale GPU clusters, we optimized Llama-3-Motif using a carefully curated dataset that maintains a balanced ratio of Korean and English data. Llama-3-Motif shows decent performance on Korean-specific benchmarks, outperforming existing models and achieving results comparable to GPT-4.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.03972

Genre: Research Report (1.00)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback