

NASA astronaut comes home after circling Earth 3,920 times

Popular Science

Jonny Kim returned after eight months aboard the International Space Station. After 245 days in orbit aboard the International Space Station (ISS), one NASA astronaut and two cosmonauts have safely returned to Earth. NASA's Jonny Kim, along with Roscosmos cosmonauts Sergey Ryzhikov and Alexey Zubritsky, landed near the town of Zhezkazgan, Kazakhstan, on December 9 and are now undergoing the standard post-mission health screenings. Kim officially became an astronaut in 2017.



Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Neural Information Processing Systems

With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile phones and TVs. Existing PTQ schemes, however, consume considerable time and resources, which can become a bottleneck in real situations where frequent model updates and multiple hyperparameter tunings are required. As a cost-effective alternative, learning-free PTQ schemes have been proposed. However, their performance is somewhat limited because they cannot consider the inter-layer dependency within the attention module, which is a significant feature of Transformers. In this paper, we thus propose a novel PTQ algorithm that balances accuracy and efficiency. The key idea of the proposed algorithm, called aespa, is to perform quantization layer-wise for efficiency while targeting attention-wise reconstruction to account for the cross-layer dependency. Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models.
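
As a rough illustration of the layer-wise-versus-attention-wise distinction the abstract draws, the sketch below quantizes each projection weight independently but scores candidate quantization grids by the reconstruction error of the attention output rather than of the individual layer. This is a toy approximation, not the authors' aespa procedure; the function names, grid search, and single-head setup are all assumptions.

```python
# Toy sketch (not the authors' aespa code): quantize one layer at a time,
# but evaluate candidates by attention-output reconstruction error.
import torch

def quantize(w, scale, bits=4):
    qmax = 2 ** (bits - 1) - 1
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def attention_out(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

def quantize_attention_wise(x, wq, wk, wv, bits=4, n_grid=20):
    ref = attention_out(x, wq, wk, wv)            # full-precision reference
    quantized = []
    for name, w in (("q", wq), ("k", wk), ("v", wv)):
        best, best_err = None, float("inf")
        for i in range(1, n_grid + 1):            # search per-layer scales...
            scale = w.abs().max() * i / (n_grid * (2 ** (bits - 1) - 1))
            w_hat = quantize(w, scale, bits)
            trial = {"q": (w_hat, wk, wv),        # ...but measure the error at
                     "k": (wq, w_hat, wv),        # the attention output, which
                     "v": (wq, wk, w_hat)}[name]  # couples the three layers
            err = (attention_out(x, *trial) - ref).pow(2).mean()
            if err < best_err:
                best, best_err = w_hat, err
        quantized.append(best)
    return quantized  # quantized wq, wk, wv

wq_q, wk_q, wv_q = quantize_attention_wise(
    torch.randn(16, 8), *(torch.randn(8, 8) for _ in range(3)))
```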


Simulation-Free Training of Neural ODEs on Paired Data

Neural Information Processing Systems

In this work, we investigate a method for simulation-free training of Neural Ordinary Differential Equations (NODEs) for learning deterministic mappings between paired data. Despite the analogy of NODEs as continuous-depth residual networks, their application in typical supervised learning tasks has not been popular, mainly due to the large number of function evaluations required by ODE solvers and numerical instability in gradient estimation. To alleviate this problem, we employ the flow matching framework for simulation-free training of NODEs, which directly regresses the parameterized dynamics function to a predefined target velocity field. Contrary to generative tasks, however, we show that applying flow matching directly between paired data can often lead to an ill-defined flow that breaks the coupling of the data pairs (e.g., due to crossing trajectories). We propose a simple extension that applies flow matching in the embedding space of data pairs, where the embeddings are learned jointly with the dynamics function to ensure the validity of the flow, which is also easier to learn.
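
The abstract's core recipe fits in a short, hedged sketch: interpolate linearly between the learned embeddings of a data pair and regress the dynamics network onto the interpolant's constant velocity, with no ODE solver in the training loop. The architectures, dimensions, and names below are illustrative, not the authors' code.

```python
# Hedged sketch of flow matching on paired data in a learned embedding
# space; g (embedding) and v (dynamics) are trained jointly.
import torch
import torch.nn as nn

dim, hid = 8, 64
g = nn.Sequential(nn.Linear(dim, hid), nn.SiLU(), nn.Linear(hid, dim))      # embedding
v = nn.Sequential(nn.Linear(dim + 1, hid), nn.SiLU(), nn.Linear(hid, dim))  # dynamics v(z, t)

def flow_matching_loss(x0, x1):
    z0, z1 = g(x0), g(x1)                # embed the pair jointly
    t = torch.rand(x0.shape[0], 1)
    zt = (1 - t) * z0 + t * z1           # straight-line interpolant
    target = z1 - z0                     # its constant velocity field
    pred = v(torch.cat([zt, t], dim=-1))
    return (pred - target).pow(2).mean() # simulation-free regression

opt = torch.optim.Adam(list(g.parameters()) + list(v.parameters()), lr=1e-3)
x0, x1 = torch.randn(32, dim), torch.randn(32, dim)  # stand-in paired batch
loss = flow_matching_loss(x0, x1)
opt.zero_grad(); loss.backward(); opt.step()
```

Because the regression target is defined pointwise in time, no trajectory is ever simulated during training; the ODE solver is only needed at inference.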


ChatGPT and U(X): A Rapid Review on Measuring the User Experience

Seaborn, Katie

arXiv.org Artificial Intelligence

ChatGPT, powered by a large language model (LLM), has revolutionized everyday human-computer interaction (HCI) since its 2022 release. While now used by millions around the world, a coherent pathway for evaluating the user experience (UX) ChatGPT offers remains missing. In this rapid review (N = 58), I explored how ChatGPT UX has been approached quantitatively so far. I focused on the independent variables (IVs) manipulated, the dependent variables (DVs) measured, and the methods used for measurement. Findings reveal trends, gaps, and emerging consensus in UX assessments. This work offers a first step towards synthesizing existing approaches to measuring ChatGPT UX, urgent trajectories to advance standardization and breadth, and two preliminary frameworks aimed at guiding future research and tool development. I seek to elevate the field of ChatGPT UX by empowering researchers and practitioners in optimizing user interactions with ChatGPT and similar LLM-based systems.


I was so freaked out by talking to this AI that I had to leave

PCWorld

Fifteen minutes after "hanging up" with Sesame's new "lifelike" AI, I'm still freaked out. So-called "conversations" with AI don't do a lot for me, especially where text is concerned. With voice chats, such as the new options for Google Gemini and Microsoft's Copilot, all voice does is save some typing. While Google and Microsoft designed their assistants to be helpful, they're not especially personable -- or sometimes, they're just artificially cheery. Sesame's pitch, however, is a simple one: "We believe in a future where computers are lifelike," according to the company's mission statement.


PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System

He, Yintao, Mao, Haiyu, Giannoula, Christina, Sadrosadati, Mohammad, Gómez-Luna, Juan, Li, Huawei, Li, Xiaowei, Wang, Ying, Mutlu, Onur

arXiv.org Artificial Intelligence

Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM relies on a time-consuming step called LLM decoding to generate output tokens. Several prior works focus on improving the performance of LLM decoding using parallelism techniques, such as batching and speculative decoding. State-of-the-art LLM decoding has both compute-bound and memory-bound kernels. Some prior works statically identify and map these different kernels to a heterogeneous architecture consisting of both processing-in-memory (PIM) units and computation-centric accelerators. We observe that the characteristics of LLM decoding kernels (e.g., whether or not a kernel is memory-bound) can change dynamically as parameters change to meet user and/or system demands, making (1) static kernel mapping to PIM units and computation-centric accelerators suboptimal, and (2) a one-size-fits-all approach to designing PIM units inefficient due to a large degree of heterogeneity even among memory-bound kernels. In this paper, we aim to accelerate LLM decoding while considering the dynamically changing characteristics of the kernels involved. We propose PAPI (PArallel Decoding with PIM), a PIM-enabled heterogeneous architecture that exploits dynamic scheduling of compute-bound or memory-bound kernels to suitable hardware units. PAPI has two key mechanisms: (1) online kernel characterization to dynamically schedule kernels to the most suitable hardware units at runtime and (2) a PIM-enabled heterogeneous computing system that harmoniously orchestrates both computation-centric processing units and hybrid PIM units with different computing capabilities. Our experimental results on three broadly used LLMs show that PAPI achieves 1.8× and 11.1× speedups over a state-of-the-art heterogeneous LLM accelerator and a state-of-the-art PIM-only LLM accelerator, respectively.
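
A minimal sketch of the online-characterization idea, assuming a simple roofline-style rule: dispatch a kernel to PIM when its measured arithmetic intensity falls below the accelerator's ridge point, and to the computation-centric accelerator otherwise. The threshold and kernel statistics below are invented for illustration; PAPI's actual characterization and scheduling are more involved.

```python
# Illustrative sketch (not PAPI's implementation): characterize each
# decoding kernel at runtime and dispatch it to the better-suited unit.
from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    flops: float        # floating-point operations for this invocation
    bytes_moved: float  # DRAM traffic, which shifts with batch size / seq len

ACCEL_FLOPS_PER_BYTE = 40.0  # assumed roofline ridge point of the accelerator

def schedule(kernel: Kernel) -> str:
    # Arithmetic intensity below the ridge point => memory-bound => PIM;
    # above it => compute-bound => computation-centric accelerator.
    intensity = kernel.flops / kernel.bytes_moved
    return "PIM" if intensity < ACCEL_FLOPS_PER_BYTE else "accelerator"

# The same kernel can flip sides as runtime parameters change, which is
# why a static mapping is suboptimal:
small_batch_attn = Kernel("attention", flops=2e9, bytes_moved=1e9)    # intensity 2
large_batch_ffn  = Kernel("ffn",       flops=8e12, bytes_moved=1e10)  # intensity 800
print(schedule(small_batch_attn))  # PIM
print(schedule(large_batch_ffn))   # accelerator
```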


Trustworthy AI on Safety, Bias, and Privacy: A Survey

Fang, Xingli, Li, Jianwei, Mulchandani, Varun, Kim, Jung-Eun

arXiv.org Artificial Intelligence

The capabilities of artificial intelligence systems have advanced considerably, but these systems still struggle with failure modes, vulnerabilities, and biases. In this paper, we study the current state of the field and present promising insights and perspectives regarding concerns that challenge the trustworthiness of AI models. In particular, this paper investigates issues along three thrusts that hurt models' trustworthiness: safety, bias, and privacy. For safety, we discuss safety alignment in the context of large language models, preventing them from generating toxic or harmful content. For bias, we focus on spurious biases that can mislead a network. Lastly, for privacy, we cover membership inference attacks in deep neural networks. The discussions addressed in this paper reflect our own experiments and observations.
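
For the privacy thrust, a classic baseline makes the threat model concrete: a loss-threshold membership inference attack, which flags examples with unusually low loss as likely training-set members. This is a generic textbook sketch, not the survey's code; the stand-in classifier and threshold are assumptions (in practice the threshold is calibrated, e.g., using shadow models).

```python
# Minimal loss-threshold membership inference attack (generic sketch).
import torch
import torch.nn.functional as F

def membership_scores(model, x, y):
    # Lower per-example loss suggests the example was seen in training.
    with torch.no_grad():
        losses = F.cross_entropy(model(x), y, reduction="none")
    return -losses  # higher score = more likely a member

def infer_members(model, x, y, threshold):
    return membership_scores(model, x, y) > threshold

model = torch.nn.Linear(20, 10)                        # stand-in classifier
x, y = torch.randn(8, 20), torch.randint(0, 10, (8,))
print(infer_members(model, x, y, threshold=-2.0))      # assumed threshold
```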


Sample-Efficient Behavior Cloning Using General Domain Knowledge

Zhu, Feiyu, Oh, Jean, Simmons, Reid

arXiv.org Artificial Intelligence

Behavior cloning has shown success in many sequential decision-making tasks by learning from expert demonstrations, yet it can be very sample inefficient and fail to generalize to unseen scenarios. One approach to these problems is to introduce general domain knowledge, such that the policy can focus on the essential features and may generalize to unseen states by applying that knowledge. Although this knowledge is easy to acquire from experts, it is hard to combine with learning from individual examples due to the lack of semantic structure in neural networks and the time-consuming nature of feature engineering. To enable learning from both general knowledge and specific demonstration trajectories, we use a large language model's coding capability to instantiate a policy structure based on expert domain knowledge expressed in natural language, and we tune the parameters in the policy with demonstrations. We name this approach the Knowledge Informed Model (KIM), as the structure reflects the semantics of expert knowledge. In our experiments with lunar lander and car racing tasks, our approach learns to solve the tasks with as few as 5 demonstrations and is robust to action noise, outperforming the baseline model without domain knowledge. This indicates that with the help of large language models, we can incorporate domain knowledge into the structure of the policy, increasing sample efficiency for behavior cloning.
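
A toy sketch of the idea: an LLM turns expert advice ("fire the main engine when falling too fast; steer back toward the pad when drifting") into a policy template, and only the few numeric gains are tuned against demonstrations. The template, sign conventions, and fitting loop below are hypothetical, not the paper's generated code.

```python
# Hypothetical KIM-style policy template for a lunar-lander-like task:
# the structure encodes the expert's advice, the gains are fit to demos.
def lander_policy(obs, params):
    x, vy = obs[0], obs[3]          # horizontal position, vertical velocity
    k_center, v_safe = params       # the only tunable parameters
    if vy < -v_safe:
        return 2                    # main engine: arrest the descent
    if x > k_center:
        return 1                    # orientation engine: steer toward the pad
    if x < -k_center:
        return 3                    # (engine sign conventions are illustrative)
    return 0                        # otherwise, do nothing

def fit_params(demos, grid):
    # Tune only the numeric gains by matching expert actions on a few demos.
    def agreement(params):
        return sum(lander_policy(obs, params) == act for obs, act in demos)
    return max(grid, key=agreement)

demos = [([0.3, 1.0, 0.0, -0.8], 2), ([0.5, 1.0, 0.0, -0.1], 1)]  # (obs, action)
grid = [(kc, vs) for kc in (0.1, 0.2, 0.4) for vs in (0.2, 0.5)]
print(fit_params(demos, grid))
```

With so few free parameters, a handful of demonstrations suffices to pin them down, which is consistent with the sample-efficiency claim.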


DNA 1.0 Technical Report

Lee, Jungyup, Kim, Jemin, Park, Sang, Lee, SeungJae

arXiv.org Artificial Intelligence

In this report, we present DNA 1.0 8B Instruct, a state-of-the-art bilingual language model optimized for Korean and English language tasks. By applying continual pre-training (CPT) with high-quality Korean datasets to Llama 3.1 8B and subsequent supervised fine-tuning (SFT), we create an instruction-following model with enhanced Korean language capabilities. This model is then merged with Llama 3.1 8B Instruct via spherical linear interpolation (SLERP) and undergoes further optimization through direct preference optimization (DPO) and knowledge distillation (KD). DNA 1.0 8B Instruct achieves state-of-the-art results on Korean-specific tasks, including KMMLU (53.26%), KoBEST (83.40%), and BELEBELE (57.99%), while maintaining strong English capabilities on MMLU (66.64%), MMLU-Pro (43.05%), and GSM8K (80.52%). DNA 1.0 8B Instruct represents a significant advancement in bilingual language modeling and, as an open model, is freely available through https://huggingface.co/dnotitia/Llama-DNA-1.0-8B-Instruct . For commercial licensing inquiries or feedback, please contact us at https://www.dnotitia.com/contact/post-form
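
Of the pipeline stages named in the abstract, the SLERP merge is the easiest to sketch. Below is a per-tensor spherical linear interpolation between two checkpoints; the interpolation coefficient t=0.5 and the flatten-then-interpolate scheme are assumptions, since the abstract does not give the merge settings.

```python
# Hedged sketch of per-tensor SLERP model merging (settings assumed).
import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float, eps: float = 1e-8):
    a, b = w0.flatten().float(), w1.flatten().float()
    cos = torch.dot(a, b) / (a.norm() * b.norm() + eps)
    omega = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))  # angle between tensors
    if omega.abs() < 1e-4:
        merged = (1 - t) * a + t * b     # near-parallel: fall back to LERP
    else:
        merged = (torch.sin((1 - t) * omega) * a
                  + torch.sin(t * omega) * b) / torch.sin(omega)
    return merged.view_as(w0).to(w0.dtype)

# Toy state dicts standing in for the SFT model and Llama 3.1 8B Instruct.
sd_a = {"w": torch.randn(4, 4)}
sd_b = {"w": torch.randn(4, 4)}
merged = {k: slerp(sd_a[k], sd_b[k], t=0.5) for k in sd_a}
```

Unlike plain linear averaging, SLERP interpolates along the arc between the two weight vectors, preserving their norm geometry, which is the usual motivation for choosing it in checkpoint merging.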