
Collaborating Authors: Kim


PLLay: Efficient Topological Layer based on Persistent Landscapes

Neural Information Processing Systems

We propose PLLay, a novel topological layer for general deep learning models based on persistence landscapes, in which we can efficiently exploit the underlying topological features of the input data structure. In this work, we show differentiability with respect to layer inputs, for a general persistent homology with arbitrary filtration. Thus, our proposed layer can be placed anywhere in the network and feed critical information on the topological features of input data into subsequent layers to improve the learnability of the networks toward a given task. A task-optimal structure of PLLay is learned during training via backpropagation, without requiring any input featurization or data preprocessing. We provide a novel adaptation for the DTM function-based filtration, and show that the proposed layer is robust against noise and outliers through a stability analysis. We demonstrate the effectiveness of our approach by classification experiments on various datasets.
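A persistence landscape turns a persistence diagram into a stack of piecewise-linear functions. The following is a minimal NumPy sketch of that underlying construction only (the function name and grid evaluation are ours, not the authors' differentiable PLLay layer):

```python
import numpy as np

def persistence_landscape(diagram, k, grid):
    """Evaluate the k-th persistence landscape lambda_k on a grid of t values.

    diagram: iterable of (birth, death) pairs from any filtration.
    Each point p = (b, d) contributes a "tent" function
        Lambda_p(t) = max(0, min(t - b, d - t)),
    and lambda_k(t) is the k-th largest tent value at t.
    """
    diagram = np.asarray(diagram, dtype=float)
    b, d = diagram[:, 0:1], diagram[:, 1:2]              # shapes (n, 1)
    t = np.asarray(grid, dtype=float)[None, :]           # shape (1, m)
    tents = np.maximum(0.0, np.minimum(t - b, d - t))    # shape (n, m)
    if k > tents.shape[0]:
        return np.zeros(tents.shape[1])                  # fewer than k points
    # Sort tent values at each t in descending order; take the k-th (1-indexed).
    return -np.sort(-tents, axis=0)[k - 1]

# First landscape of a two-point diagram, evaluated on [0, 4].
lam1 = persistence_landscape([(0.0, 2.0), (1.0, 3.0)], k=1,
                             grid=np.linspace(0.0, 4.0, 9))
```

PLLay itself builds a learnable, differentiable summary on top of such landscape functions; this sketch shows only the landscapes they are computed from.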


Leveraging Early-Stage Robustness in Diffusion Models for Efficient and High-Quality Image Synthesis

Neural Information Processing Systems

While diffusion models have demonstrated exceptional image generation capabilities, the iterative noise estimation process required for these models is compute-intensive, and their practical implementation is limited by slow sampling speeds. In this paper, we propose a novel approach to speed up the noise estimation network by leveraging the robustness of the early stage of diffusion models. Our findings indicate that inaccurate computation during the early stage of the reverse diffusion process has minimal impact on the quality of generated images, as this stage primarily outlines the image while later stages handle the finer details that require more sensitive information. To improve computational efficiency, we combine our findings with post-training quantization (PTQ) to introduce a method that utilizes low-bit activation for the early reverse diffusion process while maintaining high-bit activation for the later stages. Experimental results show that the proposed method can accelerate the early-stage computation without sacrificing the quality of the generated images.
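The mixed-precision idea above can be sketched as a timestep-dependent bit-width schedule combined with simulated (fake) quantization of activations. The function names, the 4/8-bit choice, and the halfway switch point are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def fake_quantize(x, bits):
    """Simulated PTQ: uniformly quantize x to 2**bits levels over its observed range."""
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:
        return x.copy()
    scale = (hi - lo) / (2 ** bits - 1)
    return np.round((x - lo) / scale) * scale + lo

def activation_bits(t, num_steps, switch_frac=0.5, low_bits=4, high_bits=8):
    """Hypothetical schedule: early reverse steps (large t) tolerate low-bit
    activations; later steps, which refine fine details, keep high-bit."""
    return low_bits if t >= switch_frac * num_steps else high_bits

# The reverse process runs t = num_steps-1, ..., 0; inside the noise-estimation
# network, each layer's activations would pass through fake_quantize(act, bits).
num_steps = 10
schedule = [activation_bits(t, num_steps) for t in reversed(range(num_steps))]
```

The schedule keeps the expensive early denoising steps cheap while reserving full precision for the detail-sensitive final steps.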


UniCLIP: Unified Framework for Contrastive Language-Image Pre-training

Neural Information Processing Systems

Pre-training vision-language models with contrastive objectives has shown promising results that are both scalable to large uncurated datasets and transferable to many downstream applications. Several follow-up works have aimed to improve data efficiency by adding self-supervision terms, but in those works the inter-domain (image-text) and intra-domain (image-image) contrastive losses are defined on separate embedding spaces, so many feasible combinations of supervision are overlooked. To overcome this issue, we propose UniCLIP, a Unified framework for Contrastive Language-Image Pre-training.
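A toy illustration of what a single-space objective looks like: two augmented image views and the paired caption are all embedded in one space, so image-image and image-text positives fall out of the same InfoNCE term. This is a hedged sketch of the general idea, not UniCLIP's actual loss:

```python
import numpy as np

def unified_contrastive_loss(img_a, img_b, txt, temp=0.07):
    """Single-space InfoNCE over two image views and the paired text.

    Inputs: (n, d) L2-normalized embeddings; row i of each matrix belongs
    to the same underlying sample i.
    """
    z = np.concatenate([img_a, img_b, txt], axis=0)      # (3n, d), one shared space
    n = img_a.shape[0]
    sim = z @ z.T / temp                                 # all-pairs similarities
    np.fill_diagonal(sim, -np.inf)                       # exclude self-pairs
    # Positives for anchor i are the other two views of the same sample,
    # whether they are image or text -- no separate per-domain loss needed.
    labels = np.arange(3 * n) % n
    pos = labels[:, None] == labels[None, :]
    np.fill_diagonal(pos, False)
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logp[pos].mean()
```

Because every embedding lives in the same space, combinations such as (view A, text) and (view A, view B) are supervised by one loss rather than two domain-specific ones.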


Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation

Neural Information Processing Systems

Recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech. However, an LLM-based strategy for modeling spoken dialogs remains elusive, calling for further investigation. This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken responses with naturally occurring prosodic features relevant to the given input speech without relying on explicit automatic speech recognition (ASR) or text-to-speech (TTS) systems. We have verified the inclusion of prosody in speech tokens that predominantly contain semantic information and have used this foundation to construct a prosody-infused speech-text model. Additionally, we propose a generalized speech-text pretraining scheme that enhances the capture of cross-modal semantics.


PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System

He, Yintao, Mao, Haiyu, Giannoula, Christina, Sadrosadati, Mohammad, Gómez-Luna, Juan, Li, Huawei, Li, Xiaowei, Wang, Ying, Mutlu, Onur

arXiv.org Artificial Intelligence

Large language models (LLMs) are widely used for natural language understanding and text generation. An LLM relies on a time-consuming step called LLM decoding to generate output tokens. Several prior works focus on improving the performance of LLM decoding using parallelism techniques, such as batching and speculative decoding. State-of-the-art LLM decoding has both compute-bound and memory-bound kernels. Some prior works statically identify and map these different kernels to a heterogeneous architecture consisting of both processing-in-memory (PIM) units and computation-centric accelerators. We observe that the characteristics of LLM decoding kernels (e.g., whether or not a kernel is memory-bound) can change dynamically due to parameter changes to meet user and/or system demands, making (1) static kernel mapping to PIM units and computation-centric accelerators suboptimal, and (2) a one-size-fits-all approach to designing PIM units inefficient due to a large degree of heterogeneity even among memory-bound kernels. In this paper, we aim to accelerate LLM decoding while considering the dynamically changing characteristics of the kernels involved. We propose PAPI (PArallel Decoding with PIM), a PIM-enabled heterogeneous architecture that exploits dynamic scheduling of compute-bound or memory-bound kernels to suitable hardware units. PAPI has two key mechanisms: (1) online kernel characterization to dynamically schedule kernels to the most suitable hardware units at runtime and (2) a PIM-enabled heterogeneous computing system that harmoniously orchestrates both computation-centric processing units and hybrid PIM units with different computing capabilities. Our experimental results on three broadly-used LLMs show that PAPI achieves 1.8× and 11.1× speedups over a state-of-the-art heterogeneous LLM accelerator and a state-of-the-art PIM-only LLM accelerator, respectively.
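The scheduling decision can be illustrated with a toy roofline-style rule: compare a kernel's arithmetic intensity (FLOP/byte) against the accelerator's machine balance, and route memory-bound kernels to the PIM units. This is an illustrative sketch, not PAPI's actual online characterization mechanism, and the hardware numbers are made up:

```python
from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    flops: float        # floating-point operations performed
    bytes_moved: float  # DRAM traffic in bytes

def dispatch(kernel, peak_flops, peak_bw):
    """Route a kernel by comparing its arithmetic intensity to the
    accelerator's machine balance (the FLOP/byte ratio where its roofline
    bends). Memory-bound kernels go to PIM units; compute-bound kernels go
    to the computation-centric accelerator. A dynamic scheduler would
    re-evaluate this at runtime, since batch size and sequence length
    change a kernel's intensity."""
    intensity = kernel.flops / kernel.bytes_moved
    balance = peak_flops / peak_bw
    return "PIM" if intensity < balance else "ACCEL"

# Decode-time attention is GEMV-like (low intensity); a large batched GEMM is not.
gemv = Kernel("attention_gemv", flops=2e9, bytes_moved=1e9)     # 2 FLOP/byte
gemm = Kernel("batched_gemm", flops=2e12, bytes_moved=2e10)     # 100 FLOP/byte
```

Under a static mapping, each kernel's destination is fixed at design time; the point of the paper's observation is that the same kernel can land on either side of the balance point as decoding parameters change.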


Reviews: Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections

Neural Information Processing Systems

The proposed method is closely related to Kim et al.'s work, but the latter is not mentioned at all. At the time of submission, the arXiv version of Kim et al.'s paper was already online; it should be cited and discussed. In Kim et al.'s work, the model consists of multiple flat convolutional layers, and the input is connected to the output to form residual learning. Compared with Kim et al.'s model, the proposed model replaces half of the convolutional layers with deconvolution layers and has more skip connections between the layers.


Art of the deal: South Korean millennials swap stocks for art

Al Jazeera

Incheon, South Korea – Kim, 35, did not explicitly object when his wife started investing in art three years ago – but he had his reservations. "I told her that I'm fine as long as you want it," the video game designer, who asked to be identified by his last name only, told Al Jazeera. "But I was secretly thinking, why not just invest that money into stocks or something?" But as time passed, Kim began to appreciate how art could offer an escape from the COVID-19 pandemic and the monotony of work. Last year, he joined her in collecting fine art.


Harnessing machine learning to analyze quantum material

#artificialintelligence

Electrons and their behavior pose fascinating questions for quantum physicists, and recent innovations in sources, instruments and facilities allow researchers to potentially access even more of the information encoded in quantum materials. However, these research innovations are producing unprecedented--and until now, indecipherable--volumes of data. "The information content in a piece of material can quickly exceed the total information content in the Library of Congress, which is about 20 terabytes," said Eun-Ah Kim, professor of physics in the College of Arts and Sciences, who is at the forefront of both quantum materials research and harnessing the power of machine learning to analyze data from quantum material experiments. "The limited capacity of the traditional mode of analysis--largely manual--is quickly becoming the critical bottleneck," Kim said. A group led by Kim has successfully used a machine learning technique developed with Cornell computer scientists to analyze massive amounts of data from the quantum metal Cd2Re2O7, settling a debate about this particular material and setting the stage for future machine-learning-aided insight into new phases of matter.


Engineers build LEGO-like artificial intelligence chip

#artificialintelligence

Imagine a more sustainable future, where cellphones, smartwatches, and other wearable devices don't have to be shelved or discarded for a newer model. Instead, they could be upgraded with the latest sensors and processors that would snap onto a device's internal chip -- like LEGO bricks incorporated into an existing build. Such reconfigurable chipware could keep devices up to date while reducing our electronic waste. Now MIT engineers have taken a step toward that modular vision with a LEGO-like design for a stackable, reconfigurable artificial intelligence chip. The design comprises alternating layers of sensing and processing elements, along with light-emitting diodes (LEDs) that allow the chip's layers to communicate optically.


Engineers build artificial intelligence chip: The new design is stackable and reconfigurable, for swapping out and building on existing sensors and neural network processors

#artificialintelligence

Now MIT engineers have taken a step toward that modular vision with a LEGO-like design for a stackable, reconfigurable artificial intelligence chip. The design comprises alternating layers of sensing and processing elements, along with light-emitting diodes (LEDs) that allow the chip's layers to communicate optically. Other modular chip designs employ conventional wiring to relay signals between layers. Such intricate connections are difficult, if not impossible, to sever and rewire, making such stackable designs non-reconfigurable. The MIT design uses light, rather than physical wires, to transmit information through the chip.