

AI Models Get Brain Rot, Too

WIRED

A new study shows that feeding large language models low-quality, high-engagement content from social media lowers their cognitive abilities. AI models may be a bit like humans, after all. A new study from the University of Texas at Austin, Texas A&M, and Purdue University shows that large language models fed a diet of popular but low-quality social media content experience a kind of "brain rot" that may be familiar to anyone who has spent too long doomscrolling on X or TikTok. "We live in an age where information grows faster than attention spans--and much of it is engineered to capture clicks, not convey truth or depth," says Junyuan Hong, an incoming assistant professor at the National University of Singapore who worked on the study as a graduate student at UT Austin. "We wondered: What happens when AIs are trained on the same stuff?"


MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations

Zhang, Deyun, Lan, Xiang, Geng, Shijia, Zhao, Qinghao, Fan, Sumei, Feng, Mengling, Hong, Shenda

arXiv.org Artificial Intelligence

The electrocardiogram (ECG) plays a foundational role in modern cardiovascular care, enabling non-invasive diagnosis of arrhythmias, myocardial ischemia, and conduction disorders. While machine learning has achieved expert-level performance in ECG interpretation, the development of clinically deployable multimodal AI systems remains constrained, primarily due to the lack of publicly available datasets that simultaneously incorporate raw signals, diagnostic images, and interpretation text. Most existing ECG datasets provide only single-modality data or, at most, dual modalities, making it difficult to build models that can understand and integrate diverse ECG information in real-world settings. To address this gap, we introduce MEETI (MIMIC-IV-Ext ECG-Text-Image), the first large-scale ECG dataset that synchronizes raw waveform data, high-resolution plotted images, and detailed textual interpretations generated by large language models. In addition, MEETI includes beat-level quantitative ECG parameters extracted from each lead, offering structured parameters that support fine-grained analysis and model interpretability. Each MEETI record is aligned across four components: (1) the raw ECG waveform, (2) the corresponding plotted image, (3) extracted feature parameters, and (4) detailed interpretation text. This alignment is achieved using consistent, unique identifiers. This unified structure supports transformer-based multimodal learning and enables fine-grained, interpretable reasoning about cardiac health. By bridging the gap between traditional signal analysis, image-based interpretation, and language-driven understanding, MEETI establishes a robust foundation for the next generation of explainable, multimodal cardiovascular AI. It offers the research community a comprehensive benchmark for developing and evaluating ECG-based AI systems.
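The four-component alignment described above can be sketched as a simple join on shared record identifiers. This is an illustrative data-model sketch only; the field names and `align` helper are hypothetical, not MEETI's actual schema or loading API.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class ECGRecord:
    """One MEETI-style record: four modalities tied to a shared unique ID.

    Field names here are illustrative, not the dataset's actual schema.
    """
    record_id: str                 # consistent identifier across modalities
    waveform: np.ndarray           # raw signal, shape (leads, samples)
    image_path: str                # high-resolution plotted ECG image
    lead_features: dict            # beat-level parameters per lead
    interpretation: str            # LLM-generated report text

def align(waveforms, images, features, reports):
    """Join the four modality tables on their shared record IDs."""
    common = set(waveforms) & set(images) & set(features) & set(reports)
    return {
        rid: ECGRecord(rid, waveforms[rid], images[rid],
                       features[rid], reports[rid])
        for rid in sorted(common)
    }

# Toy usage with a single synthetic 12-lead record
wf = {"rec1": np.zeros((12, 5000))}
im = {"rec1": "rec1.png"}
ft = {"rec1": {"II": {"heart_rate": 72.0}}}
rp = {"rec1": "Sinus rhythm."}
dataset = align(wf, im, ft, rp)
print(dataset["rec1"].lead_features["II"]["heart_rate"])  # 72.0
```

Keying every modality on one identifier is what lets a multimodal model fetch any subset of (signal, image, features, text) for the same patient encounter without re-matching records.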


ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

Chen, Kai, Gao, Ruiyuan, Hong, Lanqing, Xu, Hang, Jia, Xu, Caesar, Holger, Dai, Dengxin, Liu, Bingbing, Tsishkou, Dzmitry, Xu, Songcen, Xu, Chunjing, Xu, Qiang, Lu, Huchuan, Yeung, Dit-Yan

arXiv.org Artificial Intelligence

In this paper, we present details of the 1st W-CODA workshop, held in conjunction with ECCV 2024. W-CODA aims to explore next-generation solutions for autonomous driving corner cases, empowered by state-of-the-art multimodal perception and comprehension techniques. Five speakers from both academia and industry are invited to share their latest progress and opinions. We collect research papers and hold a dual-track challenge, covering both corner case scene understanding and generation. As a pioneering effort, we will continue working to bridge the gap between frontier autonomous driving techniques and fully intelligent, reliable self-driving agents that are robust to corner cases.


Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence

Qiao, Yu, Le, Huy Q., Raha, Avi Deb, Tran, Phuong-Nam, Adhikary, Apurba, Zhang, Mengchun, Nguyen, Loc X., Huh, Eui-Nam, Niyato, Dusit, Hong, Choong Seon

arXiv.org Artificial Intelligence

The rise of large language models (LLMs), such as ChatGPT, DeepSeek, and Grok-3, has reshaped the artificial intelligence landscape. As prominent examples of foundational models (FMs) built on LLMs, these models exhibit remarkable capabilities in generating human-like content, bringing us closer to achieving artificial general intelligence (AGI). However, their large-scale nature, sensitivity to privacy concerns, and substantial computational demands present significant challenges to personalized customization for end users. To bridge this gap, this paper presents the vision of artificial personalized intelligence (API), focusing on adapting these powerful models to meet the specific needs and preferences of users while maintaining privacy and efficiency. Specifically, this paper proposes personalized federated intelligence (PFI), which integrates the privacy-preserving advantages of federated learning (FL) with the zero-shot generalization capabilities of FMs, enabling personalized, efficient, and privacy-protective deployment at the edge. We first review recent advances in both FL and FMs, and discuss the potential of leveraging FMs to enhance federated systems. We then present the key motivations behind realizing PFI and explore promising opportunities in this space, including efficient PFI, trustworthy PFI, and PFI empowered by retrieval-augmented generation (RAG). Finally, we outline key challenges and future research directions for deploying FM-powered FL systems at the edge with improved personalization, computational efficiency, and privacy guarantees. Overall, this survey aims to lay the groundwork for the development of API as a complement to AGI, with a particular focus on PFI as a key enabling technique.


DeepSeek-Inspired Exploration of RL-based LLMs and Synergy with Wireless Networks: A Survey

Qiao, Yu, Tran, Phuong-Nam, Yoon, Ji Su, Nguyen, Loc X., Hong, Choong Seon

arXiv.org Artificial Intelligence

Reinforcement learning (RL)-based large language models (LLMs), such as ChatGPT, DeepSeek, and Grok-3, have gained significant attention for their exceptional capabilities in natural language processing and multimodal data understanding. Meanwhile, the rapid expansion of information services has driven the growing need for intelligent, efficient, and adaptable wireless networks. Wireless networks can be empowered by RL-based LLMs, while these models in turn benefit from wireless networks to broaden their application scenarios. Specifically, RL-based LLMs can enhance wireless communication systems through intelligent resource allocation, adaptive network optimization, and real-time decision-making. Conversely, wireless networks provide a vital infrastructure for the efficient training, deployment, and distributed inference of RL-based LLMs, especially in decentralized and edge computing environments. This mutual empowerment highlights the need for a deeper exploration of the interplay between these two domains. We first review recent advancements in wireless communications, highlighting the associated challenges and potential solutions. We then discuss the progress of RL-based LLMs, focusing on key technologies for LLM training, challenges, and potential solutions. Subsequently, we explore the mutual empowerment between these two fields, highlighting key motivations, open challenges, and potential solutions. Finally, we provide insights into future directions, applications, and their societal impact to further explore this intersection, paving the way for next-generation intelligent communication systems. Overall, this survey provides a comprehensive overview of the relationship between RL-based LLMs and wireless networks, offering a vision where these domains empower each other to drive innovations.


HOPS: High-order Polynomials with Self-supervised Dimension Reduction for Load Forecasting

Song, Pengyang, Feng, Han, Shukla, Shreyashi, Wang, Jue, Hong, Tao

arXiv.org Artificial Intelligence

Load forecasting is a fundamental task in smart grids. Many techniques have been applied to developing load forecasting models. Due to challenges such as the curse of dimensionality, overfitting, and limited computing resources, multivariate higher-order polynomial models have received limited attention in load forecasting, despite their desirable mathematical foundations and optimization properties. In this paper, we propose low-rank approximation and self-supervised dimension reduction to address the aforementioned issues. To further improve computational efficiency, we also introduce a fast conjugate gradient based algorithm for the proposed polynomial models. Based on the ISO New England dataset used in the Global Energy Forecasting Competition 2017, the proposed method, high-order polynomials with self-supervised dimension reduction (HOPS), demonstrates higher forecasting accuracy than several competitive models. Additionally, experimental results indicate that our approach alleviates redundant variable construction, achieving better forecasts with fewer input variables.
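The pipeline of reducing dimension before polynomial expansion, then solving the resulting least-squares problem with conjugate gradient, can be sketched as follows. This is a minimal illustration on synthetic data: PCA stands in for the paper's self-supervised reduction, the expansion stops at second order, and all sizes are arbitrary assumptions, so it shows the computational shape of HOPS rather than the actual method.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)

# Synthetic stand-in for load data: 6 weather-like features -> load
X = rng.normal(size=(500, 6))
y = (1.5 * X[:, 0] ** 2 - X[:, 1] * X[:, 2] + 0.3 * X[:, 3]
     + rng.normal(scale=0.1, size=500))

# Step 1: dimension reduction (plain PCA here, a stand-in for the
# paper's self-supervised reduction) to tame the feature explosion.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:4].T                  # keep 4 latent dimensions

# Step 2: expand the latent variables to second-order polynomial terms.
def poly2(Z):
    cols = [np.ones(len(Z))]
    cols += [Z[:, i] for i in range(Z.shape[1])]
    cols += [Z[:, i] * Z[:, j]
             for i, j in combinations_with_replacement(range(Z.shape[1]), 2)]
    return np.column_stack(cols)   # 1 + 4 + 10 = 15 columns

P = poly2(Z)

# Step 3: solve the ridge-regularized normal equations with conjugate
# gradient instead of a direct factorization (cheap for SPD systems).
def cg(A, b, iters=200, tol=1e-10):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = P.T @ P + 1e-6 * np.eye(P.shape[1])
w = cg(A, P.T @ y)
rmse = np.sqrt(np.mean((P @ w - y) ** 2))
print(f"in-sample RMSE: {rmse:.3f}")
```

Reducing 6 raw features to 4 latent ones shrinks the second-order expansion from 28 terms to 15; at higher orders and dimensions this reduction is what keeps the polynomial model tractable.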


Cycloidal Quasi-Direct Drive Actuator Designs with Learning-based Torque Estimation for Legged Robotics

Zhu, Alvin, Tanaka, Yusuke, Rafeedi, Fadi, Hong, Dennis

arXiv.org Artificial Intelligence

Abstract-- This paper presents a novel approach through the design and implementation of Cycloidal Quasi-Direct Drive (QDD) actuators for legged robotics. The cycloidal gear mechanism, with its inherent high torque density and mechanical robustness, offers significant advantages over conventional designs. Additionally, we develop a torque estimation framework for the actuator using an Actuator Network, which effectively reduces the sim-to-real gap introduced by the cycloidal drive's complex dynamics. However, integrating the gearbox into a confined space with conventional planetary, spur gear, and belt drive mechanisms is challenging without sacrificing gear load capacity, since those mechanisms are less resilient to significant impulse loads, such as those experienced during a fall. This paper presents a QDD actuator with a 10:1 cycloidal gearbox for legged robots, along with a gated recurrent unit (GRU) based torque estimation framework to model the actuator's complex dynamics.
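A GRU-based torque estimator of the kind described maps a short history of actuator states to an output torque. The sketch below is a minimal, untrained numpy GRU cell with a linear readout; the input choice (position, velocity, commanded current), hidden size, and readout are all illustrative assumptions, not the paper's trained Actuator Network.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (numpy) for sequence-to-torque regression.

    Weights are random here; in the paper's setting they would be
    trained on measured actuator data. Sizes are illustrative.
    """
    def __init__(self, n_in, n_hid):
        s = 1.0 / np.sqrt(n_hid)
        self.Wz = rng.uniform(-s, s, (n_hid, n_in + n_hid))
        self.Wr = rng.uniform(-s, s, (n_hid, n_in + n_hid))
        self.Wh = rng.uniform(-s, s, (n_hid, n_in + n_hid))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)          # update gate
        r = sigmoid(self.Wr @ xh)          # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

# Inputs per timestep: joint position, velocity, commanded motor current
cell = GRUCell(n_in=3, n_hid=16)
w_out = rng.uniform(-0.25, 0.25, 16)       # linear readout to torque [Nm]

def estimate_torque(history):
    """Run the GRU over a (T, 3) state history, return a torque estimate."""
    h = np.zeros(16)
    for x in history:
        h = cell.step(x, h)
    return float(w_out @ h)

tau = estimate_torque(rng.normal(size=(50, 3)))
print(f"estimated torque: {tau:.3f} Nm")
```

The recurrent state is what lets the estimator capture history-dependent effects such as friction and backlash in the cycloidal drive, which a stateless map from instantaneous current to torque would miss.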


The dye in Doritos can make mice transparent

Popular Science

X-Ray specs and invisibility cloaks are the stuff of sci-fi and fantasy, but sometimes science is just stranger than fiction. A food dye that helps give certain sodas and snacks their hallmark orange hue renders mouse skin almost completely see-through, in a reversible, potentially non-toxic research method that could transform medical and scientific imaging. Because of a counterintuitive fundamental physics principle, tartrazine, also known as Yellow 5, can temporarily turn biological tissue transparent to the naked eye, as described in a study published September 5 in the journal Science. So far, the scientists behind the new discovery have used the method to see the organs in a mouse's intact abdomen, glimpse the pulsing vessels surrounding a rodent skull, and get an exceptionally clear view of muscle tissue through a microscope. With further safety and efficacy research, the method may spur new scientific findings, boost microscopy advances, and improve medical diagnostic strategies and treatments.


Let There Be Sound: Reconstructing High Quality Speech from Silent Videos

Kim, Ji-Hoon, Kim, Jaehun, Chung, Joon Son

arXiv.org Artificial Intelligence

The goal of this work is to reconstruct high quality speech from lip motions alone, a task also known as lip-to-speech. A key challenge of lip-to-speech systems is the one-to-many mapping caused by (1) the existence of homophenes and (2) multiple speech variations, resulting in mispronounced and over-smoothed speech. In this paper, we propose a novel lip-to-speech system that significantly improves the generation quality by alleviating the one-to-many mapping problem from multiple perspectives. Specifically, we incorporate (1) self-supervised speech representations to disambiguate homophenes, and (2) acoustic variance information to model diverse speech styles. Additionally, to better solve the aforementioned problem, we employ a flow-based post-net which captures and refines the details of the generated speech. We perform extensive experiments on two datasets, and demonstrate that our method achieves generation quality close to that of real human utterances, outperforming existing methods in terms of speech naturalness and intelligibility by a large margin. Synthesised samples are available at our demo page: https://mm.kaist.ac.kr/projects/LTBS.


Task Planning for Multiple Item Insertion using ADMM

Zheng, Gavin

arXiv.org Artificial Intelligence

Mixed-integer nonlinear programs (MINLPs) are powerful formulation tools for task planning. However, they suffer from long solving times, especially for large-scale problems. In this work, we first formulate the task planning problem for item stowing as a mixed-integer nonlinear program, then solve it using the Alternating Direction Method of Multipliers (ADMM). ADMM separates the complete formulation into a nonlinear programming problem and a mixed-integer programming problem, then iterates between them to solve the original problem. We show that our ADMM converges better than the non-warm-started nonlinear complementarity formulation. Our proposed method is demonstrated on hardware as a high-level planner that inserts books into a bookshelf.
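The ADMM splitting described above alternates between a continuous subproblem and an integer projection, with a dual variable pulling the two copies into consensus. The toy below applies that pattern to a drastically simplified stand-in for the stowing problem (a linear slot-assignment cost with no geometric constraints); the problem data and `project_assignment` helper are invented for illustration and are not the paper's formulation.

```python
import numpy as np

# Toy stand-in for a stow-planning MINLP: choose binary placements x
# minimizing a cost, one slot per item. ADMM splits this into a
# continuous step (x-update) and an integer projection (z-update),
# iterating to consensus via the scaled dual u.
rng = np.random.default_rng(2)
n_items, n_slots = 3, 4
cost = rng.uniform(0, 1, (n_items, n_slots))   # cost of item i in slot j

rho = 1.0
x = np.zeros((n_items, n_slots))               # continuous copy
z = np.zeros_like(x)                           # integer copy
u = np.zeros_like(x)                           # scaled dual variable

def project_assignment(v):
    """Project onto {0,1} assignments: each item takes its best slot."""
    z = np.zeros_like(v)
    z[np.arange(len(v)), v.argmax(axis=1)] = 1.0
    return z

for _ in range(50):
    # x-update: argmin of cost.x + (rho/2)||x - z + u||^2 (closed form)
    x = z - u - cost / rho
    # z-update: project the consensus point onto the integer set
    z = project_assignment(x + u)
    # dual update: accumulate the consensus violation
    u += x - z

chosen = z.argmax(axis=1)
print("slot per item:", chosen,
      " total cost:", cost[np.arange(n_items), chosen].sum())
```

In the paper's setting the x-update would be a full nonlinear program (collision-free insertion trajectories) and the z-update a mixed-integer program over placements, but the alternation and dual update follow this same scheme.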