Decentralized sketching of low rank matrices

Rakshith Sharma Srinivasa, Kiryung Lee, Marius Junge, Justin Romberg

Neural Information Processing Systems

A fundamental structural model for data is that the data points lie close to an unknown subspace, meaning that the matrix created by concatenating the data vectors has low rank. We address a particular low-rank matrix recovery problem where we wish to recover a set of vectors from a low-dimensional subspace after they have been individually compressed (or "sketched").
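The sketching model described above can be illustrated with a small NumPy sketch. This is not the paper's recovery algorithm (which must also estimate the unknown subspace); as a simplification we assume the subspace U is known, so each column's coefficients reduce to a tiny least-squares problem. All dimensions and variable names below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k, m = 50, 3, 40, 10   # ambient dim, rank, number of vectors, sketch dim

# Data columns lie in an unknown r-dimensional subspace: X = U @ C
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
C = rng.standard_normal((r, k))
X = U @ C

# Each column is "sketched" (compressed) independently with its own random matrix
mats, sketches = [], []
for i in range(k):
    B = rng.standard_normal((m, n)) / np.sqrt(m)
    mats.append(B)
    sketches.append(B @ X[:, i])

# Simplified recovery: if the subspace U were known, each column is a small
# least-squares problem (m >= r measurements suffice per column)
X_hat = np.empty_like(X)
for i in range(k):
    coeff, *_ = np.linalg.lstsq(mats[i] @ U, sketches[i], rcond=None)
    X_hat[:, i] = U @ coeff

rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

The interesting part of the paper is precisely that U is not known in advance; the sketch above only demonstrates why per-column compression to m << n measurements loses no information once the shared subspace is accounted for.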



Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems

Neural Information Processing Systems

Multi-agent control is a central theme in Cyber-Physical Systems (CPS). However, current control methods either receive non-Markovian states due to insufficient sensing and decentralized design, or suffer from poor convergence. This paper presents the Delayed Propagation Transformer (DePT), a new transformer-based model that specializes in the global modeling of CPS while taking into account the immutable constraints from the physical world. DePT induces a cone-shaped spatial-temporal attention prior, which injects the information propagation and aggregation principles and enables a global view. With physical constraint inductive bias baked into its design, DePT is ready to plug and play for a broad class of multi-agent systems. Experimental results on one of the most challenging CPS -- a network-scale traffic signal control system in the open world -- show that the model outperforms state-of-the-art expert methods on synthetic and real-world datasets.
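One plausible reading of the "cone-shaped spatial-temporal attention prior" is a causality mask: a query at agent i, time t, may only attend to a key at agent j, time s, if information emitted at j at time s has had time to propagate to i by time t. The sketch below encodes that constraint; the function name, the dense loop formulation, and the toy distances are all assumptions, not DePT's actual implementation.

```python
import numpy as np

def cone_attention_mask(dist, speed, T):
    """Boolean mask over (agent, time) query/key pairs.
    Entry [(i, t), (j, s)] is True only if t - s >= dist(i, j) / speed,
    i.e. j's state at time s could physically have reached i by time t."""
    n = dist.shape[0]
    mask = np.zeros((n * T, n * T), dtype=bool)
    for i in range(n):
        for t in range(T):
            for j in range(n):
                for s in range(T):
                    mask[i * T + t, j * T + s] = (t - s) >= dist[i, j] / speed
    return mask

# Toy example: 2 agents 1.0 apart, propagation speed 0.5, 3 time steps.
# Each agent always sees its own past, but sees the other agent's state
# only after a 2-step propagation delay.
dist = np.array([[0.0, 1.0], [1.0, 0.0]])
mask = cone_attention_mask(dist, speed=0.5, T=3)
```

In a real attention layer this mask would be applied by setting the disallowed logits to a large negative value before the softmax.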



A Novel Evaluation Benchmark for Medical LLMs: Illuminating Safety and Effectiveness in Clinical Domains

Wang, Shirui, Tang, Zhihui, Yang, Huaxia, Gong, Qiuhong, Gu, Tiantian, Ma, Hongyang, Wang, Yongxin, Sun, Wubin, Lian, Zeliang, Mao, Kehang, Jiang, Yinan, Huang, Zhicheng, Ma, Lingyun, Shen, Wenjie, Ji, Yajie, Tan, Yunhui, Wang, Chunbo, Gao, Yunlu, Ye, Qianling, Lin, Rui, Chen, Mingyu, Niu, Lijuan, Wang, Zhihao, Yu, Peng, Lang, Mengran, Liu, Yue, Zhang, Huimin, Shen, Haitao, Chen, Long, Zhao, Qiguang, Liu, Si-Xuan, Zhou, Lina, Gao, Hua, Ye, Dongqiang, Meng, Lingmin, Yu, Youtao, Liang, Naixin, Wu, Jianxiong

arXiv.org Artificial Intelligence

Large language models (LLMs) hold promise in clinical decision support but face major challenges in safety evaluation and effectiveness validation. We developed the Clinical Safety-Effectiveness Dual-Track Benchmark (CSEDB), a multidimensional framework built on clinical expert consensus, encompassing 30 criteria covering critical areas like critical illness recognition, guideline adherence, and medication safety, with weighted consequence measures. Thirty-two specialist physicians developed and reviewed 2,069 open-ended Q&A items aligned with these criteria, spanning 26 clinical departments to simulate real-world scenarios. Benchmark testing of six LLMs revealed moderate overall performance (average total score 57.2%, safety 54.7%, effectiveness 62.3%), with a significant 13.3% performance drop in high-risk scenarios (p < 0.0001). Domain-specific medical LLMs showed consistent performance advantages over general-purpose models, with relatively higher top scores in safety (0.912) and effectiveness (0.861). The findings of this study not only provide a standardized metric for evaluating the clinical application of medical LLMs, facilitating comparative analyses, risk exposure identification, and improvement directions across different scenarios, but also hold the potential to promote safer and more effective deployment of large language models in healthcare environments.
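The abstract mentions "weighted consequence measures" for aggregating the 30 criteria. The snippet below shows one generic way such a weighted aggregate could work; the criterion names, weights, and scores are invented for illustration and are not taken from CSEDB.

```python
# Illustrative only: CSEDB's actual weighting scheme is not specified here.
def weighted_score(item_scores, weights):
    """Aggregate per-criterion scores in [0, 1] into a weighted total,
    so that high-consequence criteria dominate the result."""
    total_w = sum(weights[c] for c in item_scores)
    return sum(item_scores[c] * weights[c] for c in item_scores) / total_w

# Hypothetical criteria: a severe-consequence criterion (critical illness
# recognition) is weighted more heavily than routine guideline adherence.
weights = {"critical_illness_recognition": 3.0,
           "medication_safety": 2.0,
           "guideline_adherence": 1.0}
scores = {"critical_illness_recognition": 0.5,
          "medication_safety": 0.9,
          "guideline_adherence": 1.0}

total = weighted_score(scores, weights)
```

With these numbers the total is (0.5*3 + 0.9*2 + 1.0*1) / 6 ≈ 0.717: the model's weakness on the highest-consequence criterion pulls the aggregate well below the unweighted mean of 0.8.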


SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech

Li, Yuqi, Zheng, Yuanzhong, Guo, Zhongtian, Wang, Yaoxuan, Yin, Jianjun, Fei, Haojun

arXiv.org Artificial Intelligence

This paper presents SpecWav-Attack, an adversarial model for detecting speakers in anonymized speech. It leverages Wav2Vec 2.0 for feature extraction [1] and incorporates spectrogram resizing and incremental training for improved performance. Evaluated on librispeech-dev and librispeech-test, and benchmarked against the ICASSP 2025 Attacker Challenge [2], SpecWav-Attack outperforms conventional attacks, revealing vulnerabilities in anonymized speech systems and underscoring the need for stronger defenses. With a focus on Effective Equal Error Rate (EER), the model builds on the ECAPA-TDNN architecture [3] and integrates the Wav2Vec 2.0 self-supervised model [1] to enrich speech representations, enhancing sensitivity to variations in anonymized data.
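The paper's exact spectrogram-resizing procedure is not described in the abstract; as a minimal sketch, resizing along the time axis can be done with 1-D linear interpolation per frequency bin, as below. The function name and the interpolation choice are assumptions, not the authors' method.

```python
import numpy as np

def resize_spectrogram(spec, new_time):
    """Linearly interpolate a (freq, time) spectrogram along the time axis.
    A simple stand-in for a resizing augmentation; SpecWav-Attack's actual
    resizing may use a different scheme."""
    f, t = spec.shape
    old_grid = np.linspace(0.0, 1.0, t)
    new_grid = np.linspace(0.0, 1.0, new_time)
    return np.stack([np.interp(new_grid, old_grid, spec[i]) for i in range(f)])

# Toy 80-bin mel-style spectrogram, stretched from 100 to 160 frames
spec = np.abs(np.random.default_rng(1).standard_normal((80, 100)))
stretched = resize_spectrogram(spec, 160)
```

Time-axis resizing of this kind changes apparent speaking rate while preserving spectral content, which is one reason it is useful as a robustness augmentation for speaker-verification attacks.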


DEPT: Decoupled Embeddings for Pre-training Language Models

Iacob, Alex, Sani, Lorenzo, Kurmanji, Meghdad, Shen, William F., Qiu, Xinchi, Cai, Dongqi, Gao, Yan, Lane, Nicholas D.

arXiv.org Artificial Intelligence

Language model pre-training benefits from diverse data to enhance performance across domains and languages. However, training on such heterogeneous corpora requires extensive and costly efforts. Since these data sources vary lexically, syntactically, and semantically, they cause negative interference or the "curse of multilinguality". We propose a novel pre-training framework to alleviate this curse. Our method, DEPT, decouples embeddings from the transformer body while simultaneously training the latter in multiple contexts. DEPT enables training without a shared global vocabulary and: (1) can train robustly and effectively under significant data heterogeneity, (2) reduces token embedding parameters by up to 80% and the communication costs by 675x for billion-scale models, (3) enhances model generalization and plasticity in adapting to new languages and domains, and (4) permits training with custom optimized vocabularies per data source. We demonstrate DEPT's potential via the first vocabulary-agnostic federated multilingual pre-training of a 1.3 billion-parameter model, limiting its embedding size to 102.4 million instead of 512 million.
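The core idea of decoupling embeddings from the transformer body can be sketched in a few lines: each data source keeps its own vocabulary and embedding table, while all sources share the body parameters. The toy below stands a single linear map in for the transformer body, and the source names and sizes are invented; it only illustrates the parameter decoupling, not DEPT's training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# Shared "transformer body" parameters (a single linear layer stands in here)
W_body = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

# Each data source keeps its OWN vocabulary and embedding table, so no
# shared global vocabulary is needed (the decoupling at the heart of DEPT)
source_embeddings = {
    "english": rng.standard_normal((1000, d_model)),  # vocab size 1000
    "code":    rng.standard_normal((300,  d_model)),  # vocab size 300
}

def forward(source, token_ids):
    emb = source_embeddings[source][token_ids]  # source-specific lookup
    return emb @ W_body                         # shared-body computation

h_en = forward("english", np.array([1, 2, 3]))
h_code = forward("code", np.array([5, 6]))
```

In federated pre-training this separation means only the (small) body gradients need to be communicated between participants, while each participant's (large) embedding table stays local, which is consistent with the communication savings the abstract reports.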




Gaussian Processes for Monitoring Air-Quality in Kampala

Stoddart, Clara, Shrack, Lauren, Sserunjogi, Richard, Abdul-Ganiy, Usman, Bainomugisha, Engineer, Okure, Deo, Misener, Ruth, Folch, Jose Pablo, Sedgwick, Ruby

arXiv.org Machine Learning

Monitoring air pollution is of vital importance to the overall health of the population. Unfortunately, devices that can measure air quality can be expensive, and many cities in low- and middle-income countries have to rely on a sparse allocation of them. In this paper, we investigate the use of Gaussian Processes both for nowcasting the current air pollution in places where there are no sensors and for forecasting future air pollution at the sensor locations. In particular, we focus on the city of Kampala in Uganda, using data from AirQo's network of sensors. We demonstrate the advantage of removing outliers, and compare different kernel functions and the inclusion of additional inputs. We also compare two sparse approximations to accommodate the large amount of temporal data in the dataset.
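Nowcasting at an unsensed location with a Gaussian Process reduces to the standard GP posterior: condition on the sensor readings and evaluate the predictive mean and variance at the new location. The minimal 1-D sketch below uses an RBF kernel and synthetic readings; the locations, lengthscale, and noise level are all assumptions, not values from the Kampala study.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two 1-D location arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

# Toy 1-D stand-in for sensor locations and their pollution readings
rng = np.random.default_rng(0)
x_train = np.array([0.0, 1.0, 2.0, 4.0, 5.0])
y_train = np.sin(x_train) + 0.05 * rng.standard_normal(5)
x_test = np.array([3.0])  # a location with no sensor

noise = 0.05 ** 2
K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
K_star = rbf(x_test, x_train)

# GP posterior mean and variance at the unobserved location ("nowcast")
alpha = np.linalg.solve(K, y_train)
mean = K_star @ alpha
var = rbf(x_test, x_test) - K_star @ np.linalg.solve(K, K_star.T)
```

The posterior variance is what makes GPs attractive for sparse sensor networks: it quantifies how much the nowcast at an unsensed location should be trusted, growing as the location moves away from any sensor. The sparse approximations mentioned in the abstract replace the exact solve, which scales cubically in the number of observations, with lower-cost approximations over inducing points.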