A Reinforcement Learning-based Bidding Strategy for Data Consumers in Auction-based Federated Learning

Neural Information Processing Systems

A major challenge in auction-based federated learning (AFL) is how data consumers (DCs) select and bid for data owners (DOs). Existing methods are generally static, making them ill-suited for dynamic AFL markets. To address this issue, we propose the Reinforcement Learning-based Bidding Strategy for DCs in Auction-based Federated Learning (RLB-AFL). We incorporate historical states into a Deep Q-Network to capture sequential information critical for bidding decisions. To mitigate state space sparsity, where specific states rarely reoccur for each DC during auctions, we incorporate a Gaussian Mixture Model into RLB-AFL.
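To make the idea of learning a bidding policy from historical states concrete, here is a minimal sketch of a Q-learning update over a state built from recent auction history. It is not RLB-AFL itself: it uses a tabular Q-function instead of a Deep Q-Network and omits the Gaussian Mixture Model entirely. The bid levels, the history length, and the names `BID_LEVELS`, `make_state`, and `q_update` are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, Tuple

# Hypothetical discrete bid levels a data consumer may choose from.
BID_LEVELS = [0.0, 0.5, 1.0]


def make_state(history: Deque[Hashable]) -> Tuple[Hashable, ...]:
    """Fold the most recent observations into one hashable state."""
    return tuple(history)


def q_update(
    q: Dict[Tuple[Hashable, int], float],
    state: Hashable,
    action: int,
    reward: float,
    next_state: Hashable,
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> float:
    """Apply one tabular Q-learning step and return the updated value."""
    # Best value reachable from the next state over all bid levels.
    best_next = max(q[(next_state, a)] for a in range(len(BID_LEVELS)))
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: keep the last three observations as the state (assumed length).
history: Deque[Hashable] = deque(maxlen=3)
for observation in ["low", "high", "high"]:
    history.append(observation)
q_table: Dict[Tuple[Hashable, int], float] = defaultdict(float)
state = make_state(history)
history.append("low")
next_state = make_state(history)
q_update(q_table, state, action=1, reward=2.0, next_state=next_state)
```

Because each state is a tuple of exact past observations, most states are visited only once, which is the sparsity problem the abstract says the Gaussian Mixture Model is meant to mitigate.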


CORE: Collaborative Optimization with Reinforcement Learning and Evolutionary Algorithm for Floorplanning

Neural Information Processing Systems

Floorplanning is the initial step in the physical design process of Electronic Design Automation (EDA), directly influencing subsequent placement, routing, and final power of the chip. However, the solution space in floorplanning is vast, and current algorithms often struggle to explore it sufficiently, making them prone to getting trapped in local optima.


Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning

Neural Information Processing Systems

Leveraging attention sparsity to accelerate long-context large language models (LLMs) has been of great importance recently. However, most existing sparse attention algorithms use a fixed budget for how many tokens to use in their computations. This simple static decision raises critical issues in real-world deployment because it fails to account for the dynamic nature of real-world scenarios, where the optimal balance between accuracy and efficiency can vary greatly. In this paper, we reveal a key insight: applying the idea of top-$p$ sampling (a.k.a. nucleus sampling) to sparse attention enables efficient and adaptive budget decisions. Based on this, we propose Twilight, a framework that enhances any existing sparse attention algorithm with adaptive budget decision capabilities without sacrificing accuracy. Empirical results show that Twilight can adaptively prune up to 98% of tokens with nearly no accuracy loss in both mid- and long-context scenarios, leading to a $1.4\times$ speedup over state-of-the-art sparse attention mechanisms.
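The difference between a fixed token budget and a top-$p$ budget can be sketched in a few lines. The example below is a toy illustration, not Twilight's implementation: it keeps the smallest set of tokens whose normalized attention weights sum to at least $p$, so the number of kept tokens adapts to how peaked the distribution is. The function names `softmax` and `top_p_select` are assumptions introduced here.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_select(weights: List[float], p: float) -> List[int]:
    """Return indices of the smallest token set with cumulative weight >= p."""
    # Visit tokens from the largest attention weight to the smallest.
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += weights[i]
        if cumulative >= p:
            break
    return sorted(kept)


# A peaked distribution needs one token; a flat one needs all four.
peaked = top_p_select([0.97, 0.01, 0.01, 0.01], p=0.85)
flat = top_p_select([0.25, 0.25, 0.25, 0.25], p=0.85)
```

With the same threshold $p$, the peaked head keeps a single token while the flat head keeps every token, which is the adaptive behavior a fixed top-$k$ budget cannot express.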


OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Neural Information Processing Systems

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has attracted growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks ($4\times$ more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios ($31$ diverse scenarios), and thorough evaluation metrics, with $10,000$ human-verified question-answering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with $1,500$ manually annotated images. The consistent evaluation trends observed across both public and private test sets validate the reliability of OCRBench v2. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below $50$ (out of $100$) and suffer from five types of limitations: recognition of less frequently encountered text, fine-grained perception, layout perception, complex element parsing, and logical reasoning.


Scalable Cross-View Sample Alignment for Multi-View Clustering with View Structure Similarity

Neural Information Processing Systems

Most existing multi-view clustering methods aim to generate a consensus partition across all views, based on the assumption that all views share the same sample arrangement. However, in real-world scenarios, the collected data across different views is often unsynchronized, making it difficult to ensure consistent sample correspondence between views. To address this issue, we propose a scalable sample-alignment-based multi-view clustering method, referred to as SSA-MVC. Specifically, we first employ a cluster-label matching (CLM) algorithm to select the view whose clustering labels best match those of the others as the benchmark view. Then, for each of the remaining views, we construct representations of non-aligned samples by computing their similarities with aligned samples. Based on these representations, we build a similarity graph between the non-aligned samples of each view and those in the benchmark view, which serves as the alignment criterion. This alignment criterion is then integrated into a late-fusion framework to enable clustering without requiring aligned samples. Notably, the learned sample alignment matrix can be used to enhance existing multi-view clustering methods in scenarios where sample correspondence is unavailable. The effectiveness of the proposed SSA-MVC algorithm is validated through extensive experiments conducted on eight real-world multi-view datasets.


How Chinese short dramas became AI content machines

MIT Technology Review

The viral short dramas are increasingly being created entirely with AI, with hundreds of new shows spun up each day. In a dimly lit bedroom, a frightened young woman is thrown onto a bed by a tall, muscular man. He grabs her hand, and flame-like vines crawl across her body, fusing with her flesh. A dragon-shaped tattoo appears across her chest. "Two months," the man says. "Give me an heir, or I will eat you."



Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting

Neural Information Processing Systems

Speculative decoding has demonstrated its effectiveness in accelerating the inference of large language models (LLMs) while maintaining an identical sampling distribution. However, the conventional approach of training a separate draft model to achieve a satisfactory token acceptance rate can be costly and impractical. In this paper, we propose a novel self-speculative decoding framework, \emph{Kangaroo}, with a \emph{double} early-exiting strategy, which leverages the shallow sub-network and the \texttt{LM Head} of the well-trained target LLM to construct a self-drafting model. Then, the self-verification stage only requires computing the remaining layers over the \emph{early-exited} hidden states in parallel. To bridge the representation gap between the sub-network and the full model, we train a lightweight and efficient adapter module on top of the sub-network.
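The draft-then-verify loop behind speculative decoding can be sketched with toy models. The example below is a simplified greedy variant under stated assumptions, not Kangaroo's implementation: the "models" are hypothetical functions mapping a prefix to a `(token, confidence)` pair, drafting stops early when the draft's confidence falls below an assumed threshold, and verification runs sequentially rather than in parallel. The names `speculative_greedy_decode`, `toy_target`, and `toy_draft` are illustrative only.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical model interface: prefix -> (next token, confidence).
Model = Callable[[List[Token]], Tuple[Token, float]]


def greedy_decode(target: Model, prompt: List[Token], n: int) -> List[Token]:
    """Reference decoding that calls only the target model."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out)[0])
    return out


def speculative_greedy_decode(
    target: Model,
    draft: Model,
    prompt: List[Token],
    n: int,
    max_draft: int = 4,
    conf_threshold: float = 0.6,
) -> List[Token]:
    """Draft several tokens cheaply, then keep only what the target agrees with."""
    out = list(prompt)
    while len(out) - len(prompt) < n:
        # Drafting: stop early once the draft model is no longer confident.
        drafted: List[Token] = []
        ctx = list(out)
        for _ in range(max_draft):
            tok, conf = draft(ctx)
            if conf < conf_threshold:
                break
            drafted.append(tok)
            ctx.append(tok)
        # Verification: accept the matching prefix, fix the first mismatch.
        ctx = list(out)
        for tok in drafted:
            t_tok = target(ctx)[0]
            ctx.append(t_tok)
            if t_tok != tok:
                break
        else:
            # Every drafted token matched, so the target adds one more.
            ctx.append(target(ctx)[0])
        out = ctx
    return out[: len(prompt) + n]


def toy_target(ctx: List[Token]) -> Tuple[Token, float]:
    return (ctx[-1] * 3 + 1) % 7, 1.0


def toy_draft(ctx: List[Token]) -> Tuple[Token, float]:
    tok = (ctx[-1] * 3 + 1) % 7
    if ctx[-1] == 5:
        return tok, 0.3  # low confidence triggers the early stop
    if ctx[-1] % 2 == 0:
        return (tok + 1) % 7, 0.7  # confident but wrong guess
    return tok, 0.9


result = speculative_greedy_decode(toy_target, toy_draft, [1], n=12)
```

Every token appended to the output is the target model's own greedy choice for that prefix, so in this greedy setting the result matches target-only decoding; the draft only changes how many target calls could be batched together.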


d71a4a6c796cacd9b8a298589943cdf3-Supplemental-Conference.pdf

Neural Information Processing Systems

The code related to the dataset, model, loss, training pipeline, and experiments is enclosed.

Cross-Domain MAFL AFLW$_M$ AFLW$_R$ 300W
Supervised learning
TCDCN [13] X X 7.95 7.65 - 5.54
MTCNN [12] X X 5.39 6.90 -
WingLoss [3] X X - - - 4.04
Generative modeling based
DeformingAE [9] O X 5.45 - -
ImGen. [4]

After the initialization period, the intra pseudo-paired data $x_{d_1 \to d_1}$, $x_{d_2 \to d_2}$ and the inter pseudo-paired data $x_{d_1 \to d_2}$, $x_{d_2 \to d_1}$ are generated with the latent space exploration described in Section 3.2. Finally, the semantic matching loss $L_M$ is used to obtain the intra semantic matching loss $L_{M1}$ and the inter semantic matching loss $L_{M2}$. We provide more examples of pseudo-paired data on various combinations of original and pair domains in Fig. 3.