Zhang, Wenbo
Inference Computation Scaling for Feature Augmentation in Recommendation Systems
Liu, Weihao, Du, Zhaocheng, Zhao, Haiyuan, Zhang, Wenbo, Zhao, Xiaoyan, Wang, Gang, Dong, Zhenhua, Xu, Jun
Large language models have become a powerful method for feature augmentation in recommendation systems. However, existing approaches relying on quick inference often suffer from incomplete feature coverage and insufficient specificity in feature descriptions, limiting their ability to capture fine-grained user preferences and undermining overall performance. Motivated by the recent success of inference scaling in math and coding tasks, we explore whether scaling inference can address these limitations and enhance feature quality. Our experiments show that scaling inference leads to significant improvements in recommendation performance, with a 12% increase in NDCG@10. The gains can be attributed to two key factors: feature quantity and specificity. In particular, models using extended Chain-of-Thought (CoT) reasoning generate a greater number of detailed and precise features, offering deeper insights into user preferences and overcoming the limitations of quick inference. We further investigate the factors influencing feature quantity, revealing that model choice and search strategy play critical roles in generating a richer and more diverse feature set. This is the first work to apply inference scaling to feature augmentation in recommendation systems, bridging advances in reasoning tasks to enhance personalized recommendation.
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis
Zhang, Wenbo, Cai, Hengrui, Chen, Wenyu
Large language models (LLMs) have demonstrated significant utility in real-world applications, exhibiting impressive capabilities in natural language processing and understanding. Benchmark evaluations are crucial for assessing the capabilities of LLMs, as they provide a comprehensive assessment of a model's strengths and weaknesses. However, current evaluation methods often overlook the inherent randomness of LLMs by employing deterministic generation strategies or relying on a single random sample, resulting in unaccounted sampling variance and unreliable benchmark score estimates. In this paper, we propose a hierarchical statistical model that provides a more comprehensive representation of the benchmarking process by incorporating both benchmark characteristics and LLM randomness. We show that leveraging multiple generations improves the accuracy of the estimated benchmark score and reduces variance. We also introduce $\mathbb P\left(\text{correct}\right)$, a prompt-level difficulty score based on correct ratios, providing fine-grained insights into individual prompts. Additionally, we create a data map that visualizes prompt difficulty and semantics, enabling error detection and quality control in benchmark construction.
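The multiple-generation estimator described above can be sketched in a few lines. The toy simulation below (all names hypothetical, not from the paper) shows how averaging k generations per prompt reduces the sampling variance of the benchmark-score estimate:

```python
import random

def p_correct(outcomes):
    """Prompt-level P(correct): fraction of correct samples for one prompt."""
    return sum(outcomes) / len(outcomes)

def benchmark_score(per_prompt_outcomes):
    """Benchmark score: mean over prompts of each prompt's correct ratio."""
    return sum(p_correct(o) for o in per_prompt_outcomes) / len(per_prompt_outcomes)

# Toy model: each prompt i has a true difficulty p_i; each generation is an
# independent Bernoulli(p_i) draw, mimicking LLM sampling randomness.
random.seed(0)
true_p = [0.9, 0.5, 0.2]

def simulate(k):
    """Draw k generations per prompt and score the benchmark once."""
    return benchmark_score(
        [[1 if random.random() < p else 0 for _ in range(k)] for p in true_p]
    )

one_shot = [simulate(1) for _ in range(2000)]   # single-sample evaluation
multi = [simulate(50) for _ in range(2000)]     # 50 generations per prompt

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Averaging k generations per prompt shrinks the estimator's sampling
# variance roughly by a factor of k.
print(var(one_shot) > var(multi))
```

The single-sample estimator's spread here is dominated by per-prompt Bernoulli noise, which is exactly the variance the paper argues goes unaccounted in deterministic or single-sample evaluation.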
Adaptive Pruning for Large Language Models with Structural Importance Awareness
Zheng, Haotian, Ren, Jinke, Sun, Yushan, Zhang, Ruichen, Zhang, Wenbo, Li, Zhen, Niyato, Dusit, Cui, Shuguang, Han, Yatong
The recent advancements in large language models (LLMs) have significantly improved language understanding and generation capabilities. However, it is difficult to deploy LLMs on resource-constrained edge devices due to their high computational and storage resource demands. To address this issue, we propose a novel LLM model pruning method, namely structurally-aware adaptive pruning (SAAP), to significantly reduce the computational and memory costs while maintaining model performance. We first define an adaptive importance fusion metric to evaluate the importance of all coupled structures in LLMs by considering their homoscedastic uncertainty. Then, we rank the importance of all modules to determine the specific layers that should be pruned to meet particular performance requirements. Furthermore, we develop a new group fine-tuning strategy to improve the inference efficiency of LLMs. Finally, we evaluate the proposed SAAP method on multiple LLMs across two common tasks, i.e., zero-shot classification and text generation. Experimental results show that our SAAP method outperforms several state-of-the-art baseline methods, achieving 2.17%, 2.37%, and 2.39% accuracy gains on LLaMA-7B, Vicuna-7B, and LLaMA-13B. Additionally, SAAP improves the token generation speed by 5%, showcasing its practical advantages in resource-constrained scenarios.
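As a rough illustration of the pruning recipe above (fuse importance signals, rank modules, prune the least important until a budget is met), the sketch below uses hypothetical names and a simple inverse-variance fusion rule; it is a loose analogue, not SAAP's actual metric:

```python
import math

def fuse_importance(weight_score, activation_score, sigma_w, sigma_a):
    """Fuse two importance signals, down-weighting the noisier one
    (a rough analogue of uncertainty-aware fusion, not the paper's metric)."""
    inv_w, inv_a = 1.0 / sigma_w**2, 1.0 / sigma_a**2
    return (inv_w * weight_score + inv_a * activation_score) / (inv_w + inv_a)

def select_prune_set(modules, keep_ratio):
    """Rank modules by fused importance and mark the least important ones
    for pruning until only keep_ratio of the modules remain."""
    ranked = sorted(modules, key=lambda m: m["importance"])
    n_prune = len(modules) - math.ceil(keep_ratio * len(modules))
    return {m["name"] for m in ranked[:n_prune]}

# Hypothetical module scores: (weight-based score, activation-based score),
# with the weight signal assumed less noisy (sigma 0.1 vs. 0.3).
modules = [
    {"name": "mlp.0", "importance": fuse_importance(0.9, 0.7, 0.1, 0.3)},
    {"name": "mlp.1", "importance": fuse_importance(0.2, 0.4, 0.1, 0.3)},
    {"name": "attn.0", "importance": fuse_importance(0.8, 0.9, 0.1, 0.3)},
    {"name": "attn.1", "importance": fuse_importance(0.1, 0.3, 0.1, 0.3)},
]
print(select_prune_set(modules, keep_ratio=0.5))  # two least important modules
```

Ranking fused scores globally, rather than pruning uniformly per layer, is what lets this style of method target a specific performance/size trade-off.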
INSIGHT: Explainable Weakly-Supervised Medical Image Analysis
Zhang, Wenbo, Chen, Junyu, Kanan, Christopher
Pathology images (WSIs) are often processed by extracting embeddings from local regions, after which an aggregator makes predictions from this set. However, current methods require post-hoc visualization techniques (e.g., Grad-CAM) and often fail to localize small yet clinically crucial details. To address these limitations, we introduce INSIGHT, a novel weakly-supervised aggregator that integrates heatmap generation as an inductive bias. Starting from pre-trained feature maps, INSIGHT employs a detection module with small convolutional kernels to capture fine details.

Processing such data end-to-end with deep neural networks is computationally infeasible. Instead, pipelines rely on aggregators, which synthesize local embeddings extracted from tiles (WSIs) or slices (volumes) into global predictions [5, 6, 23]. While this divide-and-conquer strategy is efficient, current methods often discard spatial information during feature aggregation and depend on post-hoc visualization tools, such as Grad-CAM [33], to generate interpretable heatmaps. These visualizations are prone to missing clinically significant features and introduce additional complexity.
Bootstrapping Clustering of Gaussians for View-consistent 3D Scene Understanding
Zhang, Wenbo, Zhang, Lu, Hu, Ping, Ma, Liqian, Zhuge, Yunzhi, Lu, Huchuan
Injecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention. While current approaches typically distill 3D semantic features from 2D foundational models (e.g., CLIP and SAM) to facilitate novel view segmentation and semantic understanding, their heavy reliance on 2D supervision can undermine cross-view semantic consistency and necessitate complex data preparation processes, therefore hindering view-consistent scene understanding. In this work, we present FreeGS, an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. Instead of directly learning semantic features, we introduce the IDentity-coupled Semantic Field (IDSF) into 3DGS, which captures both semantic representations and view-consistent instance indices for each Gaussian. We optimize IDSF with a two-step alternating strategy: semantics help to extract coherent instances in 3D space, while the resulting instances regularize the injection of stable semantics from 2D space. Additionally, we adopt a 2D-3D joint contrastive loss to enhance the complementarity between view-consistent 3D geometry and rich semantics during the bootstrapping process, enabling FreeGS to uniformly perform tasks such as novel-view semantic segmentation, object selection, and 3D object detection. Extensive experiments on LERF-Mask, 3D-OVS, and ScanNet datasets demonstrate that FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
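The 2D-3D joint contrastive objective can be illustrated with a generic symmetric InfoNCE-style loss over paired feature sets: matched 2D/3D pairs are pulled together and mismatched pairs pushed apart. This is an analogue under assumed inputs, not FreeGS's exact loss:

```python
import numpy as np

def info_nce(feats_3d, feats_2d, temperature=0.07):
    """InfoNCE-style loss over paired feature sets: row i of feats_3d is
    the positive for row i of feats_2d (an illustrative analogue of a
    2D-3D joint contrastive objective, not the paper's loss)."""
    a = feats_3d / np.linalg.norm(feats_3d, axis=1, keepdims=True)
    b = feats_2d / np.linalg.norm(feats_2d, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # cross-entropy with the diagonal (matched pairs) as targets
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 16))
loss_aligned = info_nce(x, x)                         # perfectly matched pairs
loss_random = info_nce(x, rng.standard_normal((6, 16)))  # unrelated pairs
print(loss_aligned < loss_random)
```

In the paper's setting, one side of the pair would carry view-consistent 3D instance features and the other 2D semantic features, so minimizing such a loss encourages the two spaces to agree.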
Code-mixed LLM: Improve Large Language Models' Capability to Handle Code-Mixing through Reinforcement Learning from AI Feedback
Zhang, Wenbo, Majumdar, Aditya, Yadav, Amulya
Code-mixing (CM), or code-switching (CSW), refers to the juxtaposition of linguistic units from two or more languages within a conversation, or sometimes even within a single utterance. Code-mixing introduces unique challenges in daily communication, such as syntactic mismatches and semantic blending, that are rarely encountered in monolingual settings. Large language models (LLMs) have revolutionized the field of natural language processing (NLP) by offering unprecedented capabilities in understanding human languages. However, the effectiveness of current state-of-the-art multilingual LLMs has not yet been fully explored in the CM scenario. To fill this gap, we first benchmark the performance of multilingual LLMs on various code-mixing NLP tasks. We then propose to improve the multilingual LLMs' ability to understand code-mixing through reinforcement learning from human feedback (RLHF) and code-mixed machine translation tasks. Given the high cost and time-consuming nature of the preference-labeling procedure, we instead use LLMs as annotators to perform reinforcement learning from AI feedback (RLAIF). Experiments demonstrate the effectiveness of the proposed method.
Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation
Liu, Peidong, Zhang, Wenbo, Zhe, Xue, Lv, Jiancheng, Liu, Xianggen
Drug discovery entails a comprehensive understanding of the molecular underpinnings of disease pathophysiology, followed by the identification and synthesis of chemical entities or biopharmaceuticals capable of selectively modulating the pertinent biological pathways (Sneader, 2005). Among the numerous traditional methods, screening from natural products and serendipitous discovery are the most renowned. The discoveries of penicillin, an antibiotic, and artemisinin (White, 1997), an antimalarial, relied on the former method, while the repurposing of sildenafil (Eardley et al., 2002) for the treatment of erectile dysfunction owed to the latter approach. Subsequently, new biology-based and computer-assisted methods have achieved encouraging results (Mandal et al., 2009; Rognan, 2007; Batool et al., 2019). For instance, rational drug design lowers the overall cost by targeting known protein pockets, and high-throughput screening (Mayr and Bojanic, 2009) enables faster identification of molecules with potential drug activity.
Rethinking Softmax: Self-Attention with Polynomial Activations
Saratchandran, Hemanth, Zheng, Jianqiao, Ji, Yiping, Zhang, Wenbo, Lucey, Simon
This paper challenges the conventional belief that softmax attention in transformers is effective primarily because it generates a probability distribution for attention allocation. Instead, we theoretically show that its success lies in its ability to implicitly regularize the Frobenius norm of the attention matrix during training. We then explore alternative activations that regularize the Frobenius norm of the attention matrix, demonstrating that certain polynomial activations can achieve this effect, making them suitable for attention-based architectures. Empirical results indicate that these activations perform comparably to, or better than, softmax across various computer vision and language tasks, suggesting new possibilities for attention mechanisms beyond softmax. A key component of the transformer architecture is the softmax attention block, which enables transformers to evaluate the importance of individual input elements during output generation. This feature provides an efficient way to attend to diverse input elements throughout training, allowing transformers to effectively capture spatial dependencies within sequential data.
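As a minimal sketch of the idea (not the paper's exact formulation), polynomial attention replaces the row-wise softmax with an elementwise polynomial such as a scaled cubic; the `scale` constant below is a placeholder for the norm-controlling scalings the paper derives:

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention with row-wise softmax."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def poly_attention(Q, K, V, scale):
    """Attention with an elementwise cubic activation instead of softmax.
    `scale` is a hypothetical normalization constant standing in for the
    Frobenius-norm-controlling scalings derived in the paper."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = scale * scores**3  # rows need not lie on the probability simplex
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out_soft = softmax_attention(Q, K, V)
out_poly = poly_attention(Q, K, V, scale=1.0 / 8)
print(out_soft.shape, out_poly.shape)  # (4, 8) (4, 8)
```

The cubic version drops the probability-distribution interpretation entirely, which is precisely the point: the useful ingredient, per the paper's argument, is the implicit control of the attention matrix's norm rather than the simplex constraint.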
The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News
Wang, Xinyu, Zhang, Wenbo, Koneru, Sai, Guo, Hangzhi, Mingole, Bonam, Sundar, S. Shyam, Rajtmajer, Sarah, Yadav, Amulya
With the rise of AI-generated content spewed at scale from large language models (LLMs), genuine concerns about the spread of fake news have intensified. The perceived ability of LLMs to produce convincing fake news at scale poses new challenges for both human and automated fake news detection systems. To address this gap, this work presents the findings from a university-level competition which aimed to explore how LLMs can be used by humans to create fake news, and to assess the ability of human annotators and AI models to detect it. A total of 110 participants used LLMs to create 252 unique fake news stories, and 84 annotators participated in the detection tasks. Our findings indicate that LLMs are ~68% more effective at detecting real news than humans. However, for fake news detection, the performance of LLMs and humans remains comparable (~60% accuracy). Additionally, we examine the impact of visual elements (e.g., pictures) in news on the accuracy of detecting fake news stories. Finally, we also examine various strategies used by fake news creators to enhance the credibility of their AI-generated content. This work highlights the increasing complexity of detecting AI-generated fake news, particularly in collaborative human-AI settings.
Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey
Wang, Xinyu, Zhang, Wenbo, Rajtmajer, Sarah
In today's global digital landscape, misinformation transcends linguistic boundaries, posing a significant challenge for moderation systems. While significant advances have been made in misinformation detection, the focus remains largely on monolingual high-resource contexts, with low-resource languages often overlooked. This survey aims to bridge that gap by providing a comprehensive overview of the current research on low-resource language misinformation detection in both monolingual and multilingual settings. We review the existing datasets, methodologies, and tools used in these domains, identifying key challenges related to: data resources, model development, cultural and linguistic context, real-world applications, and research efforts. We also examine emerging approaches, such as language-agnostic models and multi-modal techniques, while emphasizing the need for improved data collection practices, interdisciplinary collaboration, and stronger incentives for socially responsible AI research. Our findings underscore the need for robust, inclusive systems capable of addressing misinformation across diverse linguistic and cultural contexts.