Law


A Critical Review of Predominant Bias in Neural Networks

arXiv.org Artificial Intelligence

Bias issues in neural networks have garnered significant attention alongside the networks' promising advancement. Among various bias issues, mitigating two predominant biases is crucial for advancing fair and trustworthy AI: (1) ensuring that neural networks yield even performance across demographic groups, and (2) ensuring that algorithmic decision-making does not rely on protected attributes. However, after investigating papers in the relevant literature, we find a persistent, extensive, but under-explored confusion regarding these two types of bias. This confusion has already significantly hampered clarity in the community and the subsequent development of debiasing methodologies. In this work, we therefore aim to restore clarity by providing mathematical definitions for these two predominant biases and using these definitions to unify a comprehensive list of papers. Next, we highlight common phenomena and possible reasons behind the existing confusion. To alleviate it, we provide extensive experiments on synthetic, census, and image datasets to validate the distinct nature of these biases, distinguish their different real-world manifestations, and evaluate how effectively a comprehensive list of bias assessment metrics assesses the mitigation of these biases. Further, we compare the two types of bias along multiple dimensions, including underlying causes, debiasing methods, evaluation protocols, prevalent datasets, and future directions. Finally, we provide several suggestions to guide researchers engaged in bias-related work to avoid confusion and further enhance clarity in the community.
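To make the distinction between the two biases concrete, here is a minimal sketch that measures each with a common proxy metric. These are assumptions, not the paper's own definitions: bias (1) is approximated by the largest accuracy gap between groups, and bias (2) by the demographic parity gap (largest difference in positive-prediction rate). The helper names and the toy data are hypothetical.

```python
from collections import defaultdict

def per_group_mean(values, groups):
    """Average of 0/1 values within each demographic group."""
    sums, counts = defaultdict(int), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in counts}

def max_gap(rates):
    """Largest difference between any two groups."""
    return max(rates.values()) - min(rates.values())

def accuracy_gap(y_true, y_pred, groups):
    # Proxy for bias (1): uneven performance across groups.
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    return max_gap(per_group_mean(correct, groups))

def demographic_parity_gap(y_pred, groups):
    # Proxy for bias (2): predictions depending on group membership.
    return max_gap(per_group_mean(y_pred, groups))

# Hypothetical data: both groups share the same base rate, and the model is
# equally accurate on each, yet it predicts "positive" far more often for A.
groups = ["A"] * 4 + ["B"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

print(accuracy_gap(y_true, y_pred, groups))   # 0.0
print(demographic_parity_gap(y_pred, groups))  # 0.5
```

The example shows one metric reporting no bias while the other reports a large one. This is the kind of divergence that makes it risky to use a single metric for both notions.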


Rewrite to Jailbreak: Discover Learnable and Transferable Implicit Harmfulness Instruction

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) are widely applied across domains, their safety is attracting increasing attention as a way to prevent misuse of their powerful capabilities. Existing jailbreak methods either create a forced instruction-following scenario or search, manually or automatically, for adversarial prompts with prefix or suffix tokens that achieve a specific representation. However, they suffer from low efficiency and explicit jailbreak patterns, which makes them far from practical for mass attacks on deployed LLMs. In this paper, we point out that simply rewriting the original instruction can achieve a jailbreak, and we find that this rewriting approach is learnable and transferable. We propose Rewrite to Jailbreak (R2J), a transferable black-box jailbreak method that attacks LLMs by iteratively exploring their weaknesses and automatically improving the attack strategy. The jailbreak is more efficient and harder to identify because no additional features are introduced. Extensive experiments and analysis demonstrate the effectiveness of R2J, and we find that the jailbreak also transfers to multiple datasets and various types of models with only a few queries. We hope our work motivates further investigation of LLM safety.


Improving Similar Case Retrieval Ranking Performance By Revisiting RankSVM

arXiv.org Artificial Intelligence

With the rapid development of legal AI, much attention has been paid to similar case retrieval, one of the most important legal AI tasks, especially using language models. In this paper, however, we try to improve the ranking performance of current models from the perspective of learning to rank rather than language models. Specifically, we conduct experiments on the similar case retrieval datasets LeCaRDv1 and LeCaRDv2, using a pairwise method, RankSVM, as the classifier in place of a fully connected layer, combined with commonly used language models. We conclude that RankSVM can generally improve retrieval performance on LeCaRDv1 and LeCaRDv2 compared with the original classifiers by optimizing the precise ranking. It can also help mitigate overfitting caused by class imbalance. Our code is available at https://github.com/liuyuqi123study/RankSVM_for_SLR
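The pairwise idea behind RankSVM can be sketched briefly. Graded relevance labels for candidates of the same query are turned into difference vectors labeled +1 or -1, and a linear SVM is fit on those differences, so that ranking reduces to sorting by the learned linear score. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The toy 2-d features stand in for language-model representations, and training uses plain stochastic subgradient descent on the regularized hinge loss. All function names and hyperparameters are hypothetical.

```python
import random

def pairwise_transform(features, relevance, query_ids):
    """Turn graded relevance into signed difference vectors, one pair per preference."""
    pairs = []
    for i in range(len(features)):
        for j in range(len(features)):
            # Only compare candidates retrieved for the same query case.
            if query_ids[i] == query_ids[j] and relevance[i] > relevance[j]:
                diff = [a - b for a, b in zip(features[i], features[j])]
                pairs.append((diff, 1))                 # i should outrank j
                pairs.append(([-d for d in diff], -1))  # mirrored pair keeps classes balanced
    return pairs

def train_rank_svm(pairs, dim, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Linear SVM on pair differences, fit by stochastic subgradient descent
    on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    pairs = list(pairs)  # copy so shuffling does not mutate the caller's list
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x, y in pairs:
            margin = y * sum(wk * xk for wk, xk in zip(w, x))
            for k in range(dim):
                hinge_grad = -y * x[k] if margin < 1 else 0.0
                w[k] -= lr * (lam * w[k] + hinge_grad)
    return w

def score(w, x):
    """Ranking score of one candidate: higher means more similar to the query case."""
    return sum(wk * xk for wk, xk in zip(w, x))

# Hypothetical features for three candidates of one query, relevance 2 > 1 > 0.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
relevance = [2, 1, 0]
query_ids = ["q1", "q1", "q1"]

w = train_rank_svm(pairwise_transform(features, relevance, query_ids), dim=2)
ranking = sorted(range(len(features)), key=lambda i: score(w, features[i]), reverse=True)
print(ranking)  # [0, 1, 2]
```

The classifier only ever sees relative comparisons within a query, never absolute labels. This is one plausible reason a pairwise objective can be less sensitive to class imbalance than a pointwise fully connected classifier.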


ReLearn: Unlearning via Learning for Large Language Models

arXiv.org Artificial Intelligence

Current unlearning methods for large language models usually rely on reverse optimization to reduce target token probabilities. However, this paradigm disrupts the prediction of subsequent tokens, degrading model performance and linguistic coherence. Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. This framework introduces the Knowledge Forgetting Rate (KFR) and the Knowledge Retention Rate (KRR) to measure knowledge-level forgetting and preservation, and the Linguistic Score (LS) to evaluate generation quality. Our experiments show that ReLearn achieves targeted forgetting while preserving high-quality output. Through mechanistic analysis, we further demonstrate how reverse optimization disrupts coherent text generation, while ReLearn preserves this essential capability. Code is available at https://github.com/zjunlp/unlearn.


From Deception to Perception: The Surprising Benefits of Deepfakes for Detecting, Measuring, and Mitigating Bias

arXiv.org Artificial Intelligence

Individuals from minority groups, even with equivalent qualifications, consistently receive fewer opportunities in critical areas such as employment, education, and healthcare. Yet, empirically demonstrating the existence of such pervasive bias, let alone measuring the extent of bias or correcting it, remains a significant challenge. Over several decades, researchers have utilized a range of experimental methodologies to test for biases in real-life situations (Bertrand and Duflo 2017). Audit studies, among the earliest of such methods, match two individuals who are similar in all respects except for sensitive characteristics like race, to test decision-makers' biases (Ayres and Siegelman 1995). A significant limitation of this method, however, is the inherent impossibility of achieving an exact match between two individuals, precluding perfect comparability (Heckman 1998). Correspondence studies have emerged as a predominant experimental approach for measuring biases (Guryan and Charles 2013, Bertrand and Mullainathan 2004). They create identical fictional profiles with manipulated attributes like race to assess differential treatment. However, these studies traditionally manipulate solely textual information, which may not reflect contemporary decision-making scenarios increasingly influenced by visual cues like facial images, as seen in recent hiring processes (Acquisti and Fong 2020, Ruffle and Shtudiner 2015). This reliance on text limits their effectiveness, as modern contexts often involve multimedia elements, making it challenging to measure real-world biases accurately or correct them based on such incomplete information (Armbruster et al. 2015).


Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View

arXiv.org Artificial Intelligence

Large language models (LLMs) offer powerful capabilities but come with significant environmental costs, particularly in carbon emissions. Existing studies benchmark these emissions but lack a standardized basis for comparison across models. To address this, we introduce the concept of a functional unit (FU) and develop FUEL, the first FU-based framework for evaluating the environmental impact of LLM serving. Through case studies on model size, quantization, and hardware, we uncover key sustainability trade-offs. Our findings highlight the potential to reduce carbon emissions by optimizing model selection, deployment strategies, and hardware choices, paving the way for more sustainable AI infrastructure.
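A functional unit normalizes emissions by the useful output delivered, not by raw runtime or energy. The sketch below illustrates that normalization under explicit assumptions that are not taken from FUEL. One FU is assumed to be one request served within a latency target. Total carbon is assumed to be operational carbon (energy times grid carbon intensity) plus an already-amortized embodied share. The class, field names, and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ServingRun:
    energy_kwh: float          # measured energy of the serving run (hypothetical)
    carbon_intensity: float    # grid intensity in gCO2e per kWh (hypothetical)
    embodied_gco2e: float      # hardware embodied carbon already amortized to this run
    latencies_s: List[float]   # per-request latency in seconds

def carbon_per_fu(run: ServingRun, slo_s: float) -> float:
    """gCO2e per functional unit, assuming one FU = one request served within slo_s."""
    fus = sum(1 for lat in run.latencies_s if lat <= slo_s)
    if fus == 0:
        return float("inf")  # no useful output delivered
    total_gco2e = run.energy_kwh * run.carbon_intensity + run.embodied_gco2e
    return total_gco2e / fus

# Two hypothetical configurations: B uses half the energy of A but misses the SLO more often.
run_a = ServingRun(2.0, 400.0, 200.0, [0.5, 0.8, 1.2, 3.0, 0.9])
run_b = ServingRun(1.0, 400.0, 100.0, [2.5, 0.7, 3.1, 4.0])

print(carbon_per_fu(run_a, slo_s=2.0))  # 250.0
print(carbon_per_fu(run_b, slo_s=2.0))  # 500.0
```

With this normalization, the configuration with lower total emissions comes out worse per unit of useful work. A standardized comparison basis is meant to surface exactly this kind of result.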


FairFare: A Tool for Crowdsourcing Rideshare Data to Empower Labor Organizers

arXiv.org Artificial Intelligence

In recent years, labor organizers representing rideshare and delivery workers have advocated for regulations that set wage floors and job loss protections to improve working conditions in the rideshare industry [67]. To call for these improvements, organizers need to understand workers' existing conditions [37], which poses a significant data-access and social computing challenge in the rideshare industry. Labor organizers representing rideshare workers typically rely on a collage of qualitative anecdotes and screenshots to provide data about existing working conditions [24]. While these qualitative data provide rich, "thick descriptions" [30] of workers' experience, platforms often dismiss them as non-representative, cherry-picked examples. Rideshare platforms, on the other hand, have exclusive access to large-scale, comprehensive quantitative datasets of driver, trip, and pay data that they can draw on to create authoritative narratives about working conditions in their industry [72]. Labor organizers need comprehensive access to large-scale quantitative data describing working conditions to conduct rigorous, independent investigations and contest platform-driven narratives. Tools and legal frameworks exist that empower individual rideshare workers to independently access quantitative work data (e.g., Gridwise and Data Subject Access Requests). However, these tools and frameworks do not provide an intuitive way to aggregate individual worker data into a dataset that offers collective insight into overarching working conditions. Algorithmic auditing scholarship provides methods, such as crowdsourcing data, to independently investigate black-boxed systems [66].
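Aggregating individual worker records into collective insight can be illustrated with one summary statistic. The sketch assumes each crowdsourced trip record contains the fare the rider paid and the pay the driver received. It then computes the platform's "take rate" (the share of the fare not paid to the driver) and summarizes it across all contributing workers. The `Trip` schema, field names, and sample values are hypothetical and are not FairFare's actual data model.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Trip:
    worker_id: str             # pseudonymous contributor id (hypothetical field)
    fare_paid_by_rider: float  # what the passenger paid (hypothetical field)
    pay_to_driver: float       # what the driver received (hypothetical field)

def take_rate(trip: Trip) -> float:
    """Fraction of the rider's fare retained by the platform."""
    return (trip.fare_paid_by_rider - trip.pay_to_driver) / trip.fare_paid_by_rider

def collective_summary(trips):
    """Pool trips from many workers into one dataset-level view."""
    rates = [take_rate(t) for t in trips if t.fare_paid_by_rider > 0]
    return {
        "workers": len({t.worker_id for t in trips}),
        "trips": len(rates),
        "median_take_rate": median(rates),
    }

# Hypothetical records contributed by three workers.
trips = [
    Trip("w1", 20.0, 15.0), Trip("w1", 10.0, 6.0),
    Trip("w2", 40.0, 30.0),
    Trip("w3", 8.0, 4.0), Trip("w3", 25.0, 20.0),
]
print(collective_summary(trips))
# {'workers': 3, 'trips': 5, 'median_take_rate': 0.25}
```

No single worker's screenshots reveal this median, but the pooled records do. This is the gap between individual data access and collective insight described above.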


"Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents

arXiv.org Artificial Intelligence

Large language models (LLMs) are evolving into autonomous decision-makers, raising concerns about catastrophic risks in high-stakes scenarios, particularly in Chemical, Biological, Radiological and Nuclear (CBRN) domains. Based on the insight that such risks can originate from trade-offs between the agent's Helpful, Harmless, and Honest (HHH) goals, we build a novel three-stage evaluation framework, carefully constructed to expose such risks effectively and naturally. We conduct 14,400 agentic simulations across 12 advanced LLMs, with extensive experiments and analysis. Results reveal that LLM agents can autonomously engage in catastrophic behaviors and deception without being deliberately induced. Furthermore, stronger reasoning abilities often increase, rather than mitigate, these risks. We also show that these agents can violate instructions and superior commands. On the whole, we empirically prove the existence of catastrophic

Figure 1: We find that LLM agents can deploy catastrophic behaviors even when they have no authority and their permission request is denied. They will also falsely accuse a third party as a form of deception when asked by their superior.


Robust High-Dimensional Mean Estimation With Low Data Size, an Empirical Study

arXiv.org Machine Learning

Robust statistics aims to compute quantities that represent data of which a fraction may be arbitrarily corrupted. The most essential statistic is the mean, and in recent years there has been a flurry of theoretical advances in efficiently estimating the mean of corrupted data in high dimensions. While several proposed algorithms achieve near-optimal error, they all require large data sizes as a function of the dimension. In this paper, we perform extensive experiments with various mean estimation techniques in the high-dimensional setting, where the data size may not meet this requirement. For data whose inliers are generated from a Gaussian with known covariance, we find experimentally that several robust mean estimation techniques can improve upon the sample mean in practice, with the quantum entropy scaling approach from Dong et al.
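The basic experimental setup (Gaussian inliers with identity covariance, a fraction of arbitrarily corrupted points, and a data size small relative to the dimension) can be sketched in a few lines. This sketch does not implement quantum entropy scaling or any near-optimal estimator. It compares the sample mean with two simple coordinate-wise baselines, the median and the trimmed mean, which are stand-ins chosen for brevity. In general their error grows with the dimension faster than that of the filtering-style methods. The dimension, sample size, corruption level, and outlier placement are hypothetical choices.

```python
import math
import random
from statistics import median

def sample_mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def coordinate_median(points):
    return [median(col) for col in zip(*points)]

def trimmed_mean(points, trim_frac):
    """Drop the k smallest and k largest values in each coordinate, then average."""
    k = int(trim_frac * len(points))
    out = []
    for col in zip(*points):
        kept = sorted(col)[k:len(col) - k]
        out.append(sum(kept) / len(kept))
    return out

def l2_error(est, truth):
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)))

# Hypothetical setup: d = 20, n = 50, 10% of points replaced by a far-away cluster.
rng = random.Random(0)
d, n, eps = 20, 50, 0.1
true_mean = [0.0] * d
n_bad = int(eps * n)
inliers = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n - n_bad)]
outliers = [[10.0] * d for _ in range(n_bad)]
data = inliers + outliers

errors = {
    "sample mean": l2_error(sample_mean(data), true_mean),
    "coordinate-wise median": l2_error(coordinate_median(data), true_mean),
    "trimmed mean (10%)": l2_error(trimmed_mean(data, eps), true_mean),
}
for name, err in errors.items():
    print(f"{name}: {err:.3f}")
```

Even these crude baselines typically beat the sample mean here, because the outliers shift every coordinate of the sample mean by about eps times their offset. Measuring how far more sophisticated estimators improve on this in the low-data regime is the question the empirical study addresses.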


Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable success in various tasks such as natural language understanding, text summarization, and machine translation. However, their general-purpose nature often limits their effectiveness in domain-specific applications that require specialized knowledge, such as healthcare, chemistry, or legal analysis. To address this, researchers have explored diverse methods to enhance LLMs by integrating domain-specific knowledge. In this survey, we provide a comprehensive overview of these methods, which we categorize into four key approaches: dynamic knowledge injection, static knowledge embedding, modular adapters, and prompt optimization. Each approach offers unique mechanisms to equip LLMs with domain expertise, balancing trade-offs among flexibility, scalability, and efficiency. We discuss how these methods enable LLMs to tackle specialized tasks, compare their advantages and disadvantages, evaluate domain-specific LLMs against general LLMs, and highlight the challenges and opportunities in this emerging field. For those interested in delving deeper into this area, we also summarize the commonly used datasets and benchmarks. To keep researchers updated on the latest studies, we maintain an open-source repository at https://github.com/abilliyb/Knowledge_Injection_Survey_Papers, dedicated to documenting research on specialized LLMs.
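Of the four categories, dynamic knowledge injection is the easiest to illustrate without a model. In a common form, external domain knowledge is retrieved at inference time and placed in the prompt, so the model's weights stay unchanged. The sketch below assumes that retrieval-augmented reading. It uses a toy word-overlap retriever instead of a real embedding index, and the corpus, function names, and prompt template are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercased bag of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    q = tokenize(query)
    return sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)[:k]

def build_prompt(query, corpus, k=1):
    """Inject retrieved domain knowledge ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus, k))
    return f"Use the following domain knowledge:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical mini knowledge base spanning healthcare, chemistry, and law.
corpus = [
    "Aspirin inhibits cyclooxygenase enzymes.",
    "Benzene is an aromatic hydrocarbon.",
    "A tort is a civil wrong that causes harm.",
]
print(build_prompt("What enzymes does aspirin inhibit?", corpus))
```

Updating the knowledge base changes the model's available knowledge immediately, which reflects the flexibility side of the trade-offs discussed above. The cost is a retrieval step and a longer prompt on every query.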