AITopics | Ahmed, Farhan

Collaborating Authors

Ahmed, Farhan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Djuhera, Aladin, Kadhe, Swanand Ravindra, Ahmed, Farhan, Zawad, Syed, Boche, Holger

arXiv.org Artificial IntelligenceMar-21-2025

Fine-tuning large language models (LLMs) on downstream tasks can inadvertently erode their safety alignment, even for benign fine-tuning datasets. It achieves this by selectively merging fine-tuned and safety-aligned model layers only when those deviate from safe behavior, measured by a cosine similarity criterion. We evaluate SafeMERGE against other fine-tuning-and post-fine-tuning-stage approaches for Llama-2-7B-Chat and Qwen-2-7B-Instruct models on GSM8K and PubMedQA tasks while exploring different merging strategies. We find that SafeMERGE consistently reduces harmful outputs compared to other baselines without significantly sacrificing performance, sometimes even enhancing it. The results suggest that our selective, subspace-guided, and per-layer merging method provides an effective safeguard against the inadvertent loss of safety in fine-tuned LLMs while outperforming simpler post-fine-tuning-stage defenses. Large language models (LLMs) have demonstrated remarkable capabilities in text generation and understanding while becoming increasingly accessible to AI practitioners. Safety tuning is critical to ensure that advanced LLMs align with human values and security policies, making them safe for deployment (Ouyang et al., 2022; Bai et al., 2022; Chiang et al., 2023; Zhang et al., 2024). However, the safety alignment of current LLMs has been shown to be vulnerable (Wei et al., 2023; Huang et al., 2024e; Yang et al., 2023; Zeng et al., 2024; Zhan et al., 2024; Qi et al., 2023; 2024a).

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.17239

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GneissWeb: Preparing High Quality Data for LLMs at Scale

Gohari, Hajar Emami, Kadhe, Swanand Ravindra, Adam, Syed Yousaf Shah. Constantin, Adebayo, Abdulhamid, Adusumilli, Praneet, Ahmed, Farhan, Angel, Nathalie Baracaldo, Borse, Santosh, Chang, Yuan-Chi, Dang, Xuan-Hong, Desai, Nirmit, Eres, Ravital, Iwamoto, Ran, Karve, Alexei, Koyfman, Yan, Lee, Wei-Han, Liu, Changchang, Lublinsky, Boris, Ohko, Takuyo, Pesce, Pablo, Touma, Maroun, Wang, Shiqiang, Witherspoon, Shalisha, Woisetschlager, Herbert, Wood, David, Wu, Kun-Lung, Yoshida, Issei, Zawad, Syed, Zerfos, Petros, Zhou, Yi, Bhattacharjee, Bishwaranjan

arXiv.org Artificial IntelligenceFeb-18-2025

Data quantity and quality play a vital role in determining the performance of Large Language Models (LLMs). High-quality data, in particular, can significantly boost the LLM's ability to generalize on a wide range of downstream tasks. Large pre-training datasets for leading LLMs remain inaccessible to the public, whereas many open datasets are small in size (less than 5 trillion tokens), limiting their suitability for training large models. In this paper, we introduce GneissWeb, a large dataset yielding around 10 trillion tokens that caters to the data quality and quantity requirements of training LLMs. Our GneissWeb recipe that produced the dataset consists of sharded exact sub-string deduplication and a judiciously constructed ensemble of quality filters. GneissWeb achieves a favorable trade-off between data quality and quantity, producing models that outperform models trained on state-of-the-art open large datasets (5+ trillion tokens). We show that models trained using GneissWeb dataset outperform those trained on FineWeb-V1.1.0 by 2.73 percentage points in terms of average score computed on a set of 11 commonly used benchmarks (both zero-shot and few-shot) for pre-training dataset evaluation. When the evaluation set is extended to 20 benchmarks (both zero-shot and few-shot), models trained using GneissWeb still achieve a 1.75 percentage points advantage over those trained on FineWeb-V1.1.0.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.14907

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report (0.50)

Industry:

Education (1.00)
Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Taking off the Rose-Tinted Glasses: A Critical Look at Adversarial ML Through the Lens of Evasion Attacks

Eykholt, Kevin, Ahmed, Farhan, Vaishnavi, Pratik, Rahmati, Amir

arXiv.org Artificial IntelligenceOct-15-2024

The vulnerability of machine learning models in adversarial scenarios has garnered significant interest in the academic community over the past decade, resulting in a myriad of attacks and defenses. However, while the community appears to be overtly successful in devising new attacks across new contexts, the development of defenses has stalled. After a decade of research, we appear no closer to securing AI applications beyond additional training. Despite a lack of effective mitigations, AI development and its incorporation into existing systems charge full speed ahead with the rise of generative AI and large language models. Will our ineffectiveness in developing solutions to adversarial threats further extend to these new technologies? In this paper, we argue that overly permissive attack and overly restrictive defensive threat models have hampered defense development in the ML domain. Through the lens of adversarial evasion attacks against neural networks, we critically examine common attack assumptions, such as the ability to bypass any defense not explicitly built into the model. We argue that these flawed assumptions, seen as reasonable by the community based on paper acceptance, have encouraged the development of adversarial attacks that map poorly to real-world scenarios. In turn, new defenses evaluated against these very attacks are inadvertently required to be almost perfect and incorporated as part of the model. But do they need to? In practice, machine learning models are deployed as a small component of a larger system. We analyze adversarial machine learning from a system security perspective rather than an AI perspective and its implications for emerging AI paradigms.

artificial intelligence, attacker, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.12076

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)
Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Turning Generative Models Degenerate: The Power of Data Poisoning Attacks

Jiang, Shuli, Kadhe, Swanand Ravindra, Zhou, Yi, Ahmed, Farhan, Cai, Ling, Baracaldo, Nathalie

arXiv.org Artificial IntelligenceJul-18-2024

The increasing use of large language models (LLMs) trained by third parties raises significant security concerns. In particular, malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs. While such attacks have been extensively studied in image domains and classification tasks, they remain underexplored for natural language generation (NLG) tasks. To address this gap, we conduct an investigation of various poisoning techniques targeting the LLM's fine-tuning phase via prefix-tuning, a Parameter Efficient Fine-Tuning (PEFT) method. We assess their effectiveness across two generative tasks: text summarization and text completion; and we also introduce new metrics to quantify the success and stealthiness of such NLG poisoning attacks. Through our experiments, we find that the prefix-tuning hyperparameters and trigger designs are the most crucial factors to influence attack success and stealthiness. Moreover, we demonstrate that existing popular defenses are ineffective against our poisoning attacks. Our study presents the first systematic approach to understanding poisoning attacks targeting NLG tasks during fine-tuning via PEFT across a wide range of triggers and attack settings. We hope our findings will aid the AI security community in developing effective defenses against such threats.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2407.12281

Country:

Europe (1.00)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.94)
Government > Military (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs

Kadhe, Swanand Ravindra, Ahmed, Farhan, Wei, Dennis, Baracaldo, Nathalie, Padhi, Inkit

arXiv.org Artificial IntelligenceJun-17-2024

Large language models (LLMs) have shown to pose social and ethical risks such as generating toxic language or facilitating malicious use of hazardous knowledge. Machine unlearning is a promising approach to improve LLM safety by directly removing harmful behaviors and knowledge. In this paper, we propose "SPlit, UNlearn, MerGE" (SPUNGE), a framework that can be used with any unlearning method to amplify its effectiveness. SPUNGE leverages data attributes during unlearning by splitting unlearning data into subsets based on specific attribute values, unlearning each subset separately, and merging the unlearned models. We empirically demonstrate that SPUNGE significantly improves the performance of two recent unlearning methods on state-of-the-art LLMs while maintaining their general capabilities on standard academic benchmarks.

large language model, natural language, toxicity, (18 more...)

arXiv.org Artificial Intelligence

2406.1178

Country:

North America > Canada (0.14)
North America > United States > Minnesota (0.14)
Europe > Italy (0.14)
Asia > China (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Psychological Metrics for Dialog System Evaluation

Giorgi, Salvatore, Havaldar, Shreya, Ahmed, Farhan, Akhtar, Zuhaib, Vaidya, Shalaka, Pan, Gary, Ungar, Lyle H., Schwartz, H. Andrew, Sedoc, Joao

arXiv.org Artificial IntelligenceSep-15-2023

We present metrics for evaluating dialog systems through a psychologically-grounded "human" lens in which conversational agents express a diversity of both states (e.g., emotion) and traits (e.g., personality), just as people do. We present five interpretable metrics from established psychology that are fundamental to human communication and relationships: emotional entropy, linguistic style and emotion matching, agreeableness, and empathy. These metrics can be applied (1) across dialogs and (2) on turns within dialogs. The psychological metrics are compared against seven state-of-the-art traditional metrics (e.g., BARTScore and BLEURT) on seven standard dialog system data sets. We also introduce a novel data set, the Three Bot Dialog Evaluation Corpus, which consists of annotated conversations from ChatGPT, GPT-3, and BlenderBot. We demonstrate that our proposed metrics offer novel information; they are uncorrelated with traditional metrics, can be used to meaningfully compare dialog systems, and lead to increased accuracy (beyond existing traditional metrics) in predicting crowd-sourced dialog judgements. The interpretability and unique signal of our psychological metrics make them a valuable tool for evaluating and improving dialog systems.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.14757

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.71)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback