AITopics | detection mechanism

Collaborating Authors

detection mechanism

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection

Chen, Yulin, Wang, Zeyuan, Yu, Tianyuan, Wei, Yingmei, Bai, Liang

arXiv.org Artificial IntelligenceNov-11-2025

The well-aligned attribute of CLIP-based models enables its effective application like CLIPscore as a widely adopted image quality assessment metric. However, such a CLIP-based metric is vulnerable for its delicate multimodal alignment. In this work, we propose \textbf{FoCLIP}, a feature-space misalignment framework for fooling CLIP-based image quality metric. Based on the stochastic gradient descent technique, FoCLIP integrates three key components to construct fooling examples: feature alignment as the core module to reduce image-text modality gaps, the score distribution balance module and pixel-guard regularization, which collectively optimize multimodal output equilibrium between CLIPscore performance and image quality. Such a design can be engineered to maximize the CLIPscore predictions across diverse input prompts, despite exhibiting either visual unrecognizability or semantic incongruence with the corresponding adversarial prompts from human perceptual perspectives. Experiments on ten artistic masterpiece prompts and ImageNet subsets demonstrate that optimized images can achieve significant improvement in CLIPscore while preserving high visual fidelity. In addition, we found that grayscale conversion induces significant feature degradation in fooling images, exhibiting noticeable CLIPscore reduction while preserving statistical consistency with original images. Inspired by this phenomenon, we propose a color channel sensitivity-driven tampering detection mechanism that achieves 91% accuracy on standard benchmarks. In conclusion, this work establishes a practical pathway for feature misalignment in CLIP-based multimodal systems and the corresponding defense method.

artificial intelligence, clipscore, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2511.06947

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Reviewer

Neural Information Processing SystemsAug-20-2025, 02:22:47 GMT

We greatly appreciate that both R1 and R2 consider our paper to be well-written/clearly presented. On CIFAR-10, the boundary attack achieves a final average MSE of 0.009 (which FPR = 0.1, which is much higher than what we've obtained on white-box attacks (Table 2). FPR setting we used in our experiments. ImageNet dataset, which few works have experimented or succeeded on (including Madry's, which only evaluates on We hope that this aspect of our study can be appreciated. While the two properties are indeed known, our observations and analysis (cf.

boundary, decision boundary, white-box attack, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.35)

Add feedback

Fine-Grained Bias Detection in LLM: Enhancing detection mechanisms for nuanced biases

Mohanty, Suvendu

arXiv.org Artificial IntelligenceMar-7-2025

Recent advancements in Artificial Intelligence, particularly in Large Language Models (LLMs), have transformed natural language processing by improving generative capabilities. However, detecting biases embedded within these models remains a challenge. Subtle biases can propagate misinformation, influence decision-making, and reinforce stereotypes, raising ethical concerns. This study presents a detection framework to identify nuanced biases in LLMs. The approach integrates contextual analysis, interpretability via attention mechanisms, and counterfactual data augmentation to capture hidden biases across linguistic contexts. The methodology employs contrastive prompts and synthetic datasets to analyze model behaviour across cultural, ideological, and demographic scenarios. Quantitative analysis using benchmark datasets and qualitative assessments through expert reviews validate the effectiveness of the framework. Results show improvements in detecting subtle biases compared to conventional methods, which often fail to highlight disparities in model responses to race, gender, and socio-political contexts. The framework also identifies biases arising from imbalances in training data and model architectures. Continuous user feedback ensures adaptability and refinement. This research underscores the importance of proactive bias mitigation strategies and calls for collaboration between policymakers, AI developers, and regulators. The proposed detection mechanisms enhance model transparency and support responsible LLM deployment in sensitive applications such as education, legal systems, and healthcare. Future work will focus on real-time bias monitoring and cross-linguistic generalization to improve fairness and inclusivity in AI-driven communication tools.

detection, llm, nuanced bias, (13 more...)

arXiv.org Artificial Intelligence

2503.06054

Country: North America > United States > Virginia (0.04)

Genre: Research Report (0.74)

Industry: Law (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

AdaPhish: AI-Powered Adaptive Defense and Education Resource Against Deceptive Emails

Meguro, Rei, Chong, Ng S. T.

arXiv.org Artificial IntelligenceFeb-10-2025

Phishing attacks remain a significant threat in the digital age, yet organizations lack effective methods to tackle phishing attacks without leaking sensitive information. Phish bowl initiatives are a vital part of cybersecurity efforts against these attacks. However, traditional phish bowls require manual anonymization and are often limited to internal use. To overcome these limitations, we introduce AdaPhish, an AI-powered phish bowl platform that automatically anonymizes and analyzes phishing emails using large language models (LLMs) and vector databases. AdaPhish achieves real-time detection and adaptation to new phishing tactics while enabling long-term tracking of phishing trends. Through automated reporting, adaptive analysis, and real-time alerts, AdaPhish presents a scalable, collaborative solution for phishing detection and cybersecurity education.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICAIC63015.2025.10848878

2502.03622

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
South America > Brazil > Paraná > Curitiba (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre:

Research Report (1.00)
Instructional Material (0.76)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.55)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Safety Monitoring of Machine Learning Perception Functions: a Survey

Ferreira, Raul Sena, Guérin, Joris, Delmas, Kevin, Guiochet, Jérémie, Waeselynck, Hélène

arXiv.org Artificial IntelligenceDec-9-2024

Machine Learning (ML) models, such as deep neural networks, are widely applied in autonomous systems to perform complex perception tasks. New dependability challenges arise when ML predictions are used in safety-critical applications, like autonomous cars and surgical robots. Thus, the use of fault tolerance mechanisms, such as safety monitors, is essential to ensure the safe behavior of the system despite the occurrence of faults. This paper presents an extensive literature review on safety monitoring of perception functions using ML in a safety-critical context. In this review, we structure the existing literature to highlight key factors to consider when designing such monitors: threat identification, requirements elicitation, detection of failure, reaction, and evaluation. We also highlight the ongoing challenges associated with safety monitoring and suggest directions for future research.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.06869

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(11 more...)

Genre: Overview (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Security & Privacy (0.93)
Information Technology > Robotics & Automation (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Add feedback

FedCAP: Robust Federated Learning via Customized Aggregation and Personalization

Li, Youpeng, Wang, Xinda, Yu, Fuxun, Sun, Lichao, Zhang, Wenbin, Wang, Xuyu

arXiv.org Artificial IntelligenceOct-16-2024

Federated learning (FL), an emerging distributed machine learning paradigm, has been applied to various privacy-preserving scenarios. However, due to its distributed nature, FL faces two key issues: the non-independent and identical distribution (non-IID) of user data and vulnerability to Byzantine threats. To address these challenges, in this paper, we propose FedCAP, a robust FL framework against both data heterogeneity and Byzantine attacks. The core of FedCAP is a model update calibration mechanism to help a server capture the differences in the direction and magnitude of model updates among clients. Furthermore, we design a customized model aggregation rule that facilitates collaborative training among similar clients while accelerating the model deterioration of malicious clients. With a Euclidean norm-based anomaly detection mechanism, the server can quickly identify and permanently remove malicious clients. Moreover, the impact of data heterogeneity and Byzantine attacks can be further mitigated through personalization on the client side. We conduct extensive experiments, comparing multiple state-of-the-art baselines, to demonstrate that FedCAP performs well in several non-IID settings and shows strong robustness under a series of poisoning attacks.

data mining, machine learning, model update, (17 more...)

arXiv.org Artificial Intelligence

2410.13083

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(11 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Reviews: Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

Neural Information Processing SystemsOct-8-2024, 01:53:32 GMT

In this paper the authors examine the intuition that interpretability to be the workhorse in detecting adversarial examples of different kinds. That is, if the humanly interpretable attributes are all the same for two images, then the prediction result should only be different if some non-interpretable neurons behave differently. Other than adversarial examples, this work is also highly related to interpretability and explainability questions for DNNs. The basis of their detection mechanism (AmI) lies in determining the sets of neurons (they call attribute witnesses) that are correspond (one-to-one) to a humanly interpretable attributes (like eyeglasses). That means, if the attribute does not change, the neuron should not give a different output, and the other way around if the feature changes, the neuron should change.

attack meet interpretability, attribute-steered detection, neuron, (11 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)

Add feedback

FRIDA: Free-Rider Detection using Privacy Attacks

Recasens, Pol G., Horváth, Ádám, Gutierrez-Torre, Alberto, Torres, Jordi, Berral, Josep Ll., Pejó, Balázs

arXiv.org Artificial IntelligenceOct-7-2024

Federated learning is increasingly popular as it enables multiple parties with limited datasets and resources to train a high-performing machine learning model collaboratively. However, similarly to other collaborative systems, federated learning is vulnerable to free-riders -- participants who do not contribute to the training but still benefit from the shared model. Free-riders not only compromise the integrity of the learning process but also slow down the convergence of the global model, resulting in increased costs for the honest participants. To address this challenge, we propose FRIDA: free-rider detection using privacy attacks, a framework that leverages inference attacks to detect free-riders. Unlike traditional methods that only capture the implicit effects of free-riding, FRIDA directly infers details of the underlying training datasets, revealing characteristics that indicate free-rider behaviour. Through extensive experiments, we demonstrate that membership and property inference attacks are effective for this purpose. Our evaluation shows that FRIDA outperforms state-of-the-art methods, especially in non-IID settings.

detection, detection mechanism, mechanism, (15 more...)

arXiv.org Artificial Intelligence

2410.0502

Country:

Europe > Hungary > Budapest > Budapest (0.04)
North America > United States > Virginia (0.04)
North America > United States > Hawaii (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Topic-based Watermarks for LLM-Generated Text

Nemecek, Alexander, Jiang, Yuzhou, Ayday, Erman

arXiv.org Artificial IntelligenceApr-16-2024

Recent advancements of large language models (LLMs) have resulted in indistinguishable text outputs comparable to human-generated text. Watermarking algorithms are potential tools that offer a way to differentiate between LLM- and human-generated text by embedding detectable signatures within LLM-generated output. However, current watermarking schemes lack robustness against known attacks against watermarking algorithms. In addition, they are impractical considering an LLM generates tens of thousands of text outputs per day and the watermarking algorithm needs to memorize each output it generates for the detection to work. In this work, focusing on the limitations of current watermarking schemes, we propose the concept of a "topic-based watermarking algorithm" for LLMs. The proposed algorithm determines how to generate tokens for the watermarked LLM output based on extracted topics of an input prompt or the output of a non-watermarked LLM. Inspired from previous work, we propose using a pair of lists (that are generated based on the specified extracted topic(s)) that specify certain tokens to be included or excluded while generating the watermarked output of the LLM. Using the proposed watermarking algorithm, we show the practicality of a watermark detection algorithm. Furthermore, we discuss a wide range of attacks that can emerge against watermarking algorithms for LLMs and the benefit of the proposed watermarking scheme for the feasibility of modeling a potential attacker considering its benefit vs. loss.

llm, sequence, text sequence, (13 more...)

arXiv.org Artificial Intelligence

2404.02138

Country:

North America > United States > Virginia (0.04)
North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Multimodal Attack Detection for Action Recognition Models

Mumcu, Furkan, Yilmaz, Yasin

arXiv.org Artificial IntelligenceApr-12-2024

Adversarial machine learning attacks on video action recognition models is a growing research area and many effective attacks were introduced in recent years. These attacks show that action recognition models can be breached in many ways. Hence using these models in practice raises significant security concerns. However, there are very few works which focus on defending against or detecting attacks. In this work, we propose a novel universal detection method which is compatible with any action recognition model. In our extensive experiments, we show that our method consistently detects various attacks against different target models with high true positive rates while satisfying very low false positive rates. Tested against four state-of-the-art attacks targeting four action recognition models, the proposed detector achieves an average AUC of 0.911 over 16 test cases while the best performance achieved by the existing detectors is 0.645 average AUC. This 41.2% improvement is enabled by the robustness of the proposed detector to varying attack methods and target models. The lowest AUC achieved by our detector across the 16 test cases is 0.837 while the competing detector's performance drops as low as 0.211. We also show that the proposed detector is robust to varying attack strengths. In addition, we analyze our method's real-time performance with different hardware setups to demonstrate its potential as a practical defense mechanism.

action recognition model, adversarial attack, video, (14 more...)

arXiv.org Artificial Intelligence

2404.1079

Country: North America > United States > Florida > Hillsborough County > Tampa (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback