AITopics | frontend

Collaborating Authors

frontend

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Unsupervised Information-Theoretic Perceptual Quality Metric

Neural Information Processing SystemsApr-30-2026, 19:37:04 GMT

Tractable models of human perception have proved to be challenging to build. Hand-designed models such as MS-SSIM remain popular predictors of human image quality judgements due to their simplicity and speed. Recent modern deep learning approaches can perform better, but they rely on supervised data which can be costly to gather: large sets of class labels such as ImageNet, image quality ratings, or both. We combine recent advances in information-theoretic objective functions with a computational architecture informed by the physiology of the human visual system and unsupervised training on pairs of video frames, yielding our Perceptual Information Metric (PIM)1. We show that PIM is competitive with supervised metrics on the recent and challenging BAPPS image quality assessment dataset and outperforms them in predicting the ranking of image compression methods in CLIC 2020. We also perform qualitative experiments using the ImageNet-C dataset, and establish that PIM is robust with respect to architectural details.

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection

Neural Information Processing SystemsFeb-16-2026, 02:19:09 GMT

Audio deepfake detection (ADD) is crucial to combat the misuse of speech synthesized by generative AI models.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > Canada (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Add feedback

Multi-dimensional Data Analysis and Applications Basing on LLM Agents and Knowledge Graph Interactions

Wang, Xi, Ling, Xianyao, Li, Kun, Yin, Gang, Zhang, Liang, Wu, Jiang, Xu, Jun, Zhang, Fu, Lei, Wenbo, Wang, Annie, Gong, Peng

arXiv.org Artificial IntelligenceNov-21-2025

In the current era of big data, extracting deep insights from massive, heterogeneous, and complexly associated multi-dimensional data has become a significant challenge. Large Language Models (LLMs) perform well in natural language understanding and generation, but still suffer from "hallucination" issues when processing structured knowledge and are difficult to update in real-time. Although Knowledge Graphs (KGs) can explicitly store structured knowledge, their static nature limits dynamic interaction and analytical capabilities. Therefore, this paper proposes a multi-dimensional data analysis method based on the interactions between LLM agents and KGs, constructing a dynamic, collaborative analytical ecosystem. This method utilizes LLM agents to automatically extract product data from unstructured data, constructs and visualizes the KG in real-time, and supports users in deep exploration and analysis of graph nodes through an interactive platform. Experimental results show that this method has significant advantages in product ecosystem analysis, relationship mining, and user-driven exploratory analysis, providing new ideas and tools for multi-dimensional data analysis.

large language model, machine learning, node, (17 more...)

arXiv.org Artificial Intelligence

2510.15258

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Co-TAP: Three-Layer Agent Interaction Protocol Technical Report

An, Shunyu, Wang, Miao, Li, Yongchao, Wan, Dong, Wang, Lina, Qin, Ling, Gao, Liqin, Fan, Congyao, Mao, Zhiyong, Pu, Jiange, Xia, Wenji, Zhao, Dong, Hao, Zhaohui, Hu, Rui, Lu, Ji, Zhou, Guiyue, Tang, Baoyu, Gao, Yanqin, Du, Yongsheng, Xu, Daigang, Huang, Lingjun, Wang, Baoli, Zhang, Xiwen, Wang, Luyao, Liu, Shilong

arXiv.org Artificial IntelligenceOct-29-2025

This paper proposes Co-TAP (T: Triple, A: Agent, P: Protocol), a three-layer agent interaction protocol designed to address the challenges faced by multi-agent systems across the three core dimensions of Interoperability, Interaction and Collaboration, and Knowledge Sharing. We have designed and proposed a layered solution composed of three core protocols: the Human-Agent Interaction Protocol (HAI), the Unified Agent Protocol (UAP), and the Memory-Extraction-Knowledge Protocol (MEK). HAI focuses on the interaction layer, standardizing the flow of information between users, interfaces, and agents by defining a standardized, event-driven communication paradigm. This ensures the real-time performance, reliability, and synergy of interactions. As the core of the infrastructure layer, UAP is designed to break down communication barriers among heterogeneous agents through unified service discovery and protocol conversion mechanisms, thereby enabling seamless interconnection and interoperability of the underlying network. MEK, in turn, operates at the cognitive layer. By establishing a standardized ''Memory (M) - Extraction (E) - Knowledge (K)'' cognitive chain, it empowers agents with the ability to learn from individual experiences and form shareable knowledge, thereby laying the foundation for the realization of true collective intelligence. We believe this protocol framework will provide a solid engineering foundation and theoretical guidance for building the next generation of efficient, scalable, and intelligent multi-agent applications.

agent, artificial intelligence, protocol, (14 more...)

arXiv.org Artificial Intelligence

2510.08263

Genre:

Workflow (0.94)
Overview (0.67)

Industry:

Health & Medicine (0.68)
Telecommunications (0.46)
Information Technology > Security & Privacy (0.46)
Education (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

SLAM-Former: Putting SLAM into One Transformer

Yuan, Yijun, Chen, Zhuoguang, Li, Kenan, Wang, Weibang, Zhao, Hang

arXiv.org Artificial IntelligenceSep-23-2025

Similar to traditional SLAM systems, SLAM-F ormer comprises both a frontend and a backend that operate in tandem. The frontend processes sequential monocular images in real-time for incremental mapping and tracking, while the backend performs global refinement to ensure a geometrically consistent result. This alternating execution allows the frontend and backend to mutually promote one another, enhancing overall system performance. Comprehensive experimental results demonstrate that SLAM-F ormer achieves superior or highly competitive performance compared to state-of-the-art dense SLAM methods.

artificial intelligence, machine learning, slam-former, (18 more...)

arXiv.org Artificial Intelligence

2509.16909

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Causality and Decision-making: A Logical Framework for Systems and Security Modelling

Chakraborty, Pinaki, Caulfield, Tristan, Pym, David

arXiv.org Artificial IntelligenceAug-5-2025

Causal reasoning is essential for understanding decision-making about the behaviour of complex `ecosystems' of systems that underpin modern society, with security -- including issues around correctness, safety, resilience, etc. -- typically providing critical examples. We present a theory of strategic reasoning about system modelling based on minimal structural assumptions and employing the methods of transition systems, supported by a modal logic of system states in the tradition of van Benthem, Hennessy, and Milner, and validated through equivalence theorems. Our framework introduces an intervention operator and a separating conjunction to capture actual causal relationships between component systems of the ecosystem, aligning naturally with Halpern and Pearl's counterfactual approach based on Structural Causal Models. We illustrate the applicability through examples of of decision-making about microservices in distributed systems. We discuss localized decision-making through a separating conjunction. This work unifies a formal, minimalistic notion of system behaviour with a Halpern--Pearl-compatible theory of counterfactual reasoning, providing a logical foundation for studying decision making about causality in complex interacting systems.

artificial intelligence, intervention, logic & formal reasoning, (19 more...)

arXiv.org Artificial Intelligence

2508.01758

Country:

North America > United States (0.46)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Energy (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.34)

Add feedback

Phishing URL Detection using Bi-LSTM

Baskota, Sneha

arXiv.org Artificial IntelligenceMay-1-2025

Phishing attacks threaten online users, often leading to data breaches, financial losses, and identity theft. Traditional phishing detection systems struggle with high false positive rates and are usually limited by the types of attacks they can identify. This paper proposes a deep learning-based approach using a Bidirectional Long Short-Term Memory (Bi-LSTM) network to classify URLs into four categories: benign, phishing, defacement, and malware. The model leverages sequential URL data and captures contextual information, improving the accuracy of phishing detection. Experimental results on a dataset comprising over 650,000 URLs demonstrate the model's effectiveness, achieving 97% accuracy and significant improvements over traditional techniques.

artificial intelligence, detection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2504.21049

Genre: Research Report (0.41)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility

Liu, Yifan, Fang, Yu, Lin, Zhouhan

arXiv.org Artificial IntelligenceMar-7-2025

Video-to-speech (V2S) synthesis, the task of generating speech directly from silent video input, is inherently more challenging than other speech synthesis tasks due to the need to accurately reconstruct both speech content and speaker characteristics from visual cues alone. Recently, audio-visual pre-training has eliminated the need for additional acoustic hints in V2S, which previous methods often relied on to ensure training convergence. However, even with pre-training, existing methods continue to face challenges in achieving a balance between acoustic intelligibility and the preservation of speaker-specific characteristics. We analyzed this limitation and were motivated to introduce DiVISe (Direct Visual-Input Speech Synthesis), an end-to-end V2S model that predicts Mel-spectrograms directly from video frames alone. Despite not taking any acoustic hints, DiVISe effectively preserves speaker characteristics in the generated audio, and achieves superior performance on both objective and subjective metrics across the LRS2 and LRS3 datasets. Our results demonstrate that DiVISe not only outperforms existing V2S models in acoustic intelligibility but also scales more effectively with increased data and model parameters. Code and weights can be found at https://github.com/PussyCat0700/DiVISe.

dataset, divise, vocoder, (17 more...)

arXiv.org Artificial Intelligence

2503.05223

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.92)

Add feedback

Split Adaptation for Pre-trained Vision Transformers

Wang, Lixu, Shang, Bingqi, Li, Yi, Mohapatra, Payal, Dong, Wei, Wang, Xiao, Zhu, Qi

arXiv.org Artificial IntelligenceMar-1-2025

Vision Transformers (ViTs), extensively pre-trained on large-scale datasets, have become essential to foundation models, allowing excellent performance on diverse downstream tasks with minimal adaptation. Consequently, there is growing interest in adapting pre-trained ViTs across various fields, including privacy-sensitive domains where clients are often reluctant to share their data. Existing adaptation methods typically require direct data access, rendering them infeasible under these constraints. A straightforward solution may be sending the pre-trained ViT to clients for local adaptation, which poses issues of model intellectual property protection and incurs heavy client computation overhead. To address these issues, we propose a novel split adaptation (SA) method that enables effective downstream adaptation while protecting data and models. SA, inspired by split learning (SL), segments the pre-trained ViT into a frontend and a backend, with only the frontend shared with the client for data representation extraction. But unlike regular SL, SA replaces frontend parameters with low-bit quantized values, preventing direct exposure of the model. SA allows the client to add bi-level noise to the frontend and the extracted data representations, ensuring data protection. Accordingly, SA incorporates data-level and model-level out-of-distribution enhancements to mitigate noise injection's impact on adaptation performance. Our SA focuses on the challenging few-shot adaptation and adopts patch retrieval augmentation for overfitting alleviation. Extensive experiments on multiple datasets validate SA's superiority over state-of-the-art methods and demonstrate its defense against advanced data reconstruction attacks while preventing model leakage with minimal computation cost on the client side. The source codes can be found at https://github.com/conditionWang/Split_Adaptation.

adaptation, quantization, representation, (14 more...)

arXiv.org Artificial Intelligence

2503.00441

Country: Asia > Nepal (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology: