AITopics | authentic image

Collaborating Authors

authentic image

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dual-branch Prompting for Multimodal Machine Translation

Wang, Jie, Yang, Zhendong, Zong, Liansong, Zhang, Xiaobo, Wang, Dexian, Zhang, Ji

arXiv.org Artificial IntelligenceDec-5-2025

Multimodal Machine Translation (MMT) typically enhances text-only translation by incorporating aligned visual features. Despite the remarkable progress, state-of-the-art MMT approaches often rely on paired image-text inputs at inference and are sensitive to irrelevant visual noise, which limits their robustness and practical applicability. To address these issues, we propose D2P-MMT, a diffusion-based dual-branch prompting framework for robust vision-guided translation. Specifically, D2P-MMT requires only the source text and a reconstructed image generated by a pre-trained diffusion model, which naturally filters out distracting visual details while preserving semantic cues. During training, the model jointly learns from both authentic and reconstructed images using a dual-branch prompting strategy, encouraging rich cross-modal interactions. To bridge the modality gap and mitigate training-inference discrepancies, we introduce a distributional alignment loss that enforces consistency between the output distributions of the two branches. Extensive experiments on the Multi30K dataset demonstrate that D2P-MMT achieves superior translation performance compared to existing state-of-the-art approaches.

machine learning, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2507.17588

Country:

Europe (1.00)
Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Can ChatGPT Perform Image Splicing Detection? A Preliminary Study

Nath, Souradip

arXiv.org Artificial IntelligenceJun-9-2025

Multimodal Large Language Models (MLLMs) like GPT-4V are capable of reasoning across text and image modalities, showing promise in a variety of complex vision-language tasks. In this preliminary study, we investigate the out-of-the-box capabilities of GPT-4V in the domain of image forensics, specifically, in detecting image splicing manipulations. Without any task-specific fine-tuning, we evaluate GPT-4V using three prompting strategies: Zero-Shot (ZS), Few-Shot (FS), and Chain-of-Thought (CoT), applied over a curated subset of the CASIA v2.0 splicing dataset. Our results show that GPT-4V achieves competitive detection performance in zero-shot settings (more than 85% accuracy), with CoT prompting yielding the most balanced trade-off across authentic and spliced images. Qualitative analysis further reveals that the model not only detects low-level visual artifacts but also draws upon real-world contextual knowledge such as object scale, semantic consistency, and architectural facts, to identify implausible composites. While GPT-4V lags behind specialized state-of-the-art splicing detection models, its generalizability, interpretability, and encyclopedic reasoning highlight its potential as a flexible tool in image forensics.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.05358

Country: North America > United States > Arizona (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

E2LVLM:Evidence-Enhanced Large Vision-Language Model for Multimodal Out-of-Context Misinformation Detection

Wu, Junjie, Fu, Yumeng, Yu, Nan, Fu, Guohong

arXiv.org Artificial IntelligenceFeb-11-2025

Recent studies in Large Vision-Language Models (LVLMs) have demonstrated impressive advancements in multimodal Out-of-Context (OOC) misinformation detection, discerning whether an authentic image is wrongly used in a claim. Despite their success, the textual evidence of authentic images retrieved from the inverse search is directly transmitted to LVLMs, leading to inaccurate or false information in the decision-making phase. To this end, we present E2LVLM, a novel evidence-enhanced large vision-language model by adapting textual evidence in two levels. First, motivated by the fact that textual evidence provided by external tools struggles to align with LVLMs inputs, we devise a reranking and rewriting strategy for generating coherent and contextually attuned content, thereby driving the aligned and effective behavior of LVLMs pertinent to authentic images. Second, to address the scarcity of news domain datasets with both judgment and explanation, we generate a novel OOC multimodal instruction-following dataset by prompting LVLMs with informative content to acquire plausible explanations. Further, we develop a multimodal instruction-tuning strategy with convincing explanations for beyond detection. This scheme contributes to E2LVLM for multimodal OOC misinformation detection and explanation. A multitude of experiments demonstrate that E2LVLM achieves superior performance than state-of-the-art methods, and also provides compelling rationales for judgments.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.10455

Country:

North America > United States (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (1.00)

Industry:

Media > News (1.00)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Present and Future Generalization of Synthetic Image Detectors

Bernabeu-Perez, Pablo, Lopez-Cuena, Enrique, Garcia-Gasulla, Dario

arXiv.org Artificial IntelligenceNov-26-2024

The continued release of increasingly realistic image generation models creates a demand for synthetic image detectors. To build effective detectors we must first understand how factors like data source diversity, training methodologies and image alterations affect their generalization capabilities. This work conducts a systematic analysis and uses its insights to develop practical guidelines for training robust synthetic image detectors. Model generalization capabilities are evaluated across different setups (e.g. scale, sources, transformations) including real-world deployment conditions. Through an extensive benchmarking of state-of-the-art detectors across diverse and recent datasets, we show that while current approaches excel in specific scenarios, no single detector achieves universal effectiveness. Critical flaws are identified in detectors, and workarounds are proposed to enable the deployment of real-world detector applications enhancing accuracy, reliability and robustness beyond the limitations of current systems.

dataset, detection, detector, (15 more...)

arXiv.org Artificial Intelligence

2409.14128

Country:

Europe (0.14)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

GreatSplicing: A Semantically Rich Splicing Dataset

Bi, Xiuli, Liang, Jiaming

arXiv.org Artificial IntelligenceOct-22-2023

In existing splicing forgery datasets, the insufficient semantic varieties of spliced regions cause a problem that trained detection models overfit semantic features rather than splicing traces. Meanwhile, because of the absence of a reasonable dataset, different detection methods proposed cannot reach a consensus on experimental settings. To address these urgent issues, GreatSplicing, a manually created splicing dataset with a considerable amount and high quality, is proposed in this paper. GreatSplicing comprises 5,000 spliced images and covers spliced regions with 335 distinct semantic categories, allowing neural networks to grasp splicing traces better. Extensive experiments demonstrate that models trained on GreatSplicing exhibit minimal misidentification rates and superior cross-dataset detection capabilities compared to existing datasets. Furthermore, GreatSplicing is available for all research purposes and can be downloaded from www.greatsplicing.net.

dataset, greatsplicing, spliced image, (14 more...)

arXiv.org Artificial Intelligence

2310.1007

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.51)

Add feedback

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

Guo, Wenyu, Fang, Qingkai, Yu, Dong, Feng, Yang

arXiv.org Artificial IntelligenceOct-20-2023

Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation. Since there is no paired image available for the input sentence in most cases, recent studies suggest utilizing powerful text-to-image generation models to provide image inputs. Nevertheless, synthetic images generated by these models often follow different distributions compared to authentic images. Consequently, using authentic images for training and synthetic images for inference can introduce a distribution shift, resulting in performance degradation during inference. To tackle this challenge, in this paper, we feed synthetic and authentic images to the MMT model, respectively. Then we minimize the gap between the synthetic and authentic images by drawing close the input image representations of the Transformer Encoder and the output distributions of the Transformer Decoder. Therefore, we mitigate the distribution disparity introduced by the synthetic images during inference, thereby freeing the authentic images from the inference process.Experimental results show that our approach achieves state-of-the-art performance on the Multi30K En-De and En-Fr datasets, while remaining independent of authentic images during inference.

authentic image, representation, translation, (15 more...)

arXiv.org Artificial Intelligence

2310.13361

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(16 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

MVSS-Net: Multi-View Multi-Scale Supervised Networks for Image Manipulation Detection

Dong, Chengbo, Chen, Xinru, Hu, Ruohan, Cao, Juan, Li, Xirong

arXiv.org Artificial IntelligenceDec-16-2021

Abstract--The key research question for image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, whilst specific to prevent false alarms on authentic images. Current research emphasizes the sensitivity, with the specificity mostly ignored. In this paper we address both aspects by multi-view feature learning and multi-scale supervision. By exploiting noise distribution and boundary artifacts surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images which are nontrivial to be taken into account by the prior art that relies on a semantic segmentation loss. Our thoughts are realized by a new network which we term MVSS-Net and its enhanced version MVSS-Net . Comprehensive experiments on six public benchmark datasets justify the viability of the MVSS-Net series for both pixel-level and image-level manipulation detection.

detection, manipulation detection, mvss-net, (16 more...)

arXiv.org Artificial Intelligence

2112.08935

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Image Manipulation Detection by Multi-View Multi-Scale Supervision

Chen, Xinru, Dong, Chengbo, Ji, Jiaqi, Cao, Juan, Li, Xirong

arXiv.org Artificial IntelligenceApr-14-2021

The key challenge of image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, whilst specific to prevent false alarms on authentic images. Current research emphasizes the sensitivity, with the specificity overlooked. In this paper we address both aspects by multi-view feature learning and multi-scale supervision. By exploiting noise distribution and boundary artifact surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images which are nontrivial to taken into account by current semantic segmentation network based methods. Our thoughts are realized by a new network which we term MVSS-Net. Extensive experiments on five benchmark sets justify the viability of MVSS-Net for both pixel-level and image-level manipulation detection.

detection, manipulation detection, mvss-net, (16 more...)

arXiv.org Artificial Intelligence

2104.06832

Country: Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback