Vision Forgery Trace Enhanced VLMs for Generalized AIGC Video Detection

Author(s): Lihua Wang (Nanjing University of Posts and Telecommunications, China & China Unicom Digital Technology Co., Ltd., China), Pengfei Pei (China Unicom Western Innovation Research Institute, China), Yiran He (Institute of Information Engineering, Chinese Academy of Sciences, China), Zihuan Huang (Xi'an Jiaotong University, China), and Shuai Hu (Xi'an Jiaotong University, China & China Unicom Western Innovation Research Institute, China)
Copyright: 2026
Volume: 18
Issue: 1
Pages: 23
Source title: International Journal of Digital Crime and Forensics (IJDCF)
Editor(s)-in-Chief: Feng Liu (Chinese Academy of Sciences, China)
DOI: 10.4018/IJDCF.403419

Keywords: Digital Crime & Forensics / Information Science Reference / IT Security and Ethics / Security & Forensics

Abstract

Large vision language models (VLMs) show strong open-world generalization but degrade on domain-specific tasks, while traditional small forensic models perform well on in-distribution datasets yet lack cross-distribution generalization and language-based interpretability. To address this gap, the authors propose a vision forgery trace (VFT)-VLM framework, which incorporates forensic features into a VLM without sacrificing its general reasoning ability. Specifically, a lightweight VFT extraction module learns to encode texture anomalies, edge incoherence, pixel artifacts, and frequency-domain deviations. The traces are incorporated into the InternVL2-8B backbone via low-rank adaptation (LoRA) fine-tuning, achieving alignment between visual evidence and textual explanations. Across 14 diverse artificial intelligence-generated content (AIGC) benchmark datasets, VFT-VLM outperforms VLM-based large-scale models and achieves comparable or superior performance to relevant traditional small-scale models. Ablation studies confirm that both VFT extraction and LoRA fine-tuning are critical to the performance gains.
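
For readers unfamiliar with low-rank adaptation, the general idea can be sketched as follows. This is a minimal illustration of the generic technique, not the authors' implementation: the class name LoRALinear, its parameters (rank, alpha, seed), and the toy weight matrix are all assumptions made for this example. A frozen weight matrix W is left untouched, and only two small matrices A and B are trained, so the effective weight becomes W + (alpha / rank) * B A.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical linear layer with a low-rank update (illustration only)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape d_out x d_in
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))  # B (A x)
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


layer = LoRALinear([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]], rank=1)
print(layer.forward([1.0, 1.0, 1.0]))  # matches the frozen layer while B is zero
print(layer.trainable_params())
```

Because B starts at zero, fine-tuning begins from the pretrained model's behavior, which is one reason this family of methods is often chosen when the goal is to add task-specific capability while preserving general ability.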
