Vision Forgery Trace Enhanced VLMs for Generalized AIGC Video Detection

Author(s): Lihua Wang (Nanjing University of Posts and Telecommunications, China & China Unicom Digital Technology Co., Ltd., China), Pengfei Pei (China Unicom Western Innovation Research Institute, China), Yiran He (Institute of Information Engineering, Chinese Academy of Sciences, China), Zihuan Huang (Xi'an Jiaotong University, China) and Shuai Hu (Xi'an Jiaotong University, China & China Unicom Western Innovation Research Institute, China)
Copyright: 2026
Volume: 18
Issue: 1
Pages: 23
Source title: International Journal of Digital Crime and Forensics (IJDCF)
Editor(s)-in-Chief: Feng Liu (Chinese Academy of Sciences, China)
DOI: 10.4018/IJDCF.403419

Abstract

Large vision-language models (VLMs) show strong open-world generalization but degrade on domain-specific tasks, while traditional small forensic models perform well on in-distribution datasets yet lack cross-distribution generalization and language-based interpretability. To address this gap, the authors propose a vision forgery trace (VFT)-VLM framework, which incorporates forensic features into a VLM without sacrificing its general reasoning ability. Specifically, a lightweight VFT extraction module learns to encode texture anomalies, edge incoherence, pixel artifacts, and frequency-domain deviations. The traces are incorporated into the InternVL2-8B backbone via low-rank adaptation (LoRA) fine-tuning, aligning visual evidence with textual explanations. Across 14 diverse artificial intelligence-generated content (AIGC) benchmark datasets, VFT-VLM outperforms VLM-based large-scale models and achieves comparable or superior performance to relevant traditional small-scale models. Ablation studies confirm that both VFT extraction and LoRA fine-tuning are critical to the performance gains.
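
For readers unfamiliar with low-rank adaptation, the sketch below illustrates the general idea in plain Python: a pretrained weight matrix stays frozen while two small matrices supply a trainable low-rank update. It is not the authors' implementation. The class name `LoRALinear`, the helper functions, the rank, the scaling factor, and the layer sizes are all hypothetical choices made for illustration, and nothing here reflects how VFT features are actually injected into InternVL2-8B.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        # Trainable down-projection A, shape (rank, d_in), small random init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # Trainable up-projection B, shape (d_out, rank), zero init so the
        # adapted layer starts out identical to the frozen layer.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank  # assumed scaling convention: alpha / rank

    def forward(self, x):
        base = matmul(x, transpose(self.weight))   # (n, d_out)
        low = matmul(x, transpose(self.A))         # (n, rank)
        update = matmul(low, transpose(self.B))    # (n, d_out)
        return [[b + self.scale * u for b, u in zip(b_row, u_row)]
                for b_row, u_row in zip(base, update)]

    def num_trainable(self):
        """Count the parameters in A and B (the frozen weight is excluded)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

# Hypothetical sizes: a 16x32 frozen weight adapted with rank 4.
rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(16)]
x = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(2)]
layer = LoRALinear(W, rank=4)
y = layer.forward(x)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer's output, and only the entries of `A` and `B` would be updated during fine-tuning, which is what keeps the trainable parameter count far below that of the full weight matrix.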