BadAgent Extension: Cross-Domain Robustness and Trigger Visibility in LLM Agents
Abstract
This paper presents an empirical study on backdoor attacks in large language model agents. We extend a recent attack framework by adding two lightweight benchmarks that measure cross-domain robustness and trigger visibility without changing the model architecture. Our approach fine-tunes AgentLM-based agents with parameter-efficient methods on operating system and web browsing tasks using multiple poisoning ratios and both visible and invisible triggers. We then evaluate the agents with four metrics: Attack Success Rate, Follow Step Ratio, Cross-Domain Robustness, and Trigger Visibility Gap. The results show that backdoors often transfer to unseen domains without a drop in success, while invisible triggers significantly reduce the attack success rate compared to visible ones. These findings highlight the need for stronger evaluation tools and defenses for LLM-based agents.
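The four metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so every definition below is an assumption made for illustration: Attack Success Rate is taken as the share of triggered runs in which the backdoor action fires, Follow Step Ratio as the share of reference steps the agent follows, Cross-Domain Robustness as the ratio of unseen-domain to in-domain Attack Success Rate, and Trigger Visibility Gap as visible-trigger minus invisible-trigger Attack Success Rate. The `Episode` record and all function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Episode:
    """One evaluated agent run (hypothetical record layout)."""
    domain: str             # e.g. "os" or "web"
    triggered: bool         # whether the input carried a trigger
    attack_succeeded: bool  # whether the backdoor action was executed
    steps_followed: int     # steps matching the reference trajectory
    steps_total: int        # steps in the reference trajectory


def attack_success_rate(episodes: Sequence[Episode]) -> float:
    """Assumed ASR: successful attacks divided by triggered runs."""
    triggered = [e for e in episodes if e.triggered]
    if not triggered:
        return 0.0
    return sum(e.attack_succeeded for e in triggered) / len(triggered)


def follow_step_ratio(episodes: Sequence[Episode]) -> float:
    """Assumed FSR: followed steps divided by reference steps."""
    total = sum(e.steps_total for e in episodes)
    if total == 0:
        return 0.0
    return sum(e.steps_followed for e in episodes) / total


def cross_domain_robustness(in_domain: Sequence[Episode],
                            unseen: Sequence[Episode]) -> float:
    """Assumed CDR: unseen-domain ASR relative to in-domain ASR."""
    base = attack_success_rate(in_domain)
    if base == 0.0:
        return 0.0
    return attack_success_rate(unseen) / base


def trigger_visibility_gap(visible: Sequence[Episode],
                           invisible: Sequence[Episode]) -> float:
    """Assumed TVG: visible-trigger ASR minus invisible-trigger ASR."""
    return attack_success_rate(visible) - attack_success_rate(invisible)


# Toy data, invented purely to exercise the functions.
os_runs = [
    Episode("os", True, True, 4, 4),
    Episode("os", True, True, 3, 4),
]
web_runs = [
    Episode("web", True, True, 2, 2),
    Episode("web", True, False, 1, 2),
]
```

Under these assumed definitions, a Cross-Domain Robustness near 1 would correspond to the reported transfer "without a drop in success", and a positive Trigger Visibility Gap would correspond to invisible triggers lowering the attack success rate. The paper's actual formulas may differ.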