Testing AI agents is more complicated than testing conventional software. Large language models (LLMs) drive…