This paper presents AD-LLM, a comprehensive benchmark for evaluating large language models (LLMs) on anomaly detection tasks. The benchmark systematically assesses a range of LLMs across diverse anomaly detection scenarios, yielding insights into their strengths and limitations in identifying abnormal patterns in data.
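To make the evaluation setting concrete, the following is a minimal sketch of how an LLM might be scored on a zero-shot anomaly detection task. It is illustrative only and is not the benchmark's actual protocol: the names `build_prompt`, `predict`, `evaluate`, and the `query_llm` callable are hypothetical placeholders for whatever prompting and model-access layer is actually used.

```python
# Illustrative sketch (not AD-LLM's protocol): prompt an LLM to label samples as
# normal or anomalous and compute a simple accuracy over a labeled dataset.
from typing import Callable, List, Tuple


def build_prompt(sample: str, domain: str) -> str:
    """Zero-shot prompt asking the model to label one sample."""
    return (
        f"You are an anomaly detector for {domain} data.\n"
        f"Sample: {sample}\n"
        "Answer with exactly one word: 'normal' or 'anomaly'."
    )


def predict(sample: str, domain: str, query_llm: Callable[[str], str]) -> int:
    """Return 1 if the model flags the sample as anomalous, else 0."""
    reply = query_llm(build_prompt(sample, domain)).strip().lower()
    return 1 if "anomal" in reply else 0


def evaluate(dataset: List[Tuple[str, int]], domain: str,
             query_llm: Callable[[str], str]) -> float:
    """Accuracy over (sample, label) pairs; a real benchmark would also report AUROC/F1."""
    correct = sum(predict(s, domain, query_llm) == y for s, y in dataset)
    return correct / len(dataset)


if __name__ == "__main__":
    # Toy stand-in for a real chat-model client: flags samples containing "error".
    fake_llm = lambda prompt: "anomaly" if "error" in prompt.lower() else "normal"
    toy_data = [("service responded in 12 ms", 0),
                ("fatal error: disk unreachable", 1)]
    print(f"accuracy = {evaluate(toy_data, 'system log', fake_llm):.2f}")
```

Replacing `fake_llm` with a call to an actual model API would turn this toy loop into a basic zero-shot evaluation of that model on the same data.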