Large Language Models (LLMs) are powerful tools with remarkable potential across a wide range of applications. Despite these impressive capabilities, however, they often fall short on tasks that require nuanced reasoning and critical thinking.
For this reason, our research focuses on systematically evaluating LLMs with carefully designed benchmarks
that stress their reasoning abilities, uncover their limitations, and guide targeted improvements.
In recent years, we have assessed the performance of LLMs in domains such as mathematical reasoning, complex decision-making, and multimodal tasks
requiring fine-grained alignment between perceptual and textual modalities. These evaluations help shed light on the true capabilities and boundaries of current generative models.
Our group is also actively involved in developing methods for Machine-Generated Text detection.
As generative models become increasingly pervasive, our goal is to provide tools that can distinguish between human- and machine-generated content, ensuring transparency and enabling responsible use of AI technologies.
2025 | The Invalsi Benchmarks: measuring the Linguistic and Mathematical understanding of Large Language Models in Italian. Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19-24, 2025.
2024 | AI 'News' Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024.
2024 | M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024.
2024 | ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models. The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024.