Complex Abilities of LLMs

Andrea Esuli
Director of Research

Large Language Models are powerful tools with remarkable potential across a wide range of applications. However, despite their impressive capabilities, they often fall short in tasks that require nuanced reasoning and critical thinking.

For this reason, our research focuses on systematically evaluating LLMs with carefully designed benchmarks that stress their reasoning abilities, uncover their limitations, and guide their improvement. In recent years, we have assessed the performance of LLMs in domains such as mathematical reasoning, complex decision-making, and multimodal tasks requiring fine-grained alignment between perceptual and textual modalities. These evaluations shed light on the true capabilities and boundaries of current generative models.

Our group is also actively involved in developing methods for detecting machine-generated text. As generative models become increasingly pervasive, our goal is to provide tools that can distinguish between human- and machine-generated content, ensuring transparency and enabling responsible use of AI technologies.

Selected Publications

2025
The Invalsi Benchmarks: measuring the Linguistic and Mathematical understanding of Large Language Models in Italian.
Giovanni Puccetti, Maria Cassese, and Andrea Esuli.
Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19-24, 2025.
2024
AI 'News' Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian.
Giovanni Puccetti, Anna Rogers, Chiara Alzetta, Felice Dell'Orletta, and Andrea Esuli.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024.
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection.
Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, and Preslav Nakov.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024.
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models.
Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, Albert Gatt, Aykut Erdem, and Erkut Erdem.
The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024.