This research area covers the development of NLP and AI algorithms and methodologies, with particular emphasis on their application to text analysis. Over the years, we have worked extensively on representation learning for text classification, in both cross-lingual and cross-domain scenarios, and on representation learning that is robust to misspellings. Our work also covers sentiment classification, sequence learning for information extraction, and cost-sensitive text classification, together with the application of these methods to authorship analysis, technology-assisted review, and native language identification.
2025 | Misspellings in Natural Language Processing: A survey. arXiv preprint.
2024 | A Simple Method for Classifier Accuracy Prediction Under Prior Probability Shift. Discovery Science - 27th International Conference, DS 2024, Pisa, Italy, October 14-16, 2024, Proceedings, Part II.
2024 | Explainable Authorship Identification in Cultural Heritage Applications. ACM Journal on Computing and Cultural Heritage 17(3).
2023 | Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification. ACM Transactions on Information Systems 41(2).