Date(s) - 07/04/2022
3:00 pm - 4:00 pm
This is a joint event by SICSA (Scottish Informatics and Computer Science Alliance) and SIGGEN (ACL Special Interest Group on Natural Language Generation).
Speaker: Sebastian Gehrmann (Google Research)
How good is a system that produces natural language, and where does it fail? This question lies at the core of natural language generation research and motivates which systems we develop. An answer involves considerations of languages, datasets, metrics, human evaluations, and more. Using the latest evaluation resources leads to a more accurate and reproducible answer, but it also requires keeping up to date with the constantly evolving and fragmented ecosystem of evaluation practices. As a result, many system evaluations rely on Anglo-centric corpora and well-established, but flawed, metrics. The Generation Evaluation and Metrics benchmark (GEM) is a participatory project that aims to make it easier to use the evaluation resources produced across the NLG community. In GEMv2, our team of 120 researchers provides access to 35+ corpora in 45+ languages and all the latest metrics, usable with a single line of code or even without writing any code. In this talk, I will give an overview of evaluation challenges and of GEMv2, and discuss how better evaluation practices can lead to better NLG models.
Sebastian is a senior research scientist at Google Research working on the development and evaluation of controllable and interpretable models for language generation. His work has received awards and honorable mentions at IEEE Vis '18 and the ACL '19 and NeurIPS '20 demo tracks. He co-organized INLG '19, the EvalNLGEval workshop at INLG '20, and the Generation, Evaluation, and Metrics workshop at ACL '21 and EMNLP '22.
You can register for the talk here.