Date(s) - 26/06/2013
4:00 pm - 5:00 pm
Accurate methods for quantifying elements of text quality would be useful in search, computer-assisted writing, and text generation technologies. However, progress in this area of research is constrained by the lack of suitable datasets for development and evaluation.
Here I introduce a corpus of science journalism articles that addresses the pressing need for realistic data for text quality applications. The corpus consists of science journalism pieces categorized into three levels of writing quality. I will describe how, guided by the judgements of renowned writers, we identified samples of extraordinarily well-written pieces and how these were expanded to a larger set of typical journalistic writing. I introduce methods for determining two elements of text quality, sentence specificity and syntax-based local coherence, that distinguish exceptional writing from typical writing. I also present results on novel automatic models for elements of text quality specific to the science journalism domain, including the presence of creative language and visual elements in the text.