Pros and cons of the standardization of CEFR texts using NLP: the case of automatic text readability assessment

This submission has open access
Abstract Summary
Submission ID: AILA1462
Submission Type:
Argument:
Since the beginning of the 21st century, automatic readability assessment (ARA) has undergone significant advances as a result of merging the traditional approach to readability assessment (from the field of education) with technologies from natural language processing (NLP). New readability models have thus been developed by combining different language technologies; they can rely on a much richer set of text features and make more reliable predictions (see Collins-Thompson, 2014; François, 2015; Vajjala, 2022 for syntheses of the recent advances). A by-product of the NLP approach is the need for corpora of texts whose reading difficulty has been previously identified, and these corpora must be larger than those required for classic readability formulas. Whereas the annotation of text difficulty in classic readability research was based on reader data, NLP-enabled readability has abandoned the gathering of reader data in favor of pedagogical texts whose difficulty is assessed by experts. As a result, recent readability models no longer directly model measures of reading comprehension (even though all such measures, such as cloze tests or multiple-choice questions, have their limitations), but instead model expert judgments about what is simpler or more complex.

In this communication, we will discuss the influence of NLP techniques on the standardization of the difficulty scales used to assess the readability of texts, focusing on the case of the CEFR scale. We will argue that current practices not only implicitly turn the views of experts into standards once the algorithms have captured associations between text features and expert evaluations on a proficiency scale, but also run the risk of standardizing the point of view of a small number of professionals, given that the corpora currently recognized and used in the field are generally very homogeneous (WeeBit, Vajjala & Meurers, 2012; Newsela, Xu et al., 2015; OneStopEnglish, Vajjala & Lučić, 2018; CLEAR, Crossley et al., 2022). Moreover, beyond this corpus homogeneity, corpus-based readability formulas can overspecialize; in other words, they model the properties observed in a limited sample that reflects a single point of view. Drawing on the differing perspectives of the available corpora, we will therefore also discuss the risk of partial formulas (caused, for example, by incomplete text features or coverage) that do not generalize beyond a given corpus and lead to biased measures. Using such biased measures might negatively impact, for example, the selection of texts along the CEFR scale. Our talk will first introduce the task of automatic readability assessment, then describe the characteristics of the available corpora as well as the text features typically used, and finally discuss their impact on a possible standardization of proficiency scales.
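To make the modeling recipe described above concrete, the sketch below illustrates, under stated assumptions, the generic NLP-based ARA pipeline: surface text features of the kind used in classic readability formulas are fed to a supervised classifier trained on expert-assigned CEFR labels. The toy corpus, the feature set, and the choice of logistic regression are illustrative assumptions, not the pipeline of any particular published model.

```python
# Minimal sketch (assumptions flagged in comments): a supervised readability
# classifier that maps surface text features to expert-assigned CEFR levels.
import re
from sklearn.linear_model import LogisticRegression

def surface_features(text):
    """Classic readability-formula ingredients: sentence length,
    word length, and a crude syllable count (vowel groups)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(len(re.findall(r"[aeiouyAEIOUY]+", w)) for w in words)
    n_sent, n_words = max(len(sentences), 1), max(len(words), 1)
    avg_sent_len = n_words / n_sent                  # words per sentence
    avg_word_len = sum(map(len, words)) / n_words    # characters per word
    avg_syll = syllables / n_words                   # syllables per word
    # Flesch Reading Ease, the archetypal classic formula:
    # FRE = 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)
    fre = 206.835 - 1.015 * avg_sent_len - 84.6 * avg_syll
    return [avg_sent_len, avg_word_len, avg_syll, fre]

# Hypothetical expert-labelled texts; a real system would use a corpus
# such as those cited in the abstract.
texts = [
    ("The cat sat on the mat. It was warm.", "A1"),
    ("Yesterday we visited a small museum near the old harbour.", "B1"),
    ("Notwithstanding considerable methodological heterogeneity, "
     "the findings converge on a robust underlying construct.", "C1"),
] * 10  # duplicated only so the toy model has something to fit

X = [surface_features(t) for t, _ in texts]
y = [level for _, level in texts]

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict([surface_features("Economic policy remains contested.")]))
```

Replacing the toy ingredients with one of the corpora cited above and a richer feature set reproduces the dynamic discussed in the talk: the classifier learns, and thereby standardizes, whatever view of difficulty the expert labels encode.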
Bibliography:

Collins-Thompson, K. (2014). Computational assessment of text readability: A survey of current and future research. ITL-International Journal of Applied Linguistics, 165(2), 97-135.
Crossley, S., Heintz, A., Choi, J. S., Batchelor, J., Karimi, M., & Malatinszky, A. (2022). A large-scaled corpus for assessing text readability. Behavior Research Methods, 1-17.
François, T. (2015). When readability meets computational linguistics: A new paradigm in readability. Revue française de linguistique appliquée, (2), 79-97.
Vajjala, S. (2022). Trends, limitations and open challenges in automatic readability assessment research. In Proceedings of LREC 2022 (in press).
Vajjala, S., & Lučić, I. (2018). OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 297-304).
Vajjala, S., & Meurers, D. (2012). On improving the accuracy of readability classification using insights from second language acquisition. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP (pp. 163-173).
Xu, W., Callison-Burch, C., & Napoles, C. (2015). Problems in current text simplification research: New data can help. Transactions of the Association for Computational Linguistics, 3, 283-297.

Similar Abstracts by Type

Submission ID | Submission Topic | Submission Type | Primary Author
AILA851 | [SYMP59] OPEN CALL - Language & holistic ecology | Oral Presentation | Aliyah Morgenstern
AILA911 | [SYMP17] Adult Migrants Acquiring Basic Literacy Skills in a Second Language | Oral Presentation | Kaatje Dalderop
AILA990 | [SYMP17] Adult Migrants Acquiring Basic Literacy Skills in a Second Language | Oral Presentation | Anna Mouti
AILA484 | [SYMP47] Literacies in CLIL: subject-specific language and beyond | Oral Presentation | Natalia Evnitskaya
AILA631 | [SYMP15] AILA ReN Social cohesion at work: shared languages as mortar in professional settings | Oral Presentation | Henrik Rahm
AILA583 | [SYMP24] Changing perspectives towards multilingual education: teachers, learners and researchers as agents of social cohesion | Oral Presentation | Alessandra Periccioli
AILA238 | [SYMP81] Reflections on co-production as a research practice in the field of foreign language teaching and learning | Oral Presentation | Martina Zimmermann
AILA290 | [SYMP36] Fluency as a multilingual practice: Concepts and challenges | Oral Presentation | Shungo Suzuki