New perspectives on aligning examinations with frameworks of reference for languages: a corpus approach and the challenges opened by automation and Artificial Intelligence

This submission has open access
Abstract Summary

Bachman, L. F. (2000). Modern Language Testing at the Turn of the Century: Assuring That What We Count Counts. Language Testing, 17(1), 1–42.

Figueras, N., North, B., Takala, S., Verhelst, N., & Van Avermaet, P. (2005). Relating Examinations to the Common European Framework: A Manual. Language Testing, 22(3), 261–279. https://doi.org/10.1191/0265532205lt308oa

Folny, V. (2020). Adossement des épreuves d'expression orale et écrite du Test de connaissance du français (TCF) sur les Niveaux de compétences linguistiques canadiens (NCLC) et correspondance avec les niveaux du Cadre européen commun de référence pour les langues (CECRL). Canadian Journal of Applied Linguistics, 23(2), 20‑72. https://doi.org/10.37213/cjal.2020.30437

Jiao, H., & Lissitz, R. W. (Eds.). (2020). Application of Artificial Intelligence to Assessment. Information Age Publishing, Inc.

Klebanov, B. B., & Madnani, N. (2022). Automated Essay Scoring. Morgan & Claypool Publishers.

Yan, D., Rupp, A. A., & Foltz, P. W. (2020). Handbook of Automated Scoring: Theory in Practice. CRC Press, Taylor & Francis Group.

Zieky, M. J., Perie, M., & Livingston, S. A. (2008). Cutscores: A Manual for Setting Standards of Performance on Educational and Occupational Tests. Educational Testing Service.


Submission ID: AILA1043
Submission Type: Oral Presentation

Argument:

At the turn of the millennium, the language testing field saw the publication of diverse frameworks of reference for languages (CEFR, CLB/NCLC, STANAG…), Reference Level Descriptions and illustrations of the levels of language proficiency. This clear professionalisation was desired by the field itself. The use of these frameworks has facilitated the alignment of examinations with the various proficiency levels. In Europe, standard setting and benchmarking have become a regular activity for professional test developers and are clearly part of their test validation agenda. They are also a way to rationalise the decisions made about people.

While key publications have helped to promote good practices and improvements in standard-setting procedures, questions remain open concerning these procedures and how to reach a significant level of quality: the use of one or several standard-setting methods, the reproducibility of findings, and ways to optimise efficiency (number of panellists, number of items or productions to be reviewed, remote work, reliability analysis of panellists' ratings…). The last decade has not seen emergent procedures or dramatic improvements in this area.

During the same decade, dramatic innovations emerged elsewhere, mainly on technological ground: big data collection, artificial intelligence procedures (machine learning, deep learning…), language models (GPT-3, BERT, LaMDA, the BigScience project…), and the outsourcing of data storage (cloud). This context is challenging well-established language test providers and their practices. These innovations are interpreted by some as a threat coming from "outside" and by others as a way to renew the field.

France Éducation internationale (FEI), a French public agency ("opérateur"), is in charge of a test, the TCF (Test de connaissance du français). The test has a 20-year history, is administered in 200 countries, and is taken by 200,000 candidates annually. The TCF is used mainly for admission to French universities and for migration purposes in France and Canada. Two years ago, FEI decided to "modernise" and partially automate the rating of the test's writing component. This project is part of the modernisation of the French administration, and FEI received public funding to support a research agenda.

During this presentation, we will explain how FEI plans to use new technologies and artificial intelligence procedures to support the assessment of candidates' writing and to assist raters in their work. We will analyse the advantages and limits of such a procedure. As FEI has produced, for the first time for French, an annotated corpus for training, assessment and research, we will explain how working with corpora opens new avenues for improving the alignment of the TCF with CEFR levels. We will analyse the benefits of, and emerging challenges in, addressing this new facet in our test validity argument.

Innovation and Prospective Manager, France Éducation internationale
