Each US Government organization uses the 1985 Interagency Language Roundtable (ILR) Skill Level Descriptions (SLDs) to develop its own language assessments. Post 9/11, the demand for qualified language personnel increased, and testing practices shifted from being somewhat compartmentalized to increasingly collaborative. Changes were needed in the ILR SLDs, which were geared toward second language learners rather than heritage and native speaker examinees. Moreover, government testing programs wanted to integrate advances in language acquisition and assessment research into the ILR SLDs, remove outdated concepts, clarify ambiguous ones, and fill in those that were missing.
This presentation will detail how a US Government committee took the lessons learned from norming sessions, discussions, and shared testing resources to update the ILR SLDs for Proficiency between 2014 and 2021. The standards were restructured to focus on ability rather than traits (Purpura, 2016) and to move away from primarily measuring correctness toward measuring effective communication, with greater emphasis on task, meaning, and contextual appropriateness. The revisions also provided an opportunity to embrace the diversity among government examinees and to reduce bias and marginalization as a "fundamental institutional change" (Rosa & Flores, 2021). Explicit efforts were made to remove measures that relied on nativism, linguistic essentialism, or membership in certain social groups (such as "well-educated speakers"). The revisions were an iterative process, addressing level progression within a skill, comparability of a level across skills, and applicability across languages, and concluded with a final review from stakeholders.
The committee collected data from a pilot study to measure the reliability of the revisions; the study included 5 organizations, 4 languages, 40 participants, 120 speaking tests, and over 500 ratings. Results indicated that raters were able to evaluate the tests reliably and had greater confidence and clarity in the rating process. Study data and documentation supporting the revisions contributed to a validation argument for the ILR SLDs (Knoch & Chapelle, 2018).
The FBI implemented the 2021 ILR SLDs in its Speaking Proficiency Test program, including revisions to the test protocol for better alignment of test tasks to the ILR levels and comparability of tasks across languages. Additionally, the FBI revised its protocol for measuring pragmatic skills through situation tasks to ensure consistency across languages and cultures. Considerations for non-American/English cultural expectations were prioritized and made explicit in the language-specific materials developed as a result. The FBI is focused on eliminating score error from sources of rater bias and linguistic elitism, and on producing standards and testing protocols that are valid and reliable across tasks, tests, skills, examinees, languages, and government organizations.
Knoch, U., & Chapelle, C. A. (2018). Validation of rating processes within an argument-based framework. Language Testing, 35(4), 477–499.
Purpura, J. E. (2016). Assessing meaning. In E. Shohamy et al. (Eds.), Language Testing and Assessment, Encyclopedia of Language and Education, 1–26.
Rosa, J., & Flores, N. (2021). Decolonization, language, and race in Applied Linguistics and social justice. Applied Linguistics, 42(6), 1162–1167. https://doi.org/10.1093/applin/amab062