Rating across multiple languages: Perceptions of training and operational rating

This submission has open access
Abstract Summary

This presentation examines the results of surveys of and interviews with raters of a multi-language, large-scale assessment. By examining rater perceptions both broadly, via a questionnaire, and narrowly, via individual interviews, the study explores two questions: Do raters using the same scale have similar or different experiences with rater training and rating across languages? How can such information support training, operational testing, and even changes made to rating scales?

Over 60 raters who apply the same scale to constructed responses (speaking and writing) in 11 languages completed a survey about their perceptions of rater training and operational rating, including the aspects of rating to which they attended (Pill & Smart, 2020) and how the training prepared them for operational rating. A subset of respondents also participated in an interview to investigate in more depth how these issues intersect and can inform improvements. The study examines similarities and differences among languages.


Submission ID: AILA413

Argument:

A great deal of research (for example, Pill & Smart, 2020; Attali, 2016; Kuiken & Vedder, 2014) examines language test rater behavior and consistency, while complementary research (for example, Davis, 2016; Sato, 2014) focuses on rater perceptions of their training activities, their operational rating processes, and how the two interrelate. However, most of this research focuses on raters working in a single language, often English. Far less research investigates rater perceptions of rater training and operational rating processes, including both challenges and successes, for raters working with a shared scale and a similar test across languages. Borger (2019) and Harsch & Malone (2020) also suggest that raters and other stakeholders can provide critical information not only about how a scale or scales are applied but also about the difficulties in applying them. Thus, this study asks: Do raters using the same scale have similar or different experiences with rater training and rating across languages? How can such information support training, operational testing, and even changes made to rating scales?

The current study examines research conducted with raters of constructed-response tasks from a multi-language, large-scale assessment administered to over 100,000 learners annually in 11 languages (Chinese (Mandarin), English, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish). The study first analyzes the results of a short questionnaire sent to speaking and writing raters (N=65) to examine their perceptions of rater training as well as what they attend to in operational rating, including task level and task function. The study also analyzes the outcomes of short interviews with a subset of raters (N=25) to shed light on the challenges of rating, both in applying the scale in general and in applying the training and the scale to specific languages.

The presentation will focus specifically on how rater perceptions can inform and improve rater training approaches, exercises, and activities. It will also show connections between rater training and operational rating. Additionally, the presentation will identify ways that rater perceptions and recommendations across languages can strengthen rating approaches both globally and within each language.

Beyond applications to rating and rater training, the study and its results will provide an opportunity to reflect on how raters interpret the scale and how it can be improved for clarity and accessibility across languages.


Attali, Y. (2016). A comparison of newly-trained and experienced raters on a standardized writing assessment. Language Testing, 33(1), 99-115.

Borger, L. (2019). Assessing interactional skills in a paired speaking test: Raters' interpretation of the construct. Apples: Journal of Applied Language Studies, 13(1), 151-174.

Davis, L. (2016). The influence of training and experience on rater performance in scoring spoken language. Language Testing, 33(1), 117-135.

Harsch, C., & Malone, M. E. (2020). Language proficiency frameworks and scales. In The Routledge handbook of second language acquisition and language testing (pp. 33-44). Routledge.

Kuiken, F., & Vedder, I. (2014). Raters' decisions, rating procedures and rating scales. Language Testing, 31(3), 279-284.

Pill, J., & Smart, C. (2020). Raters: Behavior and training. In The Routledge handbook of second language acquisition and language testing (pp. 135-144). Routledge.

Sato, M. (2014). Exploring the construct of interactional oral fluency: Second language acquisition and language testing approaches. System.

Director of Assessment and Research, ACTFL
Director, Professional Learning and Certification, ACTFL

Similar Abstracts by Type

Submission ID | Submission Title | Submission Type | Primary Author
AILA851 | [SYMP59] OPEN CALL - Language & holistic ecology | Oral Presentation | Aliyah Morgenstern
AILA911 | [SYMP17] Adult Migrants Acquiring Basic Literacy Skills in a Second Language | Oral Presentation | Kaatje Dalderop
AILA990 | [SYMP17] Adult Migrants Acquiring Basic Literacy Skills in a Second Language | Oral Presentation | Anna Mouti
AILA484 | [SYMP47] Literacies in CLIL: subject-specific language and beyond | Oral Presentation | Natalia Evnitskaya
AILA631 | [SYMP15] AILA ReN Social cohesion at work: shared languages as mortar in professional settings | Oral Presentation | Henrik Rahm
AILA583 | [SYMP24] Changing perspectives towards multilingual education: teachers, learners and researchers as agents of social cohesion | Oral Presentation | Alessandra Periccioli
AILA238 | [SYMP81] Reflections on co-production as a research practice in the field of foreign language teaching and learning | Oral Presentation | Martina Zimmermann
AILA290 | [SYMP36] Fluency as a multilingual practice: Concepts and challenges | Oral Presentation | Shungo Suzuki