
[SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world

To ensure smooth communication and collaboration, here are some troubleshooting tips to address common issues:
  1. Check Internet Connection: Verify that you have a stable and reliable internet connection. Use a wired connection when possible, as it tends to be more stable than Wi-Fi. If using Wi-Fi, make sure you have a strong signal.
  2. Update the Browser or App: Ensure that you are using the latest version of the web browser. Developers frequently release updates to address bugs and improve performance.
  3. Clear Browser Cache: Sometimes, cached data can cause conflicts or issues. Clear the browser cache and cookies before joining the meeting.
  4. Test Audio and Video: Before the meeting, check your microphone and camera to ensure they are working correctly. If you are a speaker, you can click the "Start Practice Session" button to test that your audio and video devices are functioning.
  5. Close Other Applications: Running multiple applications in the background can consume system resources and lead to performance issues. Close unnecessary apps to free up resources for the Dryfta meeting platform.
  6. Restart Your Device: If you encounter persistent issues, try restarting your computer or mobile device. This can help resolve various software-related problems.
  7. Use Supported Browsers: Ensure you are using a browser supported by the meeting platform. Recommended browsers: Chrome, Firefox, Edge, and Brave.
  8. Allow Necessary Permissions: Make sure the Dryfta meeting platform has the required permissions to access your microphone, camera, and other necessary features.
  9. Disable VPN or Firewall: Sometimes, VPNs or firewalls can interfere with the connection to the meeting platform. Temporarily disable them and see if the issue persists.
  10. Switch Devices: If possible, try joining the meeting from a different device to see if the problem is specific to one device.
  11. Reduce Bandwidth Usage: In cases of slow or unstable internet connections, ask participants to disable video or share video selectively to reduce bandwidth consumption.
  12. Update Drivers and Software: Ensure your operating system, audio drivers, and video drivers are up to date. Outdated drivers can cause compatibility issues with the Dryfta meeting platform.
  13. Contact Support: If none of the above steps resolve the issue, reach out to the platform's support team. They can provide personalized assistance and troubleshoot specific problems.
By following these troubleshooting tips, you can tackle many common problems encountered on the Dryfta meeting platform and have a more productive and seamless meeting experience.

Session Information

Jul 18, 2023 08:30 - Jul 18, 2023 16:15 (Europe/Amsterdam)
Venue: Hybrid Session (onsite/online)

Sub Sessions

An innovative approach to setting standards and testing claims of CEFR alignment across multiple languages

Oral Presentation [SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
Little research has been produced to evaluate the comparability of alignment claims of tests of different languages that target the same levels of the Common European Framework of Reference. This presentation describes an innovative procedure to put standard-setting results for reading tasks in English, French, Spanish, and German onto a common scale. Participants in this experimental study first carried out standard setting using a Modified Angoff method with reading tasks in English. They then split into three groups to carry out standard setting using the same method on reading tasks in one of the other three languages mentioned above. Exemplar tasks of reading comprehension collected by the Council of Europe and made freely available to facilitate best practice in linking exams to the CEFR were used. The standard-setting judgements were then analysed in a concurrent data matrix using multi-facet Rasch model (MFRM) analysis. While this experimental study is not intended as definitive evidence for any of the test developers' claims, it is a potentially useful method for programs that produce tests targeting the same CEFR levels in different languages to gather evidence to support those claims.
The Common European Framework of Reference has become a key resource in language education and assessment not only within Europe but internationally. A key goal of the Common European Framework of Reference was to provide "a common basis for the elaboration of language syllabuses, curriculum guidelines, examinations, textbooks, etc. across Europe" (Council of Europe, 2001). Since its launch, a great deal of work has been put into the methodology for aligning an exam to the CEFR, and some work into ensuring the comparability of claims of CEFR alignment of different foreign language exams targeting the same language. However, little research has been produced to facilitate the comparability of exams claiming alignment to the CEFR across different languages that target the same CEFR levels. This presentation presents evidence from an innovative pilot procedure to link standard-setting results for tasks across English, French, Spanish, and German to the CEFR, and to compare the results to the original claims of alignment with the CEFR from the test developers. Exemplar tasks of reading comprehension that were collected by the Council of Europe and made freely available to facilitate best practice in linking exams to the CEFR were used. Participants in this experimental study first carried out standard setting using a Modified Angoff method with reading tasks in English. They then split into three groups to carry out standard setting using the same method on reading tasks in one of the other three languages mentioned above. The English tasks thus acted as an anchor set, linking all judges. The standard-setting judgement data were then pooled and analysed in a concurrent data matrix using a multi-facet Rasch model (MFRM) analysis, which allowed the results to be placed onto a common scale. The common scale approach allows for a comparison of difficulty using the Rasch logit scale. Thus, test tasks for different languages that are posited to be at a B2 level of difficulty, for example, can be compared in terms of their difficulty estimation on a common scale by the four standard-setting panels (English, French, Spanish and German). The results show that the original CEFR levels posited by the different test developers generally held in practice. It is important to note that this was an experimental procedure designed to investigate the potential of this innovative methodology, and it is not intended as definitive evidence for any of the test developers' claims. It is presented as a potentially useful method for language education programs that need to ensure that assessments they produce targeting the same CEFR levels in different languages can be supported by standard-setting evidence. The methodology is feasible and sustainable for such programs with access to language education experts who can participate in standard-setting panels for more than one language.
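A minimal sketch may help readers picture the two ingredients of this design. The Python fragment below, using entirely invented numbers, first computes a Modified Angoff cut score from judges' item-level probability estimates, and then places a second panel's item-difficulty estimates onto the first panel's scale through a shared anchor set, here via a simple mean-sigma transformation. The authors' actual analysis used a concurrent multi-facet Rasch (MFRM) calibration, which this toy example does not reproduce; all names and figures are illustrative only.

import numpy as np

rng = np.random.default_rng(42)

# --- 1. Modified Angoff: each judge estimates, per item, the probability that a
#        minimally competent B2 reader answers correctly. The cut score is the
#        mean of the judges' summed probabilities. (Invented data.)
n_judges, n_items = 12, 30
angoff_judgements = rng.uniform(0.3, 0.9, size=(n_judges, n_items))
cut_score = angoff_judgements.sum(axis=1).mean()
print(f"Modified Angoff cut score: {cut_score:.1f} / {n_items} items")

# --- 2. Common-scale linking through an anchor set. Two panels produce
#        item-difficulty estimates on their own logit scales; ten items (the
#        English anchor tasks) are shared. A mean-sigma transformation of the
#        second panel's scale onto the first is a classical simplification of
#        what a concurrent MFRM calibration achieves in a single step.
true_difficulty = rng.normal(0, 1, size=40)
panel_a = true_difficulty[:30] + rng.normal(0, 0.15, 30)                 # items 0-29
panel_b = 0.8 * (true_difficulty[20:] + rng.normal(0, 0.15, 20)) + 0.5   # items 20-39, shifted scale

anchor_a, anchor_b = panel_a[20:30], panel_b[:10]   # the ten shared items
slope = anchor_a.std() / anchor_b.std()
intercept = anchor_a.mean() - slope * anchor_b.mean()
panel_b_linked = slope * panel_b + intercept        # panel B now on panel A's scale

print("anchor items, panel A  :", np.round(anchor_a, 2))
print("anchor items, B linked :", np.round(panel_b_linked[:10], 2))
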
Presenters
DJ
Jamie Dunlea
Manager & Senior Researcher Assessment Research Group, British Council
RS
Richard Spiby
Test Development Researcher, British Council
Co-authors
Margaret Malone
Director Of Assessment And Research, ACTFL
VF
Vincent Folny
Innovation And Prospective Manager, France Education International

Pros and cons of standardization of CEFR texts using NLP: the case of automatic text readability assessment

Oral Presentation [SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
Since the beginning of the 21st century, Automatic Readability Assessment (ARA) has undergone significant advances as a result of the merging of the traditional approach to readability assessment (from the field of education) with innovative technologies from Natural Language Processing (NLP). New readability models have consequently been developed by combining different language technologies; they are able to rely on a much richer set of text features and to make more reliable predictions (see Collins-Thompson, 2014; François, 2015; Vajjala, 2022 for syntheses of the recent advances). A by-product of the NLP approach is the need for corpora – with texts whose reading difficulty has been previously identified – of larger size than for classical readability formulas. Whereas the annotation of text difficulty in classic readability research was based on reader data, NLP-enabled readability has abandoned the gathering of reader data in favor of pedagogical texts whose difficulty is assessed by experts. As a result, recent readability models no longer directly model measures of reading comprehension – even though each of these, such as cloze tests or multiple-choice questions, has limitations – but instead model expert judgments about what is simpler or more complex. In this communication, we will discuss the influence of NLP techniques on the standardization of the difficulty scales used to assess the readability of texts, focusing on the case of the CEFR scale. We will argue that current practices not only implicitly transform the view of experts into standards once the algorithms have captured associations between text features and expert evaluations on a proficiency scale, but also run the risk of standardizing the point of view of a limited number of professionals, given that the corpora currently recognized and used in the field are generally very homogeneous (WeeBit by Vajjala and Meurers, 2012; Newsela by Xu et al., 2015; OneStopEnglish by Vajjala and Lucic, 2018; CLEAR by Crossley et al., 2022). Moreover, because of this corpus homogeneity, corpus-based readability formulas can overspecialize; in other words, they model only the properties observed in a limited sample that reflects a single point of view. Given the different perspectives embodied in the available corpora, we will therefore also discuss the risk of partial formulas – caused by incomplete text features or coverage – that do not generalize beyond a given corpus and lead to biased measures. Using such biased measures might negatively impact, for example, the selection of texts based on the CEFR scale. Our talk will first introduce the task of automatic readability assessment, then describe the characteristics of the available corpora as well as the text features typically used, and, finally, discuss their impact on a possible standardization of proficiency scales.
Bibliography:
Collins-Thompson, K. (2014). Computational assessment of text readability: A survey of current and future research. ITL-International Journal of Applied Linguistics, 165(2), 97-135.
Crossley, S., Heintz, A., Choi, J. S., Batchelor, J., Karimi, M., & Malatinszky, A. (2022). A large-scaled corpus for assessing text readability. Behavior Research Methods, 1-17.
François, T. (2015). When readability meets computational linguistics: a new paradigm in readability. Revue française de linguistique appliquée, (2), 79-97.
Vajjala, S. (2022). Trends, limitations and open challenges in automatic readability assessment research. Proceedings of LREC 2022 (in press).
Vajjala, S., & Lučić, I. (2018). OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification. In Proceedings of the thirteenth workshop on innovative use of NLP for building educational applications (pp. 297-304).
Vajjala, S., & Meurers, D. (2012). On improving the accuracy of readability classification using insights from second language acquisition. In Proceedings of the seventh workshop on building educational applications using NLP (pp. 163-173).
Xu, W., Callison-Burch, C., & Napoles, C. (2015). Problems in current text simplification research: New data can help. Transactions of the Association for Computational Linguistics, 3, 283-297.
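To make the kind of model under discussion concrete, the following sketch (Python with scikit-learn, a tiny invented corpus, and deliberately crude surface features) trains a classifier to reproduce expert CEFR labels. This is a generic illustration of corpus-based readability modelling, not any system described in the talk: whatever associations the model captures between features and the annotators' labels become, in effect, the operational standard, which is exactly the risk of homogeneous corpora raised above.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Invented mini-corpus: (text, CEFR label assigned by an expert annotator).
corpus = [
    ("The cat sat on the mat. It was warm.", "A1"),
    ("I went to the market and bought some fresh bread for my family.", "A2"),
    ("Although the weather was poor, the committee decided to proceed with the outdoor ceremony.", "B1"),
    ("The findings suggest that readability is shaped by factors beyond sentence length alone.", "B2"),
    ("Notwithstanding considerable methodological heterogeneity, the meta-analysis converges on a robust effect.", "C1"),
] * 8  # replicate so the toy model has something to fit

def features(text):
    """Crude surface features of the kind classical formulas rely on."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    mean_word_len = np.mean([len(w.strip(",.")) for w in words])
    mean_sent_len = len(words) / len(sentences)
    long_word_ratio = np.mean([len(w) > 6 for w in words])
    return [mean_word_len, mean_sent_len, long_word_ratio]

X = np.array([features(t) for t, _ in corpus])
y = np.array([label for _, label in corpus])

# The fitted model now encodes the annotators' view of "difficulty".
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

print(model.predict(np.array([features("He likes tea. She likes coffee.")])))
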
Presenters
TF
Thomas François
Co-authors
AP
Alice Pintard
RW
Rodrigo Wilkens
Post-doc, Université Catholique De Louvain

Comparing students’ proficiency levels across languages and contexts

Oral Presentation [SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
The large-scale empirical research project LANGUAGES aims to compare language teaching policies, methodologies and outcomes for secondary school learners (aged 13-15) of English and French in three European countries: England, France and Norway. English and French have different statuses in these countries: depending on the context, they are first, second or foreign languages. In addition, students' language proficiency in second and foreign languages is known to vary extensively across contexts (European Commission, 2012). 


In order to conduct meaningful cross-context comparisons, the LANGUAGES research team needed valid and reliable measures of language proficiency. Altogether, the selected tests needed to offer comparable data and be appropriate for adolescent learners of French and English in each country at different levels. In terms of the Common European Framework of Reference (CEFR) levels, we expected proficiency to vary from the pre-A1 level (Norwegian and English learners of French), via A, B and possibly C-levels (Norwegian and French learners of English), to native speaker competence (secondary school students of English in England and secondary school students of French in France). For feasibility reasons, the tests also needed to be relatively quick and easy to conduct.


This paper discusses the challenges of selecting tests that are often developed for specific contexts and levels, and of using them in a way that allows for cross-context comparisons. We will present our decision to use a standardised reading comprehension test aligned with the CEFR levels for English (the Evalang test) in combination with a new vocabulary test for beginner learners of French that we developed specifically for the LANGUAGES project in order to cover the pre-A1 level, building on existing resources and tests (Cobb, n.d.; Meara & Milton, 2003).


In the paper, we present findings from the first year of the project, with data gathered from students (n=1000; aged 13-15) in eight English classes in each country and eight French classes in England and Norway (classes n=40). The contribution outlines key characteristics of each national context and presents a comparison of the estimated proficiency levels achieved in English by students in each country and in French by students in England and Norway. Implications for the refinement of language attainment comparisons across national contexts are discussed, alongside the potential for such tests to inform language teaching policy and practice. 


References


Cobb, T. (n.d.). Lextutor. Website, lextutor.ca.
European Commission. (2012). The first European survey on language competences. Final report.
Meara, P. & Milton, J. (2003). X_Lex, The Swansea Levels Test. Newbury: Express.
Presenters
EV
Eva Thue Vold
Professor, University Of Oslo
SG
Simen Grung
PhD Student, University Of Oslo, Norway
SW
Stephanie Hazel Grønstad Wold
Researcher, University Of Oslo
Co-authors
Laura Molway
Departmental Lecturer In Modern Languages Education, University Of Oxford

From task development to quality assurance: Using an iterative model to develop and improve tasks

[SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
Developing comparable tasks and tests across multiple languages poses a challenge for language testing organizations, which must ensure that tasks are developed consistently across languages in order to ensure equity. However, the literature is limited in addressing the effectiveness of task developer training, and how that effectiveness is perceived by the quality assurance advisors who review tasks for fidelity to task specifications both within and across specific languages and cultures.
This presentation examines the perceptions of stakeholders at the two ends of the task development process: the task developers themselves and the multilingual quality assurance advisors who review and revise such tasks for a multi-language, large-scale assessment administered to over 100,000 learners annually. First, an analysis of a short questionnaire sent to speaking and writing task developers and quality assurance advisors shows the relative importance of tasks in development, review and revision. Second, short interviews with developers and quality assurance advisors were coded and reviewed. Finally, the results are compared by stakeholder group and across languages to determine how to improve task development.
The presentation will specifically focus on how to improve task development training across languages in a multi-language test development approach.
Developing a task-based language assessment (TBLTA) instrument begins with developing test specifications that result in tasks that articulate into a cohesive whole. In an assessment situation, such tasks must both individually and together present a solid picture of what the test taker can do with the language. While Long (2016), Winke (2014) and others have focused primarily on classroom teaching and formative assessment as settings for TBLTA, large-scale assessments can also mirror TBLT principles not only in task development but throughout the process, including task review. This presentation focuses on ways an existing, large-scale, multiple-language assessment incorporates principles of TBLTA from item development through quality assurance. The key to such development is ensuring a strong functional approach to test development and rating (Norris, 2016), with a focus on both tasks and the intersection of function, authenticity, reliability and practicality in developing a large-scale TBLTA.
However, there is often a gap between what different stakeholders in the task development, rating and rating adjudication process attend to, based on their differing roles and perspectives. While some research (for example, Pill & Smart, 2020; Attali, 2016; Kuiken & Vedder, 2014) has focused on test raters and their processes, rather less research (Rossi & Brunfaut, 2018) examines the role of task developers and the quality assurance advisors who review tasks for fidelity to task specifications. Determining what task developers attend to during task development and as they revise these tasks can shed light on the effectiveness of the item development process as well as the training procedures used. Moreover, quality assurance advisors, who review and suggest revisions for tasks, provide insight into how task developers adhere to task specifications and into the effectiveness of training.
This current study examines research conducted with both task developers and quality assurance advisors working with a multi-language, large-scale assessment administered to over 100,000 learners annually. First, an analysis of a short questionnaire sent to speaking and writing task developers (N=20) and quality assurance advisors (N=26) shows the relative importance of tasks in development, review and revision. Next, short interviews with developers and quality assurance advisors (N=10) were coded and reviewed. Finally, the results are compared by stakeholder group and across languages to determine what can be improved in task development.
The presentation will specifically focus on how to improve task development training across languages in a multi-language test development approach.
Kremmel, B., Eberharter, K., Holzknecht, F., & Konrad, E. (2018). Fostering language assessment literacy through teacher involvement in high-stakes test development. In Teacher involvement in high-stakes language testing (pp. 173-194). Springer, Cham.
Long, M. H. (2016). In defense of tasks and TBLT: Nonissues and real issues. Annual Review of Applied Linguistics, 36, 5-33.
Norris, J. M. (2016). Current uses for task-based language assessment. Annual Review of Applied Linguistics, 36, 230-244.
Rossi, O., & Brunfaut, T. (2018). Test item writers. The TESOL Encyclopedia of English Language Teaching, 1-7.
Winke, P. M. (2014). Formative, task-based oral assessments in an advanced Chinese-language class. In Technology-mediated TBLT (pp. 263-294). John Benjamins.
Presenters
Margaret Malone
Director Of Assessment And Research, ACTFL
CF
Caroline Favero
ACTFL
Co-authors
CZ
Celia Zamora
Director, Professional Learning And Certification , ACTFL

New perspectives for aligning examinations with frameworks of reference for languages: corpus approaches and new perspectives and challenges opened by automation and Artificial Intelligence

Oral Presentation [SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing, 17(1), 1-42.
Figueras, N., North, B., Takala, S., Verhelst, N., & Van Avermaet, P. (2005). Relating examinations to the Common European Framework: A manual. Language Testing, 22(3), 261-279. https://doi.org/10.1191/0265532205lt308oa
Folny, V. (2020). Adossement des épreuves d'expression orale et écrite du Test de connaissance du français (TCF) sur les Niveaux de compétences linguistiques canadiens (NCLC) et correspondance avec les niveaux du Cadre européen commun de référence pour les langues (CECRL). Canadian Journal of Applied Linguistics, 23(2), 20-72. https://doi.org/10.37213/cjal.2020.30437
Jiao, H., & Lissitz, R. W. (Eds.). (2020). Application of artificial intelligence to assessment. Information Age Publishing.
Klebanov, B. B., & Madnani, N. (2022). Automated essay scoring. Morgan & Claypool Publishers.
Yan, D., Rupp, A. A., & Foltz, P. W. (2020). Handbook of automated scoring: Theory in practice. CRC Press Taylor & Francis Group.
Zieky, M. J., Perie, M., & Livingston, S. A. (2008). Cutscores: A manual for setting standards of performance on educational and occupational tests. Educational Testing Service.


At the turn of the millennium, the language testing field saw the publication of diverse frameworks of reference for languages (CEFR, CLB/NCLC, STANAG…), Reference Level Descriptions, and illustrations of the levels of language proficiency. This clear professionalisation was desired by the field itself. The use of these frameworks has facilitated the alignment of examinations with diverse proficiency levels. In Europe, standard setting and benchmarking have become a regular activity for professional test developers and are clearly part of their test validation agenda. It is also a way to rationalise the decisions made about people.
While key publications have helped to promote good practice and improvements in standard-setting procedures, questions remain open concerning the procedures and how to reach a significant level of quality: the use of one or several standard-setting methods, the reproducibility of findings, and ways to optimize efficiency (number of panellists, number of items or productions to be reviewed, remote work, reliability analysis of panellists' ratings…). The last decade has not seen the emergence of new procedures or dramatic improvements in this area.
During the same decade, dramatic innovations emerged elsewhere, mainly on the technological side: big data collection, artificial intelligence procedures (machine learning, deep learning…), language models (GPT-3, BERT, LaMDA, the BigScience project...), and the externalisation of data storage (cloud). This context is challenging well-established language test providers and their practices. These innovations are interpreted by some as a threat coming from "outside" and by others as a way to renew the field.
France Education International (FEI), a French public agency ("opérateur"), is in charge of the TCF (Test de connaissance du français). This test has a 20-year history, is administered in 200 countries and is taken by 200,000 candidates annually. The TCF is used mainly for admission to French universities and for migration purposes in France and Canada. Two years ago, FEI took the decision to "modernize" and partially automate the rating of the writing section of this test. This project is part of the modernisation of the French administration, and FEI received public funding to support a research agenda.
During this presentation, we will explain how FEI is planning to use new technologies and artificial intelligence procedures to support the assessment of candidates' writing and to assist raters in their work. We will analyse the pros and limits of such a procedure. As FEI has produced (for the first time for French) an annotated corpus for training, assessment, and research, we will explain in which ways working with a corpus opens new avenues for improving the alignment of the TCF with CEFR levels. We will analyse the benefits of, and the emerging challenges in, dealing with this new facet in our test validity argument.
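The abstract does not describe FEI's system in detail, so the sketch below is only a generic, hypothetical illustration of how partially automated rating of writing can assist rather than replace raters: a model trained on an annotated corpus produces a machine score, and scripts where the machine and the human rater diverge by a full level or more are flagged for adjudication by a senior rater. All data, features and thresholds here are invented.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Invented stand-in for an annotated writing corpus: one feature vector per
# script (e.g. length, lexical diversity, error-model output) and the CEFR
# level awarded by trained raters, coded numerically (A1=1 ... C2=6).
n_scripts = 400
X = rng.normal(size=(n_scripts, 5))
human_level = np.clip(np.round(3 + X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.4, n_scripts)), 1, 6)

# Train on part of the annotated corpus.
model = GradientBoostingRegressor().fit(X[:300], human_level[:300])

# Operational use as a second opinion: score new scripts, keep the human
# rating, and flag scripts where model and rater disagree by a full level or
# more so that a senior rater adjudicates.
machine = model.predict(X[300:])
human = human_level[300:]
flagged = np.abs(machine - human) >= 1.0
print(f"{flagged.sum()} of {len(flagged)} scripts flagged for adjudication")
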
Presenters
VF
Vincent Folny
Innovation And Prospective Manager, France Education International

The ILR Skill Level Descriptions for the 21st century: Comparability across tasks, tests, skills, examinees, languages, and organizations

Oral Presentation [SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
In 2021, the Interagency Language Roundtable revised its Skill Level Descriptions for Proficiency to focus on ability rather than traits (Purpura, 2016) and to move toward measuring effective communication, focusing more on task, meaning, and contextual appropriateness. Efforts were made to remove measures that relied on nativism, linguistic essentialism, or membership in certain social groups (such as "well-educated speakers"). The revisions were an iterative process, including level progression within a skill, comparability of a level across skills, and applicability across languages, with a final review from stakeholders. A study was conducted to measure the reliability of the revisions, which included 5 organizations, 4 languages, 40 participants, 120 speaking tests, and over 500 ratings. Results indicated that the raters evaluated the tests reliably and had more confidence and clarity in the rating process. The FBI implemented the revisions, updating the test protocol for better alignment of test tasks to the ILR levels and comparability of tasks among languages. The FBI is focused on eliminating score error coming from sources of rater bias and linguistic elitism, and on producing standards and testing protocols that are valid and reliable across tasks, tests, skills, examinees, languages, and government organizations.
Each US Government organization uses the 1985 Interagency Language Roundtable (ILR) Skill Level Descriptions (SLDs) to develop its own language assessments. Post 9/11, the demand for qualified language personnel increased and testing practices shifted from being somewhat compartmentalized to increasingly collaborative. Changes were needed in the ILR SLDs, which were geared toward second language learners rather than heritage and native-speaker examinees. Moreover, government testing programs wanted to integrate advances in language acquisition and assessment research into the ILR SLDs and to address outdated, unclear, and missing concepts.


This presentation will detail how a US Government committee took the lessons learned from the norming sessions, discussions, and shared testing resources to update the ILR SLDs for Proficiency between 2014 and 2021. The standards were restructured to focus on ability rather than traits (Purpura, 2016) and move away from primarily measuring correctness toward measuring effective communication, focusing more on task, meaning, and contextual appropriateness. The revisions also provided an opportunity to embrace the diversity among government examinees and reduce bias and marginalization as a "fundamental institutional change" (Rosa & Flores, 2021). Explicit efforts were made to remove measures that relied on nativism, linguistic essentialism, or membership in certain social groups (such as "well-educated speakers"). The revisions were an iterative process, including level progression within a skill, comparability of a level across skills, and applicability across languages, with a final review from stakeholders.


The committee collected data from a pilot study to measure the reliability of the revisions, which included 5 organizations, 4 languages, 40 participants, 120 speaking tests, and over 500 ratings. Results indicated that the raters were able to evaluate the tests reliably and had more confidence and clarity in the rating process. Study data and documentation of support for the revisions contributed to a validation argument for the ILR SLDs (Knoch & Chapelle, 2018).
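The abstract does not specify which reliability statistics were used in the pilot, so the following sketch simply illustrates, on invented paired ratings, two indices commonly reported for such studies: exact and adjacent agreement, and quadratically weighted kappa (Python with scikit-learn).

import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)

# Invented paired ratings: two raters scoring the same 120 speaking tests on
# the ILR 0-5 scale (plus-levels ignored for simplicity).
true_ability = rng.integers(0, 6, size=120)
rater_1 = np.clip(true_ability + rng.choice([-1, 0, 0, 0, 1], size=120), 0, 5)
rater_2 = np.clip(true_ability + rng.choice([-1, 0, 0, 0, 1], size=120), 0, 5)

exact = np.mean(rater_1 == rater_2)              # identical level awarded
adjacent = np.mean(np.abs(rater_1 - rater_2) <= 1)  # within one level
qwk = cohen_kappa_score(rater_1, rater_2, weights="quadratic")

print(f"exact agreement    : {exact:.2f}")
print(f"adjacent agreement : {adjacent:.2f}")
print(f"weighted kappa     : {qwk:.2f}")
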


The FBI implemented the 2021 ILR SLDs in its Speaking Proficiency Test program, including revisions to the test protocol for better alignment of test tasks to the ILR levels and comparability of tasks among languages. Additionally, the FBI revised the protocol for measuring pragmatic skills through situation tasks for consistency across languages and cultures. Considerations for non-American/English cultural expectations were prioritized and made explicit in the language-specific materials that were developed as a result. The FBI is focused on eliminating score error coming from sources of rater bias and linguistic elitism, and on producing standards and testing protocols that are valid and reliable across tasks, tests, skills, examinees, languages, and government organizations.




Knoch, U., & Chapelle, C. A. (2018). Validation of rating processes within an argument-based framework. Language Testing, 35(4), 477-499.


Purpura, J. E. (2016). Assessing meaning. Shohamy et al. (eds.), Language Testing and Assessment, Encyclopedia of Language and Education, 1-26.


Rosa, J., & Flores, N. (2021). Decolonization, language, and race in Applied Linguistics and social justice. Applied Linguistics, 42(6), 1162–1167. https://doi.org/10.1093/applin/amab062
Presenters
RB
Rachel Brooks
Language Testing Specialist, Federal Bureau Of Investigation
TC
Tanner Call
Applied Linguist, Federal Bureau Of Investigation

Multilingual Standards in Language Assessment? Perspectives from Participatory Engagement

Oral Presentation [SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
Educational and professional assessment of knowledge and skills has been strongly associated with 'standards'. The use of standards or proficiency benchmarks in language assessment in test-driven systems, broadly speaking, presupposes at least a degree of 'buying-in' on the part of test-takers and test-users (e.g. university admissions tutors). This buying-in is, inter alia, premised on (a) the perceived usefulness of the standards involved, and (b) the assumption that the standards have universal and stable validity (however defined) within the domain concerned. The growing research on flexible and fluid use of languages in multiethnic-multilingual social interaction suggests that such putative qualities of universality and stability should not be assumed. In this presentation I will discuss relevant questions triggered by the notions of plurilingualism and plurilingual mediation from the CEFR, and also by translingual community interpreting. I will attend to the highly contingent, situated and unpredictable nature of participant uptake in multilingual interactional language use. Externally introduced standards are unlikely to be able to account for the complex and diverse ways in which participants initiate and respond to situated multilingual communication. I will conclude with some thoughts on possible ways of handling contingency and fluidity in assessing multilingual interactional communication.
Educational and professional assessment of knowledge and skills has been strongly associated with 'standards'. The use of standards or proficiency benchmarks in language assessment in test-driven systems, broadly speaking, presupposes at least a degree of 'buying-in' on the part of test-takers and test-users (e.g. university admissions tutors). This buying-in is premised on (a) the perceived usefulness of the standards involved, and (b) the assumption that the standards have universal and stable validity (however defined) within the domain concerned. The marketing value of assessing language with reference to some undisputed standards, particularly in relation to the English language (as an additional/second language), has in no small measure been associated with the promotion of Standard English (e.g. Quirk, 1990). The description and projection of Standard English have, however, been largely exemplified by representations of formal lexico-grammatical features at sentence or clause level in monologic and dialogic texts. There is also an eliding of citational forms of high-status pronunciation conventions (e.g. Received Pronunciation, General American Pronunciation or Educated Australian) with Standard English. Furthermore, it is now becoming increasingly clear that the ideal-type English language proficiency, as operationalized in many large-scale international English language tests, is referenced to a narrow seam of language used by middle-class speakers in public and professional contexts (Leung, in press). The public face of language proficiency tends not to include the language of conflict or intimacy.


The growing research on flexible and fluid use of languages in multiethnic-multilingual social interaction suggests that such putative qualities of universality and stability should not be assumed. In this presentation I will discuss relevant questions triggered by the notions of plurilingualism and plurilingual mediation from the CEFR, and also by aspects of language use in translingual community interpreting. I will pay particular attention to the highly contingent, situated and unpredictable nature of participant uptake in multilingual interactional language use that is implicated in both plurilingual mediation and translingual community interpretation. The flexible use of speakers' multilingual repertoires for real-life purposes to facilitate peer-to-peer communication involves more than the invocation of language knowledge; it also calls forth an exercise of sensitivity and sensibility in respect of, inter alia, access to message content and language support (for others), and in-group face maintenance. Such online decision-making in situ far exceeds issues of language knowledge and skills, and established descriptions of social conventions of language use. The interactional flux may call for communicative 'one-offs'. The Hymesian dictum of factum valet comes to mind. Externally introduced language standards are unlikely to be able to account for the complex and diverse ways in which participants initiate and respond to situated multilingual communication. I will conclude with some thoughts on possible ways of handling contingency and fluidity in assessing multilingual interactional communication.


Leung, C. (In press). Language Proficiency: From Description To Prescription And Back? Educational Linguistics. 
Quirk, R. (1990). Language varieties and standard language. English Today, 6(1), 3-10. 




Presenters
CL
Constant Leung
Professor Of Educational Linguistics, King's College London

Different languages, different standards?

[SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
Achieving a common understanding of standards, level descriptors, curriculum objectives, marking schemes or assessment criteria amongst professionals of different languages has long been an issue which has made training necessary prior to the implementation of any new curriculum, programme or examination. There is quite a lot of research available on the impact of training (referred to variously as familiarization, standardization, cloning or norming) on the interpretation of standardized descriptors within the same target language by users from different contexts or with different backgrounds. There is little research, however, on multilingual contexts where teachers and testers of different target languages are asked to use the same standards, descriptors or assessment criteria.
This session addresses this important issue by reporting on two studies which focused on whether teachers and assessors of different languages interpret, and show a common understanding and use of, the so-called "common" standards. The findings of these studies have clear implications for the application of the CEFR or other frameworks and standards across languages. These implications will be presented and discussed.
Taking into account the impact of the CEFR (2001) worldwide in language education, and considering how widely the Manual for Relating Examinations to the CEFR (2009) has been used in linking assessments in multiple languages to the CEFR, it is surprising that few research reports are available on the implementation of these documents in languages other than English (Deygers et al., 2018; Tschirner, 2012).
The publication of the CEFR Companion Volume (Council of Europe 2020) has caused quite a stir in the field of language education and prompted renewed interest in the content and applicability of the CEFR, thus opening new ground for further research into the applicability of common standards in different and/or multilingual contexts and scenarios. This should be seen as an opportunity not only to encourage research into the use of a standard like the CEFR in languages other than English, but also to carry out comparative studies across languages to find out whether professionals in those languages interpret the scales, their descriptors, and even the recommendations in a similar manner.
The Council of Europe itself, in the Foreword to the CEFR Companion Volume, highlights that the
"CEFR is intended to promote quality plurilingual education, facilitate greater social mobility and stimulate reflection and exchange between language professionals for curriculum development and teacher education. Furthermore the CEFR provides a metalanguage for discussing the complexity of language proficiency for all citizens in a multilingual and intercultural Europe…" (2020:11)
More than one hundred professionals from different countries in Europe and also from the USA and Japan attended the EALTA-UKALTA symposium hosted by the British Council in London in February 2020 (O'Dwyer, Hunke and Schmidt 2020; Little and Figueras 2022), which focused on the potential impact of the CEFR CV on language assessment and on its implications for language education in general. Discussion at this event suggested possible ways to increase transparency and collaboration in aligning different components of language education to the CEFR in different contexts, and pointed to the need for continued work in supporting alignment(s) with the greatly expanded descriptive scheme of the CEFR CV in multilingual contexts. Although this will surely evidence differences across languages in terms of values and principles regarding education, as different contexts may attach different importance to the issues involved in designing or improving curricula and assessments, it will also provide a richer picture for further study in the field of language education.




British Council, UKALTA, EALTA and ALTE (2022). Aligning language education with the CEFR: A handbook. Available at http://www.ealta.eu.org/documents/resources/CEFR%20alignment%20handbook.pdf
Deygers, B., Van Gorp, K. & Demeester, T. (2018). The B2 Level and the Dream of a Common Standard. Language Assessment Quarterly. DOI: 10.1080/15434303.2017.1421955
Little, D. and Figueras, N. (eds) (2022). Reflecting on the Common European Framework of Reference for Languages and its Companion Volume. Bristol: Multilingual Matters.
O'Dwyer, F., Hunke, M., and Schmidt, G. (2020). The EALTA UKALTA 'Roadmap' conference. Available at https://cefrjapan.net/images/PDF/Newsletter/CEFRJournal-vol2.pdf#page=91
Tschirner, E. (ed.) (2012). Aligning frameworks of reference in language testing: The ACTFL Proficiency Guidelines and the Common European Framework of Reference. Tübingen: Stauffenburg.
Presenters

Using the Handbook for aligning language education with the CEFR

Oral Presentation [SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
The Handbook, published in April 2022 by leading players in the field of language teaching and assessment (the British Council, UKALTA, EALTA and ALTE), is available online. It provides practical and accessible guidance to assist with the process of CEFR alignment in a variety of language education contexts. It has been prepared for those who are teaching, testing and developing materials in language education, as well as for stakeholders concerned with education policy matters and decision making. Feedback on uses of and views on the Handbook is currently being collected so that it can be incorporated in a future revised edition. In this session the editors of the Handbook will report on documented projects and initiatives regarding its implementation. They are particularly interested in issues related to the impact of multilingualism on language education, and in how uses of the Handbook in different language contexts worldwide can contribute to a better understanding of the added value(s) and the challenges that a multilingual perspective presents.
We will present an overview of the rationale behind the development of the Handbook, outline its contents, and discuss the plans for future editions of the document, calling for participation by researchers in the field.
Taking into account the impact of the CEFR (2001) worldwide in language education, and considering how widely the Manual for Relating Examinations to the CEFR (2009) has been used in linking assessments in multiple languages to the CEFR, it is surprising that very few research reports are available on the implementation of these documents in languages other than English (Deygers et al., 2018; Tschirner, 2012).
The publication of the CEFR Companion Volume with new descriptors (Council of Europe 2020) has caused quite a stir in the field of language education and prompted renewed interest in the content and applicability of the CEFR, thus opening new ground for further research into the applicability of common standards in different and/or multilingual contexts and scenarios. This should be seen as an opportunity not only to encourage research into the use of a standard like the CEFR in languages other than English, but also to carry out comparative studies across languages to find out whether professionals in those languages interpret the scales, their descriptors, and even the recommendations in a similar manner.
The Council of Europe itself, in the Foreword to the CEFR Companion Volume, highlights that the
"CEFR is intended to promote quality plurilingual education, facilitate greater social mobility and stimulate reflection and exchange between language professionals for curriculum development and teacher education. Furthermore the CEFR provides a metalanguage for discussing the complexity of language proficiency for all citizens in a multilingual and intercultural Europe…" (2020:11)
More than one hundred professionals from different countries in Europe and also from the USA and Japan attended the EALTA-UKALTA symposium hosted by the British Council in London in February 2020 (O'Dwyer, Hunke and Schmidt 2020; Little and Figueras 2022), which focused on the potential impact of the CEFR CV on language assessment and on its implications for language education in general. Discussion at this event suggested possible ways to increase transparency and collaboration in aligning different components of language education to the CEFR in different contexts, and pointed to the need for continued work in supporting alignment(s) with the greatly expanded descriptive scheme of the CEFR CV in multilingual contexts. Although this will surely evidence differences across languages in terms of values and principles regarding education, as different contexts may attach different importance to the issues involved in designing or improving curricula and assessments, it will also provide a richer picture for further study in the field of language education.


British Council, UKALTA, EALTA and ALTE (2022). Aligning language education with the CEFR: A handbook. Available at http://www.ealta.eu.org/documents/resources/CEFR%20alignment%20handbook.pdf
Deygers, B., Van Gorp, K. & Demeester, T. (2018). The B2 Level and the Dream of a Common Standard. Language Assessment Quarterly. DOI: 10.1080/15434303.2017.1421955
Little, D. and Figueras, N. (eds) (2022). Reflecting on the Common European Framework of Reference for Languages and its Companion Volume. Bristol: Multilingual Matters.
O'Dwyer, F., Hunke, M., and Schmidt, G. (2020). The EALTA UKALTA 'Roadmap' conference. Available at https://cefrjapan.net/images/PDF/Newsletter/CEFRJournal-vol2.pdf#page=91
Tschirner, E. (ed.) (2012). Aligning frameworks of reference in language testing: The ACTFL Proficiency Guidelines and the Common European Framework of Reference. Tübingen: Stauffenburg.
Presenters
BO
Barry O'Sullivan
British Council
Co-authors
NF
Neus Figueras
Retired, University Of Barcelona

Rating across multiple languages: Perceptions of training and operational rating

Oral Presentation [SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
This presentation examines the results of both surveys and interviews with raters of a multi-language, large-scale assessment. By examining rater perceptions both widely, via a questionnaire, and narrowly, via individual interviews, the study explores the questions: do raters using the same scale have similar or different experiences with rater training and rating across languages? How can such information support training, operational testing and even changes made to rating scales?
Over 60 raters using the same scale applied to constructed responses (speaking and writing) in 11 languages completed a survey about their perceptions of rater training and operational rating, including the aspects of rating to which they attended (Pill & Smart, 2020) and how the training prepared them for operational rating. A subset of respondents also participated in an interview to more deeply investigate how these issues intersect and can inform improvements. The study examines similarities and differences among languages.


A great deal of research (for example, Pill & Smart, 2020; Attali, 2016; Kuiken & Vedder, 2014) examines language test rater behavior and consistency, while complementary research (for example, Davis, 2016; Sato, 2014) focuses on rater perceptions of their training activities, their operational rating processes, and how the two interrelate. However, most of this research focuses on raters working in a single language, often English. Far less research investigates rater perceptions of rater training and operational rating processes, including both challenges and successes, for raters working with a shared scale and a similar test across languages. Borger (2019) and Harsch & Malone (2020) also suggest that raters and other stakeholders can provide critical information not only about how a scale or scales are applied but also about the difficulties in applying scales. Thus, this study asks: do raters using the same scale have similar or different experiences with rater training and rating across languages? How can such information support training, operational testing and even changes made to rating scales?
This current study examines research conducted with raters of constructed response tasks from a multi-language, large-scale assessment administered to over 100,000 learners annually in 11 languages (Chinese (Mandarin), English, French, German, Italian, Japanese, Korean, Portuguese, Russian and Spanish). The study first analyzes the results of a short questionnaire sent to speaking and writing raters (N=65) to examine their perceptions of rater training as well as what they attend to in operational rating, including task level and task function. The study also analyzes the outcomes of short interviews with a sub-set of raters (N=25) to shed light on the challenges of rating both generally according to the scale and specifically applying the training and the scale to different languages. 
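As an illustration of how such questionnaire data can be compared across languages, the sketch below (Python with pandas, entirely invented responses, languages and column names) summarizes Likert-scale ratings of training and scale clarity by language; the study's actual instruments and analyses may differ.

import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Invented questionnaire data: one row per rater, Likert-scale (1-5) responses
# about how well training prepared them and how clear the rating scale is.
languages = ["English", "French", "Spanish", "Japanese"]
df = pd.DataFrame({
    "language": rng.choice(languages, size=65),
    "training_prepared_me": rng.integers(1, 6, size=65),
    "scale_is_clear": rng.integers(1, 6, size=65),
})

# Compare perceptions across languages: counts and mean ratings per language.
summary = df.groupby("language").agg(
    n=("language", "size"),
    training_mean=("training_prepared_me", "mean"),
    clarity_mean=("scale_is_clear", "mean"),
).round(2)
print(summary)
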
The presentation will specifically focus on how rater perceptions can inform and improve rater training approaches, exercises and activities. It will also show connections between rater training and operational rating. Additionally, the presentation will identify ways that rater perceptions and recommendations across languages can help rating approaches globally and within each language. 
Beyond applications to rating and rater training, the study and its results will provide an opportunity to reflect on how raters interpret the scale and how it can be improved for clarity and accessibility across languages.


Attali, Y. (2016). A comparison of newly-trained and experienced raters on a standardized writing assessment. Language Testing, 33(1), 99-115.
Borger, L. (2019). Assessing interactional skills in a paired speaking test: Raters' interpretation of the construct. Apples-Journal of Applied Language Studies, 13(1), 151-174.
Davis, L. (2016). The influence of training and experience on rater performance in scoring spoken language. Language Testing, 33(1), 117-135.
Harsch, C., & Malone, M. E. (2020). Language proficiency frameworks and scales. In The Routledge handbook of second language acquisition and language testing (pp. 33-44). Routledge.
Kuiken, F., & Vedder, I. (2014). Raters' decisions, rating procedures and rating scales. Language testing, 31(3), 279-284.
Pill, J., & Smart, C. (2020). Raters: Behavior and training. In The Routledge Handbook of Second Language Acquisition and Language Testing (pp. 135-144). Routledge.
Sato, M. (2014). Exploring the construct of interactional oral fluency: Second language acquisition and language testing approaches. System.
Presenters
Margaret Malone
Director Of Assessment And Research, ACTFL
CZ
Celia Zamora
Director, Professional Learning And Certification , ACTFL

“Please use CEFR levels (A1-C2) to describe your language skills” - Insights into the role of language certificates and assessment standards in recruitment processes

Oral Presentation [SYMP40] Language futures: tensions and synergies in the use of standards in language assessment in a multi/plurilingual world 08:30 AM - 04:15 PM (Europe/Amsterdam) 2023/07/18 06:30:00 UTC - 2023/07/18 14:15:00 UTC
Evaluation standards for language proficiency are used in different settings, ranging from educational to professional contexts. For access to universities, for example, it is quite common in the European context to refer to the levels of the CEFR (Deygers et al. 2018; Harsch 2018). When entering the labour market after finishing education, however, these levels play only a marginal role: employers do not really draw on them in recruitment processes (Cernicova-Buca 2020). The limited use of these formal classifications in such gatekeeping encounters can be explained by the effort needed for HR managers to understand and effectively apply those standards (Beadle et al. 2015). It is not clear, though, which standards and forms of assessment really count for recruiters as gatekeepers in their decision-making processes. This is particularly true when it comes to plurilingual assessment and to assessing the linguistic competence of plurilingual individuals, an aspect that has nevertheless been put forward in the CEFR Companion Volume (Council of Europe 2020).
The study to be presented within this symposium addresses the following questions:
Do recruiters in Austria and France refer to official assessment standards like the CEFR (or other frameworks) when they need to evaluate linguistic skills in recruitment processes? Do they take certification or test results following these standards into consideration when making their decisions? Do they find frameworks like the CEFR helpful in their decision-making processes? 
The study draws on survey data in which recruiters in France and Austria were asked about their views of the CEFR and its role in language assessment for professional purposes. It is part of a PhD project dealing with recruiters' attitudes towards, representations of, and conceptualizations of linguistic competence or, more precisely, plurilingual competence. The presentation therefore also aims to shed light on the potential and the challenges of using evaluation standards for assessing language proficiency in professional settings in Europe.


Beadle, Shane, Patricia Vale, Martin Humburg, and Richard Smith. 2015. Study on foreign language proficiency and employability. Final report.
Cernicova-Buca, Mariana. 2020. 'Communication and Linguistic Competences for Middle Management'. Buletinul Stiintific al Universitatii Politehnica Din Timisoara, Seria Limbi  Moderne 19, 5–14.
Council of Europe, ed. 2020. Common European Framework of Reference for Languages: Learning, Teaching, Assessment ; Companion Volume. Strasbourg: Council of Europe Publishing.
Deygers, Bart, Beate Zeidler, Dina Vilcu, and Cecilie Hamnes Carlsen. 2018. 'One Framework to Unite Them All? Use of the CEFR in European University Entrance Policies'. Language Assessment Quarterly 15 (1): 3–15. 
Harsch, Claudia. 2018. 'How Suitable Is the CEFR for Setting University Entrance Standards?' Language Assessment Quarterly 15 (1): 102–8. 


Presenters
MZ
Magdalena Zehetgruber
Teaching And Research Associate (prae Doc), Vienna University Of Economics And Business

