The Common European Framework of Reference (CEFR) has become a key resource in language education and assessment, not only within Europe but internationally. One of its central aims was to provide "a common basis for the elaboration of language syllabuses, curriculum guidelines, examinations, textbooks, etc. across Europe" (Council of Europe, 2001). Since its launch, a great deal of work has gone into the methodology for aligning an exam to the CEFR. Some work has also gone into ensuring that claims of CEFR alignment are comparable across different exams of the same language. However, little research has addressed the comparability of exams in different languages that claim alignment to the same CEFR levels. This presentation reports evidence from an innovative pilot procedure. The procedure links standard-setting results for tasks in English, French, Spanish and German to the CEFR and compares those results with the test developers' original claims of CEFR alignment. The study used exemplar reading comprehension tasks that the Council of Europe collected and made freely available to support best practice in linking exams to the CEFR. Participants first carried out standard setting on English reading tasks using a Modified Angoff method. They then split into four groups to carry out standard setting, with the same method, on reading tasks in one of the other three languages. The English tasks thus served as an anchor set linking all judges. The standard-setting judgements were then pooled into a single concurrent data matrix and analysed with a multi-facet Rasch model (MFRM), which placed all the results on a common scale. Because this scale is expressed in Rasch logits, it supports direct comparisons of task difficulty.
Thus, test tasks in different languages that are claimed to be at B2, for example, can be compared in terms of the difficulty estimates that the four standard-setting panels (English, French, Spanish and German) yield on the common scale. The results show that the CEFR levels originally claimed by the different test developers generally held in practice. This was an experimental procedure designed to investigate the potential of a new methodology. It is not intended as definitive evidence for any of the test developers' claims. Instead, it is presented as a potentially useful method for language education programs that need to ensure that the assessments they produce for the same CEFR levels in different languages are supported by standard-setting evidence. The methodology is feasible and sustainable for programs with access to language education experts who can serve on standard-setting panels in more than one language.
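The anchor-based linking logic can be illustrated with a small Python sketch. It assumes that each Modified Angoff judgement is a probability that a borderline candidate at the target level answers an item correctly. In place of the multi-facet Rasch model used in the study, it uses a deliberately simplified additive model on the logit scale, logit(p) = theta[judge] - delta[item], fitted by alternating least squares. It is not an MFRM analysis. The function names (`angoff_cut_score`, `fit_common_scale`), the judge and task labels, and all numbers are hypothetical. The English anchor tasks, which every judge rated, are what tie the language panels to one scale. The origin is fixed at the mean anchor difficulty.

```python
import math
from collections import defaultdict


def logit(p):
    """Map an Angoff probability (strictly between 0 and 1) to logits."""
    return math.log(p / (1.0 - p))


def angoff_cut_score(judgements, items):
    """Modified Angoff raw-score cut: sum over items of the mean judged
    probability that a borderline candidate answers the item correctly."""
    total = 0.0
    for item in items:
        ps = [p for (_, it), p in judgements.items() if it == item]
        total += sum(ps) / len(ps)
    return total


def fit_common_scale(judgements, anchor_items, max_iter=2000, tol=1e-10):
    """Concurrently calibrate judges and items on one logit scale.

    Simplified stand-in for an MFRM (an assumption, not the study's model):
        logit(p[judge, item]) ~= theta[judge] - delta[item]
    fitted by alternating least squares over the sparse judge x item matrix.
    Anchor items (rated by every panel) link the panels; the origin is
    fixed so that the anchor items have mean difficulty 0.
    """
    by_judge, by_item = defaultdict(list), defaultdict(list)
    for (judge, item), p in judgements.items():
        y = logit(p)
        by_judge[judge].append((item, y))
        by_item[item].append((judge, y))

    theta = {j: 0.0 for j in by_judge}
    delta = {i: 0.0 for i in by_item}
    for _ in range(max_iter):
        # Judge step: least-squares leniency given current item difficulties.
        for j, obs in by_judge.items():
            theta[j] = sum(y + delta[i] for i, y in obs) / len(obs)
        # Item step: least-squares difficulty given current judge leniencies.
        change = 0.0
        for i, obs in by_item.items():
            new = sum(theta[j] - y for j, y in obs) / len(obs)
            change = max(change, abs(new - delta[i]))
            delta[i] = new
        # Re-anchor: shifting both facets equally leaves every fitted value unchanged.
        shift = sum(delta[i] for i in anchor_items) / len(anchor_items)
        for i in delta:
            delta[i] -= shift
        for j in theta:
            theta[j] -= shift
        if change < tol:
            break
    return theta, delta


if __name__ == "__main__":
    # Hypothetical toy judgements (invented numbers): every judge rates the
    # English anchors E1 and E2, then one task in their own panel's language.
    toy = {
        ("fr1", "E1"): 0.70, ("fr1", "E2"): 0.55, ("fr1", "FR1"): 0.60,
        ("es1", "E1"): 0.65, ("es1", "E2"): 0.50, ("es1", "ES1"): 0.45,
        ("de1", "E1"): 0.75, ("de1", "E2"): 0.60, ("de1", "DE1"): 0.50,
    }
    _, difficulty = fit_common_scale(toy, anchor_items=["E1", "E2"])
    for task, d in sorted(difficulty.items(), key=lambda kv: kv[1]):
        print(f"{task}: {d:+.2f} logits")
```

Because the panels overlap only on the anchor tasks, a French task and a German task can still be compared directly through their `delta` values. This is the kind of cross-language comparison the procedure aims at. A real analysis would use an actual Rasch program and would examine fit statistics and judge consistency, which this sketch omits.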