Understanding how research topics in a discipline vary across time periods and geographical regions is important because it gives us a bird's-eye view of the field's progress and its present state. However, recent quantitative studies of research trends in second language acquisition (SLA; e.g., Zhang, 2020) have rarely analysed the actual text of research articles, which arguably best reflects what each article is about.
I therefore examined research trends in SLA, and how they vary between regions, using a topic model, a machine learning technique that automatically identifies 'topics' in a corpus. In a topic model, a topic is characterised by a group of words that tend to co-occur. For instance, a paper in which the word aspect occurs frequently is also likely to contain frequent occurrences of words such as progressive, perfect, and tense. A topic model identifies such word groups in a corpus and quantifies the proportion of each topic in each text.
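For readers unfamiliar with the technique, the sketch below fits latent Dirichlet allocation (LDA), the basic topic model, by collapsed Gibbs sampling on a hypothetical four-document corpus. It is a simpler stand-in, not the structural topic model used in this study (which additionally lets topic prevalence vary with document-level covariates); the function name `lda_gibbs`, the priors `alpha` and `beta`, the iteration count, and the toy documents are all assumptions for illustration. Each row of the returned `theta` is one text's topic proportions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA model by collapsed Gibbs sampling (hypothetical helper).

    docs is a list of token lists. Returns (theta, top_words), where
    theta[d][k] is the estimated proportion of topic k in document d and
    top_words[k] lists the four words most often assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]               # document-topic counts
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                                # tokens per topic

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of the counts...
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ...resample its topic from the full conditional...
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # ...and put it back under the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed per-document topic proportions (each row sums to 1).
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [sorted(vocab, key=lambda w, k=k: -n_kw[k][w])[:4] for k in range(n_topics)]
    return theta, top_words


# Hypothetical toy corpus: two tense/aspect "papers" and two phonology "papers".
TOY_DOCS = [
    "aspect progressive perfect tense aspect tense".split(),
    "tense aspect perfect progressive aspect".split(),
    "consonant vowel voicing consonant vowel".split(),
    "vowel consonant voicing vowel consonant".split(),
]

if __name__ == "__main__":
    theta, top_words = lda_gibbs(TOY_DOCS, n_topics=2)
    for k, words in enumerate(top_words):
        print(f"topic {k}: {words}")
    for d, row in enumerate(theta):
        print(f"doc {d}: " + ", ".join(f"{p:.2f}" for p in row))
```

With well-separated toy documents like these, the tense/aspect words and the phonology words will usually end up in different topics, although with so little data the outcome can depend on the random seed.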
I compiled a corpus comprising all full-length research articles published in Language Learning, Studies in Second Language Acquisition, and Second Language Research between 1970 and 2020. The metadata for each article included the journal, the publication year, and the country/region in which the first author's institution is located. The countries/regions were then collapsed into seven continents (e.g., Europe, North America). Seventy topics were identified in the corpus using a structural topic model (Roberts et al., 2016), and each topic was given an interpretive label (e.g., 'tense and aspect').
In this talk, I will highlight regional (i.e., inter-continental) differences and their interaction with publication year. Findings from the analysis of topic proportions include the following:
- In North America, topics such as 'attention, awareness, and noticing in L2 acquisition' and 'L2 consonants' are more prominent than in the other regions;
- In Europe, topics such as 'word-internal and L1-related factors influencing vocabulary learning' and 'L2 German' are more prominent than in the other regions; and
- In Asia, topics such as 'comparing and contrasting learner groups, target varieties, and within-learner languages' and 'L1/L2 Japanese and Korean' are more prominent than in the other regions.
For some topics, region interacts with publication year. For instance, 'statistical modeling' and 'gestures' have grown in popularity over the years in Europe and North America, whereas their proportions have remained relatively constant in Asia. By contrast, 'interactionist approach' gained popularity in North America until the mid-2000s, while its popularity has remained constant in Europe and Asia.
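The abstract does not specify how these interactions were estimated (a structural topic model can relate topic prevalence to document covariates such as region and year). As a purely descriptive illustration of what a region-by-period interaction means, the hypothetical sketch below averages one topic's proportion within each (region, period) cell and compares the change between two periods across regions; clearly different changes indicate an interaction. The function names, field names, period labels, and all numbers are invented for illustration and are not results from this study.

```python
from collections import defaultdict
from statistics import mean


def mean_proportion_by_group(docs, topic):
    """Average the proportion of one topic within each (region, period) cell."""
    cells = defaultdict(list)
    for doc in docs:
        cells[(doc["region"], doc["period"])].append(doc["theta"][topic])
    return {cell: mean(values) for cell, values in sorted(cells.items())}


def change_by_region(cell_means, early, late):
    """Change in mean topic proportion from `early` to `late`, per region.

    Regions whose changes differ markedly suggest a region x period interaction.
    """
    regions = sorted({region for region, _ in cell_means})
    return {
        r: cell_means[(r, late)] - cell_means[(r, early)]
        for r in regions
        if (r, early) in cell_means and (r, late) in cell_means
    }


# Hypothetical records with made-up proportions (NOT results from this study);
# theta[0] is the proportion of some topic of interest in each article.
TOY_RECORDS = [
    {"region": "Europe", "period": "1990s", "theta": [0.05, 0.95]},
    {"region": "Europe", "period": "2010s", "theta": [0.25, 0.75]},
    {"region": "Asia", "period": "1990s", "theta": [0.10, 0.90]},
    {"region": "Asia", "period": "2010s", "theta": [0.10, 0.90]},
]

if __name__ == "__main__":
    means = mean_proportion_by_group(TOY_RECORDS, topic=0)
    print(change_by_region(means, early="1990s", late="2010s"))
```

A descriptive comparison like this ignores uncertainty and confounds such as journal; a model-based estimate of the interaction would be needed before drawing conclusions.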
Whereas some findings are unsurprising (e.g., 'L2 German' being popular in Europe), the topic model also revealed patterns that may not be intuitive to many of us in the field (e.g., interactions between region and publication period).
References
Roberts, M. E., Stewart, B. M., & Airoldi, E. M. (2016). A model of text for experimentation in the social sciences. Journal of the American Statistical Association, 111(515), 988–1003.
Zhang, X. (2020). A bibliometric analysis of second language acquisition between 1997 and 2018. Studies in Second Language Acquisition, 42(1), 199–222.