Data-driven learning (DDL) commonly involves the explicit use of corpus data, whether hands-on or via prepared materials, for learners and teachers of a foreign or second language. Since the basic concepts were first introduced in the 1980s, hundreds of academic publications have appeared. Drawing and expanding on the results of our recent study (Authors, 2021), we present a scoping review of DDL research up to and including 2021. We focus on the 156 research articles (RAs) published in journals listed in the latest Clarivate Web of Science ranking for Linguistics (192 journals in 2020) and address the following question: What research trends have been most prominent in DDL research, and what new trends have been emerging?
Methodologically, we treated our RA collection as a corpus in its own right and analyzed it using corpus tools and methods (Authors, 2021; Pérez-Paredes, 2022). The corpus totals just over 1 million words drawn from the authors' full texts of the 156 RAs. We first coded the research characteristics of each article by main theme (Theory and Methodology; Learning Contexts, Implementation, and Technology) and subtheme (e.g., Learning Contexts: L1 and L2, region, proficiency, institution, discipline). We then divided the RA publication timeline into three periods: early (1997-2004), middle (2005-2013), and late (2014-2021), with the start of each period corresponding to a visible increase in the number of publications (Fig. 1). Finally, we analyzed the usage patterns of keywords and key clusters related to each subtheme using the corpus analysis tools (Keyword, Collocation, Cluster, Dispersion) in AntConc 4.0 (Anthony, 2021).
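The period split and keyword step described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure actually run in AntConc: the period boundaries are taken from the text, while the function names and the choice of a two-term log-likelihood statistic as the keyness measure are assumptions made for this example.

```python
import math

# Period boundaries as stated in the text (inclusive years).
PERIODS = {
    "early": (1997, 2004),
    "middle": (2005, 2013),
    "late": (2014, 2021),
}


def period_of(year):
    """Return the name of the period that a publication year falls into."""
    for name, (start, end) in PERIODS.items():
        if start <= year <= end:
            return name
    raise ValueError(f"year {year} is outside the reviewed timeline")


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood keyness of a word in a target vs. a reference subcorpus.

    Assumption: this simplified statistic stands in for whatever keyness
    measure and settings were used in the actual analysis.
    """
    combined = freq_target + freq_ref
    total = size_target + size_ref
    # Expected frequencies if the word were spread evenly across both subcorpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score
```

With hypothetical counts, `log_likelihood(20, 1000, 5, 1000)` compares a word occurring 20 times in a 1,000-word target subcorpus with 5 occurrences in a 1,000-word reference subcorpus; a larger score indicates a stronger frequency difference between, for example, the late period and the two earlier periods.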
The overall picture displays a wide variety of theories, methods, and settings employed in DDL research, with some characteristics remaining remarkably stable and others showing declining or rising trends. For example, we found an initial increase in the number of RAs with explicit theoretical grounding; however, the proportion of such articles has remained stable since 2005, with only about two-thirds of the articles mentioning theories. In terms of setting and methodology, there are some encouraging recent trends, such as DDL reaching out to new learner populations and learning environments (e.g., Asian and Middle Eastern regions, a variety of target languages, younger learners, lower proficiency levels). On the other hand, there is remarkably little change in other methodological aspects, with most studies targeting university students, English for general purposes classes, relatively small groups, short DDL interventions using concordancers, and lexico-grammar as the learning target. We conclude by inviting DDL researchers to diversify their research methodology, design multi-institutional studies, integrate contemporary multifactorial data analysis methods, improve the rigor of their methodological reporting, and, in doing so, open DDL up to Applied Linguistics theories and research methods, which would undoubtedly bring both fields forward.
Figure 1. RAs by date

References
Anthony, L. (2021). AntConc, v4.0. Tokyo: Waseda University.
Pérez-Paredes, P. (2022). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011-2015. Computer Assisted Language Learning, 35(1-2), 36-61.