This presentation details a project to create the first digital archive documenting the linguistic diversity of Hong Kong, including its minority and endangered languages. It covers the languages and language varieties that are part of Hong Kong's heritage. The aim of the project is to create an easily accessible database of language data that both researchers and teachers can use to explore a range of linguistic issues, including linguistic diversity, language endangerment, minority languages, language use, language discrimination, and language awareness.
The project first aimed to identify and document the linguistic diversity of Hong Kong (e.g., English, Cantonese, Putonghua, Hakka, Waitau, Teochew, Hokkien, Tanka, Tagalog, Urdu, Hindi, Nepali, Thai, French, German, Spanish, Korean, Japanese, Dutch, etc.). This was done through extensive historical document analysis. A particular focus was placed on Hong Kong's minority indigenous languages that are now on the verge of extinction, including the Chinese languages Hakka, Waitau, and Tanka. In addition, due to migration from mainland China after the civil war in China, Hong Kong has also been home to a number of minority languages of China that are likewise becoming endangered, including Shanghainese, Suzhounese, and Wenzhounese. Finally, the project also collected a significant corpus of data on Hong Kong Sign Language, a unique sign language used in Hong Kong that is also endangered, with fewer than 8000 users today.
Using storytelling of folktales, traditions, and customs as the data collection task, more than 150 language samples in audio format were collected from speakers of over 70 languages and varieties. Each sample was transcribed into English and traditional written Chinese, and some samples were also rendered in Hong Kong Sign Language (video format). A brief introduction to each language or variety, covering its historical and present-day usage in Hong Kong, the demographics of its usage globally, and an overview of its features, was written and is presented along with the audio samples and the translations.
This presentation will focus on the development of the linguistic archive, detailing the background research involved, the selection of languages and speakers, and the collection of data. The construction of the website will also be discussed. The presentation will be of interest to researchers who are exploring the use of digital technologies to develop corpora of minority and/or endangered languages.