Students' hands-on experience in creating web-derived language databases

Yoko Hirata and Yoshihiro Hirata
Hokkai-Gakuen University
Sapporo, Japan


As the World Wide Web becomes an enormous language database at students' fingertips, the potential benefits of web-extracted data for linguistic research and language learning have been the subject of recent discussion. For example, a growing number of web-based linguistic tools, such as WebCorp (Kehoe and Renouf 2002) and KWiCFinder (Fletcher 2007), are freely available for linguistic research. In addition, a number of studies have indicated that commercial web search engines can be used successfully to locate pages relevant to the target language under study and to retrieve collections of written texts as authentic language data.

For Japanese students, who have limited opportunities for exposure to authentic English in everyday contexts, web data could be a valuable and easily available resource. However, there is still a general lack of research on how web data can best be presented to students at different levels of proficiency. There has also been little research to date on how students, many of whom lack previous online learning experience, can be trained to discover what works and what does not for their own language learning when using the Web as a source of language information.

The purpose of this study is to investigate how Japanese students at lower intermediate levels of English proficiency perceive the effectiveness of setting up and exploring web-derived language databases, and whether they gain a good understanding of the collocational and recurrent lexical patterns of target words. The paper first outlines the system structure of Lex, a user-friendly analytical computer program that allows novice Internet users to retrieve lexical and syntactic patterns from electronic texts. It then illustrates an inductive 'data-driven learning' project which was implemented in a half-term English reading course in a computer-assisted language learning classroom. This project encouraged individual students to build their own databases of web documents, using commercial web search engines, for their own linguistic investigations. With limited assistance from the instructor, the students critically evaluated how words behave by examining tangible examples of actual usage. The findings suggest that students' hands-on experience of compiling and consulting web-derived language databases can enhance their awareness of how to take advantage of web data to support their own independent language learning. In the presentation, the benefits and problems students experienced during this project will also be discussed, based on statistical data collected from students' questionnaires as well as their open-ended responses.