GrETEL stands for Greedy Extraction of Trees for Empirical Linguistics. It is a user-friendly search engine for the exploitation of treebanks.
The first version of GrETEL was a result of the Nederbooms project, a CLARIN project which aimed at the development of user-friendly tools for the exploitation of treebanks by linguists who are not familiar with language technology.
A treebank is a digital text corpus in which each sentence is enriched with syntactic annotations. Each annotated sentence can be represented as a syntax tree. Treebanks are valuable resources for linguists who want to investigate syntactic constructions empirically.
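To make the idea of a sentence stored as a syntax tree concrete, here is a minimal sketch in Python. The XML layout (nested `node` elements with `cat`, `pos`, `rel` and `word` attributes) is a simplified, hypothetical format chosen for illustration; it is not claimed to be the exact annotation scheme of any treebank included in GrETEL.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified annotation of the sentence "the dog barks".
# Attribute names (cat, pos, rel, word) are assumptions for illustration.
SENTENCE_XML = """
<node cat="s">
  <node cat="np" rel="su">
    <node pos="det" rel="det" word="the"/>
    <node pos="noun" rel="hd" word="dog"/>
  </node>
  <node pos="verb" rel="hd" word="barks"/>
</node>
"""


def render(node, depth=0):
    """Return the tree as indented lines, one line per node."""
    label = node.get("cat") or node.get("pos")
    line = "  " * depth + f"{node.get('rel', 'top')}:{label}"
    if node.get("word"):
        line += f" '{node.get('word')}'"
    lines = [line]
    for child in node:
        lines.extend(render(child, depth + 1))
    return lines


def words(node):
    """Collect the words of the sentence in document order."""
    return [n.get("word") for n in node.iter("node") if n.get("word")]


tree = ET.fromstring(SENTENCE_XML.strip())
print("\n".join(render(tree)))
print(" ".join(words(tree)))
```

Phrase nodes carry a category (`cat`), word nodes carry a part of speech (`pos`) and the word itself, and every node records its grammatical relation (`rel`) to its parent.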
The construction of treebanks has created exciting opportunities for the empirical investigation of syntax. While treebanks have the potential to add value to descriptive and theoretical linguistics, exploiting them is usually only possible if the user has in-depth knowledge of the annotation guidelines and has mastered a formal query language. Some users are not deterred by this, but many are, and for them the potential of the treebanks goes unrealised. GrETEL tries to overcome this problem. Instead of a formal search instruction, GrETEL accepts a natural language example as input, allowing users to search for constructions similar to the example they provide.
October, 2010 - February, 2012
The construction of treebanks for spoken and written Dutch (CGN, LASSY) has created new and exciting opportunities for the empirical investigation of Dutch syntax and semantics. At the moment, the exploitation of those treebanks requires knowledge of specific data structures and query languages. The purpose of this project is to develop user-friendly and well-documented tools for the exploitation of treebanks by linguists who are not familiar with language technology. The Nederbooms project is in line with the main CLARIN goal of applying the results of speech and language technology to research in the humanities.
The Nederbooms project was carried out in the framework of CLARIN Flanders, funded by the Flemish Government, Department of Economy, Science and Innovation.
June, 2013 - May, 2014
The aim of this project is the adaptation of GrETEL for querying very large treebanks, such as the SoNaR reference corpus for written Dutch (ca. 500M tokens, 41M sentences). The main challenge is to make this huge treebank as quick to search as the small treebanks that are currently included in GrETEL.
GrETEL 2.0 was funded by the Dutch Language Union (Nederlandse Taalunie).
June, 2013 - May, 2014
While a wide range of resources and technologies exists for Dutch, enabling the implementation of sophisticated language technologies and advanced linguistic research, Afrikaans is still considered a resource-scarce language. Because Afrikaans is under-resourced and closely related to Dutch, it often turns out to be faster and cheaper to use and adapt existing Dutch tools than to develop tools for Afrikaans from scratch. The aim of this project is the creation of an Afrikaans treebank using a Dutch parser. The treebank will be included in GrETEL in order to make it searchable online.
The AfriBooms project was funded by the Dutch Language Union (Nederlandse Taalunie) and the Department of Arts and Culture of the Government of South Africa.
October, 2015 - September, 2016
In the context of the SCATE project, we have developed Poly-GrETEL, an online tool which enables syntactic querying in parallel treebanks and which is based on the monolingual GrETEL environment. We provide online access to the Europarl parallel treebank for Dutch and English, allowing users to query the treebank using either an XPath expression or an example sentence in order to look for similar constructions. We provide automatic alignments between the nodes. By combining example-based query functionality with node alignments, we limit the need for users to be familiar with the query language and the structure of the trees in the source and target language, thus facilitating the use of parallel corpora for comparative linguistics and translation studies.
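As an illustration of what querying a treebank with an XPath expression involves, the following Python sketch runs a query over a tiny, made-up monolingual treebank. The XML format, the sentence identifiers and the `search` helper are hypothetical and only serve the example; they do not reflect the actual data format or implementation of GrETEL or Poly-GrETEL. Python's standard `xml.etree.ElementTree` module supports only a limited subset of XPath, so a real system would need a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-sentence treebank in a simplified format.
TREEBANK_XML = """
<treebank>
  <sentence id="s1">
    <node cat="s">
      <node cat="np" rel="su">
        <node pos="det" rel="det" word="the"/>
        <node pos="noun" rel="hd" word="dog"/>
      </node>
      <node pos="verb" rel="hd" word="barks"/>
    </node>
  </sentence>
  <sentence id="s2">
    <node cat="s">
      <node pos="pron" rel="su" word="it"/>
      <node pos="verb" rel="hd" word="rains"/>
    </node>
  </sentence>
</treebank>
"""

# XPath: any noun phrase that functions as a subject.
QUERY = ".//node[@cat='np'][@rel='su']"


def search(treebank, xpath):
    """Return (sentence id, sentence text, match count) for each hit."""
    hits = []
    for sentence in treebank.iter("sentence"):
        matches = sentence.findall(xpath)
        if matches:
            text = " ".join(
                n.get("word") for n in sentence.iter("node") if n.get("word")
            )
            hits.append((sentence.get("id"), text, len(matches)))
    return hits


treebank = ET.fromstring(TREEBANK_XML.strip())
for sentence_id, text, count in search(treebank, QUERY):
    print(f"{sentence_id}: {text} ({count} match)")
```

Writing `QUERY` by hand requires knowing both the XPath syntax and the attribute names used in the annotation, which is the knowledge that example-based querying is meant to spare the user.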
Poly-GrETEL was created in the context of SCATE, an IWT SBO project.