The corpus application was developed by the INT. Its backend is BlackLab, a Lucene-based search engine developed for corpora with token-based annotation (http://inl.github.io/BlackLab/). The web-based frontend is a further development of the corpus-frontend application developed by the INT (https://github.com/INL/corpus-frontend) in CLARIN and CLARIAH projects. Its design is inspired by the first version of the OpenSoNaR user interface by Tilburg and Radboud University (https://github.com/Taalmonsters/WhiteLab2.0).
Approximately 40,000 Dutch letters from the second half of the 17th to the early 19th centuries have been gathering dust for centuries in British archives. They were sent home from abroad by sailors and others, and also in the opposite direction by those who stayed behind and needed to keep in touch with their loved ones. Many letters did not reach their destinations: they were taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between the Netherlands and England. These confiscated letters of men, women and even children represent priceless material for historical linguists. They allow us to gain access to the as yet largely unknown everyday Dutch of the past, the colloquial Dutch of people from the middle and lower classes.
The research programme Brieven als Buit/Letters as Loot: Towards a non-standard view on the history of Dutch has explored this extraordinary source of Dutch letters from the past (see www.brievenalsbuit.nl). This programme, initiated and directed by prof. dr. Marijke van der Wal (Leiden University) and funded by the Netherlands Organisation for Scientific Research (NWO), ran successfully from 1 September 2008 until 1 September 2013.
The historical linguistic research of the Letters as Loot programme was based on the original Brieven als Buit corpus. For the research results we refer to the following monograph and PhD dissertations which are all available in open access:
Brieven als Buit-2 (BAB2) is a spin-off of the Brieven als Buit/Letters as Loot research programme. Like the earlier Brieven als Buit internet application (BAB), the present BAB2 comprises Dutch letters which were taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between the Netherlands and England from the second half of the 17th to the early 19th centuries.
The original Brieven als Buit corpus, comprising 1033 letters, was launched as an internet application on 5 September 2013 and is one of the programme's results. This application, like the second, updated version that came online on 18 June 2015, was developed in close cooperation with the Institute for Dutch Lexicology (INL). A third, revised version was published on 29 January 2021 in cooperation with the Dutch Language Institute (INT), the successor of the INL.
As a separate extension, BAB2 differs from the original BAB corpus in various aspects.
Firstly, the original BAB corpus is a balanced corpus, which was compiled for and based on the Brieven als Buit/Letters as Loot research at the University of Leiden. This corpus, to which metadata from the research programme's database were added, was lemmatized, grammatically tagged and provided with elaborate search facilities by the INT. BAB2 is not a balanced corpus, but a collection of additional letters from the Prize Papers (National Archives, Kew, UK). This collection was not lemmatized or grammatically tagged, and therefore fewer search facilities are available. Metadata for the additional letters are complete for the gender variable, but more limited for social class and age. The autograph status was examined for 83% of the additional letters.
Secondly, BAB2 presents a collection of 1386 letters, both private and business. As the Brieven als Buit/Letters as Loot research focused primarily on private letters, the original BAB internet application comprises only 10% business letters, whereas BAB2 comprises 26% business letters or letters of a mixed private-business character.
Users of the original BAB corpus will find additional letters of particular letter writers in BAB2. These letters were not added to the original corpus in order to avoid imbalance by particularly prolific writers. BAB2 allows users to compile various subcorpora, for instance, of female letter writers, letters of family members and relatives, letters sent from a particular region or letters dating from a particular time period.
The main characteristics of BAB and BAB2, and the differences between them, are listed in the table below:
|BAB (original)|BAB2 (extension)|
|Balanced corpus|Collection of letters|
|Number of letters: 1033|Number of letters: 1386|
|Periods: 1661-1673; 1777-1783|Periods: 1661-1673; 1751-1758; 1773-1783|
|Offers transcriptions, photos of the original letters and metadata|Offers transcriptions, photos of the original letters and metadata|
|Grammatically tagged and lemmatized transcriptions, which allow both word and lemma searches|Transcriptions not grammatically tagged or lemmatized, which allow only word searches|
|Private letters (90%), business letters (10%)|Private letters (74%), private-business or business letters (26%)|
|Gender fully examined|Gender fully examined|
|Social class and age fully examined|Social class and age partly examined|
|Autograph status fully examined|Autograph status 83% examined|
Unlike the original Brieven als Buit corpus, BAB2 has not yet been annotated with part-of-speech tags and lemmas, as stated above. To make the corpus more accessible, suggestions for query expansion are given, using the INT lexicon service with the historical computational lexicon GiGaNT-HILEX.
The current version of GiGaNT-HILEX in the lexicon service contains the lexicon modules based on the Dictionary of the Dutch Language and the Dictionary of Middle Dutch.
If you want to make use of this service, please contact Katrien Depuydt (firstname.lastname@example.org).
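Because BAB2 supports only word searches, query expansion comes down to searching for several spelling variants of a word at once. The sketch below illustrates the idea only: it does not call the INT lexicon service, whose interface is not described here. The function name `build_word_query` is hypothetical, the variant list is hand-picked for illustration rather than taken from GiGaNT-HILEX, and it is an assumption that the corpus exposes its word forms under the annotation name `word` in BlackLab's Corpus Query Language.

```python
def build_word_query(variants):
    """Combine word forms into one Corpus Query Language token pattern.

    Hypothetical helper: it only builds the query string and does not
    contact BlackLab or the lexicon service.
    """
    forms = []
    for variant in variants:
        form = variant.strip()
        # Accept purely alphabetic forms only, so that no character can be
        # misread as a regular-expression operator inside the pattern.
        if not form.isalpha():
            raise ValueError(f"unsupported word form: {variant!r}")
        # Drop duplicates while keeping the original order.
        if form not in forms:
            forms.append(form)
    if not forms:
        raise ValueError("at least one word form is required")
    # Assumption: the word-form annotation is called "word".
    return '[word="' + "|".join(forms) + '"]'


if __name__ == "__main__":
    # Illustrative spelling variants of modern Dutch "huis" (house).
    print(build_word_query(["huis", "huys", "huijs"]))
```

The resulting pattern can be pasted into a search field that accepts Corpus Query Language; restricting the input to alphabetic forms keeps the sketch simple at the cost of rejecting forms with apostrophes or hyphens.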
Marijke van der Wal initiated and conducted BAB2 after the completion of the Brieven als Buit research programme (2008-2013). She made the final corrections to the BAB2 letters, finalized the BAB2 files submitted to DANS in December 2017 and adapted the BAB2 files and metadata that were submitted to the INT in July 2019. Preliminary transcriptions of the BAB2 letters were made by the student assistants Brenda Assendelft and Marlies Reitsma, and by volunteers of the Leiden-based Wikiscripta Neerlandica transcription project. In 2015, Wikiscripta Neerlandica volunteer DickJan Braggaar assisted in part of the correction stage. Tanja Simons checked and added part of the metadata during the last quarter of 2014.
When referring to the present BAB2-website or using the data, please use the following reference:
Brieven als Buit-2/Letters as Loot-2 (26 February 2021), compiled by Marijke van der Wal, Leiden University (for full details see About BAB2 http://hdl.handle.net/10032/tm-a2-s4). Available at the Dutch Language Institute.
For BlackLab:
Software available at: https://github.com/INL/BlackLab
Does, Jesse de, Jan Niestadt and Katrien Depuydt (2017), Creating research environments with BlackLab. In: Jan Odijk and Arjan van Hessen (eds.), CLARIN in the Low Countries, pp. 151-165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi
For the corpus frontend:
Software available at: https://github.com/INL/corpus-frontend