Brieven als Buit-2 (‘Letters as Loot-2’)

About the Corpus application

The corpus application was developed by the Dutch Language Institute (Instituut voor de Nederlandse Taal or INT). The backend of the application is BlackLab, a Lucene-based search engine developed for corpora with token-based annotation (https://blacklab.ivdnt.org/). The web-based frontend is a further development of the corpus-frontend application developed by the INT (https://github.com/instituutnederlandsetaal/blacklab-frontend) in CLARIN and CLARIAH projects. Its design is inspired by the first version of the OpenSoNaR user interface by Tilburg University and Radboud University (https://github.com/Taalmonsters/WhiteLab2.0).

About the Brieven als Buit project

Approximately 40,000 Dutch letters from the second half of the 17th to the early 19th centuries have been gathering dust for centuries in British archives. They were sent home by sailors and others from abroad, and in the other direction by those who stayed behind and needed to keep in touch with their loved ones. Many letters did not reach their destinations: they were taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between the Netherlands and England. These confiscated letters of men, women and even children represent priceless material for historical linguists. They give us access to the as yet largely unknown everyday Dutch of the past, the colloquial Dutch of people from the middle and lower classes.

The research programme Brieven als Buit/Letters as loot. Towards a non-standard view on the history of Dutch has explored this extraordinary source of Dutch letters from the past (see www.brievenalsbuit.nl). This programme, initiated and directed by Prof. Dr. Marijke van der Wal (Leiden University) and funded by the Netherlands Organisation for Scientific Research (NWO), ran successfully from 1 September 2008 to 1 September 2013.

The historical linguistic research of the Letters as Loot programme was based on the original Brieven als Buit corpus. For the research results we refer to the following monograph and PhD dissertations, all of which are available in open access:

Gijsbert Rutten & Marijke van der Wal, Letters as Loot. A sociolinguistic approach to seventeenth- and eighteenth-century Dutch, Amsterdam & Philadelphia: John Benjamins, 2014 (Letters as Loot)
Judith Nobels, (Extra)Ordinary letters: A view from below on seventeenth-century Dutch. Utrecht: LOT, 2013 ((Extra)Ordinary letters)
Tanja Simons, Ongekend 18e-eeuws Nederlands: Taalvariatie in persoonlijke brieven. Utrecht: LOT, 2013 (Ongekend 18e-eeuws Nederlands: Taalvariatie in persoonlijke brieven)

About the Brieven als Buit-2 corpus

Brieven als Buit-2 (BAB2) is a spin-off of the Brieven als Buit/Letters as Loot research programme. Like the earlier Brieven als Buit internet application (BAB), the present BAB2 comprises Dutch letters which were taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between the Netherlands and England from the second half of the 17th to the early 19th centuries.

Differences between BAB2 and the original BAB corpus

The original Brieven als Buit corpus, comprising 1033 letters, was launched as an internet application on 5 September 2013 and is one of the programme's results. This application, like the second, updated version that came online on 18 June 2015, was developed in close cooperation with the Institute for Dutch Lexicology (INL). A third, revised version was published on 29 January 2021 in cooperation with the Dutch Language Institute (INT), the successor of the INL.

As a separate addition, BAB2 differs from the original BAB corpus in various aspects.

Firstly, the original BAB corpus is a balanced corpus, which was compiled for and based on the Brieven als Buit/Letters as Loot research at Leiden University. This corpus, to which metadata from the research programme's database were added, was lemmatized, grammatically tagged and provided with elaborate search facilities by the INT. BAB2 is not a balanced corpus, but a collection of additional letters from the Prize Papers (National Archives, Kew, UK). This collection was not lemmatized or grammatically tagged, and therefore fewer search facilities are available. Metadata for the additional letters are complete for the gender variable, but more limited for social class and age. The autograph status was examined for 83% of the additional letters.

Secondly, BAB2 presents a collection of 1386 letters, both private and business. As the Brieven als Buit/Letters as Loot research focused primarily on private letters, the original BAB internet application comprises only 10% business letters, whereas BAB2 comprises 26% business letters or letters of a mixed private-business character.

Users of the original BAB corpus will find additional letters of particular letter writers in BAB2. These letters were not added to the original corpus in order to avoid imbalance by particularly prolific writers. BAB2 allows users to compile various subcorpora, for instance, of female letter writers, letters of family members and relatives, letters sent from a particular region or letters dating from a particular time period.

The main characteristics and differences of BAB and BAB2 are listed in the table below:

BAB (original)	BAB2 (addition)
Balanced corpus	Collection of letters
Number of letters: 1033	Number of letters: 1386
Periods: 1661-1673; 1777-1783	Periods: 1661-1673; 1751-1758; 1773-1783
Offers transcriptions, photos of the original letters and metadata	Offers transcriptions, photos of the original letters and metadata
Grammatically tagged and lemmatized transcriptions. This allows both word and lemma searches	Transcriptions not grammatically tagged or lemmatized. This allows only word searches
Private letters (90%), business letters (10%)	Private letters (74%), private-business or business letters (26%)
Gender fully examined	Gender fully examined
Social class and age fully examined	Social class and age partly examined
Autograph status fully examined	Autograph status 83% examined

GiGaNT Lexicon service

Unlike the original Brieven als Buit corpus, BAB2 has not yet been annotated with part of speech and lemma, as stated above. To make the corpus more accessible, suggestions for query expansion are given, using the INT lexicon service with the historical computational lexicon GiGaNT-HILEX.

The current version of GiGaNT-HILEX in the lexicon service contains the lexicon modules based on the Dictionary of the Dutch Language (Woordenboek der Nederlandsche Taal, WNT) and the Dictionary of Middle Dutch (Middelnederlandsch Woordenboek, MNW).

If you want to make use of this service, please contact Katrien Depuydt (katrien.depuydt@ivdnt.org).
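
Because BAB2 supports only word searches, query expansion in practice means searching for several historical spellings of a word at once. The following is a minimal sketch of that idea, not the implementation used by the application: it assumes that a list of word forms has already been obtained (the forms below are hypothetical and chosen for illustration only, not retrieved from GiGaNT-HILEX) and combines them into a single token pattern in Corpus Query Language, using a regular expression on the word annotation. The function name expand_to_cql is an assumption, and the sketch assumes the word forms contain no double quotes.

```python
import re


def expand_to_cql(variants):
    """Build one CQL token pattern that matches any of the given word forms."""
    # Drop empty entries and exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(v.strip() for v in variants if v.strip()))
    if not unique:
        raise ValueError("at least one word form is required")
    # Escape regular-expression metacharacters so each form is matched literally.
    # Assumption: Python's escaping is adequate for the plain alphabetic forms
    # used here; forms containing double quotes are not handled.
    alternatives = "|".join(re.escape(form) for form in unique)
    return f'[word="{alternatives}"]'


# Hypothetical spelling variants, for illustration only.
print(expand_to_cql(["schip", "scip", "schip", "schep"]))
# prints: [word="schip|scip|schep"]
```

The resulting pattern can be entered wherever the application accepts a Corpus Query Language query. Keeping the variants in a single token pattern, rather than running one search per spelling, returns all hits in one result list.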

Credits

Marijke van der Wal initiated and carried out BAB2 after the completion of the Brieven als Buit research programme (2008-2013). She made the final corrections of the BAB2 letters, finalized the BAB2 files submitted to DANS in December 2017 and adapted the BAB2 files and metadata that were submitted to the INT in July 2019. Preliminary transcriptions of the BAB2 letters were made by the student assistants Brenda Assendelft and Marlies Reitsma, and by volunteers of the Leiden-based Wikiscripta Neerlandica transcription project. In 2015, Wikiscripta Neerlandica volunteer DickJan Braggaar assisted with part of the correction stage. Tanja Simons checked and added part of the metadata during the last quarter of 2014.

When referring to Brieven als Buit-2, please use the following reference:

Letters as Loot-2 / Brieven als Buit-2 (Version 1.0) (26 February 2021) [Online Service], compiled by Marijke van der Wal, Leiden University. Available at the Dutch Language Institute: http://hdl.handle.net/10032/tm-a2-s6

For BlackLab:

Software available at https://github.com/instituutnederlandsetaal/BlackLab

Does, Jesse de, Jan Niestadt & Katrien Depuydt (2017), Creating research environments with BlackLab. In: Jan Odijk and Arjan van Hessen (eds.) CLARIN in the Low Countries, pp. 151-165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi

For the corpus frontend:

Software available at: https://github.com/instituutnederlandsetaal/blacklab-frontend

Logo provenance:

Design Martien Frijns