-
Digitalisering og opmærkning af trusselsbreve til projektet 'Truslers sprog og genre', der bygger på en innovativ kombination af sprogvidenskab og genrestudier med det formål at...
- XML
-
DK-CLARIN Reference Corpus of General Danish has been collected as part of DK-CLARIN project, WP2.1, 2008 - 2011. All texts are in XML TEIP5 format (TEIP5DKCLARIN-format), with...
- XML
-
Binary wordlists for the CST lemmatizer as suplement to the rules of the lemmatizer. Works with both tagged and untagged input. Use: cstlemma -d NAME-OF-WORDLIST
- HTML
-
The SemDax Corpus is a Danish human-annotated corpus relying on the combined wordnet and dictionary resources: DanNet and Den Danske Ordbog, and available through a CLARIN...
- XML
-
The aligned corpus consists of press releases from the European Commission Press Relase Database (Rapid) harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The...
- TXT
- TMX
-
DUDS Jens Bille’s Ballad Book belongs to a corpus of the oldest Danish ballad tradition. The corpus consists of 9 ballad books handed down from Renaissance ballad collectors...
- XML
-
The DK-CLARIN Parallel Financial Corpus comprises 4.3 M Danish and 4.8 M English tokens from translated (parallel) documents, mainly annual reports, of the period 2002-2010 from...
- XML
-
The LSP (Language for Special Purposes) corpus consists of texts from seven selected domains. The DK-CLARIN LSP corpus comprises 11 M tokens from the period 2000-2010,...
- XML