The Historical Danish handwriting dataset is a Danish-language dataset containing more than 11,000 pages of transcribed and proofread handwritten text.
The dataset currently consists of the published minutes from a number of City and Parish Council meetings, all dated between 1841 and 1939.
All the text is in Danish. The BCP-47 code for Danish is da.
Each data instance represents a single scanned, segmented and transcribed physical page with handwritten text.
The dataset originates from a need to make physical public protocols discoverable via full-text search and filtering.
The original physical minutes were collected and curated as part of the legal mandate of the Public City Archives in Denmark.
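As a rough illustration of the search-and-filter use case, the sketch below models one transcribed page per record and filters records by year range and a case-insensitive substring match. It is a minimal sketch under stated assumptions, not the dataset's actual schema or the archives' search implementation: the `PageRecord` class, its field names (`page_id`, `year`, `text`), the `search_pages` helper, and the sample sentences are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    # Hypothetical fields for illustration; the real dataset schema may differ.
    page_id: str
    year: int
    text: str


def search_pages(pages, query, start_year=1841, end_year=1939):
    """Return pages within the year range whose text contains the query.

    Matching is a plain case-insensitive substring test, not a real
    full-text index.
    """
    needle = query.casefold()
    return [
        page
        for page in pages
        if start_year <= page.year <= end_year and needle in page.text.casefold()
    ]


# Invented placeholder pages, not taken from the dataset.
pages = [
    PageRecord("a-001", 1875, "Byraadet vedtog Forslaget."),
    PageRecord("b-002", 1910, "Sogneraadet behandlede Sagen om Skolen."),
]

hits = search_pages(pages, "skolen", start_year=1900)
print([page.page_id for page in hits])
```

The default year bounds mirror the date range stated for the minutes (1841 to 1939); a production search would more likely rely on a dedicated full-text index than on substring matching.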
The digitized and transcribed minutes were originally written by the secretaries of the individual City and Parish Councils.
The layout segmentation and transcription of the digitized minutes are primarily done by volunteers and employees at the participating Danish city archives.
All layout segmentation and transcription are done using the Transkribus platform, either through the desktop client or via the web-based interface.
No efforts were made to anonymize the data.
The dataset might contain data that can be considered sensitive (e.g., data that reveals racial or ethnic origins, sexual orientations, religious beliefs, political opinions or union memberships, or locations; financial or health data; biometric or genetic data; forms of government identification, such as social security numbers; criminal history).