-
Fineweb-2
This is the second iteration of the popular FineWeb dataset, bringing high-quality pretraining data to over 1000 languages. The FineWeb2 dataset is fully reproducible, available...
-
Fineweb-c
FineWeb-C: Educational content in many languages, labelled by the community. This is a link to the Danish part of the dataset. This is a collaborative, community-driven project...
You can also access this registry via the API (see the API documentation).