Language resources
On this page you can browse and search our datasets. Click on a row name to see what files are available for download. You can go directly to the search interface by clicking on the tool logo.
All (1397)
Collections (32)
Corpora (1236)
Lexicons (84)
Training and evaluation data (27)
Models (50)
Resource
Type
Language
Access
Argumentation sentences 1.0
A translated corpus for classifying sentence stance in relation to a topic.
Corpus
Swedish
Dataset:
argumentation-sentences.zip
2023-03-30 – 827.04 KB – CC-BY-4.0
CoDeRooMor, v.01
Morphological dataset (word-building morphology), Swedish L2 profiles project
Lexicon
Swedish
Dataset:
CodeRoomor_v01_lemgramView.csv
2021-04-13 – 1.96 MB – CC-BY-4.0
Dataset:
CodeRoomor_v01_morphemeView.csv
2021-04-13 – 856.29 KB – CC-BY-4.0
Dataset:
CodeRoomor_v01_lemgramView.xlsx
2021-04-13 – 1.72 MB – CC-BY-4.0
Dataset:
CodeRoomor_v01_morphemeView.xlsx
2021-04-13 – 699.46 KB – CC-BY-4.0
DaLAJ-GED-SuperLim 2.0
Dataset for Linguistic Acceptability Judgments (and more), v.2.0
Corpus
Swedish
Dataset:
dalaj-ged-superlim.zip
2023-04-03 – 1.41 MB – CC-BY-4.0
Dataset:
dalaj-ged-tsv.zip
2023-05-20 – 1.15 MB – CC-BY-4.0
Dataset:
liuep197-11.pdf
2024-01-25 – 463.74 KB – CC-BY-4.0
Dalin: Then Swänska Argus 1732-1734
Manual transcription of Then Swänska Argus by Olof von Dalin, Stockholm, 1732–1734. For OCR analysis.
Corpus
Swedish
Dataset:
dalin-then-swaanska-argus-1732-1734.tar.gz
2020-06-12 – 80.21 MB – CC-BY-4.0
Eukalyptus Treebank of Written Swedish
A treebank with written Swedish data, with parts-of-speech, TIGER-style syntax, multiword expressions and sense annotation
Corpus
Swedish
Dataset:
Eukalyptus-1.0.0.zip
2024-01-25 – 4.58 MB – CC-BY-SA-4.0
Dataset:
Eukalyptus-0.1.0.zip
2024-01-25 – 3.66 MB – Other
Dataset:
Eukalyptus-0.1.1.zip
2024-01-25 – 3.8 MB – Other
Dataset:
Eukalyptus-0.2.0.zip
2024-01-25 – 4.19 MB – Other
MuClaGED
MuClaGED is a dataset for multi-class Grammatical Error Detection for Swedish. The dataset is based on the SweLL-gold corpus.
Corpus
Swedish
MultiGEC
MultiGEC is a dataset for Grammatical Error Correction containing parallel data for 12 languages and 17 subcorpora. Each subcorpus contains two or more parallel versions of the same texts (typically, full learner essays), where one version (orig) is the one that the author originally wrote, and the others (ref1, ref2, ...) are corrected versions of the same text. Languages included: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian (English and Russian are available on request). Texts come from different original corpora, but are reformatted to a unified format.
Corpus
Czech, German, Modern Greek (1453-), English, Estonian, Icelandic, Italian, Latvian, Russian, Slovenian, Swedish, Ukrainian
MultiGED
MultiGED is a dataset for Grammatical Error Detection (a task within NLP) containing data for 5 languages (Czech, English, German, Italian and Swedish).
Corpus
Czech, German, English, Italian, Swedish
Dataset:
multiged-2023.tar.bz2
2025-01-22 – 3.82 MB – Other
PGV-PII
A small collection of 10 pairs of parallel texts in Swedish and English annotated with personal information categories.
Corpus
Swedish, English
Dataset:
gv-pii.bz2
2026-02-27 – 49.75 KB – CC-BY-4.0
SemEval2020 Task 1
Swedish Test Data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection (extracts from Kubhist v2)
Corpus
Swedish
Dataset:
semeval2020_ulscd_swe.zip
2024-01-25 – 956.05 MB – CC-BY-4.0
SIC2 - Stockholm Internet Corpus
The Stockholm Internet Corpus (SIC2) contains Swedish blog posts, annotated with part of speech, morphological features, and named entities.
Corpus
Swedish
Dataset:
sic2.xml.bz2
2020-11-25 – 262.36 KB – CC-BY-4.0
Word statistics:
stats_sic2.csv.zip
2025-04-22 – 44.79 KB – CC-BY-4.0
Dataset:
sic2.zip
2020-11-11 – 83.63 KB – CC-BY-4.0
Dataset:
readme.txt
2020-11-17 – 2.18 KB – CC-BY-4.0
SUC 2.0
Stockholm-Umeå corpus 2.0
Corpus
Swedish
Word statistics:
stats_SUC2.txt.zip
2025-04-22 – 1.34 MB – CC-BY-4.0
SUC 3.0
Stockholm-Umeå corpus 3.0
Corpus
Swedish
Dataset:
suc3.xml.bz2
2024-06-03 – 84.44 MB – CC-BY-4.0
Word statistics:
stats_suc3.csv.zip
2025-04-22 – 1.43 MB – CC-BY-4.0
Collection
SuperLim 2
A standardized suite for evaluation and analysis of Swedish natural language understanding systems.
Corpus
Swedish
Dataset:
SuperLim-2.0.5.zip
2025-12-12 – 62.75 MB – CC-BY-4.0
Dataset:
SuperLim_maintenance.odt
2025-12-12 – 8.01 KB
SuperSim (repackaged for Superlim) 2.0
A dataset for word similarity and relatedness in Swedish
Corpus
Swedish
Dataset:
supersim-superlim.zip
2023-03-30 – 70.45 KB – CC-BY-4.0
Swe-NERC
A resource for training and evaluation of Named Entity Recognition for Swedish.
Corpus
Swedish
SweDiagnostics
Swedish version of (Super)GLUE Diagnostic
Corpus
Swedish
Dataset:
swediagnostics.zip
2023-04-04 – 72.89 KB – CC-BY-4.0
Swedish ABSAbank
An annotated Swedish corpus for aspect-based sentiment analysis
Corpus
Swedish
Dataset:
swe-absa-bank.zip
2020-03-04 – 128.55 MB – CC-BY-4.0
Dataset:
absabankimm-combined.zip
2023-02-20 – 15.87 MB – CC-BY-4.0
Swedish ABSAbank-Imm 1.1
An annotated Swedish corpus for aspect-based sentiment analysis (a version of Absabank)
Corpus
Swedish
Dataset:
absabank-imm.zip
2023-03-30 – 1.03 MB – CC-BY-4.0
Swedish analogy 2.0
Swedish semantic and syntactic similarity
Corpus
Swedish
Dataset:
sweanalogy.zip
2023-03-30 – 178.63 KB – CC-BY-4.0
Swedish book reviews
Texts from newspapers and magazines, manually annotated with book reviews.
Corpus
Swedish
Dataset:
kno-dagny.zip
2025-12-15 – 13.74 MB – CC-BY-4.0
Dataset:
kno-oob.zip
2025-12-15 – 70.13 MB – CC-BY-4.0
Dataset:
kno-kb.zip
2025-12-15 – 4.73 MB – CC-BY-4.0
Swedish EAT: question classification
A translated version of the QAQC dataset for expected-answer-type classification.
Corpus
Swedish
Dataset:
swe_qaqc_train.csv
2023-06-08 – 361.34 KB – CC-BY-4.0
Dataset:
Swedish_EAT_v1.0.tsv
2023-06-08 – 2.05 KB – CC-BY-4.0
Swedish fraktur 1626-1816
A selection of fraktur texts printed between 1626 and 1816 from the collections of the University Library of the University of Gothenburg (UB). For OCR analysis.
Corpus
Swedish
Dataset:
svensk-fraktur-1626-1816.tar.gz
2021-11-26 – 757.73 MB – CC-BY-4.0
Swedish newspapers 1818-1870
A selection of Swedish newspapers printed between 1818 and 1870 from the collections of Kungliga biblioteket (KB). For OCR analysis.
Corpus
Swedish
Dataset:
svenska-tidningar-1818-1870.tar.gz
2020-05-26 – 458.22 MB – CC-BY-4.0
Swedish newspapers 1871-1906
A selection of Swedish newspapers printed between 1871 and 1906 from the collections of Kungliga biblioteket (KB). For OCR analysis.
Corpus
Swedish
Dataset:
svenska-tidningar-1871-1906.tar.gz
2022-05-03 – 831.74 MB – CC-BY-4.0
Swedish treebank
A Swedish treebank built from recycled language resources
Corpus
Swedish
SweDN 1.0
A Swedish text summarization corpus
Corpus
Swedish
SweFAQ 2.0
Frequently asked questions from Swedish authorities' websites with shuffled answers
Corpus
Swedish
Dataset:
swefaq.zip
2023-03-30 – 89.81 MB – CC-BY-4.0
SweFraCas 1.0
Textual inference/entailment problem set
Corpus
Swedish
Dataset:
swefracas.tsv
2021-06-10 – 100.92 KB – CC-BY-4.0
Dataset:
swefracas_documentation_sheet.tsv
2021-06-15 – 4.23 KB – CC-BY-4.0
SweNLI 1.0
A Swedish NLI dataset
Corpus
Swedish
Dataset:
swenli.zip
2023-03-30 – 55.13 MB – CC-BY-4.0
SweParaphrase 2.0
Semantic Textual Similarity reference data (STS Benchmark).
Corpus
Swedish
Dataset:
sweparaphrase.zip
2023-03-30 – 750.9 KB – CC-BY-4.0
SweSAT Swedish Scholastic Aptitude Test Synonyms 1.1
Swedish Scholastic Aptitude Test Synonyms
Lexicon
Swedish
Dataset:
swesat-synonyms.zip
2023-03-30 – 37.73 KB – CC-BY-4.0
SweWiC 2.0
A Swedish Word-in-Context dataset
Corpus
Swedish
Dataset:
swewic.zip
2023-03-30 – 587.65 KB – CC-BY-4.0
SweWinogender 2.0
A Swedish dataset for coreference and gender bias
Corpus
Swedish
Dataset:
swewinogender.zip
2023-03-30 – 28.3 KB – CC-BY-4.0
SweWinograd 2.0
A Swedish dataset for pronoun resolution
Corpus
Swedish
Dataset:
swewinograd.zip
2023-03-30 – 33.41 KB – CC-BY-4.0
Syntag treebank
A Swedish treebank with syntactic analysis of 158 articles from Press-65.
Corpus
Swedish
Dataset:
syntag.txt
2010-02-08 – 4.45 MB – CC-BY-4.0
Dataset:
syntag.html
2010-05-24 – 10.15 MB – CC-BY-4.0
TalbankenSBX
Talbanken is a Swedish treebank. This is the Språkbanken Text version of Talbanken.
Corpus
Swedish
Dataset:
talbanken.xml.bz2
2017-06-07 – 1.54 MB – CC-BY-4.0
Word statistics:
stats_TALBANKEN.txt.zip
2025-04-22 – 206.82 KB – CC-BY-4.0
Dataset:
changelog.txt
2020-06-11 – 316 bytes – CC-BY-4.0
Dataset:
TalbankenSBX_morphsplit20200610.zip
2020-06-11 – 3.64 MB – CC-BY-4.0
Dataset:
TalbankenSBX_syntsplit20200610.zip
2020-06-11 – 807.09 KB – CC-BY-4.0
TalbankenSTB
Talbanken is a Swedish treebank.
Corpus
Swedish
Dataset:
TalbankenSTB.zip
2020-08-11 – 2.6 MB – CC-BY-4.0
Dataset:
TalbankenSTB_README.txt
2020-08-11 – 1.05 KB – CC-BY-4.0
Dataset:
TalbankenSTB_documentation.zip
2020-08-11 – 62.23 KB – CC-BY-4.0
Dataset:
TalbankenSTB_datasplit.zip
2020-08-11 – 2.6 MB – CC-BY-4.0
Dataset:
TalbankenSTB_original_parts.zip
2020-08-11 – 2.95 MB – CC-BY-4.0