dipl· psych·  m·eik michalke


koRpus: an R package for text analysis

koRpus is an R package i originally wrote to measure similarities/differences between texts. over time it grew into what it is now, a hopefully versatile tool to analyze text material in various ways, with an emphasis on scientific research, including readability and lexical diversity features.

web application

to demonstrate some of the core features of koRpus, there is a public web application hosted by the heinrich heine university of düsseldorf. it was realised using the shiny package. the source files for the app come with the koRpus package, so you can also run it locally and change it to your needs.

mailing list

to ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please use the issue tracker or subscribe to the koRpus-dev mailing list.

getting koRpus

the most recent stable release should be available via CRAN. the most recent development release of koRpus can be installed from my own package repository https://reaktanz.de/R, e.g. directly from an R session:

install.packages("koRpus", repos=c(getOption("repos"), reaktanz="https://reaktanz.de/R"))

the package has its own tokenizer, which should suffice for a lot of use cases, but to use all available features an additional installation of TreeTagger is strongly recommended! with TreeTagger installed, koRpus can be used as an R wrapper for it.
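as a rough illustration of the two routes, here is a minimal sketch. it assumes that koRpus and the english language package koRpus.lang.en are installed; the sample text and the TreeTagger path are placeholders, not taken from this page, and the treetag() call is commented out because it needs a local TreeTagger installation.

```r
# minimal sketch: tokenizing a short text with koRpus' built-in tokenizer.
# assumption: koRpus and koRpus.lang.en are installed; if not, nothing is run.
sample_text <- "koRpus is an R package for text analysis. It can also wrap TreeTagger."

if (requireNamespace("koRpus", quietly = TRUE) &&
    requireNamespace("koRpus.lang.en", quietly = TRUE)) {
  library(koRpus)
  library(koRpus.lang.en)

  # format = "obj" reads from a character vector instead of a file
  tokenized <- tokenize(sample_text, format = "obj", lang = "en")

  # show the first rows of the resulting token table
  print(head(taggedText(tokenized)))

  # with TreeTagger available, treetag() could be used instead.
  # the path is a placeholder, adjust it to your system:
  # tagged <- treetag(
  #   sample_text, format = "obj", lang = "en", treetagger = "manual",
  #   TT.options = list(path = "~/bin/treetagger", preset = "en")
  # )
}
```

the resulting object can then be passed on to the analysis functions of the package.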


this is a probably incomplete list of implemented features:

the koRpus-dev mailing list

you are invited to subscribe to our mailing list to discuss the development of the R package 'koRpus'.

in order to subscribe, send an e-mail to majordomo /at/ r.reaktanz.de with the only line "subscribe koRpus-dev-r-reaktanz-de@r.reaktanz.de YOUR@MAIL.ADDRESS", replacing the latter with the address you would like to subscribe.

to unsubscribe, do the same, but replace "subscribe" with "unsubscribe".

citation information

in case you need to cite koRpus for reference, consider the CITATION file or use the following:

Michalke, M. (2018, March). "Entschuldigen Sie, dass ich Ihnen einen komplizierten Artikel schreibe, für einen lesbaren habe ich keine Zeit" -- Textanalyse mit den R-Paketen koRpus & tm.plugin.koRpus. Paper presented at the Tagung experimentell arbeitender Psychologen (TeaP), Marburg.

Michalke, M. (2012, April). koRpus -- ein R-paket zur textanalyse. Paper presented at the Tagung experimentell arbeitender Psychologen (TeaP), Mannheim.

full text corpus support and 'tm' integration

in case you would like to analyze full corpora instead of single texts, or use both koRpus and the tm package for text analysis, check out this add-on package:

install.packages("tm.plugin.koRpus", repos=c(getOption("repos"), reaktanz="https://reaktanz.de/R"))

a token of gratitude

if you appreciate my work and want to say "thanks", please check my wantlist on discogs (just have records sent to the address you find in the imprint/impressum). you're awesome!

work in progress

i'm still working on koRpus (see the ChangeLog). as the price of progress, things may sometimes not work at all, return faulty results, or behave differently in future releases. however, in general i consider the package to be useful and usable, and i have received several reports from various places where it was used successfully. any feedback is most welcome!

still some work needs to be done to fully validate the implementations of various measures for readability and lexical diversity. until then, those functions will trigger a warning to interpret the results with caution. help would be appreciated!
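one simple way to sanity-check a lexical diversity result is to compute a basic measure by hand and compare. the following is only a sketch in base R of the type-token ratio (number of distinct tokens divided by the total number of tokens). it does not use koRpus, the helper name naive_ttr is made up for this example, and its naive tokenization is an assumption that will not necessarily match what the package does.

```r
# naive type-token ratio (TTR) in base R, for hand-checking only.
# naive_ttr() is a hypothetical helper, not part of koRpus.
naive_ttr <- function(txt) {
  # lower-case the text, then split on anything that is not
  # a letter, a digit, or an apostrophe
  tokens <- unlist(strsplit(tolower(txt), "[^[:alnum:]']+"))
  # drop empty strings that can appear at the start of the text
  tokens <- tokens[nzchar(tokens)]
  # types (distinct tokens) divided by all tokens
  length(unique(tokens)) / length(tokens)
}

# six tokens, five distinct ones ("the" occurs twice)
naive_ttr("the cat sat on the mat")
```

differences from the package's own results can come from tokenization, case handling, or punctuation, so a mismatch is a hint to look closer, not proof of an error.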

RKWard plugin: graphical user interface for koRpus

to make working with koRpus as comfortable as possible, i've also written a plugin for RKWard.


the plugin gets installed/updated automatically with the R package, and recent versions of RKWard will automatically add the plugin to their configuration.

  © 2005-2024  ·  last updated: 21.09.2020, 18:08:46