dipl· psych·  m·eik michalke

koRpus: an R package for text analysis

koRpus is an R package i originally wrote to measure similarities/differences between texts. over time it grew into what it is now, a hopefully versatile tool to analyze text material in various ways, with an emphasis on scientific research, including readability and lexical diversity features.

web application

to demonstrate some of the core features of koRpus, there is a public web application hosted by the heinrich heine university of düsseldorf. it was realised using the shiny package. the source files for the app come with the koRpus package, so you can also run it locally and change it to your needs.

getting koRpus

the most recent stable release should be available via CRAN. the most recent development release of koRpus can be installed from my own package repository http://R.reaktanz.de, e.g. directly from an R session:

install.packages("koRpus", repos="http://R.reaktanz.de")

there's also a debian/ubuntu package (needs recent R packages from CRAN as a dependency).

the package has its own tokenizer, which should suffice for a lot of use cases, but to use all available features an additional installation of TreeTagger is strongly recommended! this also means that koRpus can be used as an R wrapper for TreeTagger.
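
the two routes described above might look roughly like the following. this is a minimal sketch, not a definitive recipe: the sample text, the file name "sample.txt" and the TreeTagger location "~/bin/treetagger" are made up for illustration, and newer koRpus releases expect a separate language package (e.g. koRpus.lang.en) to be installed and loaded before lang="en" will work.

```r
# sketch only: sample text, file name and TreeTagger path are hypothetical
sample_txt <- "koRpus can tokenize plain text. TreeTagger adds proper POS tags."

if (requireNamespace("koRpus", quietly = TRUE)) {
  # newer koRpus releases keep language support in separate packages;
  # load the english one if it happens to be installed
  if (requireNamespace("koRpus.lang.en", quietly = TRUE)) {
    library("koRpus.lang.en")
  }

  # route 1: the built-in tokenizer.
  # format="obj" means the text is passed as a character vector, not a file
  tokenized <- koRpus::tokenize(sample_txt, format = "obj", lang = "en")

  # route 2: let koRpus call an installed TreeTagger (left commented out,
  # because it needs TreeTagger and a text file on disk)
  # tagged <- koRpus::treetag(
  #   "sample.txt",
  #   treetagger = "manual",
  #   lang = "en",
  #   TT.options = list(path = "~/bin/treetagger", preset = "en")
  # )
}
```

either resulting object can then be handed on to the analysis functions of the package.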


this is a probably incomplete list of implemented features:

citation information

in case you need to cite koRpus for reference, consider the CITATION file or use the following:

Michalke, M. (2012, April). koRpus -- ein R-paket zur textanalyse. Paper presented at the Tagung experimentell arbeitender Psychologen (TeaP), Mannheim.

work in progress

i'm still working on koRpus (see the ChangeLog). that is, as a price for progress it is possible that sometimes things won't work at all, return faulty results or will behave differently in future releases. however, in general i consider the package to be useful and usable, and i received several reports from various places where it was successfully used. any feedback is most welcome!

still some work needs to be done to fully validate the implementations of various measures for readability and lexical diversity. until then, those functions will trigger a warning to interpret the results with caution. help would be appreciated!
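
one low-tech way to help with validation is to compute a measure by hand on a tiny text and compare it with what the package reports. the following is a minimal sketch in base R for the type-token ratio (distinct words divided by all words). naive_ttr() is a hypothetical helper for illustration and deliberately simple; it is not koRpus' own tokenizer or implementation, so small differences (e.g. in how punctuation or case is handled) are to be expected.

```r
# naive type-token ratio (TTR): number of distinct words / number of all words.
# simplified stand-in for cross-checking, not koRpus' own code
naive_ttr <- function(txt) {
  # lowercase, then split on every run of characters that is
  # neither a letter nor an apostrophe
  tokens <- unlist(strsplit(tolower(txt), "[^[:alpha:]']+"))
  # drop empty strings left over from leading separators
  tokens <- tokens[nzchar(tokens)]
  length(unique(tokens)) / length(tokens)
}

# 10 tokens, 7 distinct types
sample_txt <- "the cat sat on the mat. the dog sat too."
naive_ttr(sample_txt)
```

if a result from the package differs from such a hand calculation, the tokenization is usually the first thing worth comparing.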

RKWard plugin: graphical user interface for koRpus

to make working with koRpus as comfortable as possible, i'm also working on a plugin for RKWard:


the plugin gets installed/updated automatically with the R package, and recent versions of RKWard will automatically add the plugin to their configuration.

© 2005-2014 · last updated: 25.03.2014, 17:37:48