<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>
			tm.plugin.koRpus
		</title>
		<link>
			https://reaktanz.de/R/pckg/tm.plugin.koRpus
		</link>
		<atom:link href="https://reaktanz.de/R/pckg/tm.plugin.koRpus/RSS.xml" rel="self" type="application/rss+xml" />
		<description>
			<![CDATA[ 
				Enhances 'koRpus' text object classes and methods to also support large corpora. Hierarchical ordering of corpus texts into arbitrary categories will be preserved. Provided classes and methods also improve the ability of using the 'koRpus' package together with the 'tm' package. To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please subscribe to the koRpus-dev mailing list (<https://korpusml.reaktanz.de>). 
			]]>
		</description>
		<generator>
			roxyPackage (0.10-1)
		</generator>
		<item>
			<title>
				Changes in tm.plugin.koRpus version 0.4-2
			</title>
			<link>
				https://reaktanz.de/R/pckg/tm.plugin.koRpus
			</link>
			<pubDate>
				Mon, 17 May 2021 00:00:00 +0000
			</pubDate>
			<guid isPermaLink="false">
				tm.plugin.koRpus0.4-2(2021-05-17)
			</guid>
			<description>
				<![CDATA[ 
					<h4>
						fixed
					</h4>
					<ul>
						<li>
							<p>
								updated test standards after changes to koRpus' internal calculation of the number of lines in texts imported from TIF data frames
							</p>
						</li>
					</ul>
					<h4>
						changed
					</h4>
					<ul>
						<li>
							<p>
								kRp.corpus: replaced 
 								<code>
									prototype()
								</code>
								in class definition with initialize method
							</p>
						</li>
					</ul> 
				]]>
			</description>
		</item>
		<item>
			<title>
				Changes in tm.plugin.koRpus version 0.4-1
			</title>
			<link>
				https://reaktanz.de/R/pckg/tm.plugin.koRpus
			</link>
			<pubDate>
				Thu, 17 Dec 2020 00:00:00 +0000
			</pubDate>
			<guid isPermaLink="false">
				tm.plugin.koRpus0.4-1(2020-12-17)
			</guid>
			<description>
				<![CDATA[ 
					<h4>
						fixed
					</h4>
					<ul>
						<li>
							<p>
								<code>
									docTermMatrix()
								</code>
								: results were wrong because numbers were assigned to the wrong columns; now fixed in koRpus
							</p>
						</li>
						<li>
							<p>
								unit tests failed on Windows due to a UTF-8 issue
							</p>
						</li>
					</ul>
					<h4>
						changed
					</h4>
					<ul>
						<li>
							<p>
								the nested object class kRp.hierarchy was replaced by kRp.corpus; instead of reproducing the file hierarchy in the object structure, kRp.corpus has a flat structure with all texts in one single data frame; this data frame was also renamed from 
 								<code>
									"TT.res"
								</code>
								into 
 								<code>
									"tokens"
								</code>
								; the class name kRp.corpus was used in tm.plugin.koRpus before and is just being recycled ;) kRp.corpus inherits from class kRp.text as defined in the koRpus package
							</p>
						</li>
						<li>
							<p>
								status messages are currently shown only when a single CPU is used
							</p>
						</li>
						<li>
							<p>
								<code>
									corpusTagged()
								</code>
								: now called 
 								<code>
									taggedText()
								</code>
								as in koRpus
							</p>
						</li>
						<li>
							<p>
								<code>
									corpusDesc()
								</code>
								: now called 
 								<code>
									describe()
								</code>
								as in koRpus
							</p>
						</li>
						<li>
							<p>
								[, [&lt;-, [[ and [[&lt;- methods no longer apply to the summary data frame but to the tokens slot, as in koRpus (where they apply to the TT.res slot)
							</p>
						</li>
						<li>
							<p>
								<code>
									show()
								</code>
								: kRp.corpus objects now list all available features
							</p>
						</li>
						<li>
							<p>
								<code>
									read.corp.custom()
								</code>
								: removed unused mc.cores argument
							</p>
						</li>
						<li>
							<p>
								<code>
									docTermMatrix()
								</code>
								: by default now behaves like most other methods and adds its result to the input object rather than returning just the matrix; also, the generic is now defined by the koRpus package and was therefore removed from this package, including all of the actual function code
							</p>
						</li>
						<li>
							<p>
								adjusted unit tests and vignette
							</p>
						</li>
						<li>
							<p>
								updated all examples to use a new sample corpus (see added), so that many &quot;\dontrun{}&quot; cases could be removed
							</p>
						</li>
					</ul>
					<h4>
						added
					</h4>
					<ul>
						<li>
							<p>
								<code>
									readCorpus()
								</code>
								: the hierarchy levels of a text corpus can now be derived directly from the directory structure by setting &quot;hierarchy=TRUE&quot;
							</p>
						</li>
						<li>
							<p>
								<code>
									corpusHasFeatures()
								</code>
								, 
 								<code>
									corpusHasFeatures()
								</code>
								&lt;-, 
 								<code>
									corpusFeatures()
								</code>
								, 
 								<code>
									corpusFeatures()
								</code>
								&lt;-, 
 								<code>
									corpusHierarchy()
								</code>
								, 
 								<code>
									corpusHierarchy()
								</code>
								&lt;-, 
 								<code>
									corpusCorpFreq()
								</code>
								, 
 								<code>
									corpusCorpFreq()
								</code>
								&lt;-, 
 								<code>
									diffText()
								</code>
								, 
 								<code>
									diffText()
								</code>
								&lt;-, 
 								<code>
									originalText()
								</code>
								: new getter/setter methods for kRp.corpus objects
							</p>
						</li>
						<li>
							<p>
								<code>
									split_by_doc_id()
								</code>
								: new method transforms a kRp.corpus object into a list of kRp.text objects
							</p>
						</li>
						<li>
							<p>
								<code>
									corpusDocTermMatrix()
								</code>
								: new method to get/set the sparse document term matrix in kRp.corpus objects
							</p>
						</li>
						<li>
							<p>
								[[/[[&lt;-: gained new argument 
 								<code>
									"doc_id"
								</code>
								to limit the scope to particular documents
							</p>
						</li>
						<li>
							<p>
								<code>
									describe()
								</code>
								/describe()&lt;-: now support filtering by doc_id
							</p>
						</li>
						<li>
							<p>
								new sample corpus for use in examples
							</p>
						</li>
					</ul>
					<h4>
						removed
					</h4>
					<ul>
						<li>
							<p>
								removed all classes and methods dealing with kRp.hierarchy
							</p>
						</li>
						<li>
							<p>
								removed deprecated methods of the pre-kRp.hierarchy era
							</p>
						</li>
						<li>
							<p>
								removed generic of 
 								<code>
									tif_as_tokens_df()
								</code>
								as it was moved to the koRpus package
							</p>
						</li>
					</ul> 
				]]>
			</description>
		</item>
		<item>
			<title>
				Changes in tm.plugin.koRpus version 0.3-1
			</title>
			<link>
				https://reaktanz.de/R/pckg/tm.plugin.koRpus
			</link>
			<pubDate>
				Tue, 14 May 2019 00:00:00 +0000
			</pubDate>
			<guid isPermaLink="false">
				tm.plugin.koRpus0.3-1(2019-05-14)
			</guid>
			<description>
				<![CDATA[ 
					<h4>
						fixed
					</h4>
					<ul>
						<li>
							<p>
								<code>
									readCorpus()
								</code>
								: solved a cryptic warning when more than one text was tokenized
							</p>
						</li>
					</ul>
					<h4>
						added
					</h4>
					<ul>
						<li>
							<p>
								<code>
									docTermMatrix()
								</code>
								: new method to generate document-term matrices, either with absolute frequencies or tf-idf values
							</p>
						</li>
						<li>
							<p>
								<code>
									query()
								</code>
								: new method, extending the generic of koRpus &gt;= 0.12-1
							</p>
						</li>
						<li>
							<p>
								<code>
									filterByClass()
								</code>
								: new method, extending the generic of koRpus &gt;= 0.12-1
							</p>
						</li>
						<li>
							<p>
								<code>
									jumbleWords()
								</code>
								: new method, extending the generic of koRpus &gt;= 0.12-1
							</p>
						</li>
						<li>
							<p>
								<code>
									clozeDelete()
								</code>
								: new method, extending the generic of koRpus &gt;= 0.12-1
							</p>
						</li>
						<li>
							<p>
								<code>
									cTest()
								</code>
								: new method, extending the generic of koRpus &gt;= 0.12-1
							</p>
						</li>
						<li>
							<p>
								<code>
									textTransform()
								</code>
								: new method, extending the generic of koRpus &gt;= 0.12-1
							</p>
						</li>
						<li>
							<p>
								<code>
									show()
								</code>
								: new method for objects of class kRp.hierarchy
							</p>
						</li>
					</ul>
					<h4>
						changed
					</h4>
					<ul>
						<li>
							<p>
								depends on koRpus &gt;= 0.12-1 now
							</p>
						</li>
						<li>
							<p>
								depends on the Matrix package now (for 
 								<code>
									docTermMatrix()
								</code>
								)
							</p>
						</li>
						<li>
							<p>
								adjusted test standards to include the additional POS tags from koRpus &gt;= 0.12-1
							</p>
						</li>
					</ul> 
				]]>
			</description>
		</item>
		<item>
			<title>
				Changes in tm.plugin.koRpus version 0.02-2
			</title>
			<link>
				https://reaktanz.de/R/pckg/tm.plugin.koRpus
			</link>
			<pubDate>
				Fri, 18 Jan 2019 00:00:00 +0000
			</pubDate>
			<guid isPermaLink="false">
				tm.plugin.koRpus0.02-2(2019-01-18)
			</guid>
			<description>
				<![CDATA[ 
					<h4>
						fixed
					</h4>
					<ul>
						<li>
							<p>
								<code>
									readCorpus()
								</code>
								, 
 								<code>
									kRpSource()
								</code>
								: added missing imports from packages tm, NLP and parallel
							</p>
						</li>
						<li>
							<p>
								<code>
									readCorpus()
								</code>
								: fixed status message formatting
							</p>
						</li>
						<li>
							<p>
								<code>
									corpusTm()
								</code>
								: removed useless 
 								<code>
									"level"
								</code>
								argument and corrected the output
							</p>
						</li>
						<li>
							<p>
								<code>
									readCorpus()
								</code>
								: removed unused 
 								<code>
									"level"
								</code>
								argument
							</p>
						</li>
						<li>
							<p>
								<code>
									corpusFiles()
								</code>
								: now also works with flat hierarchy objects
							</p>
						</li>
					</ul>
					<h4>
						added
					</h4>
					<ul>
						<li>
							<p>
								<code>
									readCorpus()
								</code>
								: can now also import data frames in TIF format, including support for hierarchical categories
							</p>
						</li>
						<li>
							<p>
								<code>
									tif_as_corpus_df()
								</code>
								: new S4 method to transform a kRp.hierarchy object into a TIF compliant data frame
							</p>
						</li>
					</ul>
					<h4>
						changed
					</h4>
					<ul>
						<li>
							<p>
								<code>
									readCorpus()
								</code>
								: the tm corpora now include full hierarchy metadata
							</p>
						</li>
						<li>
							<p>
								removed pre-hierarchy portions from internal function 
 								<code>
									whatIsAvailable()
								</code>
							</p>
						</li>
					</ul> 
				]]>
			</description>
		</item>
		<item>
			<title>
				Changes in tm.plugin.koRpus version 0.02-1
			</title>
			<link>
				https://reaktanz.de/R/pckg/tm.plugin.koRpus
			</link>
			<pubDate>
				Sun, 29 Jul 2018 00:00:00 +0000
			</pubDate>
			<guid isPermaLink="false">
				tm.plugin.koRpus0.02-1(2018-07-29)
			</guid>
			<description>
				<![CDATA[ 
					<h4>
						changed
					</h4>
					<ul>
						<li>
							<p>
								vignette: also includes info on 
 								<code>
									readCorpus()
								</code>
							</p>
						</li>
						<li>
							<p>
								tests: adjusted test standards to new object class
							</p>
						</li>
					</ul>
					<h4>
						added
					</h4>
					<ul>
						<li>
							<p>
								kRp.hierarchy: new S4 class to replace kRp.sourcesCorpus and kRp.topicCorpus to allow more generic nesting of hierarchical levels
							</p>
						</li>
						<li>
							<p>
								<code>
									readCorpus()
								</code>
								: new function to generate kRp.hierarchy objects recursively
							</p>
						</li>
						<li>
							<p>
								many corpus*() getter functions can now filter by hierarchy level or category ID
							</p>
						</li>
						<li>
							<p>
								removed all code regarding 
 								<code>
									simpleCorpus()
								</code>
								, 
 								<code>
									sourcesCorpus()
								</code>
								and 
 								<code>
									topicCorpus()
								</code>
								, their object classes and methods; this is all handled much more flexibly by kRp.hierarchy and 
 								<code>
									readCorpus()
								</code>
								now
							</p>
						</li>
					</ul> 
				]]>
			</description>
		</item>
		<item>
			<title>
				Changes in tm.plugin.koRpus version 0.01-4
			</title>
			<link>
				https://reaktanz.de/R/pckg/tm.plugin.koRpus
			</link>
			<pubDate>
				Wed, 07 Mar 2018 00:00:00 +0000
			</pubDate>
			<guid isPermaLink="false">
				tm.plugin.koRpus0.01-4(2018-03-07)
			</guid>
			<description>
				<![CDATA[ 
					<h4>
						fixed
					</h4>
					<ul>
						<li>
							<p>
								<code>
									sourcesCorpus()
								</code>
								: speak of 
 								<code>
									"text"
								</code>
								instead of 
 								<code>
									"texts"
								</code>
								if it's only one
							</p>
						</li>
					</ul>
					<h4>
						changed
					</h4>
					<ul>
						<li>
							<p>
								adjusted package to support koRpus &gt;= 0.11 and sylly, especially with regard to 
 								<code>
									summary()
								</code>
								, 
 								<code>
									hyphen()
								</code>
								, and new class constructors
							</p>
						</li>
						<li>
							<p>
								<code>
									summary()
								</code>
								: for more coherence with the koRpus package the 
 								<code>
									"text"
								</code>
								column in the summary slot was renamed into 
 								<code>
									"doc_id"
								</code>
							</p>
						</li>
						<li>
							<p>
								reaktanz.de supports HTTPS now, updated references
							</p>
						</li>
						<li>
							<p>
								vignette is now in RMarkdown/HTML format; the Sweave/PDF version was dropped
							</p>
						</li>
						<li>
							<p>
								<code>
									hyphen()
								</code>
								/
 								<code>
									lex.div()
								</code>
								/readability(): 'quiet' is now TRUE by default
							</p>
						</li>
						<li>
							<p>
								<code>
									lex.div()
								</code>
								: 'char' is now an empty string by default; computing all characteristics was not a useful default for large text corpora
							</p>
						</li>
					</ul>
					<h4>
						added
					</h4>
					<ul>
						<li>
							<p>
								README.md
							</p>
						</li>
						<li>
							<p>
								new [, [&lt;-, [[ and [[&lt;- methods added for corpus object classes
							</p>
						</li>
						<li>
							<p>
								new methods 
 								<code>
									tif_as_tokens_df()
								</code>
								to export corpus objects as a single data.frame in fully TIF compliant format
							</p>
						</li>
						<li>
							<p>
								<code>
									summary()
								</code>
								: now also includes the total number of stopwords (if available)
							</p>
						</li>
						<li>
							<p>
								new class object constructors 
 								<code>
									kRp_corpus()
								</code>
								, 
 								<code>
									kRp_sourcesCorpus()
								</code>
								, and 
 								<code>
									kRp_topicCorpus()
								</code>
								can be used instead of new( 
 								<code>
									"kRp.corpus"
								</code>
								, ...) etc.
							</p>
						</li>
					</ul> 
				]]>
			</description>
		</item>
		<item>
			<title>
				Changes in tm.plugin.koRpus version 0.01-3
			</title>
			<link>
				https://reaktanz.de/R/pckg/tm.plugin.koRpus
			</link>
			<pubDate>
				Tue, 12 Jul 2016 00:00:00 +0000
			</pubDate>
			<guid isPermaLink="false">
				tm.plugin.koRpus0.01-3(2016-07-12)
			</guid>
			<description>
				<![CDATA[ 
					<h4>
						fixed
					</h4>
					<ul>
						<li>
							<p>
								the arguments that 
 								<code>
									simpleCorpus()
								</code>
								was supposed to pipe to 
 								<code>
									DirSource()
								</code>
								weren't used
							</p>
						</li>
					</ul>
					<h4>
						changed
					</h4>
					<ul>
						<li>
							<p>
								the 
 								<code>
									"paths"
								</code>
								argument of 
 								<code>
									topicCorpus()
								</code>
								now expects a list, not a vector
							</p>
						</li>
						<li>
							<p>
								now uses the parallel package to make use of multiple CPU cores
							</p>
						</li>
					</ul>
					<h4>
						added
					</h4>
					<ul>
						<li>
							<p>
								new argument 
 								<code>
									"format"
								</code>
								for 
 								<code>
									simpleCorpus()
								</code>
								, 
 								<code>
									sourcesCorpus()
								</code>
								, and 
 								<code>
									topicCorpus()
								</code>
								, to be able to work with text objects directly, instead of files
							</p>
						</li>
					</ul> 
				]]>
			</description>
		</item>
		<item>
			<title>
				Changes in tm.plugin.koRpus version 0.01-2
			</title>
			<link>
				https://reaktanz.de/R/pckg/tm.plugin.koRpus
			</link>
			<pubDate>
				Wed, 08 Jul 2015 00:00:00 +0000
			</pubDate>
			<guid isPermaLink="false">
				tm.plugin.koRpus0.01-2(2015-07-08)
			</guid>
			<description>
				<![CDATA[ 
					<h4>
						changed
					</h4>
					<ul>
						<li>
							<p>
								now uses the S4 methods of koRpus 0.06-1; all methods were therefore renamed, dropping the *.corpus suffix (e.g., 
 								<code>
									lex.div.corpus()
								</code>
								is now 
 								<code>
									lex.div()
								</code>
								)
							</p>
						</li>
						<li>
							<p>
								renamed classes into kRp.corpus, kRp.sourcesCorpus and kRp.topicCorpus, and their generator functions accordingly
							</p>
						</li>
					</ul>
					<h4>
						added
					</h4>
					<ul>
						<li>
							<p>
								new methods 
 								<code>
									read.corp.custom()
								</code>
								, 
 								<code>
									freq.analysis()
								</code>
								and 
 								<code>
									summary()
								</code>
							</p>
						</li>
						<li>
							<p>
								new getter/setter methods: 
 								<code>
									corpusSources()
								</code>
								, 
 								<code>
									corpusTopics()
								</code>
								, 
 								<code>
									corpusFreq()
								</code>
								, 
 								<code>
									corpusSummary()
								</code>
							</p>
						</li>
						<li>
							<p>
								first basic unit tests, using the testthat package
							</p>
						</li>
						<li>
							<p>
								new option 
 								<code>
									"summary"
								</code>
								for 
 								<code>
									lex.div()
								</code>
								and 
 								<code>
									readability()
								</code>
								, to automatically update the summary data.frames
							</p>
						</li>
						<li>
							<p>
								first notes in a vignette
							</p>
						</li>
					</ul> 
				]]>
			</description>
		</item>
		<item>
			<title>
				Changes in tm.plugin.koRpus version 0.01-1
			</title>
			<link>
				https://reaktanz.de/R/pckg/tm.plugin.koRpus
			</link>
			<pubDate>
				Mon, 29 Jun 2015 00:00:00 +0000
			</pubDate>
			<guid isPermaLink="false">
				tm.plugin.koRpus0.01-1(2015-06-29)
			</guid>
			<description>
				<![CDATA[ 
					<h4>
						added
					</h4>
					<ul>
						<li>
							<p>
								initial release
							</p>
						</li>
					</ul> 
				]]>
			</description>
		</item>
	</channel>
</rss>
