Peer Reviewed Journal via three different mandatory reviewing processes, since 2006, and, from September 2020, a fourth mandatory peer-editing has been added.
Defining text topicality is often an expensive problem that
requires significant resources for text labeling. Though many
packages already exist that provide dictionaries of labeled text,
synonyms, and part-of-speech tagging, the problem is ongoing
as language develops and new meanings of words and phrases
emerge. This paper proposes a solution, cheap in human labor, to
topic labeling of any text in the majority of languages. The
methodology uses links to a naturally emerging corpus of
labeled text: Wikipedia. Wikipedia categories are
processed to extract a weighted set of topic labels for the
analyzed text. The approach is evaluated by processing
categorized texts and comparing the similarity of the top ranks
of topic labels to the text category. The topic labels extracted
using this methodology can be used for comparing the similarity of
texts, for the assessment of the completeness of topic coverage
in automated marking of essays, and for coding in qualitative
text analysis. The paper contributes to the field of NLP by
offering a cheap and organically developing method of topical
text labeling. The paper contributes to the work of qualitative
analysts by offering a methodology for the analysis of interview
transcripts and other unstructured text.
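
As an illustration of the idea summarized above, the following minimal sketch derives a weighted set of topic labels from Wikipedia categories and compares two texts by those labels. It is not the paper's implementation: the weighting scheme (each linked article casts one vote per category, normalized to sum to 1), the cosine comparison, the function names, and the toy `ARTICLE_CATEGORIES` mapping are all assumptions made for illustration. In practice the mapping would be obtained by linking phrases in the text to Wikipedia articles and reading their categories.

```python
from collections import Counter
from math import sqrt

# Toy, hand-written stand-in for "article title -> Wikipedia categories".
# These assignments are hypothetical and only serve the example.
ARTICLE_CATEGORIES = {
    "Neural network": ["Machine learning", "Artificial intelligence"],
    "Gradient descent": ["Machine learning", "Optimization algorithms"],
    "Photosynthesis": ["Plant physiology", "Biochemistry"],
}


def topic_labels(linked_articles, article_categories):
    """Return {category: weight}, ordered from highest to lowest weight.

    Assumed weighting: every linked article votes once for each of its
    categories, and the votes are normalized so the weights sum to 1.
    """
    counts = Counter()
    for title in linked_articles:
        for category in article_categories.get(title, ()):
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.most_common()}


def cosine_similarity(labels_a, labels_b):
    """Cosine similarity between two weighted label sets (0.0 if either is empty)."""
    dot = sum(w * labels_b.get(label, 0.0) for label, w in labels_a.items())
    norm_a = sqrt(sum(w * w for w in labels_a.values()))
    norm_b = sqrt(sum(w * w for w in labels_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Articles assumed to have been linked from two different texts.
labels_ml = topic_labels(["Neural network", "Gradient descent"], ARTICLE_CATEGORIES)
labels_bio = topic_labels(["Photosynthesis"], ARTICLE_CATEGORIES)

print(labels_ml)
print(cosine_similarity(labels_ml, labels_bio))
```

The top-ranked label of each text can then be compared with the text's known category, which mirrors the evaluation described above, while the similarity score gives one possible way to compare two texts by topic.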