Text classification is the task of analyzing the content of texts in order to assign them to predefined classes, which are structured according to content characteristics and statements. Each class has its own class profile, which is created manually or automatically depending on how the system is designed.
Text classification is a technique in which algorithms sort, filter, and classify large amounts of information. It assigns information to classes and thereby makes large amounts of data easier to search. The number of classes is practically unlimited. The classification system can be hierarchical, and each piece of information can be assigned to one or more classes. The decisive criterion for assigning information to a class is that the information contains the characteristics specified for that class.
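The assignment rule described above can be illustrated with a minimal sketch. It assumes that a class profile is simply a set of keywords and that a text belongs to a class when it contains at least `min_hits` of them. The profile contents, the path-style names used to suggest a hierarchy, and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical class profiles: each class lists the characteristics
# (here, plain keywords) that a text must contain to be assigned to it.
# The "sports/football" name only hints at a hierarchical class system.
CLASS_PROFILES = {
    "sports": {"match", "goal", "team"},
    "sports/football": {"goal", "striker"},
    "politics": {"election", "parliament", "minister"},
}


def tokenize(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def assign_classes(text, profiles=CLASS_PROFILES, min_hits=2):
    """Return every class whose profile shares at least min_hits words with the text."""
    tokens = tokenize(text)
    return sorted(
        name
        for name, keywords in profiles.items()
        if len(tokens & keywords) >= min_hits
    )
```

With these assumed profiles, a sentence about a striker scoring a goal for a team is assigned to both "sports" and "sports/football", which shows how one piece of information can belong to more than one class.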
Text classification is used, among other things, by newspapers and portals that divide their news and reports into sections such as politics, sports, and culture. Because this kind of classification is relatively simple, it can be performed by learning systems. Electronic documents can be classified by content or by type of electronic service, such as e-mail or short messages. This also includes the analysis and blocking of unwanted e-mails (spam).
Every text classification system works with a classifier that is formed from a pre-sorted collection of documents, that is, documents whose classes are already known. Well-known methods of text classification are the Support Vector Machine (SVM) method and the Naive Bayes method.
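To make the idea of a classifier formed from pre-sorted documents concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written with the Python standard library only. The class name, the tiny training set, and the "spam"/"ham" labels are assumptions made for illustration; a production system would more likely rely on an established machine learning library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self):
        self.class_doc_counts = Counter()        # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()                  # all words seen in training

    def train(self, labeled_documents):
        """Learn from a pre-sorted collection of (text, label) pairs."""
        for text, label in labeled_documents:
            self.class_doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the class with the highest log posterior score."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            # Log prior: share of training documents in this class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word]
                # Smoothed log likelihood of the word given the class.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical pre-sorted training collection.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached for review", "ham"),
]

classifier = NaiveBayesClassifier()
classifier.train(TRAINING)
```

Calling `classifier.predict("claim your free money")` with this training data yields "spam", because the words "claim", "free", and "money" occur only in the documents pre-sorted into that class.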