Dealing with the Uncertainty Between the Text Content and the Categories by a Proposed Wavelet Similarity Metric
Citation
Dönmez, İ., & Aslan, Z. (2021, August). Dealing with the Uncertainty Between the Text Content and the Categories by a Proposed Wavelet Similarity Metric. In International Conference on Intelligent and Fuzzy Systems (pp. 233-244). Springer, Cham.
Abstract
Uncertainty in decision-making has become a hot topic in operations research. This study deals with the uncertainty between text contents and text categories. The complex and ambiguous structure of texts prevents clear and precise categorization. The purpose of this paper is to find the similarity distance between a text and the different categories to which it may be related, using hidden semantic relations. Our study proposes a new method to reveal the hypernym relations (generic terms, the upper classes of a term) of the words in the text and to formalize the similarity distance metrics between the given text and different categories. We propose a novel measurement formula that calculates the similarity of defined categories to a specific text using the discrete wavelet transformation. Using the wavelet transformation, which has rarely been used in text analysis, the upper-class “hypernym” relations and their dominance over each other are found. The strength of the results is examined on a sample text (I like giraffes and I am afraid of lions. I saw them when they are standing opposite of each other near a cactus in Hoanib Desert.). In the first version, in which frequencies are taken into account, the sample text is categorized as “warm-blooded animals” among the 8 categories, with the highest similarity distance. For the normalized binary-valued version, the maximum and minimum similarities are 0.8639 and 0.1360, respectively.
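The abstract does not state the proposed formula, so the following is only a minimal sketch of the discrete wavelet transformation step it mentions, not the authors' metric. It assumes a single-level Haar wavelet and a made-up vector of per-category hypernym counts; the function name `haar_dwt` and the sample values are illustrative assumptions that do not come from the paper.

```python
import math


def haar_dwt(signal):
    """Single-level discrete Haar wavelet transform.

    Returns (approximation, detail) coefficient lists.
    The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Pairwise scaled sums: the coarse (low-pass) view of the signal.
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    # Pairwise scaled differences: the local (high-pass) variation.
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical input: number of words in a text whose hypernym chain
# reaches each of 8 categories (values invented for illustration).
category_counts = [2, 0, 1, 0, 0, 0, 1, 0]

approx, detail = haar_dwt(category_counts)
print("approximation:", approx)
print("detail:", detail)
```

The Haar wavelet is chosen here only because it is the simplest discrete wavelet; how the paper combines the resulting coefficients into a similarity distance is not described in the abstract.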