The goal of text mining is to find interesting and non-trivial patterns
or knowledge from unstructured documents. Both objective and subjective
measures have been proposed to evaluate the interestingness of discovered
patterns. However, objective measures alone are insufficient because they
do not consider users' knowledge and interests. Subjective measures
require explicit input of user expectations, which is difficult or even
impossible to obtain in text mining environments.
This study proposes a user-oriented text-mining framework, and applies
it to the problem of discovering novel association rules from documents.
The system, uMining, consists of two major components: a background knowledge
developer and a novel association rule miner. Background knowledge is developed
from documents already known to the user (background documents) and modeled
as a keyword space with a concept hierarchy developed inside. Target documents
are retrieved from a large corpus by selecting documents that are relevant
to the user's background. The association rule miner discovers association
rules among noun phrases extracted from the target documents.
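
The rule-mining step can be illustrated with a minimal sketch. It assumes each target document has already been reduced to a set of noun phrases, and it mines only single-antecedent rules scored by support and confidence. The function name `mine_pair_rules` and both thresholds are hypothetical and chosen for illustration; the abstract does not state which mining algorithm or parameter values uMining uses.

```python
from itertools import permutations


def mine_pair_rules(docs, min_support=0.4, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence).

    docs: list of sets, one set of noun phrases per target document.
    The thresholds are illustrative assumptions, not values from the study.
    """
    n = len(docs)
    vocab = set().union(*docs) if docs else set()
    rules = []
    # Consider every ordered pair of distinct noun phrases as a candidate rule.
    for a, b in permutations(sorted(vocab), 2):
        count_a = sum(1 for d in docs if a in d)
        count_ab = sum(1 for d in docs if a in d and b in d)
        support = count_ab / n  # share of documents containing both phrases
        confidence = count_ab / count_a if count_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


if __name__ == "__main__":
    docs = [
        {"text mining", "association rule"},
        {"text mining", "association rule", "concept hierarchy"},
        {"text mining"},
        {"concept hierarchy"},
    ]
    for rule in mine_pair_rules(docs):
        print(rule)
```

Support and confidence only filter candidate rules here; they say nothing about whether a rule is new to a particular user, which is the gap the novelty measure is meant to address.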
A user-oriented novelty measure is developed to evaluate the interestingness
(novelty and usefulness) of association rules. It is defined as the
semantic distance between the antecedent and the consequent of a rule in
the background knowledge keyword space. The novelty measure is decomposed
into two components: occurrence distance and connection distance. The former
looks at the overlapping area of two keywords: the more they overlap, the
smaller the distance. The latter is the distance between two keywords
in the concept hierarchy, defined as the length of the shortest path
connecting them in the hierarchy: the longer the path, the larger the
distance.
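
The two distance components can be sketched as follows. The abstract does not give their exact formulas, so this sketch makes two labeled assumptions: the "overlapping area" of two keywords is modeled as the Jaccard overlap of the sets of background documents in which each keyword occurs, and the concept hierarchy is a plain adjacency mapping whose edges are treated as undirected. The function names are hypothetical, and how the two components are combined into a single novelty score is not specified in the abstract, so no combination is shown.

```python
from collections import deque


def occurrence_distance(docs_a, docs_b):
    """Assumed form: 1 minus the Jaccard overlap of two keywords' document sets."""
    union = docs_a | docs_b
    if not union:
        return 1.0  # no occurrences at all: treat as maximally distant
    return 1.0 - len(docs_a & docs_b) / len(union)


def connection_distance(hierarchy, start, goal):
    """Length of the shortest path between two keywords in the hierarchy.

    hierarchy: dict mapping a node to an iterable of its children.
    Returns None when the two keywords are not connected.
    """
    # Build an undirected adjacency map so paths can go up and down the tree.
    adj = {}
    for node, children in hierarchy.items():
        for child in children:
            adj.setdefault(node, set()).add(child)
            adj.setdefault(child, set()).add(node)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:  # breadth-first search finds the shortest path first
        node, dist = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

For example, with the hypothetical hierarchy `{"root": ["mining", "retrieval"], "mining": ["text mining", "data mining"], "retrieval": ["search"]}`, the siblings "text mining" and "data mining" are 2 edges apart, while "text mining" and "search" are 4 edges apart, so a rule linking the latter pair would be treated as more distant from the user's background.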
The evaluation focused on the novelty prediction accuracy and
the usefulness indication power of the user-oriented novelty measure. The
results show that the user-oriented novelty measure has high novelty prediction
accuracy and outperforms the WordNet novelty measure and the Support
and Confidence measures in novelty prediction. The
user-oriented novelty measure also has high usefulness indication power and
outperforms the WordNet novelty measure and seven other objective interestingness
measures in usefulness indication.