Feature Selection and Caching: Extensions of the Relevant-Set Correlation Model

Michael E. Houle
National Institute of Informatics, Japan


In recent years, a number of data clustering methods have been proposed that make use of so-called "shared-neighbor" information. The rationale behind all such approaches is that dense, interrelated data clusters can be revealed by the degree to which the neighborhoods of their members overlap. In this talk, we look at two extensions of the Relevant-Set Correlation (RSC) model for data clustering. The first extends the model to the case of multimodal information, in which each object is associated with several ranked relevant sets (neighborhoods), each with its own collection of data features and similarity measures. The second applies the modeling methodology of RSC to the problem of active caching of query-by-example ranked result lists. Here, the goal is to avoid disk access latency by estimating a query result from cached information whenever the desired result is missing from the cache. This research is a work in progress, and the presentation focuses on the models rather than on implementation details. The extension to caching is joint work with Vincent Oria and Umar Qasim of NJIT.