set correlation clustering: Application to caching and recommender systems
(Research conducted in collaboration with Michael Houle, NII, Tokyo,
In this research we are interested in active caching mechanisms and
multi-attribute recommendations.
Active Caching: In several novel applications (such as search
engines, recommender systems and multimedia databases), the result of a
query is a ranked list obtained by applying a similarity measure to
features of database objects. Generating ranked lists is typically an
expensive operation that often results in access latency. Caching of
frequently-accessed data has been shown to have many useful applications
for reducing stress on limited resources and improving response time.
However, traditional caching techniques defined for exact match queries
cannot be applied to ranked list queries. In this paper, we propose an
"active caching" technique for ranked list queries that not only returns
cached results, but also actively processes queries whose results are not
present in the cache, by aggregating those ranked list results stored in
the cache for related queries. The solution is based on concepts from the
relevant set correlation (RSC) clustering model, which measures the
similarity between two objects in terms of the number of other objects in
the common intersection of their neighborhoods.
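The shared-neighbor idea behind this similarity measure can be illustrated with a minimal sketch. It assumes each object's neighborhood is available as a list of object identifiers and scores the overlap with a cosine-style normalization. The function name and the normalization are illustrative assumptions, not the exact RSC formulation.

```python
def shared_neighbor_similarity(neighbors_a, neighbors_b):
    """Overlap of two neighborhoods, normalized by their sizes.

    Both arguments are iterables of object identifiers (hypothetical
    input format). Returns a value between 0.0 and 1.0.
    """
    a, b = set(neighbors_a), set(neighbors_b)
    if not a or not b:
        return 0.0
    # Number of objects in the intersection of the two neighborhoods,
    # divided by the geometric mean of the neighborhood sizes.
    return len(a & b) / (len(a) * len(b)) ** 0.5


# Hypothetical neighborhoods of three queries.
q1 = ["a", "b", "c", "d"]
q2 = ["b", "c", "d", "e"]
q3 = ["w", "x", "y", "z"]

print(shared_neighbor_similarity(q1, q2))  # 0.75
print(shared_neighbor_similarity(q1, q3))  # 0.0
```

Under this sketch, a cache would treat q2 as related to q1 and q3 as unrelated. How related results are then aggregated is not shown here.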
Multi-attribute Recommendations: This research
investigates the application of clustering to multi-criteria ratings as a
method of improving the precision of top-N recommendations. With the advent
of e-commerce sites that allow multi-criteria rating of items, there is an
opportunity for recommender systems to use the additional information to
gain a better understanding of user preferences. This research proposes the
use of the relevant set correlation model for a clustering-based
collaborative filtering system. It is anticipated that this novel system will
handle large numbers of users and items without sacrificing the relevance
of recommended items.
Object Trajectory Management
(Research conducted in
collaboration with Karine Zeitouni and Iulian Sandu Popa, University
of Versailles St Quentin, France)
We proposed to work on a new access
method for objects moving along trajectories. Although this subject has been
investigated, most of the proposed indexes assume that the objects
move freely. We proposed PARINET, a new access method to efficiently
retrieve the trajectories of objects moving in networks. The structure of
PARINET is based on a combination of data partitioning and composite B+-tree
local indexes. Unlike the existing approaches, the new approach relies on
the distribution of the data to be indexed. For historical data, the data
distribution can be known in advance. The partitioning of the data is based
on graph partitioning theory, and can be tuned for a given query load. We studied different types of queries,
and provided an optimal configuration for several scenarios. PARINET can
easily be integrated in any RDBMS, which is an essential aspect
particularly for industrial or commercial applications.
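The combination of data partitioning and composite-key local indexes can be sketched as follows. This is a simplified illustration under stated assumptions, not the PARINET implementation: the segment-to-partition mapping is taken as given (in PARINET it would come from graph partitioning), a sorted Python list stands in for each B+-tree, and the class and method names are hypothetical.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class PartitionedTrajectoryIndex:
    """Toy index: one sorted local index per network partition."""

    def __init__(self, segment_to_partition):
        # Assumed input: road segment id -> partition id.
        self.segment_to_partition = dict(segment_to_partition)
        # partition id -> sorted list of (segment, time, object) keys,
        # standing in for a composite-key B+-tree.
        self.local = defaultdict(list)

    def insert(self, segment, time, obj):
        partition = self.segment_to_partition[segment]
        insort(self.local[partition], (segment, time, obj))

    def query(self, segments, t_start, t_end):
        """Objects seen on any of the segments during [t_start, t_end]."""
        hits = set()
        for segment in segments:
            partition = self.segment_to_partition[segment]
            entries = self.local.get(partition, [])
            # Jump to the first key with this segment and time >= t_start,
            # then scan forward while the key stays inside the range.
            i = bisect_left(entries, (segment, t_start))
            while (i < len(entries) and entries[i][0] == segment
                   and entries[i][1] <= t_end):
                hits.add(entries[i][2])
                i += 1
        return sorted(hits)


index = PartitionedTrajectoryIndex({1: 0, 2: 0, 3: 1})
index.insert(1, 10, "car_a")
index.insert(2, 12, "car_a")
index.insert(3, 15, "car_b")
index.insert(1, 40, "car_c")
print(index.query([1, 3], 5, 20))  # ['car_a', 'car_b']
```

Only the partitions that contain the queried segments are touched, which is the property a good partitioning is meant to exploit.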
The PARINET index is suitable for
indexing past trajectories but is not adapted to present or
future ones. It cannot work efficiently for real-time trajectory indexing
because the index structure is based on graph partitioning, and a
continuous flow of trajectories can lead to dynamic partitioning of the
trajectories, which is expensive. Fortunately, in several real-life applications,
moving objects follow routines with similar trajectories. For example,
people who have to start work at the same time every weekday leave home
around the same time every day and, most of the time, follow the same
itinerary. This information can be mined from past trajectories and used to
initialize the index. The other part
of this research consists in adapting the initialized index to the current
flow in order to have an index that can work for present and future trajectories.
- Semi-Automatic Image
Annotation: Knowledge Propagation in Large Image Databases Using
This is joint research between NJIT and NII that involves Prof.
Michael Houle and Prof. Shin’ichi Satoh from NII, Tokyo and Jichao Sun, PhD
student at NJIT.
Propagating the information associated with some objects of interest to an entire image
database has several applications ranging from home photo album management
to security. Existing solutions are labor-intensive and not always
accurate. The aim of this research is to minimize human
intervention in semantic annotation. Ideally, we would like a sample of
each object of interest to be labeled once and have the label propagated to
the occurrences of the object in the entire image database. To that end, we
proposed a neighborhood-based approach called KProp (Knowledge
Propagation), which builds a voting model and effectively propagates the
knowledge associated with some objects to related objects in the database.
Each object iteratively collects opinions from its neighbors, makes a decision
on its "status" and provides this information to the others. We show
that this procedure can be performed efficiently through matrix computations. KnowledgeProp
is applicable as long as pairwise similarities between objects are
available and requires no human interaction besides the original labeling.
We applied KnowledgeProp to simple object and face classifications.
The experimental results show that our approach is more stable and achieves
better results with fewer labeled examples per object.
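The iterative voting procedure described above can be sketched as a similarity-weighted vote repeated until no status changes. This is an illustrative simplification, not the published KProp algorithm: the function name, the +1/-1/0 status encoding and the sign-based decision rule are all assumptions.

```python
def propagate_labels(similarity, seeds, rounds=10):
    """Propagate +1 / -1 labels over a pairwise similarity matrix.

    similarity: square list of lists of pairwise similarities (assumed input).
    seeds: dict mapping object index -> +1 or -1 for the labeled samples.
    Returns a list of statuses (+1, -1, or 0 for undecided).
    """
    n = len(similarity)
    status = [seeds.get(i, 0) for i in range(n)]
    for _ in range(rounds):
        # Each object collects similarity-weighted opinions from the others;
        # this is one matrix-vector product, written out in plain Python.
        votes = [sum(similarity[i][j] * status[j] for j in range(n) if j != i)
                 for i in range(n)]
        # Labeled samples keep their label; others decide by the vote's sign.
        updated = [seeds[i] if i in seeds
                   else (1 if votes[i] > 0 else -1 if votes[i] < 0 else 0)
                   for i in range(n)]
        if updated == status:
            break
        status = updated
    return status


# Hypothetical similarities: objects 0-2 resemble each other, as do 3-4.
similarity = [
    [0.0, 0.9, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.8, 0.1, 0.0],
    [0.1, 0.8, 0.0, 0.2, 0.0],
    [0.0, 0.1, 0.2, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
]
print(propagate_labels(similarity, {0: 1, 4: -1}))  # [1, 1, 1, -1, -1]
```

Only objects 0 and 4 are labeled by hand; the remaining three receive their status from their neighbors.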
- A Stereological Approach to
Sub-Query Result Integration
This is joint research between NJIT and
NII that involves Prof. Michael Houle from NII, Tokyo and Xiguo Ma, PhD
student at NJIT.
In several applications such
as multimedia and recommender systems, complex queries aiming to retrieve
from large databases those objects that best match the query specification
are usually processed by splitting the queries into a set of m simpler
sub-queries, each dealing with only some of the query features. To
determine which objects are the overall best matches, a rule is then
needed to integrate the results of these sub-queries, i.e., to globally
rank the m-dimensional vectors of matching degrees, or partial
scores, that objects obtain on the m sub-queries. State-of-the-art
approaches all adopt as their integration rule a scoring
function, such as a weighted average, that aggregates the m partial
scores into an overall (numerical) similarity score, so that objects can be
linearly ordered and only the highest-scoring ones returned to the user.
This choice, however, forces the system to compromise between the different
sub-queries and can easily cause relevant results to be missed. In this research
we propose a stereological approach to sub-query result integration. In
the past, measures of intrinsic dimension (such as the expansion dimension)
have been used strictly for the analysis of similarity search methods. This
research aims at demonstrating that tests of stereological dimension can be
used dynamically to guide the decisions made by search algorithms.
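For reference, the conventional integration rule discussed above, a weighted average over the m partial scores, can be sketched as follows. The function name, the score values and the equal weights are hypothetical. The sketch illustrates only the baseline rule and its compromise effect, not the stereological approach.

```python
def rank_by_weighted_average(partial_scores, weights, top_k):
    """Rank objects by the weighted average of their m partial scores.

    partial_scores: dict mapping object id -> tuple of m sub-query scores.
    weights: sequence of m non-negative weights (assumed not all zero).
    Returns the top_k object ids, best first (ties broken by id).
    """
    total = sum(weights)
    overall = {
        obj: sum(w * s for w, s in zip(weights, scores)) / total
        for obj, scores in partial_scores.items()
    }
    return sorted(overall, key=lambda obj: (-overall[obj], obj))[:top_k]


# Hypothetical partial scores on m = 2 sub-queries.
partial_scores = {
    "a": (0.9, 0.1),  # best match for sub-query 1
    "b": (0.6, 0.6),  # moderate on both
    "c": (0.2, 0.8),  # best match for sub-query 2
}
print(rank_by_weighted_average(partial_scores, (1, 1), top_k=1))  # ['b']
```

With equal weights, the object that is only moderate on both sub-queries outranks the best match of each individual sub-query, which is the compromise described above.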