:: Brook Wu @ NJIT ::

CV & Research


CV: A very short CV adapted from my ABET CV.

Overview of my research: I am interested in deriving intelligence from corpora using text mining, information extraction, natural language processing, machine learning, and information retrieval approaches. My work has been applied to distance-learning student performance evaluation, representation of research expertise for personalized uMining, finding similar people on the web using a personal website as a search query, personalized query refinement, and more. The following is a list of my current projects (as of June 20, 2013).
	IFME: Information filtering by multiple examples, with Ph.D. student Mingzhu Zhu
	This approach uses multiple representative articles provided by a user as positive samples to represent a complex information need, without requiring the user to compose a search query. Using a semi-supervised Positive and Unlabeled Learning (PU Learning) approach, the system learns from the user’s samples and ranks all documents in a document base (such as a digital library) by their relevance to the information need that the samples represent. To achieve high learning performance even with very few positive samples, the system uses under-sampling, which is especially beneficial when documents similar to the samples are unevenly distributed across the document base.
	Task-based user profiling for personalized query refinement, with Ph.D. student Chao Xu
	This project uses the user’s prior search sessions to model his or her evolving search interests with long-term and short-term descriptors, as well as positive and negative ones. To reduce noise in the dataset, the pages the user clicked during search sessions are represented as click graphs, which form a pseudo user representation from which the descriptors in the user’s profile are derived.
	Concept chaining utilizing meronyms in text characterization, with Ph.D. student Lori Watrous-Deversterre
	This project uses semantic and linguistic content categorization to facilitate improved access to digital library resources.
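To make the IFME idea concrete, here is a minimal Python sketch of ranking a document base from a few positive examples plus unlabeled documents, with under-sampling. It swaps in a simple, well-known PU strategy (bagging over random under-samples of the unlabeled set, each treated as provisional negatives) and a Rocchio-style centroid scorer; it is not the project’s actual learner. The function names `tf_vector`, `centroid`, `dot`, and `pu_rank` are hypothetical, and tokenization is plain whitespace splitting.

```python
import math
import random
from collections import Counter


def tf_vector(text):
    """Length-normalized term-frequency vector built from whitespace tokens."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def centroid(vectors):
    """Average of sparse vectors (assumes a non-empty list)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def dot(a, b):
    """Dot product of two sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def pu_rank(positives, unlabeled, n_bags=10, seed=0):
    """Rank unlabeled documents (by index) by relevance to the positive examples.

    In each bag, a random under-sample of the unlabeled set, the same size as
    the positive set, serves as provisional negatives; scores accumulate across bags.
    """
    if not positives or not unlabeled:
        raise ValueError("need at least one positive and one unlabeled document")
    rng = random.Random(seed)
    pos_centroid = centroid([tf_vector(d) for d in positives])
    unl_vecs = [tf_vector(d) for d in unlabeled]
    k = min(len(positives), len(unl_vecs))  # under-sample to balance the classes
    scores = [0.0] * len(unl_vecs)
    for _ in range(n_bags):
        neg_centroid = centroid(rng.sample(unl_vecs, k))
        for i, v in enumerate(unl_vecs):
            # Rocchio-style score: similarity to positives minus similarity to negatives
            scores[i] += dot(v, pos_centroid) - dot(v, neg_centroid)
    return sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)


positives = ["neural network learning", "deep neural network training"]
unlabeled = [
    "neural network learning methods",
    "cooking pasta recipes",
    "football match results",
    "stock market news",
]
print(pu_rank(positives, unlabeled)[0])  # 0: the neural-network document ranks first
```

Accumulating scores over several bags smooths out the variance of any single under-sample; a real system would likely use richer features (for example TF-IDF weights) and a stronger classifier than a centroid difference.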