| I am interested in the design of systems that automatically analyze
textual entities and their relationships in large document collections.
Specifically, I am interested in natural language processing, information
retrieval, information extraction, text mining, knowledge representation,
and knowledge organization. |
 |
Natural language processing: Noun Phrase Extractor |
| |
§ Motivation: All text-related processing starts with some degree of natural language processing. Recognizing the importance of extracting useful textual elements, I have limited my research to one type of concept. Specifically, I focus on noun phrases (NPs) rather than single words, because noun phrases carry more semantic information than single words.
§ Results: I have developed a Noun Phrase Extractor (NPE) with help from my students. The precision of NPE is above 90%, and its recall is above 85%. NPE is now used in all of my research projects.
|
 |
Knowledge Organization through an Intelligent Interface: Highlight and EZSearch |
| |
§ Motivation: Users often find it difficult to browse the returned document hits to find useful information. According to Amanda Spink (2002), users often do not look below the 20th document in the search hit list, so they miss good lower-ranked hits. We developed an algorithm that dynamically builds a summarization hierarchy for a set of returned documents using a co-occurrence-based technique. Users can browse the summarization hierarchy instead and find information faster.
§ Results: The
following search systems were developed by our research group and are
freely available to the public:
a meta-search engine called Highlight (http://highlight.njit.edu), and
a document search system called EZSearch (http://siriusb.umdnj.edu:18080/EZSearch/index.html),
designed specifically for the PubMed database hosted by NIH.
|
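One way to picture a co-occurrence-based hierarchy is the sketch below. It is not the algorithm used in Highlight or EZSearch, whose details are not given here; it swaps in a simple subsumption rule as an assumption: a term becomes the parent of another term if it appears in more of the returned documents and co-occurs with the other term in at least a chosen fraction of that term's documents. The function name `build_hierarchy` and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_hierarchy(doc_terms, threshold=0.8):
    """Map each term to a parent term, or leave it out if it is a root.

    doc_terms: list of sets of terms, one set per returned document.
    Assumed rule (not the authors' method): `parent` subsumes `child` when
    parent occurs in more documents than child and
    P(parent | child) >= threshold.
    """
    doc_freq = defaultdict(int)   # number of documents containing a term
    co_freq = defaultdict(int)    # number of documents containing both terms
    for terms in doc_terms:
        for term in terms:
            doc_freq[term] += 1
        for a, b in permutations(terms, 2):
            co_freq[(a, b)] += 1

    parents = {}
    for child in doc_freq:
        best = None
        for parent in doc_freq:
            # Requiring a strictly higher document frequency prevents cycles.
            if doc_freq[parent] <= doc_freq[child]:
                continue
            p_parent_given_child = co_freq[(parent, child)] / doc_freq[child]
            if p_parent_given_child < threshold:
                continue
            # Prefer the most specific qualifying parent (lowest frequency),
            # breaking ties alphabetically so the result is deterministic.
            if best is None or (doc_freq[parent], parent) < (doc_freq[best], best):
                best = parent
        if best is not None:
            parents[child] = best
    return parents


hits = [
    {"cancer", "breast cancer", "therapy"},
    {"cancer", "breast cancer"},
    {"cancer", "lung cancer"},
    {"cancer", "therapy"},
    {"cancer", "breast cancer", "mammography"},
]
for child, parent in sorted(build_hierarchy(hits).items()):
    print(f"{parent} > {child}")
```

Terms without a parent act as the top-level entries a user would browse first; narrower terms hang below them.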
 |
Information Extraction: Keyphrase Identification Program (KIP) |
| |
§ Motivation:
Identifying key concepts is highly useful in business (text mining) and in
library science (automatic generation of content metadata and/or subject
descriptors). This research project goes beyond
noun phrase identification. We developed a machine learning algorithm
that analyzes the composition of the noun phrases extracted from a document and assigns a score to each. Phrases with high enough scores are considered keyphrases and are inserted into the database for weight adjustment. In this way the algorithm learns to identify new keyphrases and adapts to the kind of documents that interest its user.
§ Results: We have developed the Keyphrase Identification Program (KIP). According to our evaluation, its performance is among the best reported.
Download KIP |
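The score-then-learn loop described above can be sketched roughly as follows. This is not KIP's actual scoring formula or database schema, which are not specified here. As an assumption, a phrase's score is the sum of the stored weights of its words multiplied by how often the phrase occurs, and learning simply raises the weights of the words in an accepted keyphrase. The class `KeyphraseScorer` and all of its parameters are hypothetical.

```python
from collections import Counter


class KeyphraseScorer:
    """Toy keyphrase scorer with adjustable word weights (illustrative only)."""

    def __init__(self, word_weights, default_weight=0.1, threshold=1.0):
        self.word_weights = dict(word_weights)  # stands in for the database
        self.default_weight = default_weight    # weight of unseen words
        self.threshold = threshold              # minimum score for a keyphrase

    def score(self, phrase, frequency):
        # Assumed formula: composition (sum of word weights) times frequency.
        words = phrase.lower().split()
        composition = sum(
            self.word_weights.get(word, self.default_weight) for word in words
        )
        return composition * frequency

    def extract(self, noun_phrases):
        # Count each distinct noun phrase, score it, keep the high scorers.
        counts = Counter(phrase.lower() for phrase in noun_phrases)
        scored = {phrase: self.score(phrase, n) for phrase, n in counts.items()}
        keyphrases = [p for p, s in scored.items() if s >= self.threshold]
        return sorted(keyphrases, key=lambda p: (-scored[p], p))

    def learn(self, keyphrase, step=0.1):
        # Weight adjustment: words of a confirmed keyphrase gain weight.
        for word in keyphrase.lower().split():
            current = self.word_weights.get(word, self.default_weight)
            self.word_weights[word] = current + step


scorer = KeyphraseScorer({"information": 0.5, "retrieval": 0.6})
phrases = ["information retrieval", "information retrieval", "search engine"]
print(scorer.extract(phrases))
scorer.learn("search engine", step=0.5)
print(scorer.extract(phrases))
```

After the call to `learn`, a phrase that previously fell below the threshold can qualify, which mirrors the adaptive behavior described in the motivation.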
 |
Personalized Information Retrieval: uMining |
| |
§ Motivation: It is difficult to develop a personalized search service because each user has unique information needs, education level, language preferences, source preferences, and so on. The goal of this study is to help users filter out unwanted information and derive important associations between textual entities using data mining algorithms. The objective of this project is to help users digest the large amount of information returned from each search session and from their accumulated search history.
§ Results: Still in progress.
|
 |
Keyword Density |
| |
§ Motivation: We use natural language processing techniques to measure the level of distance-learning students' online participation. The model assesses student learning from three aspects: the quality of their course work, the quantity of their effort, and the activeness of their participation. Three proposed measures (keyword density, message length, and message count) are derived from the class messages to measure these aspects, respectively.
§ Results: We developed a webboard tool called webboard offline, which automatically calculates keyword density. We have published three papers based on this work and won the 2004 AMCIS best paper award.
|
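The three measures can be illustrated with a minimal sketch. It is not the webboard offline tool, and the exact definitions are assumptions made for illustration: keyword density is taken as the share of a student's words that match a keyword list, message length as the mean number of words per message, and message count as the number of messages posted. The function name `participation_measures` and the single-word keyword list are hypothetical simplifications.

```python
import re
from collections import defaultdict


def participation_measures(messages, keywords):
    """Compute per-student measures from (student, text) message pairs.

    Assumed definitions (not taken from the published model):
      keyword_density = keyword occurrences / total words
      message_length  = mean words per message
      message_count   = number of messages
    """
    keyword_set = {keyword.lower() for keyword in keywords}
    posts_by_student = defaultdict(list)
    for student, text in messages:
        # Lowercase and split each message into simple word tokens.
        posts_by_student[student].append(re.findall(r"[a-z0-9']+", text.lower()))

    results = {}
    for student, posts in posts_by_student.items():
        total_words = sum(len(tokens) for tokens in posts)
        hits = sum(1 for tokens in posts for tok in tokens if tok in keyword_set)
        results[student] = {
            "keyword_density": hits / total_words if total_words else 0.0,
            "message_length": total_words / len(posts),
            "message_count": len(posts),
        }
    return results


class_messages = [
    ("ann", "Precision and recall measure retrieval quality."),
    ("ann", "I agree."),
    ("bob", "Nice post!"),
]
course_keywords = ["precision", "recall", "retrieval"]
for student, measures in participation_measures(class_messages, course_keywords).items():
    print(student, measures)
```

Keeping the three numbers separate, rather than folding them into one score, lets each one stand for its own aspect: quality, quantity, and activeness.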