Information search is an indispensable part of our lives. Web search engines, such as Google, Yahoo! and Bing, are widely used for searching textual documents, images, and video. However, there are also vast collections of structured and semi-structured data both on the Web and in enterprises, such as relational databases and XML data. The classical way of accessing these data sources is to issue structured queries in languages such as SQL, XPath, or XQuery. However, this requires users to learn these query languages and to understand a possibly complex and fast-evolving data schema, which is inconvenient or impossible for users in many applications. To relieve web and scientific users of this learning curve and let them easily access structured and semi-structured data, supporting keyword search on such data is highly desirable.
We have identified a spectrum of problems in supporting keyword search on semi-structured data, ranging from an evaluation framework for various search strategies, generating high-quality results, helping users analyze results, and user behavior analysis, to scalability using parallel processing frameworks.
We have been developing techniques to address these problems so that users achieve better search quality and an enhanced search experience compared with searching unstructured data (e.g., web pages), by exploiting the rich meta-information embedded in (semi-)structured data, as outlined below. In addition to developing techniques to address the above challenges of processing keyword queries on tree-structured and graph-structured data in general, we have adapted and extended these techniques to a wide variety of semi-structured data, including workflows, text-rich semi-structured data such as online forums, data with incomplete or dirty information, data with varying levels of trustworthiness, and archived data with temporal information.
Sushovan De, Yuheng Hu, Venkata Vamsikrishna Meduri, Yi Chen: BayesWipe: A Scalable Probabilistic Framework for Improving Data Quality. Journal of Data and Information Quality, 2016
Yi Shan, Yi Chen: Scalable Query Optimization for Efficient Data Processing using MapReduce. IEEE Big Data Congress, 2015
Ziyang Liu, Yichuan Cai, Yi Shan, Yi Chen: Ranking Friendly Result Composition for XML Keyword Search. 34th International Conference, ER 2015, Volume 9381 of the series Lecture Notes in Computer Science
Sushovan De, Yuheng Hu, Yi Chen, and Subbarao Kambhampati. BayesWipe: A Multimodal System for Data Cleaning and Consistent Query Answering on Structured Data. SIGMOD 2014 Workshop on Big Uncertain Data (BUDA). 2014.
Sushovan De, Yuheng Hu, Yi Chen, and Subbarao Kambhampati. BayesWipe: A Multimodal System for Data Cleaning and Consistent Query Answering on Structured Data. IEEE BigData 2014.
Ziyang Liu, and Yi Chen. Differentiating Search Results on Structured Data. TODS, 2012.
Ziyang Liu, and Yi Chen: Exploiting and Maintaining Materialized Views for XML Keyword Queries. ACM Transactions on Internet Technology (TOIT 2012), 12 (2)
Ziyang Liu, and Yi Chen: Processing Keyword Search on XML: a Survey. World Wide Web Journal 14(5-6), (2011)
Brian Ackerman, and Yi Chen: Evaluating Rank Accuracy based on Incomplete Pairwise Preferences. Workshop of User-Centric Evaluation of Recommender Systems and Their Interfaces (UCERSTI), 2011
Ziyang Liu, Sivaramakrishnan Natarajan, and Yi Chen: Query Expansion Based on Clustered Results. PVLDB 4(6), 2011
Ziyang Liu, Qihong Shao, Yi Chen: Searching Workflows with Hierarchical Views. Proceedings of the VLDB Endowment (PVLDB) 3(1), 2010.
Ziyang Liu, Yu Huang, Yi Chen: Improving XML Search by Generating and Utilizing Informative Result Snippets. ACM Trans. Database Syst. 35(3): (2010)
Ziyang Liu, Yi Chen: Return specification inference and result clustering for keyword search on XML. ACM Trans. Database Syst. 35(2): (2010)
Ziyang Liu, SivaramaKrishnan Natarajan, Stephen Booher, Tim Meehan, Robert Winkler, Yi Chen: XSACT: A Structured Search Result Comparison Tool. Proceedings of the VLDB Endowment (PVLDB) 3(2), 2010.
Ziyang Liu, Yi Chen: Query Results Ready, Now What? IEEE Data Eng. Bull. 33(1): 46-53 (2010)
Ziyang Liu, Yichuan Cai, Yi Chen: TargetSearch: A Ranking Friendly XML Keyword Search Engine. ICDE 2010
Ziyang Liu, and Yi Chen: Keyword Search on XML Data. Book Chapter In Advanced Applications and Structures in XML Processing: Label Streams, Semantics Utilization and Data Query Technologies co-edited by Changqing Li and Tok Wang Ling, IGI Global, 2010.
Ziyang Liu, Peng Sun, Yu Huang, Yichuan Cai, Yi Chen: Challenges, Techniques and Directions in Building XSeek: an XML Search Engine. IEEE Data Eng. Bull. 32(2): 36-43 (2009)
Ziyang Liu, Peng Sun, Yi Chen: Structured Search Result Differentiation. PVLDB 2(1): 313-324 (2009)
Qihong Shao, Peng Sun, Yi Chen: WISE: A Workflow Information Search Engine. ICDE 2009: 1491-1494
Ziyang Liu, Yi Chen: Answering Keyword Queries on XML Using Materialized Views. ICDE 2008: 1501-1503
Yu Huang, Ziyang Liu, Yi Chen: Query Biased Snippet Generation in XML Search. SIGMOD Conference 2008: 315-326
Ziyang Liu, Yi Chen: Reasoning and Identifying Relevant Matches for XML Keyword Search. PVLDB 1(1): 921-932 (2008)
Yu Huang, Ziyang Liu, Yi Chen: eXtract: A Snippet Generation System for XML Search. PVLDB 1(2): 1392-1395 (2008)
Ziyang Liu, Yi Chen: Identifying Meaningful Return Information for XML Keyword Search. SIGMOD Conference 2007: 329-340
Ziyang Liu, Jeffrey Walker, Yi Chen: XSeek: A Semantic XML Search Engine Using Keywords. VLDB 2007: 1330-1333
Yi Chen < yi.chen at njit dot edu >
Mingda Li < ml456 at njit dot edu >
Chong Wang, Yi Shan, Yunzhong Liu, Norman Hamilton, Brandon Ruggles, Ahmed Youssef, Nicholas Devlin, Brian Ackerman, Doug Stoeckmann, Stephen Booher, Yichuan Cai, Ziyang Liu, Tim Meehan, SivaramaKrishnan Natarajan, Peng Sun, Jeffrey Walker, Arthur Maciejewicz, Mike Citro, Ghislain Youdom, Viraj Bhalala.
This project is supported by NSF CAREER Award 1322406, a Google Research Award and a Google Award for Google Cloud Platform Credit.