Xiaoying Wu

Ph.D. Student

Supervisor: Professor Dimitri Theodoratos
Department of Computer Science
New Jersey Institute of Technology
Email: xw43@njit.edu
Phone: (973) 596-2655
Fax: (973) 596-5777
Address: Computer Science Department, NJIT
GITC Building, Room 4201
University Heights, Newark, NJ 07102



Research Interests

My research interests lie in data management: structured, semistructured, and XML data; evaluation and optimization of generalized tree-pattern queries on the web; keyword search on XML data; semantics for keyword queries; views on XML data; data integration; and the Semantic Web.

  • Flexible querying of XML data sources on the web.  

    XML data sources on the web may have different structures, or structures that are complex or only partially known. The queries needed to search them go beyond tree-pattern queries (TPQs) and encompass keyword-based queries and queries with arbitrary structural constraints. I study this class of queries in my research and refer to them as generalized TPQs (GTPQs).

    XML query evaluation has usually been studied in two contexts: one deals with indexed XML data, the other with (non-indexed) XML streams. The two contexts place different requirements on evaluation algorithms and present different challenges. In my research, I have addressed evaluation in both contexts and designed efficient algorithms for evaluating GTPQs on XML data. Details can be found in the following publications: [SSDBM09] [DASFAA09] [WWW08] [CIKM08] [CIKM07].

  • Defining semantics for keyword queries and generalized tree-pattern queries on XML data.  

    In recent publications [DKE08] [DASFAA07], I devised an original approach for assigning semantics to GTPQs. The new semantics applies uniformly to keyword queries and to queries with structural restrictions. Previous approaches identify meaningful answers by operating locally on the data. In contrast, this approach operates globally on structural summaries of the data to compute meaningful TPQs. This global view of the data gives it an advantage over previous approaches, which the experimental results largely confirm.

  • Answering XML queries using materialized views.  

    Answering queries using views is a well-established technique in databases. In this context, two outstanding problems arise. The first is deciding whether a query can be answered exclusively using one or more materialized views. The second, given the many alternative ways to compute the query from the materialized views, is finding the best one. In the realm of XML, few contributions address these problems, owing to the many limitations associated with the use of materialized views in traditional XML query evaluation models.

    In my research, I address these two problems by adopting a recent evaluation model, the inverted lists model, together with holistic algorithms; this combination has been established as the prominent technique for evaluating queries on large persistent XML data.

    I am currently working on inverted-list-based TPQ optimization using materialized views, as well as on the view selection problem. A recent publication in this area is [CIKM09].
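
For readers unfamiliar with the inverted lists model mentioned above, the following is a minimal illustrative sketch, not the algorithms from the cited publications. It assumes a common region encoding in which every element is labeled with a (start, end, level) triple, keeps one list of such triples per element label, and evaluates a single ancestor-descendant edge such as book//title with a generic stack-based binary structural join. Holistic algorithms process a whole tree pattern at once; this sketch handles only one edge. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_inverted_lists(xml_text):
    """Map each element label to its (start, end, level) regions, sorted by start."""
    lists = defaultdict(list)
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1              # position where the element opens
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1              # position where the element closes
        lists[node.tag].append((start, counter, level))

    visit(ET.fromstring(xml_text), 0)
    for regions in lists.values():
        regions.sort()            # regions were appended in post-order
    return lists


def structural_join(ancestors, descendants):
    """Return all (a, d) pairs such that region a strictly contains region d.

    Both inputs must be sorted by start position. The stack always holds a
    chain of nested ancestor regions that may still contain later descendants.
    """
    result, stack = [], []
    i = 0
    for d in descendants:
        # Push every ancestor candidate that opens before d opens.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            while stack and stack[-1][1] < a[0]:
                stack.pop()       # closed before a opens, so it is finished
            stack.append(a)
            i += 1
        # Discard candidates that closed before d opens.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        result.extend((a, d) for a in stack)
    return result


# Hypothetical sample document, used only for illustration.
doc = (
    "<bib>"
    "<book><title>A</title><author>X</author></book>"
    "<article><title>B</title></article>"
    "<book><title>C</title></book>"
    "</bib>"
)
index = build_inverted_lists(doc)
pairs = structural_join(index["book"], index["title"])  # book//title
```

On this sample, `pairs` contains the two titles nested inside book elements and skips the title inside the article. A parent-child edge (book/title) could be obtained under the same assumptions by keeping only pairs whose levels differ by exactly one.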