|
|
Xiaoying Wu
I am a Research Associate in the Department of Computer Science at New Jersey Institute of Technology (NJIT). I received a Ph.D. degree in Computer Science from NJIT in January 2010. My advisor is Prof. Dimitri Theodoratos. After graduating from NJIT, I was a Postdoctoral Research Scientist in the Department of Biomedical Informatics at Columbia University for one year. I received a B.S. degree in Computer Science from Central South University in China and an M.S. degree in Computer Science from the National University of Singapore. I welcome any comments and suggestions on my research work. I can be contacted at xiaoying.wu@gmail.com. |
|
|
Research Interests |
My research interests span several areas of data management: structured, semistructured, and XML data; generalized tree-pattern query evaluation and optimization on the web; keyword search on XML data; the definition of semantics for keyword queries; views on XML data; data integration; and the Semantic Web. |
|
Research Activities |
Answering queries using views is a well-established technique in databases. It raises two outstanding problems. The first is deciding whether a query can be answered exclusively using one or more materialized views. The second, given the many alternative ways to compute the query from the materialized views, is finding the best one. For XML, only a limited number of contributions address these problems, owing to the many limitations associated with using materialized views in traditional XML query evaluation models. In my research, I address both problems by adopting a recent evaluation model, called the inverted lists model, together with holistic algorithms; the two have been established as the prominent technique for evaluating queries on large persistent XML data. I am currently working on inverted-list-based Tree Pattern Query (TPQ) optimization using materialized views, as well as on view configuration problems. A recent publication on this topic can be found in [CIKM09].
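To make the inverted lists model concrete, here is a minimal Python sketch. It assumes the common region encoding, in which every element is stored as a (start, end, level) triple in a list keyed by its label. The function names `build_inverted_lists` and `descendants_join` are hypothetical, and the nested-loop join only stands in for the holistic algorithms mentioned above; it is not the method of [CIKM09].

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_inverted_lists(xml_text):
    """Map each element label to a document-ordered list of
    (start, end, level) regions (assumed region encoding)."""
    root = ET.fromstring(xml_text)
    lists = defaultdict(list)
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        entry = [counter, None, level]   # start position is known on entry
        lists[node.tag].append(entry)    # preorder append keeps lists sorted by start
        for child in node:
            visit(child, level + 1)
        counter += 1
        entry[1] = counter               # end position is known on exit

    visit(root, 0)
    return {label: [tuple(e) for e in entries] for label, entries in lists.items()}


def descendants_join(ancestors, descendants):
    """Naive ancestor-descendant join: keep (a, d) pairs where
    region d lies strictly inside region a."""
    return [(a, d) for a in ancestors for d in descendants
            if a[0] < d[0] and d[1] < a[1]]


doc = "<bib><book><title/><author/></book><article><author/></article></bib>"
lists = build_inverted_lists(doc)
# Evaluates the pattern book//author using only the two inverted lists.
matches = descendants_join(lists["book"], lists["author"])
```

Because containment can be checked from the triples alone, a pattern such as book//author is evaluated without touching the document itself, which is what makes precomputed lists attractive as a storage form.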
|
XML data archiving is commonly used in scientific and web data management for data backup and analysis. Although comprehensive application software, new computing and storage technologies, and the Internet have made it easier to create, collect, and store all types of data, storing, accessing, and managing XML database archives meaningfully and cost-effectively remains extremely challenging. This project aims to design efficient storage and query optimization techniques for an XML archiving database system. It is an ongoing project hosted by Prof. Hui (Wendy) Wang of Stevens Institute of Technology. I am fortunate to participate in and contribute to this exciting and important research effort.
|
|
XML data sources include sources with different structures and sources with complex or only partially known structures. Querying such sources requires queries that go beyond tree-pattern queries (TPQs) and encompass keyword-based queries and queries with arbitrary structural constraints. I consider this class of queries in my research and refer to them as generalized TPQs (GTPQs). XML query evaluation has usually been studied in two contexts: one deals with indexed XML data, and the other deals with non-indexed XML streams. The two contexts place different requirements on evaluation algorithms and present different challenges. In my research, I have addressed the evaluation problem in both contexts and designed efficient algorithms for evaluating GTPQs on XML data. Details can be found in the following publications: [VLDBJ10][WWWJ10][SSDBM09][WWW08][CIKM07].
|
|
In the publications [DKE08] [DASFAA07], I devised an original approach for assigning semantics to GTPQs. The novel semantics applies seamlessly to keyword queries and to queries with structural restrictions. Previous approaches identify meaningful answers by operating locally on the data. In contrast, the new approach operates globally on structural summaries of the data to compute meaningful TPQs. This global view of the data gives it an advantage over previous approaches, an advantage that is largely confirmed by the experimental results.
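As a rough illustration of what a structural summary can look like, the Python sketch below collects the distinct root-to-node label paths of a document. This simple path summary is an assumption chosen for brevity; it is not necessarily the summary structure used in [DKE08] or [DASFAA07], and the function name `path_summary` is hypothetical.

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text):
    """Return the set of distinct root-to-node label paths in the document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def visit(node, prefix):
        path = prefix + (node.tag,)
        paths.add(path)                  # repeated paths collapse into one entry
        for child in node:
            visit(child, path)

    visit(root, ())
    return paths


doc = "<bib><book><author/></book><book><author/><title/></book></bib>"
summary = path_summary(doc)
```

The summary holds one entry per distinct path regardless of how many elements share it, so reasoning over it considers the structure of the whole document at once instead of individual data nodes.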
|
|
The development of new web applications requires efficient design and maintenance of large amounts of data. It is important to design 'good' semi-structured databases to prevent data redundancy and updating anomalies. This research was motivated by the lack of a data model for semi-structured data capable of capturing the semantics traditionally needed for designing databases. I created a new, semantically richer data model for semi-structured data called ORA-SS [Tech00]. I also developed a general design methodology, with detailed steps, for designing semi-structured databases using the ORA-SS model [DASWIS01]. The proposed methodology prevents undesirable redundancy and eliminates updating anomalies in the underlying semi-structured databases. This research was done while I was supervised by Prof. Tok Wang Ling at the National University of Singapore. |