People
Prof. Chitta Baral
Prof. Graciela Gonzalez
Prof. Steven Bird
Prof. Susan B. Davidson
Haejoong Lee
Yifeng Zheng
XML Stream Processing
Description
In many applications, data arrives continuously as a stream and must be
processed on-line without first being loaded into a database; examples
include real-time monitoring of traffic or financial information. We focus
on efficient techniques for processing XML streams. The topics that we have
studied include how to efficiently evaluate XPath queries on XML streams,
how to validate XML streams against user-specified constraints, and how to
encode XML data in order to speed up the processing of the resulting
streams.
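To make the streaming setting concrete, here is a minimal sketch of evaluating a very restricted XPath query (child steps only, no predicates) in a single pass over the input. It is an illustration only and not the algorithm of any system listed below; the function name `stream_match` and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_match(source, path):
    """Yield the text of every element whose root-to-node tag path equals `path`.

    Only absolute paths made of child steps (e.g. "/feed/quote/price") are
    handled; predicates and the descendant axis are out of scope here.
    """
    target = [step for step in path.split("/") if step]
    stack = []  # tags of the currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == target:
                yield (elem.text or "").strip()
            stack.pop()
            # Release the element's text and children. The emptied element
            # stays attached to its parent, so a production system would
            # also detach it to keep memory strictly bounded.
            elem.clear()


# Hypothetical sample stream of stock quotes.
doc = (
    "<feed>"
    "<quote><sym>A</sym><price>10</price></quote>"
    "<quote><sym>B</sym><price>20</price></quote>"
    "</feed>"
)
prices = list(stream_match(io.BytesIO(doc.encode("utf-8")), "/feed/quote/price"))
print(prices)
```

Matches are emitted as soon as the closing tag of a matching element is seen, so results are available before the end of the stream is reached.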
Publications
- Yi Chen, Susan B. Davidson, and Yifeng Zheng. “An Efficient XPath Query
Processor for XML Streams.” In Proceedings of 22nd International
Conference on Data Engineering (ICDE), 2006.
- Yi Chen, Susan B. Davidson, and Yifeng Zheng. “ViteX: A Streaming XPath
Processing System.” Demonstration description. In Proceedings of 21st
International Conference on Data Engineering (ICDE), 2005 (to appear).
- Yi Chen, George A. Mihaila, Susan B. Davidson, and Sriram Padmanabhan.
“EXPedite: A System for Encoded XML Processing.” In Proceedings of 13th
ACM Conference on Information and Knowledge Management (CIKM), pp.
108-117, 2004.
- Yi Chen, George A. Mihaila, Susan B. Davidson, and Sriram Padmanabhan.
“Efficient Path Query Processing on Encoded XML.” In Proceedings of
International Workshop on High Performance XML Processing, in conjunction
with WWW, 2004.
- Yi Chen, Susan B. Davidson, and Yifeng Zheng. “XKvalidator: A Constraint
Validator for XML.” In Proceedings of 11th ACM Conference on Information
and Knowledge Management (CIKM), pp. 446-452, 2002.
People
Prof. Susan Davidson
Dr. George Mihaila
Dr. Sriram Padmanabhan
Yi Chen
Yifeng Zheng
XML Databases
Description
As XML has become a popular format for data representation, effective
storage and efficient query processing of XML data are very important.
Relational databases, meanwhile, have been optimized for performance
through more than 30 years of development and are highly reliable,
scalable, and well established as the backend for data storage. We have
developed storage and query evaluation techniques for XML data by
leveraging relational database technology.
In transforming the hierarchical structure of XML data into relational
tables, we addressed two challenges. First, how should the transformation
be designed so that the SQL queries generated from XML queries are
efficient? Second, when the schema of the XML data is available, how
should a normalized relational schema be designed for data storage so as
to ensure data correctness and avoid update anomalies?
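As an illustration of the first challenge, the sketch below shreds an XML document into a single relational table that records each node's root-to-node path, so that a child-axis path query becomes one selection rather than a chain of self-joins. This is a simplified example under our own assumptions, not the storage scheme of the systems listed below; the table layout and the names `shred`, `query_path`, and `query_suffix` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, xml_text):
    """Store every element as one row: (id, parent id, root-to-node path, text)."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
        "path TEXT, text TEXT)"
    )
    counter = 0

    def visit(elem, parent_id, parent_path):
        nonlocal counter
        counter += 1  # ids are assigned in document (preorder) order
        node_id = counter
        path = parent_path + "/" + elem.tag
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, parent_id, path, (elem.text or "").strip()),
        )
        for child in elem:
            visit(child, node_id, path)

    visit(ET.fromstring(xml_text), None, "")


def query_path(conn, path):
    """Absolute path such as /lib/book/title: a single equality selection."""
    rows = conn.execute("SELECT text FROM node WHERE path = ? ORDER BY id", (path,))
    return [row[0] for row in rows]


def query_suffix(conn, steps):
    """Path such as //book/title: match on the tail of the stored path."""
    suffix = "/" + steps
    rows = conn.execute(
        "SELECT text FROM node WHERE substr(path, ?) = ? ORDER BY id",
        (-len(suffix), suffix),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
shred(
    conn,
    "<lib><book><title>A</title></book>"
    "<shelf><book><title>B</title></book></shelf></lib>",
)
print(query_path(conn, "/lib/book/title"))
print(query_suffix(conn, "book/title"))
```

Storing the full path trades space for query simplicity; an index on the `path` column would serve the equality query, while the suffix query as written still scans the table.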
Publications
- Yi Chen, Susan B. Davidson, and Yifeng Zheng. “BLAS: An Efficient XPath
Processing System.” In Proceedings of 23rd ACM SIGMOD International
Conference on Management of Data, pp. 47-58, 2004.
- Yi Chen, George A. Mihaila, Sriram Padmanabhan, and Rajesh Bordawekar.
“L-Tree: A Dynamic Labeling Structure for Ordered XML Data.” In
Proceedings of International Workshop on Database Technologies for
Handling XML Information on the Web (dataX), in conjunction with EDBT,
2004. Springer Lecture Notes in Computer Science 3268, pp. 209-218, 2004.
- Yi Chen, Susan B. Davidson, Carmem Hara, and Yifeng Zheng. “RRXS:
Redundancy Reducing XML Storage in Relations.” In Proceedings of 29th
International Conference on Very Large Data Bases (VLDB), pp. 189-200,
2003.
- Yi Chen, Susan B. Davidson, and Yifeng Zheng. “Constraint Preserving XML
Storage in Relations.” In Proceedings of 5th International Workshop on the
Web and Databases (WebDB), in conjunction with SIGMOD, pp. 7-12, 2002.
XML Constraints
Description
We have studied various constraints on XML data, including keys, foreign
keys, and functional dependencies. We investigated how to validate XML
constraints when the XML data is in its native form, as a file or a
stream, or is stored in a relational database. The constraints can also be
enforced incrementally when updates are made to the XML data.
Furthermore, we have studied how to use constraint information to guide
schema design, so as to ensure data correctness and remove redundancy when
the data is stored in relational databases.
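To illustrate what validating an XML key involves, the sketch below checks a key given as a context path, a target path, and a list of key attributes: within each context element, every target element must carry all key attributes, and no two targets may share the same values. It is a simplified in-memory illustration, not the validation algorithm of the tools listed below; the function name `key_violations` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def key_violations(root, context, target, key_attrs):
    """Return the key values that are missing or duplicated.

    `context` and `target` use the limited path syntax supported by
    ElementTree; `key_attrs` is a list of attribute names on the target.
    Uniqueness is checked separately inside each context element.
    """
    violations = []
    for ctx in root.iterfind(context):
        seen = set()  # key values already used within this context
        for node in ctx.iterfind(target):
            value = tuple(node.get(attr) for attr in key_attrs)
            if None in value or value in seen:
                violations.append(value)
            seen.add(value)
    return violations


# Hypothetical data: employee ids must be unique within a department,
# but the same id may be reused in different departments.
doc = ET.fromstring(
    "<db>"
    "<dept name='cs'><emp id='1'/><emp id='2'/><emp id='1'/></dept>"
    "<dept name='ee'><emp id='1'/></dept>"
    "</db>"
)
print(key_violations(doc, "dept", "emp", ["id"]))   # key relative to each dept
print(key_violations(doc, ".", "dept", ["name"]))   # key over the whole document
```

Choosing the context path decides the scope of the key: with context `dept` the check is relative to each department, while context `.` makes it hold across the whole document.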
Publications
- Yi Chen, Susan B. Davidson, Carmem Hara, and Yifeng Zheng. “RRXS:
Redundancy Reducing XML Storage in Relations.” In Proceedings of 29th
International Conference on Very Large Data Bases (VLDB), pp. 189-200,
2003.
- Yi Chen, Susan B. Davidson, and Yifeng Zheng. “XKvalidator: A Constraint
Validator for XML.” In Proceedings of 11th ACM Conference on Information
and Knowledge Management (CIKM), pp. 446-452, 2002.
- Yi Chen, Susan B. Davidson, and Yifeng Zheng. “Constraint Preserving XML
Storage in Relations.” In Proceedings of 5th International Workshop on the
Web and Databases (WebDB), in conjunction with SIGMOD, pp. 7-12, 2002.
People
Prof. Susan Davidson
Prof. Carmem Hara
Yi Chen
Yifeng Zheng
Querying Linguistic Databases
Description
Describing and analyzing human languages depends on being able to manage
large databases of annotated text and recorded speech. This project will
apply research in relational and XML databases to linguistics, develop
linguistic data models and query languages, and deploy them for creating,
managing, analyzing, and displaying annotated linguistic databases.
Project web page: http://www.ldc.upenn.edu/Projects/QLDB/
Publications
- Steven Bird, Yi Chen, Susan B. Davidson, Haejoong Lee, and Yifeng Zheng.
“Designing and Evaluating an XPath Dialect for Linguistic Queries.” In
Proceedings of 22nd International Conference on Data Engineering (ICDE),
2006.
- Steven Bird, Yi Chen, Susan B. Davidson, Haejoong Lee, and Yifeng Zheng.
“Extending XPath to Support Linguistic Queries.” In Proceedings of
Programming Language Technologies for XML (PLAN-X), 2005 (to appear).
People
Prof. Steven Bird
Prof. Susan Davidson
Prof. Mark Liberman
Dr. Beatrice Santorini
Yi Chen
Baden Hughes
Catherine Lai
Haejoong Lee
Yifeng Zheng