Yi Chen's Completed Projects

People
Prof. Chitta Baral, Prof. Graciela Gonzalez, Prof. Steven Bird, Prof. Susan B. Davidson, Haejoong Lee, Yifeng Zheng

XML Stream Processing

Description

Many applications receive data continuously as a stream and must process it on-line without first loading it into a database, for example, real-time monitoring of traffic or financial information. We focus on efficient techniques for processing XML streams. The topics we have studied include how to efficiently evaluate XPath queries on XML streams, how to validate XML streams against user-specified constraints, and how to encode the data to speed up the processing of encoded XML streams.
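As a rough illustration of evaluating a path query over a stream, the sketch below matches a simple child-axis path (no predicates, wildcards, or descendant steps) while the document is parsed incrementally. It is not the algorithm from the publications listed here; the function name `stream_path` and the sample data are hypothetical, and it relies only on Python's standard `xml.etree.ElementTree.iterparse`.

```python
import io
import xml.etree.ElementTree as ET


def stream_path(source, path):
    """Yield the text of each element whose root-to-node tag path equals `path`.

    Only plain child steps such as /feed/quote/price are supported
    (an assumption made to keep the sketch short).
    """
    target = path.strip("/").split("/")
    stack = []  # tags of the currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == target:
                yield elem.text
            stack.pop()
            # Drop this element's text and children so finished subtrees do
            # not accumulate; the emptied element stays attached to its parent.
            elem.clear()


# Hypothetical sample stream of stock quotes.
sample = b"<feed><quote><sym>A</sym><price>10</price></quote></feed>"
for price in stream_path(io.BytesIO(sample), "/feed/quote/price"):
    print(price)
```

The only query state kept between events is the stack of open tags, so its size is bounded by the document depth rather than the stream length.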

Publications

Yi Chen, Susan B. Davidson, and Yifeng Zheng. “An Efficient XPath Query Processor for XML Streams.” In Proceedings of 22nd International Conference on Data Engineering (ICDE), 2006.
Yi Chen, Susan B. Davidson, and Yifeng Zheng. “ViteX: A Streaming XPath Processing System.” Demonstration description. In Proceedings of 21st International Conference on Data Engineering (ICDE), 2005 (to appear).
Yi Chen, George A. Mihaila, Susan B. Davidson, and Sriram Padmanabhan. “EXPedite: A System for Encoded XML Processing.” In Proceedings of 13th ACM Conference on Information and Knowledge Management (CIKM), pp. 108-117, 2004.
Yi Chen, George A. Mihaila, Susan B. Davidson, and Sriram Padmanabhan. “Efficient Path Query Processing on Encoded XML.” In Proceedings of International Workshop on High Performance XML Processing, in conjunction with WWW, 2004.
Yi Chen, Susan B. Davidson, and Yifeng Zheng. “XKvalidator: A Constraint Validator for XML.” In Proceedings of 11th ACM Conference on Information and Knowledge Management (CIKM), pp. 446-452, 2002.

People
Prof. Susan Davidson, Dr. George Mihaila, Dr. Sriram Padmanabhan, Yi Chen, Yifeng Zheng

XML Databases

Description
Because XML has become a popular format for data representation, effective storage and efficient query processing of XML data are very important. Meanwhile, relational databases have been optimized for performance through more than 30 years of development and are highly reliable, scalable, and well established as the backend for data storage. We have developed storage and query evaluation techniques for XML data by leveraging relational database technology.

In transforming the hierarchical structure of XML data into relational tables, we addressed two challenges. First, how should the transformation be designed so that the SQL queries generated from XML queries are efficient? Second, when the schema of the XML data is available, how should a normalized relational schema be designed for data storage to ensure data correctness and avoid update anomalies?
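To make the first challenge concrete, the sketch below uses a generic edge-table mapping, a textbook baseline and not the BLAS or RRXS designs cited here. Every element becomes one row, and a child-axis path is translated into a chain of self-joins. The table name `node`, its columns, and the functions `shred` and `path_to_sql` are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, xml_text):
    """Store every element as one row: (id, parent id, tag, text)."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def insert(elem, parent):
        nonlocal next_id
        next_id += 1
        node_id = next_id  # ids follow document order (pre-order traversal)
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, parent, elem.tag, (elem.text or "").strip()),
        )
        for child in elem:
            insert(child, node_id)

    insert(ET.fromstring(xml_text), None)


def path_to_sql(path):
    """Translate a child-axis path such as /lib/book/title into a self-join."""
    tags = path.strip("/").split("/")
    n = len(tags)
    tables = ", ".join(f"node n{i}" for i in range(n))
    conds = ["n0.parent IS NULL"]  # the first step must match the root element
    conds += [f"n{i}.parent = n{i - 1}.id" for i in range(1, n)]
    conds += [f"n{i}.tag = ?" for i in range(n)]
    where = " AND ".join(conds)
    sql = f"SELECT n{n - 1}.text FROM {tables} WHERE {where} ORDER BY n{n - 1}.id"
    return sql, tags


conn = sqlite3.connect(":memory:")
shred(conn, "<lib><book><title>A</title></book><book><title>B</title></book></lib>")
sql, params = path_to_sql("/lib/book/title")
print(sql)
print([row[0] for row in conn.execute(sql, params)])
```

With this naive mapping, a path of n steps produces a query that joins the table with itself n - 1 times, which illustrates why the design of the transformation affects the efficiency of the generated SQL.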

Publications

Yi Chen, Susan B. Davidson, and Yifeng Zheng. “BLAS: An Efficient XPath Processing System.” In Proceedings of 23rd ACM SIGMOD International Conference on Management of Data, pp. 47-58, 2004.
Yi Chen, George A. Mihaila, Sriram Padmanabhan, and Rajesh Bordawekar. “L-Tree: A Dynamic Labeling Structure for Ordered XML Data.” In Proceedings of International Workshop on Database Technologies for Handling XML Information on the Web (dataX), in conjunction with EDBT, 2004. Springer Lecture Notes in Computer Science 3268, pp. 209-218, 2004.
Yi Chen, Susan B. Davidson, Carmem Hara, and Yifeng Zheng. “RRXS: Redundancy Reducing XML Storage in Relations.” In Proceedings of 29th International Conference on Very Large Data Bases (VLDB), pp. 189-200, 2003.
Yi Chen, Susan B. Davidson, and Yifeng Zheng. “Constraint Preserving XML Storage in Relations.” In Proceedings of 5th International Workshop on the Web and Databases (WebDB), in conjunction with SIGMOD, pp. 7-12, 2002.

People
Prof. Susan Davidson, Prof. Carmem Hara, Dr. George Mihaila, Dr. Sriram Padmanabhan, Rajesh Bordawekar, Yi Chen, Yifeng Zheng

XML Constraints

Description
We have studied various constraints on XML data, including keys, foreign keys, and functional dependencies. We investigated how to validate XML constraints when the XML data is in its native form as a file or a stream, or is stored in a relational database. The constraints can also be enforced incrementally when updates are made to the XML data. Furthermore, we have studied how to use constraint information to guide schema design, ensuring data correctness and removing redundancy when the data is stored in relational databases.
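As a small illustration of what validating a key means, the sketch below checks a relative key over an in-memory tree: within each context element, every target element must carry a distinct value for the key field. This is a simplified check for exposition, not the XKvalidator algorithm; the function `key_violations`, its parameters, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def key_violations(root, context, target, field):
    """Return key values that break uniqueness within some context element.

    `context`, `target`, and `field` are ElementTree path expressions.
    A missing key field is reported as None.
    """
    violations = []
    for ctx in root.findall(context):
        seen = set()  # key values seen so far inside this context element
        for node in ctx.findall(target):
            value = node.findtext(field)
            if value is None or value in seen:
                violations.append(value)
            else:
                seen.add(value)
    return violations


# Hypothetical data: employee ids must be unique within each department.
doc = ET.fromstring(
    "<co>"
    "<dept><emp><id>1</id></emp><emp><id>1</id></emp></dept>"
    "<dept><emp><id>1</id></emp></dept>"
    "</co>"
)
print(key_violations(doc, "dept", "emp", "id"))
```

Because the set of seen values is reset for each context element, the same id may appear in two different departments without violating the key.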

Publications

Yi Chen, Susan B. Davidson, Carmem Hara, and Yifeng Zheng. “RRXS: Redundancy Reducing XML Storage in Relations.” In Proceedings of 29th International Conference on Very Large Data Bases (VLDB), pp. 189-200, 2003.
Yi Chen, Susan B. Davidson, and Yifeng Zheng. “XKvalidator: A Constraint Validator for XML.” In Proceedings of 11th ACM Conference on Information and Knowledge Management (CIKM), pp. 446-452, 2002.
Yi Chen, Susan B. Davidson, and Yifeng Zheng. “Constraint Preserving XML Storage in Relations.” In Proceedings of 5th International Workshop on the Web and Databases (WebDB), in conjunction with SIGMOD, pp. 7-12, 2002.

People
Prof. Susan Davidson, Prof. Carmem Hara, Yi Chen, Yifeng Zheng

Querying Linguistic Databases

Description
Describing and analyzing human languages depends on being able to manage large databases of annotated text and recorded speech. This project applies research in relational and XML databases to linguistics, develops linguistic data models and query languages, and deploys them for creating, managing, analyzing, and displaying annotated linguistic databases.
Project web page: http://www.ldc.upenn.edu/Projects/QLDB/

Publications

Steven Bird, Yi Chen, Susan B. Davidson, Haejoong Lee, and Yifeng Zheng. “Designing and Evaluating an XPath Dialect for Linguistic Queries.” In Proceedings of 22nd International Conference on Data Engineering (ICDE), 2006.
Steven Bird, Yi Chen, Susan B. Davidson, Haejoong Lee, and Yifeng Zheng. “Extending XPath to Support Linguistic Queries.” In Proceedings of Programming Language Technologies for XML (PLAN-X), 2005 (to appear).

People
Prof. Steven Bird, Prof. Susan Davidson, Prof. Mark Liberman, Dr. Beatrice Santorini, Yi Chen, Baden Hughes, Catherine Lai, Haejoong Lee, Yifeng Zheng
