Prima

PRIMA is a transaction-time DBMS that supports schema evolution: it manages and queries evolving data under evolving schemas. PRIMA is an acronym for Panta Rhei Information Management and Archival.

The main investigators are:

Hyun J. Moon (contact author): [1]

Carlo A. Curino: [2]

Alin Deutsch: [3]

Chien-Yi Hou

Carlo Zaniolo: [4]

Overview

The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia must manage the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology.

The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery.

The second is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning potentially hundreds of different schema versions. It is realized by (i) introducing Schema Modification Operators (SMOs) to represent the mappings between successive schema versions and (ii) using an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs.

The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.
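To give a feel for how a chain of SMOs can drive query rewriting, the following Python sketch translates a table and column reference written against the current schema back to an older schema version by undoing the SMOs in reverse order. It is only an illustration under stated assumptions: the classes `RenameTable` and `RenameColumn`, the function `rewrite_reference`, and the example table and column names are hypothetical, and the sketch does not use PRIMA's actual SMO syntax or its XIC-based rewriting.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class RenameTable:
    """Hypothetical SMO: table `old` becomes `new` in the next version."""
    old: str
    new: str


@dataclass(frozen=True)
class RenameColumn:
    """Hypothetical SMO: column `old` of `table` becomes `new`."""
    table: str
    old: str
    new: str


SMO = Union[RenameTable, RenameColumn]


def rewrite_reference(table: str, column: str,
                      history: List[SMO],
                      target_version: int) -> Tuple[str, str]:
    """Map a (table, column) reference from the current schema to
    schema version `target_version` (1-based).

    history[i] is the SMO that turns version i+1 into version i+2,
    so the SMOs applied after `target_version` are undone newest first.
    """
    for smo in reversed(history[target_version - 1:]):
        if isinstance(smo, RenameTable):
            if table == smo.new:
                table = smo.old
        elif isinstance(smo, RenameColumn):
            if table == smo.table and column == smo.new:
                column = smo.old
    return table, column


# Assumed three-version history of an employee table.
history: List[SMO] = [
    RenameColumn(table="emp", old="pay", new="salary"),  # version 1 -> 2
    RenameTable(old="emp", new="employee"),              # version 2 -> 3
]

# A reference written against the current schema (version 3) ...
current = ("employee", "salary")
# ... resolved against each earlier version.
in_v2 = rewrite_reference(*current, history, target_version=2)
in_v1 = rewrite_reference(*current, history, target_version=1)
```

Here `in_v2` is `("emp", "salary")` and `in_v1` is `("emp", "pay")`. A real rewriter also has to handle operators that restructure data, such as splitting or merging tables, which renames alone do not capture.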

Experiment Data Set

Employee Database Schema Evolution: Synthetic data

Schema Change History in SMOs: five schema versions over a five-year period
Queries: two queries for each of eight query classes
Data: MV-Document and single-version V-Document (the MV-Document migrated to the last version); both contain 1,000 employees, 10 departments, and 4 titles, with evolving values of salary, title, department, and department manager.
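As a rough picture of what a temporally grouped history of one employee's evolving salary might look like, the Python sketch below builds a small XML fragment in which all values of an attribute are grouped under the employee and stamped with a validity interval, then looks up the value current at a given date. The element name `salary`, the attribute names `tstart` and `tend`, the end-of-time marker `9999-12-31`, and the sample values are assumptions made for illustration; they are not taken from the PRIMA data set.

```python
import xml.etree.ElementTree as ET

NOW = "9999-12-31"  # assumed marker for "still current"


def build_employee(emp_id, salary_history):
    """Group every salary value of one employee under a single element.

    salary_history is a list of (value, tstart, tend) tuples with
    ISO dates; each interval is closed at tstart and open at tend.
    """
    emp = ET.Element("employee", {"id": emp_id})
    for value, tstart, tend in salary_history:
        node = ET.SubElement(emp, "salary",
                             {"tstart": tstart, "tend": tend})
        node.text = str(value)
    return emp


def salary_at(emp, date):
    """Return the salary current at `date`, or None if none applies.

    ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    """
    for node in emp.findall("salary"):
        if node.get("tstart") <= date < node.get("tend"):
            return int(node.text)
    return None


emp = build_employee("10001", [
    (60000, "2003-01-01", "2006-01-01"),
    (65000, "2006-01-01", NOW),
])

print(ET.tostring(emp, encoding="unicode"))
print(salary_at(emp, "2004-06-01"))
```

Grouping all versions of a value under its owner keeps a history lookup local to one element, which is what makes such queries straightforward to express in a path-based language like XQuery.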

Wikipedia Database Schema Evolution: Real-world data

Schema Change History in SMOs: 171 schema versions over 4.5 years, taken and translated from MediaWiki SVN
Queries (tar file): 20 queries taken and translated from Wikipedia online profiler

AIMS

PRIMA was initially based on an XML database that executes XQuery queries. To improve the efficiency of the system, we are pursuing an RDBMS-based system, which we call AIMS.

Publications

"Managing and querying transaction-time databases under schema evolution." Hyun J. Moon, Carlo A. Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Accepted for publication at Very Large Data Bases (VLDB), 2008. (PDF will be available soon.)
