Prima

From Schema Evolution
PRIMA is a transaction-time DBMS that supports schema evolution: it manages and queries evolving data under evolving schemas. PRIMA is an acronym for ''Panta Rhei Information Management and Archival''.
  
The main investigators are:

Hyun J. Moon (contact author): [http://yellowstone.cs.ucla.edu/~hjmoon/]

Carlo A. Curino: [http://carlo.curino.us/]

Alin Deutsch: [http://db.ucsd.edu/people/alin/]

Chien-Yi Hou

Carlo Zaniolo: [http://www.cs.ucla.edu/~zaniolo/]

== Overview ==
The long-standing problem of managing the history of database information has become more urgent and complex with the rapid spread of web information systems. Systems such as Wikipedia, for instance, must manage the history of their databases in the face of intense schema evolution. Our PRIMA system addresses this problem with two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, in which the evolution of the schema and of its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving data from one or more schema versions, and the system performs the labor-intensive and error-prone task of rewriting those queries into equivalent ones for the appropriate schema versions. This feature is particularly relevant for historical queries that span potentially hundreds of schema versions. It is realized by (i) Schema Modification Operators ([[SMO]]s), which represent the mappings between successive schema versions, and (ii) an XML integrity constraint language (XIC), which is used to efficiently rewrite queries using the constraints established by the SMOs. The scalability of the approach has been tested on both synthetic data and real-world data from the schema evolution history of the Wikipedia database.
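A small example helps show what "temporally grouped" means. The sketch below is a minimal illustration and does not reproduce PRIMA's actual document format. It assumes a hypothetical V-document in which each employee appears once. Each value in that document carries a closed transaction-time interval in hypothetical `tstart`/`tend` attributes, and `tend="now"` marks a value that is still current. PRIMA expresses such queries in XQuery; here Python's standard `xml.etree.ElementTree` stands in for it. The element names and the functions `covers`, `salary_history`, and `snapshot` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical V-document: each employee appears once, and each value is
# stamped with a closed transaction-time interval [tstart, tend];
# tend="now" marks a value that is still current. Names are illustrative.
V_DOCUMENT = """
<employees>
  <employee tstart="2001-01-01" tend="now">
    <id tstart="2001-01-01" tend="now">1001</id>
    <salary tstart="2001-01-01" tend="2002-12-31">50000</salary>
    <salary tstart="2003-01-01" tend="now">58000</salary>
  </employee>
  <employee tstart="2002-06-01" tend="2004-03-31">
    <id tstart="2002-06-01" tend="2004-03-31">1002</id>
    <salary tstart="2002-06-01" tend="2004-03-31">47000</salary>
  </employee>
</employees>
"""


def covers(elem, day):
    """True if the element's [tstart, tend] interval contains `day` (ISO date)."""
    tend = elem.get("tend")
    # ISO-8601 dates compare correctly as strings; "now" is open-ended.
    return elem.get("tstart") <= day and (tend == "now" or day <= tend)


def salary_history(doc, emp_id):
    """Return the (tstart, tend, salary) history of one employee."""
    root = ET.fromstring(doc.strip())
    for emp in root.iter("employee"):
        if emp.findtext("id") == emp_id:
            return [(s.get("tstart"), s.get("tend"), int(s.text))
                    for s in emp.findall("salary")]
    return []


def snapshot(doc, day):
    """Return {employee id: salary} as of `day`."""
    root = ET.fromstring(doc.strip())
    return {emp.findtext("id"): int(s.text)
            for emp in root.iter("employee")
            for s in emp.findall("salary")
            if covers(s, day)}
```

For example, `snapshot(V_DOCUMENT, "2003-06-15")` returns `{"1001": 58000, "1002": 47000}`. Each employee is stored once, with its value histories nested inside. A history query therefore walks a single element instead of joining per-version tuples.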
== Experiment Data Set ==
 
=== Employee Database Schema Evolution: Synthetic data ===
  
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-smos.txt Schema Change History in SMOs]: five schema versions over a five-year period
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-queries.txt Queries]: two queries for each of eight query classes
* Data: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000.xml MV-Document] and [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000-sv.xml Single-version V-Document] (the MV-Document migrated to the last schema version). Both contain 1,000 employees, 10 departments, and 4 titles, with evolving values of salary, title, department, and department managers.
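The schema change histories above are written as SMOs. The sketch below gives a rough idea of how a sequence of SMOs lets a reference written against the current schema be mapped back to an older version. It handles only two hypothetical renaming operators, `RenameTable` and `RenameColumn`, over a made-up three-step history. It is not PRIMA's rewriting algorithm, which covers richer SMOs and relies on XIC constraints. `HISTORY` and `to_version` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class RenameTable:
    old: str
    new: str


@dataclass(frozen=True)
class RenameColumn:
    table: str  # table name in the schema version the SMO is applied to
    old: str
    new: str


SMO = Union[RenameTable, RenameColumn]

# Hypothetical history: HISTORY[i] turns version i+1 into version i+2,
# so the current schema is version len(HISTORY) + 1 (here, version 4).
HISTORY: List[List[SMO]] = [
    [RenameColumn("emp", "sal", "salary")],        # v1 -> v2
    [RenameTable("emp", "employee")],              # v2 -> v3
    [RenameColumn("employee", "dept", "deptno")],  # v3 -> v4
]


def to_version(ref: Tuple[str, str], target: int) -> Tuple[str, str]:
    """Map a (table, column) reference in the current schema to version `target`."""
    table, column = ref
    # Undo the SMOs applied after `target`, newest first.
    for smos in reversed(HISTORY[target - 1:]):
        for smo in reversed(smos):
            if isinstance(smo, RenameTable) and table == smo.new:
                table = smo.old
            elif isinstance(smo, RenameColumn) and (table, column) == (smo.table, smo.new):
                column = smo.old
    return table, column
```

For example, `to_version(("employee", "salary"), 1)` returns `("emp", "sal")`. The function first undoes the table rename from version 2 to 3, then the column rename from version 1 to 2.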
  
 
=== Wikipedia Database Schema Evolution: Real-world data ===
  
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-smos.txt Schema Change History in SMOs]: 171 schema versions over 4.5 years, taken and translated from [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=log MediaWiki SVN]
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-queries.tar Queries] (tar file): 20 queries taken and translated from [http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000 Wikipedia online profiler]
== AIMS ==
PRIMA was initially based on an XML DB that executes XQuery queries. To improve the efficiency of the system, we are pursuing an RDBMS-based system, which we call [[AIMS]].
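AIMS's actual design is not described on this page. The following is only an assumption-laden sketch of how transaction-time history can be kept in an RDBMS. It uses Python's built-in `sqlite3` module with a hypothetical `salary_hist` table. Each row carries a half-open `[tstart, tend)` interval, and a hypothetical sentinel date marks current rows. An update closes the current row instead of overwriting it, so past states stay queryable. The functions `make_db`, `set_salary`, and `salary_as_of` are illustrative names.

```python
import sqlite3

FOREVER = "9999-12-31"  # hypothetical sentinel: the row is still current


def make_db():
    """Create an in-memory database with one history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE salary_hist ("
        " emp_id INTEGER NOT NULL, salary INTEGER NOT NULL,"
        " tstart TEXT NOT NULL, tend TEXT NOT NULL)"
    )
    return conn


def set_salary(conn, emp_id, salary, day):
    """Transaction-time update: close the current row, then append a new one."""
    with conn:  # commit both statements together (or roll back on error)
        conn.execute(
            "UPDATE salary_hist SET tend = ? WHERE emp_id = ? AND tend = ?",
            (day, emp_id, FOREVER),
        )
        conn.execute(
            "INSERT INTO salary_hist VALUES (?, ?, ?, ?)",
            (emp_id, salary, day, FOREVER),
        )


def salary_as_of(conn, emp_id, day):
    """Return the salary current on `day`, or None; intervals are [tstart, tend)."""
    row = conn.execute(
        "SELECT salary FROM salary_hist"
        " WHERE emp_id = ? AND tstart <= ? AND ? < tend",
        (emp_id, day, day),
    ).fetchone()
    return row[0] if row else None
```

After `set_salary(conn, 1001, 50000, "2001-01-01")` and `set_salary(conn, 1001, 58000, "2003-01-01")`, the table holds two rows. A query for `"2002-06-30"` still sees 50000. Because the intervals are half-open, at most one row matches any given day.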
== Publications ==
  
''"Managing and querying transaction-time databases under schema evolution."'' Hyun J. Moon, Carlo A. Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Accepted for publication at the International Conference on Very Large Data Bases ('''VLDB 2008'''). (PDF will be available soon.)
Contact: Hyun J. Moon [http://www.cs.ucla.edu/~hjmoon/]

Latest revision as of 17:33, 6 December 2010
