= Publications =

This page collects the publications related to the Panta Rhei project:

* "Schema Evolution in Wikipedia: toward a Web Information System Benchmark", Carlo A. Curino, Hyun J. Moon, Letizia Tanca, Carlo Zaniolo, '''ICEIS 2008''' [http://yellowstone.cs.ucla.edu/schema-evolution/documents/curino-schema-evolution.pdf]
* "Information Systems Integration and Evolution: Ontologies at Rescue", Carlo A. Curino, Letizia Tanca, Carlo Zaniolo, '''STSM 2008''' [http://carlo.curino.us/documents/curino-STSM08-CR.pdf]
* "Graceful database schema evolution: the PRISM workbench", Carlo A. Curino, Hyun J. Moon, Carlo Zaniolo, '''VLDB 2008''' [http://yellowstone.cs.ucla.edu/~hjmoon/publications/vldb08prism.pdf]
* "Managing and querying transaction-time databases under schema evolution", H. J. Moon, C. A. Curino, A. Deutsch, C.-Y. Hou, C. Zaniolo, '''VLDB 2008''' [http://yellowstone.cs.ucla.edu/~hjmoon/publications/vldb08prima.pdf]
* "Managing the History of Metadata in support for DB Schema Evolution", Carlo A. Curino, Hyun J. Moon, Carlo Zaniolo, '''ECDM 2008''' [http://yellowstone.cs.ucla.edu/~hjmoon/publications/ecdm08.pdf]
* "The PRISM Workbench: Database Schema Evolution Without Tears", Carlo A. Curino, Hyun J. Moon, MyungWon Ham, Carlo Zaniolo, demo paper at '''ICDE 2009''' [http://yellowstone.cs.ucla.edu/~hjmoon/publications/icde09prism-demo.pdf]
* "PRIMA: Archiving and Querying Historical Data with Evolving Schemas", Hyun J. Moon, Carlo A. Curino, MyungWon Ham, Carlo Zaniolo, demo paper at '''SIGMOD 2009''' [http://yellowstone.cs.ucla.edu/~hjmoon/publications/sigmod09prima-demo.pdf]
* "Scalable Architecture and Query Optimization for Transaction-time DBs with Evolving Schemas", Hyun J. Moon, Carlo Curino, Carlo Zaniolo, '''SIGMOD 2010''' [http://yellowstone.cs.ucla.edu/~hjmoon/publications/sigmod2010aims.pdf]
* "Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++", Carlo Curino, Hyun J. Moon, Alin Deutsch, Carlo Zaniolo, accepted at '''VLDB 2011''' [http://yellowstone.cs.ucla.edu/~hjmoon/publications/pvldb2011prism.pdf]

= Schema Evolution Benchmark =
[[Image:Logo_SCHEVOBEN.png]]

This webpage publishes the results of an in-depth analysis of the MediaWiki DB backend, which constitutes the core of the '''Panta Rhei Schema Evolution Benchmark'''. These results have been presented at ICEIS 2008 [http://www.iceis.org/] in the paper "Schema Evolution in Wikipedia: toward a Web Information System Benchmark". The paper is available here: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/curino-schema-evolution.pdf]

4.5 years of development have been analyzed, and over 170 schema versions compared and studied. On this website we report the results of our analysis and provide the entire dataset we collected, for the purpose of defining a unified benchmark for schema evolution.

'''Authors:'''
* Carlo A. Curino (contact author) [http://carlo.curino.us/]
* Hyun J. Moon [http://yellowstone.cs.ucla.edu/~hjmoon/]
* Letizia Tanca [http://home.dei.polimi.it/tanca/]
* Carlo Zaniolo [http://www.cs.ucla.edu/~zaniolo/]

__TOC__

''If you are only interested in downloading the dataset please visit our [[Benchmark_Downloadables | '''Downloads section''']].''

== Presentation and Screencast ==

A PDF version of the presentation given at ICEIS 2008 is available at [http://yellowstone.cs.ucla.edu/schema-evolution/documents/curino-ICEIS2008.pdf], and the corresponding screencast at [http://yellowstone.cs.ucla.edu/schema-evolution/documents/curino-iceis2008.mov].

== Introduction ==

Every Information System (IS) is the subject of a constant evolution process to adapt the system to many factors such as changing requirements, new functionalities, compliance with new regulations, integration with other systems, and new security and privacy measures. The data management core of an IS is one of the most critical portions to evolve. Often based on Relational DataBase (DB) technology, the data management core of a system needs to evolve whenever the revision process requires modifications in the logical and physical organization of the data. Given its fundamental role, the evolution of the DB underlying an IS has a very strong impact on the applications accessing the data; thus, support for graceful evolution is of paramount importance. The complexity of DB and software maintenance clearly grows with the size and complexity of the system. Furthermore, when moving from intra-company systems -- typically managed by rather small and stable teams of developers/administrators -- to collaboratively developed and maintained public systems, the need for a well-managed evolution becomes indispensable. Leading-edge web projects, characterized by massive collaborations and fast growth, experience a relentless drive for changes, which in turn generates a critical need for widespread consensus and rich documentation.

Schema evolution has been extensively studied in the scenario of traditional information systems. An authoritative and comprehensive survey of the approaches to relational schema evolution and schema versioning is presented in <bibref f="defbib.bib">roddick95schema</bibref>. More recently, <bibref f="defbib.bib">rametal03</bibref> has surveyed schema evolution on the object-oriented, relational, and conceptual data models. Case studies on schema evolution in various application domains appear in <bibref f="defbib.bib">sjoberg93quantifying</bibref><bibref f="defbib.bib">marche93measuring</bibref>. Schema evolution has also been studied in the context of ''model management'' -- research which aims at developing a systematic approach to schema management and mapping <bibref f="defbib.bib">bernstein03applying</bibref>. Other interesting approaches have tackled the problem of schema evolution in XML <bibref f="defbib.bib">MoroML07</bibref>, data warehouses <bibref f="defbib.bib">Rizzi07</bibref>, and object-oriented databases <bibref f="defbib.bib">Galante:2005eq</bibref><bibref f="defbib.bib">Franconi:2001da</bibref>.

Of particular interest are Web Information Systems (WIS), often based on open-source solutions. This large and fast-growing class includes, among many other examples: Content Management Systems, Wiki-based web portals, E-commerce systems, Blogs, and Public Scientific Databases from 'Big Science' projects. The common denominator among these systems is the collaborative and distributed nature of their development and content management. Among the best known examples we have ''MediaWiki'' [http://www.mediawiki.org], the wiki software underlying a huge number of web portals, including Wikipedia [http://wikipedia.org] and this one; ''Joomla'' [http://www.joomla.org], a complete Content Management System (CMS) and Web Application Framework; and ''TikiWiki'' [http://www.tikiwiki.org], an open-source groupware and CMS solution.

Moreover, since large collaborative projects are now very common in natural science research, their reliance on databases and web systems as the venue needed to promptly share results and data has created many large Scientific Databases, including the Human Genome DB [http://www.gdb.org/], HGVS [http://www.hgvs.org/index.html], and many others. Although different in many ways, these all share a common evolution problem for which the slow, labor-intensive solutions of the past have become inadequate. New conceptual and operational tools are needed to enable graceful evolution by systematically supporting the migration of the DB and the maintenance of its applications. Among the desiderata in such a scenario, we seek systems that preserve and manage the past contents of a database and the history of its schema, while allowing legacy applications to access new contents by means of old schemas <bibref f="defbib.bib">vldb2008a</bibref><bibref f="defbib.bib">vldb2008b</bibref>.

In the rest of this paper, we shall analyze the case of MediaWiki, a data-intensive, open-source, collaborative web-portal software, originally developed to run Wikipedia, a multilingual, web-based, free-content encyclopedia: this platform is currently used by over 30,000 wikis, for a grand total of over 100 million pages (see http://s23.org/wikistats/). While the Wikipedia content evolution has been analyzed previously <bibref f="defbib.bib">barzan</bibref>, this report is the first that focuses on the problem of DB schema evolution. MediaWiki has seen, during its 4 years and 7 months of life, 171 different DB schema versions released to the public by means of a CVS/Subversion versioning system [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql]. As one can easily imagine, every schema change has a profound impact on the application queries and the code managing the results, which must thus be revised. In the case of MediaWiki, we observed in our analysis that only a small fraction (about 22%) of the queries designed to run on old schema versions are still valid throughout the schema evolution (see the section "The Impact on Applications" below). Our analysis was made possible by the collaborative, public, and open-source nature of the development, documentation, and release of MediaWiki and Wikipedia.

'''Contributions''' The main contributions of this paper are the following: (i) we present the first schema evolution analysis of a real-life Web Information System DB, by studying the MediaWiki DB backend; this provides deep insight into Wikipedia, one of the ten most popular websites to date [http://www.alexa.com], and reveals the need for DB schema evolution and versioning techniques; and (ii) we provide and plant the seeds of the first public, real-life-based benchmark for schema evolution, which will offer researchers and practitioners a rich dataset to evaluate their approaches and solutions. As a part of the benchmark, we also release a simple but effective tool-suite for evolution analysis.

== '''Why Wikipedia''' ==
[[Image:Viz.png|thumb|Google Search statistics on Wikipedia popularity]]
[[Image:Alexa.png|thumb|Alexa.com statistics on Wikipedia popularity]]

Wikipedia is one of the 10 most popular websites on the Web, is a DB-centric Web Information System, and is released under an open-source license.

The two graphs on the right show, respectively, the Google search popularity of Wikipedia and the percentage of users visiting it, according to [http://www.alexa.com http://www.alexa.com]. Moreover, the PHP back-end underlying Wikipedia is shared by over 30,000 other wikis. Both software and content are released under open-source licenses. This makes Wikipedia, and thus MediaWiki, a perfect starting point for our goals.

== '''MediaWiki Schema Evolution''': a short Introduction ==
Evolving the database that is at the core of an Information System represents a difficult maintenance problem that has only been studied in the framework of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and collaboration of many groups and individuals. Therefore, in this paper, we present an in-depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large family of web information systems built using the open-source '''MediaWiki''' software. Our study is based on: (i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema changes, and (ii) simple software tools to automate the analysis. This framework allowed us to dissect and analyze the 4.5 years of Wikipedia history, which was short in time, but intense in terms of growth and evolution. Beyond confirming the initial hunch about the severity of the problem, our analysis suggests the need for developing better methods and tools to support graceful schema evolution (see [[Prism]] for the solution we propose). Therefore, we briefly discuss documentation and automation support systems for database evolution, and suggest that the Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems.

== MediaWiki Architecture ==
[[Image:MediaWikiArchitecture.png|MediaWiki Architecture]]

The MediaWiki software is a browser-based web application, whose architecture is described in detail in [Help:MediaWikiarchitecture] and in the MediaWiki Workbook 2007 [http://www.scribd.com/doc/43868/Wikipedia-site-internals-workbook-2007?ga_related_doc=1]. As shown in the Figure, users interact with the PHP frontend through a standard web browser, submitting a page request (e.g., a search for pages describing "Paris"). The frontend software consists of a simple presentation and management layer (MediaWiki PHP Scripts) interpreted by the Apache PHP engine. The user requests are carried out by generating appropriate SQL queries (or updates), which are then issued against the data stored in the backend DB (e.g., the database is queried for articles whose text contains the term "Paris"). The backend DB can be stored in any DBMS: MySQL, being open-source and scalable, is the default DBMS for the MediaWiki software. The results returned by the DBMS are rendered in XHTML and delivered to the user's browser to be displayed (e.g., a set of links to pages mentioning "Paris" is rendered as an XHTML list). Due to the heavy load of the Wikipedia installation of this software, much effort has been devoted to performance optimization, introducing several levels of caching (rendered web pages, DB caches, media caches), which is particularly effective thanks to the very low rate (0.04%) of updates w.r.t. queries. Obviously, every modification of the DB schema has a strong impact on the queries the frontend can pose. Typically, each schema evolution step can require several queries, and thus several PHP scripts (cooperating to interrogate the DB and render a page), to be manually fixed to accommodate the schema changes.

== MediaWiki Growth ==
[[Image:Att-tab3.png|thumb|Attributes and Tables growth of MediaWiki Schema]]
In this section, we analyze the schema evolution of MediaWiki based on its 171 schema versions, as committed to SVN between April 2003 (first schema revision) and November 2007 (date of this analysis).

'''Schema Size Growth''' In the figures, we report the size of the MediaWiki DB schema over its history, in terms of the number of tables and columns, respectively. The graphs show an evident trend of growth in size: the number of tables has increased from 17 to 34 (a 100% increase) and the number of columns from 100 to 242 (142%). Sudden drops in the graphs are due to schema versions with syntax errors, i.e., schema versions that could not be properly installed. In both graphs we observe different rates of growth over time, which seem to be related to the time periods preceding or following official releases of the overall software (see the table in section Available Software Version).

Schema growth is due to three main driving forces:
* performance improvement, e.g., introduction of dedicated cache tables;
* addition of new features, e.g., support for logging and content validation;
* the growing need for preservation of DB content history, e.g., introduction of tables and columns to store outdated multimedia content, such as the 'filearchive' table.

The figure shows a histogram representation of the table lifetimes, in terms of number of versions. The lifetimes range from very long ones, e.g., the user table, which was alive throughout the entire history, to short ones, e.g., the random table, which only survived for two revisions. On average, each table lasted 103.3 versions (60.4% of the total DB history). Figure 5 presents the lifetimes of columns as a histogram; columns lasted 97.17 versions on average (56.8% of the total DB history). Interestingly, both figures show that there are two main groups of tables and columns: "short-living" and "long-living". The former might be due to the fact that the schema has been growing lately, so a significant portion of tables and columns has been introduced only recently. The latter can be explained by noting that the core tables/columns tend to be rather stable throughout the entire history.

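The lifetime measure used above can be reproduced in a few lines of code. The sketch below is our own illustrative computation (not the project's released tool), assuming the schema history is given as one set of table names per version:

```python
# Illustrative computation of the table-lifetime statistics (our own sketch,
# not the benchmark's tool): a table's lifetime is the number of schema
# versions in which it appears; the average lifetime is then reported as a
# share of the total number of versions.

def lifetimes(history):
    """history: list of per-version table-name sets, oldest to newest."""
    life = {}
    for tables in history:
        for t in tables:
            life[t] = life.get(t, 0) + 1
    return life

def average_lifetime_share(history):
    """Average table lifetime, as a fraction of the total number of versions."""
    life = lifetimes(history)
    return sum(life.values()) / len(life) / len(history)

# Toy example: 'user' survives all three versions, 'random' only one.
toy = [{"user"}, {"user", "random"}, {"user"}]
```

The same two functions apply unchanged to columns, using (table, column) pairs as the set elements.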
Some historical statistics on Wikipedia content growth are available at http://stats.wikimedia.org/EN/TablesDatabaseSize.htm .

[[Image:Monthly_Revision_Count.png|Per-Month Revision Count]]

The figure "Per-Month Revision Count" shows how many schema versions were committed during each month, providing an estimate of the development effort devoted to the DB backend over time, while the table "MediaWiki Release History" shows the main releases of the software front-end.

[[Image:mediawiki_releases.png|MediaWiki Release History]]

== More Advanced Statistics ==

=== Macro-Classification of Changes ===
[[Image:Change_types.png|"Macro Schema changes"]]

We group the 170 evolution steps based on the types of evolution they present, as in the table. While the "actual schema changes" have an impact on the queries, as they modify the schema layout, the evolution of the DBMS engine, indexes, and data types (while being relevant to performance) does not require any query correction, because of the physical data independence provided by the DBMS. The table shows the frequencies of the types of changes among the 170 evolution steps. In particular, it highlights that:

* almost 55% of the evolution steps involve actual schema changes (further discussed in the Micro-Classification below);
* over 40% of the evolution steps involve index/key adjustments, due to the performance-critical role of the DB in a data-intensive, high-load website such as Wikipedia;
* 8.8% of the evolution steps were rollbacks to previous schema versions;
* 7.6% of the analyzed evolution steps present only documentation changes.

=== Micro-Classification of Changes ===

[[Image:Smo_table.png| SMO Table]]

'''Schema Modification Operators:''' To better understand the relational DB schema evolution, we introduce a classification of the "actual schema changes". Different formalisms can be exploited for this purpose. Shneiderman and Thomas proposed in [Shneiderman and Thomas, 1982] a comprehensive set of schema changes, including structural schema changes and also changes regarding keys and dependencies. More recently, Bernstein et al. have also proposed a set of schema evolution primitives using algebra-based constraints as their primitives [Bernstein et al., 2006, Bernstein et al., 2008].
Among several options, we chose the Schema Modification Operators (SMOs) that we proposed in <bibref f="defbib.bib">vldb2008a</bibref><bibref f="defbib.bib">vldb2008b</bibref> (briefly described in Table 3). SMOs capture the essence of the existing works, but can also express schema changes not modeled by previous approaches. For example, by using functions in the ADD COLUMN operator, SMOs can support semantic conversion of columns (e.g., currency exchange), column concatenation/split (e.g., different address formats), and other similar changes that have been heavily exploited in modeling MediaWiki schema changes. The effectiveness of SMOs has been validated in <bibref f="defbib.bib">vldb2008a</bibref><bibref f="defbib.bib">vldb2008b</bibref>, where the PRISM and PRIMA systems used SMOs to describe schema evolution in transaction-time databases and to support historical query reformulation over multi-schema-version transaction-time databases.
The syntax of SMOs is similar to that of SQL DDL [ISO/IEC 9075-*: 2003, 2003, Eisenberg et al., 2004], and provides a concise way to describe typical modifications of a database schema and the corresponding data migration. Every SMO takes as input a schema and produces as output a new version of the same schema. Table 3 presents a list of SMOs, operating on tables (the first six) and on columns (the last five) of a given DB schema, together with a brief explanation. Note that simple SMOs can be arbitrarily combined in a sequence to describe complex structural changes, such as those that occurred in the MediaWiki DB schema evolution.

'''Classification Using SMOs''' In this context we exploit SMOs as a pure classification instrument, to provide a fine-grained analysis of the types of change the schema has been subject to. While there might be several ways to describe a schema evolution step by means of SMOs, we carefully selected, by analyzing the available documentation, the most natural set of SMOs describing each schema change in the MediaWiki history.

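To make the classification concrete, an evolution step can be encoded as an ordered list of SMOs, since simple SMOs compose in sequence into complex changes. The encoding below is an illustrative assumption of ours (the operator names follow the SMO table, but the Python representation and the example step are not from the benchmark):

```python
# One possible encoding of an SMO-based description of an evolution step
# (illustrative: operator names follow the SMO table, while the Python
# structure and the example arguments are our assumptions). A step is an
# ordered list of (operator, arguments...) tuples.

def smo_frequency(steps):
    """Count how often each SMO type occurs across all evolution steps."""
    freq = {}
    for step in steps:
        for smo in step:
            op = smo[0]
            freq[op] = freq.get(op, 0) + 1
    return freq

# Hypothetical fragment of a single evolution step.
step = [
    ("CREATE TABLE", "revision"),
    ("ADD COLUMN", "revision", "rev_comment"),
    ("DROP COLUMN", "cur", "cur_comment"),
]
```

Feeding all 170 steps into `smo_frequency` yields exactly the kind of distribution reported in the "SMO Frequency" table below.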
[[Image:Smo_frequency.png|SMO Frequency]]

Table "SMO Frequency" shows the distribution of the SMOs, presenting, for each type, how many times it has been used in the entire schema evolution history. It is interesting to note that the more sophisticated SMOs (e.g., MERGE TABLE), while being indispensable, are not very common. The balance between column/table additions and deletions highlights the "content preserving" attitude of Wikipedia.

[[Image:Smo_per_version.png| SMO per version]]

The figure shows the number of SMOs (overall) for each evolution step. The curve shows how the schema evolution has been mainly a continuous process of adjustment, with few exceptions, shown as spikes in the figure, coinciding with major evolution steps such as:

* v6696 (41st version) - v6710 (42nd version), 92 SMOs: a change in the storage strategy of the article versions;
* v9116 (61st version) - v9199 (62nd version), 12 SMOs: a change in link management;
* v20393 (138th version) - v20468 (139th version), 9 SMOs: history management (deletion and log features added to several tables).

== The Impact on Applications ==

[[Image:real_impact.png|thumb|Real Queries Impact (backward)]]

In order to study the effect of schema evolution on the frontend application, we analyze the impact of the schema changes on six representative sets of queries. Each experiment tests the success or failure of a set of queries, originally designed to run on a specific schema version, when issued against other schema versions.

=== Experimental Setting ===
To simulate a case where current applications are run on databases under older schema versions, we test three sets of queries, valid on the last schema version, against all the previous schema versions (first figure). Also, to study how legacy applications succeed or fail on newer versions of the database schema, we test three sets of legacy queries against all the subsequent schema versions (second figure). The six sets considered in our experiments are as follows:

* real-world templates, current (first figure): the 500 most common query templates (extracted from over 780 million query instances), derived from the [http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000 Wikipedia on-line profiler] and post-processed for cleaning;

* lab-gen queries, current (first figure): 2496 query instances generated by a [http://yellowstone.cs.ucla.edu/wiki-real-data/index.php/Main_Page local installation] of the current version of MediaWiki (release 1.11, schema version 171), interacting with the frontend and logging the queries issued against the underlying MySQL DBMS;

* lab-gen templates, current (first figure): 148 templates of queries extracted from the above lab-gen queries, current;

[[Image:lab_impact.png|thumb|Lab-generated Queries Impact (forward)]]

* lab-gen queries, legacy (second figure): 4175 query instances generated by a local installation of an old version of MediaWiki (release 1.3, the oldest version compatible with the environment of our experimental setting; schema version 28), interacting with the frontend and logging the queries issued against the underlying MySQL DBMS;

* lab-gen templates, legacy (second figure): 74 templates extracted from the above lab-gen queries, legacy;

* synthetic probe queries, legacy (second figure): 133 synthetic queries accessing single columns (i.e., <tt>select tab_j.att_i from tab_j</tt>) of schema version 28, designed to highlight the schema portion affected by the evolution.

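The probe-query construction can be sketched as follows (a hypothetical helper of ours, not the benchmark's released query set): one single-column SELECT per column, so each failing probe pinpoints a schema element broken by the evolution.

```python
# Sketch of the synthetic probe-query construction (hypothetical helper, not
# the benchmark's released query set): one single-column SELECT per column,
# following the `select tab_j.att_i from tab_j` pattern described above.

def probe_queries(schema):
    """schema: dict mapping table name -> list of column names."""
    return [
        f"select {tab}.{col} from {tab}"
        for tab, cols in schema.items()
        for col in cols
    ]

# Toy fragment (table/column names here are just the placeholders from the text).
fragment = {"tab_j": ["att_i"]}
```

Running these probes against every schema version then yields, per version, the fraction of schema elements still reachable.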
=== The Results ===
Each set has been tested against all schema versions; the resulting query execution success rates are shown in the first figure (for the first three sets) and the second figure (for the last three sets). The outliers in the graphs (sudden and extremely low values) are due to syntactically incorrect DB schema versions.

The first three sets are shown in the first figure. It is interesting to note that:

* proceeding from right to left, a series of descending steps illustrates that more and more of the current queries become incorrect as we move to older schemata;

* the sudden drop in query success -- of about 30% -- which appears between commit revisions v6696 (41st schema version) and v6710 (42nd schema version) highlights one of the most intense evolution steps of the MediaWiki data management core, involving a deep change in the management of article revisions;

* the lab-generated and real-world templates carry very similar information. This seems to indicate that our local query generation method is capable of producing a representative set of queries.

The second figure shows a graph of the average execution success rates for the latter three query sets. Some interesting observations are as follows:

* the synthetic probe queries, by failing systematically when a column or a table is modified, highlight the portion of the schema affected by the evolution (changed in such a way that makes queries fail). The figure shows that the schema evolution invalidates (in the worst case) only 32% of the schema;

* in the last version, a very large portion (77%) of the lab-gen templates fails due to schema evolution;

* for lab-gen templates, the big evolution step between commit revisions v6696 (41st schema version) and v6710 (42nd schema version) invalidates over 70% of the queries;

* comparing the lab-gen template failure rate to the synthetic probe query failure rate (representing the affected portion of the schema) exposes that the schema modifications mainly affected the portion of the schema heavily used by the application (32% of the schema being affected invalidates 77% of the query templates);

* the gap between the success rate of legacy query instances (2.9%) and legacy query templates (22%) shows that the failing templates actually correspond to the most common query instances (in our distribution).

Finally, it is interesting to note that the number of features of the MediaWiki software has grown over time; this explains the growth in the number of query templates extracted from legacy queries (74) vs. current queries (148). This also affects the percentage (but not the absolute number) of queries failing due to each schema evolution (the current-query graph appears smoother).

All in all, these experiments provide clear evidence of the strong impact of schema changes on applications, and support the claim for better schema evolution support.

== The Tool-suite ==<br />
To collect the statistics described in this paper, we developed a set of tools, organized in a tool-suite.<br />
This step-by-step process, primarily designed to help researchers to gain better insight in the schema evolution of existing Information Systems, can be effectively exploited by:<br />
<br />
<br />
* DB administrators and developers, in any data-centric scenario, to analyze the history of the DB schema and create a (summarized) view of its evolution history. The tool suite will support the analysis of the evolution process and help to highlight possible flaws in the design and maintenance of the Information System.<br />
* Researchers and designers of support methods and tools for DB evolution and versioning, to test their approaches against real-life scenarios.<br />
<br />
We now discuss some of the features of our tool-suite referring to its application to the MediaWiki DB.<br />
<br />
First of all, by means of an appropriate tool, the 171 MediaWiki DB schema versions have been downloaded<br />
from the SVN repository and batch-installed in a MySQL DBMS (MySQL version 5.0.22-Debian).<br />
We developed a tool, named <tt>statistics_collection</tt>, that can be applied to this data to derive the basic statistics of the schema versions, such as schema size and average table/column lifetime.<br />
The <tt>statistics_collection</tt> tool queries the MySQL data dictionary (the <tt>information_schema</tt> meta-database) to gather the statistical measures presented so far.<br />
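As a minimal, self-contained sketch of this idea (Python, with SQLite's catalog standing in for MySQL's <tt>information_schema</tt>; the function name and the statistics collected are illustrative, not the actual tool's API):<br />

```python
import sqlite3

def schema_size(conn: sqlite3.Connection) -> dict:
    """Collect basic schema-size statistics from the DBMS catalog
    (the real tool queries MySQL's information_schema instead)."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    n_columns = 0
    for (name,) in tables:
        # PRAGMA table_info returns one row per column of the table
        n_columns += len(conn.execute(f"PRAGMA table_info({name})").fetchall())
    return {"tables": len(tables), "columns": n_columns}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER, title TEXT)")
conn.execute("CREATE TABLE revision (rev_id INTEGER, page_id INTEGER, ts TEXT)")
print(schema_size(conn))  # {'tables': 2, 'columns': 5}
```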
<br />
For a fine-grained view of the schema evolution we also provide the <tt>Schema Modification Operators extraction</tt> tool. <br />
This tool, by operating on the differences between subsequent schema versions, semi-automatically extracts a set of candidate SMOs <br />
describing the schema evolution, minimizing the user effort (complex evolution patterns, such as the one between the 41st and 42nd schema versions in MediaWiki, require the user to refine the set of SMOs according to his/her understanding of the schema evolution).<br />
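The diff-based extraction step can be sketched as follows; this is a hypothetical simplification that only proposes table- and column-level candidates from two schema snapshots, leaving renames and complex patterns to the user, and the SMO syntax shown is illustrative:<br />

```python
def candidate_smos(old: dict, new: dict) -> list:
    """Diff two schema versions (table name -> set of column names) and
    propose candidate SMOs; ambiguous cases (e.g. a rename vs. a
    drop+add) are left for the user to refine, as in the tool-suite."""
    smos = []
    for t in old.keys() - new.keys():
        smos.append(f"DROP TABLE {t}")
    for t in new.keys() - old.keys():
        smos.append(f"CREATE TABLE {t}({', '.join(sorted(new[t]))})")
    for t in old.keys() & new.keys():
        for c in sorted(old[t] - new[t]):
            smos.append(f"DROP COLUMN {c} FROM {t}")
        for c in sorted(new[t] - old[t]):
            smos.append(f"ADD COLUMN {c} INTO {t}")
    return smos

# Made-up fragments loosely inspired by the MediaWiki v41 -> v42 step:
v41 = {"cur": {"cur_id", "cur_text"}, "old": {"old_id", "old_text"}}
v42 = {"page": {"page_id"}, "old": {"old_id", "old_text", "old_flags"}}
print(candidate_smos(v41, v42))
```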
To estimate query success against different schema versions, users can<br />
exploit a tool named <tt>query_success analyzer</tt>. This tool performs a query success rate analysis by<br />
batch-running its input queries against all schema versions. The tool, relying on the MySQL query engine, measures and computes both per-query and aggregate success ratios (this is clearly a rough estimation).<br />
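A rough sketch of such a success-rate analysis (hypothetical miniature schemas and queries, with SQLite standing in for the MySQL engine):<br />

```python
import sqlite3

# Hypothetical miniature setup: two schema versions and two query templates.
VERSIONS = {
    "v1": ["CREATE TABLE cur (cur_id INTEGER, cur_text TEXT)"],
    "v2": ["CREATE TABLE page (page_id INTEGER)",
           "CREATE TABLE text (old_id INTEGER, old_text TEXT)"],
}
QUERIES = ["SELECT cur_text FROM cur", "SELECT page_id FROM page"]

def success_rates(versions, queries):
    """Batch-run every query against every schema version and record
    whether it executes, yielding a rough per-version success ratio."""
    rates = {}
    for name, ddl in versions.items():
        conn = sqlite3.connect(":memory:")
        for stmt in ddl:
            conn.execute(stmt)
        ok = 0
        for q in queries:
            try:
                conn.execute(q)
                ok += 1
            except sqlite3.OperationalError:
                pass  # query invalidated by this schema version
        rates[name] = ok / len(queries)
    return rates

print(success_rates(VERSIONS, QUERIES))  # {'v1': 0.5, 'v2': 0.5}
```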
<br />
For users' convenience, we also provide a <tt>log_analyzer</tt> which can be used to extract and clean the<br />
SQL query instances and templates from the raw <tt>mysql_log</tt> format. <br />
<br />
Every component of the tool-suite stores the collected information, in non-aggregated form, in a database<br />
named <tt>evolution_metadb</tt>. This database is later queried to provide<br />
statistical measures of the schema evolution. This approach, relying<br />
on the SQL aggregation operators, offers the user a flexible<br />
interface. The graphs and tables presented in this paper have been<br />
derived by means of appropriate SQL queries on the <tt> evolution_metadb</tt>; all the data<br />
collected for our MediaWiki analysis are released to the public and available in our [[Benchmark_Downloadables | '''Downloads section''']].<br />
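The aggregation step can be illustrated as follows; the table and column names are made up for the example and are not the actual <tt>evolution_metadb</tt> schema:<br />

```python
import sqlite3

# A minimal sketch of the evolution_metadb idea: store non-aggregated
# per-query outcomes, then derive statistics with plain SQL aggregation.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE query_success (
                  schema_version INTEGER, query_id INTEGER, ok INTEGER)""")
db.executemany(
    "INSERT INTO query_success VALUES (?, ?, ?)",
    [(41, 1, 1), (41, 2, 1), (42, 1, 0), (42, 2, 1)])

# Aggregate success ratio per schema version via SQL operators.
rows = db.execute("""SELECT schema_version, AVG(ok)
                     FROM query_success
                     GROUP BY schema_version
                     ORDER BY schema_version""").fetchall()
print(rows)  # [(41, 1.0), (42, 0.5)]
```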
<br />
== Downloads ==<br />
<br />
All the data, schemas, queries, and tools are available for download. Moreover, a couple of useful services are running on our servers, providing a sandbox for testing and investigating schema evolution. <br />
<br />
Please visit our [[Benchmark_Downloadables | '''Downloads section''']]<br />
<br />
<br />
== References ==<br />
<bibreferences /></div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/AIMSAIMS2010-12-06T17:35:02Z<p>Hjmoon: /* Overview */</p>
<hr />
<div>== Overview ==<br />
<br />
AIMS is an efficient incarnation of [[Prima]], which supports the same features for the management and querying of transaction-time database systems under schema evolution. The original PRIMA is based on an XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of the art. In AIMS, we instead employ an RDBMS-based query execution engine to improve the performance. We also address the problem of temporal coalescing for histories fragmented by schema changes. For more detail, check the paper at SIGMOD 2010 <bibref f="defbib.bib">Sigmod2010</bibref>.<br />
The main investigators are:<br />
<br />
Hyun J. Moon (contact author): [http://yellowstone.cs.ucla.edu/~hjmoon/]<br />
<br />
Carlo A. Curino: [http://carlo.curino.us/]<br />
<br />
Carlo Zaniolo: [http://www.cs.ucla.edu/~zaniolo/]<br />
<br />
== Abstract ==<br />
The problem of archiving and querying the history of a database is made more complex by the fact that, along with the database content, the database schema also evolves with time. Indeed, archival quality can only be guaranteed by storing past database contents using the schema versions under which they were originally created. This causes major usability and scalability problems in the preservation, retrieval and querying of databases with intense evolution histories, i.e., hundreds of schema versions. These scenarios are common in web information systems and scientific databases, which frequently accumulate that many versions in just a few years.<br />
<br />
Our system, Archival Information Management System (AIMS), solves this usability issue by letting users write queries against a chosen schema version and then performing for the users the rewriting and execution of queries on all the appropriate schema versions. AIMS achieves scalability by using (i) an advanced storage strategy based on relational technology and attribute-level timestamping of the history of the database content, (ii) suitable temporal indexing and clustering techniques, and (iii) novel temporal query optimizations. In particular, with AIMS we introduce a novel technique called CoalNesT that achieves unprecedented performance when temporally coalescing tuples fragmented by schema changes. Extensive experiments show that the performance and scalability thus achieved greatly exceed those obtained by previous approaches.<br />
The AIMS technology is easily deployed by plugging into existing DBMS replication technologies, leading to very low overhead; moreover, by decoupling the logical and physical layers it provides multiple query interfaces, from the basic archive & query features considered in the upcoming SQL standards to the much richer XML/XQuery capabilities proposed by temporal database researchers.<br />
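For illustration, plain temporal coalescing (of which CoalNesT is an optimized variant for schema-fragmented histories) merges adjacent periods carrying the same value; below is a minimal sketch with made-up salary data, not the AIMS implementation:<br />

```python
def coalesce(history):
    """Merge adjacent periods that carry the same value.
    `history` is a list of (start, end, value) with end exclusive,
    sorted by start; schema changes tend to fragment such histories."""
    out = []
    for start, end, value in history:
        if out and out[-1][2] == value and out[-1][1] == start:
            out[-1] = (out[-1][0], end, value)  # extend previous period
        else:
            out.append((start, end, value))
    return out

# Salary history fragmented at a schema-change boundary (t=5):
frag = [(1, 5, 50000), (5, 9, 50000), (9, 12, 60000)]
print(coalesce(frag))  # [(1, 9, 50000), (9, 12, 60000)]
```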
<br />
<br />
== References ==<br />
<bibreferences /></div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/PrimaPrima2010-12-06T17:33:39Z<p>Hjmoon: </p>
<hr />
<div>PRIMA is a transaction-time DBMS that supports schema evolution. It supports management and querying of evolving data under evolving schema. PRIMA is an acronym for ''Panta Rhei Information Management and Archival''.<br />
<br />
The main investigators are:<br />
<br />
Hyun J. Moon (contact author): [http://yellowstone.cs.ucla.edu/~hjmoon/]<br />
<br />
Carlo A. Curino: [http://carlo.curino.us/]<br />
<br />
Alin Deutsch: [http://db.ucsd.edu/people/alin/]<br />
<br />
Chien-Yi Hou<br />
<br />
Carlo Zaniolo: [http://www.cs.ucla.edu/~zaniolo/]<br />
<br />
== Overview ==<br />
<br />
The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning potentially hundreds of different schema versions. The latter is realized by (i) introducing Schema Modification Operators ([[SMO]]s) to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.<br />
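As a toy illustration of the rewriting idea (PRIMA's actual rewriting operates on XQuery under XML integrity constraints and handles far more than renames; the names and the rename map below are hypothetical):<br />

```python
import re

def rewrite_for_version(query, renames):
    """Rewrite a query written against the current schema so it runs on
    an older version, by applying the inverse of RENAME-style SMOs at
    the token level. A toy sketch only: real rewriting must also handle
    decompositions, merges, and constraints."""
    return re.sub(r"\w+", lambda m: renames.get(m.group(), m.group()), query)

# Inverse of hypothetical SMOs: RENAME TABLE cur INTO page,
# RENAME COLUMN cur_id IN cur TO page_id.
q = "SELECT page.page_id FROM page"
print(rewrite_for_version(q, {"page": "cur", "page_id": "cur_id"}))
# SELECT cur.cur_id FROM cur
```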
<br />
== Experiment Data Set ==<br />
=== Employee Database Schema Evolution: Synthetic data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-smos.txt Schema Change History in SMO]: five schema versions over five year period<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-queries.txt Queries]: two queries for each of eight query classes<br />
* Data: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000.xml MV-Document], [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000-sv.xml Single-version V-Document] (MV-Document migrated to the last version); both contain 1,000 employees, 10 departments, 4 titles, with evolving values of salary, title, department, and department managers. <br />
<br />
=== Wikipedia Database Schema Evolution: Real-world data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-smos.txt Schema Change History in SMOs]: 171 schema versions over 4.5 years, taken and translated from [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=log MediaWiki SVN]<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-queries.tar Queries] (tar file): 20 queries taken and translated from [http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000 Wikipedia online profiler]<br />
<br />
== AIMS ==<br />
PRIMA was initially based on an XML DB that executes XQuery queries. In order to improve the efficiency of the system, we are pursuing an RDBMS-based system, which we call [[AIMS]].<br />
<br />
== Publications ==<br />
<br />
''"Managing and querying transaction-time databases under schema evolution"'' Hyun J. Moon, Carlo A. Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Accepted for publication at Very Large Data Base '''VLDB, 2008'''. (PDF will be available soon)</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/PrismPrism2010-12-06T17:30:13Z<p>Hjmoon: </p>
<hr />
<div>[[Image:PRISM-LOGO.png]]<br />
<br />
'''PRISM''' is a system that supports Schema Evolution by means of schema mapping and query rewriting. <br />
'''PRISM''' is a joint project of Politecnico di Milano and University of California, Los Angeles. <br />
The main investigators are:<br />
<br />
Carlo A. Curino (contact author): [http://carlo.curino.us/]<br />
<br />
Hyun J. Moon: [http://yellowstone.cs.ucla.edu/~hjmoon/]<br />
<br />
Carlo Zaniolo: [http://www.cs.ucla.edu/~zaniolo/]<br />
<br />
== Abstract ==<br />
<br />
Supporting graceful schema evolution represents an unsolved<br />
problem for traditional information systems that is further<br />
exacerbated in web information systems, such as Wikipedia and<br />
public scientific databases: in these projects based on multiparty<br />
cooperation the frequency of database schema changes has increased<br />
while tolerance for downtimes has nearly disappeared. As of<br />
today, schema evolution remains an error-prone and time-consuming<br />
undertaking, because the DB Administrator (DBA) lacks the methods<br />
and tools needed to manage and automate this endeavor by (i)<br />
predicting and evaluating the effects of the proposed schema<br />
changes, (ii) rewriting queries and applications to operate on the<br />
new schema, and (iii) migrating the database. <br />
<br />
Our '''PRISM''' system takes a big first step toward addressing this pressing need by<br />
providing: (i) a language of Schema Modification Operators ([[SMO]]) to<br />
express concisely complex schema changes, (ii) tools that allow<br />
the DBA to evaluate the effects of such changes, (iii) optimized<br />
translation of old queries to work on the new schema version, (iv)<br />
automatic data migration, and (v) full documentation of intervened<br />
changes as needed to support data provenance, database flash<br />
back, and historical queries. '''PRISM''' solves these<br />
problems by integrating recent theoretical advances on mapping<br />
composition and invertibility, into a design that also achieves<br />
usability and scalability. Wikipedia and its 240+ schema versions<br />
provided an invaluable testbed for validating '''PRISM''' tools<br />
and their ability to support legacy queries.<br />
<br />
<br />
Furthermore, we address this issue by introducing a formal evolution model for the database schema structure and its integrity constraints, and by using it to derive update mapping techniques akin to the rewriting techniques used for queries. Thus, we (i) propose a new set of Integrity Constraints Modification Operators (ICMOs), (ii) characterize the impact of structural schema changes on integrity constraints, (iii) devise representations that enable the rewriting of updates, and (iv) develop a unified approach for query and update rewriting under constraints. We then describe the efficient implementation of these techniques provided by our PRISM++ system. The effectiveness of PRISM++ and its enabling technology has been verified on a testbed containing the evolution histories of several scientific databases and web information systems, including the Genetic DB Ensembl (410+ schema versions in 9 years) and Wikipedia (240+ schema versions in 6 years).<br />
<br />
== On-line Demo ==<br />
<br />
[[Image:Prism screenshot.png|thumb|Prism: a tool for schema evolution support]]<br />
<br />
The actual demo is available online at: [[PrismDemo | Prism a tool for schema evolution support]].<br />
The demo is still under development and has limited functionality w.r.t. the internal prototype, but we are working on it... stay tuned!<br />
<br />
== Screencasts ==<br />
<br />
There is a '''Video''' of the Demo available at: http://yellowstone.cs.ucla.edu/schema-evolution/documents/videos/PRISM++.mov<br />
<br />
Update rewriting functionalities are showcased in a '''Video''' of the Demo available at: http://yellowstone.cs.ucla.edu/schema-evolution/documents/Prism++.mov<br />
<br />
<br />
There is a '''Video''' of the VLDB presentation at: http://yellowstone.cs.ucla.edu/schema-evolution/documents/prism-vldb2008.mov<br />
<br />
== Publications ==<br />
<br />
''"Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++"'' Carlo Curino, Hyun J. Moon, Alin Deutsch, Carlo Zaniolo, PVLDB, (2011).<br />
<br />
''"Graceful database schema evolution: the prism workbench"'', Carlo A. Curino, Hyun J. Moon, Carlo Zaniolo, to appear in '''VLDB 2008''' [http://yellowstone.cs.ucla.edu/schema-evolution/documents/curino08graceful.pdf PDF]<br />
<br />
''"Information Systems Integration and Evolution: Ontologies at Rescue"'', Carlo A. Curino, Letizia Tanca, Carlo Zaniolo, '''STSM 2008''' [http://yellowstone.cs.ucla.edu/schema-evolution/documents/curino-STSM08-CR.pdf PDF]<br />
<br />
''"The PRISM Workwench: Database Schema Evolution Without Tears"'', Carlo A. Curino, Hyun J. Moon, MyungWon Ham, Carlo Zaniolo, DEMO paper at '''ICDE 2009'''.<br />
<br />
== Bibtex ==<br />
@INPROCEEDINGS{curino-vldb2008a,<br />
author = {Carlo A. Curino and Hyun J. Moon and Carlo Zaniolo},<br />
title = {Graceful database schema evolution: the prism workbench},<br />
booktitle = {Very Large Data Base (VLDB)},<br />
year = {2008}<br />
}<br />
<br />
@INPROCEEDINGS{curino-stsm2008,<br />
author = {Carlo A. Curino and Letizia Tanca and Carlo Zaniolo},<br />
title = {Information Systems Integration and Evolution: Ontologies at Rescue},<br />
booktitle = {International Workshop on Semantic Technologies in System Maintenance (STSM)},<br />
year = {2008}<br />
}<br />
<br />
== Commercial Competitors ==<br />
Here we list a series of commercial tools that partially tackle the problem of Schema Evolution. None of these support automatic query rewriting and data migration impact analysis as done by our prototype PRISM. Most of them focus on comparing schema versions and creating reports of the differences. Another common feature is analyzing the impact of a change w.r.t. other DB objects such as stored procedures, views, and constraints, but none of them provide analysis of the impact on queries and applications.<br />
<br />
* DB2 Change Management Expert [http://publib.boulder.ibm.com/infocenter/mptoolic/v1r0/index.jsp?topic=/com.ibm.db2tools.chx.doc.ug/chxucoview01.htm]<br />
* Oracle Change Management Pack [http://www.oracle.com/technology/products/oem/pdf/ds_change_pack.pdf]<br />
* MySQL Workbench for Schema Change [http://www.mysql.com/products/workbench/]<br />
* SwisSQL DBChangeManager (MSSQL) (DB compare and synchronization) [http://www.swissql.com/products/database-compare-synchronize-tool/index.html]<br />
* Idera SQL Change [http://www.idera.com/products/sqlchange/]<br />
* Embarcadero Change Manager [http://www.embarcadero.com/products/changemanager/]<br />
* Red-Gate SQL Compare [http://www.red-gate.com/products/SQL_Compare/index.htm]<br />
* Best Soft Tool [http://bestsofttool.com/SQLDBCompare/SDC_Feature.aspx]<br />
* Toad® DBA Suite [http://www.quest.com/toad-dba-suite-for-oracle/] [http://www.quest.com/toad-dba-suite-for-db2]<br />
* Aldon lifecycle management solution [http://www.aldon.com/solutions/standards/database.aspx]<br />
<br />
== Useful Links ==<br />
<br />
* Management-hub.com [http://www.management-hub.com/changemanagement70.html]<br />
* Application specific schema evolution in Django [http://code.djangoproject.com/wiki/SchemaEvolution]<br />
<br />
== Detailed Description ==<br />
<br />
=== Introduction ===<br />
The incessant pressure of schema evolution is impacting every database, from the world's<br />
largest, the ''World Data Centre for Climate'' featuring over 6 petabytes of data (Source: http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html),<br />
to the smallest single-website DB.<br />
DBMSs have long addressed, and largely solved, the physical data independence problem,<br />
but their progress toward logical data independence and graceful schema evolution<br />
has been painfully slow. Both practitioners and researchers are well aware that schema<br />
modifications can: (i) dramatically impact both data<br />
and queries <bibref f="defbib.bib">iceis2008</bibref>, endangering the data integrity,<br />
(ii) require expensive application maintenance for queries, and (iii) cause unacceptable system downtimes. <br />
The problem is particularly serious in Web Information Systems, such as Wikipedia, where<br />
significant downtimes are not acceptable while a mounting pressure for<br />
schema evolution follows from the diverse and complex requirements of its<br />
open-source, collaborative software-development environment <bibref f="defbib.bib">iceis2008</bibref>.<br />
The following comment (from SVN commit 5552, accessible at: http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=5552) by a senior MediaWiki <bibref f="defbib.bib">mediawiki</bibref> DB designer<br />
reveals the schema evolution dilemma faced today by DataBase Administrators (DBAs):<br />
''This will require downtime on<br />
upgrade, so we're not going to do it until we have a better idea of the cost<br />
and can make all necessary changes at once to minimize it.''<br />
<br />
Clearly, what our DBA needs is the ability to (i) predict and evaluate the impact<br />
of schema changes upon queries and applications using those queries, and (ii)<br />
minimize the downtime by replacing, as much as possible, the current manual process with<br />
tools and methods to automate the process of database migration and query rewriting.<br />
The DBA would also like (iii) all these changes documented automatically<br />
for: data provenance, flash-backs to previous schemas,<br />
historical queries, and case studies to assist on future problems.<br />
<br />
There has been much recent work and progress on<br />
theoretical issues relating to schema modifications<br />
including mapping composition, mapping invertibility, and<br />
query rewriting. <br />
<br />
These techniques have often been used for heterogeneous database integration; in<br />
the ''Panta Rhei Information Schema Manager (PRISM)'' we exploit them to automate the transition to a new schema on behalf of a DBA. <br />
In this setting, the semantic relationship between source and target schema, <br />
deriving from the schema evolution, is more crisp and better understood by the DBA than in typical database integration scenarios. <br />
Assisting the DBA during the design of schema evolution, '''PRISM''' can thus achieve objectives (i-iii) <br />
above by exploiting those theoretical advances, and prompting further DBA input in those rare situations <br />
in which ambiguity remains.<br />
<br />
Therefore, '''PRISM''' provides an intuitive, operational interface, used by the DBA to evaluate<br />
the effect of possible evolution steps w.r.t. redundancy,<br />
information preservation, and impact on queries. Moreover,<br />
'''PRISM''' automates error-prone and time-consuming tasks such as<br />
query translation, computation of inverses, and data migration.<br />
As a by-product of its use '''PRISM''' creates a complete, unambiguous documentation of the schema<br />
evolution history, which is invaluable to support data provenance, database flash<br />
backs, historical queries, and user education about standard practices, methods and tools.<br />
<br />
<br />
'''PRISM''' exploits the concept of Schema<br />
Modification Operators (SMO) <bibref f="defbib.bib">BernsteinGMN08</bibref>, representing atomic schema changes, which we then modify and enhance by <br />
(i) introducing the use of functions for data type and semantic conversions, <br />
(ii) providing a mapping to Disjunctive Embedded Dependencies (DEDs), <br />
(iii) obtaining invertibility results compatible with <bibref f="defbib.bib">fagin2007c</bibref>, and <br />
(iv) defining the translation into efficient SQL primitives to perform the data migration.<br />
The system has been tested and validated against the benchmark for schema evolution <br />
defined in <bibref f="defbib.bib">iceis2008</bibref>, which is built over the actual database schema evolution history of Wikipedia (170+ schema<br />
versions in 4.5 years). Its ability to handle the very complex evolution of one of the ten most popular websites of the World Wide Web (Source: http://www.alexa.com)<br />
offers an important validation of the practical soundness and completeness of our approach.<br />
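To illustrate point (iv), here is a hedged sketch of how a DECOMPOSE-style SMO could be translated into SQL primitives for data migration; the exact SMO semantics and translation are defined in the PRISM papers, and this simplification uses SQLite and made-up table names:<br />

```python
import sqlite3

def migrate_decompose(conn, table, parts):
    """Sketch of translating a DECOMPOSE TABLE SMO into SQL primitives:
    each part of the decomposition becomes a (distinct) projection of
    the source table, which is then dropped."""
    for new_table, columns in parts.items():
        cols = ", ".join(columns)
        conn.execute(
            f"CREATE TABLE {new_table} AS SELECT DISTINCT {cols} FROM {table}")
    conn.execute(f"DROP TABLE {table}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur (id INTEGER, title TEXT, body TEXT)")
conn.execute("INSERT INTO cur VALUES (1, 'Main_Page', 'welcome')")
# Hypothetical SMO: DECOMPOSE TABLE cur INTO page(id, title), text(id, body)
migrate_decompose(conn, "cur",
                  {"page": ["id", "title"], "text": ["id", "body"]})
print(conn.execute("SELECT * FROM page").fetchall())  # [(1, 'Main_Page')]
```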
<br />
<br />
While Web Information Systems represent an extreme case, where the<br />
need for evolution is exacerbated <bibref f="defbib.bib">iceis2008</bibref> by the fast<br />
evolving environment in which they operate, every DBMS would<br />
benefit from ''graceful schema evolution''. In<br />
particular, every DB accessed by applications that are inherently ''hard to<br />
modify'', such as: public Scientific Databases accessed by<br />
applications developed within several independent<br />
institutions, DBs supporting legacy applications (impossible to modify), and systems involving<br />
closed-source applications foreseeing high adaptation costs. <br />
Transaction-time databases with evolving schemas represent an interesting scenario where similar techniques can be applied <bibref f="defbib.bib">vldb2008b</bibref>.<br />
<br />
==== Contributions ====<br />
The '''PRISM''' system harnesses recent theoretical advances<br />
<bibref f="defbib.bib">deutsch03mars</bibref><bibref f="defbib.bib">fagin2007a</bibref> into practical solutions<br />
through an intuitive interface, which masks the complexity of underlying tasks, such<br />
as logic-based mappings between schema versions, mapping composition, and mapping<br />
invertibility. By providing a simple operational interface and speaking<br />
commercial DBMS jargon, '''PRISM''' provides a user-friendly, robust bridge to the<br />
practitioners' world. System scalability and usability have been<br />
addressed and tested against one of the most intense histories of schema<br />
evolution available to date: the schema evolution<br />
of Wikipedia, featuring over 170 documented schema<br />
versions in 4.5 years and over 700 gigabytes of data <bibref f="defbib.bib">barzan</bibref>.<br />
<br />
=== Related Works ===<br />
<br />
<br />
Some of the most relevant approaches to the general problem of schema evolution are the impact-minimizing methodology of <br />
<bibref f="defbib.bib">Ra:2005vj</bibref>, the unified approach to application and database evolution of <br />
<bibref f="defbib.bib">1228375</bibref>, the application-code generation of <br />
<bibref f="defbib.bib">cleve2006</bibref> and the framework for metadata model management of <br />
<bibref f="defbib.bib">melnik03rondo</bibref> and the further contributions <br />
<bibref f="defbib.bib">bernstein03applying</bibref><bibref f="defbib.bib">bernstein03data</bibref><bibref f="defbib.bib">velegrakis03mapping</bibref><bibref f="defbib.bib">yu05semantic</bibref>.<br />
While these and other interesting attempts provide solid theoretical foundations and interesting methodological approaches, the lack of operational tools for graceful schema evolution observed by Roddick in <bibref f="defbib.bib">roddick95schema</bibref> remains largely unsolved twelve years later.<br />
'''PRISM''' represents, to the best of our knowledge, the most advanced attempt in this direction available to date.<br />
<br />
The operational answer to the issue of schema evolution used by '''PRISM''' exploits some of the most recent results on mapping composition<br />
<bibref f="defbib.bib">nash05composition</bibref>, mapping invertibility<br />
<bibref f="defbib.bib">fagin2007c</bibref>, and query rewriting <br />
<bibref f="defbib.bib">deutsch03mars</bibref>.<br />
The SMO language used here captures the essence of existing works <br />
<bibref f="defbib.bib">BernsteinGMN08</bibref>, but extends them with functions,<br />
for expressing data type and semantic conversions. <br />
The translation between SMOs and Disjunctive Embedded Dependencies (DED) exploited here is similar to the incremental adaptation approach of <br />
<bibref f="defbib.bib">velegrakis03mapping</bibref>, but achieves different goals.<br />
The query rewriting portion of '''PRISM''' exploits theories and tools developed in the context of the MARS project <br />
<bibref f="defbib.bib">deutsch01ded</bibref><bibref f="defbib.bib">deutsch03mars</bibref>. <br />
The theories of mapping composition studied in <bibref f="defbib.bib">madhavan03composing</bibref><bibref f="defbib.bib">Fagin04composing</bibref><bibref f="defbib.bib">nash05composition</bibref><bibref f="defbib.bib">BernsteinGMN08</bibref>, and the concept of invertibility recently investigated by Fagin et al. in <br />
<bibref f="defbib.bib">fagin2007c</bibref><bibref f="defbib.bib">fagin2007a</bibref> support the notion of SMO composition and inversion.<br />
<br />
The big players in the world of commercial DBMSs have been mainly focusing on reducing the downtime when the schema is updated <br />
<bibref f="defbib.bib">oraclewhitepaper</bibref> and on assistive design tools <bibref f="defbib.bib">db2changemanagementexpert</bibref>, and lack the automatic query rewriting features provided in '''PRISM'''. <br />
Other tools of interest are <bibref f="defbib.bib">chimera</bibref> and LiquiBase (Available on-line: http://www.liquibase.org/).<br />
<br />
Further related works include the results on mapping information preservation by Barbosa et al. <br />
<bibref f="defbib.bib">BarbosaFM05</bibref>, the ontology-based repository of <br />
<bibref f="defbib.bib">bounif2006</bibref>, the schema versioning approaches of <br />
<bibref f="defbib.bib">jagadish97</bibref>. XML schema evolution has been addressed in <bibref f="defbib.bib">MoroML07</bibref> by means of a guideline-driven approach. Object-oriented schema evolution has been investigated in <br />
<bibref f="defbib.bib">Galante:2005eq</bibref>. <br />
In the context of data warehouse X-TIME represents an interesting step toward schema versioning by means of the notion of augmenting schema <br />
<bibref f="defbib.bib">GolfarelliLRV04</bibref><bibref f="defbib.bib">Rizzi07</bibref>. <br />
'''PRISM''' differs from all the above in terms of both goals and techniques.<br />
<br />
== References ==<br />
<bibreferences /></div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/Main_PageMain Page2010-12-06T17:27:07Z<p>Hjmoon: </p>
<hr />
<div>This wiki reports the research advances of the '''Panta Rhei''', a research project for data management under schema evolution.<br />
<br />
The Panta Rhei project aims at providing powerful<br />
tools that: (i) facilitate schema evolution and guide the Database<br />
Administrator in planning and evaluating changes, (ii) support<br />
automatic rewriting of legacy queries against the current schema<br />
version, (iii) enable efficient archiving of the histories of data<br />
and metadata, (iv) support complex temporal queries over such<br />
histories, and (v) automate the documentation and querying of metadata histories.<br />
<br />
<br />
== People ==<br />
* MIT<br />
** Carlo A. Curino [http://carlo.curino.us/]<br />
<br />
* Politecnico di Milano<br />
** Letizia Tanca [http://home.dei.polimi.it/tanca/]<br />
** Fabrizio Moroni<br />
<br />
* UCLA<br />
** Hyun J. Moon [http://yellowstone.cs.ucla.edu/~hjmoon/] <br />
** Carlo Zaniolo [http://www.cs.ucla.edu/~zaniolo/]<br />
** Myungwon Ham<br />
<br />
* UC San Diego<br />
** Alin Deutsch [http://db.ucsd.edu/People/alin/]<br />
** Chien-Yi Hou<br />
<br />
== Projects ==<br />
Within this macro-project the following projects have been developed (please follow the links for further details):<br />
<br />
* The [[Schema_Evolution_Benchmark | '''Pantha Rei Schema Evolution Benchmark''']] <bibref f="defbib.bib">iceis2008</bibref>, a benchmark for schema evolution developed from the actual evolution of the MediaWiki DB backend.<br />
* The [[HMM | '''History Metadata Manager''']] <bibref f="defbib.bib">ecdm2008</bibref>, a tool to support temporal queries over metadata histories, and its Semantic Web Extension the [[SHMM | '''Semantic HMM''']] <bibref f="defbib.bib">stsm2008</bibref><br />
* The prototype system [[Prism|'''PRISM: tool for Graceful Schema Evolution''']] <bibref f="defbib.bib">vldb2008a</bibref>, <bibref f="defbib.bib">curinovldb2011update</bibref>, and its demo [[PrismDemo| '''Prism demo page''' ]]<bibref f="defbib.bib">icde2009demo</bibref><br />
<br />
* The prototype system [[Prima|'''PRIMA: a system for querying Transaction-Time DB under evolving schema''']] <bibref f="defbib.bib">Vldb2008b</bibref> and its demo<bibref f="defbib.bib">moon09sigmoddemo</bibref>.<br />
<br />
* The extension of PRIMA named [[AIMS|'''Scalable Architecture and Query Optimization for Transaction-time DBs with Evolving Schemas''']] <bibref f="defbib.bib">Sigmod2010</bibref>.<br />
<br />
== Funding ==<br />
This work was supported in part by '''NSF-IIS''' award '''0705345''': ''“III-COR: Collaborative Research: Graceful Evolution and Historical Queries in Information Systems – A Unified Approach"''<br />
<br />
== References ==<br />
<bibreferences /></div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T22:17:51Z<p>Hjmoon: </p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features for the management and querying of transaction-time database systems under schema evolution. The original PRIMA is based on an XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of the art. In H-PRIMA, we instead employ an RDBMS-based query execution engine to improve the performance. We also address the problem of temporal coalescing for histories fragmented by schema changes.<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used in our experiments with H-PRIMA, both to increase the repeatability and verifiability of our results and to encourage re-use of our data in other research efforts.<br />
<br />
* Data Set<br />
** Data generator for employee DB under schema evolution <br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.zip PrimaEmpBenchmarkDataGen.zip] (version 0.01)<br />
** XML data ('''Warning''': left-clicking these files may take a very long time to render in the browser - right-click to download them instead.)<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (1.17MB): data for 1,000 employees over an evolving schema spanning a five-year period, generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (11.7MB): same data with 10,000 employees<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (116.5MB): same data with 100,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries] used in experiment<br />
<br />
* Query rewriting engine - coming soon</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T22:06:22Z<p>Hjmoon: </p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the fragmented history by schema changes.<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data Set<br />
** Data generator for employee DB under schema evolution <br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.zip PrimaEmpBenchmarkDataGen.zip] (version 0.01)<br />
** XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries] used in experiment<br />
<br />
* Query rewriting engine - coming soon</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T21:24:15Z<p>Hjmoon: /* Experiment Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the fragmented history by schema changes.<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data Set<br />
** Data generator for employee DB under schema evolution <br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
** XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries] used in experiment<br />
<br />
* Query rewriting engine - coming soon</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T21:23:43Z<p>Hjmoon: /* Experiment Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the fragmented history by schema changes.<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data Set<br />
** Data generator for employee DB under schema evolution <br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
** XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries] used in experiment</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T21:23:23Z<p>Hjmoon: /* Experiment Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the fragmented history by schema changes.<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data Set<br />
* Data generator for employee DB under schema evolution <br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries] used in experiment</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T21:23:00Z<p>Hjmoon: /* Experiment Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the fragmented history by schema changes.<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data Set<br />
* Data generator for employee DB under schema evolution <br />
<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries] used in experiment</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T21:22:12Z<p>Hjmoon: /* Experiment Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the fragmented history by schema changes.<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data Set<br />
** Data generator for employee DB under schema evolution <br />
<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
** XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
*** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees<br />
<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries] used in experiment</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T21:19:31Z<p>Hjmoon: /* Overview */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the fragmented history by schema changes.<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data generator for employee DB under schema evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries]</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T21:18:17Z<p>Hjmoon: </p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes. <br />
<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data generator for employee DB under schema evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries]</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-30T21:18:03Z<p>Hjmoon: </p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[Prima]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes. <br />
<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data generator for employee DB under schema evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees<br />
<br />
* [ http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/queries Queries]</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T17:52:57Z<p>Hjmoon: /* Experiment Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes. <br />
<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage re-use of our data for other research efforts.<br />
<br />
* Data generator for employee DB under schema evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T17:50:33Z<p>Hjmoon: </p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes. <br />
<br />
<br />
== Experiment Data Set ==<br />
Here we share the datasets and tools used for experiment with H-PRIMA to increase the repeatability and verifiability of our results, and also to encourage the usage of our data in other research projects.<br />
<br />
* Data generator for employee DB under schema evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T17:47:38Z<p>Hjmoon: /* Overview */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Experiment Data Set ==<br />
<br />
* Data generator for employee DB under schema evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T17:19:52Z<p>Hjmoon: /* Overview */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]] under development. It supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Experiment Data Set ==<br />
<br />
* Data generator for employee DB under schema evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T17:18:57Z<p>Hjmoon: /* Overview */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes. The project is still in-progress and we post some tools data sets used for the experiment in this page.<br />
<br />
== Experiment Data Set ==<br />
<br />
* Data generator for employee DB under schema evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:53:17Z<p>Hjmoon: /* Experiment Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Experiment Data Set ==<br />
<br />
* Data generator for employee DB under schema evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:53:01Z<p>Hjmoon: /* Experiment Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Experiment Data Set ==<br />
<br />
* Data Generator for Employee DB under Schema Evolution: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:52:07Z<p>Hjmoon: /* Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. The original PRIMA is based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Experiment Data Set ==<br />
<br />
* Employee Data Set Generator: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': a simple left-click on these files can take a very long time; download them with a right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees under a schema evolving over a five-year period, generated from H-Tables to compare the performance of an XML DB against an RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): the same data set with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): the same data set with 100,000 employees</div>
<hr />
<div>PRIMA is a transaction-time DBMS that supports schema evolution: it manages and queries evolving data under an evolving schema. PRIMA is an acronym for Panta Rhei Information Management and Archival.<br />
<br />
== Overview ==<br />
<br />
The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning potentially hundreds of different schema versions. The latter is realized by (i) introducing Schema Modification Operators ([[SMO]]s) to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.<br />
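The rewriting through SMOs can be sketched for the simplest operators, renamings: a reference written against the current schema is mapped backwards through the inverse of each SMO until the target version is reached. The SMO list and helper below are illustrative toy structures, not PRIMA's actual SMO representation or rewriting algorithm:

```python
# Each SMO maps schema version v_i to v_{i+1}; to answer a query written
# against the current schema over data stored under an old version, each
# name reference is rewritten backwards through the inverse of each SMO.
smos = [
    ("RENAME COLUMN", "emp", "sal", "salary"),  # v1 -> v2: emp.sal becomes emp.salary
    ("RENAME TABLE", "emp", "employee"),        # v2 -> v3: emp becomes employee
]

def rewrite_to_version(table, column, target_version):
    """Map a (table, column) reference from the current schema back to
    `target_version` by inverting the SMO sequence (hypothetical helper)."""
    for smo in reversed(smos[target_version - 1:]):
        if smo[0] == "RENAME TABLE" and table == smo[2]:
            table = smo[1]  # undo the table rename
        elif smo[0] == "RENAME COLUMN" and table == smo[1] and column == smo[3]:
            column = smo[2]  # undo the column rename
    return table, column

print(rewrite_to_version("employee", "salary", 1))  # -> ('emp', 'sal')
```

Real SMOs also include operators such as table merges and splits, whose inverses require joins and unions rather than simple renaming; that is exactly the labor-intensive rewriting PRIMA automates.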
<br />
== Experiment Data Set ==<br />
=== Employee Database Schema Evolution: Synthetic data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-smos.txt Schema Change History in SMO]: five schema versions over a five-year period<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-queries.txt Queries]: two queries for each of eight query classes<br />
* Data: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000.xml MV-Document], [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000-sv.xml Single-version V-Document] (the MV-Document migrated to the last version); both contain 1,000 employees, 10 departments, and 4 titles, with evolving values for salary, title, department, and department managers.<br />
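A temporally grouped document of this kind collects each attribute's timestamped version history under the tuple it belongs to. The fragment built below is purely illustrative; the element and attribute names are hypothetical and may differ from the actual MV-Document schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical temporally grouped fragment for one employee: each
# attribute groups its own version history as timestamped elements.
emp = ET.Element("employee", id="1001")
for tag, versions in {
    "salary": [("2001-01-01", "2003-01-01", "50000"),
               ("2003-01-01", "9999-12-31", "60000")],
    "title":  [("2001-01-01", "9999-12-31", "Engineer")],
}.items():
    attr = ET.SubElement(emp, tag)
    for ts, te, value in versions:
        v = ET.SubElement(attr, "version", tstart=ts, tend=te)
        v.text = value

print(ET.tostring(emp, encoding="unicode"))
```

Because the history of every attribute sits in one place, an XQuery over such a document can ask "what was this employee's salary while they held the title Engineer?" with simple path navigation instead of self-joins over snapshot tables.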
<br />
=== Wikipedia Database Schema Evolution: Real-world data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-smos.txt Schema Change History in SMOs]: 171 schema versions over 4.5 years, taken and translated from [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=log MediaWiki SVN]<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-queries.tar Queries] (tar file): 20 queries taken and translated from [http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000 Wikipedia online profiler]<br />
<br />
== H-PRIMA ==<br />
PRIMA was initially based on an XML DB that executes XQuery queries. To improve the system's efficiency, we are pursuing an RDBMS-based system, which we call [[H-PRIMA]].<br />
<br />
== Publications ==<br />
<br />
''"Managing and querying transaction-time databases under schema evolution"'' Hyun J. Moon, Carlo A. Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Accepted for publication at Very Large Data Bases ('''VLDB'''), 2008. (PDF will be available soon)</div>
<hr />
<div>PRIMA is a transaction-time DBMS that supports schema evolution. It supports management and querying of evolving data under evolving schema. PRIMA is an acronym for Panta Rhei Information Management and Archival.<br />
<br />
== Overview ==<br />
<br />
The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning over potentially hundreds of different schema versions. The latter one is realized by (i) introducing Schema Modification Operators ([[SMO]])s to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.<br />
<br />
== Data set ==<br />
=== Employee Database Schema Evolution: Synthetic data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-smos.txt Schema Change History in SMO]: five schema versions over five year period<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-queries.txt Queries]: two queries for each of eight query classes<br />
* Data: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000.xml MV-Document], [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000-sv.xml Single-version V-Document] (MV-Document migrated to the last version) both contain 1,000 employees, 10 departments, 4 titles, with evolving values of salary, title, department, and depart managers. <br />
<br />
=== Wikipedia Database Schema Evolution: Real-world data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-smos.txt Schema Change History in SMOs]: 171 schema versions over 4.5 years, taken and translated from [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=log MediaWiki SVN]<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-queries.tar Queries] (tar file): 20 queries taken and translated from [http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000 Wikipedia online profiler]<br />
<br />
== H-PRIMA ==<br />
PRIMA was initially based on XML DB that execute XQuery queries. In order to improve the efficiency of the system, we are pursuing RDBMS-based system, which we call [[H-PRIMA]].<br />
<br />
== Publications ==<br />
<br />
''"Managing and querying transaction-time databases under schema evolution"'' Hyun J. Moon, Carlo A. Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Accepted for publication at Very Large Data Base '''VLDB, 2008'''. (PDF will be available soon)</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/PrimaPrima2008-06-13T15:49:37Z<p>Hjmoon: </p>
<hr />
<div>PRIMA is a system that supports management and querying of transaction-time database with evolving schema over history. PRIMA is an acronym for Panta Rhei Information Management and Archival.<br />
<br />
== Overview ==<br />
<br />
The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning over potentially hundreds of different schema versions. The latter one is realized by (i) introducing Schema Modification Operators ([[SMO]])s to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.<br />
<br />
== Data set ==<br />
=== Employee Database Schema Evolution: Synthetic data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-smos.txt Schema Change History in SMO]: five schema versions over five year period<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-queries.txt Queries]: two queries for each of eight query classes<br />
* Data: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000.xml MV-Document], [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000-sv.xml Single-version V-Document] (MV-Document migrated to the last version) both contain 1,000 employees, 10 departments, 4 titles, with evolving values of salary, title, department, and depart managers. <br />
<br />
=== Wikipedia Database Schema Evolution: Real-world data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-smos.txt Schema Change History in SMOs]: 171 schema versions over 4.5 years, taken and translated from [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=log MediaWiki SVN]<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-queries.tar Queries] (tar file): 20 queries taken and translated from [http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000 Wikipedia online profiler]<br />
<br />
== H-PRIMA ==<br />
PRIMA was initially based on XML DB that execute XQuery queries. In order to improve the efficiency of the system, we are pursuing RDBMS-based system, which we call [[H-PRIMA]].<br />
<br />
== Publications ==<br />
<br />
''"Managing and querying transaction-time databases under schema evolution"'' Hyun J. Moon, Carlo A. Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Accepted for publication at Very Large Data Base '''VLDB, 2008'''. (PDF will be available soon)</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/PrimaPrima2008-06-13T15:49:27Z<p>Hjmoon: </p>
<hr />
<div>PRIMA is a system that support management and querying of transaction-time database with evolving schema over history. PRIMA is an acronym for Panta Rhei Information Management and Archival.<br />
<br />
== Overview ==<br />
<br />
The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning over potentially hundreds of different schema versions. The latter one is realized by (i) introducing Schema Modification Operators ([[SMO]])s to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.<br />
<br />
== Data set ==<br />
=== Employee Database Schema Evolution: Synthetic data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-smos.txt Schema Change History in SMO]: five schema versions over five year period<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-queries.txt Queries]: two queries for each of eight query classes<br />
* Data: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000.xml MV-Document], [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000-sv.xml Single-version V-Document] (MV-Document migrated to the last version) both contain 1,000 employees, 10 departments, 4 titles, with evolving values of salary, title, department, and depart managers. <br />
<br />
=== Wikipedia Database Schema Evolution: Real-world data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-smos.txt Schema Change History in SMOs]: 171 schema versions over 4.5 years, taken and translated from [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=log MediaWiki SVN]<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-queries.tar Queries] (tar file): 20 queries taken and translated from [http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000 Wikipedia online profiler]<br />
<br />
== H-PRIMA ==<br />
PRIMA was initially based on XML DB that execute XQuery queries. In order to improve the efficiency of the system, we are pursuing RDBMS-based system, which we call [[H-PRIMA]].<br />
<br />
== Publications ==<br />
<br />
''"Managing and querying transaction-time databases under schema evolution"'' Hyun J. Moon, Carlo A. Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Accepted for publication at Very Large Data Base '''VLDB, 2008'''. (PDF will be available soon)</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:31:51Z<p>Hjmoon: </p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. PRIMA was based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Data Set ==<br />
<br />
* Employee Data Set Generator: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:31:23Z<p>Hjmoon: /* Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. PRIMA was based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Data Set ==<br />
<br />
* Employee Data Set Generator: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/PrimaEmpBenchmarkDataGen.jar PrimaEmpBenchmarkDataGen.jar] (version: 0.01)<br />
<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:28:11Z<p>Hjmoon: </p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. PRIMA was based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Data Set ==<br />
* Employee Data Set Generator: coming soon<br />
* XML data ('''Warning''': simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:27:53Z<p>Hjmoon: </p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. PRIMA was based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Data Set ==<br />
* Employee Data Set Generator: coming soon<br />
* XML data (Warning: simple left-click on these files could take forever - download them with right-click.)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:24:31Z<p>Hjmoon: /* Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. PRIMA was based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Data Set ==<br />
* Employee Data Set Generator: coming soon<br />
* XML data (Warning: viewing these XML files with browser can take very long!)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp1h-vdoc.xml emp1h-vdoc.xml] (0.9MB): data for 1,000 employees over evolving schema of five year periods. generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS.<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp10h-vdoc.xml emp10h-vdoc.xml] (8.9MB): same data with 10,000 employees<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (89.2MB): same data with 100,000 employees</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:20:49Z<p>Hjmoon: /* Data Set */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. PRIMA was based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Data Set ==<br />
* Employee Data Set Generator: coming soon<br />
* XML data (generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS)<br />
** [http://yellowstone.cs.ucla.edu/schema-evolution/documents/hprima/emp100h-vdoc.xml emp100h-vdoc.xml] (0.9MB)</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/PrimaPrima2008-06-13T15:19:58Z<p>Hjmoon: /* Performance Issues */</p>
<hr />
<div>PRIMA is the name of a system that support management and querying of transaction-time database with evolving schema over history. PRIMA is an acronym for Panta Rhei Information Management and Archival.<br />
<br />
== Overview ==<br />
<br />
The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning over potentially hundreds of different schema versions. The latter one is realized by (i) introducing Schema Modification Operators ([[SMO]])s to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.<br />
<br />
== Data set ==<br />
=== Employee Database Schema Evolution: Synthetic data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-smos.txt Schema Change History in SMO]: five schema versions over five year period<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-queries.txt Queries]: two queries for each of eight query classes<br />
* Data: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000.xml MV-Document], [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000-sv.xml Single-version V-Document] (MV-Document migrated to the last version) both contain 1,000 employees, 10 departments, 4 titles, with evolving values of salary, title, department, and depart managers. <br />
<br />
=== Wikipedia Database Schema Evolution: Real-world data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-smos.txt Schema Change History in SMOs]: 171 schema versions over 4.5 years, taken and translated from [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=log MediaWiki SVN]<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-queries.tar Queries] (tar file): 20 queries taken and translated from [http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000 Wikipedia online profiler]<br />
<br />
== H-PRIMA ==<br />
PRIMA was initially based on XML DB that execute XQuery queries. In order to improve the efficiency of the system, we are pursuing RDBMS-based system, which we call [[H-PRIMA]].<br />
<br />
== Publications ==<br />
<br />
''"Managing and querying transaction-time databases under schema evolution"'' Hyun J. Moon, Carlo A. Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Accepted for publication at Very Large Data Base '''VLDB, 2008'''. (PDF will be available soon)</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/H-PRIMAH-PRIMA2008-06-13T15:16:15Z<p>Hjmoon: /* Overview */</p>
<hr />
<div>== Overview ==<br />
H-PRIMA is an efficient incarnation of [[PRIMA]], which supports the same features of management and querying of transaction-database systems under schema evolution. PRIMA was based on XML DB that executes XQuery queries on XML data, which is not very efficient at the current state of art. In H-PRIMA, we instead employ RDBMS-based query execution engine, to improve the performance. We also address the problem of temporal coalesce for the broken history at schema changes.<br />
<br />
== Data Set ==<br />
* Employee Data Set Generator: coming soon<br />
* XML data (generated from H-Tables for the purpose of performance comparison between XML DB and RDBMS)<br />
** emp100h.xml (0.9MB)<br />
** emp100h.xml (8.9MB)<br />
** emp100h.xml (89.2MB)</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/PrimaPrima2008-06-13T15:02:39Z<p>Hjmoon: </p>
<hr />
<div>PRIMA is a system that supports management and querying of transaction-time databases whose schema evolves over their history. PRIMA is an acronym for Panta Rhei Information Management and Archival.<br />
<br />
== Overview ==<br />
<br />
The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning potentially hundreds of different schema versions. The latter is realized by (i) introducing Schema Modification Operators ([[SMO]]s) to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.<br />
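Because each SMO records one step of the schema's evolution, a query posed against the current version can be mapped to an older version by undoing the intervening operators in reverse order. A toy sketch of that idea, restricted to table renames for brevity (the operator tuples, version numbering, and the `rewrite_table` helper are hypothetical, not the actual SMO syntax or PRIMA's rewriting engine, which works at the level of full queries and constraints):

```python
# Schema history as a list of SMOs; entry i maps version i+1 to version i+2.
# Table names loosely echo MediaWiki-style renames, for illustration only.
smos = [
    ("RENAME TABLE", "cur_text", "text"),            # v1 -> v2
    ("RENAME TABLE", "user_rights", "user_groups"),  # v2 -> v3
]

def rewrite_table(table, from_version, to_version):
    """Map a table name from a newer schema version back to an older one
    by undoing, in reverse order, the renames between the two versions
    (versions are 1-based; from_version > to_version)."""
    for op, old, new in reversed(smos[to_version - 1:from_version - 1]):
        if op == "RENAME TABLE" and table == new:
            table = old
    return table

# A query against v3's `text` table must read v1's `cur_text` table:
print(rewrite_table("text", 3, 1))         # -> cur_text
print(rewrite_table("user_groups", 3, 2))  # -> user_rights
```

Real SMOs also cover column-level and data-reorganizing changes (decompose, merge, join), which is why PRIMA rewrites whole queries via integrity constraints rather than just renaming identifiers.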
<br />
== Data set ==<br />
=== Employee Database Schema Evolution: Synthetic data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-smos.txt Schema Change History in SMO]: five schema versions over a five-year period<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-queries.txt Queries]: two queries for each of eight query classes<br />
* Data: [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000.xml MV-Document], [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/empdb-1000-sv.xml Single-version V-Document] (the MV-Document migrated to the last version); both contain 1,000 employees, 10 departments, and 4 titles, with evolving values of salary, title, department, and department managers. <br />
<br />
=== Wikipedia Database Schema Evolution: Real-world data ===<br />
<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-smos.txt Schema Change History in SMOs]: 171 schema versions over 4.5 years, taken and translated from [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=log MediaWiki SVN]<br />
* [http://yellowstone.cs.ucla.edu/schema-evolution/documents/prima/wiki-queries.tar Queries] (tar file): 20 queries taken and translated from [http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000 Wikipedia online profiler]<br />
<br />
== Performance Issues ==<br />
PRIMA was initially based on an XML DB that executes XQuery queries. To improve the efficiency of the system, we are pursuing an RDBMS-based system, which we call [[H-PRIMA]].<br />
<br />
== Publications ==<br />
<br />
''"Managing and querying transaction-time databases under schema evolution"'' Hyun J. Moon, Carlo A. Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Accepted for publication at Very Large Data Base '''VLDB, 2008'''. (PDF will be available soon)</div>Hjmoonhttp://yellowstone.cs.ucla.edu/schema-evolution/index.php/Main_PageMain Page2008-06-13T14:46:26Z<p>Hjmoon: </p>
<hr />
<div>This wiki reports the research advances of '''Panta Rhei''', a research project on data management under schema evolution.<br />
<br />
== People ==<br />
* Politecnico di Milano<br />
** Carlo A. Curino [http://carlo.curino.us/]<br />
** Letizia Tanca [http://home.dei.polimi.it/tanca/]<br />
<br />
* UCLA<br />
** Myungwon Ham<br />
** Hyun J. Moon [http://www.cs.ucla.edu/~hjmoon/] <br />
** Carlo Zaniolo [http://www.cs.ucla.edu/~zaniolo/]<br />
<br />
* UC San Diego<br />
** Alin Deutsch [http://db.ucsd.edu/People/alin/]<br />
** Chien-Yi Hou<br />
<br />
== Projects ==<br />
Within this macro-project, the following sub-projects have been developed (please follow the links for further details):<br />
<br />
* The [[Schema_Evolution_Benchmark | '''Panta Rhei Schema Evolution Benchmark''']] <bibref f="defbib.bib">iceis2008</bibref>, a benchmark for schema evolution developed from the actual evolution of the MediaWiki DB backend.<br />
* The [[HMM | '''History Metadata Manager''']] <bibref f="defbib.bib">ecdm2008</bibref>, a tool to support temporal queries over metadata histories, and its Semantic Web Extension the [[SHMM | '''Semantic HMM''']] <bibref f="defbib.bib">stsm2008</bibref><br />
* The prototype system [[Prism|'''PRISM: tool for Graceful Schema Evolution''']] <bibref f="defbib.bib">vldb2008a</bibref>.<br />
* The prototype system [[Prima|'''PRIMA: a system for querying Transaction-Time DB under evolving schema''']] <bibref f="defbib.bib">Vldb2008b</bibref>.<br />
<br />
== Funding ==<br />
This work was supported in part by '''NSF-IIS''' award '''0705345''': ''“III-COR: Collaborative Research: Graceful Evolution and Historical Queries in Information Systems – A Unified Approach”''<br />
<br />
== References ==<br />
<bibreferences /></div>Hjmoon