Difference between revisions of "Prism"

From Schema Evolution
Jump to: navigation, search
(Useful Links)
(Publications)
Line 54: Line 54:
  
 
== Publications ==
 
== Publications ==
 +
 +
''"Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++"'' Carlo Curino Hyun J. Moon, Alin Deutsch, Carlo Zaniolo, PVLDB, (2011).
  
 
''"Graceful database schema evolution: the prism workbench"'', Carlo A. Curino, Hyun J. Moon, Carlo Zaniolo, to appear in  '''VLDB 2008''' [http://yellowstone.cs.ucla.edu/schema-evolution/documents/curino08graceful.pdf PDF]
 
''"Graceful database schema evolution: the prism workbench"'', Carlo A. Curino, Hyun J. Moon, Carlo Zaniolo, to appear in  '''VLDB 2008''' [http://yellowstone.cs.ucla.edu/schema-evolution/documents/curino08graceful.pdf PDF]

Revision as of 16:28, 15 November 2010

PRISM-LOGO.png

PRISM is a system that supports Schema Evolution by means of schema mapping and query rewriting. PRISM is a joint project of Politecnico di Milano and University of California, Los Angeles. The main investigators are:

Carlo A. Curino (contact author): [1]

Hyun J. Moon: [2]

Carlo Zaniolo: [3]

Contents

Abstract

Supporting graceful schema evolution represents an unsolved problem for traditional information systems that is further exacerbated in web information systems, such as Wikipedia and public scientific databases: in these projects based on multiparty cooperation the frequency of database schema changes has increased while tolerance for downtimes has nearly disappeared. As of today, schema evolution remains an error-prone and time-consuming undertaking, because the DB Administrator (DBA) lacks the methods and tools needed to manage and automate this endeavor by (i) predicting and evaluating the effects of the proposed schema changes, (ii) rewriting queries and applications to operate on the new schema, and (iii) migrating the database.

Our PRISM system takes a big first step toward addressing this pressing need by providing: (i) a language of Schema Modification Operators (SMO) to express concisely complex schema changes, (ii) tools that allow the DBA to evaluate the effects of such changes, (iii) optimized translation of old queries to work on the new schema version, (iv) automatic data migration, and (v) full documentation of intervened changes as needed to support data provenance, database flash back, and historical queries. PRISM solves these problems by integrating recent theoretical advances on mapping composition and invertibility, into a design that also achieves usability and scalability. Wikipedia and its 170+ schema versions provided an invaluable testbed for validating PRISM tools and their ability to support legacy queries.

On-line Demo

Prism: a tool for schema evolution support

While the actual demo is available online at: Prism a tool for schema evolution support. The Demo is still under development and has limited functionalities w.r.t. the internal prototype, but we are working on it... stay tuned!

Screencasts

There is a Video of the Demo available at: http://yellowstone.cs.ucla.edu/schema-evolution/documents/Prism-Demo.mov

There is a Video of the VLDB presentation at: http://yellowstone.cs.ucla.edu/schema-evolution/documents/prism-vldb2008.mov

Publications

"Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++" Carlo Curino Hyun J. Moon, Alin Deutsch, Carlo Zaniolo, PVLDB, (2011).

"Graceful database schema evolution: the prism workbench", Carlo A. Curino, Hyun J. Moon, Carlo Zaniolo, to appear in VLDB 2008 PDF

"Information Systems Integration and Evolution: Ontologies at Rescue", Carlo A. Curino, Letizia Tanca, Carlo Zaniolo, STSM 2008 PDF

"The PRISM Workwench: Database Schema Evolution Without Tears", Carlo A. Curino, Hyun J. Moon, MyungWon Ham, Carlo Zaniolo, DEMO paper at ICDE 2009.

Bibtex

 @INPROCEEDINGS{curino-vldb2008a,
   author = {Carlo A. Curino and Hyun J. Moon and Carlo Zaniolo},
   title = {Graceful database schema evolution: the prism workbench},
   booktitle = {Very Large Data Base (VLDB)},
   year = {2008}
 }
 @INPROCEEDINGS{curino-stsm2008,
   author = {Carlo A. Curino and Letizia Tanca and Carlo Zaniolo},
   title = {Information Systems Integration and Evolution: Ontologies at Rescue},
   booktitle = {International Workshop on Semantic Technologies in System Maintenance (STSM)},
   year = {2008}
 }

Commercial Competitors

Here we list a series of commercial tools which partially tackle the problem of Schema Evolution. None of this support automatic query rewriting and data migration impact analysis as done by prototype PRISM. Most of them focus on comparing schema versions and creating report of the differences. Another common features is analyzing the impact of the change w.r.t. other DB objects such as store procedures, views, constraints, but none of this provide analysis of the impact on query and applications.

  • DB2 Change Management Expert [4]
  • Oracle Change Management Pack [5]
  • MySQL Workbench for Schema Change [6]
  • SwisSQL DBChangeManager (MSSQL) (DB compare and synchronization) [7]
  • Idera SQL Change [8]
  • Embarcadero Change Manager [9]
  • Red-Gate SQL Compare [10]
  • Best Soft Tool [11]
  • Toad® DBA Suite [12] [13]
  • Aldon lifecycle management solution [14]

Useful Links

  • Management-hub.com [15]
  • Application specific schema evolution in Django [16]

Detailed Description

Introduction

The incessant pressure of schema evolution is impacting every database, from the world's largest (Source: http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html) ``World Data Centre for Climate featuring over 6 petabytes of data, to the smallest single-website DB. DBMSs have long addressed, and largely solved, the physical data independence problem, but their progress toward logical data independence and graceful schema evolution has been painfully slow. Both practitioners and researchers are well aware that schema modifications can: (i) dramatically impact both data and queries <bibref f="defbib.bib">iceis2008</bibref>, endangering the data integrity, (ii) require expensive application maintenance for queries, and (iii) cause unacceptable system downtimes. The problem is particularly serious in Web Information Systems, such as Wikipedia, where significant downtimes are not acceptable while a mounting pressure for schema evolution follows from the diverse and complex requirements of its open-source, collaborative software-development environment <bibref f="defbib.bib">iceis2008</bibref>. The following comment(From the SVN commit 5552 accessible at: http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=5552). by a senior MediaWiki <bibref f="defbib.bib">mediawiki</bibref> DB designer, reveals the schema evolution dilemma faced today by DataBase Administrators (DBAs): This will require downtime on upgrade, so we're not going to do it until we have a better idea of the cost and can make all necessary changes at once to minimize it.

Clearly, what our DBA needs is the ability to (i) predict and evaluate the impact of schema changes upon queries and applications using those queries, and (ii) minimize the downtime by replacing, as much as possible, the current manual process with tools and methods to automate the process of database migration and query rewriting. The DBA would also like (iii) all these changes documented automatically for: data provenance, flash-backs to previous schemas, historical queries, and case studies to assist on future problems.

There has been much recent work and progress on theoretical issues relating to schema modifications including mapping composition, mapping invertibility, and query rewriting.

These techniques have often been used for heterogenous database integration; in the Panta Rhei Information Schema Manager (PRISM) we exploit them to automate the transition to a new schema on behalf of a DBA. In this setting, the semantic relationship between source and target schema, deriving from the schema evolution, is more crisp and better understood by the DBA than in typical database integration scenarios. Assisting the DBA during the design of schema evolution, PRISM can thus achieve objectives (i-iii) above by exploiting those theoretical advances, and prompting further DBA input in those rare situations in which ambiguity remains.

Therefore, PRISM provides an intuitive, operational interface, used by the DBA to evaluate the effect of a possible evolution steps w.r.t. redundancy, information preservation, and impact on queries. Moreover, PRISM automates error-prone and time-consuming tasks such as query translation, computation of inverses, and data migration. As a by-product of its use PRISM creates a complete, unambiguous documentation of the schema evolution history, which is invaluable to support data provenance, database flash backs, historical queries, and user education about standard practices, methods and tools.


PRISM exploits the concept of Schema Modification Operators (SMO) <bibref f="defbib.bib">BernsteinGMN08</bibref>, representing atomic schema changes, which we then modify and enhance by (i) introducing the use of functions for data type and semantic conversions, (ii) providing a mapping to Disjunctive Embedded Dependencies (DEDs), (iii) obtain invertibility results compatible to <bibref f="defbib.bib">fagin2007c</bibref>, and (iv) define the translation into efficient SQL primitives to perform the data migration. The system has been tested and validated against the benchmark for schema evolution defined in <bibref f="defbib.bib">iceis2008</bibref>, which is built over the actual database schema evolution history of Wikipedia (170+ schema versions in 4.5 years). Its ability to handle the very complex evolution of one of the ten most popular website of the World Wide Web (Source: http://www.alexa.com). offers an important validation of practical soundness and completeness of our approach.


While Web Information Systems represent an extreme case, where the need for evolution is exacerbated <bibref f="defbib.bib">iceis2008</bibref> by the fast evolving environment in which they operates, every DBMS would benefit from graceful schema evolution. In particular every DB accessed by applications inherently hard to modify like: public Scientific Databases accessed by applications developed within several independent institutions, DB supporting legacy applications (impossible to modify), and system involving closed-source applications foreseeing high adaptation costs. Transaction time databases with evolving schema represent an interesting scenario were similar techniques can be applied \cite{vldb2008b}.

Contributions.

The PRISM system, harness recent theoretical advances <bibref f="defbib.bib">deutsch03mars</bibref><bibref f="defbib.bib">fagin2007a</bibref> into practical solutions, through an intuitive interface, which masks the complexity of underling tasks, such as logic-based mappings between schema versions, mapping composition, and mapping invertibility. By providing a simple operational interface and speaking commercial DBMS jargon, PRISM provides a user-friendly, robust bridge to the practitioners' world. System scalability and usability have been addressed and tested against one of the most intense histories of schema evolution available to date: the schema evolution of Wikipedia, featuring in 4.5 years over 170+ documented schema versions and over 700 gygabytes of data <bibref f="defbib.bib">barzan</bibref>.

Related Works

Some of the most relevant approaches to the general problem of schema evolution are the impact-minimizing methodology of <bibref f="defbib.bib">Ra:2005vj</bibref>, the unified approach to application and database evolution of <bibref f="defbib.bib">1228375</bibref>, the application-code generation of <bibref f="defbib.bib">cleve2006</bibref> and the framework for metadata model management of <bibref f="defbib.bib">melnik03rondo</bibref> and the further contributions <bibref f="defbib.bib">bernstein03applying</bibref><bibref f="defbib.bib">bernstein03data</bibref><bibref f="defbib.bib">velegrakis03mapping</bibref><bibref f="defbib.bib">yu05semantic</bibref>. While these and other interesting attempts provide solid theoretical foundations and interesting methodological approaches, the lack of operational tools for graceful schema evolution observed by Roddick in <bibref f="defbib.bib">roddick95schema</bibref> remains largely unsolved twelve years later. PRISM represents, at the best of our knowledge, the most advanced attempt in this direction available to date.

The operational answer to the issue of schema evolution used by PRISM exploits some of the most recent results on mapping composition <bibref f="defbib.bib">nash05composition</bibref>, mapping invertibility <bibref f="defbib.bib">fagin2007c</bibref>, and query rewriting <bibref f="defbib.bib">deutsch03mars</bibref>. The SMO language used here captures the essence of existing works <bibref f="defbib.bib">BernsteinGMN08</bibref>, but extends them with functions, for expressing data type and semantic conversions. The translation between SMOs and Disjunctive Embedded Dependencies (DED) exploited here is similar to the incremental adaptation approach of <bibref f="defbib.bib">velegrakis03mapping</bibref>, but achieves different goals. The query rewriting portion of PRISM exploits theories and tools developed in the context of the MARS project <bibref f="defbib.bib">deutsch01ded</bibref><bibref f="defbib.bib">deutsch03mars</bibref>. The theories of mapping composition studied in <bibref f="defbib.bib">madhavan03composing</bibref><bibref f="defbib.bib">Fagin04composing</bibref><bibref f="defbib.bib">nash05composition</bibref><bibref f="defbib.bib">BernsteinGMN08</bibref>, and the concept of invertibility recently investigated by Fagin et al. in <bibref f="defbib.bib">fagin2007c</bibref><bibref f="defbib.bib">fagin2007a</bibref> support the notion of SMO composition and inversion.

The big players in the world of commercial DBMSs have been mainly focusing on reducing the downtime when the schema is updated <bibref f="defbib.bib">oraclewhitepaper</bibref> and on assistive design tools <bibref f="defbib.bib">db2changemanagementexpert</bibref>, and lack the automatic query rewriting features provided in PRISM. Other tools of interest are <bibref f="defbib.bib">chimera</bibref> and LiquiBase (Available on-line: http://www.liquibase.org/).

Further related works include the results on mapping information preservation by Barbosa et al. <bibref f="defbib.bib">BarbosaFM05</bibref>, the ontology-based repository of <bibref f="defbib.bib">bounif2006</bibref>, the schema versioning approaches of <bibref f="defbib.bib">jagadish97</bibref>. XML schema evolution has been addressed in <bibref f="defbib.bib">MoroML07</bibref> by means of a guideline-driven approach. Object-oriented schema evolution has been investigated in <bibref f="defbib.bib">Galante:2005eq</bibref>. In the context of data warehouse X-TIME represents an interesting step toward schema versioning by means of the notion of augmenting schema <bibref f="defbib.bib">GolfarelliLRV04</bibref><bibref f="defbib.bib">Rizzi07</bibref>. PRISM differs form all the above in terms of both goals and techniques.

References

<bibreferences />

Personal tools