Testbed for Schema Evolution Tools

From Schema Evolution
Revision as of 22:21, 17 March 2010 by Gurneet (Talk | contribs)

Jump to: navigation, search

Contents

Abstract

"Change is the law of nature ". Confirming to this law of nature, Information systems are highly subjected to continuous changes, including changes in data, schema, applications and queries. Open source software development has also added to the magnitude of changes or evolution. Evolution adds complexity and can have a dramatic impact on Information Systems. However research to solve the problem of schema evolution is still in its infancy. Tools such as PRISM, PRIMA, HMM (Curino et al. 2008) have been recently proposed to assist DBAs in schema evolution. The Web Information Systems (WIS) group at UCLA is in the process of setting up a benchmark for such tools. As part of the effort, a testbed has been prepared to test the efficacy of those schema evolution support tools and others such as Schema Evolution Toolsuite and SQL2SMO Translator and PRISM. The first two tools are also useful in generating input data for PRISM in the form of Evolution Database or Schema Modification Operators (SMOs). PRISM has been tested with Schema Modification Operators (SMOs) and Integrity Constraints Modification Operators (ICMOs) to evaluate its efficiency in assisting in the Graceful Schema Evolution. This page presents detailed description of the test datasets used to test the tools, the behaviors expected, and results produced.

Motivation

Schema Evolution has gained recognition as a practical problem, and it has attracted researcher’s attention. However the research is still in its infancy. Several tools such as PRISM, PRIMA, HMM etc. have been proposed in the past and Carlo Curino and DB group at UCLA are in process of setting up a benchmark for schema evolution. This testbed for testing schema evolution tools is an addition to that effort. It will provide a rich set of test cases to evaluate and provide feedback on the efficiency of such tools. The testbed also helps the beginners getting familiar with the typical changes taking place in any schema evolution and the type of inputs used by various tools to predict the impact of that evolution.

Fundamental Concepts

Information systems are continuously experiencing change with respect to data and schema. These changes are arising due to changing needs, added functionalities, new security and privacy issues and their solutions, compliance to new regulations upcoming every day. Database constructs are added removed and updated every second leading to immense impact on variety of applications and data stores. Even the small number of changes, can invalidate the legacy queries, data entry forms and the applications can crash. Adapting to the change is a costly process because it requires changes in logical and physical organization of data. It is even more time consuming and expensive to change applications, queries, views in order to adapt to the change. With the emergence of web information systems, the problem of evolution is becoming even more severe and complex. Another factor adding to this complexity is the open source software development which is usually a combined effort of many organizations or groups or people. Hence managing evolution in this kind of diverse environment is challenging. Let’s look at some important terms, concepts and definitions that are used in this page.

Schema Evolution- It refers to the process of schema undergoing structural changes in order to satisfy the changing needs. In other words, it can be considered as history of changes made to the schema of the database.

Versions- A version is defined as a stable and coherent state which the administrator or designer desires to keep. Generating a new version of a context is a process which results from human decision. This means that all context modifications do not necessarily generate new versions. [9]

Schema Modification Operators (SMOs)- An SMO is a function that receives as input a relational schema and the underlying database, and produces as output a (modified) version of the input schema and a migrated version of the database. [2]. SMOs represent one or more atomic changes proposed/applied to an input schema. Typical examples of SMOs include CREATE TABLE, COPY TABLE, and RENAME COLUMN etc.

Integrity Constraints Modification Operators (ICMOs)- An ICMO extends the idea of modification operators further to changes pertaining to purely integrity constraints. ICMOs represent atomic changes related to Primary Keys, Foreign Keys, and simple Value Constraints.

Testbed- "The term is used across many disciplines to describe a development environment that is shielded from the hazards of testing in a live or production environment. It is a method of testing a particular module (function, class, or library) in an isolated fashion. May be implemented similar to a sandbox, but not necessarily for the purposes of security. A test bed is used as a proof of concept or when a new module is tested apart from the program/system it will later be added to. A skeleton framework is implemented around the module so that the module behaves as if already part of the larger program. A typical test bed could include software, hardware, and networking components. In software, the hardware and software requirements are known as the testbed, which is also known as the test environment. Test beds are also pages on the Internet where the public is given the opportunity to test CSS or HTML they have created and wanted to see the outcome [11].”


Schema Evolution Tools

PRISM (Panta Rhei Information & Schema Manager) has been designed and developed to bridge the gap between theoretical results and their practical usability, by harvesting recent theoretical advances on schema mapping and query rewriting. PRISM represents the most advanced system supporting relational schema evolution available to date. The system automates a very large portion of the query adaptation work, thus, providing an invaluable operational tool to support DB Administrators (DBAs) in their everyday activities. PRISM addresses the two main challenges of Schema Evolution: predictability of the evolution process and logical independence of the evolution process. It provides a language of Schema Modification Operators (SMO) to express complex changes and allow DBA to evaluate the effect of such changes and also provide optimized translation of old queries to work on the new schema and automatic data migration. [3]

ArchIS supports efficient temporal queries on the archived history of (fixed schema) databases [4].

PRIMA (Panta Rhei Information Management & Archival) introduces the transparent support for complex temporal queries over archives with evolving schema. Both PRISM and PRIMA exploit an intuitive operational language of Schema Modification Operators (SMOs) to explicitly capture the semantics of the schema evolution [4].

The HMM (Historical Metadata Manager) tool assists in archiving and querying rich metadata histories [4].

The Schema Evolution Toolsuite (SET), a tool developed by Fabrizio Moroni, is a framework to easily automate the benchmarking process. It collects statistical information on schema evolution of a dataset. The data about the evolution of a schema is stored in the MySQL data dictionary i.e. Information Schema is used to store this data. SET queries this data dictionary to gather the statistical measures such as average table/column lifetime, schema size etc.

Read more about SET

Download SET

SQL2SMO Translator is also developed by Fabrizio Moroni and it translates the given patch of SQL statements into corresponding Schema Modification Operators (SMOs) using some pattern matching techniques. It identifies five types of patterns viz. Join Pattern Type1, Join Pattern Type2, Merge Pattern, Partition Pattern, and Decompose Pattern.

Read more about SQL2SMO Translator

Download SQL2SMO Translator

Useful Links

Benchmark Datasets SET Results1 SET Results2

Overview

Three datasets namely ENSEMBL, WIKIPEDIA, and XOOPS have been analyzed to extract SMOs and ICMOs. You can also find various schema versions for these datasets at SchemaSet1 or SchemaSet2 . Different schema versions have been analysed to write the SQL statements that have led to that evolution. Those SQL patches are further converted to SMOs or ICMOs or full ICSMOs that can be used as an input to the tool PRISM to predict future schemas and automatic query rewriting. These files are named using the following naming convention:

DB_FILETYPE_VERSIONFROM_TO_VERSIONTO

For example files like

ensembl_patch_31_to_32,

ensembl_SMO_31_to_32,

ensembl_ICMO_31_to_32,

ensembl_ICSMO_31_to_32

contain the SQL patch , SMOs, ICMOs, full ICSMO respectively for the Ensembl schema evolution from version 31 to 32.

Special Case: Synthetic ICMOs for Ensembl DB Synthetic ICMOs have been created to test PRISM for all possible constraint changes as the actual changes in terms of Integrity Constraint in Ensebml DB,were not enough for testing purposes. Three ICMOs sets have been created:

Ensembl_ICMO_338_to_339_Test_all: Contains some valid/invalid ICMOs.

Ensembl_ICMO_338_to_339_Test_Corrected: Contains only valid ICMOs.

Ensembl_ICMO_339_to_340_Test: Contains only valid ICMOs.

Ensembl_ICMO_340_to_341_Test: Contains only valid ICMOs.

Database versions ensembl1_338 ia part of the actual dataset, and is in the database.Other versions like ensembl_339_Test,ensembl_340_Test are synthetic schema evolution versions. Separate scripts for creating these databases are included in the directory.

Information on Different Directories An Example

ENSEMBL_SMOs_SQL2SMO: It contains SMOs for Ensembl dataset created using SQL2SMO tool.

Ensembl_Fkeys: It contains Foreign Key relationships for various tables in different releases. Though they say these constraints are implemented at application level, not at the database level.

Ensembl_ICMOs: It contains ICMOs for the Ensembl dataset.

Ensembl_ICSMOs: It contains complete set of changes(SMOs and ICMOs) that lead to an evolution from one version to other for the Ensembl dataset.

Ensembl_Inverse_SMOs: It contains Inverse SMOs for the Ensembl dataset.

Ensembl_Patches_Full:It contains complete set of SQL statements that lead to an evolution from one version to other for the Ensembl dataset. For example add a column, drop an index, drop a FK etc.

Ensembl_Patches_ICMOs: It contains set of SQL statements related to Integrity Constraint (IC) changes only. For example add a PK, drop a FK etc.

Ensembl_Patches_SQL2SMO: It contains set of SQL statements related to Schema Modification (SM) changes only. For example add a column, drop a column etc.

Ensembl_Schemas: It contains differnt versions of the schema.

Ensembl_Synthetic_ICMOs: Contains synthetic patches,ICMOs and corresponding synthetic schema versions created for testing purposes.

ensembl_revisions_analysis: Excel file contains revision analysis for Ensembl dataset.

Look Inside!! ENSEMBL WIKIPEDIA XOOPS


Histories!!!

Scripts to Create Evolution Databases for Different Datasets Evolution DB for Wikipedia Evolution DB for Ensembl Evolution DB for Wikipedia and Ensembl Evolution DB for Wikipedia, Ensembl, Xoops, Coppermine and Typo3 These databases are already available in a MYSQL server at wisgate.cs.ucla.edu. The login information can be obtained by email.

SMOs and ICMOs(i.e. ICSMOs) under Consideration 1.ADD COLUMN < col > < type > AS < const/funct > INTO < tab > 2.DROP COLUMN < col > FROM < tab > 3.CREATE TABLE < tab > (< col1 > < type1 >,< col2 > < type2 >); 4.DROP TABLE < tab > 5.RENAME TABLE < tab1 > INTO < tab2 > 6.RENAME COLUMN < col1 > IN < tab1 > TO < col2 > 7.COPY TABLE < tab1 > INTO < tab2 > 8.JOIN TABLE < tab1 >, < tab2 > INTO < tab3 > WHERE < cond > 9.DECOMPOSE TABLE < tab1 > INTO < tab2(col1,col2) >, < tab3(col1,col3) > 10.PARTITION TABLE < tab1 > INTO < tab2 > WITH < cond1 >, < tab3 > WITH < cond2 > 11.MERGE TABLE < tab1 >, < tab2 > INTO < tab3 > 12.NOP; 13.ALTER TABLE < tab > ADD PRIMARY KEY < id > (col1,col2) CHECK; 14.ALTER TABLE < tab > ADD PRIMARY KEY < id > (col1,col2) ENFORCE; 15.ALTER TABLE < tab > ADD PRIMARY KEY < id > (col1,col2) DECLARE; 16.ALTER TABLE < tab > ADD FOREGN KEY < id > (col1,col2) REFERENCES < tab2 >(col3,col4) CHECK; 17.ALTER TABLE < tab > ADD FOREGN KEY < id > (col1,col2) REFERENCES < tab2 >(col3,col4) ENFORCE; 18.ALTER TABLE < tab > ADD FOREGN KEY < id > (col1,col2) REFERENCES < tab2 >(col3,col4) DECLARE; 19.ALTER TABLE < tab > ADD VALUE CONSTRAINT < id > < cond > CHECK; 20.ALTER TABLE < tab > ADD VALUE CONSTRAINT < id > < cond > ENFORCE; 21.ALTER TABLE < tab > ADD VALUE CONSTRAINT < id > < cond > DECLARE; 22.ALTER TABLE < tab > DROP PRIMARY KEY < id > 23.ALTER TABLE < tab > DROP FOREIGN KEY < id > 24.ALTER TABLE < tab > DROP VALUE CONSTRAINT < id >

Personal tools