Difference between revisions of "Schema Evolution Toolsuite"


Revision as of 10:42, 1 November 2008

== Introduction ==

This webpage provides a brief overview of the Schema Evolution Suite. The application is a framework for analyzing database schemas and gathering information and statistics about how the systems evolve.

The suite is composed of three main modules:

  • data collection
  • general statistics
  • query success

Up to now only the first module, data collection, has been developed. It is still in a testing phase, so some functions may be removed and others added in the final release.

General Configuration

The application can be run either from the command line or through a graphical interface. Users who want to avoid installing and configuring an application server and deploying the application, or who are unable to do so, can therefore run their own tests without losing functionality. The configuration can also be overridden from the command line: set the common configuration in the XML file and then override the other parameters on the command line. At the end of every section in this guide, a "Parameters" subsection explains the configuration needed to run the described phase correctly.

The configuration is stored in the config.xml file, which contains one tag for every parameter. Each tag is structured as follows:

<param name="parameter_name">parameter_value</param>
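As an illustration of the tag format above, the following Java sketch reads every param tag into a map and then applies command-line overrides on top of the file values. It is not the suite's actual loader: the class name ConfigSketch, the config root element and the name=value override syntax are assumptions made only for this example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ConfigSketch {

    /** Reads every <param name="...">value</param> element into a map. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList params = doc.getElementsByTagName("param");
            Map<String, String> config = new LinkedHashMap<>();
            for (int i = 0; i < params.getLength(); i++) {
                Element param = (Element) params.item(i);
                config.put(param.getAttribute("name"), param.getTextContent().trim());
            }
            return config;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read configuration", e);
        }
    }

    /** Applies name=value overrides (hypothetical syntax) on top of the file values. */
    static Map<String, String> override(Map<String, String> fileConfig, String... overrides) {
        Map<String, String> merged = new LinkedHashMap<>(fileConfig);
        for (String arg : overrides) {
            int eq = arg.indexOf('=');
            if (eq > 0) {
                merged.put(arg.substring(0, eq), arg.substring(eq + 1));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // The <config> root element is an assumption; only the param tag comes from this guide.
        String xml = "<config>"
                + "<param name=\"mysqlhost\">localhost</param>"
                + "<param name=\"mysqlport\">3306</param>"
                + "</config>";
        Map<String, String> config = override(parse(xml), "mysqlport=3307");
        System.out.println(config);
    }
}
```

Keeping the file values and the overrides in separate steps means the XML file can hold the common configuration while a single run changes only the parameters it needs.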


Parameters

Here follows a list of the parameters to set up:

  • mysqlhost: the address of the MySQL host
  • mysqlport: the MySQL port
  • dburi: the URI used to locate the MySQL server (for example, if the server is installed on your machine you can set dburi to jdbc:mysql://localhost/; otherwise use the IP address of your network DBMS)
  • user: the username used to access MySQL services
  • pass: the password used to access MySQL services

Module I: Data Collection

The goal of this phase is to let the user install all the versions of the database schema of the system under consideration, so that analysis and statistics can be performed in the following phase.

This phase is organized in different steps:

  • download of the schema
  • installation of the database
  • collection of the information about the installed schemas

Every step is decoupled from the others, so the collection can be run at a different time from the installation or the download. A common situation is that the user already owns all the versions of the schema. In that case there is no need to download the schema scripts: the user only has to set the path of the schemas, either in the configuration file or on the command line when running the application from there, and the application locates and reads the schemas by itself.

Parameters

The following parameters let the user choose which operations to perform:

  • download: performs the download operation: downloads the schema from the repository and stores it inside pathtoschema
  • install: performs the install operation: installs all the revisions of the schema on the server
  • filling: performs the data collection operation: collects information about the installed schemas and fills the evolution database
  • global: performs all three previous operations (download, installation and collection) without user interaction (batch mode)
  • dropping: performs the drop operation: drops the schemas from the server

To run a function, set the corresponding parameter to true; otherwise set it to false.
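The flags above can be pictured as a small selection step. The Java sketch below, under the assumption that global simply switches on download, install and filling, turns the configuration map into the set of operations to run. The class and enum names are hypothetical, and treating "true" case-insensitively is also an assumption.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class OperationSketch {

    /** The individual operations described in the parameter list. */
    enum Operation { DOWNLOAD, INSTALL, FILLING, DROPPING }

    /** A flag is on only when its value is "true" (case-insensitivity is an assumption). */
    static boolean isEnabled(Map<String, String> config, String name) {
        String value = config.get(name);
        return value != null && value.trim().equalsIgnoreCase("true");
    }

    /** Returns the operations selected by the flags; global implies the first three. */
    static Set<Operation> select(Map<String, String> config) {
        Set<Operation> selected = EnumSet.noneOf(Operation.class);
        boolean global = isEnabled(config, "global");
        if (global || isEnabled(config, "download")) {
            selected.add(Operation.DOWNLOAD);
        }
        if (global || isEnabled(config, "install")) {
            selected.add(Operation.INSTALL);
        }
        if (global || isEnabled(config, "filling")) {
            selected.add(Operation.FILLING);
        }
        if (isEnabled(config, "dropping")) {
            selected.add(Operation.DROPPING);
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("global", "true");
        config.put("dropping", "false");
        System.out.println(select(config));
    }
}
```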


Schema Download

The phase is called schema download because during it the system reads the configuration file, obtains the repository type, the root of the repository where the schemas are located and the authentication information needed to access the repository, and retrieves from it all the revision numbers of the database schema. As previously said, the actual download of the schemas is performed in the installation phase, to decouple the operations and allow users to supply their own schemas. During this phase the application writes to the evolution database which schemas have been downloaded, with their revision numbers, and assigns the names that will be used in the following phases to install the databases.

The application supports both CVS (Concurrent Versions System) and SVN (Subversion) repositories. The schemas can be downloaded from the SVN or CVS repository supplied by the application's vendor. The user only needs to set the right parameters in the configuration file, including the repository type.

Parameters

These are the configuration parameters for both kinds of repository. The first parameter sets the repository type and is used by the system to decide which protocol to use.

  • repositoryType: the type of repository used to retrieve the schema

svn configuration:

  • svnurl: the url of the svn repository
  • svnuser: the user name to access the svn repository
  • svnpwd: the password to access the svn repository

If svnuser or svnpwd is left blank, the system uses anonymous access.

cvs configuration:

  • cvsRoot: the root of the cvs repository
  • cvsModule: the module from which to download the schema (usually the location of the schema relative to the CVS root)
  • cvsLocalPath: a local path on your system (used by the CVS protocol to start the download)
  • pathtoschema: the local path where the application saves the schemas downloaded from the repository. It is recommended to use a path named after the database: for example, if the database name is xxx, create a folder named xxx_schema on your system, so that several databases do not download their schemas to the same location
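The repository rules described in this section can be summarized in a few lines of Java. This is only a sketch: the accepted spellings "svn" and "cvs" for repositoryType are assumptions, as are the class and method names; the blank-credential rule and the xxx_schema folder naming come from the text above.

```java
import java.util.Locale;

final class RepositorySketch {

    enum RepositoryType { SVN, CVS }

    /** Maps the repositoryType value to a protocol; "svn" and "cvs" are assumed spellings. */
    static RepositoryType parseType(String repositoryType) {
        String value = repositoryType == null
                ? ""
                : repositoryType.trim().toLowerCase(Locale.ROOT);
        switch (value) {
            case "svn":
                return RepositoryType.SVN;
            case "cvs":
                return RepositoryType.CVS;
            default:
                throw new IllegalArgumentException("Unknown repositoryType: " + repositoryType);
        }
    }

    static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    /** Anonymous access is used when svnuser or svnpwd is left blank. */
    static boolean usesAnonymousAccess(String svnuser, String svnpwd) {
        return isBlank(svnuser) || isBlank(svnpwd);
    }

    /** Recommended folder name for pathtoschema: the database name followed by _schema. */
    static String schemaFolderName(String databaseName) {
        return databaseName + "_schema";
    }

    public static void main(String[] args) {
        System.out.println(parseType("svn"));
        System.out.println(usesAnonymousAccess("", ""));
        System.out.println(schemaFolderName("xxx"));
    }
}
```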


Schema Installation

During the schema installation phase the application installs all the revisions of the database. The challenge in this phase is to spare the user the task of modifying the schemas by hand in order to install them successfully. The DBMS used during development is MySQL. Most of the schemas belong to open source software and also use MySQL, but were created with versions older than the one in use at the time of deployment (the stable MySQL release at that time was 5.0). Moreover, some organizations use other DBMSs such as Oracle, which adds a further problem to the installation phase because some features are incompatible between the two DBMSs. To address this, the application provides three degrees of schema cleaning: zero-cleaning, light-cleaning and full-cleaning.

The zero-cleaning

The zero-cleaning process removes the following elements from the schema:

  • comments: comments are not needed to install a schema
  • insert statements: the statistics concern the schema, not the data
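To make the zero-cleaning step concrete, here is a minimal Java sketch that strips comments and INSERT statements from a schema script with regular expressions. It is not the suite's cleaner: the class name is hypothetical, and the approach is deliberately simplified. It only recognizes whole-line comments starting with -- or #, block comments, and INSERT statements that begin a line and end with a semicolon at the end of a line; comment markers inside string literals are not handled.

```java
import java.util.regex.Pattern;

final class ZeroCleaningSketch {

    // INSERT statements: from a line starting with INSERT up to the first ';' that ends a line.
    private static final Pattern INSERTS = Pattern.compile(
            "^[ \\t]*INSERT\\b.*?;[ \\t]*$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

    // Block comments of the form /* ... */, possibly spanning several lines.
    private static final Pattern BLOCK_COMMENTS =
            Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    // Whole-line comments starting with "--" or "#".
    private static final Pattern LINE_COMMENTS =
            Pattern.compile("^[ \\t]*(?:--|#).*$", Pattern.MULTILINE);

    // Lines left empty by the removals above.
    private static final Pattern BLANK_LINES =
            Pattern.compile("^[ \\t]*\\r?\\n", Pattern.MULTILINE);

    /** Returns the script without INSERT statements, comments and empty lines. */
    static String clean(String script) {
        // INSERTs go first so that comment-like text inside the data is dropped with them.
        String result = INSERTS.matcher(script).replaceAll("");
        result = BLOCK_COMMENTS.matcher(result).replaceAll("");
        result = LINE_COMMENTS.matcher(result).replaceAll("");
        result = BLANK_LINES.matcher(result).replaceAll("");
        return result.trim();
    }

    public static void main(String[] args) {
        String script = "-- dump header\n"
                + "/* table users */\n"
                + "CREATE TABLE users (id INT);\n"
                + "INSERT INTO users VALUES (1);\n";
        System.out.println(clean(script));
    }
}
```

A production cleaner would more likely tokenize the SQL than rely on regular expressions, since string literals can contain semicolons and comment markers.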

The light-cleaning

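In addition to the zero-cleaning steps, the light-cleaning process removes the following elements from the schema:

  • default blob/text values: MySQL dumps create default values for text or blob fields; in the latest release, blob and text fields cannot have a default value (cf. MySQL Reference Manual, http://dev.mysql.com/doc/refman/5.0/en/index.html)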

The full-cleaning

The full-cleaning process performs both the zero-cleaning and the light-cleaning steps. The only elements left in the schema are:

  • datatype
  • length of the datatype (where present)

All information about indexes, keys and default values is deleted.

A future improvement will consider integrating the MySQL migration tool, a suite for migrating schemas and data from various relational database systems to MySQL, into the Schema Evolution Suite.

Parameters

  • dbbasename: the base name given to every revision during installation. Suppose revisions 1522, 1523 and 1524 of the xxx database schema have been downloaded; if dbbasename is set to xxx, they are installed as xxx1522, xxx1523 and xxx1524. This avoids any conflict with previously installed versions of the database.
  • cleaningSchema: (zero, light, full) the degree of cleaning to apply to the schema
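The naming rule for dbbasename amounts to appending the revision number to the base name. A short Java sketch of that rule follows; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class InstallNameSketch {

    /** Builds one database name per revision by appending the revision number to dbbasename. */
    static List<String> installedNames(String dbbasename, int... revisions) {
        List<String> names = new ArrayList<>();
        for (int revision : revisions) {
            names.add(dbbasename + revision);
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [xxx1522, xxx1523, xxx1524]
        System.out.println(installedNames("xxx", 1522, 1523, 1524));
    }
}
```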

Schema Filling

The filling operations are performed dynamically: the application analyzes the tables of the information schema, creates a corresponding table in the evolution database for each of them and fills it with the data retrieved from the information schema. In this way the application stays up to date even when the information schema changes. The parameters to set are:

  • dbevolution: the name of the database that stores all the statistics and schema information; the database is released with the application
  • headerEvolutionTables: a header for the names of the evolution tables, to avoid conflicts with the names of the information schema tables
  • engine: the storage engine of the tables created during this phase
  • char_set: the character set of those tables
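One way to picture how these four parameters could combine is a generated CREATE TABLE statement that copies the column layout of an information schema table. The Java sketch below only builds the SQL text. It rests on several assumptions that this guide does not confirm: that the header is used as a prefix of the table name, that the copy is made with MySQL's CREATE TABLE ... SELECT form, and the example values (evolution, ev_, MyISAM, utf8). The step that fills the table is left out.

```java
final class FillingSketch {

    /**
     * Builds a statement that creates an empty evolution table with the same
     * columns as the given information schema table (WHERE 1 = 0 copies no rows).
     */
    static String createEvolutionTable(String dbevolution, String header,
                                       String infoSchemaTable, String engine, String charSet) {
        return "CREATE TABLE `" + dbevolution + "`.`" + header + infoSchemaTable + "`"
                + " ENGINE=" + engine
                + " DEFAULT CHARSET=" + charSet
                + " AS SELECT * FROM information_schema.`" + infoSchemaTable + "`"
                + " WHERE 1 = 0";
    }

    public static void main(String[] args) {
        // All argument values here are illustrative, not defaults of the suite.
        System.out.println(createEvolutionTable("evolution", "ev_", "COLUMNS", "MyISAM", "utf8"));
    }
}
```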

Schema Dropping

This phase uses the same parameters as the filling phase; all the revisions of the database are dropped.


Once the configuration file is ready, you need to set the program argument before running the application; the only argument needed is the path of the configuration file. In the arguments section, enter -c ./config/config.xml to avoid a startup error: the application cannot start without the configuration file. MenuApplication.java contains the main functions.
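The startup check described above can be sketched in Java as follows. This is not the code of MenuApplication.java: the class name ArgsSketch and the error message are hypothetical, and only the -c argument comes from this guide.

```java
final class ArgsSketch {

    /** Returns the value following -c, or fails because the configuration file is mandatory. */
    static String configPath(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-c".equals(args[i])) {
                return args[i + 1];
            }
        }
        throw new IllegalArgumentException("Missing required argument: -c <path to config.xml>");
    }

    public static void main(String[] args) {
        String[] example = {"-c", "./config/config.xml"};
        System.out.println(configPath(example));
    }
}
```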
