Schema Evolution Toolsuite
This webpage provide a brief overview of the Schema Evolution Suite, The application is a framework to perform analysis over database schema and gather information and statistics about the evolution of the systems.
The suite is composed of three main modules:
- data collection
- general statistics
- query success
Up to now only the first part, the data collection, has been developed. This part however is still in a testing phase and some functions could be removed or other could be added in the final release....
The application work either by the command line or by a graphical interface. This allows the generic user who wants to avoid the work of installing an application server, configure it and deploy the application, or a user without the possibility to do it, to run his own test without losing in functionality. Moreover the application enable the overriding of the configuration via command line; this means it is possible to set the common configuration in the xml file and then override the others parameters via command line. At the end of every section in this guide, a “parameter” subsection explain the configuration to set up to run the described phase correctly. The configuration is stored in the config.xml file and contains a tag for every parameter. The tag is structured in the following way:
Here follows a list of the parameter to set up:
- mysqlhost: the address of the mysql host
- mysqlport:the mysql port
- dburi: the uri to locate the mysql server ( for example if you have the server installed on your machine you can set dbrui to jdbc:mysql://localhost/, otherwise use the ip address of your network dbms)
- user: the username to access mysql services
- pass: the password to access mysql services
Module I: Data Collection
The goal of this phase is to allow the user to install all the version of the database schema of the considered system to perform analysis and statistics in the following phase.
This phase is organized in different steps:
- download of the schema
- installation of the database
- collection of the information about the installed schemas
Every step is decoupled from the others; this means that it possible to run the collection in a time different from the installation time or from the download time. A common situation sees the user of the system already own all the version of the schema; the user doesn't need to download the schema script, but he will only have to set in the configuration file,or in the string of execution, in the case he's using the application from the command line, the path of the schema and let the application locate and read the schemas by itself.
The following parameters allow the user to set which operation he wants to perform
- download: performs the download ….... operation download the schema from the repository and store inside the pathtoschema
- install: performs the install operation ….........install all the revision of the schema inside the server
- filling: performs the data collection operation ….............collect information about the installed schema and fill the evolution database
- global: perform all the three previous operations ….......perform download, installation and collection without the interaction with the user (batch mode)
- dropping: performs the drop operation …..................drop the schema from the server
In order to run the required function, the parameter corresponding to that function must be set to true, false otherwise.
The phase is called schema download because during it the system reads the configuration file, get the information about the type of repository, get the root of the repository where the schema are located, get the authentication information to access the repository and retrieve from it all the revision number of the database schema. As previously said, the real operation of download of the schema is performed in the installation phase to decouple the operation and allow the user to use his own schema; during this phase the application write in the evolution database which schemas has been downloaded with the relative revision number and assign the name will be used in the following phases to install the database
The application support both cvs (Cuncurrent Version System) and svn (Subversion) repository The schemas can be downloaded from the svn or cvs repository supplied by the application's vendor The user just need to set the right paramters in the configuration file, and set the right type of the repository.
These are the configuration from both the repositories, the first parameter set the type of the repository and will be used by the system to understand which protocol to use
- repositoryType: the type of the repository will be used to retrieve the schema
- svnurl: the url of the svn repository
- svnuser: the user name to access the svn repository
- svnpwd: the password to access the svn repository
If svnuser or svnpwd are left blank, the system will use anonymous access
- cvsRoot: the root of the cvs repository
- cvsModule: the module to download the schema from the repository ( usually the relative location of the schema from the cvs root)
- cvsLocalPath: a local path on your system (used by the cvs protocol to start the download)
- pathtoschema: the local path where the application saves the schema downloaded from the repository; is is recommended to use a path with the same name of the database considered; if for example the database name is xxx, create a folder on your system with the name xxx_schema, just to avoid multiple database to download the schema in the same location
During the schema installation phase the application install all the revisions of the database. The problem faced during this phase is to avoid the user th task of modifying the schema in order to get the best result during the installation phase. The Dbms used during the development phase is MySQL. The majority of the schemas belongs to Open source software and use MySQL as well, but have been created with a version older than the one considered at the moment of the deploy; (the stable version released from the MySql Corporationi at the time of the deploy is the 5.0) Moreover there are societies that uses other DBMS like Oracle, adding another problem to the installations phase because there are features incompatible between the two DBMS. In order to solve this problem the application provide different degree of cleaning of the schema, a zero-cleaning, a light-cleaning and a full-cleaning.
During the zero-cleaning process are removed from the schema the following field:
- comments: we aren't interested to the comment when installing a schema
- insert statement: we want to make statistic over the schema, not over the data
During the light-cleaning in addition to the zero-cleaning phase, are removed from the schema the following field:
- default blob/text values: the dump of MySQL creates default values for text or blob field; in the latest release, blob and default field can't have a default value(cfr. MySQL reference Manual, http://dev.mysql.com/doc/refman/5.0/en/index.html)
During the full-cleaning process either the zero-cleaning and light cleaning phase are performed. The field left into the schema are:
- length of the datatype (where present)
All the information about indexes, keys, default value are deleted.
Future improvement will consider the integration of MySQL migration tool, a suite for migrating schema and data from various relational database systems to MySQL, into the Schema Evolution Suite
- dbbasename: the name that will use all the revision during the installation. Suppose we have downloaded the revision 1522,1523,1524 of the xxx database schema; if we set the dbbasename to xxx, this will be installed with the following name xxx1522, xxx1523, xxx1524.
This avoid any conflict with previous installed version of the database.
- cleaningSchema: (zero, light, full) the degree of cleaning we want to obtain over the schema
The filling operations are performed in a dynamic way; the application analyze the tables of the information schema and create a corresponding table for the evolution database and fill the table with the data retrieved from the information schema. In this way is possible to keep the application up to date even when the information schema is changed The parameters to set are
- dbevolution: the name of the database that will store all the data about statistics and the schema informations; the database is released with the application.
- headerEvolutionTables: allows to set a header for the evolution tables to avoid conflicts with the name of the information schema tables
- engine: the engine of the table created during this phase
- char_set: the character set to set up for the tables
This phase uses the same parameters of the filling phase; all the revisions of the database will be dropped.
Once the configuration file is ready, before running the application, you need to set the argument; the only argument you need is the path of the configuration file. In the argument subsection put -c ./config/config.xml to avoid any start error: the application can't start without the config file MenuApplication.java contains main functions.