Benchmark Extension


This section reports the preliminary results of an ongoing effort to extend the existing dataset. The available data must be considered raw material, to be used "as is".

CMS and Wiki

MediaWiki Schema Evolution

This is an update of the schema history, current as of 05/23/2008.

The SVN revision of the SQL script of the MediaWiki schema is available at:

The following .tar.gz file:

contains a dump of all the revisions (194) of the schema and a few simple scripts that can be used to:

  • re-download an updated set of schemas from the SVN repository
  • batch install all the schema versions in a MySQL system
  • batch remove all the schema versions from a MySQL system
  • compute a simple set of statistics
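The statistics step listed above could look roughly like the following. This is a minimal sketch, not the script shipped in the archive: the function names `table_names` and `schema_diff` are hypothetical, and the regular expression assumes that table names may be preceded by a SQL comment used as a prefix placeholder (for example `/*$wgDBprefix*/`) and may be quoted with backticks.

```python
import re

# Matches "CREATE TABLE name (" with an optional IF NOT EXISTS clause,
# an optional /* ... */ comment before the name (assumed prefix placeholder),
# and optional backticks around the name.
CREATE_TABLE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(?:/\*.*?\*/)?`?(\w+)`?\s*\(",
    re.IGNORECASE,
)


def table_names(sql_text):
    """Return the table names declared by CREATE TABLE statements, in order."""
    return CREATE_TABLE.findall(sql_text)


def schema_diff(old_sql, new_sql):
    """Return (added, dropped) sets of table names between two schema versions."""
    old, new = set(table_names(old_sql)), set(table_names(new_sql))
    return new - old, old - new
```

Applying `schema_diff` to each pair of consecutive revisions gives a first, coarse picture of how many tables were added or dropped over the schema history. Column-level changes would need a real SQL parser.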


Joomla! 1.5 Schema Evolution

Joomla! is an award-winning Content Management System (CMS) that will help you build websites and other powerful online applications. Best of all, Joomla! is an open source solution that is freely available to everybody.

The SVN revision of the SQL script of the Joomla! 1.5 schema is available at:

The following .tar.gz file:

contains a dump of all the revisions (46) of the schema and a few simple scripts that can be used to:

  • re-download an updated set of schemas from the SVN repository
  • batch install all the schema versions in a MySQL system
  • batch remove all the schema versions from a MySQL system
  • (todo) compute a simple set of statistics
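The re-download step listed above can be approximated with the standard `svn log -q` and `svn export -r` commands. The sketch below is an illustration only, not the script from the archive: the helper names and the `schema_rN.sql` file naming are assumptions, and the repository URL must be supplied by the caller.

```python
import re
import subprocess


def parse_revisions(svn_log_output):
    """Extract revision numbers from the output of `svn log -q`."""
    return [int(rev) for rev in re.findall(r"^r(\d+) \|", svn_log_output, re.MULTILINE)]


def export_command(url, revision, dest):
    """Build the argument list that exports one revision of a file to dest."""
    return ["svn", "export", "--force", "-r", str(revision), url, dest]


def download_all(url, prefix="schema"):
    """Export every revision of the file at url as <prefix>_r<rev>.sql.

    Assumes the `svn` client is installed and the file still exists
    at the same path in the latest revision.
    """
    log = subprocess.run(
        ["svn", "log", "-q", url], check=True, capture_output=True, text=True
    ).stdout
    for rev in sorted(parse_revisions(log)):
        subprocess.run(export_command(url, rev, f"{prefix}_r{rev}.sql"), check=True)
```

Only `download_all` touches the network. The two helpers are pure functions, so they can be checked without access to a repository.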

TikiWiki Schema Evolution

TikiWiki (Tiki) is your Groupware/CMS (Content Management System) solution. Tiki has the features you need:

  • Wikis (like Wikipedia)
  • Forums (like phpBB)
  • Blogs (like WordPress)
  • Articles (like Digg)
  • Image Gallery (like Flickr)
  • Map Server (like Google Maps)
  • Link Directory (like DMOZ)
  • Multilingual (like Babel Fish)
  • Bug tracker (like Bugzilla)
  • Free source software (LGPL)

The SVN revision of the SQL script of the TikiWiki schema is available at:

The following .tar.gz file:

contains a dump of all the revisions (152) of the schema and a few simple scripts that can be used to:

  • re-download an updated set of schemas from the SVN repository
  • batch install all the schema versions in a MySQL system
  • batch remove all the schema versions from a MySQL system
  • (todo) compute a simple set of statistics
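One possible way to implement the batch install and batch remove steps listed above is to give every schema version its own database. The sketch below assumes a naming scheme of `<prefix>_r<revision>` and a `mysql` command-line client whose credentials come from an option file. Neither assumption is taken from the archive's scripts, and all function names are hypothetical.

```python
import subprocess


def database_name(prefix, revision):
    """Name of the database holding one schema version (assumed convention)."""
    return f"{prefix}_r{revision}"


def install_sql(prefix, revisions):
    """CREATE DATABASE statements, one per schema version."""
    return [f"CREATE DATABASE IF NOT EXISTS `{database_name(prefix, r)}`;" for r in revisions]


def remove_sql(prefix, revisions):
    """DROP DATABASE statements that undo install_sql."""
    return [f"DROP DATABASE IF EXISTS `{database_name(prefix, r)}`;" for r in revisions]


def run_sql(statements, mysql_args=()):
    """Pipe the statements to the mysql client on standard input."""
    subprocess.run(["mysql", *mysql_args], input="\n".join(statements), text=True, check=True)


def load_schema(prefix, revision, sql_path, mysql_args=()):
    """Load one schema script into its database, like `mysql db < file.sql`."""
    with open(sql_path, "rb") as script:
        subprocess.run(
            ["mysql", *mysql_args, database_name(prefix, revision)],
            stdin=script,
            check=True,
        )
```

Keeping the versions in separate databases lets them be compared side by side, and `remove_sql` cleans up exactly what `install_sql` created.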

XOOPS Dynamic Web CMS

XOOPS is a dynamic web content management system written in PHP for the MySQL database. Its object orientation makes it an ideal tool for developing small or large community websites, intra company and corporate portals, weblogs and much more.

Popularity: 6,559,127 downloads from SourceForge as of 05/22/2008

The SVN revision of the SQL script of the XOOPS schema is available at:

The following .tar.gz file:

contains a dump of all the revisions (14) of the schema and a few simple scripts that can be used to:

  • re-download an updated set of schemas from the SVN repository
  • batch install all the schema versions in a MySQL system
  • batch remove all the schema versions from a MySQL system
  • (todo) compute a simple set of statistics

Coppermine Photo Gallery

Coppermine is an easily set-up, fast, feature-rich photo gallery script with a MySQL database, user management, private galleries, automatic thumbnail creation, an ecard feature and a template system for easy customization to match the rest of a site.

Popularity: 4,681,872 downloads from SourceForge as of 05/22/2008

The SVN revision of the SQL script of the Coppermine schema is available at:

https://coppermine.svn.sourceforge.net/svnroot/coppermine/trunk/cpg1.5.x/sql/schema.sql

The following .tar.gz file:

contains a dump of all the revisions (69) of the schema and a few simple scripts that can be used to:

  • re-download an updated set of schemas from the SVN repository
  • batch install all the schema versions in a MySQL system
  • batch remove all the schema versions from a MySQL system
  • (todo) compute a simple set of statistics

TYPO3 Content Management Framework

TYPO3 is an enterprise class Web CMS written in PHP/MySQL. It's designed to be extended with custom written backend modules and frontend libraries for special functionality. It has very powerful integration of image manipulation.

Popularity: 3,277,323 downloads from SourceForge as of 05/22/2008

The SVN revision of the SQL script of the TYPO3 schema is available at:

https://typo3.svn.sourceforge.net/svnroot/typo3/TYPO3core/trunk/t3lib/stddb/tables.sql

The following .tar.gz file:

contains a dump of all the revisions (39) of the schema and a few simple scripts that can be used to:

  • re-download an updated set of schemas from the SVN repository
  • batch install all the schema versions in a MySQL system
  • batch remove all the schema versions from a MySQL system
  • (todo) compute a simple set of statistics

Medicine/Biology Databases

Ensembl Genetic DB

The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.

Project homepage: http://www.ensembl.org

Schemas, integrity constraints and patches: http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/sql/?root=ensembl

Interesting set of queries: http://svn.warelab.org/maize/ensembl/branches/release-3a.50/sql/

Online MySQL servers:

  • ensembldb.ensembl.org: port 3306 (up to version 47), port 5306 (version 48 onwards)
  • martdb.ensembl.org: port 3316 (up to version 47), port 5316 (version 48 onwards)
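Because the port depends on both the server and the Ensembl release, a small lookup helper can avoid connecting to the wrong instance. The sketch below encodes only the ports listed above. The function names are hypothetical, and the `anonymous` user name is an assumption about the public servers' read-only account that should be checked against the Ensembl documentation.

```python
# (port up to version 47, port from version 48 onwards), as listed above.
ENSEMBL_PORTS = {
    "ensembldb.ensembl.org": (3306, 5306),
    "martdb.ensembl.org": (3316, 5316),
}


def ensembl_port(host, release):
    """Return the MySQL port serving the given Ensembl release on host."""
    old_port, new_port = ENSEMBL_PORTS[host]
    return old_port if release <= 47 else new_port


def mysql_command(host, release, user="anonymous"):
    """Build a mysql client command line (user name is an assumption)."""
    return ["mysql", "-h", host, "-P", str(ensembl_port(host, release)), "-u", user]
```

For example, `ensembl_port("martdb.ensembl.org", 50)` selects the newer instance on port 5316.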

Documentation of the schema and system: http://www.ensembl.org/info/docs/api/index.html

GrainGenes

GrainGenes 2.0 is a database for Triticeae and Avena. The current schema of the database backend is available at: http://wheat.pw.usda.gov/ggmigration/gg_schema_mysql/

An online interface for formulating SQL queries against the database: http://wheat.pw.usda.gov/cgi-bin/graingenes/sql.cgi?pre=0

UCSC Genome Bioinformatics

The UCSC database is a MySQL-based project: http://genome.ucsc.edu/

BioSQL

BioSQL is a joint effort between the OBF projects (BioPerl, BioJava, etc.) to support a shared database schema for storing sequence data. In theory, you could load a GenBank file into the database with BioPerl, then use Biopython to extract it from the database as a record object with features, and get more or less the same thing as if you had loaded the GenBank file directly as a SeqRecord using SeqIO.

This is a promising source of data for our benchmark!

SVN: http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/sql/biosqldb-mysql.sql

As of 05 Sep. 2008 we have collected 46 schema versions, available here: [1]

GUS

The Genomics Unified Schema (GUS) is an extensive relational database schema and associated application framework designed to store, integrate, analyze and present functional genomics data. The GUS schema supports a wide range of data types including genomics, gene expression, transcript assemblies, proteomics and others. It emphasizes standards-based ontologies and strong-typing.

The GUS Application Framework offers an object-relational layer and a Plugin API used to rapidly create robust data loading programs for diverse data sources. The GUS distribution includes plugins for standard data sources. The GUS Web Development Kit (WDK) is a rich environment for efficiently designing sophisticated query-based websites with little programming required.

Their about page: http://www.gusdb.org/about.php

The SVN: https://www.cbil.upenn.edu/svn/gus/

NCBO

The National Center for Biomedical Ontology is a consortium of leading biologists, clinicians, informaticians, and ontologists who develop innovative technology and methods allowing scientists to create, disseminate, and manage biomedical information and knowledge in machine-processable form.

In this context they use a relational database backend.

The SVN: http://smi-protege.stanford.edu/repos/cbio/ncbo/trunk/conf/ncbo_tables.sql


OpenEMR

OpenEMR is a free medical practice management, electronic medical records, prescription writing, and medical billing application. These programs are also referred to as electronic health records. OpenEMR is licensed under the GNU General Public License (GPL). It is a free open source replacement for medical applications such as Medical Manager, Health Pro, and Misys. It features support for EDI billing to clearing houses such as MedAvant and ZirMED using ANSI X12. Medical claims and accounts receivable are handled through a customized version of SQL-Ledger. Calendar features include categories for appointment types, colors associated with appointment types, repeating appointments, and the ability to restrict appointments based on type. There are customizable medical encounter forms, support for voice recognition software, and electronic or scanned digital document management for records.

The homepage: http://www.oemr.org/

The SourceForge project URL: http://sourceforge.net/projects/openemr/

Genomic DB Survey

Another relevant source is [2] where Erika De Francesco and Simona Rombo provide a survey of almost 80 genomic databases.

CERN Physics DBs

In this subsection we collect datasets coming from the CERN research center.


GridCC

GRIDCC is a three-year project funded by the European Commission. Its goal is integrating instruments and sensors with the traditional Grid resources. The GRIDCC middleware is being designed bearing in mind use cases from a very diverse set of applications, and as a result, the GRIDCC architecture provides access to the instruments in as generic a way as possible. GRIDCC is also developing an adaptable user interface and a mechanism for executing complex workflows in order to increase both the usability and the usefulness of the system. The new middleware is incorporated into significant applications that will allow the software validation in terms of both functionality and quality of service. The pilot application this paper focuses on is applying GRIDCC to support Remote Operations of the ELETTRA synchrotron radiation facility. We describe the results of implementing via GRIDCC complex workflows involved in both routine operations and troubleshooting scenarios. In particular, the implementation of an orbit correction feedback shows the level of integration of instruments and traditional Grid resources which can be reached using the GRIDCC middleware.

Number of Schema Versions: 6

SVN for the MySQL DB Schema: http://sadgw.lnl.infn.it:8000/cgi-cvs/gridCC/framework/installation/configuration/databases/mysql/mysqlRunNumber.sql?sortby=date&only_with_tag=MAIN


ATLAS

ATLAS is a particle physics experiment at the Large Hadron Collider at CERN. Starting in Spring 2009, the ATLAS detector will search for new discoveries in the head-on collisions of protons of extraordinarily high energy. ATLAS will learn about the basic forces that have shaped our universe since the beginning of time and that will determine its fate. Among the possible unknowns are the origin of mass, extra dimensions of space, microscopic black holes, and evidence for dark matter candidates in the universe.

Trigger

Trigger is one of the software components of the ATLAS project; its homepage is here. This is the SVN for the DB schema (Oracle): http://atdaq-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/offline/Trigger/TrigConfiguration/TrigDb/share/sql/combined_schema.sql (77 schema versions in 2 years, status active)

http://atlas-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/offline/MuonSpectrometer/MuonOracleScripts/MuonMatters/GENV.sql?revision=1.6&view=markup

http://atlas-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/offline/Trigger/TrigConfiguration/TriggerTool/external/combined_schema.sql?view=log&pathrev=MAIN

http://atlas-sw.cern.ch/cgi-bin/viewcvs-atlas.cgi/offline/Database/RunLumi/RunLumiPrototype/ddl/Run_Lum_DB_1_tab.sql?view=log


EGEE JRA1 Middleware

The mandate of the JRA1 Activity is to provide a reference open source implementation of the foundation services that are application independent and need to be deployed at all sites connected to the infrastructure. On top of this foundation, an open-ended set of application-specific higher-level services that can be deployed on demand at specific sites is provided directly by JRA1 or integrated from other sources and projects.

Grid Foundation Middleware comprises all services that need to be deployed on a production Grid infrastructure in order to provide a consistent, dependable service. It can be regarded as the Middleware Infrastructure.

Oracle CVS schema: http://jra1mw.cvs.cern.ch:8180/cgi-bin/jra1mw.cgi/org.glite.data.transfer-common/config/schema/oracle/oracle-schema.sql?view=log&pathrev=MAIN

MySQL CVS schema: http://jra1mw.cvs.cern.ch:8180/cgi-bin/jra1mw.cgi/org.glite.data.transfer-common/config/schema/mysql/mysql-schema.sql?view=log&pathrev=MAIN (17 versions in 8 months, 3 years old)


CASTOR

CASTOR, which stands for CERN Advanced STORage manager, is a hierarchical storage management (HSM) system developed at CERN and used to store physics production files and user files. Files can be stored, listed, retrieved and accessed in CASTOR using command line tools or applications built on top of the different data transfer protocols like RFIO (Remote File IO), ROOT libraries, GridFTP and XROOTD. CASTOR manages disk cache(s) and the data on tertiary storage or tapes. Currently (2007) there are some 109 million files and about 15 petabytes of data in CASTOR.


Oracle CVS Schema SRM2: http://isscvs.cern.ch/cgi-bin/cvsweb.cgi/SRM2/srm/db/srm_oracle_create.sql?cvsroot=castor (45 versions, in 3 years, active)


Oracle CVS Schema CASTOR2: http://isscvs.cern.ch/cgi-bin/cvsweb.cgi/CASTOR2/castor/db/castor_oracle_create.sql?cvsroot=castor (149 versions, 3 years, active)

DQ2

http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/dq2.binky.dao.oracle/config/db/schema-binky-oracle.sql?root=atlas-dq2&view=log (17 versions, Oracle)

http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/dq2.agents.dao.mysql/config/db/schema-mysql-5.x.sql?root=atlas-dq2&view=log (51 versions, MySQL)

http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/dq2.tracker.server.mysql/config/db/schema-tracker-mysql.sql?root=atlas-dq2&view=log

DIRAC

http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/DIRAC/python/DIRAC/WMS/DB/db-schema.sql?root=dirac&view=log (41 versions, Oracle)

http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/DIRAC3/DIRAC/ProductionManagementSystem/DB/ProductionDB.sql?root=dirac&view=log (14 versions, Oracle)


ELFMS

http://isscvs.cern.ch:8180/cgi-bin/viewcvs-all.cgi/elfms/quattor/cdb2sql-ora/schemas/?root=elfms (MANY DBs, Oracle)


Other

http://glite.cvs.cern.ch:8180/cgi-bin/glite.cgi/org.glite.data.hydra-service/config/schema/mysql/mysql-schema.sql?view=log&pathrev=MAIN (5 versions, MySQL)

http://isscvs.cern.ch/cgi-bin/viewcvs-all.cgi/TriDAS/ecal/ecalDB/sql/create_new_fe_daq_config.sql?root=tridas&view=log (7 versions, Oracle)
