Benchmark Downloadables
This page lists all the available resources of the Pantha Rei Schema Evolution Benchmark.
For information or comments, please contact Carlo A. Curino [1].
Available Schema
The base source of information is the MediaWiki SVN, freely browsable at:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=markup
However, for convenience we provide a .tar.gz download of all the schema versions. The archive also contains a set of scripts to create, load, and delete the entire MediaWiki schema history.
http://yellowstone.cs.ucla.edu/schema-evolution/documents/mediawiki-schema.tar.gz
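As a quick way to see which schema versions the archive holds, the following minimal sketch lists the .sql files inside a downloaded .tar.gz. The internal layout of the archive is not described on this page, so the assumption that schema versions are stored as .sql members is hypothetical, and the function name is illustrative only.

```python
import tarfile


def list_sql_members(archive_path):
    """Return the sorted names of all regular .sql files in a .tar.gz archive.

    Assumption: schema versions are stored as .sql members; the real
    archive layout may differ.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".sql")
        )
```

After saving the archive locally (for example with a browser, or with `urllib.request.urlretrieve`), calling `list_sql_members("mediawiki-schema.tar.gz")` returns the member names without extracting anything to disk.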
Available Queries
In our dataset we have a mix of synthetic and real queries.
Synthetic Queries
The synthetic queries are divided into two classes: a set of queries generated by installing MediaWiki (in different versions) and logging the queries generated during typical user sessions, and a set of completely synthetic queries. The latter set contains queries operating on entire tables and on single attributes, and can be used to obtain a rough estimate of the portion of the schema affected by an evolution step.
Lab-Generated MediaWiki Queries:
- MediaWiki 1.3 (~4,175 query + update instances): http://yellowstone.cs.ucla.edu/schema-evolution/documents/mw13_query_update_all.sql
- MediaWiki 1.3 (~1,948 distinct instances): http://yellowstone.cs.ucla.edu/schema-evolution/documents/mw13_query_update_distinct.sql
- MediaWiki 1.3 (~1,657 distinct query only instances): http://yellowstone.cs.ucla.edu/schema-evolution/documents/mw13_query_only_distinct.sql
- MediaWiki 1.3 (~75 query templates): http://yellowstone.cs.ucla.edu/schema-evolution/documents/mw13_legacy_template.sql
- MediaWiki 1.11 (~2,346 query update instances): http://yellowstone.cs.ucla.edu/schema-evolution/documents/mw1.11_query_update_all.sql
- MediaWiki 1.11 (~385 query update distinct instances): http://yellowstone.cs.ucla.edu/schema-evolution/documents/mw1.11_query_update_distinct.sql
- MediaWiki 1.11 (~147 query distinct instances): http://yellowstone.cs.ucla.edu/schema-evolution/documents/mw1.11_query_only_distinct.sql
- MediaWiki 1.11 (~256 query templates): http://yellowstone.cs.ucla.edu/schema-evolution/documents/mw1.11_256_query_templates.sql
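The files above distinguish between all instances, distinct instances, and templates. The sketch below illustrates one plausible way to relate the three: duplicates are dropped to get distinct instances, and literals are replaced by a `?` placeholder to get templates. This is only an illustration; the normalization actually used to produce the files is not documented here, and the helper names are hypothetical.

```python
import re

# Quoted strings (with backslash escapes) and bare numbers become "?".
_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def to_template(query):
    """Replace literals with '?' and collapse whitespace (assumed scheme)."""
    query = _STRING.sub("?", query)
    query = _NUMBER.sub("?", query)
    return _SPACE.sub(" ", query).strip()


def summarize(queries):
    """Return (instances, distinct instances, templates) for a query list."""
    instances = [q.strip() for q in queries if q.strip()]
    distinct = set(instances)
    templates = {to_template(q) for q in distinct}
    return len(instances), len(distinct), len(templates)
```

For example, two lookups that differ only in the id they search for count as two distinct instances but a single template.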
Synthetic Data:
- MediaWiki 1.3 (Version 28, SVN commit 4696) synthetic 1-attribute (133) queries: http://yellowstone.cs.ucla.edu/schema-evolution/documents/v28_synthetic.sql
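To illustrate how single-attribute queries can give a rough estimate of the portion of a schema affected by an evolution step, the sketch below runs a query set against a later schema version and reports the fraction that still executes. It uses an in-memory SQLite database purely as a stand-in for MySQL (an assumption made to keep the example self-contained), and the function name is hypothetical.

```python
import sqlite3


def surviving_fraction(schema_sql, queries):
    """Fraction of queries that still run against the given schema.

    SQLite stands in for MySQL here; MySQL-specific DDL would need a
    real MySQL instance instead.
    """
    if not queries:
        raise ValueError("at least one query is required")
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        succeeded = 0
        for query in queries:
            try:
                conn.execute(query)
                succeeded += 1
            except sqlite3.OperationalError:
                # Raised for a missing table or column, i.e. the query
                # touches a part of the schema that has changed.
                pass
        return succeeded / len(queries)
    finally:
        conn.close()
```

One minus the returned value is then a rough indication of how much of the schema, as seen through the query set, was touched by the evolution step.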
Real Queries
The real queries were obtained by cleaning the log of the Wikipedia on-line profiler, available at: http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000
The available data are the templates, as extracted by the MediaWiki profiler.
- (~1,945) query templates, cross version: http://yellowstone.cs.ucla.edu/schema-evolution/documents/wikipedia_real_all.sql
Moreover, we report the data available from the Wikipedia profiler, i.e., the number of executions of each query and its CPU and real execution time.
The Wikipedia profiler does not document the units of measure or the exact column semantics, so we report the data as-is; interested users can refer to the on-line profiler for further details.
The data are reported in a CSV file.
- (~1,945) query templates and workload data: http://yellowstone.cs.ucla.edu/schema-evolution/documents/wikipedia_profiler_data.csv
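Because the column semantics are not documented, a cautious first step is to inspect the CSV without assuming any column names. The sketch below reads the rows as plain strings and counts how many fields each row has, which helps spot rows broken by commas inside SQL templates. The function names are hypothetical, and nothing here assumes a particular column order.

```python
import csv
from collections import Counter


def load_rows(text_stream):
    """Read all non-empty CSV rows as lists of strings."""
    return [row for row in csv.reader(text_stream) if row]


def field_count_histogram(rows):
    """Map each row width to the number of rows with that width."""
    return Counter(len(row) for row in rows)
```

When reading from disk, open the file with `newline=""`, as the `csv` module documentation recommends, and pass the file object to `load_rows`. A histogram with a single key suggests the file parsed consistently.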
Available Data
Wikipedia Dump Download
Since MediaWiki is used by over 30,000 wiki websites around the world, including Wikipedia, a large amount of database content is available. In particular, the Wikimedia Foundation releases a dump of the entire Wikipedia DB bi-weekly. Users can experiment with a small dataset for a less popular language (<10 MB), work on the entire English Wikipedia (enwiki), or even install the entire dataset (>700 GB).
To obtain up-to-date data, please refer to the official Wikimedia page: http://download.wikimedia.org/backup-index.html
The downloads are in XML and SQL formats, and appropriate tools are offered to simplify importing the data. Please bear in mind that the preferred DBMS for MediaWiki is MySQL.
If your research requires older backups, please contact us.
Available Installed Software
Installed Versions of MediaWiki
To ease testing of the MediaWiki backend, we provide a set of data and freely accessible installed versions of MediaWiki. To allow a comparison of the features available in the main MediaWiki software releases, we installed all of them, and they are available for testing. [2]
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.2.0/ 13-Mar-2004 08:08
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.3.0/ 02-Aug-2004 09:51
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.4.0/ 07-Mar-2005 18:07
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.5.1/ 22-Jul-2005 23:30
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.6.0/ 05-Apr-2006 03:11
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.7.0/ 07-Jul-2006 10:30
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.8.0/ 10-Oct-2006 15:37
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.9.0/ 10-Jan-2007 12:38
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.10.0/ 22-Apr-2007 14:17
- http://yellowstone.cs.ucla.edu/mediawiki/mediawiki-1.11.0/ 28-Jun-2007 18:19
Moreover, we installed the dump of enwikisource (roughly 2,130,000 tuples for 6.4 GB of data), which is available for browsing here:
Full Access to the Backend MySQL DB
To provide further insight, we set up access to the MySQL backend for the above installations (including the enwikisource mirror), so that users can freely explore the MediaWiki backend and test simple queries. Access is read-only, in order to avoid vandalism.
user: "mediawikireader"
password: "imareader"
The phpMyAdmin web access is available at: http://yellowstone.cs.ucla.edu/phpMyAdmin/
EXTENSION
This section reports the preliminary results of an ongoing effort aimed at extending the existing dataset. The data available must be considered raw material to be used "as is".
Joomla! 1.5 Schema Evolution
The SVN revision of the SQL script of the Joomla! 1.5 schema is available at:
The following .tar.gz file:
contains a dump of all the revisions (46) of the schema and a few simple scripts that can be used to:
- re-download an updated set of schemas from the SVN repository
- batch install all the schema versions in a MySQL system
- batch remove all the schema versions from a MySQL system
- (todo) compute a simple set of statistics
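The batch install and batch remove steps listed above can be sketched as follows. This is not the script shipped in the archive: it only generates `mysql` command lines, one database per schema revision, so they can be reviewed before being run. The database naming scheme (`schema_v001`, ...), the default user, and the assumption that each revision is a separate .sql file whose names sort in revision order are all hypothetical.

```python
import shlex
from pathlib import Path


def database_name(prefix, index):
    """Hypothetical naming scheme: prefix plus a zero-padded counter."""
    return f"{prefix}{index:03d}"


def install_commands(schema_dir, prefix="schema_v", user="root"):
    """Build commands that create one database per .sql file and load it."""
    commands = []
    sql_files = sorted(Path(schema_dir).glob("*.sql"))
    for index, sql_file in enumerate(sql_files, start=1):
        database = database_name(prefix, index)
        commands.append(
            f'mysql -u {shlex.quote(user)} -p -e "CREATE DATABASE {database}"'
        )
        commands.append(
            f"mysql -u {shlex.quote(user)} -p {database} < {shlex.quote(str(sql_file))}"
        )
    return commands


def remove_commands(version_count, prefix="schema_v", user="root"):
    """Build commands that drop the databases created by install_commands."""
    return [
        f'mysql -u {shlex.quote(user)} -p -e "DROP DATABASE IF EXISTS '
        f'{database_name(prefix, index)}"'
        for index in range(1, version_count + 1)
    ]
```

Keeping each revision in its own database makes it possible to compare or query any two versions side by side; the `-p` flag makes the client prompt for a password on every command, so a real batch run would likely use an option file instead.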
TikiWiki Schema Evolution
The SVN revision of the SQL script of the TikiWiki schema is available at:
The following .tar.gz file:
contains a dump of all the revisions (152) of the schema and a few simple scripts that can be used to:
- re-download an updated set of schemas from the SVN repository
- batch install all the schema versions in a MySQL system
- batch remove all the schema versions from a MySQL system
- (todo) compute a simple set of statistics
XOOPS Dynamic Web CMS
XOOPS is a dynamic web content management system written in PHP for the MySQL database. Its object orientation makes it an ideal tool for developing small or large community websites, intra company and corporate portals, weblogs and much more.
Popularity: 6,559,127 downloads from SourceForge as of 05/22/2008
The SVN revision of the SQL script of the XOOPS schema is available at:
The following .tar.gz file:
- (todo)
contains a dump of all the revisions (152) of the schema and a few simple scripts that can be used to:
- re-download an updated set of schemas from the SVN repository
- batch install all the schema versions in a MySQL system
- batch remove all the schema versions from a MySQL system
- (todo) compute a simple set of statistics
Coppermine Photo Gallery
Coppermine is an easily set-up, fast, feature-rich photo gallery script with a MySQL database, user management, private galleries, automatic thumbnail creation, an ecard feature, and a template system for easy customization to match the rest of a site.
Popularity: 4,681,872 downloads from SourceForge as of 05/22/2008
The SVN revision of the SQL script of the Coppermine schema is available at:
https://coppermine.svn.sourceforge.net/svnroot/coppermine/trunk/cpg1.5.x/sql/schema.sql
The following .tar.gz file:
- (todo)
contains a dump of all the revisions (152) of the schema and a few simple scripts that can be used to:
- re-download an updated set of schemas from the SVN repository
- batch install all the schema versions in a MySQL system
- batch remove all the schema versions from a MySQL system
- (todo) compute a simple set of statistics