Benchmark Downloadables


This page lists all the available resources of the Pantha Rei Schema Evolution Benchmark.

For information or comments, please contact: Carlo A. Curino


Available Schema

The base source of information is the MediaWiki SVN, freely browsable at:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=markup

However, to simplify matters we provide a .tar.gz download of all the schema versions. It also contains a set of scripts to create, load, and delete the entire MediaWiki schema history.

http://yellowstone.cs.ucla.edu/schema-evolution/documents/mediawiki-schema.tar.gz
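For convenience, here is a minimal Python sketch of fetching and unpacking the archive; the archive URL is the one above, while the extracted directory layout is an assumption and simply listed at the end:

 import os
 import tarfile
 import urllib.request
 
 # URL of the schema archive listed above.
 URL = "http://yellowstone.cs.ucla.edu/schema-evolution/documents/mediawiki-schema.tar.gz"
 
 # Download the .tar.gz to the current directory.
 archive_path, _ = urllib.request.urlretrieve(URL, "mediawiki-schema.tar.gz")
 
 # Unpack all schema versions and the create/load/delete scripts.
 with tarfile.open(archive_path, "r:gz") as tar:
     tar.extractall("mediawiki-schema")
 
 # List the extracted files (names depend on the archive contents).
 for root, _dirs, files in os.walk("mediawiki-schema"):
     for name in files:
         print(os.path.join(root, name))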

Available Queries

In our dataset we have a mix of synthetic and real queries.

Synthetic Queries

The synthetic queries are divided into two classes: a set of queries generated by installing different versions of MediaWiki and logging the queries produced during typical user sessions, and completely synthetic queries. This last set contains queries operating on entire tables and on single attributes, and can be used to obtain a rough estimate of the portion of the schema affected by an evolution step.
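As a rough illustration of this idea, the following sketch assumes each schema version is represented as a Python dict mapping table names to column lists (not the benchmark's actual format) and estimates the portion of the old schema touched by an evolution step as the fraction of columns that no longer appear unchanged in the new version:

 def affected_portion(old_schema, new_schema):
     """Rough estimate of the fraction of the old schema affected by an
     evolution step.  Schemas are dicts: table name -> list of column names.
     A column counts as affected if its table or the column itself disappeared."""
     total = 0
     affected = 0
     for table, columns in old_schema.items():
         for column in columns:
             total += 1
             if table not in new_schema or column not in new_schema[table]:
                 affected += 1
     return affected / total if total else 0.0
 
 # Toy example (hypothetical schema fragments, not real MediaWiki versions).
 old = {"cur": ["cur_id", "cur_title", "cur_text"], "user": ["user_id", "user_name"]}
 new = {"page": ["page_id", "page_title"], "user": ["user_id", "user_name"]}
 print(affected_portion(old, new))  # -> 0.6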

Lab-Generated MediaWiki Queries:

Synthetic Data:


Real Queries

The real queries have been obtained by cleaning the log of the Wikipedia on-line profiler, available at: http://noc.wikimedia.org/cgi-bin/report.py?db=enwiki&sort=real&limit=50000

The available data are the query templates, as extracted by the MediaWiki profiler.

Moreover, we report the data available from the Wikipedia profiler, i.e., the number of executions of each query and the CPU and real execution times.

The Wikipedia profiler does not provide information on the units of measure or the exact column semantics; we therefore report the data as-is, and interested users can refer to the on-line profiler for further details.

The data are reported in a CSV file.
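A minimal sketch of loading such a CSV with Python's standard csv module; the file name and column headers used here (template, count, cpu, real) are assumptions, since the exact layout follows the profiler's output:

 import csv
 
 # Hypothetical file name and column names; adjust to the actual CSV layout.
 with open("wikipedia_profiler_queries.csv", newline="") as f:
     reader = csv.DictReader(f)
     rows = list(reader)
 
 # Sort query templates by number of executions (reported as-is by the profiler).
 rows.sort(key=lambda r: int(r["count"]), reverse=True)
 for row in rows[:10]:
     print(row["count"], row["cpu"], row["real"], row["template"])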

Available Data

Wikipedia Dump Download

Since MediaWiki is used by over 30,000 wiki websites around the world, including Wikipedia, the availability of DB data is impressive. In particular, the Wikimedia Foundation releases the entire Wikipedia DB dump bi-weekly. Users can experiment with small datasets for some less popular languages (<10 MB), work on the entire English Wikipedia (*enwiki*), or even install the entire dataset (>700 GB).

To obtain updated data contents, please refer to the official Wikimedia page: http://download.wikimedia.org/backup-index.html. The downloads are XML and SQL, and appropriate tools are offered to simplify the import of the data. Please bear in mind that the preferred DBMS for MediaWiki is MySQL.
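As an illustration, a small Python sketch that downloads one of the SQL dumps and pipes it into MySQL; the dump URL, database name, and MySQL credentials are placeholders to be replaced (pick an actual dump from the backup-index page above), and the mysql command-line client is assumed to be installed:

 import gzip
 import shutil
 import subprocess
 import urllib.request
 
 # Placeholder dump URL and target database; substitute a real dump from
 # http://download.wikimedia.org/backup-index.html
 DUMP_URL = "http://download.wikimedia.org/<wiki>/<date>/<wiki>-<date>-page.sql.gz"
 DB_NAME = "mediawiki_test"
 
 # Download and decompress the SQL dump.
 urllib.request.urlretrieve(DUMP_URL, "dump.sql.gz")
 with gzip.open("dump.sql.gz", "rb") as src, open("dump.sql", "wb") as dst:
     shutil.copyfileobj(src, dst)
 
 # Import into MySQL (adjust the user and add a password as needed).
 with open("dump.sql", "rb") as sql:
     subprocess.run(["mysql", "-u", "root", DB_NAME], stdin=sql, check=True)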

If your research requires older backups, please contact us.

Available Installed Software

Installed Versions of MediaWiki

To ease testing of the MediaWiki backend, we provide a freely accessible set of data and installed versions of MediaWiki. To allow a comparison of the features available in the main MediaWiki software releases, we installed all of them, and they are available to test.

[Figure SoftwareRelease.png: main MediaWiki software releases]



Moreover, we installed the dump of enwikisource (roughly 2,130,000 tuples, 6.4 GB of data), which is available for browsing here:

Full Access to the Backend MySQL DB

To provide further insight, we set up access to the MySQL backend for the above installations (including the enwikisource mirror). In this way users can freely access the MediaWiki backend and test simple queries. The access is limited to reading, in order to avoid vandalism.

user: "mediawikireader"
password: "imareader"

The phpMyAdmin web access is the following: http://yellowstone.cs.ucla.edu/phpMyAdmin/
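Besides phpMyAdmin, the backend can presumably be queried from any MySQL client. A minimal sketch using the third-party pymysql package follows; the host and database name are assumptions based on the information above (the documented access is via phpMyAdmin, so direct MySQL connections may require a different host or port):

 import pymysql  # third-party package: pip install pymysql
 
 # Read-only credentials from above; host and database name are assumptions.
 conn = pymysql.connect(
     host="yellowstone.cs.ucla.edu",
     user="mediawikireader",
     password="imareader",
     database="enwikisource",  # hypothetical database name
 )
 try:
     with conn.cursor() as cur:
         cur.execute("SELECT COUNT(*) FROM page")
         print(cur.fetchone())
 finally:
     conn.close()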

Benchmark Extensions (ONGOING)

The benchmark is currently being extended. The page Benchmark Extension contains the current extensions.
