Articles for Developer by Paolo Medici [PMX.it]. See License on bottom of this page.

WikiPedia Offline

A guide to developing an offline version of Wikipedia for mobile devices, partially based on this article.

First, download the Wikipedia bzip2 dump for the language you want:
English, Italian, or any other. I usually download the pages-articles.xml.bz2 version.

A bzip2 file is composed of several blocks (each usually holding 900 kB of uncompressed data) whose boundaries fall on arbitrary bits, not on byte boundaries. The format can be understood by studying the bzip2recover tool. Some wiki projects for mobile phones already exist, in particular the Indexer application. Every wiki article has to be bound to a particular block_offset. I wrote a modified version of bzip2recover, called bzip2indexer, which can decode the stream, write an index file in the format start_block_offset end_block_offset, and seek inside a bzip2 stream with bit accuracy.
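The block scan described above can be sketched in a few lines of Python. This is not the bzip2indexer source, only a minimal illustration of the same idea that bzip2recover uses: slide a 48-bit window over the file one bit at a time and look for the block marker 0x314159265359 and the end-of-stream marker 0x177245385090. The function name build_index and the offset convention (start is the first bit of a block marker, end is the first bit of the next marker) are assumptions made for this example; the real tool may count offsets differently.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit marker that opens every compressed block
EOS_MAGIC = 0x177245385090    # 48-bit marker that closes a stream
MASK48 = (1 << 48) - 1


def build_index(data):
    """Return a list of (start_bit, end_bit) pairs, one per compressed block.

    Assumed convention: start_bit is the offset of the first bit of the block
    marker, end_bit is the offset of the first bit of the following marker.
    """
    index = []
    start = None      # start of the block currently open, if any
    window = 0        # the last 48 bits read
    bits_read = 0
    for byte in data:
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK48
            bits_read += 1
            if bits_read < 48:
                continue
            if window == BLOCK_MAGIC or window == EOS_MAGIC:
                pos = bits_read - 48
                if start is not None:
                    index.append((start, pos))
                # A block marker opens a new block; an end marker opens none.
                start = pos if window == BLOCK_MAGIC else None
    return index


if __name__ == "__main__":
    # Small blocks (compresslevel=1) so that a short sample spans several blocks.
    sample = b"".join(b"article %d\n" % i for i in range(30000))
    compressed = bz2.compress(sample, compresslevel=1)
    for start, end in build_index(compressed):
        print(start, end)
```

The markers are only 48 bits long, so in principle the same pattern could appear by chance inside compressed data; bzip2recover accepts the same risk. A pure Python bit loop is far too slow for a full Wikipedia dump and is meant only to show the logic.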

# bzip2indexer itwiki-20100408-pages-articles.xml.bz2 > index

The index is printed to standard output, which the command above redirects to a file named index.
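With a pair of bit offsets from the index, a single block can be decompressed without reading the rest of the archive. The sketch below follows the approach of bzip2recover: copy the bits of the block, put a stream header in front, and append the end-of-stream marker followed by the block's own CRC. The function name extract_block and the offset convention (start is the first bit of the block marker, end is the first bit of the next marker) are assumptions for this example and may not match what bzip2indexer writes.

```python
import bz2

EOS_MAGIC = 0x177245385090  # 48-bit end-of-stream marker


def extract_block(data, start_bit, end_bit):
    """Wrap the bits [start_bit, end_bit) in a new single-block bzip2 stream."""
    first, last = start_bit // 8, (end_bit + 7) // 8
    chunk = int.from_bytes(data[first:last], "big")
    length = end_bit - start_bit
    # Drop the bits after end_bit, then mask off the bits before start_bit.
    block = (chunk >> (last * 8 - end_bit)) & ((1 << length) - 1)
    # The 32 bits that follow the 48-bit block marker are the block CRC.
    crc = (block >> (length - 80)) & 0xFFFFFFFF
    stream = int.from_bytes(b"BZh9", "big")  # header declaring the largest block size
    stream = (stream << length) | block
    stream = (stream << 48) | EOS_MAGIC
    stream = (stream << 32) | crc            # with one block, stream CRC equals block CRC
    total_bits = 32 + length + 80
    pad = -total_bits % 8                    # zero bits needed to reach a byte boundary
    return (stream << pad).to_bytes((total_bits + pad) // 8, "big")


if __name__ == "__main__":
    text = b"<page>example article</page>\n" * 100
    compressed = bz2.compress(text)
    # Demo only: this short sample fits in one block, which starts right after
    # the 4-byte stream header and ends where the end-of-stream marker begins.
    bits = "".join(format(b, "08b") for b in compressed)
    start, end = 32, bits.rfind(format(EOS_MAGIC, "048b"))
    print(bz2.decompress(extract_block(compressed, start, end)) == text)
```

On a real dump, only the bytes between start_bit // 8 and (end_bit + 7) // 8 need to be read from disk, so the cost of fetching an article depends on the block size and not on the size of the archive.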