Download and Installation
Server side
Licence
This software is released under the GPL2 License.
Prerequisites
A Linux machine with:
- Apache server with mod_wsgi enabled
- swig
- poppler
- wkhtmltopdf
For example, on a Debian 6 stable distribution with the standard system, install these packages:
apt-get install \
ssh git-core \
swig \
libpoppler5 \
libpoppler-dev \
g++ \
make \
python-dev \
python-simplejson \
python-imaging
Install wkhtmltopdf (in case of problems, follow the instructions on the website http://code.google.com/p/wkhtmltopdf/):
apt-get install openssl build-essential xorg libssl-dev libxrender-dev
cd <TEMP_DIR>
wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.11.0_rc1.tar.bz2
tar -xvf wkhtmltopdf-0.11.0_rc1.tar.bz2
ln -s wkhtmltopdf-0.11.0_rc1 wkhtmltopdf
apt-get install libqt4-dev qt4-dev-tools
cd wkhtmltopdf
qmake-qt4
make && make install
Install py-wkhtmltox:
cd <TEMP_DIR>
git clone https://github.com/mreiferson/py-wkhtmltox.git
cd py-wkhtmltox
python setup.py install
Installation
Create an installation folder:
mkdir <INSTALLATION_DIR>
cd <INSTALLATION_DIR>
Get the source code:
git clone git://github.com/rero/multivio_server.git
Build from the source:
cd multivio_server
python setup.py build --build-platlib build
Note: If you install Poppler from source, you must also install the XPDF headers and edit setup.py to specify the alternate location (for example: /usr/local). Currently, only version 0.12.4 of Poppler is supported.
(Optional) Run the unit tests [currently broken]:
### cd test
### PYTHONPATH=$PYTHONPATH:../build
### python ./test_all.py
### cd ..
Copy and edit the configuration file (in particular the variables General.temp_dir and Logger.name), then create the temporary directory you configured:
cp tools/mvo_config_example.py ./build/multivio/mvo_config.py
vi ./build/multivio/mvo_config.py
mkdir <temp_dir>
Run a standalone server and test it using the URL: http://localhost:4041/help
python ./build/multivio/dispatcher_app.py
Deployment using Apache
Install the required packages:
apt-get install libapache2-mod-wsgi
Install the Multivio server python package:
python setup.py install
Note: You can use the "--prefix" option to specify the target directory. This is useful for running several instances of the Multivio server, such as separate test and dev environments.
Copy the main script into the web directory:
mkdir -p /var/www/multivio/server
cp /usr/bin/multivio_server.py /var/www/multivio/server/multivio_server.py
Copy and edit the configuration file to specify the directories to use:
cp /usr/local/bin/mvo_config_example.py /var/www/multivio/server/mvo_config.py
vi /var/www/multivio/server/mvo_config.py
Create the cache directory and make it writable for the Apache user:
mkdir /var/tmp/multivio
chown www-data:www-data /var/tmp/multivio
Make sure that all needed files and folders exist and are writable for the Apache user, for example:
mkdir /var/www/multivio/temp
chown www-data:www-data /var/www/multivio/temp
touch /var/log/multivio.log
chown www-data:www-data /var/log/multivio.log
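The check above can also be scripted. The following is a minimal sketch, written for Python 3, that reports which paths are missing or not writable for the user running it. The helper name find_unwritable is hypothetical and not part of Multivio; run the script as the Apache user (for example with sudo -u www-data) for the result to be meaningful.

```python
import os


def find_unwritable(paths):
    """Return (path, reason) pairs for paths that are missing or not
    writable by the user running this script."""
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append((path, "missing"))
        elif not os.access(path, os.W_OK):
            problems.append((path, "not writable"))
    return problems


if __name__ == "__main__":
    # Paths taken from the steps above; adjust them to your installation.
    required = [
        "/var/tmp/multivio",
        "/var/www/multivio/temp",
        "/var/log/multivio.log",
    ]
    for path, reason in find_unwritable(required):
        print("%s: %s" % (path, reason))
```

An empty output means every listed path exists and is writable for the current user.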
Create a new Apache virtual host:
vi /etc/apache2/sites-available/multivio.conf
Here is an example of the multivio.conf file:
WSGIScriptAlias /server /var/www/multivio/server/multivio_server.py
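WSGIScriptAlias maps a URL prefix to a Python script, and mod_wsgi expects that script to expose a callable named application. The Python 3 sketch below is a generic stand-in that shows the shape of such an entry point; it is not the content of multivio_server.py, and the echoed text is purely illustrative.

```python
def application(environ, start_response):
    """Stand-in WSGI entry point: echoes the requested path as plain text."""
    path = environ.get("PATH_INFO", "/")
    body = ("Requested path: %s\n" % path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # A WSGI application returns an iterable of byte strings.
    return [body]
```

With the alias above, a request to /server/help would reach this callable with PATH_INFO set to /help.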
Activate the virtual host:
cd /etc/apache2/sites-enabled/
ln -s ../sites-available/multivio.conf .
Restart Apache:
apache2ctl restart
The Multivio server is now up and running at: http://localhost/server/help/ (or http://<YOUR_HOST_URL>/server/help/)
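To verify the deployment from a script rather than a browser, something like the following Python 3 sketch can be used. The function name server_is_up and the injectable opener parameter are illustrative choices; only the /server/help/ path comes from the setup above.

```python
import urllib.error
import urllib.request


def server_is_up(base_url, opener=urllib.request.urlopen, timeout=5):
    """Return True if a GET on <base_url>/help/ answers with HTTP 200."""
    url = base_url.rstrip("/") + "/help/"
    try:
        response = opener(url, timeout=timeout)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
    try:
        return response.status == 200
    finally:
        response.close()
```

For the Apache deployment described here, the call would be server_is_up("http://localhost/server").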
Add a client layer
The instructions are provided below.
Configuration File
import logging

class MVOConfig:
    """Main class for configuration."""

    class General:
        """General config."""
        #this directory will contain the remote documents rendered by the server
        #you have to clean it up to avoid a full disk, for example using a crontab entry
        temp_dir = '/reroweb/var/tmp/multivio'
        #directory where the python multivio package is, especially useful to have
        #multiple multivio server instances
        #typically the directory specified by the "--prefix" option during the installation
        sys_pathes = ['']

    class Url:
        """Configuration for uploads."""
        #user agent used to get past file servers with robots control
        user_agent = 'Firefox/3.5.2'
        #timeout for non-responding servers
        timeout = 120 #2 minutes

    class Logger:
        """Config for logging."""
        #prefix of log messages
        name = "multivio"
        #log file name
        file_name = "/var/log/multivio.log"
        #put the messages on stdout, should be True only for the standalone WSGI server
        console = False
        #level of logging (DEBUG, INFO)
        level = logging.INFO
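The comment on temp_dir recommends cleaning the directory regularly. As one possible approach, the Python 3 sketch below deletes regular files older than a given age. The function name purge_old_files, the seven-day default, and the assumption that only top-level files need removing are illustrative and not part of Multivio.

```python
import os
import time


def purge_old_files(directory, max_age_days=7, now=None):
    """Delete regular files directly inside `directory` whose modification
    time is older than `max_age_days`. Returns the list of removed paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(path)
    return removed
```

A small script calling purge_old_files with the configured temp_dir could then be scheduled from the crontab of the Apache user.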
Implementing a new document parser
To support a new metadata format, implement a parser in Python by extending the DocumentParser class. You have to implement three methods:
- get_metadata
- get_logical_structure
- get_physical_structure
For example, create a new dummy parser:
#---------------------------- Modules ---------------------------------------

# import of standard modules
import sys
import json

# local modules
from parser import DocumentParser, ParserError

#----------------------------------- Classes -----------------------------------

class DummyParser(DocumentParser):
    """To parse a Dummy document"""

    def __init__(self, file_stream, url):
        DocumentParser.__init__(self, file_stream)
        self._url = url

    def _check(self):
        """Always valid, but you should implement this method to check
        if the input is ok."""
        return True

    def get_metadata(self):
        """Get general information such as title, authors and language."""
        metadata = {}
        metadata['title'] = 'My very nice title'
        metadata['creator'] = ['Me', 'Others']
        metadata['language'] = 'en'
        self.logger.debug("Metadata: %s" % json.dumps(metadata, sort_keys=True,
            indent=4))
        return metadata

    def get_physical_structure(self):
        """Get the physical structure. Mainly the list of the content files."""
        phys_struct = [{
            'url': self._url,
            'label': 'My nice pdf document'
        }]
        self.logger.debug("Physical Structure: %s" % json.dumps(phys_struct,
            sort_keys=True, indent=4))
        return phys_struct

    def get_logical_structure(self):
        """Get the logical structure of the document, such as the
        Table of Contents."""
        logical_struct = [
            {
                "file_position": {
                    "index": 1,
                    "url": self._url
                },
                "label": "Page one"
            },
        ]
        self.logger.debug("Logical Structure: %s" % json.dumps(logical_struct,
            sort_keys=True, indent=4))
        return logical_struct
The second step is to register your parser in the parser chooser/selector: modify the multivio/parser_app.py file and instantiate your parser in the "DocParserApp._choose_parser()" method. The selection can be based on a regular expression applied to the MIME type. For example, for the PDF parser we have:
if re.match('.*?/pdf.*?', mime):
    self.logger.info("Pdf parser found!")
    return PdfParser(content, url, url.split('/')[-1])
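As an illustration of the same pattern, the self-contained Python 3 sketch below selects a parser from an ordered list of MIME-type patterns. The function choose_parser, the application/x-dummy MIME type, and the stub classes are hypothetical stand-ins; in Multivio itself the new branch would be added directly inside "DocParserApp._choose_parser()".

```python
import re


class PdfParser:
    """Stub standing in for the real PDF parser."""


class DummyParser:
    """Stub standing in for the parser you are adding."""


# Ordered (pattern, parser class) pairs; the first match wins.
PARSER_RULES = [
    (re.compile(r".*?/pdf.*?"), PdfParser),
    (re.compile(r"application/x-dummy"), DummyParser),  # hypothetical MIME type
]


def choose_parser(mime):
    """Return the parser class registered for `mime`, or None if unsupported."""
    for pattern, parser_cls in PARSER_RULES:
        if pattern.match(mime):
            return parser_cls
    return None
```

Keeping the rules in one ordered list makes the precedence between overlapping patterns explicit, which is harder to see in a long chain of if statements.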
Create a unit test and enjoy it!
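A unit test for such a parser could look like the Python 3 sketch below. Because the parser module is only importable inside a Multivio checkout, the sketch tests a simplified stand-in that returns the same structures as the dummy parser above; the class DummyParserTest and the example.org URL are placeholders.

```python
import unittest


class DummyParser:
    """Simplified stand-in for the dummy parser, without the Multivio base class."""

    def __init__(self, url):
        self._url = url

    def get_metadata(self):
        return {"title": "My very nice title",
                "creator": ["Me", "Others"],
                "language": "en"}

    def get_physical_structure(self):
        return [{"url": self._url, "label": "My nice pdf document"}]

    def get_logical_structure(self):
        return [{"file_position": {"index": 1, "url": self._url},
                 "label": "Page one"}]


class DummyParserTest(unittest.TestCase):
    URL = "http://example.org/doc.pdf"  # placeholder document URL

    def setUp(self):
        self.parser = DummyParser(self.URL)

    def test_metadata_has_expected_keys(self):
        self.assertEqual(set(self.parser.get_metadata()),
                         {"title", "creator", "language"})

    def test_structures_point_to_document(self):
        physical = self.parser.get_physical_structure()
        self.assertEqual(physical[0]["url"], self.URL)
        entry = self.parser.get_logical_structure()[0]
        self.assertEqual(entry["file_position"]["url"], self.URL)
```

Saved as a module, the tests can be run with python -m unittest followed by the module name.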
Client side
Prerequisites
- the server layer must already be installed, following the procedure above
Installation
Create a client folder:
mkdir -p /var/www/multivio/client
Download the precompiled static client files and install them:
cd <TEMP_DIR>
wget http://demo.multivio.org/multivio/client_1.0.0.zip
unzip client_1.0.0.zip
cp -r client_1.0.0/* /var/www/multivio/client/
Update the Apache configuration created during the server setup to add the client configuration:
vi /etc/apache2/sites-available/multivio.conf
Add the following content to the file (note the support for English and French versions, with English as default):
Alias /client/fr /var/www/multivio/client/multivio/fr/1.0.0
Alias /client/en /var/www/multivio/client/multivio/en/1.0.0
Alias /client /var/www/multivio/client/multivio/en/1.0.0
Restart Apache:
apache2ctl restart
The client can now be invoked at: http://localhost/client/... (or http://<YOUR_HOST_URL>/client/...). Example:
http://localhost/client/#get&url=http://doc.rero.ch/record/13255/export/xd
For a localized version, the URLs are:
http://localhost/client/fr/#get&url=http://doc.rero.ch/record/13255/export/xd
http://localhost/client/en/#get&url=http://doc.rero.ch/record/13255/export/xd (by default)
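When such links are generated from another application, the URL patterns above can be wrapped in a small helper. The Python 3 sketch below is one way to do it; the function name client_url and its parameters are hypothetical, and the document URL is appended unescaped because that is how it appears in the examples.

```python
def client_url(document_url, host="localhost", lang=None):
    """Build a Multivio client URL following the patterns shown above.

    `lang` is None for the default alias, or "en" / "fr" for a localized one.
    """
    prefix = "http://%s/client/" % host
    if lang is not None:
        prefix += "%s/" % lang
    # The document to display is passed in the URL fragment.
    return "%s#get&url=%s" % (prefix, document_url)
```

For instance, client_url("http://doc.rero.ch/record/13255/export/xd", lang="fr") reproduces the French example URL above.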