Download and Installation

Server side

Licence

This software is released under the GNU General Public License, version 2 (GPLv2).

Prerequisites

A Linux machine with:

  • Apache server with mod_wsgi enabled
  • swig
  • poppler
  • wkhtmltopdf

For example, on a standard Debian 6 (stable) system, install these packages:

apt-get install \
ssh git-core \
swig \
libpoppler5 \
libpoppler-dev \
g++ \
make \
python-dev \
python-simplejson \
python-imaging

Install wkhtmltopdf (in case of problems, follow the instructions on the website http://code.google.com/p/wkhtmltopdf/):

apt-get install openssl build-essential xorg libssl-dev libxrender-dev
cd <TEMP_DIR>
wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.11.0_rc1.tar.bz2
tar -xvf wkhtmltopdf-0.11.0_rc1.tar.bz2
ln -s wkhtmltopdf-0.11.0_rc1 wkhtmltopdf
apt-get install libqt4-dev qt4-dev-tools
cd wkhtmltopdf
qmake-qt4
make && make install

Install py-wkhtmltox:

cd <TEMP_DIR>
git clone https://github.com/mreiferson/py-wkhtmltox.git
cd py-wkhtmltox
python setup.py install

Installation

Create an installation folder:

mkdir <INSTALLATION_DIR>
cd <INSTALLATION_DIR>

Get the source code:

git clone git://github.com/rero/multivio_server.git

Build from the source:

cd multivio_server
python setup.py build --build-platlib build

Note: if you decide to install Poppler from source, you also have to install the XPDF headers and edit setup.py to specify the alternate location (for example: /usr/local). Currently, only version 0.12.4 of Poppler is supported.

(Optional) Run the unit tests [currently broken]:

### cd test
### PYTHONPATH=$PYTHONPATH:../build
### python ./test_all.py
### cd .. 

Copy the example configuration file and edit it (in particular the variables General.temp_dir and Logger.name), then create the directory you set as General.temp_dir (shown as <temp_dir>):

cp tools/mvo_config_example.py ./build/multivio/mvo_config.py
vi ./build/multivio/mvo_config.py
mkdir <temp_dir>

Run the standalone server and test it by opening the URL http://localhost:4041/help:

python ./build/multivio/dispatcher_app.py
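
The browser check can also be scripted. The following is a minimal sketch, assuming Python 3 is available for the check script itself and that the standalone server listens on localhost port 4041 as in the URL above. The names help_url and server_is_up are illustrative and are not part of Multivio.

```python
import urllib.error
import urllib.request


def help_url(host="localhost", port=4041):
    """Build the help URL of a standalone Multivio server."""
    return "http://%s:%d/help" % (host, port)


def server_is_up(url, timeout=5):
    """Return True if the URL answers with HTTP status 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, ...
        return False


if __name__ == "__main__":
    url = help_url()
    print("%s -> %s" % (url, "OK" if server_is_up(url) else "not reachable"))
```

Run it in a second terminal while dispatcher_app.py is running. The same function can be pointed at the Apache deployment by passing a different URL.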

Deployment using Apache

Install the required packages:

apt-get install libapache2-mod-wsgi

Install the Multivio server python package:

python setup.py install

Note: you can use the "--prefix" option to specify the target directory. This is useful for running several instances of the Multivio server, such as separate test and development environments.

Copy the main script into the web directory:

mkdir -p /var/www/multivio/server
cp /usr/bin/multivio_server.py /var/www/multivio/server/multivio_server.py

Copy the example configuration file and edit it to specify the directories to use:

cp /usr/local/bin/mvo_config_example.py /var/www/multivio/server/mvo_config.py
vi /var/www/multivio/server/mvo_config.py

Create the cache directory and make it writable by the Apache user:

mkdir /var/tmp/multivio
chown www-data:www-data /var/tmp/multivio

Make sure that all required files and folders exist and are writable by the Apache user, for example:

mkdir /var/www/multivio/temp
chown www-data:www-data /var/www/multivio/temp
touch /var/log/multivio.log
chown www-data:www-data /var/log/multivio.log
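
Permission problems are easier to spot before Apache is restarted. The following is a minimal sketch, assuming Python 3 and the example paths used above. The function name find_unwritable is illustrative. The result reflects the user running the script, so it is only meaningful when run as the Apache user (www-data on Debian).

```python
import os


def find_unwritable(paths):
    """Return (path, reason) pairs for paths that are missing or not writable."""
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append((path, "missing"))
        elif not os.access(path, os.W_OK):
            problems.append((path, "not writable"))
    return problems


if __name__ == "__main__":
    # Paths taken from the example commands above; adapt them to your setup.
    required = [
        "/var/tmp/multivio",
        "/var/www/multivio/temp",
        "/var/log/multivio.log",
    ]
    for path, reason in find_unwritable(required):
        print("%s: %s" % (path, reason))
```

An empty output means that every listed path exists and is writable by the current user.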

Create a new Apache virtual host:

vi /etc/apache2/sites-available/multivio.conf

Here is an example of the multivio.conf file:

WSGIScriptAlias /server /var/www/multivio/server/multivio_server.py

Activate the virtual host:

cd /etc/apache2/sites-enabled/
ln -s ../sites-available/multivio.conf .

Restart Apache:

apache2ctl restart

The Multivio server is now up and running at: http://localhost/server/help/ (or http://<YOUR_HOST_URL>/server/help/)

Add a client layer

The instructions are provided in the "Client side" section below.

Configuration File

import logging

class MVOConfig:
    """Main class for configuration."""

    class General:
        """General config."""

        # This directory will contain the remote documents rendered by the
        # server. You have to clean it up regularly to avoid a full disk,
        # for example using a crontab entry.
        temp_dir = '/reroweb/var/tmp/multivio'

        # Directories where the Python multivio package is located; especially
        # useful to run multiple Multivio server instances. Typically the
        # directory specified by the "--prefix" option during the installation.
        sys_pathes = ['']

    class Url:
        """Configuration for uploads."""

        # User agent sent to get around file servers that block robots
        user_agent = 'Firefox/3.5.2'
        # Timeout in seconds for servers that do not respond
        timeout = 120  # 2 minutes

    class Logger:
        """Config for logging."""

        # Prefix of the log messages
        name = "multivio"

        # Log file name
        file_name = "/var/log/multivio.log"

        # Also write the messages to stdout; should be True only for the
        # standalone WSGI server
        console = False

        # Logging level (DEBUG, INFO)
        level = logging.INFO
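
The comment on General.temp_dir asks for a regular cleanup. The following is a minimal sketch of a script that a crontab entry could call, assuming Python 3. The function name remove_old_files and the 7-day retention period are assumptions for illustration, not Multivio settings.

```python
import os
import time


def remove_old_files(directory, max_age_days=7, now=None):
    """Delete regular files in `directory` older than `max_age_days`.

    Returns the list of removed paths. Subdirectories are walked but kept.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 3600
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # Use the modification time as the age of the cached file.
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed


if __name__ == "__main__":
    # Same value as General.temp_dir in the example configuration.
    for path in remove_old_files("/reroweb/var/tmp/multivio", max_age_days=7):
        print("removed %s" % path)
```

The script must run as a user allowed to delete files in the directory. If the directory does not exist, nothing is removed.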

Implementing a new document parser

To support a new metadata format, implement a parser in Python by extending the DocumentParser class. You have to implement three methods:

  • get_metadata
  • get_logical_structure
  • get_physical_structure

For example, create a new dummy parser:

#---------------------------- Modules ---------------------------------------

# import of standard modules
import sys
import json

# local modules
from parser import DocumentParser, ParserError

#----------------------------------- Classes -----------------------------------

class DummyParser(DocumentParser):
    """To parse a Dummy document"""

    def __init__(self, file_stream, url):
        DocumentParser.__init__(self, file_stream)
        self._url = url

    def _check(self):
        """Always valid, but you should implement this method to check if the
        input is ok."""
        return True


    def get_metadata(self):
        """Get general information such as title, authors and language."""
        metadata = {}
        metadata['title'] = 'My very nice title'
        metadata['creator'] = ['Me', 'Others']
        metadata['language'] = 'en'
        self.logger.debug("Metadata: %s"% json.dumps(metadata, sort_keys=True,
                        indent=4))
        return metadata

    def get_physical_structure(self):
        """Get the physical structure. Mainly the list of the content files."""
        phys_struct = [{
            'url': self._url,
            'label': 'My nice pdf document'
            }]
        self.logger.debug("Physical Structure: %s"% json.dumps(phys_struct,
                sort_keys=True, indent=4))
        return phys_struct


    def get_logical_structure(self):
        """Get the logical structure of the document.
        Such as Table of Contents.
        """
        logical_struct = [
          {
            "file_position": {
                "index": 1,
                "url": self._url
                },
            "label": "Page one"
          },
        ]
        self.logger.debug("Logical Structure: %s"% json.dumps(logical_struct,
                sort_keys=True, indent=4))
        return logical_struct

The second step is to register your parser in the parser chooser/selector: modify the multivio/parser_app.py file and instantiate your parser in the "DocParserApp._choose_parser()" method. The choice can be based on a regular expression applied to the MIME type. For example, for the PDF parser we have:

        if re.match('.*?/pdf.*?', mime):
            self.logger.info("Pdf parser found!")
            return PdfParser(content, url, url.split('/')[-1])
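
The selection logic can be tried in isolation before editing parser_app.py. The following is a minimal sketch that returns parser names instead of parser instances. Only the PDF pattern comes from the snippet above; the function name choose_parser_name, the PARSER_RULES table and the 'application/x-dummy' MIME type are hypothetical.

```python
import re

# (pattern, parser name) pairs, tried in order; the first match wins.
# Only the PDF pattern comes from the text above; the dummy entry and its
# MIME type are placeholders for illustration.
PARSER_RULES = [
    (re.compile(r'.*?/pdf.*?'), 'PdfParser'),
    (re.compile(r'application/x-dummy'), 'DummyParser'),
]


def choose_parser_name(mime):
    """Return the name of the first parser whose pattern matches `mime`."""
    for pattern, name in PARSER_RULES:
        if pattern.match(mime):
            return name
    return None
```

In "_choose_parser()" itself, the matching branch would presumably return an instance, for example DummyParser(content, url) given the constructor shown earlier.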

Create a unit test and enjoy it!
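
A unit test can start by checking the shape of the three return values. The following is a minimal sketch, assuming that the keys used in the DummyParser example are the expected ones; this is inferred from the example only, not from a documented schema. FakeParser is a hypothetical stand-in so that the sketch runs without the multivio package.

```python
def check_parser_output(parser):
    """Raise AssertionError if the parser output lacks the expected keys."""
    metadata = parser.get_metadata()
    for key in ('title', 'creator', 'language'):
        assert key in metadata, "missing metadata key: %s" % key

    for entry in parser.get_physical_structure():
        assert 'url' in entry and 'label' in entry

    for entry in parser.get_logical_structure():
        assert 'label' in entry
        position = entry['file_position']
        assert 'index' in position and 'url' in position
    return True


class FakeParser(object):
    """Stand-in with the same return values as the DummyParser example."""

    def __init__(self, url):
        self._url = url

    def get_metadata(self):
        return {'title': 'My very nice title', 'creator': ['Me', 'Others'],
                'language': 'en'}

    def get_physical_structure(self):
        return [{'url': self._url, 'label': 'My nice pdf document'}]

    def get_logical_structure(self):
        return [{'file_position': {'index': 1, 'url': self._url},
                 'label': 'Page one'}]
```

In a real test, pass an instance of your own parser to check_parser_output() instead of FakeParser.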

Client side

Prerequisites

  • the server layer must already be installed, following the procedure above

Installation

Create a client folder:

mkdir -p /var/www/multivio/client

Download the precompiled static client files and install them:

cd <TEMP_DIR>
wget http://demo.multivio.org/multivio/client_1.0.0.zip
unzip client_1.0.0.zip
cp -r client_1.0.0/* /var/www/multivio/client/

Update the Apache configuration created during the server setup to add the client configuration:

vi /etc/apache2/sites-available/multivio.conf

Add the following content to the file (note the support for English and French versions, with English as the default):

Alias /client/fr /var/www/multivio/client/multivio/fr/1.0.0
Alias /client/en /var/www/multivio/client/multivio/en/1.0.0
Alias /client /var/www/multivio/client/multivio/en/1.0.0

Restart Apache:

apache2ctl restart

The client can now be invoked at: http://localhost/client/... (or http://<YOUR_HOST_URL>/client/...). Example:

http://localhost/client/#get&url=http://doc.rero.ch/record/13255/export/xd

For a localized version, the URLs are:

http://localhost/client/fr/#get&url=http://doc.rero.ch/record/13255/export/xd

http://localhost/client/en/#get&url=http://doc.rero.ch/record/13255/export/xd (by default)
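
The URL scheme above can be captured in a small helper when links to the client are generated from another application. The following is a minimal sketch. The function name client_url is hypothetical, and the document URL is appended unescaped, exactly as in the examples.

```python
def client_url(document_url, lang=None, host="localhost"):
    """Build a Multivio client URL for `document_url`.

    `lang` may be None (default alias, English), 'en' or 'fr'.
    """
    if lang not in (None, "en", "fr"):
        raise ValueError("unsupported language: %r" % (lang,))
    prefix = "http://%s/client/" % host
    if lang is not None:
        prefix += lang + "/"
    # The document URL is appended as-is after '#get&url=', as in the examples.
    return prefix + "#get&url=" + document_url
```

For example, client_url("http://doc.rero.ch/record/13255/export/xd", lang="fr") gives the French URL shown above.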