Welcome to Pacifica Cartd’s documentation!

The Pacifica Cartd service provides data staging and bundling for user consumption of data.

Installation

The Pacifica software is available through PyPi so creating a virtual environment to install is what is shown below. Please keep in mind compatibility with the Pacifica Core services.

Installation in Virtual Environment

These installation instructions are intended to work on both Windows, Linux, and Mac platforms. Please keep that in mind when following the instructions.

Please install the appropriate tested version of Python for maximum chance of success.

Linux and Mac Installation

mkdir ~/.virtualenvs
python -m virtualenv ~/.virtualenvs/pacifica
. ~/.virtualenvs/pacifica/bin/activate
pip install pacifica-cartd

Windows Installation

This is done using PowerShell. Please do not use Batch Command.

mkdir "$Env:LOCALAPPDATA\virtualenvs"
python.exe -m virtualenv "$Env:LOCALAPPDATA\virtualenvs\pacifica"
& "$Env:LOCALAPPDATA\virtualenvs\pacifica\Scripts\activate.ps1"
pip install pacifica-cartd

Configuration

The Pacifica Core services require two configuration files. The REST API utilizes CherryPy and review of their configuration documentation is recommended. The service configuration file is a INI formatted file containing configuration for database connections.

CherryPy Configuration File

An example of Cartd server CherryPy configuration:

[global]
log.screen: True
log.access_file: 'access.log'
log.error_file: 'error.log'
server.socket_host: '0.0.0.0'
server.socket_port: 8081

[/]
request.dispatch: cherrypy.dispatch.MethodDispatcher()
tools.response_headers.on: True
tools.response_headers.headers: [('Content-Type', 'application/json')]

Service Configuration File

The service configuration is an INI file and an example is as follows:

[cartd]
; This section describes cartd specific configuration

; Local directory to stage data
volume_path = /tmp/

; Least recently used buffer time
lru_buffer_time = 0

; Bundle backend task enable/disable
bundle_task = True

[archiveinterface]
; This section describe where the archive interface is

; URL to the archive interface
url = http://127.0.0.1:8080/

[celery]
; This section describe celery task configuration

; Broker message url
broker_url = pyamqp://

; Backend task channel
backend_url = rpc://

[database]
; This section contains database connection configuration

; peewee_url is defined as the URL PeeWee can consume.
; http://docs.peewee-orm.com/en/latest/peewee/database.html#connecting-using-a-database-url
peewee_url = sqliteext:///db.sqlite3

; connect_attempts are the number of times the service will attempt to
; connect to the database if unavailable.
connect_attempts = 10

; connect_wait are the number of seconds the service will wait between
; connection attempts until a successful connection to the database.
connect_wait = 20

Starting the Service

Starting the Cartd service can be done by two methods. However, understanding the requirements and how they apply to REST services is important to address as well. Using the internal CherryPy server to start the service is recommended for Windows platforms. For Linux/Mac platforms it is recommended to deploy the service with uWSGI.

Deployment Considerations

The Cartd service stages data for consumption by data users. This service (like Ingest) should be put on the edge of your infrastructure to allow for fast access. Other considerations about data transfers over these networks should also be considerred. ESNet has some good documentation on how to optimize Linux for fast data transfers.

CherryPy Server

To make running the Cartd service using the CherryPy’s builtin server easier we have a command line entry point.

$ pacifica-cartd --help
usage: pacifica-cartd [-h] [-c CONFIG] [--cpconfig CONFIG] [-p PORT]
                      [-a ADDRESS]

Run the cart server.

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        cart config file
  --cpconfig CONFIG     cherrypy config file
  -p PORT, --port PORT  port to listen on
  -a ADDRESS, --address ADDRESS
                        address to listen on
$ pacifica-cartd-cmd dbsync
$ pacifica-cartd
[09/Jan/2019:09:17:26] ENGINE Listening for SIGTERM.
[09/Jan/2019:09:17:26] ENGINE Bus STARTING
[09/Jan/2019:09:17:26] ENGINE Set handler for console events.
[09/Jan/2019:09:17:26] ENGINE Started monitor thread 'Autoreloader'.
[09/Jan/2019:09:17:26] ENGINE Serving on http://0.0.0.0:8081
[09/Jan/2019:09:17:26] ENGINE Bus STARTED

uWSGI Server

To make running the Cartd service using uWSGI easier we have a module to be included as part of the uWSGI configuration. uWSGI is very configurable and can use this module many different ways. Please consult the uWSGI Configuration documentation for more complicated deployments.

$ pip install uwsgi
$ uwsgi --http-socket :8081 --master --module pacifica.cartd.wsgi

Example Usage

Every cart has a unique ID associated with it. For the examples following we used a uuid generated by standard Linux utilities.

MY_CART_UUID=`uuidgen`

The REST API

The REST API is available for users of the system and is in general a method based endpoint with JSON objects for data.

Create a Cart

Post a file to create a new cart.

Contents of file (foo.json).

id = the id being used on the Archive

path = internal structure of bundle for file placement

hashtype = hashlib hashtype used to generate hashsum

hashsum = the hash (hex value) of the file using the hashtype listed

{
  "fileids": [
    {"id":"foo.txt", "path":"1/2/3/foo.txt", "hashtype":"md5", "hashsum":""},
    {"id":"bar.csv", "path":"1/2/3/bar.csv", "hashtype":"md5", "hashsum":""},
    {"id":"baz.ini", "path":"2/3/4/baz.ini", "hashtype":"md5", "hashsum":""}
  ]
}

Post the file to the following URL.

curl -X POST --upload-file /tmp/foo.json http://127.0.0.1:8081/$MY_CART_UUID

Status a Cart

Head on the cart to find whether its created and ready for download.

curl -I -X HEAD http://127.0.0.1:8081/$MY_CART_UUID

Will receive headers back with the specific data needed. These are:

‘X-Pacifica-Status’ ‘X-Pacifica-Message’

Message will be blank if there is no error. The list of possible status:

If the cart is waiting to be processed and there is no current state. “X-Pacifica-Status”: “waiting”

If the cart is being processed and waiting for files to be staged locally. “X-Pacifica-Status”: “staging”

If the cart has the files locally and is currently creating the tarfile. “X-Pacifica-Status”: “bundling”

If the cart is finally ready for download. “X-Pacifica-Status”: “ready”

If the cart has an error (such as no space available to create the tarfile). “X-Pacifica-Status”: “error” “X-Pacifica-Message”: “No Space Available”

Get a cart

To download the tarfile for the cart.

curl http://127.0.0.1:8081/$MY_CART_UUID?filename=my_cart.tar

In the above url my_cart.tar can be any file name of your choice
If no filename parameter is present you will get back data_date.tar in the form data_YYYY_MM_DD_HH_MM_SS.tar

To save to file

curl -O -J http://127.0.0.1:8081/$MY_CART_UUID?filename=my_cart.tar

-O says to save to a file, and -J says to use the Content-Disposition file name the server is trying to send back

Once this finishes there will be a tar file named my_cart.tar
Untar by:

tar xf my_cart.tar

Delete a Cart

Delete a created cart.

curl -X DELETE http://127.0.0.1:8081/$MY_CART_UUID

Data returned should be json telling you status of cart deletion.

Admin Commands

There is some interfaces to the internals of the carts via the admin command line interface pacifica-cartd-cmd.

Database Management

The command line interface has a couple of database management commands to verify the state of the database and whether it needs updating.

To check the current state of the database, run the following:

pacifica-cartd-cmd dbchk
echo $?

To update the database to the current version, run the following:

pacifica-cartd-cmd dbsync
echo $?

If either of these commands fail you may have issues connecting to the database configured. Be sure you are using the right configuration files for connecting to your database.

Rebuild or Fix a Cart

Sometimes carts will fail to build, build partially, or maybe you just want to handle cart building out of band. The command line interface has a way to do this.

pacifica-cartd-cmd fixit --cartid $MY_CART_UUID

Purge Old Carts that are older than a specific date.

Sometimes carts dont get cleaned up by users, and need to be expired. The command line interface has a way to do this.

pacifica-cartd-cmd purge --time-ago="60 days ago"

Cartd Python Module

Archive Requests Python Module

Configuration Python Module

Globals Python Module

ORM Python Module

REST Python Module

Celery Tasks Python Module

Utilities Python Module

WSGI Python Module

Indices and tables