# Steps to set up KDB infrastructure in OpenShift

Web interface: https://kaas.kit.edu:8443/console/

Command line interface:
```
oc login kaas.kit.edu:8443
oc project katrin
```

## Overview

The setup uses (at least) three containers:

* `kdb-backend` is a MySQL/MariaDB container that provides the database backend used by the KDB server. It hosts the `katrin` and `katrin_run` databases.
* `kdb-server` runs the KDB server process inside an Apache environment. It provides the web interface (`kdb-admin.fcgi`) and the KaLi service (`kdb-kali.fcgi`).
* `run-processing` periodically retrieves run files from several DAQ machines and adds the processed files to the KDB runlist. This process could be distributed over several containers for the individual systems (`fpd` etc.).

> The ADEI server hosting the `adei` MySQL database runs in an independent project with hostname `mysql.adei.svc`.

A persistent storage volume is needed for the MySQL data (volume group `db`) and for the copied/processed run files (volume group `katrin`). The latter is shared between the KDB server and run processing applications.

## MySQL backend

### Application

This container is based on the official Red Hat MariaDB Docker image. The OpenShift application is created via the CLI:
```
oc new-app -e MYSQL_ROOT_PASSWORD=XXX --name=kdb-backend registry.access.redhat.com/rhscl/mariadb-101-rhel7
```

Because KDB uses two databases (`katrin`, `katrin_run`) and must be permitted to create/edit database users, a root password must be defined here.

### Volumes

This container needs a persistent storage volume for the database content. In OpenShift this is done by removing the default storage and adding a persistent volume `kdb-backend` for the MySQL data:

`db: /kdb/mysql/data -> /var/lib/mysql/data`

### Final steps

It makes sense to add readiness/liveness probes as well: TCP socket, port 3306.

> It is possible to access the MySQL server inside a container: `mysql -h kdb-backend.katrin.svc -u root -p -A`

## KDB server

### Application

The container is created from a `Dockerfile` available in GitLab: https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles/tree/kdbserver

The app is created via the CLI, but manual changes are necessary later on:
```
oc new-app https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles.git --name=kdb-server
```

> The build fails at first because the branch name and user credentials are not defined. The build settings must be adapted before the image can be created:

* Set the git branch name to `kdbserver`.
* Add a source secret `katrin-gitlab` that provides the git user credentials, i.e. the `katrin` username and corresponding password for read-only access.

When a container instance (pod) is created in OpenShift, the main script `/run-httpd.sh` starts the Apache webserver with the KDB fastcgi module.

### Volumes

Just like the MySQL backend, the container needs persistent storage enabled (a CLI sketch follows below):

`katrin: /data -> /mnt/katrin/data`
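Volumes can be attached in the web console or from the CLI with `oc set volume`. A minimal sketch, assuming the persistent volume claim is named `katrin` as in the mapping above:

```
# Attach the shared run file volume to the kdb-server deployment config
# (claim name and volume name are assumptions, adapt as necessary).
oc set volume dc/kdb-server --add --name=katrin-data \
    --type=persistentVolumeClaim --claim-name=katrin \
    --mount-path=/mnt/katrin/data
```

The MySQL volume from the previous section can be attached analogously, e.g. with `--claim-name=kdb-backend` and `--mount-path=/var/lib/mysql/data`, after the default storage has been removed.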
### Config Maps

Some default configuration files for the Apache web server and the KDB server installation are provided with the Dockerfile. The webserver config should work correctly as it is; the main config must be updated so that the correct servers/databases are used.

A config map `kdbserver-config` is created with mountpoint `/config` in the container:

* `kdbserver.conf` is the main config for the KDB server instance. For the steps outlined here, it should contain the following entries:
```
sql_server = kdb-backend.katrin.svc
sql_adei_server = mysql.adei.svc
sql_katrin_dbname = katrin
sql_run_dbname = katrin_run
sql_adei_dbname = adei_katrin
sql_user = root
sql_password = XXX
sql_adei_user = katrin
sql_adei_password = XXX
use_adei_cache = true
adei_service_url = http://adei-katrin.kaas.kit.edu/adei
adei_public_url = http://katrin.kit.edu/adei-katrin
```
* `log4cxx.properties` defines the terminal/logfile output settings. By default, all log output is shown on `stdout` (and visible in the OpenShift log).

> Files in `/config` are symlinked to the respective files inside the container by `/run-httpd.sh`.

### Database setup

The KDB server sources provide a SQL dump file to initialize the database. To create an empty database with all necessary tables, run the `mysql` command:
```
mysql -h kdb-backend.katrin.svc -u root -p < /src/kdbserver/Data/katrin-db.sql
```

Alternatively, a full backup of the existing database can be imported:
```
tar -xJf /src/kdbserver/Data/katrin-db-bkp.sql.xz -C /tmp
mysql -h kdb-backend.katrin.svc -u root -p < /tmp/katrin-db-bkp.sql
```

> To clean a database table, execute a MySQL `drop table` statement and re-initialize the dropped tables from the `katrin-db.sql` file.

### IDLE storage

IDLE provides a local storage on the server-side file system. An empty IDLE repository with default datasets is created by executing this command:
```
/opt/kasper/bin/idle SetupPublicDatasets
```

This creates a directory `.../storage/idle/KatrinIdle` on the storage volume that can be filled with contents from a backup archive. The `oc rsync` command can be used to transfer files to a running container (pod) in OpenShift.

> After restoring, all permissions should be fixed so that KDB can access the data.

### Final steps

Again, a readiness/liveness probe can be added: TCP socket, port 80.

To make the KDB server interface accessible from the outside, a route must be added in OpenShift: `http://kdb.kaas.kit.edu -> kdb-server:80`

> The web interface is now available at http://kdb.kaas.kit.edu/kdb-admin.fcgi

## Run processing

### Application

The setup for the run processing service is similar to the KDB server, with the container being created from a GitLab `Dockerfile` as well: https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles/tree/inlineprocessing

The app is created via the CLI, but manual changes are necessary later on:
```
oc new-app https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles.git --name=run-processing
```

> The build fails at first because the branch name and user credentials are not defined. The build settings must be adapted before the image can be created:

* Set the git branch name to `inlineprocessing`.
* Use the source secret `katrin-gitlab` that was created before.

#### Run environment

When a container instance (pod) is created in OpenShift, the main script `/run-loop.sh` starts the main processing script `process-system.py`, which is executed in a continuous loop with a user-defined delay. The script is configured by the following environment variables, which can be defined in the OpenShift configuration (a CLI example follows after this list):

* `PROCESS_SYSTEMS` defines one or more DAQ systems configured in the file `ProcessingConfig.py`: `fpd`, `mos`, etc.
* `PROCESS_FLAGS` defines additional options passed to the script, e.g. `--pull` to automatically retrieve run files from the configured DAQ machines.
* `REFRESH_INTERVAL` defines the waiting time between consecutive executions.

Note that `/run-loop.sh` waits until `process-system.py` has finished before the next loop iteration is started, so the delay is always added on top of however long the script takes to process all files.
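These variables can also be set on the deployment configuration from the CLI with `oc set env`. A minimal sketch; the values are placeholders, and the unit expected by `REFRESH_INTERVAL` is defined by `/run-loop.sh`:

```
# Set the processing environment on the run-processing deployment config
# (example values only, adapt to the systems that should be processed).
oc set env dc/run-processing \
    PROCESS_SYSTEMS=fpd \
    PROCESS_FLAGS=--pull \
    REFRESH_INTERVAL=600
```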
### Volumes

The run processing stores files that need to be accessible by the KDB server application. Hence, the same persistent volume is used in this container:

`katrin: /data -> /mnt/katrin/data`

To ensure that all processes can read/write correctly, the file permissions are relaxed (this can be done in an OpenShift terminal or remote shell):
```
mkdir -p /mnt/katrin/data/{inbox,archive,storage,workspace,logs,tmp}
chown -R katrin: /mnt/katrin/data
chmod -R ug+rw /mnt/katrin/data
```

### Config Maps

Just like with the KDB server, a config map `run-processing-config` with mountpoint `/config` should be added, which defines the configuration of the processing script:

* `ProcessingConfig.py` is the main config where the DAQ machines are defined with their respective storage paths. The file also defines a list of processing steps to be executed for each run file; these steps may have to be adapted where necessary.
* `datamanager.cfg` defines the interface to the KaLi web service. It must be configured so that the KDB server instance from above is used:
```
url = http://kdb-server.katrin.svc/kdb-kali.fcgi
user = katrin
password = XXX
timeout_seconds = 300
cache_age_hours = -1
```
* `rsync-filter` is applied with the `rsync` command that copies run files from the DAQ machines. It can be adapted to exclude certain directories, e.g. old run files that do not need to be processed.
* `log4cxx.properties` configures the terminal/logfile output, see above.

> Files in `/config` are symlinked to the respective files inside the container by `/run-loop.sh`.

#### SSH keys

A second config map `run-processing-ssh` is required to provide the SSH keys that are used to authenticate remote connections to the DAQ machines. The map with mountpoint `/.ssh` should contain the files `id_dsa`, `id_dsa.pub` and `known_hosts` and must be adapted as necessary.

> This assumes that the SSH credentials have been added to the respective machines beforehand!

> The contents of `known_hosts` should be updated with the output of `ssh-keyscan` for the configured DAQ machines.

### Notes

The script `/run-loop.sh` pulls files from the DAQ machines and processes them automatically, newest first. Where necessary, run files can be copied manually (FPD example; adapt the options and the `rsync-filter` file as required):
```
rsync -rltD --verbose --append-verify --partial --stats \
    --compare-dest=/mnt/katrin/data/archive/FPDComm_530 \
    --filter='. /opt/processing/system/rsync-filter' \
    --log-file='/mnt/katrin/data/logs/rsync_fpd.log' \
    katrin@192.168.110.76:/Volumes/DAQSTORAGE/data/ \
    /mnt/katrin/data/inbox/FPDComm_530
```

If runs were not processed correctly, manual reprocessing can be triggered from an OpenShift terminal (with run numbers `START`, `END` as necessary):
```
./process-system.py -s fpd -r START END
```
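For reference, a sketch of how such a terminal session could look; the pod name and run numbers are examples, and the script location `/opt/processing/system` is an assumption inferred from the `rsync-filter` path above:

```
oc get pods                                  # look up the current run-processing pod name
oc rsh run-processing-12-abcde               # open a remote shell (pod name is an example)
cd /opt/processing/system                    # assumed script directory, see rsync-filter path
./process-system.py -s fpd -r 61000 61005    # reprocess an example run number range
```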