From 2c3f1522274c09f7cfdb6309adc0719f05c188e9 Mon Sep 17 00:00:00 2001
From: "Suren A. Chilingaryan"
Date: Thu, 5 Jul 2018 06:29:09 +0200
Subject: Update monitoring scripts to track leftover OpenVSwitch 'veth' interfaces and clean them up pereodically to avoid performance degradation, split kickstart

---
 docs/consistency.txt       |  12 ++-
 docs/kickstart.txt         |  12 ++-
 docs/logs.txt              |  36 +++++++
 docs/problems.txt          | 103 ++++++++++++++++++
 docs/projects/katrindb.txt | 255 +++++++++++++++++++++++++++++++++++++++++++++
 docs/troubleshooting.txt   |  18 ++++
 6 files changed, 434 insertions(+), 2 deletions(-)
 create mode 100644 docs/logs.txt
 create mode 100644 docs/problems.txt
 create mode 100644 docs/projects/katrindb.txt

(limited to 'docs')

diff --git a/docs/consistency.txt b/docs/consistency.txt
index caaaf36..dcf311a 100644
--- a/docs/consistency.txt
+++ b/docs/consistency.txt
@@ -39,7 +39,17 @@ Networking
  - Ensure, we don't have override of cluster_name to first master (which we do during the provisioning of OpenShift plays)
-
+
+ - Sometimes OpenShift fails to clean up properly after a terminated pod. This causes rogue
+   network interfaces to remain in the OpenVSwitch fabric. This can be determined by errors like:
+        could not open network device vethb9de241f (No such device)
+   reported by 'ovs-vsctl show' or present in the log '/var/log/openvswitch/ovs-vswitchd.log',
+   which may quickly grow over 100MB. If the number of rogue interfaces grows too much,
+   pod scheduling will start to time out on the affected node.
+    * The work-around is to delete rogue interfaces with
+        ovs-vsctl del-port br0
+      This does not solve the problem, however. New interfaces will still get abandoned by OpenShift.
+
 ADEI
 ====

diff --git a/docs/kickstart.txt b/docs/kickstart.txt
index 1331542..b94b0f6 100644
--- a/docs/kickstart.txt
+++ b/docs/kickstart.txt
@@ -11,4 +11,14 @@ Troubleshooting
        dmsetup remove_all
        dmsetup remove
-
\ No newline at end of file
+ - Sometimes even this does not help.
+    > On CentOS 7.4, mdadm does not recognize the disk, but LVM thinks it is
+      part of MD. Cleaning the last megabytes of the former md partition may then help.
+    > On Fedora 28, mdadm detects the old array and tries to tear it down, but
+      fails because the raid array is already inactive.
+
+    * If the raid is still more-or-less healthy, it can be destroyed with
+        mdadm --zero-superblock /dev/sdb3
+    * Otherwise:
+        dd if=/dev/zero of=/dev/sda4 bs=512 seek=$(( $(blockdev --getsz /dev/sda4) - 1024 )) count=1024
+
diff --git a/docs/logs.txt b/docs/logs.txt
new file mode 100644
index 0000000..e27b1ff
--- /dev/null
+++ b/docs/logs.txt
@@ -0,0 +1,36 @@
+/var/log/messages
+=================
+ - Various RPC errors.
+    ... rpc error: code = # desc = xxx ...
+
+ - container kill failed because of 'container not found' or 'no such process': Cannot kill container ###: rpc error: code = 2 desc = no such process"
+   Despite the error, the containers are actually killed and the pods destroyed. However, this error likely triggers
+   the problem with rogue interfaces staying on the OpenVSwitch bridge.
+
+ - containerd: unable to save f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a starttime: read /proc/81994/stat: no such process
+   containerd: f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a (pid 81994) has become an orphan, killing it
+   Seems to be a bug in docker 1.12* which is resolved in 1.13.0rc2. No side effects according to the issue.
+   https://github.com/moby/moby/issues/28336
+
+ - W0625 03:49:34.231471 36511 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": Unexpected command output nsenter: cannot open /proc/63586/ns/net: No such file or directory
+ - W0630 21:40:20.978177 5552 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "..."
+   Probably covered by the following bug report and accordingly can be ignored...
+   https://bugzilla.redhat.com/show_bug.cgi?id=1434950
+
+ - E0630 14:05:40.304042 5552 glusterfs.go:148] glusterfs: failed to get endpoints adei-cfg[an empty namespace may not be set when a resource name is provided]
+   E0630 14:05:40.304062 5552 reconciler.go:367] Could not construct volume information: MountVolume.NewMounter failed for volume "kubernetes.io/glusterfs/4
+   I guess this is some configuration issue. It can probably be ignored.
+
+ - kernel: SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
+   There are no adverse effects to this. It is a potential kernel issue, but should be just ignored by the customer. Nothing is going to break.
+   https://bugzilla.redhat.com/show_bug.cgi?id=1425278
+
+
+ - E0625 03:59:52.438970 23953 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
+   Seems fine and can be ignored.
+
+
+/var/log/openvswitch/ovs-vswitchd.log
+=====================================
+ - bridge|WARN|could not open network device veth7d33a20f (No such device)
+   Indicates a pod-cleanup failure and may cause problems during pod scheduling.
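To get a quick picture of how many distinct interfaces the ovs-vswitchd warnings refer to, the log can be filtered for the message quoted above. This is a minimal sketch, not the monitoring script shipped with this commit: the function name is hypothetical, and it assumes the interface names always look like 'veth' followed by hex digits, as in the examples in these notes.

```shell
# Print the unique interface names that ovs-vswitchd reported as missing.
# Reads log text on stdin and relies only on the message format quoted above.
# The function name is illustrative, not part of any existing script.
rogue_ifaces_from_log() {
    sed -n 's/.*could not open network device \(veth[0-9a-f]*\) (No such device).*/\1/p' | sort -u
}
```

For example, 'rogue_ifaces_from_log < /var/log/openvswitch/ovs-vswitchd.log | wc -l' would give the number of distinct rogue interfaces mentioned in the log.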
diff --git a/docs/problems.txt b/docs/problems.txt
new file mode 100644
index 0000000..4be9dc7
--- /dev/null
+++ b/docs/problems.txt
@@ -0,0 +1,103 @@
+Actions Required
+================
+ * The long-term solution to 'rogue' interfaces is unclear. It may require an update to OpenShift 3.9 or later.
+   However, the proposed work-around should do unless the execution rate grows significantly.
+ * All other problems found in the logs can be ignored.
+
+
+Rogue network interfaces on OpenVSwitch bridge
+==============================================
+ Sometimes OpenShift fails to clean up properly after a terminated pod. The actual reason is unclear.
+  * The issue is discussed here:
+        https://bugzilla.redhat.com/show_bug.cgi?id=1518684
+  * It can be detected by looking into the output of:
+        ovs-vsctl show
+
+ Problems:
+  * As the number of rogue interfaces grows, it starts to impact performance. Operations with
+    ovs slow down, and at some point the pods scheduled to the affected node fail to start due to
+    timeouts. This is indicated in 'oc describe' as: 'failed to create pod sandbox'
+
+ Cause:
+  * Unclear, but it seems the periodic ADEI cron jobs cause the issue.
+  * Could be related to the 'container kill failed' problem explained in the section below.
+        Cannot kill container ###: rpc error: code = 2 desc = no such process
+
+
+ Solutions:
+  * According to RedHat, the temporary solution is to reboot the affected node (not tested yet). The problem
+    should go away, but may re-appear after a while.
+  * The simplest work-around is to just remove the rogue interfaces. They will be re-created, but performance
+    problems only start after hundreds accumulate.
+        ovs-vsctl del-port br0
+
+ Status:
+  * A cron job is installed which cleans up rogue interfaces once their number hits 25.
+
+
+Orphaning / pod termination problems in the logs
+================================================
+ There are several classes of problems with unknown repercussions reported in the system log.
+ Currently, I don't see any negative side effects, except that some of these issues may trigger the "rogue interfaces" problem.
+
+ ! container kill failed because of 'container not found' or 'no such process': Cannot kill container ###: rpc error: code = 2 desc = no such process"
+
+   Despite the error, the containers are actually killed and the pods destroyed. However, this error likely triggers
+   the problem with rogue interfaces staying on the OpenVSwitch bridge.
+
+   Scenario:
+    * happens with short-living containers
+
+ - containerd: unable to save f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a starttime: read /proc/81994/stat: no such process
+   containerd: f7c3e6c02cdbb951670bc7ff925ddd7efd75a3bb5ed60669d4b182e5337dec23:d5b9394468235f7c9caca8ad4d97e7064cc49cd59cadd155eceae84545dc472a (pid 81994) has become an orphan, killing it
+
+   Scenario:
+    This happens every couple of minutes and is attributed to perfectly alive and running pods.
+    * For instance, ipekatrin1 was complaining about some ADEI pod.
+    * After I removed this pod, it immediately started complaining about a 'glusterfs' replica.
+    * If the 'glusterfs' pod is re-created, the problem persists.
+    * It seems only a single pod is affected at any given moment (at least this was always true
+      on ipekatrin1 & ipekatrin2 while I was researching the problem)
+
+   Relations:
+    * This problem is not aligned with the 'container not found' problem above. That one happens with short-living containers which
+      actually get destroyed. This one is triggered for persistent containers which keep running. In fact, this problem is triggered
+      significantly more frequently.
+
+   Cause:
+    * Seems related to docker health checks due to a bug in docker 1.12* which is resolved in 1.13.0rc2
+        https://github.com/moby/moby/issues/28336
+
+   Problems:
+    * According to the discussion in the issue, the only effect seems to be extensive logging.
+
+   Solution: Ignore for now
+    * docker-1.13 had some problems with groups (I don't remember exactly) and it was decided not to run it with the current version of KaaS.
+    * Only update docker after extensive testing on the development cluster, or not at all.
+
+ - W0625 03:49:34.231471 36511 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": Unexpected command output nsenter: cannot open /proc/63586/ns/net: No such file or directory
+ - W0630 21:40:20.978177 5552 docker_sandbox.go:337] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "...": CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "..."
+   Scenario:
+    * It seems this can be ignored, see RH bug.
+    * Happens with short-living containers (adei cron jobs)
+
+   Relations:
+    * This is also not aligned with 'container not found'. The times in the logs differ significantly.
+    * It is also not aligned with the 'orphan' problem.
+
+   Cause:
+    ? https://bugzilla.redhat.com/show_bug.cgi?id=1434950
+
+ - E0630 14:05:40.304042 5552 glusterfs.go:148] glusterfs: failed to get endpoints adei-cfg[an empty namespace may not be set when a resource name is provided]
+   E0630 14:05:40.304062 5552 reconciler.go:367] Could not construct volume information: MountVolume.NewMounter failed for volume "kubernetes.io/glusterfs/4
+
+   I guess this is some configuration issue. It can probably be ignored.
+
+   Scenario:
+    * Reported on long running pods with persistent volumes (katrin, adai-db)
+    * Also seems to be an unrelated set of problems.
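The threshold-based work-around from the 'Rogue network interfaces' section can be sketched as a small filter over the 'ovs-vsctl show' output. This is only an illustration under stated assumptions, not the cron job actually installed: the function and the ROGUE_LIMIT variable are hypothetical names, the bridge is assumed to be br0 as in the commands above, and the function only prints the 'ovs-vsctl del-port' commands instead of running them.

```shell
# Limit of 25 is taken from the "Status" note; the variable name is hypothetical.
ROGUE_LIMIT=${ROGUE_LIMIT:-25}

# Read 'ovs-vsctl show' output on stdin and print one del-port command per
# rogue interface, but only once the number of rogue interfaces reaches the limit.
# Nothing is deleted here; the output is a plan that can be reviewed first.
plan_rogue_cleanup() {
    ifaces=$(sed -n 's/.*could not open network device \(veth[0-9a-f]*\) (No such device).*/\1/p' | sort -u)
    # grep -c exits non-zero when it counts 0 lines, hence the '|| true'.
    count=$(printf '%s\n' "$ifaces" | grep -c . || true)
    [ "$count" -ge "$ROGUE_LIMIT" ] || return 0
    for iface in $ifaces; do
        echo "ovs-vsctl del-port br0 $iface"
    done
}
```

On a node, 'ovs-vsctl show | plan_rogue_cleanup' would list the planned deletions, and piping that list into 'sh' would execute them. Printing the plan first keeps the destructive step explicit.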
+
+
+
+
+
diff --git a/docs/projects/katrindb.txt b/docs/projects/katrindb.txt
new file mode 100644
index 0000000..0a14a25
--- /dev/null
+++ b/docs/projects/katrindb.txt
@@ -0,0 +1,255 @@
+# Steps to setup KDB infrastructure in OpenShift
+
+Web interface: https://kaas.kit.edu:8443/console/
+
+Commandline interface:
+```
+oc login kaas.kit.edu:8443
+oc project katrin
+```
+
+
+## Overview
+
+The setup uses (at least) three containers:
+* `kdb-backend` is a MySQL/MariaDB container that provides the database backend
+  used by KDB server. It hosts the `katrin` and `katrin_run` databases.
+* `kdb-server` runs the KDB server process inside an Apache environment. It
+  provides the web interface (`kdb-admin.fcgi`) and the KaLi service
+  (`kdb-kali.fcgi`).
+* `run-processing` periodically retrieves run files from several DAQ machines
+  and adds the processed files to the KDB runlist. This process could be
+  distributed over several containers for the individual systems (`fpd` etc.)
+
+> The ADEI server hosting the `adei` MySQL database runs in an independent project with hostname `mysql.adei.svc`.
+
+A persistent storage volume is needed for the MySQL data (volume group `db`)
+and for the copied/processed run files (volume group `katrin`). The latter one
+is shared between the KDB server and run processing applications.
+
+
+## MySQL backend
+
+### Application
+
+This container is based on the official Redhat MariaDB Docker image. The
+OpenShift application is created via the CLI:
+```
+oc new-app -e MYSQL_ROOT_PASSWORD=XXX --name=kdb-backend registry.access.redhat.com/rhscl/mariadb-101-rhel7
+```
+Because KDB uses two databases (`katrin`, `katrin_run`) and must be permitted
+to create/edit database users, it is required to define a root password here.
+
+### Volumes
+
+This container needs a persistent storage volume for the database content.
+In OpenShift this is done by removing the default storage and adding a persistent
+volume `kdb-backend` for MySQL data: `db: /kdb/mysql/data -> /var/lib/mysql/data`
+
+### Final steps
+
+It makes sense to add readiness/liveness probes as well: TCP socket, port 3306.
+
+> It is possible to access the MySQL server inside a container: `mysql -h kdb-backend.katrin.svc -u root -p -A`
+
+
+## KDB server
+
+### Application
+
+The container is created from a `Dockerfile` available in GitLab:
+https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles/tree/kdbserver
+
+The app is created via the CLI, but manual changes are necessary later on:
+```
+oc new-app https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles.git --name=kdb-server
+```
+
+> The build fails because the branch name and user credentials are not defined.
+
+The build settings must be adapted before the image can be created.
+* Set the git branch name to `kdbserver`.
+* Add a source secret `katrin-gitlab` that provides the git user credentials,
+  i.e. the `katrin` username and corresponding password for read-only access.
+
+When a container instance (pod) is created in OpenShift, the main script
+`/run-httpd.sh` starts the Apache webserver with the KDB fastcgi module.
+
+### Volumes
+
+Just like the MySQL backend, the container needs persistent storage enabled: `katrin: /data -> /mnt/katrin/data`
+
+### Config Maps
+
+Some default configuration files for the Apache web server and the KDB server
+installation are provided with the Dockerfile. The webserver config should
+work correctly as it is. The main config must be updated so that the correct
+servers/databases are used. A config map `kdbserver-config` is created with
+mountpoint `/config` in the container:
+* `kdbserver.conf` is the main config for the KDB server instance.
+  For the steps outlined here, it should contain the following entries:
+
+```
+sql_server = kdb-backend.katrin.svc
+sql_adei_server = mysql.adei.svc
+
+sql_katrin_dbname = katrin
+sql_run_dbname = katrin_run
+sql_adei_dbname = adei_katrin
+
+sql_user = root
+sql_password = XXX
+sql_adei_user = katrin
+sql_adei_password = XXX
+
+use_adei_cache = true
+adei_service_url = http://adei-katrin.kaas.kit.edu/adei
+adei_public_url = http://katrin.kit.edu/adei-katrin
+```
+* `log4cxx.properties` defines the terminal/logfile output settings. By default,
+  all log output is shown on `stdout` (and visible in the OpenShift log).
+
+> Files in `/config` are symlinked to the respective files inside the container by `/run-httpd.sh`.
+
+### Database setup
+
+The KDB server sources provide a SQL dump file to initialize the database. To
+create an empty database with all necessary tables, run the `mysql` command:
+```
+mysql -h kdb-backend.katrin.svc -u root -p < /src/kdbserver/Data/katrin-db.sql
+```
+
+Alternatively, a full backup of the existing database can be imported:
+```
+tar -xJf /src/kdbserver/Data/katrin-db-bkp.sql.xz -C /tmp
+mysql -h kdb-backend.katrin.svc -u root -p < /tmp/katrin-db-bkp.sql
+```
+
+> To clean a database table, execute a MySQL `drop table` statement and re-initialize the dropped tables from the `katrin-db.sql` file.
+
+### IDLE storage
+
+IDLE provides local storage on the server-side file system. An empty IDLE
+repository with default datasets is created by executing this command:
+```
+/opt/kasper/bin/idle SetupPublicDatasets
+```
+
+This creates a directory `.../storage/idle/KatrinIdle` on the storage volume
+that can be filled with contents from a backup archive. The `oc rsync` command
+can be used to transfer files to a running container (pod) in OpenShift.
+
+> After restoring, one should fix all permissions so that KDB can access the data.
+
+
+
+### Final steps
+
+Again a readiness/liveness probe can be added: TCP socket, port 80.
+
+To make the KDB server interface accessible to the outside, a route must be
+added in OpenShift: `http://kdb.kaas.kit.edu -> kdb-server:80`
+
+> The web interface is now available at http://kdb.kaas.kit.edu/kdb-admin.fcgi
+
+
+## Run processing
+
+### Application
+
+The setup for the run processing service is similar to the KDB server, with
+the container being created from a GitLab `Dockerfile` as well:
+https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles/tree/inlineprocessing
+
+The app is created via the CLI, but manual changes are necessary later on:
+```
+oc new-app https://nuserv.uni-muenster.de:8443/katrin-git/Dockerfiles.git --name=run-processing
+```
+
+> The build fails because the branch name and user credentials are not defined.
+
+The build settings must be adapted before the image can be created.
+* Set the git branch name to `inlineprocessing`.
+* Use the source secret `katrin-gitlab` that was created before.
+
+#### Run environment
+
+When a container instance (pod) is created in OpenShift, the main script
+`/run-loop.sh` starts the main processing script `process-system.py`. It
+is executed in a continuous loop with a user-defined delay. The script
+is configured by the following environment variables that can be defined
+in the OpenShift configuration:
+* `PROCESS_SYSTEMS` defines one or more DAQ systems configured in the file
+  `ProcessingConfig.py`: `fpd`, `mos`, etc.
+* `PROCESS_FLAGS` defines additional options passed to the script, e.g.
+  `--pull` to automatically retrieve run files from configured DAQ machines.
+* `REFRESH_INTERVAL` defines the waiting time between consecutive executions.
+  Note that the `/run-loop.sh` script waits until `process-system.py` has finished
+  before the next loop iteration is started, so the delay time is always
+  included regardless of how long the script takes to process all files.
+
+### Volumes
+
+The run processing stores files that need to be accessible by the KDB server
+application.
+Hence, the same persistent volume is used in this container:
+`katrin: data -> /mnt/katrin/data`
+
+To ensure that all processes can read/write correctly, the file permissions are
+relaxed (this can be done in an OpenShift terminal or remote shell):
+```
+mkdir -p /mnt/katrin/data/{inbox,archive,storage,workspace,logs,tmp}
+chown -R katrin: /mnt/katrin/data
+chmod -R ug+rw /mnt/katrin/data
+```
+
+### Config Maps
+
+Just like with the KDB server, a config map `run-processing-config` with
+mountpoint `/config` should be added, which defines the configuration of the
+processing script:
+* `ProcessingConfig.py` is the main config where the DAQ machines are defined
+  with their respective storage paths. The file also defines a list of
+  processing steps to be executed for each run file; these steps may have
+  to be adapted where necessary.
+* `datamanager.cfg` defines the interface to the KaLi web service. It must be
+  configured so that the KDB server instance from above is used:
+
+```
+url = http://kdb-server.katrin.svc/kdb-kali.fcgi
+user = katrin
+password = XXX
+timeout_seconds = 300
+cache_age_hours = -1
+```
+* `rsync-filter` is applied with the `rsync` command that copies run files
+  from the DAQ machines. It can be adapted to exclude certain directories,
+  e.g. old run files that do not need to be processed.
+* `log4cxx.properties` configures terminal/logfile output, see above.
+
+> Files in `/config` are symlinked to the respective files inside the container by `/run-loop.sh`.
+
+#### SSH keys
+
+A second config map `run-processing-ssh` is required to provide SSH keys that
+are used to authenticate remote connections to the DAQ machines. The map with
+mountpoint `/.ssh` should contain the files `id_dsa`, `id_dsa.pub` and
+`known_hosts` and must be adapted as necessary.
+
+> This assumes that the SSH credentials have been added to the respective machines beforehand!
+
+> The contents of `known_hosts` should be updated with the output of `ssh-keyscan` for the configured DAQ machines.
+
+### Notes
+
+The script `/run-loop.sh` pulls files from the DAQ machines and processes
+them automatically, newest first. Where necessary, run files can be copied
+manually (FPD example; adapt the options and `rsync-filter` file as required):
+```
+rsync -rltD --verbose --append-verify --partial --stats --compare-dest=/mnt/katrin/data/archive/FPDComm_530 --filter='. /opt/processing/system/rsync-filter' --log-file='/mnt/katrin/data/logs/rsync_fpd.log' katrin@192.168.110.76:/Volumes/DAQSTORAGE/data/ /mnt/katrin/data/inbox/FPDComm_530
+```
+
+If runs were not processed correctly, one can trigger manual reprocessing
+from an OpenShift terminal (with run numbers `START`, `END` as necessary):
+```
+./process-system.py -s fpd -r START END
+```
+
diff --git a/docs/troubleshooting.txt b/docs/troubleshooting.txt
index ae43c52..9fa6f91 100644
--- a/docs/troubleshooting.txt
+++ b/docs/troubleshooting.txt
@@ -134,6 +134,22 @@ etcd (and general operability)
 pods (failed pods, rogue namespaces, etc...)
 ====
+ - Pod scheduling may fail on one (or more) of the nodes after a long wait, with 'oc logs' reporting
+   a timeout. 'oc describe' reports 'failed to create pod sandbox'. This can be caused by a failure to clean up
+   properly after a terminated pod, which leaves rogue network interfaces in the OpenVSwitch fabric.
+    * This can be determined by errors reported by 'ovs-vsctl show' or present in the log '/var/log/openvswitch/ovs-vswitchd.log',
+      which may quickly grow over 100MB.
+        could not open network device vethb9de241f (No such device)
+    * The work-around is to delete rogue interfaces with
+        ovs-vsctl del-port br0
+      More info:
+        ovs-ofctl -O OpenFlow13 show br0
+        ovs-ofctl -O OpenFlow13 dump-flows br0
+      This does not solve the problem, however. New interfaces will still get abandoned by OpenShift.
+    * The issue is discussed here:
+        https://bugzilla.redhat.com/show_bug.cgi?id=1518684
+        https://bugzilla.redhat.com/show_bug.cgi?id=1518912
+
  - After crashes / upgrades some pods may end up in 'Error' state. This is quite often happen to
     * kube-service-catalog/controller-manager
     * openshift-template-service-broker/api-server
@@ -185,6 +201,8 @@ pods (failed pods, rogue namespaces, etc...)
         docker ps -aq --no-trunc | xargs docker rm


+
+
 Builds
 ======
  - After changing storage for integrated docker registry, it may refuse builds with HTTP error 500. It is necessary
--
cgit v1.2.1