author     Suren A. Chilingaryan <csa@suren.me>  2018-03-11 19:56:38 +0100
committer  Suren A. Chilingaryan <csa@suren.me>  2018-03-11 19:56:38 +0100
commit     f3c41dd13a0a86382b80d564e9de0d6b06fb1dbf (patch)
tree       3522ce77203da92bb2b6f7cfa2b0999bf6cc132c /docs/managment.txt
parent     6bc3a3ac71e11fb6459df715536fec373c123a97 (diff)
Various fixes before moving to hardware installation
Diffstat (limited to 'docs/managment.txt')
-rw-r--r--   docs/managment.txt   166
1 files changed, 166 insertions, 0 deletions
diff --git a/docs/managment.txt b/docs/managment.txt
new file mode 100644
index 0000000..1eca8a8
--- /dev/null
+++ b/docs/managment.txt
@@ -0,0 +1,166 @@
+DOs and DONTs
+=============
+ Here we discuss the things we should do and the things we should not do.
+
+ - Scaling up the cluster is normally painless. Both nodes and masters can be added
+   quickly and without much trouble afterwards.
+
+ - The upgrade procedure may cause problems. The main trouble is that many pods are
+   configured to use the 'latest' tag, and the latest versions come with the latest problems
+   (some of the tags can be pinned to an actual version, but finding out what is broken and
+   why takes a lot of effort)...
+   * Currently, there are problems if 'kube-service-catalog' is updated (see the discussion
+     in docs/upgrade.txt). While it seems nothing really changes, the connection between
+     apiserver and etcd breaks down (at least for health checks). The installation remains
+     pretty much usable, but not in a healthy state. This particular update is blocked by
+     setting
+        openshift_enable_service_catalog: false
+     Then, the catalog is left in the 'Error' state, but can be easily recovered by deleting
+     the pod and allowing the system to re-create a new one.
+   * However, as the cause is unclear, it is possible that something else will break as time
+     passes and new images are released. It is ADVISED to check the upgrade in staging first.
+   * During the upgrade other system pods may also get stuck in the 'Error' state (as explained
+     in troubleshooting) and block the flow of the upgrade. Just delete them and allow the
+     system to re-create them to continue.
+   * After the upgrade, it is necessary to verify that all pods are operational and to
+     restart the ones in the 'Error' state (see the example below).
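+   * A minimal recovery sketch (the namespace matches the case above; the exact pod name is
+     hypothetical and should be taken from the listing):
+        oc -n kube-service-catalog get pods
+        oc -n kube-service-catalog delete pod apiserver-xxxxx
+     The controlling object (daemonset/deployment) will re-create the pod automatically.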
+
+ - Re-running the install will break on heketi. And it will DESTROY the heketi topology!
+   DON'T DO IT! Instead, the individual components can be re-installed separately.
+   * For instance, to reinstall 'openshift-ansible-service-broker' use
+       openshift-install-service-catalog.yml
+   * There is a way to prevent the plays from touching heketi; we need to define
+       openshift_storage_glusterfs_is_missing: False
+       openshift_storage_glusterfs_heketi_is_missing: False
+     But I am not sure whether heketi is the only major issue (see the sketch below).
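+   * A hedged sketch: these variables could be placed with the other inventory group variables,
+     e.g. in 'group_vars/OSEv3.yml' (the exact file name and layout are assumptions about this
+     setup, adjust to the actual inventory structure):
+        openshift_storage_glusterfs_is_missing: False
+        openshift_storage_glusterfs_heketi_is_missing: False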
+
+ - A few administrative tools can cause trouble. Don't run:
+   * oc adm diagnostics
+
+
+Failures / Immediate
+====================
+ - We need to remove the failed node from the etcd cluster:
+      etcdctl3 --endpoints="192.168.213.1:2379" member list
+      etcdctl3 --endpoints="192.168.213.1:2379" member remove <hexid>
+
+ - Further, the following is required on all remaining nodes if the node is gone for good
+   (a hedged end-to-end sketch follows this list):
+   * Delete the node:
+       oc delete node <node_name>
+   * Remove it also from ETCD_INITIAL_CLUSTER in /etc/etcd.conf on all nodes.
+   * Remove the failed node from the 'etcdClientInfo' section in /etc/origin/master/master-config.yaml
+     and restart the API service:
+       systemctl restart origin-master-api.service
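+ - A hedged end-to-end sketch of the cleanup (the node name and member id are hypothetical;
+   the file edits are only indicated as comments):
+      etcdctl3 --endpoints="192.168.213.1:2379" member remove deadbeef12345678
+      oc delete node ipeshift3
+      # on the remaining etcd nodes: drop the failed member from ETCD_INITIAL_CLUSTER in /etc/etcd.conf
+      # on the masters: drop it from the 'etcdClientInfo' section of /etc/origin/master/master-config.yaml
+      systemctl restart origin-master-api.service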
+
+Scaling / Recovery
+==================
+ - One important point:
+   * If we lost the data on a storage node, it should be re-added under a different name (otherwise
+     the GlusterFS recovery would be significantly more complicated).
+   * If the Gluster bricks are preserved, we may keep the name. I have not tried it, but according to
+     the documentation it should be possible to reconnect the node and synchronize. Still, it may be
+     easier to use a new name to simplify the procedure.
+   * Simple OpenShift nodes may be re-added with the same name, no problem.
+
+ - Next we need to perform all the preparation steps (the --limit should not be applied as we normally
+   need to update CentOS on all nodes to synchronize software versions, list all nodes in the /etc/hosts
+   files, etc.):
+      ./setup.sh -i staging prepare
+
+ - OpenShift scaling is provided as several ansible plays (scale-masters, scale-nodes, scale-etcd);
+   a hedged invocation sketch follows this list.
+   * Running 'masters' will also install the configured 'nodes' and 'etcd' daemons.
+   * I guess running 'nodes' will also handle the 'etcd' daemons, but I have not checked.
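+   * A hedged invocation sketch, assuming setup.sh accepts the play names the same way as
+     'prepare' (not verified against the script):
+        ./setup.sh -i staging prepare
+        ./setup.sh -i staging scale-nodes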
+
+Problems
+--------
+ - There should be no problems if a simple node crashes, but things may go wrong if one of the
+   masters crashes. And things will definitely go wrong if the complete cluster is cut from power.
+   * Some pods will be stuck pulling images. This happens if the node running docker-registry has crashed
+     and persistent storage was not used to back the registry. It can be fixed by re-scheduling the build
+     and rolling out the latest version from the dc:
+        oc -n adei start-build adei
+        oc -n adei rollout latest mysql
+     OpenShift will trigger the rollout automatically after some time, but it will take a while. The builds,
+     it seems, have to be started manually (a generic check to spot stuck pods is sketched below).
+   * In case of a long outage some CronJobs will stop executing. The reason is a protection against
+     excessive load combined with missing defaults. The fix is easy: just set how much time the OpenShift
+     scheduler allows a CronJob to start late before considering it failed:
+        oc -n adei patch cronjob/adei-autogen-update --patch '{ "spec": {"startingDeadlineSeconds": 10 }}'
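+   * A generic check to spot pods stuck on image pulls or in the 'Error' state (standard oc usage,
+     nothing specific to this setup):
+        oc get pods --all-namespaces -o wide | grep -E 'Error|ImagePullBackOff|ErrImagePull'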
+
+ - If we forgot to remove the old host from the etcd cluster, the OpenShift node will be configured,
+   but etcd will not be installed. We then need to remove the node as explained above and re-run the
+   etcd scale play.
+   * On multiple occasions, the etcd daemon has failed after a reboot and needed to be restarted manually.
+     If half of the daemons are down, 'oc' will block (a health-check sketch follows).
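+   * A minimal health-check / restart sketch (assuming etcd runs as the 'etcd' systemd service;
+     containerized setups use a different unit name, and etcdctl3 is assumed to accept the same
+     v3 subcommands as used above):
+        etcdctl3 --endpoints="192.168.213.1:2379" endpoint health
+        systemctl restart etcd
+        systemctl status etcd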
+
+
+
+Storage / Recovery
+==================
+ - Furthermore, it is necessary to deploy glusterfs on the new storage nodes. This is not performed
+   automatically by the scale plays. The 'glusterfs' play should be executed with additional options
+   specifying that we are just re-configuring nodes (a hedged sketch follows). We can check whether all
+   pods are serviced with
+      oc -n glusterfs get pods -o wide
+   Both the OpenShift and etcd clusters should be in a proper state before running this play. Fixing and
+   re-running should not be an issue.
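+   One possible way to run it (the playbook path is version-dependent and the inventory location is
+   an assumption; the variables are the ones mentioned in the DOs and DONTs section):
+      ansible-playbook -i <inventory> playbooks/byo/openshift-glusterfs/config.yml \
+          -e openshift_storage_glusterfs_is_missing=False \
+          -e openshift_storage_glusterfs_heketi_is_missing=False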
+
+ - More details:
+ https://docs.openshift.com/container-platform/3.7/day_two_guide/host_level_tasks.html
+
+
+Heketi
+------
+ - With heketi things are straightforward: we need to mark the node broken. Then heketi will automatically
+   move its bricks to other servers (as it sees fit).
+   * Accessing heketi:
+      heketi-cli -s http://heketi-storage-glusterfs.openshift.suren.me --user admin --secret "$(oc get secret heketi-storage-admin-secret -n glusterfs -o jsonpath='{.data.key}' | base64 -d)"
+   * Getting the required ids:
+      heketi-cli topology info
+   * Removing the node:
+      heketi-cli node info <failed_node_id>
+      heketi-cli node disable <failed_node_id>
+      heketi-cli node remove <failed_node_id>
+   * That's it. A few self-healing daemons are running which should bring the volumes in order automatically.
+   * The node will still persist in the heketi topology as failed, but will not be used ('node delete' could
+     potentially remove it, but it is failing).
+
+ - One problem with heketi: it may start volumes before the bricks get ready. Consequently, it may run
+   volumes with several bricks offline. This should be checked and fixed by restarting the affected
+   volumes (see the sketch below).
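+   A hedged check/fix sketch using standard gluster commands (run inside one of the gluster pods,
+   e.g. via 'oc -n glusterfs rsh <gluster_pod>'; the volume name is a placeholder):
+      gluster volume status                  # bricks showing 'N' in the Online column are down
+      gluster volume start <volume> force    # force-start restarts the missing brick processes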
+
+KaaS Volumes
+------------
+ There are two modes.
+ - If we migrated to a new server, we need to migrate the bricks (force is required because
+   the source brick is dead and the data can't be copied):
+      gluster volume replace-brick <volume> <src_brick> <dst_brick> commit force
+   * There are healing daemons running and nothing else has to be done (a hypothetical example
+     follows this list).
+   * There are a play and scripts available to move all bricks automatically.
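+   * A hypothetical example (the server names and brick paths are made up, take the real ones
+     from 'gluster volume info'):
+        gluster volume replace-brick myvol ipeshift2:/bricks/myvol/brick ipeshift4:/bricks/myvol/brick commit force
+        gluster volume heal myvol info       # watch the self-heal progress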
+
+ - If we kept the name and the data is still there, it should also be relatively easy
+   to perform the migration (not checked). We also should have backups of all this data.
+   * Ensure Gluster is not running on the failed node:
+       oadm manage-node ipeshift2 --schedulable=false
+       oadm manage-node ipeshift2 --evacuate
+   * Verify the gluster pod is not active. It may be running, but not ready.
+     This can be double-checked with 'ps':
+       oadm manage-node ipeshift2 --list-pods
+   * Get the original peer UUID of the failed node (by running on a healthy node):
+       gluster peer status
+   * Then create '/var/lib/glusterd/glusterd.info' similar to the one on the
+     healthy nodes, but with the found UUID (a sketch follows this list).
+   * Copy the peer files from the healthy nodes to /var/lib/glusterd/peers. We need to
+     copy from 2 nodes as a node does not hold the peer information about itself.
+   * Create the mount points and re-schedule the gluster pod. See more details:
+       https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-replacing_hosts
+   * Start healing:
+       gluster volume heal VOLNAME full
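+   * A hedged sketch of the UUID/peer part (the UUID, hostnames and values are placeholders;
+     the file edits are done on the replacement node):
+        gluster peer status                          # on a healthy node, note the UUID of the failed peer
+        cat > /var/lib/glusterd/glusterd.info <<EOF
+        UUID=<uuid_of_failed_node>
+        operating-version=<same value as on the healthy nodes>
+        EOF
+        scp ipeshift1:/var/lib/glusterd/peers/* /var/lib/glusterd/peers/   # repeat from a second node
+        rm -f /var/lib/glusterd/peers/<own_uuid>     # a node must not list itself as a peer
+        systemctl restart glusterd                   # or re-schedule the gluster pod in the containerized setup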
+
+ - However, if the data is lost, it is quite complicated to recover using the same server name.
+   We should rename the server and use the first approach instead.
+
+
+
+Scaling
+=======
+We currently have several assumptions which will probably not hold true for larger clusters:
+ - Gluster
+   To simplify matters we just reference the servers in the storage group manually.
+   An arbiter may work for several groups, and we should define several brick paths in this case
+   (a hypothetical example follows).
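+   A hypothetical example of an arbiter volume spanning several brick paths (standard gluster
+   syntax; the server names and paths are made up):
+      gluster volume create myvol replica 3 arbiter 1 \
+          ipeshift1:/bricks/myvol ipeshift2:/bricks/myvol ipeshift3:/arbiter/myvol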