author Suren A. Chilingaryan <csa@suren.me> 2018-03-20 15:47:51 +0100
committer Suren A. Chilingaryan <csa@suren.me> 2018-03-20 15:47:51 +0100
commit e2c7b1305ca8495065dcf40fd2092d7c698dd6ea
tree abcaa7006a9c4b7a9add9bd0bf8c24f7f8ce048f /docs/infrastructure.txt
parent 47f350bc3aa85a8bd406d95faf084df2abf74ae9
Local volumes and StatefulSet to provision Master/Slave MySQL and Galera cluster
Diffstat (limited to 'docs/infrastructure.txt')
-rw-r--r-- docs/infrastructure.txt 110
1 files changed, 110 insertions, 0 deletions
diff --git a/docs/infrastructure.txt b/docs/infrastructure.txt
new file mode 100644
index 0000000..dc6a57e
--- /dev/null
+++ b/docs/infrastructure.txt
@@ -0,0 +1,110 @@
+Networks
+========
+ 192.168.11.0/24 (18-port IB switch): Legacy network, non-production systems including storage
+ 192.168.12.0/24 (12-port IB switch): KATRIN Storage network
+ 192.168.13.0/24 (12-port IB switch): HPC Cloud & Computing network
+ 192.168.26.0/24 (Ethernet): Infrastructure network (OpenShift nodes and everything else)
+ 192.168.16.0/22 External IPs for testing and production
+ 192.168.111.0/24 (OpenVPN): Gateway to Katrin network using Master1 tunnel
+ 192.168.112.0/24 (OpenVPN): Gateway to Katrin network using Master2 tunnel
+
+ 192.168.212.0/24
+ 192.168.213.0/24
+ 192.168.226.0/24 (Ethernet): Staging network (Virtual OpenShift and other nodes)
+ 192.168.216.0/22 External IPs for staging
+ 192.168.221.0/24 (OpenVPN): Gateway to Katrin network using staging Master1 tunnel
+ 192.168.222.0/24 (OpenVPN): Gateway to Katrin network using staging Master2 tunnel
+
+KIT resources
+=============
+ - ipekatrin*.ipe.kit.edu Cluster nodes
+ - ipekatrin[1:2].ipe.kit.edu Master nodes with fixed IPs (one could be dead)
+ + katrin[1:2].ipe.kit.edu Virtual IPs assigned to master nodes (HA)
+ + kaas.kit.edu (katrin.ipe.kit.edu) DNS-based load balancer between katrin[1:2].ipe.kit.edu
+ + *.kaas.kit.edu (*.katrin.ipe.kit.edu) Default application domain?
+ - katrin.kit.edu Apache/mod_proxy pod (In DNS put CN to katrin.ipe.kit.edu)
+
+ + openshift.ipe.kit.edu Gateway (VIPS) to staging cluster (Just one IP migrating between 2 nodes)
+ - *.openshift.ipe.kit.edu Default application domain for staging cluster
+
+Storage
+=======
+ LVM VGs
+ VolGroup00
+ -> LogVol*: System partitions
+ -> docker-pool: Docker storage
+ Katrin
+ -> Heketi PD (we reserve space, but do not configure heketi so far)
+ -> vg_*
+ -> Heketi-managed Gluster Volumes
+ -> Katrin (mounted at '/mnt/ands')
+ -> Space for manually-managed Gluster Bricks
+ -> Storage for Galera / Cassandra / etc.?
+
+ Gluster Volume Types:
+ tmp: distribute ? Various data which should be preserved, but not critical if lost or temporarily inaccessible (logs, etc.) [ check if we can still write if one brick is gone ]
+ cfg: replica=3 Small and critical data sets (configs, sources, etc.)
+ cache: replica+arbiter Large regenerable data which should nevertheless always be available [ potentially we can use disperse to save space ]
+ data: replica+arbiter Very large and critical data
+ db: dispersed A few very large files, like large single-table database (ADEI many tables)
+
+ Scaling storage:
+ cfg: 3 nodes is enough
+ cache/data: [d][d][a] => [da][d ][ad][ d] => [d ][d ][ d][ d][aa] => further increase in pairs, at some point add second arbiter node
+
+ Gluster Volumes:
+ provision cfg /mnt/provision Provisioning volume which is not expected to be mounted in the containers (temporarily may contain secret information, etc.)
+ openshift cfg /mnt/openshift Multi-purpose: Various small size configurations (adei, apache, etc.)
+ temporary tmp /mnt/temporary Multi-purpose: Various logs & temporary files
+ ?adei cfg /mnt/adei/adei
+ adei-db cache /mnt/adei/db
+ adei-tmp tmp /mnt/adei/tmp
+ katrin-mysql data /mnt/katrin/mysql
+ katrin-data cfg /mnt/katrin/archive
+ katrin-kali cache /mnt/katrin/storage
+ katrin-tmp tmp /mnt/katrin/workspace
+
+ OpenShift Volumes:
+ etc cfg/ro openshift Various configurations (ADEI & Apache configs, other stuff in etc.)
+ src cfg/ro openshift Interpreted source files
+ log tmp/rw tmp Stuff in /var/log
+ tmp tmp/rw tmp Various temporary files
+ adei-db data/rw adei-db ADEI cache database and a few primary sources [ will take ages to regenerate, so we can't really consider it a dispensable cache ]
+ adei-tmp tmp/rw adei-tmp ADEI, Apache, and Cron logs [Technically we also have downloads here which are more cache than tmp... But I think it is fine for now...]
+ adei-cfg cfg/ro adei? ADEI & Apache configs
+ adei-src cfg/ro adei? ADEI sources
+ katrin-mysql cfg/rw katrin-mysql KATRIN Database with configurations, etc.
+ katrin-data data/rw katrin-data KATRIN data archives, all primary raw data from Orca, etc.
+ katrin-kali cache/rw katrin-kali Generated ROOT files [ Can we make this separation? Marco uses hardlinks ]
+ katrin-proc tmp/rw katrin-proc Data processing volume (inbox, etc.)
+
+Services
+========
+ - Keepalived
+ - OpenVPN
+ - Gluster
+ - MySQL Galera (?)
+ - Cassandra (?)
+ - oVirt (?)
+ - OpenShift Master / Node
+ - Heketi
+ - Apache Router
+ - ADEI Services
+ - Apache Spark & etc.
+
+Inventories
+===========
+ - staging & production will be operating in parallel (staging in vagrant and production on bare-metal)
+ - testing is just pre-production tests which will be removed once production is running
+
+Labels
+======
+ - We specify whether a node is a master and whether it provides fat storage for GlusterFS
+ - All nodes are currently in the 'infra' region (for example, student computers will be non-infra nodes; nodes outside of KIT as well)
+ - The servers in the cellar are in the 'default' zone (if we put something in the 4th floor server room, we would define a new zone there)
+
+Computing
+=========
+ - Define CUDA nodes and OpenCL nodes
+ - Intel Xeon Phi is replaced by a new Tesla in ipepdvcompute2
+ - Gen1 UFO servers do not support "Above 64G decoding" and can't run Xeon Phi. Maybe we can put it in the new Phi server.
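
Note on the Networks section of the added file: the two /22 blocks for external IPs each span four /24 ranges, so it is worth confirming that they do not collide with the other subnets. The following is a minimal sketch of such a check using Python's standard ipaddress module. The subnets are copied from the file; the dictionary keys and the helper names find_overlaps and address_range are hypothetical labels introduced only for this illustration.

```python
import ipaddress
from itertools import combinations

# Subnets copied from the Networks section; the dictionary keys are
# hypothetical labels chosen for this sketch, not names used in the document.
SUBNETS = {
    "legacy-ib": "192.168.11.0/24",
    "katrin-storage-ib": "192.168.12.0/24",
    "hpc-ib": "192.168.13.0/24",
    "infrastructure": "192.168.26.0/24",
    "external": "192.168.16.0/22",
    "vpn-master1": "192.168.111.0/24",
    "vpn-master2": "192.168.112.0/24",
    "staging-212": "192.168.212.0/24",
    "staging-213": "192.168.213.0/24",
    "staging": "192.168.226.0/24",
    "staging-external": "192.168.216.0/22",
    "staging-vpn-master1": "192.168.221.0/24",
    "staging-vpn-master2": "192.168.222.0/24",
}


def find_overlaps(subnets):
    """Return pairs of labels whose address ranges intersect."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


def address_range(cidr):
    """Return the first and last address of a subnet as strings."""
    net = ipaddress.ip_network(cidr)
    return str(net[0]), str(net[-1])


if __name__ == "__main__":
    for name, cidr in SUBNETS.items():
        first, last = address_range(cidr)
        print(f"{name:22s} {cidr:18s} {first} .. {last}")
    print("overlaps:", find_overlaps(SUBNETS))
```

With the subnets as listed, the production /22 covers 192.168.16.0 through 192.168.19.255 and the staging /22 covers 192.168.216.0 through 192.168.219.255, so neither intersects any of the /24 networks.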