Networks
========
192.168.11.0/24   (18-port IB switch):  Legacy network, non-production systems including storage
192.168.12.0/24   (12-port IB switch):  KATRIN storage network
192.168.13.0/24   (12-port IB switch):  HPC Cloud & Computing network
192.168.26.0/24   (Ethernet):           Infrastructure network (OpenShift nodes and everything else)
192.168.16.0/22                         External IPs for testing and production
192.168.111.0/24  (OpenVPN):            Gateway to the KATRIN network using the Master1 tunnel
192.168.112.0/24  (OpenVPN):            Gateway to the KATRIN network using the Master2 tunnel
192.168.212.0/24
192.168.213.0/24
192.168.226.0/24  (Ethernet):           Staging network (virtual OpenShift and other nodes)
192.168.216.0/22                        External IPs for staging
192.168.221.0/24  (OpenVPN):            Gateway to the KATRIN network using the staging Master1 tunnel
192.168.222.0/24  (OpenVPN):            Gateway to the KATRIN network using the staging Master2 tunnel

KIT resources
=============
- ipekatrin*.ipe.kit.edu                   Cluster nodes
- ipekatrin[1:2].ipe.kit.edu               Master nodes with fixed IPs (one could be dead)
  + katrin[1:2].ipe.kit.edu                Virtual IPs assigned to the master nodes (HA)
  + kaas.kit.edu (katrin.ipe.kit.edu)      DNS-based load balancer between katrin[1:2].ipe.kit.edu
  + *.kaas.kit.edu (*.katrin.ipe.kit.edu)  Default application domain?
- katrin.kit.edu                           Apache/mod_proxy pod (in DNS, point the CN to katrin.ipe.kit.edu)
  + openshift.ipe.kit.edu                  Gateway (VIP) to the staging cluster (just one IP migrating between 2 nodes)
- *.openshift.ipe.kit.edu                  Default application domain for the staging cluster

Storage
=======
LVM VGs
  VolGroup00 -> LogVol*:      System partitions
             -> docker-pool:  Docker storage
  Katrin     -> Heketi PD (we reserve space, but do not configure Heketi so far)
                  -> vg_* -> Heketi-managed Gluster volumes
             -> Katrin (mounted at '/mnt/ands')
                  -> Space for manually managed Gluster bricks
                  -> Storage for Galera / Cassandra / etc.?

Gluster Volume Types:
  tmp:    distribute?       Various data which should be preserved, but which is not critical if lost or
                            temporarily inaccessible (logs, etc.)
                            [check whether we can still write if one brick is gone]
  cfg:    replica=3         Small and critical data sets (configs, sources, etc.)
  cache:  replica+arbiter   Large re-generatable data which should nevertheless always be available
                            [potentially we can use disperse to save space]
  data:   replica+arbiter   Very large and critical data
  db:     dispersed         A few very large files, e.g. a large single-table database (ADEI: many tables)

Scaling storage:
  cfg:        3 nodes is enough
  cache/data: [d][d][a] => [da][d ][ad][ d] => [d ][d ][ d][ d][aa]
              => further increase in pairs; at some point add a second arbiter node
              (see the add-brick sketch after the volume list below)

Gluster Volumes:
  name          type    mount point            description
  provision     cfg     /mnt/provision         Provisioning volume which is not expected to be mounted in the
                                               containers (may temporarily contain secret information, etc.)
  openshift     cfg     /mnt/openshift         Multi-purpose: various small configurations (adei, apache, etc.)
  temporary     tmp     /mnt/temporary         Multi-purpose: various logs & temporary files
  ?adei         cfg     /mnt/adei/adei
  adei-db       cache   /mnt/adei/db
  adei-tmp      tmp     /mnt/adei/tmp
  katrin-mysql  data    /mnt/katrin/mysql
  katrin-data   cfg     /mnt/katrin/archive
  katrin-kali   cache   /mnt/katrin/storage
  katrin-tmp    tmp     /mnt/katrin/workspace
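As a reference, a minimal sketch of how the volume types listed above could be created with the Gluster
CLI. Host names beyond ipekatrin[1:2] and the brick paths under /mnt/ands are placeholders, not the
actual cluster layout:

    # cfg type: replica 3, a full copy of the data on three nodes
    gluster volume create openshift replica 3 \
        ipekatrin1.ipe.kit.edu:/mnt/ands/bricks/openshift \
        ipekatrin2.ipe.kit.edu:/mnt/ands/bricks/openshift \
        ipekatrin3.ipe.kit.edu:/mnt/ands/bricks/openshift

    # cache/data type: two data copies plus a small metadata-only arbiter brick
    gluster volume create adei-db replica 3 arbiter 1 \
        ipekatrin1.ipe.kit.edu:/mnt/ands/bricks/adei-db \
        ipekatrin2.ipe.kit.edu:/mnt/ands/bricks/adei-db \
        ipekatrin3.ipe.kit.edu:/mnt/ands/bricks/adei-db-arbiter

    # tmp type: plain distribute, no redundancy
    gluster volume create temporary \
        ipekatrin1.ipe.kit.edu:/mnt/ands/bricks/temporary \
        ipekatrin2.ipe.kit.edu:/mnt/ands/bricks/temporary

    # db type: dispersed (erasure-coded); 2 data + 1 redundancy bricks shown only as an
    # illustration (gluster may warn that such a small layout is not optimal)
    gluster volume create example-db disperse 3 redundancy 1 \
        ipekatrin1.ipe.kit.edu:/mnt/ands/bricks/example-db \
        ipekatrin2.ipe.kit.edu:/mnt/ands/bricks/example-db \
        ipekatrin3.ipe.kit.edu:/mnt/ands/bricks/example-db

    gluster volume start openshift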
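The pairwise growth described under "Scaling storage" above would then amount to adding whole
subvolumes, for example (again with placeholder hosts and paths; ipekatrin[4:5] are assumed new
storage nodes):

    # add one more replica-3/arbiter-1 subvolume: two data bricks on the new pair of
    # nodes plus an additional arbiter brick on an existing node
    gluster volume add-brick adei-db replica 3 arbiter 1 \
        ipekatrin4.ipe.kit.edu:/mnt/ands/bricks/adei-db \
        ipekatrin5.ipe.kit.edu:/mnt/ands/bricks/adei-db \
        ipekatrin3.ipe.kit.edu:/mnt/ands/bricks/adei-db-arbiter2

    # spread the existing data over the new bricks
    gluster volume rebalance adei-db start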
OpenShift Volumes:
  name          type/mode  Gluster volume  description
  etc           cfg/ro     openshift       Various configurations (ADEI & Apache configs, other stuff for /etc)
  src           cfg/ro     openshift       Interpreted source files
  log           tmp/rw     tmp             Stuff in /var/log
  tmp           tmp/rw     tmp             Various temporary files
  adei-db       data/rw    adei-db         ADEI cache database and a few primary sources
                                           [will take ages to regenerate, so we can't really treat it as dispensable cache]
  adei-tmp      tmp/rw     adei-tmp        ADEI, Apache, and cron logs
                                           [technically we also have downloads here, which are more cache than tmp,
                                            but I think it is fine for now]
  adei-cfg      cfg/ro     adei?           ADEI & Apache configs
  adei-src      cfg/ro     adei?           ADEI sources
  katrin-mysql  cfg/rw     katrin-mysql    KATRIN database with configurations, etc.
  katrin-data   data/rw    katrin-data     KATRIN data archives, all primary raw data from Orca, etc.
  katrin-kali   cache/rw   katrin-kali     Generated ROOT files [can we make this separation? Marco uses hardlinks]
  katrin-proc   tmp/rw     katrin-proc     Data processing volume (inbox, etc.)
  (a PersistentVolume sketch for one of these volumes is at the end of these notes)

Services
========
- Keepalived
- OpenVPN
- Gluster
- MySQL Galera (?)
- Cassandra (?)
- oVirt (?)
- OpenShift Master / Node
- Heketi
- Apache Router
- ADEI Services
- Apache Spark, etc.

Inventories
===========
- Staging & production will operate in parallel (staging in Vagrant, production on bare metal)
- Testing is just for pre-production tests and will be removed once production is running

Labels
======
- We specify whether a node is a master and whether it provides fat storage for GlusterFS
  (see the labeling sketch at the end of these notes)
- All nodes are currently in the 'infra' region (for example, student computers will be non-infra nodes,
  as will nodes outside of KIT)
- The servers in the cellar are in the 'default' zone (if we put something in the 4th-floor server room,
  we would define a new zone there)

Computing
=========
- Define CUDA nodes and OpenCL nodes
- The Intel Xeon Phi is replaced by the new Tesla in ipepdvcompute2
- Gen1 UFO servers do not support "Above 64G decoding" and can't run the Xeon Phi. Maybe we can put it
  into the new Phi server.
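A sketch of how one of the OpenShift volumes listed above could be bound to its Gluster volume as a
PersistentVolume (OpenShift 3.x style). The endpoints object name 'gluster-cluster' and the capacity
are assumptions, not settled values:

    oc create -f - <<'EOF'
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: etc
    spec:
      capacity:
        storage: 1Gi                  # placeholder size
      accessModes:
        - ReadOnlyMany                # the volume is listed as cfg/ro above
      persistentVolumeReclaimPolicy: Retain
      glusterfs:
        endpoints: gluster-cluster    # assumed Endpoints object listing the Gluster nodes
        path: openshift               # Gluster volume backing 'etc' according to the table
        readOnly: true
    EOF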
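And a sketch of the node labelling from the "Labels" section. The region/zone values follow the text
above; the key used to mark masters providing fat Gluster storage ('storage=fat') is only a tentative
name:

    # label the storage-providing master nodes; adjust per node as needed
    oc label node ipekatrin1.ipe.kit.edu region=infra zone=default storage=fat
    oc label node ipekatrin2.ipe.kit.edu region=infra zone=default storage=fat

    # verify the resulting labels
    oc get nodes --show-labels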