Batch
=====
 - Slurm:


Cloud
=====
Docker Swarm is a simple solution, well integrated into the Docker infrastructure, but it
currently does not cater for node failures. Kubernetes seems optimal for container
scheduling. Mesos is a more universal tool and is well integrated with other Apache
infrastructure such as Hadoop, Spark, and Storm.

Requirements
------------
 - Managing groups of services
 - Node failover management and high availability
 - Automatic scalability management
 - Automatic updates
 - Proper scheduling of heavy occasional tasks like data mining

Docker Swarm
------------
 The Swarm manager runs on a single node of the Docker cluster and distributes all tasks
 scheduled locally through the standard Docker interface. The other nodes just run the
 Docker daemon.
  + Works directly with Docker Compose
  + Supports multiple scheduling policies:
     fewest containers per node, most containers per node, start specific images
     on the marked nodes, start specific images on the same node.
  - Only runs containers as scheduled. It has no logic to auto-start more copies
  of specific containers under high load, and it will not restart containers which
  have stopped or crashed.
  - Does not provide special support for automatic updates

 HA can be implemented on top of etcd, Consul, or ZooKeeper, which handle fail-over
 to a backup manager.
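
 As a sketch of how these policies are selected with the classic Swarm command line (the
 discovery token, node labels, and image names here are hypothetical):

```shell
# Start the Swarm manager with the "binpack" strategy (most containers per
# node); "spread" gives fewest containers per node.
docker run -d swarm manage --strategy binpack token://<cluster_id>

# Constraint filter: run an image only on nodes marked with a given label.
docker run -d -e constraint:storage==ssd postgres

# Affinity filter: run a container on the same node as another container.
docker run -d -e affinity:container==db backend
```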

CoreOS etcd/Fleet
-----------------
 CoreOS is a distributed operating system built on top of systemd and containers (it does
 not run on top of CentOS/Ubuntu/etc.). It provides a number of services which other
 schedulers use directly or re-implement as a concept:
 - rkt: Rocket containers (an alternative to Docker)
 - etcd: distributed key-value storage
 - fleet: distributed systemd unit scheduling service for rkt containers

 Fleet is built on top of systemd and allows systemd unit files to be executed across the
 nodes of the cluster. Each node runs an engine and an agent, but only a single engine is
 active. The engine accepts unit files and schedules them on the least loaded agent. The
 unit file normally just runs a container.
 - Fleet supports various hints and constraints. For instance, units can be scheduled
 globally (running on all machines) or on a single machine, multiple units can be
 scheduled together, etc.
 - Socket activation is supported, i.e. a container can be started upon a connection on a
 given port.
 - The architecture is fault-tolerant. Services from a crashed node are re-scheduled
 on other nodes. etcd is used to store the status of the cluster and the units.
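
 A fleet unit is an ordinary systemd unit plus an optional [X-Fleet] section carrying the
 scheduling hints mentioned above; the service and image names below are hypothetical:

```ini
[Unit]
Description=Example web service
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name web -p 80:80 nginx
ExecStop=/usr/bin/docker stop web

[X-Fleet]
# Never place two instances of this service on the same node
Conflicts=web@*.service
```

 Such a unit would be submitted to the cluster with `fleetctl start web@1.service`.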

 It is positioned as a low-level cluster engine and is expected to be part
 of a higher-level solution like Kubernetes.

Google Kubernetes
-----------------
 Kubernetes is based on etcd (distributed key-value storage) and provides a master server
 and multiple worker nodes. The HA mode with multiple master servers is based solely on
 etcd. The worker nodes (minions) just run the docker and kubelet services.
 The master server runs a number of control services:
 - etcd service
 - API services allowing management of the cluster
 - Scheduler taking care of resource management
 - Controller Manager allowing automatic replication of services, etc.
 
 The basic scheduling unit in Kubernetes is a pod, a collection of co-located containers
 forming a service. Pods are conceptually similar to Docker Compose and use a somewhat
 similar script to describe relations. Pods can define common storage and are well
 integrated with Gluster, NFS, iSCSI, and a number of cloud storage backends. Most
 importantly, a unique IP is assigned to each pod in the cluster, which can be used to
 access the pod independently of its current placement.
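
 A minimal pod description (a sketch; the names and the NFS server are hypothetical)
 illustrates co-located containers sharing common storage:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
  - name: frontend
    image: nginx
    volumeMounts:
    - name: www
      mountPath: /usr/share/nginx/html
  volumes:
  - name: www
    nfs:                      # shared storage, here an NFS export
      server: nfs.example.org
      path: /exports/www
```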

 Also, Kubernetes allows tasks to be more precise about scheduling preferences, specifying
 placement requirements (predicates) and preferences (priorities). Basically, this allows
 both manual and automatic scheduling and any hybrid variant in between:
 - predicate: mandatory requirement (run on a specific node, amount of memory, etc.)
 - priority: preferred, but not mandatory, requirement
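
 In the pod description, predicates surface for example as node selectors and resource
 requests (a sketch; the label and image names are hypothetical), while priorities are
 configured in the scheduler policy rather than in the pod itself:

```yaml
spec:
  nodeSelector:
    disktype: ssd        # predicate: only nodes with this label qualify
  containers:
  - name: miner
    image: data-miner    # hypothetical image
    resources:
      requests:
        memory: "4Gi"    # predicate: the node must have 4 GiB available
```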

 The main difference to Docker Swarm is the Controller Manager, which can automate
 scheduling of replicas, etc.:
 - Replication Controller: ensures that a given number of pod replicas is running in the cluster
 - DaemonSet: ensures that a single instance of a pod is running on each cluster node
 - Job Controller: runs batch jobs
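
 For illustration, a replication controller keeping three copies of a pod running could be
 sketched as:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: web
spec:
  replicas: 3        # pods are restarted or rescheduled to keep 3 running
  selector:
    app: web
  template:          # pod template used to create the replicas
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: frontend
        image: nginx
```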

 Also, automatic updates are supported.

Apache Mesos
------------
 Mesos is a more general-purpose tool, not limited to containers, but able to schedule
 standard applications on the hosts. It tries to solve the isolation problem differently,
 by allowing a part of the general pool of servers to be assigned to a specific task. This
 is achieved with 2-level scheduling. The Mesos master just distributes cluster resources
 between the registered frameworks, which perform task scheduling within the allocated
 budget. It could easily switch resources away from framework1 (for example, doing
 big-data analysis) and allocate them to framework2 (for example, a web server) if there
 is heavy network traffic. There are a number of frameworks:
 - Chronos: a cron replacement which automatically starts and stops services.
 - Marathon: provides an API for starting and stopping services; Chronos can be one of them.
 - Apache Aurora: an Apache scheduler aimed at fault-tolerant, long-running services, taking
 care of restarting, migration, etc.
 - Task-specific frameworks: MPI, Hadoop/Spark/Storm, Cassandra, ElasticSearch, Jenkins, etc.
 - More: Singularity, Torque, ...
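
 For illustration, a long-running service is submitted to Marathon by POSTing a JSON
 application definition to its REST API (a sketch; the id and image are hypothetical):

```json
{
  "id": "/web",
  "instances": 3,
  "cpus": 0.5,
  "mem": 256,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "nginx" }
  }
}
```

 e.g. with `curl -X POST -H 'Content-Type: application/json' -d @web.json http://marathon:8080/v2/apps`.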

 HA is supported. There is always a single active master and a number of standby master
 servers. ZooKeeper runs on all master nodes and elects the active master.

 Mesos is known to scale well on really large clusters, but may be over-complicated for the
 small cluster we run at IPE.