- Apache YARN: Resource manager, part of Hadoop. Can optionally be used by Apache Spark as well.
- Kafka: Distributed messaging system
- Storm: UFO for clusters. Works on tuples processed by an acyclic graph of filters, which are defined as spouts
(input) and bolts (processing + output). It offers a strong guarantee that every tuple will be processed. Storm
defaults to an “at least once” guarantee for messages, but offers the ability to implement “exactly once”
processing as well. The filters can be written in any language.
- Spark: Generalized solution which can be configured for Hadoop and Storm workloads. Runs on top of Apache
YARN or Mesos. Provides adapters for working with data stored in numerous disparate sources, including HDFS
files, Cassandra, HBase, and S3. The architecture is centered around the RDD (Resilient Distributed Dataset) - a
read-only multiset of data items distributed over a cluster of machines. In contrast to Hadoop/MapReduce, it can
use a kind of shared memory to store and read results, which significantly speeds up iterative workloads that
access the data multiple times. Supports Java, Scala, Python, and R only. A simple app would look like:
2. Split each file into a list of tokens
3. Execute map/reduce operations on tokens, execute other operations
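
Steps 2 and 3 above can be sketched in plain Python as a word count. This only illustrates the tokenize, map, and reduce flow and is not Spark's API; the in-memory `files` dictionary, its contents, and the helper names `tokenize` and `word_count` are assumptions made for the example.

```python
from collections import Counter
from functools import reduce
import re

# Hypothetical input: file names mapped to their text
# (stands in for files stored on HDFS, S3, and so on).
files = {
    "a.txt": "to be or not to be",
    "b.txt": "to see or not to see",
}

def tokenize(text):
    """Step 2: split one file's text into a list of lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_count(file_texts):
    """Step 3: map each file to per-file counts, then reduce them into one total."""
    # Map phase: each file is processed independently of the others.
    partial_counts = [Counter(tokenize(text)) for text in file_texts]
    # Reduce phase: merge the partial results pairwise into a single Counter.
    return reduce(lambda left, right: left + right, partial_counts, Counter())

counts = word_count(files.values())
print(counts.most_common(3))
```

In PySpark the same shape is usually written as a chain of `flatMap`, `map`, and `reduceByKey` calls on an RDD, with Spark deciding where each partition is processed.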
- Ignite: Advanced in-memory database with MapReduce, SQL, etc.
- Hama: Iterative computations.
- Beam: Google Cloud Dataflow model which can be executed later on Google Cloud or with Spark
- Zeppelin: Online notebooks like Jupyter
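
The difference between the “at least once” and “exactly once” guarantees noted for Storm above can be illustrated with a toy model in plain Python. It does not use Storm's API: the `FlakyBolt` class, the tuple ids, and the replay loop in `run_spout` are assumptions for illustration. A tuple whose acknowledgement is lost gets replayed, so its side effect is applied twice unless the bolt remembers which ids it has already handled.

```python
class FlakyBolt:
    """Toy bolt that adds tuple values to a running total.

    The first attempt on each id listed in `fail_once` applies the side
    effect and then raises, simulating a crash before the acknowledgement.
    """

    def __init__(self, fail_once, deduplicate):
        self.fail_once = set(fail_once)
        self.deduplicate = deduplicate
        self.seen_ids = set()
        self.total = 0

    def process(self, tuple_id, value):
        # With deduplication on, a replayed id is skipped instead of re-applied.
        if not (self.deduplicate and tuple_id in self.seen_ids):
            self.total += value
            self.seen_ids.add(tuple_id)
        if tuple_id in self.fail_once:
            self.fail_once.discard(tuple_id)
            raise RuntimeError(f"ack lost for tuple {tuple_id}")


def run_spout(tuples, bolt, max_attempts=3):
    """Toy spout: replay each tuple until the bolt processes it without error."""
    for tuple_id, value in tuples:
        for _ in range(max_attempts):
            try:
                bolt.process(tuple_id, value)
                break        # acknowledged, move on to the next tuple
            except RuntimeError:
                continue     # not acknowledged, replay the same tuple
    return bolt.total


tuples = [(1, 10), (2, 20), (3, 30)]
at_least_once = run_spout(tuples, FlakyBolt(fail_once=[2], deduplicate=False))
exactly_once = run_spout(tuples, FlakyBolt(fail_once=[2], deduplicate=True))
print(at_least_once, exactly_once)  # prints: 80 60
```

Without deduplication, tuple 2 is counted twice (total 80); tracking processed ids gives the exactly-once effect (total 60) on top of the same replay mechanism.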
Types: key-value database (Redis), document database (MongoDB/RethinkDB), column data model (Cassandra), graph data model, SQL data model, SQL-style on top of NoSQL (Hive)
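
To make these data model types more concrete, the sketch below shapes one made-up user record for several of them using plain Python structures. No database client is involved; the record, the field names, and the `followers_of` helper are hypothetical, and real systems add indexing, schemas, and query languages on top of these shapes.

```python
import json

# Key-value (Redis-style): an opaque value looked up by a single key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document (MongoDB-style): a nested, self-describing record.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"}, "tags": ["admin"]}

# Column model (Cassandra-style): row key -> column name -> value.
# Rows do not need to carry the same set of columns.
column_family = {
    42: {"name": "Ada", "city": "London"},
    43: {"name": "Bob"},
}

# Graph: nodes plus labelled edges between them.
nodes = {42: {"name": "Ada"}, 43: {"name": "Bob"}}
edges = [(42, "FOLLOWS", 43)]

# SQL: fixed columns, every row has the same shape (missing values become NULL/None).
sql_columns = ("id", "name", "city")
sql_rows = [(42, "Ada", "London"), (43, "Bob", None)]

def followers_of(user_id):
    """Return names of users that follow `user_id` by scanning the edge list."""
    return [
        nodes[src]["name"]
        for src, label, dst in edges
        if label == "FOLLOWS" and dst == user_id
    ]

print(json.loads(key_value["user:42"])["city"], followers_of(43))
```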