- High speed streaming: software RAID of a few iSER disks (using tgt, may
- Cluster of multiple standard nodes (no big storage nodes): Gluster
- Single storage node and several computational nodes: iSER + OCFS2
- A few big storage nodes: Gluster/FhGFS/Ceph?
- Backup node: DRBD, or Gluster if performance is not crucial.
iSCSI/iSER - Protocols used to forward block devices over the network. The
    device can either be used as a local device on a single node or forwarded
    to multiple client nodes, but then an access synchronisation mechanism is
    required. Such mechanisms are provided by OCFS2, for instance.
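    As an illustration only (target name, backing device and server address
    are placeholders), exporting a disk with tgt and attaching it over iSER
    with open-iscsi looks roughly like this:
      # storage node: create a target, attach a backing device, allow initiators
      tgtadm --lld iscsi --mode target --op new --tid 1 \
             --targetname iqn.2012-11.example:disk1
      tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 -b /dev/sdb
      tgtadm --lld iscsi --mode target --op bind --tid 1 -I ALL
      # client: discover, switch the transport to iSER (RDMA), log in
      iscsiadm -m discovery -t sendtargets -p storage-server
      iscsiadm -m node -T iqn.2012-11.example:disk1 -o update \
               -n iface.transport_name -v iser
      iscsiadm -m node -T iqn.2012-11.example:disk1 --login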
DRBD - Distributed Replicated Block Device. It is a protocol to organise
    synchronisation between several nodes to provide a high-availability
    service, i.e. normally there will be two nodes: master and backup.
    * Normally, the data is written only on a single system and replicated
      to the other. If both systems are writing, OCFS2 (or another
      synchronisation mechanism) has to be used on top of DRBD.
    - In single-master mode, it is recommended to use pacemaker to migrate
      the active configuration. In particular, it is possible to migrate an
      LVM mapping on top of the DRBD device; only the currently active
      master node will have the LVM devices populated.
    * It is not directly usable by clients which are not members of the
      DRBD cluster.
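    A rough two-node sketch (hostnames, addresses and devices are made up;
    the primary-promotion syntax differs between DRBD 8.3 and 8.4), e.g.
    /etc/drbd.d/r0.res:
      resource r0 {
        on node1 {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   10.0.0.1:7789;
          meta-disk internal;
        }
        on node2 {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   10.0.0.2:7789;
          meta-disk internal;
        }
      }
      # on both nodes: drbdadm create-md r0 && drbdadm up r0
      # once, on the initial master (8.4 syntax): drbdadm primary --force r0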
cLVM - Allows a cluster of computers to manage shared storage using LVM.
    It is a kind of OCFS2 for LVM, running on top of block devices shared
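    A sketch of what it typically involves (assuming the cluster stack and
    clvmd are already running; the shared device name is a placeholder):
      # /etc/lvm/lvm.conf: switch LVM to cluster-wide (clvmd/DLM) locking
      locking_type = 3
      # create a clustered volume group on the shared block device
      pvcreate /dev/drbd0
      vgcreate --clustered y vg_shared /dev/drbd0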

File Systems [based on DLM (Distributed Lock Manager)]
GFS2 - Provides shared access to a single network block device (like iSCSI,
    NBD, DRBD) for multiple clients, providing access synchronisation (with
    the in-kernel DLM manager). In the kernel since 2.6.19. However, the
    userland utilities are packaged only for RH.
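    A minimal sketch (cluster name, lock table and device are placeholders; a
    working cluster manager and DLM are assumed):
      mkfs.gfs2 -p lock_dlm -t mycluster:shared0 -j 2 /dev/sdb1
      mount -t gfs2 /dev/sdb1 /mnt/shared     # on each node of the cluster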
OCFS2 - Oracle's solution to handle network block devices in a cluster.
    The installation procedure sounds simpler than for GFS. SuSE is supported.
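    A minimal sketch (node names and the device are placeholders; the o2cb
    cluster in /etc/ocfs2/cluster.conf must list all nodes):
      mkfs.ocfs2 -N 4 -L shared0 /dev/sdb1      # up to 4 node slots
      /etc/init.d/o2cb online                   # bring the cluster stack up
      mount -t ocfs2 /dev/sdb1 /mnt/shared      # on each node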
pNFS - Parallel NFS (part of the NFS 4.1 specification). Provides a metadata
    server which, in response to a client request, tells the client where to
    look for the data. The data can then be requested directly from storage,
    hence eliminating the single-node bottleneck of a traditional NFS server.
    * There is a client implementation ready, but no server. Actually, pNFS
      is best considered as an access method for a real distributed
      filesystem, not as a complete solution in and of itself. There are
      ideas to wrap GFS2, etc. More details: http://linux-nfs.org
      modprobe.d: alias nfs-layouttype4-1 nfs_layout_nfsv41_files
      mount -t nfs4 -o minorversion=1 server:/filesystem /mountpoint
    No additional steps are needed to enable pNFS; if the server supports it,
    the client will automatically use it.
Gluster - Easy-to-setup clustering file system. It has only storage servers;
    all metadata is just rsynced between nodes. The location of a file
    (or chunk) is determined by the file name (chunk number?). A storage
    node may declare any directory in its file system as part of the
    Gluster volume (i.e. it does not have to be a standalone mounted
    partition). RDMA support is included out of the box. The current version
    has very high latencies and performance problems with a few fast storage
    nodes (i.e. it is fine interfacing multiple nodes with just a pair of
    hard drives each, but slow if a single node with an attached RAID is
    used). However, nothing in the architecture prevents good performance.
    - No kernel module and, hence, lots of context switches and extremely
      high latency
    + To a level compensated by the integrated NFS and Samba functionality
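    A rough sketch of a two-node replicated volume (hostnames and brick paths
    are placeholders):
      gluster peer probe node2                  # run on node1
      gluster volume create vol0 replica 2 node1:/data/brick node2:/data/brick
      gluster volume start vol0
      mount -t glusterfs node1:/vol0 /mnt/vol0  # on any client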
fhgfs/beegfs - Fraunhofer file system. Quite unixish and easy to install:
    Management, Meta, and Data services, plus a client-side kernel module.
    Uses directories as data stores. RDMA support. Max sequential read/write
    was 500 MB/s per node (on an SSD RAID capable of 3.5 GB/s). No complete
    source available, only the kernel module and a few libraries. There are
    builds for RHEL, SLES, Debian; the SLES build works with OpenSuSE 12.2.
    + Provides a kernel module and is hence fast.
    - Clients need to install the fhgfs-client packages.
    - No high availability yet: single management server, etc. They advise
      using heartbeat + pacemaker (Clusterlabs Linux HA).
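    A sketch of what the client setup typically involves (file and parameter
    names are assumptions based on the fhgfs packages of that time; check the
    shipped templates):
      # /etc/fhgfs/fhgfs-client.conf: point the client at the management node
      sysMgmtdHost = mgmt-node
      # /etc/fhgfs/fhgfs-mounts.conf: mount point and client config to use
      /mnt/fhgfs /etc/fhgfs/fhgfs-client.conf
      /etc/init.d/fhgfs-client start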
Ceph - Lustre-style. Has integrated fault tolerance and automated data
    migration to avoid hotspots. Merged in 2.6.34. RDMA seems not to be
    available at the moment. After a heavy fight with the authorization, got
    about 100 MB/s (quite expected without RDMA).
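    For reference, mounting CephFS with the kernel client looks roughly like
    this (monitor address, user and key are placeholders):
      mount -t ceph 10.0.0.1:6789:/ /mnt/ceph -o name=admin,secret=<key>
      # or, without the kernel client:
      ceph-fuse /mnt/ceph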
Lustre - Consists of a Management Server (MGS) storing the configuration of
    the Lustre file system, Metadata Servers (MDS) and Object Storage Servers
    (OSS). Clients communicate directly with all of these servers. The file
    system is reported to be very fast and scalable. It was developed by
    Sun and is currently controlled by Oracle. Seems to have native RDMA
    support. It seems to be the fastest system out there and it is specially
    The main disadvantage is that it is not in the mainline kernel. A patched
    kernel and e2fsprogs are required (it is not possible to have just
    additional modules). Official patches are made only against RHEL and
    SLES. As of 11.2012 (3.6 is long out) the latest patches are against
    kernel 2.6.38. Also, it seems to lack integrated fault tolerance;
    3rd-party solutions have to be used below the OSS.
    Details: http://wiki.whamcloud.com/
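    A rough sketch of formatting and mounting (devices, fsname and node names
    are placeholders; the exact flags depend on the Lustre version):
      mkfs.lustre --fsname=lfs --mgs --mdt --index=0 /dev/sda1    # MGS/MDS node
      mkfs.lustre --fsname=lfs --ost --index=0 --mgsnode=mds1@tcp0 /dev/sdb1
      mount -t lustre /dev/sda1 /mnt/mdt            # servers mount their targets
      mount -t lustre mds1@tcp0:/lfs /mnt/lustre    # client (@o2ib for IB RDMA)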
PohmelFS - Consists of 3 components:
    * Elliptics, a p2p-based storage manager distributing data chunks to nodes
      according to DHT tables (hashes). Unlike Gluster, it is fully dynamic:
      nodes may come and go.
    * Multiple storage backends optimized for different types of data.
    * The PohmelFS kernel module (since 2.6.30) providing a file system on top
      of Elliptics. The core component here is the cache coherency management.
      It supports weak synchronization between mounted nodes in the sense
      that data read/written into the local page cache is not synced with the storage
    - Elliptics is a Yandex development. However, the PohmelFS in the kernel
      is older and based on something else. It is not clear when the new one
      will hit the kernel. Also, there are no known users of either the old
      or the new version. Why it ended up in the
MogileFS, WebDFS - Data is distributed over HTTP, metadata is stored
    in PostgreSQL. Standard drives as storage nodes.
GPFS - Proprietary file system from IBM, in many ways similar to Lustre.
    Unlike Lustre, it has distributed metadata. There are also extra
    features like snapshots, etc.
HDFS - File system for Apache Hadoop. In many respects similar to Lustre,
    but optimized for the use case where the data nodes are compute nodes as
    well. It is not POSIX-compatible. Mounting is possible through a FUSE
    module, but the performance is suboptimal.
    There is 3rd-party support for RDMA implemented over a JNI interface.
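    Basic interaction goes through the Hadoop CLI; the FUSE mount below uses
    the contrib fuse_dfs tool (packaged as hadoop-fuse-dfs in some
    distributions; namenode address and port are placeholders):
      hadoop fs -ls /                         # list the HDFS root
      hadoop fs -put local.dat /data/         # copy a file into HDFS
      hadoop-fuse-dfs dfs://namenode:8020 /mnt/hdfs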