
1. Brief Introduction to Object Based Storage

  • In a more classic file system, such as a block-based file system:
    • The metadata manager is contacted and a set of inodes or blocks is allocated for a file.
    • The metadata manager is responsible not only for the metadata itself but also for where the data is located on the storage.
    • The metadata manager is therefore a key part of every file operation, which makes it a potential performance bottleneck.
  • Object storage takes a different approach, allowing the storage devices themselves to manage where the data is stored.
    • The metadata manager and the storage are usually separate devices.
    • A client contacts the metadata manager about a file operation, such as a write.
    • The metadata manager then gets out of the way of the actual file operation and lets the client contact the assigned storage devices directly.
    • The metadata manager constantly monitors the operations
      • If there is a change in the file operation, such as another client wanting to read or write data to the file, then the metadata manager has to get involved to arbitrate the file operations
    • In general, the data operations happen without the metadata manager being in the middle
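The flow above can be sketched in a few lines of Python. This is a toy model only, not how any real object store is implemented: the class names `StorageDevice` and `MetadataManager`, the round-robin device choice, and the file name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Hypothetical storage device that manages its own object data."""
    name: str
    objects: dict = field(default_factory=dict)

    def write(self, object_id: int, data: bytes) -> None:
        self.objects[object_id] = data

    def read(self, object_id: int) -> bytes:
        return self.objects[object_id]


class MetadataManager:
    """Hypothetical manager: maps file names to a device and an object id.

    It never sees the file data itself.
    """

    def __init__(self, devices):
        self.devices = devices
        self.layout = {}   # file name -> (device index, object id)
        self.next_id = 0

    def open(self, filename: str):
        if filename not in self.layout:
            # Assumed placement policy: simple round-robin over devices.
            device_index = self.next_id % len(self.devices)
            self.layout[filename] = (device_index, self.next_id)
            self.next_id += 1
        device_index, object_id = self.layout[filename]
        return self.devices[device_index], object_id


devices = [StorageDevice("osd0"), StorageDevice("osd1")]
manager = MetadataManager(devices)

# One metadata round trip, then the client talks to the device directly.
device, object_id = manager.open("results.dat")
device.write(object_id, b"simulation output")
data = device.read(object_id)
```

Only `manager.open` involves the metadata manager; the `write` and `read` calls go straight to the storage device, which is the point of the design.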


2. The Advantage to Object Based Storage

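  • It is like driving to a department store to shop: you hand the car to the parking valet, and when you are done shopping you simply collect your key and parking ticket from the valet (you do not need to care which floor or which section the car is parked in).
  • In the same way, an object-based system can optimize data layout (where your car is located) and add more storage (add to the parking garage).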
  • Object storage has a great deal of flexibility and possibilities for improving file system performance and scalability for clusters
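The parking analogy can be made concrete with a small sketch. Everything here is hypothetical (the `ObjectStore` class, the "shelves" that stand in for storage devices, and the rebalancing policy); it only illustrates that the caller keeps a ticket while the store is free to move data and grow.

```python
import itertools


class ObjectStore:
    """Toy object store: callers hold a ticket, the store owns placement."""

    def __init__(self, shelves: int):
        self.shelves = [dict() for _ in range(shelves)]
        self.location = {}                # ticket -> shelf index (internal)
        self.tickets = itertools.count(1)

    def put(self, data: bytes) -> int:
        ticket = next(self.tickets)
        # Place the object on the least-loaded shelf.
        shelf = min(range(len(self.shelves)),
                    key=lambda i: len(self.shelves[i]))
        self.shelves[shelf][ticket] = data
        self.location[ticket] = shelf
        return ticket

    def get(self, ticket: int) -> bytes:
        return self.shelves[self.location[ticket]][ticket]

    def add_shelf(self) -> None:
        """Add capacity (add a floor to the parking garage)."""
        self.shelves.append({})

    def rebalance(self) -> None:
        """Spread objects evenly again; existing tickets stay valid."""
        items = [(t, self.shelves[s][t]) for t, s in self.location.items()]
        for shelf in self.shelves:
            shelf.clear()
        for n, (ticket, data) in enumerate(items):
            shelf = n % len(self.shelves)
            self.shelves[shelf][ticket] = data
            self.location[ticket] = shelf


store = ObjectStore(shelves=1)
tickets = [store.put(f"car-{n}".encode()) for n in range(4)]
store.add_shelf()    # more storage is added behind the scenes
store.rebalance()    # layout changes, the tickets do not
```

After `rebalance`, each ticket still returns the same data through `get`, even though half of the objects now live on the new shelf.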


3. Lustre

  • Open source PFS and object-based file system (scaling to tens of thousands of nodes and petabytes of data)
  • Lustre stores data as objects called containers that are very similar to files, but are not part of a directory tree
  • The advantage to an object based file system is that allocation management of data is distributed over many nodes, avoiding a central bottleneck.
  • Lustre has a metadata component, a data component, and a client part
    • MetaData Servers (MDS)
      • Lustre allows these components to be put on different machines or on a single machine (the latter usually only for home clusters or for testing).
      • The metadata can be distributed across machines called MetaData Servers (MDS), to ensure that the failure of one machine will not cause the file system to crash.
      • MDS support failover as well. In Lustre 1.x, you can use up to two MDS machines (one in active mode and one in standby mode) while in Lustre 2.x, the goal is to have tens or even hundreds of MDS machines.
    • Object Storage Servers (OSS)
      • The file system data itself is stored as objects on the Object Storage Servers (OSS) machines
      • The data can be spread across the OSS machines in a round-robin fashion (striped in a RAID-0 sense) to allow parallel data access across many nodes resulting in higher throughput
      • Lustre mount points can be put into /etc/fstab or in an automounter
  • Lustre uses an open network protocol called Portals, originally developed at Sandia National Laboratories, to allow the components of the file system to communicate.
    • This allows the networking portion of Lustre to be abstracted so new networks can be easily added
    • Supports TCP networks (Fast Ethernet, Gigabit Ethernet, 10GigE), Quadrics Elan, Myrinet GM, Scali SDP, and InfiniBand. Lustre also uses Remote Direct Memory Access (RDMA) and OS-bypass capabilities to improve I/O performance.
  • Some people have complained that the client needs a patched kernel in order to use Lustre. Very few people could take the patches, apply them to a Kernel.org kernel, and successfully build and run Lustre.
    • People had to rely on ClusterFS for kernels. This was a burden on the users and on ClusterFS.
    • With the advent of Lustre 1.6.x, ClusterFS now offers a patchless client if you use a 2.6.15-16 kernel or greater. According to ClusterFS, there may be a performance reduction when using the patchless client, but for many people it is useful because they can quickly rebuild a Lustre client kernel if there is a security problem.
    • Lustre follows a slightly unusual open source model so that development for the project can be paid for
      • the newest version of Lustre is only available from Cluster File Systems, the company developing Lustre, while the previous version is available freely from www.lustre.org.
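The round-robin (RAID-0 style) striping across OSS machines described above can be illustrated with a little arithmetic. This is a sketch under stated assumptions, not Lustre's actual code: the function names `locate` and `split_write` are hypothetical, and the 1 MiB stripe size and three OSS machines are example values only.

```python
def locate(offset: int, stripe_size: int, oss_count: int):
    """Map a file byte offset to (OSS index, offset inside that OSS's object)."""
    stripe_number = offset // stripe_size
    oss_index = stripe_number % oss_count
    # Each OSS holds every oss_count-th stripe, packed back to back.
    object_offset = ((stripe_number // oss_count) * stripe_size
                     + offset % stripe_size)
    return oss_index, object_offset


def split_write(offset: int, length: int, stripe_size: int, oss_count: int):
    """Break one contiguous write into per-OSS (index, offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        # Never cross a stripe boundary within one piece.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        oss_index, object_offset = locate(offset, stripe_size, oss_count)
        pieces.append((oss_index, object_offset, chunk))
        offset += chunk
    return pieces


MIB = 1 << 20  # assumed stripe size for this example
pieces = split_write(offset=0, length=4 * MIB, stripe_size=MIB, oss_count=3)
for oss_index, object_offset, chunk in pieces:
    print(f"OSS {oss_index}: object offset {object_offset}, {chunk} bytes")
```

In this example a 4 MiB write is split into four 1 MiB pieces that land on OSS 0, 1, 2 and then OSS 0 again, so the first three pieces can be transferred in parallel by different servers.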

4. PanFS

5. PVFS2

Last modified on Mar 6, 2009, 2:42:33 PM
