The latest Aureum release (4.0) builds on the 3.1 features of Peaxy Find (threaded search capability) and security data services. It also improves interoperability with the Hadoop Distributed File System (HDFS) and Spark. This post gives an overview of how Aureum works and explains the technical reasons behind its scale and performance. It is more technical than most posts on this blog, but it could come in handy for your Chief Data Officer, system architect or data scientist.
Aureum – A Modern Approach to Data Access
Aureum is an integrated clustered file system and data access platform that offers dynamically expandable scale-out capacity to all applications. Its incrementally scalable file-serving infrastructure runs on commodity servers from different vendors, regardless of form factor or media type.
Aureum moves past the “data silos” model that tends to make critical business data go dark. It enables enterprises to consolidate unstructured data from across the business into a unified namespace that is easily accessible and searchable. The result is truly scalable data access.
Versions 3.0 and 3.1 introduced new indexing and search capabilities, three levels of security, and HDFS and clustered Samba support.
Version 4.0 includes:
- Upgrade to SolrCloud, delivering much faster search results.
- Implementation of data classes, which can aggregate data sets by project or data type regardless of the file pathname, group or user (see the sketch after this list).
- S3-based storage class in the cloud.
- Two-factor authentication.
- Many performance improvements.
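To make the data class idea concrete, here is a minimal Python sketch of grouping files by project regardless of where they live in the namespace. The `DataClass` type and the file identifiers below are purely illustrative assumptions; Aureum's actual data class interface is not shown in this post.

```python
from dataclasses import dataclass, field

# Illustrative (not Aureum's actual) model of a "data class": files are grouped
# by project or data type, independent of pathname, group, or user.
@dataclass
class DataClass:
    name: str
    members: set = field(default_factory=set)  # file identifiers, not paths

    def add(self, file_id: str):
        self.members.add(file_id)

# Files from unrelated directory trees can belong to the same data class.
turbine_project = DataClass("turbine-blade-simulations")
turbine_project.add("/eng/cfd/run-042/results.h5")
turbine_project.add("/archive/2015/blade/mesh.stl")

print(sorted(turbine_project.members))
```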
How Aureum Works
Aureum is a file-serving cluster built out of Virtual Machines (VMs) or Containers running on several physical servers. VMs provide better fault isolation than Containers, but are less efficient. (Note: To simplify the discussion, the rest of this post refers only to VMs, without explicitly mentioning Containers, unless a point applies specifically to one of the two.)
Aureum server code runs in user space on each VM that is part of the cluster, and each VM executes a separate Linux image; Containers within the same server all run under a single Linux instance. Once assigned to Aureum, the servers exclusively run Aureum software and are fully devoted to implementing Aureum abstractions, to ensure the appropriate levels of performance and availability.
When Aureum is initially configured or new hardware is added, the system assigns storage, CPU, RAM and network ports to VMs as needed to achieve the desired trade-offs between cost and performance. This allows the cluster to grow incrementally and organically. The administrator can balance user requirements, disk capacity, network bandwidth and hardware characteristics (mixing and matching vendors, form factors, hardware generations and types of storage devices), while Aureum presents a single namespace that aggregates all of these physical devices.
Aureum is accessed through a small software client component that provides a POSIX-compliant interface for Linux systems or a Windows SMB interface, or via a clustered Samba infrastructure embedded within Aureum. This means that after Aureum has been mounted on a local client and the user has been authenticated, all applications work just as they always have with standard Windows shares.
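Because the client exposes a standard POSIX mount, existing applications need no changes. The snippet below is a small sketch that assumes Aureum has already been mounted at a hypothetical path such as `/mnt/aureum`; everything in it is ordinary file I/O.

```python
import os

# Hypothetical mount point chosen by the administrator when Aureum is mounted.
MOUNT = "/mnt/aureum"

# Ordinary POSIX operations work unchanged on the Aureum namespace.
path = os.path.join(MOUNT, "projects", "gearbox", "readme.txt")
os.makedirs(os.path.dirname(path), exist_ok=True)

with open(path, "w") as f:
    f.write("Any application that can write to a local file can write here.\n")

print(os.listdir(os.path.join(MOUNT, "projects")))
```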
Enabling a Scalable Distributed File System
The key function of a scalable distributed file system is to create a namespace that encompasses all the files in the system and delivers high performance regardless of the number or size of those files.
In a file system, there is data, and there is metadata describing that data. The data structure containing the metadata is usually called a directory, subdirectory or folder. This structure is hierarchical, and the path to a particular file is called the data path. In many file systems, directories are themselves files; that is, the namespace and the data space are stored together on the same storage unit.
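As a quick illustration of this background, here is a toy Python model in which the namespace (directories) is kept apart from the data space (file contents). This is generic file system bookkeeping, not Aureum's internal representation.

```python
# Toy model of a hierarchical namespace kept separate from the data space.
# Directory entries map a name either to a subdirectory or to a data-space id.
namespace = {
    "/":     {"home": ("dir", "/home")},
    "/home": {"report.txt": ("file", "blk-1001")},
}
dataspace = {"blk-1001": b"quarterly results..."}

def resolve(path: str) -> bytes:
    """Walk the data path component by component, then fetch the file's bytes."""
    current = "/"
    parts = [p for p in path.split("/") if p]
    for name in parts[:-1]:
        kind, target = namespace[current][name]
        current = target                      # descend into the subdirectory
    kind, file_id = namespace[current][parts[-1]]
    return dataspace[file_id]

print(resolve("/home/report.txt"))
```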
Usually a client has a single local storage unit, most commonly a solid-state drive (SSD) or a hard disk drive (HDD). A storage system, by contrast, has a large number of storage units, often of different capacities and performance. To support distributed file storage, each Aureum VM manages the storage devices available to it. The physical media Aureum can work with include all common drive types, such as SATA and SAS HDDs and SSDs.
Each Aureum VM implements either a data space service or a namespace service. Data space services manage the storage resources where user file data is stored. The namespace service stores a subset of the basic hierarchical file system namespace and keeps track of the attributes for files, directories and symbolic links, such as ownership information, access permissions, and creation and modification dates.
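The kinds of attributes a namespace service tracks can be pictured with a simple record like the one below; the field layout is an assumption for illustration only, not Aureum's actual metadata format.

```python
from dataclasses import dataclass

# Illustrative record for a namespace entry: the sort of attributes tracked
# for files, directories and symbolic links.
@dataclass
class NamespaceEntry:
    name: str
    kind: str        # "file", "directory" or "symlink"
    owner: str
    group: str
    mode: int        # access permissions, e.g. 0o644
    created: float   # creation timestamp
    modified: float  # last-modification timestamp

entry = NamespaceEntry("results.h5", "file", "alice", "cfd", 0o640,
                       created=1.6e9, modified=1.7e9)
print(entry)
```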
In Aureum, the namespace has its own storage subsystem, resident entirely in random access memory (RAM) and backed on stable storage via a journaling mechanism. Therefore, any pathname-related operation such as a file lookup or open is handled completely in RAM, avoiding disk I/O. The data structures used to implement this allow a large number of files and directories to be managed within a single VM.
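The combination of a RAM-resident namespace and a journal can be sketched as follows. The journal format and replay logic here are assumptions chosen only to show the idea of making in-memory metadata durable.

```python
import json

class JournaledNamespace:
    """Toy sketch: metadata lives in RAM; every mutation is appended to a
    journal on stable storage before it is applied, so the in-memory state
    can be rebuilt after a restart by replaying the journal."""

    def __init__(self, journal_path: str):
        self.journal_path = journal_path
        self.entries = {}          # in-memory namespace: path -> attributes
        self._replay()

    def _replay(self):
        try:
            with open(self.journal_path) as j:
                for line in j:
                    op = json.loads(line)
                    self.entries[op["path"]] = op["attrs"]
        except FileNotFoundError:
            pass                   # first start: empty namespace

    def set_attrs(self, path: str, attrs: dict):
        # Write-ahead: persist the operation before mutating the RAM state.
        with open(self.journal_path, "a") as j:
            j.write(json.dumps({"path": path, "attrs": attrs}) + "\n")
        self.entries[path] = attrs

    def lookup(self, path: str) -> dict:
        # Pathname operations are served entirely from RAM, with no disk I/O.
        return self.entries[path]

ns = JournaledNamespace("namespace.journal")
ns.set_attrs("/projects/gearbox", {"owner": "alice", "mode": 0o750})
print(ns.lookup("/projects/gearbox"))
```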
In order to ensure high availability, both namespace and data space VMs are replicated across physical servers within software abstractions called “hyperservers”. Hyperservers provide the following benefits (a simplified sketch of the replication pattern follows the list):
- Each VM member of a hyperserver manages and serves the same subset of data and metadata as the other members. Thus, besides providing availability, the members can respond to client requests in parallel.
- If a VM member of a hyperserver crashes or becomes unstable, the system fires up another one and replaces the VM that crashed.
- Hyperservers work independently of each other, as they operate on their own metadata or data. This allows distributing the computations and the services within the unified namespace and avoiding bottlenecks and choke points.
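The benefits above follow from a simple pattern: every update is replicated to all members of a hyperserver, any member can serve reads, and a failed member is replaced with a fresh copy. Here is a rough Python sketch of that pattern, not Aureum's actual replication protocol:

```python
import random

class Hyperserver:
    """Toy sketch of a hyperserver: a small group of replica VMs that all
    hold the same slice of metadata (or data)."""

    def __init__(self, members):
        self.members = list(members)   # replica VMs, each modeled as a dict

    def write(self, key, value):
        # Replicate the update to every member so any of them can serve it.
        for member in self.members:
            member[key] = value

    def read(self, key):
        # Any healthy member can answer, allowing parallel reads.
        return random.choice(self.members)[key]

    def replace_failed(self, failed_index):
        # A crashed member is replaced by a fresh VM seeded from a survivor.
        survivor = self.members[(failed_index + 1) % len(self.members)]
        self.members[failed_index] = dict(survivor)

hs = Hyperserver([{}, {}, {}])
hs.write("/projects", {"owner": "root"})
hs.replace_failed(0)
print(hs.read("/projects"))
```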
For scalability, the entire Aureum namespace is partitioned across hyperservers; the namespace is thus a collection of fragments distributed across all namespace hyperservers. Clients see a single namespace, but they know where each directory is located and communicate directly with the hyperserver in charge, without intermediaries, so there are no bottlenecks.
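One common way to let a client find the right hyperserver without an intermediary is to derive the owner from the directory path itself, for example by hashing. Aureum's actual placement scheme is not described in this post, so the hash-based mapping below is only an illustration of the idea:

```python
import hashlib

# Hypothetical names for the namespace hyperservers in a small cluster.
HYPERSERVERS = ["ns-hyperserver-0", "ns-hyperserver-1", "ns-hyperserver-2"]

def owner_of(directory: str) -> str:
    """Map a directory to the hyperserver responsible for it.
    Hash-based placement is an assumption used only for illustration."""
    digest = hashlib.sha256(directory.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(HYPERSERVERS)
    return HYPERSERVERS[index]

# A client computes the owner locally and talks to it directly,
# with no central metadata server in the request path.
for d in ["/projects/gearbox", "/archive/2015", "/home/alice"]:
    print(d, "->", owner_of(d))
```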
A few more points to make on Aureum’s technical benefits:
- VMs are expected to run on systems that make use of multi-core CPUs. Many of the choices made in Aureum’s design take advantage of the processing power of multi-core architectures, by trading off processing for I/O bandwidth.
- Aureum is designed to support very different types of file storage loads, so the amount of RAM each VM is allowed to use is a function of the service it performs, the performance desired, and the cost goals for each deployment.
- Unlike other systems, Aureum does not rely on any special type of network connection beyond IP over Ethernet. This lowers Total Cost of Ownership (TCO) by using very common technology and avoiding the need to manage multiple types of network infrastructure.