PSTORAGE−OVERVIEW

NAME
INTRODUCTION
CS WRITE JOURNAL
LOCAL CLIENT CACHE
AUTHOR
SEE ALSO

NAME

pstorage-ssd − overview of Parallels Cloud Storage SSD caching

INTRODUCTION

Along with using SSD drives for storing data chunks, Parallels Cloud Storage also supports the use of such drives for caching data to improve the cluster IO performance and reliability. You can create two types of caches on SSD drives:

CS write journal. You can attach an SSD drive to a chunk server in the cluster and configure the drive to store a write journal. By doing so, you can boost the performance of write operations in the cluster by a factor of two or more. In addition, the journal can maintain chunk checksums to improve storage reliability.

Local client cache. You can attach an SSD drive to a client and configure the drive to store a local cache of frequently accessed data. With a local cache on a client’s SSD drive, you can increase the overall cluster performance by a factor of 10 or more.

WARNING! Many SSD models are not server grade and may lose an arbitrary set of data changes on power loss. Such SSDs are dangerous because they can lead to data corruption and inconsistencies, and they should not be used with pstorage. Consult the manual to find out which SSD models are known to be safe, or verify a drive using the pstorage-hwflush-check(1) utility.

CS WRITE JOURNAL

The CS write journal accumulates storage writes so that they can be flushed to a rotational disk less frequently and by bigger portions, thus improving overall IO throughput.

Configuring the write journal
You can configure the CS write journal when you create it using the following pstorage−make−cs(1) options:

-j, --journal=path

Set the path to a directory to store the journal.

-s, --journal-size=dsize[:msize]

Set the journal size, where (1) dsize is the maximum size of data stored in the journal, in megabytes, rounded up to a multiple of 64 MB, and (2) msize is the optional maximum size of metadata stored in the journal.

-S, --no-checksums

Disable checksumming for journal transactions (optional).

If the journal directory does not exist, it is created. If the size option is omitted, the journal size is calculated automatically on the basis of free space available in the journal directory. Once created, the write journal cannot be dropped or resized.
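The rounding applied to dsize can be sketched in shell as follows. This is only an illustration under stated assumptions: the helper names round_up_64 and journal_opts and the path /ssd/cs1-journal are hypothetical, and the sketch prints the journal-related options instead of running pstorage-make-cs, which needs further options not covered here.

```shell
#!/bin/sh
# Round a requested journal data size (in MB) up to the next multiple
# of 64 MB, as described for the dsize value of the -s option.
round_up_64() {
    requested_mb=$1
    echo $(( (requested_mb + 63) / 64 * 64 ))
}

# Print (do not run) the journal-related options for pstorage-make-cs.
# The journal directory is a hypothetical example path.
journal_opts() {
    journal_dir=$1
    requested_mb=$2
    echo "-j $journal_dir -s $(round_up_64 "$requested_mb")"
}

journal_opts /ssd/cs1-journal 30000
```

For a requested size of 30000 MB the sketch prints a size of 30016 MB, the next multiple of 64.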

The CS configuration directory contains a symbolic link to the journal directory: <cs-root>/control/journal. By checking this link, you can tell whether the CS was created with a journal and where the journal currently resides. If necessary, you can move the journal to another location, provided that you update the link accordingly. Stop the CS before performing such operations.
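A minimal sketch of checking the journal link is shown below. The function name journal_location is hypothetical, and the sketch assumes only what is stated above: that a CS with a journal has a symbolic link at control/journal under its root directory.

```shell
#!/bin/sh
# Print the journal directory of a chunk server, or report that the
# chunk server has no journal link. The argument is the CS root
# directory (<cs-root> in the text).
journal_location() {
    cs_root=$1
    link="$cs_root/control/journal"
    if [ -L "$link" ]; then
        # readlink prints the target the symbolic link points to
        readlink "$link"
    else
        echo "no journal link under $cs_root" >&2
        return 1
    fi
}

# Example call (the path is a placeholder for a real CS root):
#   journal_location /path/to/cs-root
```

The function returns a non-zero status when the link is missing, so it can be used directly in a shell condition.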

Checksumming
Using checksumming, you can provide better reliability and integrity of all data in the cluster. When checksumming is enabled, Parallels Cloud Storage generates checksums each time some data in the cluster is modified. When this data is then read, the checksum is computed once more and compared with the already existing value.

By default, data checksumming is automatically enabled for all newly created chunk servers. If necessary, you can disable this functionality using the −S option when you set up a chunk server.

Data Scrubbing
Data scrubbing is the process of checking data chunks for durability and verifying their contents for readability and correctness. By default, Parallels Cloud Storage is set to examine two data chunks per minute on each chunk server in the cluster. If necessary, you can configure this number using the pstorage utility, for example:

pstorage -c my_cluster set-config mds.wd.verify_chunks=3

This command sets the number of chunks to be examined per minute on each chunk server in the my_cluster cluster to 3.
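To get a feel for what the scrubbing rate means in practice, the following sketch estimates how long one pass over a chunk server would take. It assumes that the rate is constant and that every chunk is examined exactly once per pass, which this page does not state; the function name scrub_minutes and the chunk count are hypothetical.

```shell
#!/bin/sh
# Estimate the minutes needed to examine every chunk on one chunk
# server once.
#   $1: number of chunks on the server (hypothetical input)
#   $2: chunks examined per minute (mds.wd.verify_chunks);
#       defaults to 2, the default rate given in the text
scrub_minutes() {
    chunk_count=$1
    chunks_per_minute=${2:-2}
    # Ceiling division, so a partial final minute still counts
    echo $(( (chunk_count + chunks_per_minute - 1) / chunks_per_minute ))
}

scrub_minutes 10000      # default rate of 2 chunks per minute
scrub_minutes 10000 3    # rate set in the example above
```

Under these assumptions, a server holding 10000 chunks needs 5000 minutes (about three and a half days) at the default rate and 3334 minutes at a rate of 3.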

LOCAL CLIENT CACHE

Another way of improving the overall cluster performance is to create a local cache on a client’s SSD drive. Once you create the cache, all cluster data accessed two or more times is placed in that cache. The main features specific to a local cache are summarized below:

Quick access time

Data in the local cache can be accessed much faster (by a factor of 10 or more) than the same data stored on chunk servers in the cluster.

No network bandwidth consumption

Cluster network bandwidth is not consumed because the data is accessed locally.

Special boot cache

Local cache uses a special boot cache to store limited amounts of data when files are opened. This significantly speeds up the process of starting virtual machines and Containers.

Cache survivability

Local cache is persistent and can survive a graceful system shutdown; however, it is dropped when the system crashes.

Sequential access filtering

Only randomly accessed data is cached. Data backup applications may generate a huge amount of sequential IO. Preventing such IO from being cached is important to avoid stressing the cache and evicting its content.

Configuring client cache
The client cache may be configured by passing the following options to the pstorage−mount(1) command:

-C CACHEFILE

Set the read cache file path. The cache is stored in a single file (rather than in a directory, as is the case with the CS journal). The path must include a file name.

-R Mbytes

Set the total read cache size, in megabytes. If this parameter is not specified and a valid cache file cannot be found, the size is calculated from the free space available on the device where the cache file is located.

-B Mbytes

Set the boot cache size, in megabytes (the default is 1/8 of the total size).

-S

Disable the read cache checksum protection (enabled by default).

-b Kbytes

Set the read cache block size, in kilobytes (the default is 64 KB). Increasing it may result in better performance of a large cache.

Only the path parameter (-C) is mandatory. When the cache is initialized for the first time, you either specify the size parameter (-R) or rely on the default, which depends on the free space available. If the cache already exists, its parameters are preserved unless you change them explicitly. If you change existing cache parameters when mounting a cluster, all locally cached data is destroyed. This, however, does not affect the data stored in the cluster.
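The relationship between these options can be sketched as follows. The helper names cache_opts and default_boot_mb and the cache file path are hypothetical. The sketch prints only the cache-related options and does not invoke pstorage-mount, which also needs a cluster name and a mount point. How the utility itself rounds the 1/8 default is not stated here, so the integer division below is an assumption.

```shell
#!/bin/sh
# Print the cache-related options for pstorage-mount.
#   $1: cache file path, including a file name (-C, mandatory)
#   $2: total read cache size in MB (-R, optional)
cache_opts() {
    cache_file=$1
    total_mb=${2:-}
    opts="-C $cache_file"
    if [ -n "$total_mb" ]; then
        opts="$opts -R $total_mb"
    fi
    echo "$opts"
}

# Default boot cache size: 1/8 of the total read cache size.
# Integer division is an assumption about the rounding.
default_boot_mb() {
    echo $(( $1 / 8 ))
}

cache_opts /mnt/ssd/pcs1.cache 131072
default_boot_mb 131072
```

With a total size of 131072 MB (128 GB), the default boot cache works out to 16384 MB. Omitting the size argument leaves only -C in the printed options, which corresponds to relying on the free-space default.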

If a mount point for the cluster is present in the /etc/fstab file, you can specify the cache parameters for this mount point, along with other mount options. For details, see the pstorage−mount(1) man page.

Querying cache information
You can tell if the cache is active for a mounted cluster and find its actual parameters, including the path to the cache file, by running the following command:

cat <mnt-root>/.pstorage.info/read_cache_info

If the cache is not active, the output is empty. Otherwise, the command prints the path to the cache file, the main and boot cache sizes, the block size, and the checksum protection status. You can also examine two other files in the <mnt-root>/.pstorage.info/ directory, read_cache and read_cache_files, to monitor the overall and file-specific cache performance information. The pstorage(1) mnt-top command provides a more convenient and user-friendly way to access the same information.
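The empty-output rule lends itself to a simple scripted check. The sketch below uses a hypothetical function name, cache_active, and relies only on the behavior described above: read_cache_info yields no output when the cache is inactive. It reads the file content rather than testing the file size, because the size reported for such an information file may not reflect its content.

```shell
#!/bin/sh
# Succeed if the read cache is active for the cluster mounted at $1
# (<mnt-root> in the text), fail otherwise.
cache_active() {
    info_file="$1/.pstorage.info/read_cache_info"
    # Active means the file can be read and produces non-empty output
    [ -n "$(cat "$info_file" 2>/dev/null)" ]
}

# Example call (the mount point is a placeholder):
#   if cache_active /path/to/mnt-root; then echo "cache is active"; fi
```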

AUTHOR

Copyright © 2011−2013, Parallels, Inc. All rights reserved.

SEE ALSO

pstorage(1), pstorage−make−cs(1), pstorage−mount(1)