Storage Plugins

The starting point for using an alternate storage technology is to create an implementation of the StoragePlugin interface and register it in StoragePluginURLParser. In addition to the StoragePlugin interface, several optional interfaces provide further functionality:

  1. ETLSource – This lets a StoragePlugin act as a source of data in the ETL process.

  2. ETLDest – This lets a StoragePlugin act as a destination of data in the ETL process.

  3. StorageMetrics – This lets a StoragePlugin provide metrics that are displayed in the UI and participate in capacity planning.

Writing a new StoragePlugin does take some effort, but with this separation you should be able to support a wide variety of storage technologies. For more details, please see the Javadoc.
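As a sketch of how these pieces fit together — using hypothetical placeholder interfaces, not the actual archiver appliance API; consult the Javadoc for the real method signatures — a storage technology that participates on both sides of ETL might implement all three interfaces on one class:

```java
// Hypothetical, simplified stand-ins for the real StoragePlugin/ETLSource/ETLDest
// interfaces; the actual interfaces have more methods (see the Javadoc).
interface StoragePlugin { String getName(); }
interface ETLSource { /* yields chunks ready to move to the next store */ }
interface ETLDest { /* accepts chunks moved in from a previous store */ }

// A backend that can both feed and receive ETL data implements all three.
class MyBlobStorePlugin implements StoragePlugin, ETLSource, ETLDest {
    public String getName() { return "myblobstore"; }
}

public class PluginSketch {
    public static void main(String[] args) {
        StoragePlugin p = new MyBlobStorePlugin();
        System.out.println(p.getName());
        // The ETL machinery can check which roles a plugin supports.
        System.out.println(p instanceof ETLSource && p instanceof ETLDest);
    }
}
```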

NIO2

The PlainStoragePlugin can be viewed as a chunking StoragePlugin. It chunks data into well-defined time partitions (instead of individual samples), and various business processes in the archiver appliance understand these time partitions and deal with them efficiently. Each chunk has a well-defined key, and one can choose to store a chunk in any storage provider that offers block storage. The PlainStoragePlugin supports multiple backends for serialization:

  1. Protocol Buffers (PB): The default backend, using the pb: scheme.

  2. Apache Parquet: A columnar storage backend, using the parquet: scheme.

For existing PVs, you can modify storage parameters or switch backends using the /changeStore Management BPL action. This allows you to update partition granularity or compression settings, or to migrate between PB and Parquet, without losing data.

The PlainStoragePlugin uses Java NIO2 as its storage API. Java NIO2 has a documented mechanism for developing custom file system providers. Using custom NIO2 file system providers, one can store the chunks generated by the PlainStoragePlugin in storage technologies such as:

  1. Database BLOBS

  2. Any key/value store (for example, SciDB)

  3. Other technologies that may be more appropriate

To add custom NIO2 file system providers for use in the archiver appliance, please look at the JavaDoc for our version of Java’s Paths.
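To illustrate the idea, the JDK's built-in zip file system provider (itself a standard NIO2 FileSystemProvider) can stand in for a custom provider: the same Files/Path calls work unchanged whether a chunk lands on a local disk or inside an alternate backing store. This is a generic JDK sketch, not archiver appliance code; the chunk name is made up:

```java
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class Nio2ProviderDemo {
    // Writes a "chunk" into a zip-backed NIO2 file system and reads it back.
    // A custom provider for a database or key/value store plugs in the same way.
    static String roundTrip() throws Exception {
        Path zip = Files.createTempFile("chunks", ".zip");
        Files.delete(zip); // let the zip provider create the archive itself
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            // The chunk key maps naturally onto a path inside the provider.
            Path chunk = fs.getPath("/SOME_PV/2024_01_01.pb");
            Files.createDirectories(chunk.getParent());
            Files.write(chunk, "sample-bytes".getBytes());
        }
        // Reopen and read the chunk back through the same NIO2 API.
        String contents;
        try (FileSystem fs = FileSystems.newFileSystem(URI.create("jar:" + zip.toUri()), Map.of())) {
            contents = new String(Files.readAllBytes(fs.getPath("/SOME_PV/2024_01_01.pb")));
        }
        Files.delete(zip);
        return contents;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip());
    }
}
```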

Type systems

The archiver appliance uses Google's Protocol Buffers as the serialization scheme. There are plenty of other serialization schemes that offer similar functionality; of particular interest is the one used in the EPICS V4 protocol. Support for alternate serialization mechanisms is possible by adding support for alternate type systems. Please contact the collaboration if you'd like to consider using an alternate serialization mechanism.

For more detailed information on implementation and tools for analyzing Parquet files, see the Parquet backend page.

Configuration

To use the Parquet backend, use the parquet: scheme in your dataStores definition within policies.py. For the protobuf backend, use the pb: scheme.

Example:

"dataStores": [
    "pb://localhost?name=STS&rootFolder=${ARCHAPPL_SHORT_TERM_FOLDER}&partitionGranularity=PARTITION_HOUR",
    "parquet://localhost?name=MTS&rootFolder=${ARCHAPPL_MEDIUM_TERM_FOLDER}&partitionGranularity=PARTITION_DAY&compress=ZSTD&zstdLevel=0",
    "parquet://localhost?name=LTS&rootFolder=${ARCHAPPL_LONG_TERM_FOLDER}&partitionGranularity=PARTITION_YEAR&compress=ZSTD&zstdLevel=5"
]

Compression

Parquet supports various compression codecs. The archiver appliance specifically supports:

  • UNCOMPRESSED: No compression.

  • SNAPPY: High speed, reasonable compression.

  • ZSTD: Excellent balance between compression ratio and speed.

ZSTD Configuration

When using ZSTD compression, several advanced configuration options are available via the storage plugin URL:

  • zstdBufferPool (boolean): Enables the use of a buffer pool for ZSTD compression/decompression. Defaults to false.

  • zstdLevel (integer): Sets the ZSTD compression level (typically 1-22). Defaults to 3.

  • zstdWorkers (integer): Sets the number of worker threads for ZSTD. Defaults to 0 (single-threaded).
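For example, a Parquet data store with all three ZSTD options set explicitly (combining the parameters listed above with the dataStores URL style shown earlier) might look like:

```
"parquet://localhost?name=LTS&rootFolder=${ARCHAPPL_LONG_TERM_FOLDER}&partitionGranularity=PARTITION_YEAR&compress=ZSTD&zstdLevel=5&zstdWorkers=2&zstdBufferPool=true"
```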

Conversion

The archiver appliance provides several ways to convert data between different backends or to update storage parameters for existing PVs.

ConvertFile Utility

The ConvertFile utility can be used for ad-hoc conversion of individual files.

# Convert a PB file to Parquet with ZSTD compression
java -cp ... edu.stanford.slac.archiverappliance.plain.utils.ConvertFile /data/pv.pb PARQUET compress=ZSTD zstdLevel=3

ChangeStore BPL

For a more automated approach, the /changeStore Management BPL action can be used to update the storage configuration for a single PV. This action allows you to:

  • Change backend: convert existing data from one backend to another (e.g., from pb to parquet).

  • Update partition granularity: change the partition granularity (e.g., from PARTITION_HOUR to PARTITION_DAY).

  • Update other storage plugin URL parameters, such as compression settings.
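As an illustration only — the host, port, and parameter names below are assumptions, so check the Management BPL documentation for the actual request shape — a /changeStore request might look like (with the new plugin URL percent-encoded):

```
http://appliance:17665/mgmt/bpl/changeStore?pv=SOME:PV&storage=parquet%3A%2F%2Flocalhost%3Fname%3DLTS%26rootFolder%3D%24%7BARCHAPPL_LONG_TERM_FOLDER%7D%26partitionGranularity%3DPARTITION_YEAR%26compress%3DZSTD
```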