Storage Plugins
The starting point for using an alternate storage technology is to create implementations of the StoragePlugin interface and register them in StoragePluginURLParser. In addition to the StoragePlugin interface, several optional interfaces enable additional functionality:
ETLSource – Lets a StoragePlugin act as a source of data in the ETL process.
ETLDest – Lets a StoragePlugin act as a destination of data in the ETL process.
StorageMetrics – Lets a StoragePlugin provide metrics that are displayed in the UI and participate in capacity planning.
Writing a new StoragePlugin takes some effort, but with this separation you should be able to support a wide variety of storage technologies. For more details, please see the Javadoc.
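The overall shape of such a plugin can be sketched with simplified stand-in interfaces. Everything below is illustrative: the real StoragePlugin, ETLSource, and ETLDest interfaces have more methods and different signatures, which are documented in the Javadoc.

```java
// Simplified stand-ins for the real archiver appliance interfaces;
// consult the Javadoc for the actual method signatures.
interface StoragePlugin {
    String getName();                   // human-readable store name
    void initialize(String configURL);  // parse plugin-specific URL parameters
}

interface ETLSource {                   // hands out chunks that have aged out
    Iterable<String> getETLStreams(String pvName);
}

interface ETLDest {                     // accepts chunks moved in by ETL
    boolean appendToETLAppendData(String pvName, Iterable<String> stream);
}

public class SketchPlugin implements StoragePlugin, ETLSource, ETLDest {
    private String name = "sketch";

    @Override public String getName() { return name; }

    @Override public void initialize(String configURL) {
        // A real plugin would also parse rootFolder, partitionGranularity, etc.
        int idx = configURL.indexOf("name=");
        if (idx >= 0) {
            String rest = configURL.substring(idx + 5);
            int amp = rest.indexOf('&');
            name = amp >= 0 ? rest.substring(0, amp) : rest;
        }
    }

    @Override public Iterable<String> getETLStreams(String pvName) {
        return java.util.List.of();     // nothing to move in this sketch
    }

    @Override public boolean appendToETLAppendData(String pvName, Iterable<String> stream) {
        return true;                    // pretend the data was written
    }

    public static void main(String[] args) {
        SketchPlugin p = new SketchPlugin();
        p.initialize("sketch://localhost?name=STS&rootFolder=/tmp");
        System.out.println(p.getName()); // prints STS
    }
}
```

A plugin structured this way can participate in ETL as both source and destination, which is what lets data age from one store into the next.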
NIO2
The PlainStoragePlugin can be viewed as a chunking StoragePlugin. It chunks data into well-defined time partitions (instead of individual samples), mapping each time instant t to a chunk key, and various business processes in the archiver appliance understand these time partitions and deal with them efficiently. Each chunk has a well-defined key, and one can choose to store a chunk in any storage provider that provides block storage. The PlainStoragePlugin supports multiple backends for serialization:
Protocol Buffers (PB): The default backend, using the pb: scheme.
Apache Parquet: A columnar storage backend, using the parquet: scheme.
For existing PVs, you can modify storage parameters or switch backends using the /changeStore Management BPL action. This allows you to update partition granularity or compression settings, or to migrate between PB and Parquet, without losing data.
The PlainStoragePlugin uses Java NIO2 as the storage API. Java NIO2 has a documented mechanism for developing custom file system providers. Using custom NIO2 file system providers, one can store the chunks generated by the PlainStoragePlugin using storage technologies like
Database BLOBS
Any key/value store (for example, SciDB)
Other technologies that may be more appropriate
To add custom NIO2 file system providers for use in the archiver appliance, please look at the JavaDoc for our version of Java’s Paths.
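Custom providers plug into the standard java.nio.file.spi service-loader mechanism. A quick way to see which providers (and their URI schemes) are available at runtime is to enumerate the installed providers; a custom provider for, say, a key/value store would appear in this list alongside the JDK's built-in ones:

```java
import java.nio.file.FileSystems;
import java.nio.file.spi.FileSystemProvider;

public class ListProviders {
    public static void main(String[] args) {
        // Each provider on the classpath registers a URI scheme.
        for (FileSystemProvider p : FileSystemProvider.installedProviders()) {
            System.out.println(p.getScheme());
        }
        // The default provider backs ordinary file-system paths.
        System.out.println("default: " + FileSystems.getDefault().provider().getScheme());
    }
}
```

The "file" scheme is always present; a custom provider jar dropped on the classpath registers its own scheme through the same mechanism.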
Type systems
The archiver appliance uses Google's ProtocolBuffers as the serialization scheme. There are plenty of other algorithms that offer the same functionality; of particular interest is the serialization scheme used in the EPICS V4 protocol. Support for alternate serialization mechanisms is possible by adding support for alternate type systems. Please contact the collaboration if you want to consider using alternate serialization mechanisms.
For more detailed information on implementation and tools for analyzing Parquet files, see the Parquet backend page.
Configuration
To use the Parquet backend, use the parquet: scheme in your dataStores definition within policies.py. For the protobuf backend, use the pb: scheme.
Example:
"dataStores": [
"pb://localhost?name=STS&rootFolder=${ARCHAPPL_SHORT_TERM_FOLDER}&partitionGranularity=PARTITION_HOUR",
"parquet://localhost?name=MTS&rootFolder=${ARCHAPPL_MEDIUM_TERM_FOLDER}&partitionGranularity=PARTITION_DAY&compress=ZSTD&zstdLevel=0",
"parquet://localhost?name=LTS&rootFolder=${ARCHAPPL_LONG_TERM_FOLDER}&partitionGranularity=PARTITION_YEAR&compress=ZSTD&zstdLevel=5"
]
Compression
Parquet supports various compression codecs. The archiver appliance specifically supports:
UNCOMPRESSED: No compression.
SNAPPY: High speed, reasonable compression.
ZSTD: Excellent balance between compression ratio and speed.
ZSTD Configuration
When using ZSTD compression, several advanced configuration options are available via the storage plugin URL:
zstdBufferPool (boolean): Enables the use of a buffer pool for ZSTD compression/decompression. Defaults to false.
zstdLevel (integer): Sets the ZSTD compression level (typically 1-22). Defaults to 3.
zstdWorkers (integer): Sets the number of worker threads for ZSTD. Defaults to 0 (single-threaded).
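Combining these options, a long-term store tuned for heavier compression might use a dataStores entry like the following (the level, worker count, and buffer-pool choice here are illustrative, not recommendations):

```
"parquet://localhost?name=LTS&rootFolder=${ARCHAPPL_LONG_TERM_FOLDER}&partitionGranularity=PARTITION_YEAR&compress=ZSTD&zstdLevel=10&zstdWorkers=2&zstdBufferPool=true"
```

Higher levels trade write-time CPU for a better compression ratio, which usually suits a long-term store that is written once and read rarely.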
Conversion
The archiver appliance provides several ways to convert data between different backends or to update storage parameters for existing PVs.
ConvertFile Utility
The ConvertFile utility can be used for ad-hoc conversion of individual files.
# Convert a PB file to Parquet with ZSTD compression
java -cp ... edu.stanford.slac.archiverappliance.plain.utils.ConvertFile /data/pv.pb PARQUET compress=ZSTD zstdLevel=3
ChangeStore BPL
For a more automated approach, the /changeStore Management BPL action can be used to update the storage configuration for a single PV. This action allows you to:
Change Backend: Convert existing data from one backend to another (e.g., from pb to parquet).
Update Partition Granularity: Change partition granularity (e.g., from PARTITION_HOUR to PARTITION_DAY).
Update Other Parameters: Change any other storage plugin URL parameters (such as compression settings).
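As a sketch, a client could assemble the request URL for this action as shown below. The host, port, and parameter names (pv, storage, plugin_url) are assumptions for illustration; check the Management BPL documentation for the exact signature of /changeStore.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ChangeStoreUrl {
    // Hypothetical parameter names; verify against the Management BPL docs.
    static String buildUrl(String appliance, String pv, String storeName, String pluginUrl) {
        return appliance + "/mgmt/bpl/changeStore"
                + "?pv=" + URLEncoder.encode(pv, StandardCharsets.UTF_8)
                + "&storage=" + URLEncoder.encode(storeName, StandardCharsets.UTF_8)
                + "&plugin_url=" + URLEncoder.encode(pluginUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumes the management webapp on its usual port.
        String url = buildUrl("http://localhost:17665", "SOME:PV", "LTS",
                "parquet://localhost?name=LTS&partitionGranularity=PARTITION_YEAR&compress=ZSTD");
        System.out.println(url);
    }
}
```

Encoding the plugin URL as a query parameter matters here, since it contains its own ? and & characters.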