ESCAPE QoS Architecture

Introduction

A Storage Quality of Service (QoS) represents a common agreement between storage providers and the scientists using that storage on how the storage system should behave. A QoS class is typically understood in terms of access latency, bandwidth, and likelihood of data loss.

Some storage systems may provide a single QoS class, while others may provide several.

Data may require different QoS classes at different times. Moving away from simple descriptions (DISK and TAPE) to more general concepts may allow sites to provide storage in new and innovative ways that drive down cost. It may also allow trade-offs, such as providing increased storage capacity but with an increased risk of data loss.

QoS White Paper

The Working Group has published a white paper describing QoS and its potential within WLCG. The document is intended to frame and to stimulate the necessary debate on how the community can progress in realising the benefits of QoS.

The latest released version of the white paper is v1.0.0.

ESFRI Projects

This document describes the architecture used within DIOS to achieve (storage) QoS. It details the components and the roles those components play within DIOS to allow researchers to manage their storage QoS.

Document format

The ESFRI-specific QoS document is split into four sections.

1 The QoS policies section is a table containing information about how the experiment sees QoS

  • Name, Where it is used, Important characteristics, Example media

2 The data life-cycle/work-flows section is more free-form

  • Describes the best guess at how data will be handled

3 The interactions with Rucio section describes how the experiment framework should achieve the desired QoS (a minimal Rucio sketch follows this list)

  • List of operations to satisfy data life cycles

4 The use-cases section provides a terse description of user interactions

  • Brings the previous three items together
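
As a concrete, purely illustrative example of the kind of Rucio operation listed in section 3, the sketch below creates a replication rule whose RSE expression selects storage by QoS class. It assumes QoS classes are exposed as a QOS attribute on RSEs; the scope, dataset name, policy value and lifetime are placeholders, not settings taken from any experiment's document.

  from rucio.client import Client

  client = Client()

  # Placeholder DID; scope and name are illustrative, not real experiment data.
  dids = [{'scope': 'user.example', 'name': 'validated_output_2021'}]

  # Keep one replica on any RSE advertising the (assumed) QOS attribute value
  # ARCHIVE; the rule expires after 30 days (lifetime is given in seconds).
  rule_ids = client.add_replication_rule(
      dids=dids,
      copies=1,
      rse_expression='QOS=ARCHIVE',
      lifetime=30 * 24 * 3600,
      comment='move data to archival QoS once it is no longer of immediate interest',
  )
  print(rule_ids)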

ATLAS

The following table describes the different VO QoS Policies for data from the ATLAS experiment at the Large Hadron Collider that were identified during the workshop.

Name | Where it is used | Important characteristics | Example media
FAST_COMPUTATION | Computation that requires reduced CPU usage, so more likely to be IO-bound. Most likely streaming access. | Very low latency | NVMe-SSD
COMPUTATION | Any computation that is CPU-bound. Most likely streaming access. | Low latency | RAID-6, CEPH with replication
MERGEABLE | Data that is scheduled for some merging task | High throughput | RAID-6, CEPH with replication
ARCHIVE | Data that is no longer of immediate interest | (Low cost) high durability | Tape
DEFAULT | A file that is either of interest (e.g., a log file needed to diagnose a problem), or might be of interest (output not yet validated) | Reasonably reliable | RAID/CEPH/...
LOGFILE | A file that is unlikely to be of interest; for example, a log file for a job with output that has been validated. | Cheapest available | "Opportunistic storage"
HOT_ANALYSIS | Data recently produced, which is likely to be used again. Most likely random IO. | Low latency & good durability | RAID-6, CEPH-Replication
COLD_ANALYSIS | Data that has not been used for some time. Most likely random IO. | Reasonably reliable | CEPH-EC

Further information can be found here on Google Doc.
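
It is ultimately a site and operations decision how policy names such as those above are exposed to the data management layer. One possibility, sketched below with the Rucio Python client, is to advertise a QoS class as an attribute on each RSE so that replication rules can select storage by expression; the RSE name and the QOS attribute key/value are assumptions made only for illustration.

  from rucio.client import Client

  client = Client()

  # Hypothetical RSE name; tagging it with an (assumed) QOS attribute so that
  # rules can address it via an ATLAS-style policy name.
  client.add_rse_attribute(rse='EXAMPLE-SITE_SSD', key='QOS', value='FAST_COMPUTATION')

  # List every RSE currently advertising that QoS class.
  for rse in client.list_rses(rse_expression='QOS=FAST_COMPUTATION'):
      print(rse['rse'])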

SKA

The following table describes the different VO QoS Policies foreseen for Square Kilometre Array data in the SKA Regional Centre context. SKA has an anticipated start date of 2028, so these are best-guess QoS policies.

Name | Where it is used | Important characteristics | Example media
FAST_COMPUTATION | Computation that requires reduced CPU usage, so more likely to be IO-bound. Most likely streaming access. | Very low latency | NVMe-SSD
COMPUTATION | Any computation that is CPU-bound. Most likely streaming access. | Low latency | RAID-6
INGEST | Ingest from SDP | High throughput | RAID-6
LONG_TERM_ARCHIVE | Data backup from INGEST. Data that is no longer of immediate interest | (Low cost) high durability | Tape

Further information can be found here on Google Doc.

CTA

Envisaged levels for data from the Cherenkov Telescope Array:

Data Level | Short Name | Description
Level 0 | RAW | Data from the DAQ written to disk.
Level 1 | CALIBRATED | Physical quantities measured in the camera: photons, arrival times, etc. (Preliminary image shape parameters could also be included.)
Level 2 | RECONSTRUCTED | Reconstructed shower parameters such as energy, direction and particle ID. Several increasingly sophisticated sub-levels are envisaged.
Level 3 | REDUCED | Sets of selected (e.g. gamma-candidate) events.
Level 4 | SCIENCE | High-level binned data products such as spectra, sky maps or light curves.
Level 5 | OBSERVATORY | Legacy observatory data, such as CTA survey sky maps or the CTA source catalog.

The following table describes the different QoS Policies for CTA data.

Name | Where it is used | Important characteristics | Example media
COLD_DL0 | DL0 files must be stored as 2 replicas on cold storage. All DL0 files for a given day must be stored in 2 given data centers. Size is 6 PB/year. | Very good durability, because DL0 cannot be reproduced | Tape
HOT_DL0_TEMPORARY_OFFSITE | DL0 files must be stored temporarily on hot storage before being accessible by the processing pipeline. Example usage: first processing, reprocessing once a year. | Low latency | Spinning disk, SSD
HOT_DL0_TEMPORARY_ONSITE | DL0 files must be stored temporarily on-site on hot storage before being sent to HOT_DL0_TEMPORARY_OFFSITE. Files are deleted once the transfers to COLD_DL0 and HOT_DL0_TEMPORARY_OFFSITE are complete. | Low latency | Spinning disk, SSD
COLD_DL2_LATEST | A new version of DL2 is computed each year. The latest version of the DL2 files must be stored as 2 replicas on cold storage. Size is 600 TB/year. | Good durability | Tape
HOT_DL2_LATEST | The latest version of the DL2 files is stored as 1 replica. | Low latency | Spinning disk, SSD
COLD_DL2_PREVIOUS | The previous version of the DL2 files must be stored as 1 replica on cold storage. | Good durability | Tape
HOT_DL3_LATEST | The latest version of the DL3 files must be stored as 1 replica for transfer to the Science Archive. Size is 6 TB/year. | Good durability (99.9%), high throughput, data available even in case of site or network outage | RAID-6, CEPH with replication, Tape (replicated at different sites)
COLD_DL3 | All versions of the DL3 files are stored as 2 replicas on cold storage (one version per year). | Very good durability | Tape
LOGFILE | An incremental backup of the database that stores log files (2 replicas). This is only for backup purposes; a separate database is used for 'live' data. | Good durability | Tape
TEMPORARY_STORAGE | Stores intermediate products needed by the reconstruction pipeline. | Low latency | RAID-6, SSD, CEPH-Replication

Further information can be found here on Google Doc.
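
To illustrate how the DL0 life cycle above might map onto Rucio rules, the following sketch requests two permanent cold replicas plus one temporary hot off-site replica. It again assumes a QOS RSE attribute carrying these policy names; the scope, dataset name and the 90-day lifetime are placeholders.

  from rucio.client import Client

  client = Client()

  # Placeholder DID for one day of DL0 data.
  dids = [{'scope': 'cta', 'name': 'DL0_day_2028-06-01'}]

  # Two permanent replicas on cold storage, since DL0 cannot be reproduced.
  client.add_replication_rule(dids=dids, copies=2,
                              rse_expression='QOS=COLD_DL0',
                              lifetime=None)

  # One temporary off-site replica on hot storage for the processing pipeline;
  # the 90-day lifetime (in seconds) is an arbitrary illustrative value.
  client.add_replication_rule(dids=dids, copies=1,
                              rse_expression='QOS=HOT_DL0_TEMPORARY_OFFSITE',
                              lifetime=90 * 24 * 3600)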

FAIR

The upcoming Facility for Antiproton and Ion Research will support many experiments, each with different needs regarding data. So far we have gathered QoS information from the heavy-ion experiment CBM (Compressed Baryonic Matter) and the antiproton experiment PANDA (antiProton ANnihilation at DArmstadt).

CBM

The persistent data levels for CBM offline computing are:

Data Level | Short Name | Description
Level 1 | RAW | Selected de-contextualised ("unpacked") data extracted from the data stream as delivered by the CBM data acquisition. Typical objects are digis (digital single-channel information); typical containers are raw events.
Level 2 | SIM | Full MC information of simulated data. CBM intends to provide simulation statistics equivalent to the triggered experiment data statistics. For about 10% of these data, the full MC information will be kept.
Level 3 | AOD | Analysis Object Data serving as input for high-level user analysis. AOD are derived from RAW through calibration, reconstruction and skimming, or from SIM through detector response simulation, calibration, reconstruction and skimming. Different AOD types may be defined, serving different physics analysis objectives. Typical objects are tracks; typical containers are reco events.
Level 4 | PAR | Parameter data required for the production of the AODs (calibration, reconstruction). These are needed for high-level physics analyses. Typical parameter sets comprise the experiment configuration (geometry, settings) defined at the start-up of the experimental run, the running conditions monitored and recorded during experiment operation, and calibration parameters obtained through an analysis of RAW data. Parameters are typically managed through appropriate databases.
Level 5 | PHY | Physics-level results, usually in a binned and inclusive format. These are derived from AOD and constitute the experiment results to be made public.

The following service levels can be identified on the basis of the workflow outlined above. They do not include PAR and PHY data because of their negligible size compared to experiment and simulated data.

Name | Usage | Volume | Characteristics | Example media
RAW_COLD | Long-term storage of prime experiment data | 18 PB/a | High reliability and long-term stability | Tape
RAW_HOT | Availability for calibration and production of AOD for 2 years after data taking | 36 PB | Low latency | Disk
AOD | Availability for user-level physics analysis up to 5 years after data taking | 18 PB | Low latency | Disk
SIM | Availability for MC-level analysis up to 3 years after production | 16 PB | Low latency | Disk
AOD_SIM | Availability for user-level physics analysis up to 5 years after data taking | 18 PB | Low latency | Disk

Further information can be found here: Doc link.

PANDA

The following service levels can be identified on the basis of the workflow outlined above. They do not include PAR and PHY data because of their negligible size compared to experiment and simulated data.

Name | Usage | Volume | Characteristics | Example media
RAW_COLD | Long-term storage of prime experiment data | 1 PB/a | High reliability and long-term stability | Tape
RAW_HOT | Availability for calibration and production of ESD for 4 years after data taking | 4 PB | Low latency | Disk
ESD | Transient data for AOD production | 1 PB | Low latency, transient | Disk
AOD | Availability for user-level physics analysis up to 10 years after data taking | 23 PB | Low latency, distributed access | Disk
SIM | Availability for MC-level analysis up to 4 years after production | 11 PB | Low latency | Disk
AOD_SIM | Availability for user-level physics analysis up to 10 years after data taking | 12 PB | Low latency, distributed access | Disk
PAR_RAW | Needed for RAW and ESD processing | | Low latency, distributed access | Disk
PAR_COLD | Long-term storage of prime experiment data | | High reliability and long-term stability; database | Tape

LOFAR

Service levels of the LOw-Frequency ARray:

Name | Usage | Characteristics | Example media | Description/comments
UPLOAD_STAGING | Upload area for the ingest process | Reliable, high throughput, normal latency | Spinning disk, RAID-6 | Few bulk users
ARCHIVE | Long-term archiving of data | Cheap but long-term reliable | Tape, powered-down spinning disk, etc. | This is supposed to be a very cold tier; high latency is OK.
DOWNLOAD_STAGING | Staging of data prior to egest | High throughput, normal latency | Spinning disks, RAID-0(?) | Reliability is not too important. Many users.

Further information can be found here: Doc link.



Other sources of information

There is also an overview of storage resources in the Datalake Status table (located on the WP2 DIOS page) and on the Storage QoS page.