CTA Data Analysis Use Case
Summary
This use case describes the analysis of CTA (Cherenkov Telescope Array) data via ESAP, the ESCAPE Science Analysis Platform.
Actors
The actor could be one of the following:
- CTA Project PI
- Science User
- Member of the public
Requirements
A science platform with the ability to:
- search the data lake for CTA data
- search for relevant computing resources
- search for relevant notebooks
- submit jobs/workflows to distributed computing
Flow
- Search for Data (see the data-discovery sketch after this list):
- cone search: RA, Dec, solid angle, time period
- cone search: Alt, Az, solid angle, time period
- target name (Simbad), solid angle, time period
- by source class, time period
- select data level (binned or unbinned)
- by project ID (maybe)
- Select data from search results or select all (Add to Basket)
- Find the IRFs that correspond to the selected data, if the data are not binned (Add to Basket)
- Find corresponding metadata, log files, etc. (Download)
- Make quality selection cuts based on log files
- Compile the selected data, with the correct IRFs if the data are not binned, into a runlist (see the runlist and quality-cut sketch after this list).
- The data can now be analysed in either interactive or batch sessions:
- Interactive (best for binned data; see the notebook sketch after this list):
- Search for Jupyter hub with appropriate modules/software
- Runlist or links to runlist are also available
- Search for and upload appropriate notebook
- Batch (best for unbinned data; see the DIRAC sketch after this list):
- Search for software/container appropriate for the data
- Calculate the resources needed to analyse the data (see the resource-estimate sketch after this list)
- Find resources (e.g. via DIRAC)
- Submit the runlist to the compute resources (e.g. via DIRAC)
- Monitor the job via a GUI or other tool
- When the job is finished, a report should be sent to the user.
- Output data, metadata, logs, etc. are transferred to a chosen location
- The batch output (now binned) can then be inspected in an interactive session, as above:
- Data verification (maybe)
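
The sketches below illustrate some of the steps above in Python. They are illustrative only: any service URLs, file paths, column names and wrapper scripts they use are assumptions, not parts of the actual ESAP interface.

Data-discovery sketch: resolve a target name and run a cone search with a time-period cut. The cone-search endpoint and the time columns (`t_min`, `t_max`) are placeholders standing in for whatever the ESAP data lake actually exposes.

```python
# Resolve a target name (via Simbad/Sesame) and query a hypothetical IVOA
# Simple Cone Search service; the endpoint URL and time columns are assumptions.
from astropy.coordinates import SkyCoord
import pyvo

target = SkyCoord.from_name("Crab")                       # name resolution

SCS_URL = "https://example.org/esap/cta/scs"              # placeholder endpoint
results = pyvo.dal.conesearch(
    SCS_URL, pos=(target.ra.deg, target.dec.deg), radius=2.0
)

# Restrict to a time period (MJD bounds; column names are assumed).
table = results.to_table()
mask = (table["t_min"] > 59000) & (table["t_max"] < 59365)
print(table[mask])
```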
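Runlist and quality-cut sketch, assuming the selected DL3 files follow the GADF layout that Gammapy's `DataStore` understands. The index directory, the 3 deg selection radius and the `QUALITY` column are illustrative assumptions.

```python
# Select runs around a target, apply a quality cut, and write a runlist.
import astropy.units as u
from astropy.coordinates import SkyCoord
from gammapy.data import DataStore

data_store = DataStore.from_dir("./cta-dl3")              # placeholder index dir

target = SkyCoord.from_name("Crab")
table = data_store.obs_table
pointing = SkyCoord(table["RA_PNT"], table["DEC_PNT"], unit="deg")
mask = pointing.separation(target) < 3 * u.deg

# Quality selection based on per-run metadata (assumed column name).
if "QUALITY" in table.colnames:
    mask &= table["QUALITY"] == 0

obs_ids = list(table["OBS_ID"][mask])
observations = data_store.get_observations(obs_ids)       # each run carries its IRFs

# Persist the runlist for later interactive or batch use.
with open("runlist.txt", "w") as f:
    f.write("\n".join(str(i) for i in obs_ids))
```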
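Notebook sketch for the interactive branch: a quick-look counts map built from the `observations` selected in the runlist sketch. The map size and pixel size are illustrative assumptions.

```python
# Fill a counts map from the selected observations and display it.
from astropy.coordinates import SkyCoord
from gammapy.maps import Map

target = SkyCoord.from_name("Crab")
counts = Map.create(skydir=target, width=5.0, binsz=0.02, frame="icrs")

for obs in observations:            # `observations` comes from the runlist sketch
    counts.fill_events(obs.events)

counts.plot(add_cbar=True, stretch="sqrt")
```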
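Resource-estimate sketch: scaling per-run figures by the length of the runlist. The per-run CPU time and file size are purely illustrative assumptions, not measurements.

```python
# Back-of-the-envelope resource estimate for the runlist.
with open("runlist.txt") as f:
    n_runs = sum(1 for _ in f)

cpu_hours_per_run = 0.5             # assumed average for an unbinned analysis
gb_per_run = 2.0                    # assumed average DL3 event-file size

print(f"~{n_runs * cpu_hours_per_run:.0f} CPU hours, "
      f"~{n_runs * gb_per_run:.0f} GB of input to stage")
```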
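DIRAC sketch for the batch branch: job submission, monitoring and output retrieval through the DIRAC user API (Job/Dirac classes), assuming a DIRAC client installation and a valid proxy. The wrapper script `run_analysis.sh`, the sandbox contents and the job name are placeholders.

```python
# Submit the runlist analysis as a DIRAC job and monitor it.
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)     # standard DIRAC API boilerplate

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("cta-runlist-analysis")
job.setExecutable("run_analysis.sh", arguments="runlist.txt")
job.setInputSandbox(["run_analysis.sh", "runlist.txt"])
job.setOutputSandbox(["*.log", "results.tar.gz"])

dirac = Dirac()
result = dirac.submitJob(job)
job_id = result["Value"]
print("Submitted job", job_id)

# Monitoring (also possible from the command line: dirac-wms-job-status <job_id>).
print(dirac.getJobStatus(job_id))

# When finished, retrieve the output sandbox to the chosen location.
dirac.getOutputSandbox(job_id)
```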