.. Copyright 2022 – present, UBC EOAS MOAD Group and The University of British Columbia .. .. Licensed under the Apache License, Version 2.0 (the "License"); .. you may not use this file except in compliance with the License. .. You may obtain a copy of the License at .. .. https://www.apache.org/licenses/LICENSE-2.0 .. .. Unless required by applicable law or agreed to in writing, software .. distributed under the License is distributed on an "AS IS" BASIS, .. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. .. See the License for the specific language governing permissions and .. limitations under the License. .. SPDX-License-Identifier: Apache-2.0 .. _nibiResearchRunExtraction: Extractions from Research Runs on ``nibi`` ========================================== The new Alliance HPC clusters (``nibi``, ``fir``, ``rorqual``, and ``trillium``) that came online in mid-2025 have improved interactive session functionality that make it more attractive to do analysis work on the clusters instead of downloading results to ``salish`` for analysis. That strategy of "taking the compute to the data" is augmented by the ability to run Jupyter and Marimo notebooks on the clusters via the VSCode Remove - SSH extension. This section uses one of Jose's SHEM configuration collections of tuning runs for heterotrophic bacteria as the research runs example. Notable difference from other example uses include: * The Reshapr model profile is maintained by the user doing the analysis rather than it being included in the Reshapr code repository. Please see the :ref:`SHEM-DayAvgModelProfile` section below for details. * The extractions are run in interactive :command:`salloc` sessions or as jobs submitted via :command:`sbatch` .. _SHEM-FileOrganizationAndExecutingExtractions: File Organization and Executing Extractions ------------------------------------------- Store your model profile and extraction configuration YAML files in a Git repository such as your analysis repository so that you can commit your changes to them and push them to GitHub to document your analysis history and make it reproducible. Here is an example from :file:`analysis-doug`: .. code-block:: text analysis-doug/ ├── ... ├── notebooks/ │   ├── ... │   └── SHEM/ │   ├── extract_SHEM_heterotrophic_bacteria.yaml │   └── model_profiles/ │   └── Jose-SHEM-tuning-pred_flag.yaml Store the results of your extractions outside of a Git repository. The :file:`/scratch/` file system is a good choice, for example, :file:`/scratch/dlatorne/SHEM/`. Extracted netCDF files are large binary files. *Do not try to push them to GitHub.* If you commit them and push them to GitHub you will quickly exceed file and repository size limits. They are products of the extraction process described by your model profile and extraction configuration YAML files. So, having those YAML files under version control is sufficient to enable you to reproduce the extracted netCDF files. You will need to create a model profile YAML file. Please see the :ref:`SHEM-DayAvgModelProfile` section below for details and an example file. Store your model profile YAML file in your analysis repository and commit it. In the example above, the file is :file:`analysis-doug/notebooks/SHEM/model_profiles/Jose-SHEM-tuning-pred_flag.yaml`. You will also need to create an extraction configuration YAML file. Please see the :ref:`SHEM-ExtractConfig` section below for details and an example file. In the example above, the file is :file:`analysis-doug/notebooks/SHEM/extract_SHEM_heterotrophic_bacteria.yaml`. If you start from the example file below, please be sure to edit it to set the correct paths for your system: * line 5 that starts with ``model profile:`` to set the absolute path to your copy of the model profile YAML file * line 18 that starts with ``dest dir:`` to set the absolute path to your directory where you will store the results of your extractions Commit your modified extraction configuration YAML file. Reshapr extractions *must* be run on compute nodes on ``nibi``, not on the login nodes. You can do this either in an interactive :command:`salloc` session or as a batch job submitted via :command:`sbatch`. Please see the :ref:`ReshaprBatchJobScript` section below for an example batch job script. In a terminal session on ``nibi``, request an interactive session with .. code-block:: bash salloc --time=1:00:00 --mem-per-cpu=8000M --ntasks=16 --ntasks-per-node=16 --account=def-allen * The values for ``--mem-per-cpu``, ``--ntasks``, and ``--ntasks-per-node`` are set to match the Dask cluster configuration in :file:`Reshapr/cluster_configs/nibi_cluster.yaml`. * The values for ``--ntasks`` and ``--ntasks-per-node`` must be the same to ensure that all of the cores are allocated on the same node. * Extractions typically take a few minutes each. So, a value for ``--time`` of tens of minutes to a few hours should be sufficient for most extractions. Once your interactive session starts, activate your ``reshapr`` conda environment and run your extraction with the :command:`reshapr extract` command. An example of doing that looks like: .. code-block:: text cd $HOME/MEOPAR/analysis-doug/notebooks/SHEM/ salloc --time=1:00:00 --mem-per-cpu=8000M --ntasks=16 --ntasks-per-node=16 --account=def-allen salloc: Pending job allocation 7615270 salloc: job 7615270 queued and waiting for resources salloc: job 7615270 has been allocated resources salloc: Granted job allocation 7615270 salloc: Waiting for resource configuration salloc: Nodes c205 are ready for job analysis-doug$ conda activate reshapr (/home/dlatorne/conda_envs/reshapr) analysis-doug$ reshapr extract extract_SHEM_heterotrophic_bacteria.yaml 2026-01-26 14:43:35 [info ] loaded config config_file=extract_SHEM_heterotrophic_bacteria.yaml 2026-01-26 14:43:35 [info ] loaded model profile model_profile_yaml=/home/dlatorne/MEOPAR/analysis-doug/notebooks/SHEM/model_profiles/Jose-SHEM-tuning-pred_flag.yaml 2026-01-26 14:43:40 [info ] dask cluster dashboard dashboard_link=http://127.0.0.1:8787/status dask_config_yaml=nibi_cluster.yaml 2026-01-26 14:43:41 [info ] extracting variables 2026-01-26 14:44:48 [info ] wrote netCDF4 file nc_path=/scratch/dlatorne/test-reshapr/SHEM_day_tuning_pred_flag_heterotrophic_bacteria_20180226_20180701.nc 2026-01-26 14:44:48 [info ] total time t_total=67.281958341598511 Be sure to use the path (relative or absolute) to your extraction YAML file in the :command:`reshapr extract` command. .. _SHEM-DayAvgModelProfile: SHEM Day-Averaged Results Model Profile --------------------------------------- Here is an example model profile YAML file for Jose's SHEM tuning/pred_flag research run: .. literalinclude:: Jose-SHEM-tuning-pred_flag.yaml :language: yaml :linenos: The details of creating model profile YAML files are described in the :ref:`ReshaprModelProfileYAMLFiles` section of the documentation. Version Control Your Model Profile Files ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When you create new model profile YAML files remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible. .. _SHEM-ExtractConfig: Extraction Configuration ------------------------ Here is an example extraction configuration YAML file for extracting heterotrophic bacteria from Jose's SHEM tuning/pred_flag research run: .. literalinclude:: extract_SHEM_heterotrophic_bacteria.yaml :language: yaml :linenos: Start and/or End Dates ^^^^^^^^^^^^^^^^^^^^^^ You can change the start and/or end dates for the extraction by editing the ``start date:`` and/or ``end date:`` lines in the YAML file. Alternatively, you can use the ``--start-date`` and/or ``--end-date`` command-line options in the :command:`reshapr extract` command to override the start and/or end dates in the YAML file. Use :command:`reshapr extract --help` to see the details of how to do that. Variables ^^^^^^^^^ You can change the variables that you extract by changing the ``variable group:`` name, and the list of variables names in the lines following the ``extract variables:`` key in the YAML file. To learn the names of the available variable groups and the variables in them, use the :command:`reshapr info` command with the path and file name of your model profile. For example: .. code-block:: text reshapr info ~/MEOPAR/analysis-doug/notebooks/SHEM/model_profiles/Jose-SHEM-tuning-pred_flag.yaml /home/dlatorne/MEOPAR/analysis-doug/notebooks/SHEM/model_profiles/Jose-SHEM-tuning-pred_flag.yaml: Jose's SalishSeaCast v202111 NEMO SHEM config results stored on nibi. 26feb18-02jul18 tuning pred_flag run. variable groups from time intervals in this model: day biology biology growth rates grazing Please use reshapr info model-profile time-interval variable-group (e.g. reshapr info SalishSeaCast-201905 hour biology) to get the list of variables in a variable group. Please use reshapr info --help to learn how to get other information, or reshapr --help to learn about other sub-commands. shows the list of day-averaged variable groups. From that we can see the list of variables in the day-averaged biology variable group with: .. code-block:: text reshapr info ~/MEOPAR/analysis-doug/notebooks/SHEM/model_profiles/Jose-SHEM-tuning-pred_flag.yaml day biology /home/dlatorne/MEOPAR/analysis-doug/notebooks/SHEM/model_profiles/Jose-SHEM-tuning-pred_flag.yaml: Jose's SalishSeaCast v202111 NEMO SHEM config results stored on nibi. 26feb18-02jul18 tuning pred_flag run. day-averaged variables in biology group: - nitrate : Nitrate Concentration [mmol m-3] - ammonium : Ammonium Concentration [mmol m-3] - silicon : Silicon Concentration [mmol m-3] - diatoms : Diatoms Concentration [mmol m-3] - flagellates : Flagellates Concentration [mmol m-3] - microzooplankton : Microzooplankton Concentration [mmol m-3] - dissolved_organic_nitrogen : Dissolved Organic N Concentration [mmol m-3] - particulate_organic_nitrogen : Particulate Organic N Concentration [mmol m-3] - biogenic_silicon : Biogenic Silicon Concentration [mmol m-3] - mesozooplankton : Mesozooplankton Concentration [mmol m-3] - heterotrophic_bacteria : Heterotrophic Bacteria Concentration [mmol m-3] - dissolved_oxygen : Dissolved Oxygen Concentration [mmol m-3] - dissolved_inorganic_carbon : Dissolved Inorganic C Concentration [mmol m-3] - total_alkalinity : Total Alkalinity Concentration [mmol m-3] Please use reshapr info --help to learn how to get other information, or reshapr --help to learn about other sub-commands. Depth-y-x Slab Selection ^^^^^^^^^^^^^^^^^^^^^^^^ You can specify depth, y direction, and x direction limits of your extraction by adding a ``selection:`` section to the YAML file after the ``extracted dataset:`` section. Example: .. code-block:: yaml dataset: model profile: /home/dlatorne/MEOPAR/analysis-doug/notebooks/SHEM/model_profiles/Jose-SHEM-tuning-pred_flag.yaml time base: day variables group: biology dask cluster: nibi_cluster.yaml start date: 2018-02-26 end date: 2018-07-02 extract variables: - heterotrophic_bacteria selection: depth: # NOTE: use depth level numbers, not depths in meters depth min: 0 depth max: 31 grid y: y min: 450 y max: 651 grid x: x min: 200 x max: 301 extracted dataset: name: SHEM_day_tuning_pred_flag_heterotrophic_bacteria description: Daily heterotrophic bacteria extracted from SHEM tuning/pred_flag run; depth levels=0:30, y=450:650, x=200:300 dest dir: /scratch/dlatorne/test-reshapr/ Remember that Python uses 0-based indexing and that Python intervals are open on the right. So, to get the the y grid point from 430 to 470 you need to use: .. code-block:: yaml selection: grid y: y min: 430 y max: 471 Extraction File Name and Path ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can change the beginning of the file name that your extracted netCDF dataset file will be written to and the description in its metadata by editing the ``name:`` and ``description:`` values in the ``extracted dataset:`` section of the YAML file. The full file name will have the start and end dates appended to the ``name:`` value in the format ``_YYYYMMDD_YYYYMMDD.nc``. With ``SHEM_day_tuning_pred_flag_heterotrophic_bacteria`` as the value of ``name:``, an extraction for 2018-02-26 to 2018-07-02 will produce a netCDF file called :file:`SHEM_day_tuning_pred_flag_heterotrophic_bacteria_20180226_20180702.nc`. You can change the directory where your extracted netCDF dataset files will be written to by editing the ``dest dir:`` value in the ``extracted dataset:`` section of the YAML file. As noted in :ref:`SHEM-FileOrganizationAndExecutingExtractions`, *do not* store extracted netCDF dataset files in a Git repository or try to commit and push them to GitHub - they are too large. Version Control Your Extraction YAML Files ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ As you build your collection of extraction YAML files remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible. .. _ReshaprBatchJobScript: Reshapr Batch Job Script Example -------------------------------- As an alternative to running your extractions in interactive :command:`salloc` sessions, you can run them as batch jobs submitted via :command:`sbatch`. Here is an example batch job script for running an extraction: .. literalinclude:: extract_nibi.sh :language: bash :linenos: As in the interactive :command:`salloc` example above, * The values for ``#SBATCH --mem-per-cpu``, ``#SBATCH --ntasks``, and ``#SBATCH --ntasks-per-node`` directives are set to match the Dask cluster configuration in :file:`Reshapr/cluster_configs/nibi_cluster.yaml`. * The values for ``#SBATCH --ntasks`` and ``#SBATCH --ntasks-per-node`` must be the same to ensure * that all of the cores are allocated on the same node. Lines 16 and 17 in the script enable ``conda`` and activate the ``reshapr`` conda environment. Submit the batch job with: .. code-block:: bash sbatch extract_nibi.sh