.. Copyright 2022 – present, UBC EOAS MOAD Group and The University of British Columbia .. .. Licensed under the Apache License, Version 2.0 (the "License"); .. you may not use this file except in compliance with the License. .. You may obtain a copy of the License at .. .. https://www.apache.org/licenses/LICENSE-2.0 .. .. Unless required by applicable law or agreed to in writing, software .. distributed under the License is distributed on an "AS IS" BASIS, .. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. .. See the License for the specific language governing permissions and .. limitations under the License. .. SPDX-License-Identifier: Apache-2.0 .. _IonaWastewaterDischargeAnalysis: Analysis of NEMO Runs with Iona Wastewater Discharge ==================================================== Susan is running various configurations of version 202111 that include a simulation of the Iona Island Wastewater Treatment Plant Deep Sea Outfall. Since those are "research run results" in contrast to collections of daily results files from long-running hindcasts the handling of the results files and the Reshapr model profile(s) is a little different. .. note:: This section serves as a guide for use of Reshapr for other "research run" applications. Notable differences include: * The research runs are executed on an HPC cluster in multi-day segments. For the Iona wastewater case the runs were done on ``graham``. Initial runs were 5 days long for debugging, tunning, and initial analysis development by Jake. Subsequent runs were 1 month long because that fits well in the 12-hour walltime scheduler partition on ``graham``. * The run results are downloaded from the HPC cluster to research storage on :file:`/ocean/$USER/` or :file:`/data/$USER/`. For the Iona wastewater case the results were downloaded to directory trees in :file:`/data/sallen/results/MEOPAR/wastewater/` such as :file:`/data/sallen/results/MEOPAR/wastewater/long_run/`. * The multi-day run results files like :file:`/data/sallen/results/MEOPAR/wastewater/long_run/SalishSea_1h_20180101_20180131_grid_T.nc` *must* be split into 1-day files stored in date-named subdirectories like :file:`/data/sallen/results/MEOPAR/wastewater/long_run/01jan18/SalishSea_1h_20180101_20180101_grid_T.nc`. At the moment, the beast way to do that is via the SalishSeaCast automation :py:mod:`nowcast.workers.split_results` worker. Only Doug and Susan have the necessary permissions to run that worker. Please ask them for help if you need to split results from another research run. * The Reshapr model profile is maintained by the user doing the analysis rather than it being included in the Reshapr code repository. Please see the :ref:`IonaWastewaterModelProfile` section below for details. .. _FileOrganizationAndExecutingExtractions: File Organization and Executing Extractions ------------------------------------------- Store your model profile and extraction configuration YAML files in a Git repository such as your analysis repository so that you can commit your changes to them and push them to GitHub to document your analysis history and make it reproducible. Here is an example from :file:`analysis-doug`: .. code-block:: text analysis-doug/ ├── ... ├── notebooks │   ├── ... │   └── wastewater │   ├── extract_biology.yaml │   └── model_profiles │   └── SalishSeaCast-202111-wastewater-salish.yaml Store the results of your extractions outside of a Git repository, for example, :file:`/ocean/dlatorne/MOAD/extractions/`. Extracted netCDF files are large binary files. *Do not try to push them to GitHub.* If you commit them and push them to GitHub you will quickly exceed file and repository size limits. They are products of the extraction process described by your model profile and extraction configuration YAML files. So, having those YAML files under version control is sufficient to enable you to reproduce the extracted netCDF files. Grab a copy of the model profile YAML file that Doug created: https://github.com/SalishSeaCast/analysis-doug/blob/main/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml Store your copy of that file in your analysis repository and commit it. Grab a copy of the sample extraction configuration YAML file that Doug created: https://github.com/SalishSeaCast/analysis-doug/blob/main/notebooks/wastewater/extract_biology.yaml Store your copy of that file in your analysis repository. Edit 2 lines of that file * line 5 that starts with ``model profile:`` to set the absolute path to your copy of the model profile YAML file * line 33 that starts with ``dest dir:`` to set the absolute path to your directory where you will store the results of your extractions Commit your modified file. In a terminal session on ``salish``, activate your ``reshapr`` conda environment, and do a test extraction. For Doug, that looks like: .. code-block:: text cd /ocean/dlatorne/MEOPAR/analysis-doug/ analysis-doug$ conda activate reshapr (/home/dlatorne/conda_envs/reshapr) analysis-doug$ reshapr extract notebooks/wastewater/extract_biology.yaml 2023-10-19 12:13:43 [info ] loaded config config_file=notebooks/wastewater/extract_biology.yaml 2023-10-19 12:13:43 [info ] loaded model profile model_profile_yaml=/ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml 2023-10-19 12:13:48 [info ] dask cluster dashboard dashboard_link=http://127.0.0.1:8787/status dask_config_yaml=/ocean/dlatorne/MOAD/Reshapr-10jul23/cluster_configs/salish_cluster.yaml 2023-10-19 12:13:49 [info ] extracting variables 2023-10-19 12:13:49,882 - distributed.nanny - WARNING - Restarting worker 2023-10-19 12:13:50 [info ] wrote netCDF4 file nc_path=/ocean/dlatorne/MOAD/extractions/SalishSeaCast_wastewater_day_avg_biology_20180101_20180102.nc 2023-10-19 12:13:50 [info ] total time t_total=7.281958341598511 Be sure to use the path (relative or absolute) to your extraction YAML file in the :command:`reshapr extract` command. Changing the Extraction Parameters ---------------------------------- Here is the contents of the example :file:`extract_biology.yaml` file: .. code-block:: yaml :linenos: # Reshapr configuration to extract day-averages of interesting biology variables # near Iona Island wastewater outfall dataset: model profile: /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml time base: day variables group: biology dask cluster: salish_cluster.yaml start date: 2018-01-01 end date: 2018-01-02 extract variables: - ammonium - nitrate - diatoms selection: depth: # NOTE: use depth level numbers, not depths in meters depth max: 30 grid y: y min: 430 y max: 471 grid x: x min: 280 x max: 321 extracted dataset: name: SalishSeaCast_wastewater_day_avg_biology description: Day-averaged ammonium, nitrate & diatoms extracted from SalishSeaCast v202111 NEMO model with wastewater outfalls dest dir: /ocean/dlatorne/MOAD/extractions/ Version Control Your Extraction YAML Files ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ As you build your collection of extraction YAML files remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible. Start and/or End Dates ^^^^^^^^^^^^^^^^^^^^^^ You can change the start and/or end dates for the extraction by editing the ``start date:`` and/or ``end date:`` lines in the YAML file. Alternatively, you can use the ``--start-date`` and/or ``--end-date`` command-line options in the :command:`reshapr extract` command to override the start and/or end dates in the YAML file. Use :command:`reshapr extract --help` to see the details of how to do that. Variables ^^^^^^^^^ You can change the variables that you extract by changing the ``variable group:`` name in line 5, and the list of variables names in the lines following the ``extract variables:`` key at line 13. To learn the names of the available variable groups and the variables in them, use the :command:`reshapr info` command with the path and file name of your model profile. For example: .. code-block:: text reshapr info /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml: SalishSeaCast version 202111 NEMO with wastewater outfalls results on storage accessible from salish. variable groups from time intervals in this model: day biology chemistry biology growth rates grazing light mortality physics tracers vvl grid hour biology chemistry light physics tracers turbulence u velocity v velocity vvl grid w velocity Please use reshapr info model-profile time-interval variable-group (e.g. reshapr info SalishSeaCast-201905 hour biology) to get the list of variables in a variable group. Please use reshapr info --help to learn how to get other information, or reshapr --help to learn about other sub-commands. shows the lists of variable groups, divided into day-averaged and hour-averaged collections. From that we can see the list of variables in the day-averaged physics tracers variable group with: .. code-block:: text reshapr info /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml day physics tracers /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml: SalishSeaCast version 202111 NEMO with wastewater outfalls results on storage accessible from salish. day-averaged variables in physics tracers group: - sossheig : Sea Surface Height [m] - votemper : Conservative Temperature [degree_C] - vosaline : Reference Salinity [g kg-1] - sigma_theta : Potential Density (sigma_theta) [kg m-3] - e3t : T-cell Thickness [m] Please use reshapr info --help to learn how to get other information, or reshapr --help to learn about other sub-commands. Depth-y-x Slab Selection ^^^^^^^^^^^^^^^^^^^^^^^^ You can change the depth, y direction, and x direction limits of your extraction by editing the ``selection:`` section that starts on line 18. Remember that Python uses 0-based indexing and that Python intervals are open on the right. So, to get the the y grid point from 430 to 470 you need to use: .. code-block:: yaml selection: grid y: y min: 430 y max: 471 Extraction File Name and Path ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can change the beginning of the file name that your extracted netCDF dataset file will be written to and the description in its metadata by editing the ``name:`` and ``description:`` values in lines 30 and 31. With ``SalishSeaCast_wastewater_day_avg_biology`` as the value of ``name:``, and extraction for 2018-01-01 to 2018-01-31 will produce a netCDF file called :file:`SalishSeaCast_wastewater_day_avg_biology_20180101_20180131.nc`. You can change the directory where your extracted netCDF dataset files will be written to by editing the ``dest dir:`` value in line 33. As noted in :ref:`FileOrganizationAndExecutingExtractions`, *do not* store extracted netCDF dataset files in a Git repository or try to commit and push them to GitHub - they are too large. .. _IonaWastewaterModelProfile: Iona Wastewater Model Profile ----------------------------- Here is the contents of the :file:`SalishSeaCast-202111-wastewater-salish.yaml` file: .. code-block:: yaml :linenos: description: SalishSeaCast version 202111 NEMO with wastewater outfalls results on storage accessible from salish. time coord: name: time_counter y coord: name: y x coord: name: x # Chunking scheme used for the netCDF4 files # Note that coordinate names (keys) are conceptual here. # They are replaced with actual coordinate names in files in the code; # e.g. time is replaced by time_counter for dataset loading chunk size: time: 24 depth: 40 y: 898 x: 398 geo ref dataset: path: https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV21-08 y coord: gridY x coord: gridX extraction time origin: 2007-01-01 results archive: path: /data/sallen/results/MEOPAR/wastewater/long_run/ datasets: day: biology: file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_biol_T.nc" depth coord: deptht chemistry: file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_chem_T.nc" depth coord: deptht biology growth rates: file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_prod_T.nc" depth coord: deptht grazing: file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_graz_T.nc" depth coord: deptht light: file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_chem_T.nc" depth coord: deptht mortality: file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_graz_T.nc" depth coord: deptht physics tracers: file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc" depth coord: deptht vvl grid: file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc" depth coord: deptht hour: biology: file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_biol_T.nc" depth coord: deptht chemistry: file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_chem_T.nc" depth coord: deptht light: file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_chem_T.nc" depth coord: deptht physics tracers: file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc" depth coord: deptht turbulence: file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc" depth coord: depthw u velocity: file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_U.nc" depth coord: depthu v velocity: file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_V.nc" depth coord: depthv vvl grid: file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc" depth coord: deptht w velocity: file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc" depth coord: depthw Version Control Your Model Profile Files ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When you create new model profile YAML files remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible. Change the Model Results Path ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To work with model results in a different directory tree, change the value of ``path:`` in the ``results archive:`` section on line 31. For example, if Susan does model runs with alkalinity added to the Iona wastewater discharge, she might store the run results in :file:`/data/sallen/results/MEOPAR/wastewater/alkalinity_added/`. If you are changing the model results path in a model profile, you should seriously consider storing the profile in a new file with a different name, updating the ``description:`` at the top of the file, and committing it to version control.