Xarray Fundamentals

Computation

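Xarray DataArrays and Datasets work seamlessly with arithmetic operators and NumPy functions. Operations act element-wise and return new labeled objects that keep the original dimensions and coordinates. For example, converting the Argo temperature from degrees Celsius to kelvin: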

temp_kelvin = argo.temperature + 273.15
temp_kelvin.plot(yincrease=False)

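To illustrate this without the Argo data, here is a minimal sketch on a small hypothetical DataArray (`temp_c`, with made-up values and made-up `level` and `profile` coordinates standing in for `argo.temperature`). It shows that both an arithmetic operator and a NumPy ufunc return labeled DataArrays with the same dimensions and coordinates as the input.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for argo.temperature: made-up values in degrees Celsius
temp_c = xr.DataArray(
    np.array([[10.0, 12.0], [4.0, 5.0]]),
    dims=("level", "profile"),
    coords={"level": [0, 100], "profile": [1, 2]},
    name="temperature",
)

# Arithmetic operators work element-wise and keep dims and coordinates
temp_k = temp_c + 273.15

# NumPy ufuncs such as np.log also return a labeled DataArray
log_temp_k = np.log(temp_k)

print(temp_k.dims, list(temp_k.level.values))
```

Note that attributes (such as `units`) are dropped by arithmetic by default, so they may need to be set again on the result.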

We can also combine several DataArrays in a single arithmetic expression. Here the Argo temperature and salinity are combined into buoyancy:

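g = 9.8  # gravitational acceleration (m/s^2)
# linear approximation: 2e-4 is the thermal expansion coefficient, 7e-4 the haline contraction coefficient
buoyancy = g * (2e-4 * argo.temperature - 7e-4 * argo.salinity)
buoyancy.plot(yincrease=False)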

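When the inputs live in one Dataset, the result of such an expression can be stored back as a new data variable. A minimal sketch, assuming a hypothetical toy Dataset with made-up `temperature` and `salinity` values on shared `level` and `profile` coordinates (the coefficients are the ones used in the code above):

```python
import xarray as xr

# Hypothetical toy profiles (made-up values) on shared level/profile coordinates
ds = xr.Dataset(
    {
        "temperature": (("level", "profile"), [[10.0, 12.0], [4.0, 5.0]]),
        "salinity": (("level", "profile"), [[35.0, 35.5], [34.5, 34.8]]),
    },
    coords={"level": [0, 100], "profile": [1, 2]},
)

g = 9.8
alpha = 2e-4  # thermal expansion coefficient, value taken from the text
beta = 7e-4   # haline contraction coefficient, value taken from the text

# Element-wise combination of two variables, stored as a new data variable
ds["buoyancy"] = g * (alpha * ds.temperature - beta * ds.salinity)
```

Assigning with `ds["buoyancy"] = ...` keeps the derived field next to its inputs, so later selections such as `ds.isel(profile=0)` apply to all three variables at once.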

Loading Data from netCDF Files

NetCDF (Network Common Data Form) is one of the most widely used formats for distributing geoscience data. NetCDF is maintained by Unidata.

Below we quote from the NetCDF website:

NetCDF (network Common Data Form) is a set of interfaces for array-oriented data access and a freely distributed collection of data access libraries for C, Fortran, C++, Java, and other languages. The netCDF libraries support a machine-independent format for representing scientific data. Together, the interfaces, libraries, and format support the creation, access, and sharing of scientific data.

NetCDF data is:

Self-Describing. A netCDF file includes information about the data it contains.

Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.

Scalable. A small subset of a large dataset may be accessed efficiently.

Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.

Sharable. One writer and multiple readers may simultaneously access the same netCDF file.

Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.

Xarray was designed to make reading netCDF files in Python as easy, powerful, and flexible as possible. (See the Xarray netCDF documentation for more details.)
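A minimal round-trip sketch, assuming a netCDF backend such as `netCDF4` or SciPy is installed: a hypothetical toy Dataset (made-up values, a made-up `units` attribute and `title`) is written with `to_netcdf` and reopened with `xr.open_dataset`. The attributes come back with the data, which is the self-describing property listed above, and only a one-step subset is then loaded into memory.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical toy dataset with metadata attributes (made-up values)
ds = xr.Dataset(
    {"temperature": (("level", "lat"), np.arange(6.0).reshape(3, 2))},
    coords={"level": [0, 100, 200], "lat": [-10.0, 10.0]},
)
ds.temperature.attrs["units"] = "degC"
ds.attrs["title"] = "Toy example"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy.nc")
    # Writing requires a netCDF backend such as netCDF4 or SciPy
    ds.to_netcdf(path)

    # open_dataset reads lazily; the attributes travel with the file
    with xr.open_dataset(path) as reopened:
        units = reopened.temperature.attrs["units"]
        title = reopened.attrs["title"]
        # Select a small subset first, then load just that into memory
        first_level = reopened.temperature.isel(level=0).load()
```

Closing the file (here via the `with` block) before the temporary directory is deleted avoids leaving an open handle behind; `first_level` stays usable because it was loaded.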

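Xarray reads other formats through the same interface. Below we load two datasets from the LEAP data catalog. They are stored in the cloud-optimized Zarr format rather than netCDF, so we pass engine='zarr' to xr.open_dataset.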

# https://catalog.leap.columbia.edu/feedstock/australian-gridded-climate-data-agcd
import xarray as xr

store = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/AGDC-feedstock/AGCD.zarr'
# chunks={} opens each variable lazily as a Dask array, using the store's own chunking
ds = xr.open_dataset(store, engine='zarr', chunks={})

ds

<xarray.Dataset> Size: 175GB
Dimensions:     (lat: 691, lon: 886, time: 17897)
Coordinates:
  * lat         (lat) float32 3kB -44.5 -44.45 -44.4 ... -10.1 -10.05 -10.0
  * lon         (lon) float32 4kB 112.0 112.1 112.1 112.2 ... 156.1 156.2 156.2
  * time        (time) datetime64[ns] 143kB 1971-01-01T09:00:00 ... 2019-12-3...
Data variables:
    precip      (time, lat, lon) float32 44GB dask.array<chunksize=(40, 691, 886), meta=np.ndarray>
    tmax        (time, lat, lon) float32 44GB dask.array<chunksize=(40, 691, 886), meta=np.ndarray>
    tmin        (time, lat, lon) float32 44GB dask.array<chunksize=(40, 691, 886), meta=np.ndarray>
    vapourpres  (time, lat, lon) float32 44GB dask.array<chunksize=(40, 691, 886), meta=np.ndarray>
Attributes: (12/36)
    Conventions:               CF-1.6, ACDD-1.3
    acknowledgment:            The Australian Government, Bureau of Meteorolo...
    agcd_version:              AGCD (AWAP) v1.0.0 Snapshot (1900-01-01 to 202...
    analysis_components:       0900: the gridded vapour pressure value at 9am...
    attribution:               Data should be cited as : Australian Bureau of...
    cdm_data_type:             Grid
    ...                        ...
    summary:                   The partial pressure of water vapour in air (v...
    time_coverage_end:         1971-12-31T00:00:00
    time_coverage_start:       1971-01-01T15:00:00
    title:                     Interpolated Vapour Pressure
    url:                       http://www.bom.gov.au/climate/
    uuid:                      e684e0a6-73c7-4522-ab78-a8285ca34b4b

ds.tmax.isel(time=-1).plot()


ds.precip.isel(time=-1).plot()

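`isel` selects by integer position, so `time=-1` picks the last time step, while `sel` selects by coordinate label. A minimal sketch on a hypothetical toy grid (made-up values and coordinates) contrasting the two, plus a nearest-neighbor lookup for a point that is not exactly on the grid:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy grid with made-up values
times = pd.date_range("2019-12-29", periods=3, freq="D")
tmax = xr.DataArray(
    np.arange(12.0).reshape(3, 2, 2),
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": [-44.5, -10.0], "lon": [112.0, 156.25]},
    name="tmax",
)

# Position-based: the last time step, as in ds.tmax.isel(time=-1)
last_by_position = tmax.isel(time=-1)

# Label-based: the same step selected by its date
last_by_label = tmax.sel(time=pd.Timestamp("2019-12-31"))

# Label-based with an off-grid location: pick the nearest grid point
point = tmax.sel(lat=-12.0, lon=150.0, method="nearest")
```

Position-based selection is convenient for "first" or "last" steps; label-based selection is safer when the exact date or location matters.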

# https://catalog.leap.columbia.edu/feedstock/aws-noaa-optimum-interpolated-sst
import xarray as xr

store = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa_oisst/v2.1-avhrr.zarr'
ds = xr.open_dataset(store, engine='zarr', chunks={})

ds

<xarray.Dataset> Size: 241GB
Dimensions:  (time: 14532, zlev: 1, lat: 720, lon: 1440)
Coordinates:
  * lat      (lat) float32 3kB -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88
  * lon      (lon) float32 6kB 0.125 0.375 0.625 0.875 ... 359.4 359.6 359.9
  * time     (time) datetime64[ns] 116kB 1981-09-01T12:00:00 ... 2021-06-14T1...
  * zlev     (zlev) float32 4B 0.0
Data variables:
    anom     (time, zlev, lat, lon) float32 60GB dask.array<chunksize=(20, 1, 720, 1440), meta=np.ndarray>
    err      (time, zlev, lat, lon) float32 60GB dask.array<chunksize=(20, 1, 720, 1440), meta=np.ndarray>
    ice      (time, zlev, lat, lon) float32 60GB dask.array<chunksize=(20, 1, 720, 1440), meta=np.ndarray>
    sst      (time, zlev, lat, lon) float32 60GB dask.array<chunksize=(20, 1, 720, 1440), meta=np.ndarray>
Attributes: (12/37)
    Conventions:                CF-1.6, ACDD-1.3
    cdm_data_type:              Grid
    comment:                    Data was converted from NetCDF-3 to NetCDF-4 ...
    creator_email:              oisst-help@noaa.gov
    creator_url:                https://www.ncei.noaa.gov/
    date_created:               2020-05-08T19:05:13Z
    ...                         ...
    source:                     ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pathfin...
    standard_name_vocabulary:   CF Standard Name Table (v40, 25 January 2017)
    summary:                    NOAAs 1/4-degree Daily Optimum Interpolation ...
    time_coverage_end:          1981-09-01T23:59:59Z
    time_coverage_start:        1981-09-01T00:00:00Z
    title:                      NOAA/NCEI 1/4 Degree Daily Optimum Interpolat...

ds.sst.isel(time=-1, zlev=0).plot()

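The OISST variables carry a length-one `zlev` dimension, which is why the plotting call above also passes `zlev=0`: `DataArray.plot()` draws a map only for two-dimensional data. A minimal sketch with a hypothetical toy array (made-up values) showing two equivalent ways to get a 2D `(lat, lon)` field:

```python
import numpy as np
import xarray as xr

# Hypothetical toy SST array with a singleton zlev dimension (made-up values)
sst = xr.DataArray(
    np.arange(12.0).reshape(2, 1, 2, 3),
    dims=("time", "zlev", "lat", "lon"),
    coords={"zlev": [0.0]},
)

# Integer selection on both dims leaves a 2D (lat, lon) array
map_isel = sst.isel(time=-1, zlev=0)

# Alternatively, squeeze removes the length-one zlev dimension explicitly
map_squeeze = sst.squeeze("zlev", drop=True).isel(time=-1)
```

`isel(zlev=0)` keeps `zlev` as a scalar coordinate, while `squeeze(..., drop=True)` discards it; the data values are the same.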

ds.time

<xarray.DataArray 'time' (time: 14532)> Size: 116kB
array(['1981-09-01T12:00:00.000000000', '1981-09-02T12:00:00.000000000',
       '1981-09-03T12:00:00.000000000', ..., '2021-06-12T12:00:00.000000000',
       '2021-06-13T12:00:00.000000000', '2021-06-14T12:00:00.000000000'],
      shape=(14532,), dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 116kB 1981-09-01T12:00:00 ... 2021-06-14T1...
Attributes:
    long_name:  Center time of the day
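Because `time` is stored as `datetime64`, Xarray exposes pandas-style datetime components through the `.dt` accessor and accepts date strings for slicing. A minimal sketch on a hypothetical daily time axis (made-up values) stamped at 12:00 like the coordinate shown above:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series stamped at 12:00, like the OISST time coordinate
time = pd.date_range("2021-05-30T12:00", periods=6, freq="D")
sst = xr.DataArray(np.arange(6.0), dims="time", coords={"time": time})

# Datetime components come from the .dt accessor
months = sst.time.dt.month

# Date strings select by label; string slices include both endpoint days
june = sst.sel(time=slice("2021-06-01", "2021-06-14"))
```

The slice end may lie past the last available date; for a sorted time index only the matching days (here June 1 to June 4) are returned.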

Xarray Tutorials

You can find a good video tutorial on the fundamentals of Xarray on YouTube.

If you prefer reading, there is also "Xarray in 45 mins".

Much more is available at https://tutorial.xarray.dev/intro.html and in this YouTube playlist: https://www.youtube.com/playlist?list=PLNemzZpJM7lUu_iGP_lA2m7SeSUwKSIvR
