Making CF-compliant netCDF files

... and have geoserver eat them

  1. Intro
  2. Geoserver
  3. QnA
  4. Example Conversion
    1. Use time coordinates
    2. Use ordinal months or weeks
    3. Add grid_mapping
    4. Writing to netCDF
  5. Create a dummy dataset

Intro

Oh boy, was it a shitshow to get these netCDFs CF-compliant. The problem is that the standard is vague at times, and worse, geoserver has its own (undocumented) requirements and provides absolutely no clue when it doesn’t like a format.

Converting to geotiffs would certainly be easier, but I guess I want to call myself a geospatial data scientist and it became a matter of stubborn pride to have geoserver eat my netCDFs directly.

Here is what I found makes a netCDF CF-compliant in a way that geoserver likes. I am fairly sure this is a superset of what is needed; not everything below is strictly required.

  1. Every variable needs a units and long_name attribute. They can hold any string.
  2. Every spatial variable needs a grid_mapping attribute which contains the value “spatial_ref” (pointing to the CRS definition, see below)
  3. Every spatial coordinate needs long_name and units attributes. The units attribute can be either “degrees_north”/”degrees_east” (probably “degrees_south”/”degrees_west” work too) or “meter” (probably “feet” or “foot”(?) works too, but who uses shit units anyways?). For geographic coordinates, the long_name should hold the value “latitude”/”longitude”. For projected coordinates, I think you do not need a long_name attribute but a standard_name attribute instead, which should hold the value “projection_x_coordinate” / ”projection_y_coordinate”.
  4. Geoserver eats custom dimensions and coordinates (such as time). The CF standard is vague about how to add CF-compliant custom coordinates. However, it seems like if a coordinate has an attribute called positive with value either “up” or “down”, geoserver eats the coordinate just fine. This behavior must be inherited from the standard’s vertical dimension/coordinate section.
  5. I have not fully figured out time coordinates, but here is what I believe to be true: geoserver does not seem to understand nanoseconds. It seems like CF-compliant timestamps are stored as integer offsets from an arbitrary point in time. The offset is stored in the units attribute as “seconds since yyyy-mm-ddTHH:MM:SS”. The milliseconds have to be cut off here, I think. The time coordinate also needs a calendar attribute, which can hold values like “standard”, “gregorian”, etc. Like all other variables/coordinates, I think it also needs a long_name attribute, which you should set to “time”.
  6. Coordinates cannot be uint8 or int8. Not sure if other data types work, but int64 definitely does!
  7. Your CRS is stored in the attributes of a variable called spatial_ref. I think this variable needs to be int (though it won’t hold any values). Though e.g. rioxarray will populate a bunch of attributes when you do ds.rio.write_crs(), I am 99% sure you only need to set the crs_wkt attribute to the WKT string (and honestly, this is way more sane to me than having the CRS overdetermined, with potentially contradictory attributes).
  8. The dataset should hold the global attributes Conventions with value “CF-1.8” and title with some string.
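To sanity-check a file against this list before feeding it to geoserver, a tiny stdlib-only helper does the job. The function name and the "required" sets below are my own shorthand for the rules above, not part of any standard:

```python
# Attribute names the checklist above demands (my own shorthand sets).
REQUIRED_VAR_ATTRS = {'units', 'long_name', 'grid_mapping'}
REQUIRED_COORD_ATTRS = {'units', 'long_name'}

def missing_attrs(attrs, required):
    """Return the required attribute names absent from an attrs dict."""
    return sorted(required - set(attrs))

# A spatial variable that forgot its grid_mapping:
missing_attrs({'units': 'percent', 'long_name': 'fractional snow cover'},
              REQUIRED_VAR_ATTRS)
# -> ['grid_mapping']
```

Running this over `ds[var].attrs` for every variable and coordinate catches the most common omissions before a round-trip through geoserver's unhelpful errors.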

Geoserver

Important note that cost me a good 10 hours to figure out: Geoserver creates metadata files living next to the actual files for every store.

So the following can happen:

  1. You have a file called 420.nc
  2. This file is incorrectly formatted
  3. You try to add this file as a store to geoserver
  4. Geoserver gives you an error
  5. Geoserver creates a (hidden) metadata folder called .420_{UID} (with an autogenerated UID) next to the 420.nc file
  6. You fix your file and it is now in a format that geoserver can read
  7. You try to add the file as a store in geoserver again
  8. You get an error despite the file being correct
  9. This is because instead of re-opening the file, geoserver only looks in the metadata folder which contains information about the previously opened (and incorrectly formatted) version of the file

So you either need a new file name after fixing the file, or you need to purge/delete the file’s metadata folder before trying to re-add it as a store.
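The purge can be scripted; here is a sketch. The `.<name>_<UID>` folder naming is what I describe above, and `purge_geoserver_metadata` is my own helper name, not a geoserver API:

```python
import shutil
from pathlib import Path

def purge_geoserver_metadata(nc_path):
    """Delete geoserver's hidden .<name>_<UID> metadata folders next to a netCDF."""
    nc = Path(nc_path)
    removed = []
    # e.g. for 420.nc, match hidden folders like .420_a1b2c3
    for meta in nc.parent.glob(f'.{nc.stem}_*'):
        if meta.is_dir():
            shutil.rmtree(meta)
            removed.append(meta.name)
    return removed
```

Call it on the fixed file (e.g. `purge_geoserver_metadata('/data/420.nc')`) before re-adding the store.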

QnA

Is there a way to test beyond trial and error?

To be a smartass: a file is CF-compliant if it follows the convention. But a much better answer: actually, yes. Take a look here: https://cfconventions.org/software.html. The cf-checker is likely what you are looking for. CF compliance does not guarantee that geoserver will be able to read the file, though. The geoserver netCDF plugin has some (undocumented) restrictions of its own. Also: it seems like geoserver versions 2.25.4 through 2.26.1 do NOT work with the netCDF plugin; 2.25.3 is the latest one that does as of 2025-01-18. Warning: you won’t get an immediate error when using the netCDF plugin with e.g. 2.26.1, but geoserver will not be able to read CF-compliant netCDFs.

How well does it interact with GeoServer as a WMS?

Perfectly. Just like any other datastore would. With the added advantage that netCDF variables are inherently n-dimensional (unlike e.g. geotiffs, which really only understand “band” as a possible third dimension), allowing you to slice an n-D array in the WMS query. If you e.g. had a dimension and coordinate “elevation”, you’d simply add “&DIM_ELEVATION=x” to your WMS request.
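For illustration, a GetMap request sliced along such a custom dimension might be assembled like this. The endpoint, workspace, and layer names are made up; only the `DIM_<name>` query parameter mechanism comes from geoserver’s WMS dimension support:

```python
from urllib.parse import urlencode

# Hypothetical layer 'ws:fsca' on a hypothetical geoserver instance
params = {
    'service': 'WMS', 'version': '1.3.0', 'request': 'GetMap',
    'layers': 'ws:fsca',
    'bbox': '45,0,55,10', 'crs': 'EPSG:4326',
    'width': 256, 'height': 256, 'format': 'image/png',
    'DIM_ELEVATION': 1500,  # slice the custom "elevation" dimension
}
url = 'https://example.com/geoserver/wms?' + urlencode(params)
```

Each extra netCDF dimension just becomes one more `DIM_*` parameter in the query string.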

Perhaps if we do have CF-compliant netCDFs, then display as WMS is straightforward?

Yes. Very. And making a file CF-compliant is relatively easy as long as you know what to do. All it takes is modifying/adding some attributes, so no computation is required.

Example Conversion

import xarray

ds = xarray.open_dataset(path)  # path to your netCDF file

Use time coordinates

We need to adjust timestamps to seconds, since geoserver does not seem to understand nanoseconds. It seems like CF-compliant timestamps are stored as integer offsets from an arbitrary point in time. The offset is stored in the units attribute as 'seconds since yyyy-mm-ddTHH:MM:SS'. The milliseconds have to be cut off, I think. On the xarray side of things, the units can be stored in either the encoding or the attributes.

import numpy as np

# integer seconds since the first timestamp; geoserver chokes on nanoseconds
reference_date = ds['time'].min().data.astype('datetime64[s]')
calendar = ds['time'].encoding.get('calendar', 'standard')
time_in_seconds = ((ds['time'].data - reference_date) / np.timedelta64(1, 's')).astype(np.int64)
ds = ds.assign_coords(time=time_in_seconds)
ds['time'].attrs['units'] = f'seconds since {reference_date}'
ds['time'].attrs['calendar'] = calendar
ds['time'].attrs['long_name'] = 'time'

Use ordinal months or weeks

This is freaking odd, but geoserver eats custom dimensions and coordinates. The CF standard is vague about how to add CF-compliant custom coordinates. However, it seems like if a coordinate has a 'positive' attribute with value 'up' or 'down', geoserver eats the coordinate just fine. This behavior must be inherited from the standard’s vertical dimension/coordinate section. Note: the coordinate cannot be uint8 or int8. Not sure if other data types work, but int64 does!

ds['month'] = ds['month'].astype(np.int64)
ds['month'].attrs['long_name'] = 'Month of the Year'
ds['month'].attrs['units'] = 'months'
ds['month'].attrs['positive'] = 'up'

Add grid_mapping

CF compliance allows the use of projected coordinates. We specify the projection of a variable in its grid_mapping attribute. The grid_mapping value has to equal the name of a variable that holds the projection information; the projection itself is stored in that variable’s attributes. The following will suffice for QGIS, Panoply, and geoserver to understand the projection:

import pyproj

crs = pyproj.crs.CRS('EPSG:4326')
ds['spatial_ref'] = 0
ds['spatial_ref'].attrs['spatial_ref'] = crs.to_wkt(version='WKT1_GDAL')

But for it to be CF-compliant, we need more attributes (which is annoying, since it overdetermines the projection). You can create the whole CF-compliant attribute dictionary with pyproj.CRS.to_cf(). Also, note that I am not sure whether the WKT version has to be WKT1_GDAL.
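A sketch of generating that full attribute set with pyproj (assuming a pyproj version where CRS.to_cf() exists, i.e. >= 2.2):

```python
import pyproj

crs = pyproj.crs.CRS('EPSG:4326')
cf_attrs = crs.to_cf()  # dict of CF grid-mapping attributes
# includes e.g. 'grid_mapping_name' and 'crs_wkt', plus ellipsoid
# parameters such as 'semi_major_axis'
```

You could then attach them all at once with `ds['spatial_ref'].attrs.update(cf_attrs)`.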

We also need to add attributes to the x and y coordinates. For geographic coordinates,

ds['x'].attrs['long_name'] = "longitude"
ds['x'].attrs['units'] = "degrees_east"

ds['y'].attrs['long_name'] = "latitude"
ds['y'].attrs['units'] = "degrees_north"

For projected, e.g.:

ds['x'].attrs["standard_name"] = "projection_x_coordinate"
ds['x'].attrs["units"] = "meter"
ds['y'].attrs["standard_name"] = "projection_y_coordinate"
ds['y'].attrs["units"] = "meter"

And then for every spatial variable:

ds['fsca'].attrs['grid_mapping'] = 'spatial_ref'
ds['fsca'].attrs['units'] = 'percent'
ds['fsca'].attrs['long_name'] = 'fractional snow cover'

Writing to netCDF

ds.attrs['Conventions'] = 'CF-1.8'  # CF conventions version
ds.attrs['title'] = "CF output"
ds.to_netcdf('CF.nc', engine='netcdf4', format='NETCDF4', group='/')

Create a dummy dataset

import xarray as xr
import numpy as np
import pandas as pd
import pyproj

# Create sample data
time = pd.date_range('2000-01-01', periods=2, freq='D')
data = np.random.rand(2, 5, 5).astype(float)  # Example data
data2 = np.random.rand(2, 5, 5).astype(float)  # Example data

# Reference date for time
reference_date = np.datetime64('2000-01-01T00:00:00')
time_in_days = (time - reference_date) / np.timedelta64(1, 'D')
time_in_days = time_in_days.astype(float)

# Define coordinates
x = np.linspace(0, 10, 5)  # Example longitude values
y = np.linspace(45, 55, 5)    # Example latitude values

# Create the xarray Dataset
ds2 = xr.Dataset(
    {
        'variable': (['time', 'y', 'x'], data, {
            'units': 'units',
            'long_name': 'variable',
            '_FillValue': np.nan,
            'grid_mapping': 'spatial_ref'
        }),
        'variable2': (['time', 'y', 'x'], data2, {
            'units': 'units',
            'long_name': 'variable2',
            '_FillValue': np.nan,
            'grid_mapping': 'spatial_ref'
        })
    },
    coords={
        'time': (['time'], time_in_days, {
            'units': 'days since 2000-01-01 00:00:00',
            'calendar': 'gregorian',
            'long_name': 'time'
        }),
        'x': (['x'], x, {
            'units': 'degrees_east',
            'standard_name': 'longitude',
            'long_name': 'Longitude'
        }),
        'y': (['y'], y, {
            'units': 'degrees_north',
            'standard_name': 'latitude',
            'long_name': 'Latitude'
        })
    }
)

crs = pyproj.crs.CRS('EPSG:4326')
ds2['spatial_ref'] = 0
ds2['spatial_ref'].attrs['spatial_ref'] = crs.to_wkt(version='WKT1_GDAL')

# Add global attributes
ds2.attrs['title'] = 'Sample NetCDF with CF-compliant spatial and time coordinates'
ds2.attrs['Conventions'] = 'CF-1.8'

# Save to NetCDF
ds2.to_netcdf('example_cf_compliant.nc', engine='netcdf4', format='NETCDF4_CLASSIC')