techAggregation

Descriptions of the basic functions are given below.

Function descriptions:

Aggregation of RE technologies in every region.

techAggregation.aggregate_RE_technology(gridded_RE_ds=None, CRS_attr=None, shp_file=None, non_gridded_RE_ds=None, n_timeSeries_perRegion=1, capacity_var_name='capacity', capfac_var_name='capacity factor', region_var_name='region', longitude_dim_name='x', latitude_dim_name='y', time_dim_name='time', location_dim_name='locations', shp_index_col='region_ids', shp_geometry_col='geometry', linkage='average')

Reduces the number of units of a particular RE technology (e.g. onshore wind turbines) to a desired number within each region.

Note

The explanation below uses wind turbines as an example. The function applies equally to any variable RE technology, such as PV or offshore wind turbines.

The number of simulated wind turbines could be huge. This function reduces them to a few turbine types in each of the defined regions. Each wind turbine is characterised by its capacity and its capacity factor time series.

The basic idea is to group the turbines within each region such that the turbines with the most similar capacity factor time series end up in the same group. The turbines in each group are then aggregated into one turbine type per group, thereby reducing the total number of turbines.
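
The aggregation step for a single group can be sketched in plain Python as below. This is a minimal illustration, not the function's actual implementation: it assumes the capacity-weighted mean that the n_timeSeries_perRegion parameter describes for simple aggregation, and the helper name aggregate_group and the sample numbers are hypothetical.

```python
def aggregate_group(capacities, capfac_series):
    """Aggregate a group of turbines into one representative turbine type.

    Returns (total capacity, capacity-weighted mean capacity factor series).
    Assumes the capacities do not sum to zero.
    """
    total_capacity = sum(capacities)
    n_steps = len(capfac_series[0])
    weighted_capfac = [
        sum(cap * series[t] for cap, series in zip(capacities, capfac_series))
        / total_capacity
        for t in range(n_steps)
    ]
    return total_capacity, weighted_capfac


# Hypothetical group of two turbines: capacities and two time steps
# of capacity factors per turbine.
capacities = [1.0, 3.0]
capfac_series = [[0.2, 0.4], [0.6, 0.8]]

total, capfac = aggregate_group(capacities, capfac_series)
print(total, capfac)
```

The larger turbine dominates the resulting time series because its capacity carries three times the weight. A group whose capacities sum to zero would need separate handling, since the weighted mean is undefined in that case.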

Please go through the parameters list below for more information.

Parameters:
  • gridded_RE_ds (str/xr.Dataset) –

    Either the path to the dataset or the read-in xr.Dataset

    • Dimensions in this data: latitude_dim_name, longitude_dim_name, and time_dim_name

    • Variables: capacity_var_name and capfac_var_name

  • CRS_attr (str) – The attribute in gridded_RE_ds that holds its Coordinate Reference System (CRS) information

  • shp_file (str/GeoDataFrame) – Either the path to the shapefile or the read-in shapefile that should be overlapped with gridded_RE_ds, in order to obtain regions’ information

  • non_gridded_RE_ds (str/xr.Dataset) –

    Either the path to the dataset or the read-in xr.Dataset

    • Dimensions in this data: location_dim_name and time_dim_name

    • Variables: capacity_var_name, capfac_var_name, and region_var_name

    Pass either gridded_RE_ds or non_gridded_RE_ds. If both are passed, gridded_RE_ds takes precedence.

  • n_timeSeries_perRegion (strictly positive int) –

    The number of time series to which the original set should be aggregated, within each region.

    • If set to 1, performs simple aggregation

      • Within every region, calculates the weighted mean of RE time series (capacities being weights), and sums the capacities.

    • If set to a value greater than 1, time series clustering is employed

      • Clustering method: Sklearn’s agglomerative hierarchical clustering

      • Distance measure: Euclidean distance

      • Aggregation within each resulting cluster is the same as simple aggregation


    * the default value is 1

  • capacity_var_name (str) – The name of the data variable in the provided dataset that corresponds to capacity
    * the default value is ‘capacity’

  • capfac_var_name (str) – The name of the data variable in the provided dataset that corresponds to capacity factor time series
    * the default value is ‘capacity factor’

  • region_var_name (str) – The name of the data variable in non_gridded_RE_ds that contains region IDs
    * the default value is ‘region’

  • longitude_dim_name (str) – The dimension name in gridded_RE_ds that corresponds to longitude
    * the default value is ‘x’

  • latitude_dim_name (str) – The dimension name in gridded_RE_ds that corresponds to latitude
    * the default value is ‘y’

  • time_dim_name (str) – The dimension name in the provided dataset that corresponds to time
    * the default value is ‘time’

  • location_dim_name (str) – The dimension name in non_gridded_RE_ds that corresponds to locations
    * the default value is ‘locations’

  • shp_index_col (str) – The column in shp_file that needs to be taken as location-index in gridded_RE_ds
    * the default value is ‘region_ids’

  • shp_geometry_col (str) – The column in shp_file that holds geometries
    * the default value is ‘geometry’

  • linkage (str) –

    • Relevant only if n_timeSeries_perRegion is greater than 1.

    • The linkage criterion to be used with agglomerative hierarchical clustering. Can be ‘complete’, ‘single’, etc. Refer to Sklearn’s documentation for more info.


    * the default value is ‘average’
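
The clustering controlled by n_timeSeries_perRegion and linkage can be illustrated with a small pure-Python stand-in. The function itself is documented as using Sklearn's agglomerative hierarchical clustering; the naive loop below is not that library code. It is only a sketch of the same idea (Euclidean distance, repeated merging of the closest pair of groups), and the names euclidean, agglomerate, and the sample data are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equally long time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(series, n_clusters, linkage="average"):
    """Naive agglomerative clustering; returns lists of indices into `series`."""
    # How pairwise distances between two clusters are reduced to one score.
    reducers = {
        "average": lambda d: sum(d) / len(d),
        "complete": max,
        "single": min,
    }
    reduce_fn = reducers[linkage]
    clusters = [[i] for i in range(len(series))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            dists = [
                euclidean(series[i], series[j])
                for i in clusters[a]
                for j in clusters[b]
            ]
            score = reduce_fn(dists)
            if best is None or score < best[0]:
                best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Hypothetical capacity factor time series of four turbines in one region.
capfacs = [
    [0.10, 0.20, 0.30],
    [0.12, 0.21, 0.29],
    [0.80, 0.70, 0.90],
    [0.82, 0.69, 0.88],
]

# Analogous to n_timeSeries_perRegion=2 with linkage='average'.
groups = agglomerate(capfacs, n_clusters=2, linkage="average")
print(groups)
```

Each resulting group would then be collapsed into a single time series by the same capacity-weighted aggregation that is used when n_timeSeries_perRegion is 1.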

Returns:

regional_aggregated_RE_ds

  • Dimensions in this data: time_dim_name, ‘region_ids’

  • The dimension ‘region_ids’ has its coordinates corresponding to shp_index_col if gridded_RE_ds is passed. Otherwise, they correspond to the region_var_name values

  • If n_timeSeries_perRegion is greater than 1, an additional dimension, ‘TS_ids’, is present

    • Within each region, the different time series are indicated by ‘TS_ids’

  • In addition, the dataset contains attributes that indicate which time series were clustered. Calling regional_aggregated_RE_ds.attrs returns a dictionary that contains <region_ids>.<TS_ids> as keys and a list of original locations as values. In the case of gridded_RE_ds, these locations are tuples of the form (x/longitude, y/latitude)

Return type:

xr.Dataset
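
The shape of the attributes described under Returns can be pictured with the following sketch. It only mimics the documented <region_ids>.<TS_ids> key layout. The helper build_cluster_attrs, the region ID 'reg_01', the 'TS_<n>' label format, and the coordinates are all assumptions made for illustration and may differ from what the function actually writes.

```python
def build_cluster_attrs(region_id, groups, locations):
    """Map '<region_id>.<TS_id>' to the original locations in each cluster."""
    return {
        f"{region_id}.TS_{ts_id}": [locations[i] for i in members]
        for ts_id, members in enumerate(groups)
    }


# Hypothetical gridded locations as (x/longitude, y/latitude) tuples.
locations = [(6.0, 50.0), (6.5, 50.0), (7.0, 51.0), (7.5, 51.0)]

# Hypothetical clustering result: indices of the locations in each cluster.
groups = [[0, 1], [2, 3]]

attrs = build_cluster_attrs("reg_01", groups, locations)
for key, members in attrs.items():
    print(key, members)
```

A lookup such as attrs["reg_01.TS_0"] then lists the original locations that were merged into that time series.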