Project Data

New data imported into the operational_analysis toolkit can take advantage of the data structures in the types module. The base class is operational_analysis.types.PlantData, which represents all known pieces of data about a given wind plant.

Schemas

Operational Data

PlantData.scada

Field Name            Data Type
time                  datetime64[ns]
id                    string
power_kw              float64
windspeed_ms          float64
winddirection_deg     float64
status_label          string
pitch_deg             float64
temp_c                float64
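As a minimal sketch, a table matching the PlantData.scada schema could be built in plain pandas as follows. The turbine ID and all values are made up for illustration, and this does not use the toolkit's own import code.

```python
import pandas as pd

# Hypothetical two-row SCADA sample; the ID and values are illustrative only.
scada = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:10"]),
        "id": ["T01", "T01"],
        "power_kw": [1500.0, 1480.5],
        "windspeed_ms": [9.2, 9.0],
        "winddirection_deg": [270.0, 268.5],
        "status_label": ["ok", "ok"],
        "pitch_deg": [2.0, 2.1],
        "temp_c": [5.5, 5.4],
    }
)

# Cast the timestamp column explicitly so it matches the schema's datetime64[ns].
scada = scada.astype({"time": "datetime64[ns]"})

print(scada.dtypes)
```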

PlantData.meter

Field Name            Data Type
time                  datetime64[ns]
power_kw              float64
energy_kw             float64

PlantData.tower

Field Name            Data Type
time                  datetime64[ns]
id                    float64

PlantData.curtail

Field Name            Data Type
time                  datetime64[ns]
curtailment_pct       float64
availability_pct      float64
net_energy            float64

PlantData.status

Field Name            Data Type
time                  datetime64[ns]
id                    string
status_id             int64
status_code           int64
status_text           string

PlantData.asset

Field Name            Data Type
id                    string
latitude              float64
longitude             float64
rated_power_kw        float64
type                  string

Reanalysis Products

Reanalysis products are included as Plant Data objects and, regardless of data source, have a standardized set of field names and types (see below). The data sources differ, however, as do the methods used to calculate these standard fields from the raw datasets. These methods are described here.

PlantData.reanalysis.product["merra2"]

MERRA-2 data are based on the single-level diagnostic data available here:

https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_V5.12.4/summary?keywords=%22MERRA-2%22

Wind speed and direction are taken directly from the diagnostic 50-m u- and v-wind fields provided in this dataset. Air density at 50m is calculated using temperature and pressure estimations at 50m and the ideal gas law. Temperature at 50m is estimated by taking the 10-m temperature data provided by this dataset and assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 50m is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
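The density calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's implementation: the function name, the physical constants, and the choice of the 10-m/50-m average as the layer-mean temperature in the hypsometric equation are all assumptions, and temperature is taken in kelvin.

```python
import math

R_DRY_AIR = 287.05     # specific gas constant of dry air, J/(kg K) (assumed value)
GRAVITY = 9.80665      # standard gravity, m/s^2 (assumed value)
LAPSE_RATE = -9.8e-3   # K per metre, i.e. -9.8 degrees Celsius per kilometre


def air_density_50m(temp_10m_k, surface_pressure_pa):
    """Estimate air density at 50 m from 10-m temperature and surface pressure."""
    # Extrapolate temperature from 10 m to 50 m with a constant lapse rate.
    temp_50m_k = temp_10m_k + LAPSE_RATE * (50.0 - 10.0)

    # Assumption: approximate the layer-mean temperature by the 10-m/50-m average.
    mean_temp_k = 0.5 * (temp_10m_k + temp_50m_k)

    # Hypsometric equation: pressure 50 m above the surface.
    pressure_50m_pa = surface_pressure_pa * math.exp(
        -GRAVITY * 50.0 / (R_DRY_AIR * mean_temp_k)
    )

    # Ideal gas law: rho = p / (R * T).
    return pressure_50m_pa / (R_DRY_AIR * temp_50m_k)


rho = air_density_50m(temp_10m_k=288.15, surface_pressure_pa=101325.0)
print(round(rho, 3))
```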

Field Name            Data Type
time                  datetime64[ns]
windspeed_ms          float64
winddirection_deg     float64
rho_kgm-3             float64

PlantData.reanalysis.product["ncep2"]

NCEP-2 data are based on the single-level diagnostic data available here:

https://rda.ucar.edu/datasets/ds091.0/

Wind speed and direction are taken directly from the diagnostic 10-m u- and v-wind fields provided in this dataset. Air density at 10m is calculated using temperature and pressure estimations at 10m and the ideal gas law. Temperature at 10m is estimated by taking the 2-m temperature data provided by this dataset and assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 10m is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
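Deriving wind speed and direction from u- and v-wind components can be sketched as below. The function name is hypothetical, and the sketch assumes the meteorological convention (the direction the wind blows from, in degrees clockwise from north); the source does not state which convention the toolkit uses.

```python
import math


def wind_speed_direction(u_ms, v_ms):
    """Return (speed in m/s, direction in degrees) from u (eastward) and v (northward)."""
    speed = math.hypot(u_ms, v_ms)
    # Assumed meteorological convention: direction the wind comes FROM,
    # measured clockwise from north, wrapped into [0, 360).
    direction = math.degrees(math.atan2(-u_ms, -v_ms)) % 360.0
    return speed, direction


# A wind with u = 5 m/s and v = 0 blows toward the east, so it comes from the west.
speed, direction = wind_speed_direction(5.0, 0.0)
print(speed, direction)
```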

Field Name            Data Type
time                  datetime64[ns]
windspeed_ms          float64
winddirection_deg     float64
rho_kgm-3             float64

PlantData.reanalysis.product["erai"]

ERA-interim data are based on the model-level data available here:

https://rda.ucar.edu/datasets/ds627.0/

Model levels are based on sigma coordinates (i.e., fractions of surface pressure). From this dataset, we extract temperature, u-wind, and v-wind at the 58th model level, which is on average about 72 m above ground level (https://www.ecmwf.int/en/forecasts/documentation-and-support/60-model-levels). We also extract surface pressure data. Pressure at the 58th model level is extrapolated from the surface pressure data using the hypsometric equation. Air density at that level is then calculated from the extracted temperature and the estimated pressure using the ideal gas law.

Field Name            Data Type
time                  datetime64[ns]
windspeed_ms          float64
winddirection_deg     float64
rho_kgm-3             float64

PlantData

class operational_analysis.types.plant.PlantData(path, name, engine='pandas', toolkit=['pruf_analysis'], schema=None)[source]

Bases: object

Data object for operational wind plant data.

This class holds references to all tables associated with a wind plant. The tables are grouped by type:
  • PlantData.scada

  • PlantData.meter

  • PlantData.tower

  • PlantData.status

  • PlantData.curtail

  • PlantData.asset

  • PlantData.reanalysis

Each table must have columns that follow the conventions listed under Schemas.

The PlantData object can serialize all of these structures and reload them from the cache as needed.

The underlying data structure is a TimeseriesTable, which is agnostic to the underlying engine and can be implemented with Pandas, Spark, or Dask (for instance).

Individual plants will extend this object with their own prepare() and other methods.

Create a plant data object without loading any data.

Parameters
  • path (string) – path where data should be read/written

  • name (string) – unique name for this plant, in case the directory holds data for multiple plants

  • engine (string) – backend engine - pandas, spark or dask

  • toolkit (list) – the _tool_classes attribute defines a list of toolkit modules that can be loaded

Returns

New object

amend_std(dfname, new_fields)[source]

Amend a dataframe standard with new or changed fields. Consider running ensure_columns afterward to automatically create the new required columns if they don’t exist.

Parameters
  • dfname (string) – one of scada, status, curtail, etc.

  • new_fields (dict) – set of new fields and types, in the same format as _scada_std, to be added to or changed in the standard

Returns

New data field standard

ensure_columns()[source]

Deprecated. Ensure all dataframes contain the necessary columns and format them as needed.

get_time_range()[source]

Get time range as tuple

Returns

  • start_time (datetime) – start time

  • stop_time (datetime) – stop time

Return type

(tuple)

load(path=None)[source]

Load this project and all associated data from a file path

Parameters

path (string) – Location of plant data directory. Defaults to self._path

Returns

(None)

merge_asset_metadata()[source]

Merge metadata from the asset table into the scada and tower tables
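A merge of this kind can be sketched in plain pandas as a left join on the shared id column. This is only an illustration: the sample data are made up, and which asset columns the toolkit actually carries over is not stated here.

```python
import pandas as pd

# Hypothetical asset and SCADA tables using field names from the schemas above.
asset = pd.DataFrame(
    {
        "id": ["T01", "T02"],
        "latitude": [40.01, 40.02],
        "longitude": [-105.20, -105.21],
        "rated_power_kw": [2000.0, 2000.0],
        "type": ["turbine", "turbine"],
    }
)
scada = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01 00:00"] * 2 + ["2020-01-01 00:10"]),
        "id": ["T01", "T02", "T01"],
        "power_kw": [1500.0, 1400.0, 1480.5],
    }
)

# Left join keeps every SCADA row and attaches the matching asset metadata.
scada_with_meta = scada.merge(asset, on="id", how="left")
print(scada_with_meta[["time", "id", "power_kw", "rated_power_kw"]])
```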

prepare()[source]

Prepare this object for use by loading data and doing essential preprocessing.

save(path=None)[source]

Save out the project and all JSON serializable attributes to a file path.

Parameters
  • path (string) – Location of new directory into which plant will be saved. The directory should not already exist. Defaults to self._path

Returns

(None)

set_time_range(start_time, stop_time)[source]

Set time range given two unparsed timestamp strings

Parameters
  • start_time (string) – start time

  • stop_time (string) – stop time

Returns

(None)

validate(schema=None)[source]

Validate this plant data object against its schema. Returns True if valid; raises an exception if not valid.
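The return-True-or-raise behavior can be illustrated with a small pandas check against the PlantData.meter field list. The schema dictionary format, the function name, and the choice of ValueError are assumptions made for this sketch; the toolkit's actual schema format and exception type are not documented here.

```python
import pandas as pd

# Hypothetical schema: field name -> expected pandas dtype string.
METER_SCHEMA = {
    "time": "datetime64[ns]",
    "power_kw": "float64",
    "energy_kw": "float64",
}


def validate_table(df, schema):
    """Return True if df has every schema column with the expected dtype, else raise."""
    for name, expected in schema.items():
        if name not in df.columns:
            raise ValueError(f"missing column: {name}")
        actual = str(df[name].dtype)
        if actual != expected:
            raise ValueError(f"column {name} has dtype {actual}, expected {expected}")
    return True


meter = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:10"]),
        "power_kw": [3000.0, 2950.0],
        "energy_kw": [500.0, 491.7],
    }
).astype({"time": "datetime64[ns]"})

print(validate_table(meter, METER_SCHEMA))
```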

AssetData

class operational_analysis.types.asset.AssetData(engine='pandas')[source]

Bases: object

This class wraps around a Pandas dataframe that contains metadata about the plant assets. It provides some useful functions to work with this data (e.g., calculating nearest neighbors, etc.).

calculate_nearest(active_turbine_ids, active_tower_ids)[source]

Create or overwrite a column called ‘nearest_turbine_id’ or ‘nearest_tower_id’ which contains the asset id of the closest active turbine or tower to each turbine or tower. The columns are only valid for turbines or towers listed in the parameters of this function, and it will only calculate the value of the correct column for each asset. Turbines, for example, will have null ‘nearest_tower_id’ and vice versa.

Parameters
  • active_turbine_ids (list) – List of IDs of turbines to consider.

  • active_tower_ids (list) – List of IDs of met towers to consider.

Returns: None

Sets asset ‘nearest_turbine_id’ and ‘nearest_tower_id’ columns.
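The idea of a nearest-asset lookup can be sketched with a brute-force search over projected coordinates. This is not the toolkit's implementation: the function name and the dictionary of (x, y) positions in metres are hypothetical, and the sketch handles a single group of assets (for example, only the active turbines).

```python
import math


def nearest_ids(coords, active_ids):
    """Map each active asset id to the id of the closest other active asset.

    coords: dict of id -> (x, y) position in metres (e.g. projected coordinates).
    """
    nearest = {}
    for a in active_ids:
        best_id, best_dist = None, math.inf
        for b in active_ids:
            if a == b:
                continue  # an asset is not its own neighbour
            dist = math.hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])
            if dist < best_dist:
                best_id, best_dist = b, dist
        nearest[a] = best_id  # stays None if there is no other active asset
    return nearest


# Hypothetical turbine positions along a single row.
turbine_xy = {"T01": (0.0, 0.0), "T02": (100.0, 0.0), "T03": (500.0, 0.0)}
print(nearest_ids(turbine_xy, ["T01", "T02", "T03"]))
```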

parse_geometry(srs='epsg:4326', zone=None, longitude=None)[source]

Calculate UTM coordinates from latitude/longitude.

The UTM system divides the Earth into 60 zones, each 6deg of longitude in width. Zone 1 covers longitude 180deg to 174deg W; zone numbering increases eastward to zone 60, which covers longitude 174deg E to 180deg. The polar regions south of 80deg S and north of 84deg N are excluded.

Ref: http://geopandas.org/projections.html

Parameters
  • srs (str, optional) – Used to define the coordinate reference system (CRS). Defaults to the European Petroleum Survey Group (EPSG) code 4326 to be used with the World Geodetic System reference system, WGS 84.

  • zone (int, optional) – UTM zone. If set to None (default), then calculated from the longitude.

  • longitude (float, optional) – Reference longitude for calculating the UTM zone. If None (default), then taken as the average longitude of all assets.

Returns: None

Sets asset ‘geometry’ column.
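The zone numbering described above (60 zones of 6 degrees each, starting at 180 degrees W) implies a simple formula for the zone number. The helper below is a sketch with a hypothetical name, not the toolkit's code; it takes the average asset longitude as the reference, as the parameter description states, and wraps a longitude of exactly 180 degrees E into zone 1.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM zone number (1-60) for a longitude in degrees east."""
    # Zone 1 starts at -180; each zone spans 6 degrees. The modulo wraps +180 to zone 1.
    return int(math.floor((longitude_deg + 180.0) / 6.0)) % 60 + 1


# Hypothetical asset longitudes; the reference is their average.
longitudes = [-105.20, -105.21, -105.19]
reference_longitude = sum(longitudes) / len(longitudes)
print(utm_zone(reference_longitude))
```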

prepare(active_turbine_ids, active_tower_ids, srs='epsg:4326')[source]

Prepare the asset data frame for further analysis work. Currently, this function calls parse_geometry(srs) and calculate_nearest(active_turbine_ids, active_tower_ids), passing through the arguments to this function.

Parameters
  • active_turbine_ids (list) – List of IDs of turbines to consider.

  • active_tower_ids (list) – List of IDs of met towers to consider.

  • srs (str, optional) – Used to define the coordinate reference system (CRS). Defaults to the European Petroleum Survey Group (EPSG) code 4326 to be used with the World Geodetic System reference system, WGS 84.

Returns: None

Sets asset ‘geometry’, ‘nearest_turbine_id’ and ‘nearest_tower_id’ columns.

ReanalysisData

class operational_analysis.types.reanalysis.ReanalysisData(engine='pandas')[source]

Bases: object

This class houses the different reanalysis data products and their related functions for use in the PRUF OA code. ReanalysisData holds an array of TimeseriesTable in the _product attribute. The keys (names) of these attributes can be found in the _products attribute.