Project Data

New data imported into the operational_analysis toolkit can take advantage of the data structures in the types module. The base class is operational_analysis.types.PlantData, which represents all known pieces of data about a given wind plant.

Schemas

Operational Data

PlantData.scada

Field Name	Data Type
time	datetime64[ns]
id	string
power_kw	float64
windspeed_ms	float64
winddirection_deg	float64
status_label	string
pitch_deg	float64
temp_c	float64
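
The schema above can be mirrored in plain pandas before data are handed to the toolkit. The following is a minimal sketch, not part of the toolkit: the `SCADA_SCHEMA` dictionary and the `cast_to_schema` helper are hypothetical names introduced here for illustration, and the sample values are made up.

```python
import pandas as pd

# Field names and data types copied from the PlantData.scada table above.
SCADA_SCHEMA = {
    "time": "datetime64[ns]",
    "id": "string",
    "power_kw": "float64",
    "windspeed_ms": "float64",
    "winddirection_deg": "float64",
    "status_label": "string",
    "pitch_deg": "float64",
    "temp_c": "float64",
}


def cast_to_schema(df, schema):
    """Hypothetical helper: keep only the schema columns and cast their dtypes."""
    missing = [name for name in schema if name not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[list(schema)].astype(schema)


# Made-up raw SCADA records with loosely typed columns.
raw = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01 00:00:00", "2020-01-01 00:10:00"]),
        "id": ["T01", "T01"],
        "power_kw": [1500, 1620],
        "windspeed_ms": [8.1, 8.6],
        "winddirection_deg": [270, 265],
        "status_label": ["run", "run"],
        "pitch_deg": [1.5, 1.2],
        "temp_c": [4, 5],
    }
)

scada = cast_to_schema(raw, SCADA_SCHEMA)
```

Casting up front surfaces type problems (for example, integer power readings) before any analysis code runs.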

PlantData.meter

Field Name	Data Type
time	datetime64[ns]
power_kw	float64
energy_kw	float64

PlantData.tower

Field Name	Data Type
time	datetime64[ns]
id	float64

PlantData.curtail

Field Name	Data Type
time	datetime64[ns]
curtailment_pct	float64
availability_pct	float64
net_energy	float64

PlantData.status

Field Name	Data Type
time	datetime64[ns]
id	string
status_id	int64
status_code	int64
status_text	string

PlantData.asset

Field Name	Data Type
id	string
latitude	float64
longitude	float64
rated_power_kw	float64
type	string

Reanalysis Products

Reanalysis products are included as PlantData objects and, regardless of data source, share a standardized set of field names and types (see below). The data sources differ, however, as do the methods used to calculate these standard fields from the raw datasets. These methods are described below.

PlantData.reanalysis.product["merra2"]

MERRA-2 data are based on the single-level diagnostic data available here:

https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_V5.12.4/summary?keywords=%22MERRA-2%22

Wind speed and direction are taken directly from the diagnostic 50-m u- and v-wind fields provided in this dataset. Air density at 50 m is calculated with the ideal gas law from estimates of temperature and pressure at 50 m. Temperature at 50 m is estimated from the 10-m temperature data provided by this dataset, assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 50 m is extrapolated from the surface pressure data provided in this dataset using the hypsometric equation.
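
The density calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's implementation: the function name, the physical constants, and the use of the mean of the two temperatures as the layer temperature in the hypsometric equation are all choices made here.

```python
import math

R_DRY_AIR = 287.05    # specific gas constant of dry air, J/(kg K) (assumed value)
GRAVITY = 9.81        # gravitational acceleration, m/s^2 (assumed value)
LAPSE_RATE = -9.8e-3  # K per metre, i.e. -9.8 degrees Celsius per kilometre


def density_at_height(t_ref_c, z_ref_m, p_surface_pa, z_m):
    """Estimate air density (kg/m^3) at height z_m above the surface."""
    t_ref_k = t_ref_c + 273.15
    # Constant lapse rate: extrapolate temperature from the reference height.
    t_k = t_ref_k + LAPSE_RATE * (z_m - z_ref_m)
    # Hypsometric equation solved for pressure. Assumption: the layer-mean
    # temperature is approximated by the mean of the two temperatures.
    t_mean_k = 0.5 * (t_ref_k + t_k)
    p_pa = p_surface_pa * math.exp(-GRAVITY * z_m / (R_DRY_AIR * t_mean_k))
    # Ideal gas law for dry air.
    return p_pa / (R_DRY_AIR * t_k)


# Made-up inputs: 15 degrees Celsius at 10 m, standard surface pressure.
rho_50m = density_at_height(t_ref_c=15.0, z_ref_m=10.0,
                            p_surface_pa=101325.0, z_m=50.0)
```

The same function covers other reference heights by changing `z_ref_m` and `z_m`.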

Field Name	Data Type
time	datetime64[ns]
windspeed_ms	float64
winddirection_deg	float64
rho_kgm-3	float64

PlantData.reanalysis.product["ncep2"]

NCEP-2 data are based on the single-level diagnostic data available here:

https://rda.ucar.edu/datasets/ds091.0/

Wind speed and direction are taken directly from the diagnostic 10-m u- and v-wind fields provided in this dataset. Air density at 10 m is calculated with the ideal gas law from estimates of temperature and pressure at 10 m. Temperature at 10 m is estimated from the 2-m temperature data provided by this dataset, assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 10 m is extrapolated from the surface pressure data provided in this dataset using the hypsometric equation.
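
Converting u- and v-wind components into the windspeed_ms and winddirection_deg fields can be sketched as below. The function name is hypothetical, and the meteorological convention (direction the wind blows from, in degrees clockwise from north) is an assumption; the text above does not state which convention the toolkit uses.

```python
import math


def wind_speed_and_direction(u_ms, v_ms):
    """Return (speed in m/s, direction in degrees) from u/v wind components.

    Assumption: u is the eastward component, v is the northward component,
    and direction is the bearing the wind blows FROM (0 = north, 90 = east).
    """
    speed = math.hypot(u_ms, v_ms)
    direction = math.degrees(math.atan2(-u_ms, -v_ms)) % 360.0
    return speed, direction


# Wind blowing toward the west comes from the east.
speed, direction = wind_speed_and_direction(-10.0, 0.0)
```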

Field Name	Data Type
time	datetime64[ns]
windspeed_ms	float64
winddirection_deg	float64
rho_kgm-3	float64

PlantData.reanalysis.product["erai"]

ERA-Interim data are based on the model-level data available here:

https://rda.ucar.edu/datasets/ds627.0/

Model levels are based on sigma coordinates (i.e., fractions of surface pressure). From this dataset, we extract temperature, u-wind, and v-wind at the 58th model level, which is on average about 72 m above ground level (https://www.ecmwf.int/en/forecasts/documentation-and-support/60-model-levels). We also extract surface pressure data. Air density at the 58th model level is calculated with the ideal gas law from the temperature extracted at that level and an estimate of pressure at that level. Pressure at the 58th model level is extrapolated from the surface pressure data provided in this dataset using the hypsometric equation.

Field Name	Data Type
time	datetime64[ns]
windspeed_ms	float64
winddirection_deg	float64
rho_kgm-3	float64

PlantData

class operational_analysis.types.plant.PlantData(path, name, engine='pandas', toolkit=['pruf_analysis'], schema=None)

Bases: object

Data object for operational wind plant data.

This class holds references to all tables associated with a wind plant. The tables are grouped by type:

PlantData.scada
PlantData.meter
PlantData.tower
PlantData.status
PlantData.curtail
PlantData.asset
PlantData.reanalysis

Each table must have columns that follow the field names and data types given for it in the Schemas section above.

The PlantData object can serialize all of these structures and reload them from the cache as needed.

The underlying data structure is a TimeseriesTable, which is agnostic to the underlying engine and can be implemented with Pandas, Spark, or Dask (for instance).

Individual plants will extend this object with their own prepare() and other methods.

Create a plant data object without loading any data.

Parameters

path (string) – path where data should be read/written
name (string) – unique name for this plant, in case the directory contains data for multiple plants
engine (string) – backend engine - pandas, spark or dask
toolkit (list) – the _tool_classes attribute defines a list of toolkit modules that can be loaded

Returns

New object

amend_std(dfname, new_fields)

Amend a dataframe standard with new or changed fields. Consider running ensure_columns afterward to automatically create the new required columns if they don’t exist.

Parameters

dfname (string) – one of scada, status, curtail, etc.
new_fields (dict) – set of new fields and types in the same format as _scada_std to be added/changed in the std

Returns

New data field standard

ensure_columns(): Deprecated. Ensure all dataframes contain the necessary columns and format them as needed.

get_time_range()

Get time range as tuple

Returns: (start_time, stop_time), where start_time (datetime) is the start time and stop_time (datetime) is the stop time
Return type: tuple

load(path=None)

Load this project and all associated data from a file path

Parameters: path (string) – Location of plant data directory. Defaults to self._path
Returns: (None)

merge_asset_metadata(): Merge metadata from the asset table into the scada and tower tables

prepare(): Prepare this object for use by loading data and doing essential preprocessing.

save(path=None)

Save the project and all JSON-serializable attributes to a file path.

Parameters

path (string) – Location of new directory into which plant will be saved. The directory should not already exist. Defaults to self._path

Returns

(None)

set_time_range(start_time, stop_time)

Set time range given two unparsed timestamp strings

Parameters

start_time (string) – start time
stop_time (string) – stop time

Returns

(None)
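
The pairing of set_time_range and get_time_range can be illustrated with a stand-in class. This is a hypothetical sketch, not the PlantData implementation: the `TimeRangeHolder` class, its `_time_range` attribute, and the choice of ISO 8601 parsing via the standard library are all assumptions made here.

```python
from datetime import datetime


class TimeRangeHolder:
    """Hypothetical stand-in that mimics the two time-range methods."""

    def __init__(self):
        self._time_range = (None, None)

    def set_time_range(self, start_time, stop_time):
        # Parse the unparsed timestamp strings (assumed to be ISO 8601).
        self._time_range = (
            datetime.fromisoformat(start_time),
            datetime.fromisoformat(stop_time),
        )

    def get_time_range(self):
        # Return the (start_time, stop_time) tuple of datetime objects.
        return self._time_range


holder = TimeRangeHolder()
holder.set_time_range("2018-01-01 00:00:00", "2018-12-31 23:50:00")
start_time, stop_time = holder.get_time_range()
```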

validate(schema=None): Validate this plant data object against its schema. Returns True if valid; raises an exception if not valid.

AssetData

class operational_analysis.types.asset.AssetData(engine='pandas')

Bases: object

This class wraps around a Pandas dataframe that contains metadata about the plant assets. It provides some useful functions to work with this data (e.g., calculating nearest neighbors, etc.).

calculate_nearest(active_turbine_ids, active_tower_ids)

Create or overwrite the columns ‘nearest_turbine_id’ and ‘nearest_tower_id’, which contain the asset id of the closest active turbine or tower to each turbine or tower. The columns are only valid for turbines or towers listed in the parameters of this function, and only the appropriate column is calculated for each asset. Turbines, for example, will have a null ‘nearest_tower_id’, and vice versa.

Parameters

active_turbine_ids (list) – List of IDs of turbines to consider.
active_tower_ids (list) – List of IDs of met towers to consider.

Returns: None: Sets asset ‘nearest_turbine_id’ and ‘nearest_tower_id’ columns.
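
The nearest-neighbor idea behind calculate_nearest can be sketched in plain Python. This is an illustration only, not the toolkit's code: the `nearest_active_ids` function, the coordinate dictionary, and the use of straight-line distance on projected x/y coordinates in metres are assumptions made here.

```python
import math


def nearest_active_ids(coords, active_ids):
    """Map each active asset id to the id of its closest other active asset."""
    nearest = {}
    for asset_id in active_ids:
        x0, y0 = coords[asset_id]
        best_id, best_dist = None, math.inf
        for other_id in active_ids:
            if other_id == asset_id:
                continue  # an asset is never its own nearest neighbor
            x1, y1 = coords[other_id]
            dist = math.hypot(x1 - x0, y1 - y0)
            if dist < best_dist:
                best_id, best_dist = other_id, dist
        nearest[asset_id] = best_id
    return nearest


# Made-up projected coordinates (metres) for three turbines and two met towers.
coords = {
    "T1": (0.0, 0.0),
    "T2": (100.0, 0.0),
    "T3": (250.0, 0.0),
    "M1": (50.0, 40.0),
    "M2": (300.0, 40.0),
}

# Turbines are matched only to turbines, and towers only to towers.
nearest_turbine_id = nearest_active_ids(coords, ["T1", "T2", "T3"])
nearest_tower_id = nearest_active_ids(coords, ["M1", "M2"])
```

The brute-force loop is quadratic in the number of assets, which is acceptable for the asset counts of a typical plant.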

parse_geometry(srs='epsg:4326', zone=None, longitude=None)

Calculate UTM coordinates from latitude/longitude.

The UTM system divides the Earth into 60 zones, each 6deg of longitude in width. Zone 1 covers longitude 180deg to 174deg W; zone numbering increases eastward to zone 60, which covers longitude 174deg E to 180deg. The polar regions south of 80deg S and north of 84deg N are excluded.

Ref: http://geopandas.org/projections.html

Parameters

srs (str, optional) – Used to define the coordinate reference system (CRS). Defaults to the European Petroleum Survey Group (EPSG) code 4326 to be used with the World Geodetic System reference system, WGS 84.
zone (int, optional) – UTM zone. If set to None (default), then calculated from the longitude.
longitude (float, optional) – Reference longitude for calculating the UTM zone. If None (default), then taken as the average longitude of all assets.

Returns: None: Sets asset ‘geometry’ column.
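
The zone rule described above (60 zones of 6 degrees each, starting at 180 degrees W) can be sketched as follows. The function names are hypothetical, and whether parse_geometry computes the zone this way is an assumption. The EPSG codes 326xx (northern hemisphere) and 327xx (southern hemisphere) are the standard WGS 84 / UTM codes.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM zone number (1 to 60) for a longitude in degrees east."""
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    # Longitude 180 would otherwise map to a nonexistent zone 61.
    return min(max(zone, 1), 60)


def utm_epsg(zone, northern=True):
    """Return the EPSG code of the WGS 84 / UTM projection for a zone."""
    return (32600 if northern else 32700) + zone


zone = utm_zone(-105.0)  # e.g. a plant at 105 degrees W
epsg = utm_epsg(zone)
```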

prepare(active_turbine_ids, active_tower_ids, srs='epsg:4326')

Prepare the asset data frame for further analysis work. Currently, this function calls parse_geometry(srs) and calculate_nearest(active_turbine_ids, active_tower_ids), passing through the arguments to this function.

Parameters

active_turbine_ids (list) – List of IDs of turbines to consider.
active_tower_ids (list) – List of IDs of met towers to consider.
srs (str, optional) – Used to define the coordinate reference system (CRS). Defaults to the European Petroleum Survey Group (EPSG) code 4326 to be used with the World Geodetic System reference system, WGS 84.

Returns: None: Sets asset ‘geometry’, ‘nearest_turbine_id’ and ‘nearest_tower_id’ columns.

ReanalysisData

class operational_analysis.types.reanalysis.ReanalysisData(engine='pandas')

Bases: object

This class houses the different reanalysis data products and their related functions for use in the PRUF OA code. ReanalysisData holds an array of TimeseriesTable objects in the _product attribute. The keys (names) of these tables can be found in the _products attribute.