Project Data
New data imported into the operational_analysis toolkit can take advantage of the data structures in the types module. The base class is operational_analysis.types.PlantData, which represents all known pieces of data about a given wind plant.
Schemas
Operational Data
PlantData.scada
| Field Name | Data Type |
|---|---|
| time | datetime64[ns] |
| id | string |
| power_kw | float64 |
| windspeed_ms | float64 |
| winddirection_deg | float64 |
| status_label | string |
| pitch_deg | float64 |
| temp_c | float64 |
PlantData.meter
| Field Name | Data Type |
|---|---|
| time | datetime64[ns] |
| power_kw | float64 |
| energy_kw | float64 |
PlantData.tower
| Field Name | Data Type |
|---|---|
| time | datetime64[ns] |
| id | float64 |
PlantData.curtail
| Field Name | Data Type |
|---|---|
| time | datetime64[ns] |
| curtailment_pct | float64 |
| availability_pct | float64 |
| net_energy | float64 |
PlantData.status
| Field Name | Data Type |
|---|---|
| time | datetime64[ns] |
| id | string |
| status_id | int64 |
| status_code | int64 |
| status_text | string |
PlantData.asset
| Field Name | Data Type |
|---|---|
| id | string |
| latitude | float64 |
| longitude | float64 |
| rated_power_kw | float64 |
| type | string |
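As a rough illustration of how a table might be checked against one of the operational schemas above before import, the sketch below compares column types with the PlantData.meter field list. The helper name `check_schema` and the dictionary layout are assumptions made for this example; they are not part of the operational_analysis API.

```python
# Field names and types copied from the PlantData.meter table above.
METER_SCHEMA = {
    "time": "datetime64[ns]",
    "power_kw": "float64",
    "energy_kw": "float64",
}


def check_schema(actual_dtypes, schema):
    """Return a dict mapping each problematic field to a short description.

    `actual_dtypes` maps column names to dtype strings. With pandas it could
    be built as {c: str(t) for c, t in df.dtypes.items()}.
    Hypothetical helper, not a toolkit function.
    """
    problems = {}
    for field, expected in schema.items():
        if field not in actual_dtypes:
            problems[field] = "missing"
        elif actual_dtypes[field] != expected:
            problems[field] = f"expected {expected}, got {actual_dtypes[field]}"
    return problems


# A meter table with an integer power column and no energy column.
issues = check_schema(
    {"time": "datetime64[ns]", "power_kw": "int64"},
    METER_SCHEMA,
)
print(issues)
```

Working on dtype strings rather than a dataframe keeps the check independent of the backend engine.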
Reanalysis Products
Reanalysis products are included as Plant Data objects and, regardless of data source, share a standardized set of field names and types (see below). The data sources themselves differ, as do the methods used to calculate these standard fields from the raw datasets. These methods are described below.
PlantData.reanalysis.product["merra2"]
MERRA-2 data are based on the single-level diagnostic data available here:
https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_V5.12.4/summary?keywords=%22MERRA-2%22
Wind speed and direction are taken directly from the diagnostic 50-m u- and v-wind fields provided in this dataset. Air density at 50m is calculated using temperature and pressure estimations at 50m and the ideal gas law. Temperature at 50m is estimated by taking the 10-m temperature data provided by this dataset and assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 50m is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
| Field Name | Data Type |
|---|---|
| time | datetime64[ns] |
| windspeed_ms | float64 |
| winddirection_deg | float64 |
| rho_kgm-3 | float64 |
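The density calculation described for MERRA-2 (constant lapse rate, hypsometric equation, ideal gas law) can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: the function names and constants are assumptions, the air is treated as dry, and the temperature at the target height is used as the layer temperature in the hypsometric equation.

```python
import math

G = 9.81              # gravitational acceleration, m/s^2 (assumed value)
R_DRY = 287.058       # specific gas constant of dry air, J/(kg K) (assumed value)
LAPSE_RATE = -9.8e-3  # K per metre, i.e. -9.8 degrees Celsius per kilometre


def temperature_at(temp_ref_k, z_ref_m, z_m):
    """Shift a temperature from height z_ref_m to z_m with a constant lapse rate."""
    return temp_ref_k + LAPSE_RATE * (z_m - z_ref_m)


def pressure_at(p_surface_pa, layer_temp_k, z_m):
    """Hypsometric equation: extrapolate surface pressure up to height z_m."""
    return p_surface_pa * math.exp(-G * z_m / (R_DRY * layer_temp_k))


def dry_air_density(pressure_pa, temp_k):
    """Ideal gas law for dry air: rho = p / (R * T)."""
    return pressure_pa / (R_DRY * temp_k)


# Example inputs: 10-m temperature of 288.15 K, surface pressure of 101325 Pa.
t50 = temperature_at(288.15, z_ref_m=10.0, z_m=50.0)
p50 = pressure_at(101325.0, layer_temp_k=t50, z_m=50.0)
rho50 = dry_air_density(p50, t50)
print(t50, p50, rho50)
```

The same three steps apply to the other products with different reference and target heights.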
PlantData.reanalysis.product["ncep2"]
NCEP-2 data are based on the single-level diagnostic data available here:
https://rda.ucar.edu/datasets/ds091.0/
Wind speed and direction are taken directly from the diagnostic 10-m u- and v-wind fields provided in this dataset. Air density at 10m is calculated using temperature and pressure estimations at 10m and the ideal gas law. Temperature at 10m is estimated by taking the 2-m temperature data provided by this dataset and assuming a constant lapse rate of -9.8 degrees Celsius per vertical kilometer. Pressure at 10m is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
| Field Name | Data Type |
|---|---|
| time | datetime64[ns] |
| windspeed_ms | float64 |
| winddirection_deg | float64 |
| rho_kgm-3 | float64 |
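For reference, wind speed and direction can be derived from u- and v-wind components as in the sketch below. It uses the common meteorological convention (direction the wind blows from, in degrees clockwise from north); that the toolkit follows exactly this convention is an assumption, and the function name is hypothetical.

```python
import math


def speed_and_direction(u_ms, v_ms):
    """Convert eastward (u) and northward (v) wind components to speed and direction.

    Direction is the bearing the wind blows FROM, in degrees clockwise from north.
    """
    speed = math.hypot(u_ms, v_ms)
    direction = (180.0 + math.degrees(math.atan2(u_ms, v_ms))) % 360.0
    return speed, direction


# Air moving toward the southwest, i.e. a wind from the northeast.
speed, direction = speed_and_direction(-5.0, -5.0)
print(speed, direction)
```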
PlantData.reanalysis.product["erai"]
ERA-Interim data are based on the model-level data available here:
https://rda.ucar.edu/datasets/ds627.0/
Model levels are based on sigma coordinates (i.e. fractions of surface pressure). From this dataset, we extract temperature, u-wind, and v-wind at the 58th model level, which is on average about 72m above ground level (https://www.ecmwf.int/en/forecasts/documentation-and-support/60-model-levels). We also extract surface pressure data. Air density at the 58th model level is calculated with the ideal gas law from the temperature extracted at that level and an estimate of the pressure at that level. Pressure at the 58th model level is extrapolated from surface pressure data provided in this dataset using the hypsometric equation.
| Field Name | Data Type |
|---|---|
| time | datetime64[ns] |
| windspeed_ms | float64 |
| winddirection_deg | float64 |
| rho_kgm-3 | float64 |
PlantData
- class operational_analysis.types.plant.PlantData(path, name, engine='pandas', toolkit=['pruf_analysis'], schema=None)[source]
Bases: object
Data object for operational wind plant data.
- This class holds references to all tables associated with a wind plant. The tables are grouped by type:
PlantData.scada
PlantData.meter
PlantData.tower
PlantData.status
PlantData.curtail
PlantData.asset
PlantData.reanalysis
- Each table must have columns that follow the conventions listed under Schemas above.
The PlantData object can serialize all of these structures and reload them from the cache as needed.
The underlying data structure is a TimeseriesTable, which is agnostic to the underlying engine and can be implemented with Pandas, Spark, or Dask (for instance).
Individual plants will extend this object with their own prepare() and other methods.
Create a plant data object without loading any data.
- Parameters
path (string) – path where data should be read/written
name (string) – unique name for this plant, in case the directory holds data for multiple plants
engine (string) – backend engine - pandas, spark or dask
toolkit (list) – the _tool_classes attribute defines a list of toolkit modules that can be loaded
- Returns
New object
- amend_std(dfname, new_fields)[source]
Amend a dataframe standard with new or changed fields. Consider running ensure_columns afterward to automatically create the new required columns if they don’t exist.
- Parameters
dfname (string) – one of scada, status, curtail, etc.
new_fields (dict) – set of new fields and types, in the same format as _scada_std, to be added/changed in the std
- Returns
New data field standard
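The add-or-change behaviour described for amend_std can be pictured with a plain dictionary merge. This is only a sketch: the format of _scada_std is not shown in this document, so a simple mapping from field name to type string is assumed, and `amend_standard` is a hypothetical stand-in rather than the method itself.

```python
def amend_standard(standard, new_fields):
    """Return a copy of `standard` with entries from `new_fields` added or overridden.

    Assumes both arguments map field names to type strings (an assumption,
    since the real standard format is not documented here).
    """
    amended = dict(standard)    # copy so the original standard is left untouched
    amended.update(new_fields)  # new keys are added, existing keys are replaced
    return amended


scada_std = {"time": "datetime64[ns]", "id": "string", "power_kw": "float64"}
amended = amend_standard(scada_std, {"power_kw": "float32", "yaw_deg": "float64"})
print(amended)
```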
- ensure_columns()[source]
Deprecated. Ensure all dataframes contain the necessary columns and are formatted as needed.
- get_time_range()[source]
Get time range as tuple
- Returns
start_time (datetime): start time
stop_time (datetime): stop time
- Return type
(tuple)
- load(path=None)[source]
Load this project and all associated data from a file path
- Parameters
path (string) – Location of plant data directory. Defaults to self._path
- Returns
(None)
- save(path=None)[source]
Save the project and all JSON-serializable attributes to a file path.
- Parameters
path (string) – Location of new directory into which plant will be saved. The directory should not already exist. Defaults to self._path
- Returns
(None)
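To illustrate what saving only JSON-serializable attributes into a new directory can look like, here is a small standard-library sketch. It is not the toolkit's save() implementation; the helper names and the `metadata.json` file name are hypothetical.

```python
import json
import os
import tempfile


def json_serializable_items(attributes):
    """Keep only the attributes that json.dumps can encode."""
    kept = {}
    for key, value in attributes.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue  # skip values such as dataframes or other objects
        kept[key] = value
    return kept


def save_metadata(path, attributes):
    """Write serializable attributes into a new directory at `path`."""
    os.makedirs(path)  # raises FileExistsError if the directory already exists
    with open(os.path.join(path, "metadata.json"), "w") as handle:
        json.dump(json_serializable_items(attributes), handle)


target = os.path.join(tempfile.mkdtemp(), "my_plant")
save_metadata(target, {"name": "my_plant", "engine": "pandas", "table": object()})
with open(os.path.join(target, "metadata.json")) as handle:
    restored = json.load(handle)
print(restored)
```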
AssetData
- class operational_analysis.types.asset.AssetData(engine='pandas')[source]
Bases: object
This class wraps around a Pandas dataframe that contains metadata about the plant assets. It provides some useful functions to work with this data (e.g., calculating nearest neighbors, etc.).
- calculate_nearest(active_turbine_ids, active_tower_ids)[source]
Create or overwrite a column called ‘nearest_turbine_id’ or ‘nearest_tower_id’ which contains the asset id of the closest active turbine or tower to each turbine or tower. The columns are only valid for turbines or towers listed in the parameters of this function, and the function only calculates the value of the correct column for each asset. Turbines, for example, will have null ‘nearest_tower_id’ and vice versa.
- Parameters
active_turbine_ids (list) – List of IDs of turbines to consider.
active_tower_ids (list) – List of IDs of met towers to consider.
- Returns: None
Sets asset ‘nearest_turbine_id’ and ‘nearest_tower_id’ columns.
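A nearest-neighbor lookup of this kind can be sketched with plain coordinates. The example assumes planar positions in metres (for instance, projected coordinates) and uses a hypothetical helper, `nearest_active`; it does not reproduce how AssetData stores or computes these columns.

```python
import math


def nearest_active(positions, active_ids):
    """Map each active asset id to the id of its closest other active asset.

    `positions` maps asset id -> (x_m, y_m). Returns None for an asset
    that has no other active asset to compare against.
    """
    nearest = {}
    for asset_id in active_ids:
        x0, y0 = positions[asset_id]
        candidates = [other for other in active_ids if other != asset_id]
        nearest[asset_id] = min(
            candidates,
            key=lambda other: math.hypot(
                positions[other][0] - x0, positions[other][1] - y0
            ),
            default=None,
        )
    return nearest


turbines = {"T01": (0.0, 0.0), "T02": (300.0, 0.0), "T03": (1000.0, 0.0)}
result = nearest_active(turbines, ["T01", "T02", "T03"])
print(result)
```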
- parse_geometry(srs='epsg:4326', zone=None, longitude=None)[source]
Calculate UTM coordinates from latitude/longitude.
The UTM system divides the Earth into 60 zones, each 6deg of longitude in width. Zone 1 covers longitude 180deg to 174deg W; zone numbering increases eastward to zone 60, which covers longitude 174deg E to 180deg. The polar regions south of 80deg S and north of 84deg N are excluded.
Ref: http://geopandas.org/projections.html
- Parameters
srs (str, optional) – Used to define the coordinate reference system (CRS). Defaults to the European Petroleum Survey Group (EPSG) code 4326 to be used with the World Geodetic System reference system, WGS 84.
zone (int, optional) – UTM zone. If set to None (default), then calculated from the longitude.
longitude (float, optional) – Reference longitude for calculating the UTM zone. If None (default), then taken as the average longitude of all assets.
- Returns: None
Sets asset ‘geometry’ column.
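The zone layout described above (60 zones of 6 degrees each, starting at 180 degrees W) implies a simple formula for the zone number. The sketch below applies it to a reference longitude taken as the mean of the asset longitudes; the function name and the sample longitudes are illustrative assumptions, not toolkit code.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM zone number (1 to 60) for a longitude in degrees east."""
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    # A longitude of exactly 180 degrees would give 61; fold it into zone 60.
    return min(max(zone, 1), 60)


# Reference longitude taken as the average of all asset longitudes.
asset_longitudes = [-105.25, -105.20, -105.15]
reference = sum(asset_longitudes) / len(asset_longitudes)
zone = utm_zone(reference)
print(zone)
```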
- prepare(active_turbine_ids, active_tower_ids, srs='epsg:4326')[source]
Prepare the asset data frame for further analysis work. Currently, this function calls parse_geometry(srs) and calculate_nearest(active_turbine_ids, active_tower_ids), passing through the arguments to this function.
- Parameters
active_turbine_ids (list) – List of IDs of turbines to consider.
active_tower_ids (list) – List of IDs of met towers to consider.
srs (str, optional) – Used to define the coordinate reference system (CRS). Defaults to the European Petroleum Survey Group (EPSG) code 4326 to be used with the World Geodetic System reference system, WGS 84.
- Returns: None
Sets asset ‘geometry’, ‘nearest_turbine_id’ and ‘nearest_tower_id’ columns.
ReanalysisData
- class operational_analysis.types.reanalysis.ReanalysisData(engine='pandas')[source]
Bases: object
This class houses the different reanalysis data products and their related functions for use in the PRUF OA code. ReanalysisData holds an array of TimeseriesTable in the _product attribute. The keys (names) of these attributes can be found in the _products attribute.