Analysis Methods

Analysis Methods are a collection of modules and classes that perform high-level analysis of plant data. They integrate the more abstract toolkit modules with specific PlantData classes to create a reproducible process. Analysis modules may respect the engine parameter of PlantData objects to support multi-platform, scalable analysis.

Plant Level Analysis

class operational_analysis.methods.plant_analysis.MonteCarloAEP(*args, **kwargs)[source]

Bases: object

A serial (Pandas-driven) implementation of the benchmark PRUF operational analysis. This module collects standard processing and analysis methods for estimating plant-level operational AEP and its uncertainty.

The preprocessing should run in this order:

Process revenue meter energy - creates monthly/daily data frame, gets revenue meter on monthly/daily basis, and adds data flag
Process loss estimates - add monthly/daily curtailment and availability losses to monthly/daily data frame
Process reanalysis data - add monthly/daily density-corrected wind speeds, temperature (if used) and wind direction (if used) from several reanalysis products to the monthly/daily data frame
Set up Monte Carlo - create the necessary Monte Carlo inputs to the OA process
Run AEP Monte Carlo - run the OA process iteratively to get distribution of AEP results

The end result is a distribution of AEP results, which is used to assess the expected AEP and its associated uncertainty.
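
The overall idea can be illustrated with a minimal, self-contained sketch. It is not this class's implementation: the helper names (fit_line, monte_carlo_aep), the ordinary least-squares fit, the single Gaussian scale factor per iteration, and all data values are assumptions chosen for illustration.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def monte_carlo_aep(wind, energy, long_term_wind, uncertainty_meter=0.005,
                    num_sim=500, seed=0):
    """Return a list of AEP samples in the same energy units as `energy`."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_sim):
        # Assumption: meter uncertainty acts as one scale factor per iteration.
        scale = rng.gauss(1.0, uncertainty_meter)
        sampled = [e * scale for e in energy]
        slope, intercept = fit_line(wind, sampled)
        # Predict each long-term month and sum the twelve months to one year.
        results.append(sum(slope * w + intercept for w in long_term_wind))
    return results


# Hypothetical monthly wind speeds (m/s) and energy (GWh) for the period of record.
wind = [6.1, 7.4, 8.0, 6.8, 5.9, 5.2, 5.0, 5.5, 6.3, 7.1, 7.8, 8.2]
energy = [20.5, 26.0, 28.9, 23.8, 19.6, 16.9, 16.0, 18.1, 21.4, 24.7, 27.9, 29.6]
# Hypothetical long-term average wind speed for each calendar month.
long_term_wind = [6.3, 7.2, 7.9, 6.9, 6.0, 5.4, 5.1, 5.6, 6.2, 7.0, 7.7, 8.0]

aep = monte_carlo_aep(wind, energy, long_term_wind)
mean_aep = statistics.fmean(aep)
print(f"AEP = {mean_aep:.1f} GWh, uncertainty = {statistics.stdev(aep) / mean_aep:.2%}")
```

The mean of the samples plays the role of the expected AEP and their relative standard deviation plays the role of the uncertainty.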

Initialize AEP_MC analysis with data and parameters.

Parameters

plant (PlantData object) – PlantData object from which PlantAnalysis should draw data.
reanal_products (list) – List of reanalysis products to use for Monte Carlo sampling. Defaults to [“merra2”, “ncep2”, “erai”].
uncertainty_meter (float) – uncertainty on revenue meter data
uncertainty_losses (float) – uncertainty on long-term losses
uncertainty_windiness (tuple) – number of years to use for the windiness correction
uncertainty_loss_max (tuple) – threshold for the combined availability and curtailment monthly loss
uncertainty_nan_energy (float) – threshold to flag days/months based on NaNs
time_resolution (string) – whether to perform the AEP calculation at monthly (‘M’), daily (‘D’) or hourly (‘H’) time resolution
reg_model (string) – which model to use for the regression (‘lin’ for linear, ‘gam’, ‘gbm’, ‘etr’). At monthly time resolution only linear regression is allowed because of the reduced number of data points.
ml_setup_kwargs (kwargs) – keyword arguments to MachineLearningSetup class
reg_temperature (bool) – whether to include temperature (True) or not (False) as regression input
reg_winddirection (bool) – whether to include wind direction (True) or not (False) as regression input

calculate_aggregate_dataframe(*args, **kwargs)

Perform pre-processing of the plant data to produce a monthly/daily data frame to be used in AEP analysis.

Parameters: (None)
Returns: (None)

calculate_long_term_losses(*args, **kwargs)

This function calculates long-term availability and curtailment losses based on the reported data grouped by the time resolution, filtering for those data that are deemed representative of average plant performance.

Parameters: (None)
Returns: (None)

filter_outliers(*args, **kwargs)

This function filters outliers based on a combination of range filter, unresponsive sensor filter, and window filter. A memoized function stores the regression data in a dictionary for each parameter combination as it comes up in the Monte Carlo simulation. This saves significant computational time because the robust linear regression does not have to be rerun for each Monte Carlo iteration.

Parameters: n (float) – Monte Carlo iteration
Returns: Filtered monthly/daily data ready for linear regression
Return type: pandas.DataFrame
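
Memoization by parameter combination can be sketched as follows. This is a generic illustration using functools.lru_cache; the function name filtered_data and its two arguments are hypothetical and do not reflect how this method keys its cache.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive step actually runs


@lru_cache(maxsize=None)
def filtered_data(reanalysis_product, outlier_threshold):
    """Stand-in for an expensive outlier-filtering step (hypothetical)."""
    calls["count"] += 1
    # A real implementation would filter the data frame here and return it.
    return f"filtered[{reanalysis_product}, {outlier_threshold}]"


# Monte Carlo draws repeat the same combinations many times.
draws = [("merra2", 2.0), ("ncep2", 3.0), ("merra2", 2.0), ("erai", 2.0), ("ncep2", 3.0)]
for product, threshold in draws:
    filtered_data(product, threshold)

print(calls["count"])  # 3: one computation per distinct combination
```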

groupby_time_res(*args, **kwargs)

Group pandas dataframe based on the time resolution chosen in the calculation.

Parameters: df (dataframe) – dataframe that needs to be grouped based on time resolution used
Returns: None

plot_aep_boxplot(param, lab)[source]

Plot box plots of AEP results sliced by a specified Monte Carlo parameter

Parameters

param (list) – The Monte Carlo parameter on which to split the AEP results
lab (str) – The name to use for the parameter when producing the figure

Returns

(none)

plot_aggregate_plant_data_timeseries()[source]

Plot timeseries of monthly/daily gross energy, availability and curtailment

Returns: matplotlib.pyplot object

plot_reanalysis_gross_energy_data(outlier_thres)[source]

Make a plot of normalized 30-day gross energy vs wind speed for each reanalysis product, including the R² measure

Parameters: outlier_thres (float) – outlier threshold (typical range of 1 to 4) which adjusts outlier sensitivity detection
Returns: matplotlib.pyplot object

plot_reanalysis_normalized_rolling_monthly_windspeed()[source]

Make a plot of annual average wind speeds from reanalysis data to show general trends for each reanalysis product. The period of record for plant data is highlighted.

Returns: matplotlib.pyplot object

plot_result_aep_distributions()[source]

Plot a distribution of AEP values from the Monte-Carlo OA method

Returns: matplotlib.pyplot object

process_loss_estimates(*args, **kwargs)

Append availability and curtailment losses to monthly data frame

Parameters: (None)
Returns: (None)

process_reanalysis_data(*args, **kwargs)

Process reanalysis data for use in PRUF plant analysis:

calculate density-corrected wind speed and wind components
get monthly/daily average wind speeds and components
calculate monthly/daily average wind direction
calculate monthly/daily average temperature
append monthly/daily averages to monthly/daily energy data frame

Parameters: (None)
Returns: (None)
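
The first and third steps can be sketched with the standard library alone. The cube-root density scaling and the meteorological "direction the wind blows from" convention are assumptions made for this illustration, as are the function names and sample values.

```python
import math
from statistics import fmean


def density_corrected_speed(speed, rho, rho_mean):
    """Scale wind speed by the cube root of the density ratio (assumed convention)."""
    return speed * (rho / rho_mean) ** (1.0 / 3.0)


def mean_wind_direction(u, v):
    """Direction in degrees (0 = from north, 90 = from east) of the mean u/v vector."""
    return math.degrees(math.atan2(-fmean(u), -fmean(v))) % 360.0


# Hypothetical samples for one day: u eastward and v northward (m/s), rho in kg/m^3.
u = [4.0, 5.0, 6.0]
v = [0.0, 0.0, 0.0]
rho = [1.20, 1.22, 1.24]

speeds = [math.hypot(ui, vi) for ui, vi in zip(u, v)]
rho_mean = fmean(rho)
corrected = [density_corrected_speed(s, r, rho_mean) for s, r in zip(speeds, rho)]

print(fmean(corrected), mean_wind_direction(u, v))
```

Averaging the u and v components before converting to an angle avoids the wrap-around problem of averaging directions directly (for example, 350 and 10 degrees).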

process_revenue_meter_energy(*args, **kwargs)

Initial creation of monthly data frame:

Populate monthly/daily data frame with energy data summed from 10-min QC’d data
For each monthly/daily value, find percentage of NaN data used in creating it and flag if percentage is greater than 0

Parameters: (None)
Returns: (None)
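
The two steps can be sketched for the daily case as below. The function name, the record layout (timestamp, energy in kWh) and the sample data are hypothetical; the class itself works on the plant's revenue meter data frame.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def daily_energy_with_nan_flag(records):
    """Sum energy per day and flag days where any sample is NaN.

    `records` is an iterable of (timestamp, energy_kwh) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(lambda: [0, 0])  # [number of samples, number of NaNs]
    for stamp, energy in records:
        day = stamp.date()
        counts[day][0] += 1
        if math.isnan(energy):
            counts[day][1] += 1
        else:
            totals[day] += energy
    return {
        day: {"energy_kwh": totals[day], "nan_fraction": n_nan / n, "flag": n_nan > 0}
        for day, (n, n_nan) in counts.items()
    }


# Two days of hypothetical 10-minute data with a single missing sample on day two.
start = datetime(2020, 1, 1)
records = [(start + timedelta(minutes=10 * i), 50.0) for i in range(288)]
records[200] = (records[200][0], float("nan"))

daily = daily_energy_with_nan_flag(records)
for day, row in sorted(daily.items()):
    print(day, row)
```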

run(*args, **kwargs)

Perform pre-processing of data into an internal representation for which the analysis can run more quickly.

Parameters

reanal_subset (list) – list of str data indicating which reanalysis products to use in OA
num_sim (int) – number of simulations to perform

Returns

None

run_AEP_monte_carlo(*args, **kwargs)

Loop through OA process a number of times and return array of AEP results each time

Returns: numpy.ndarray Array of AEP, long-term avail, long-term curtailment calculations

run_regression(*args, **kwargs)

Run robust linear regression between Monte-Carlo generated monthly/daily gross energy, wind speed, temperature and wind direction (if used)

Parameters: n (int) – The Monte Carlo iteration number
Returns: trained regression model
Return type: ?

sample_long_term_losses(*args, **kwargs)

This function calculates long-term availability and curtailment losses based on the Monte Carlo sampled historical availability and curtailment data. To estimate long-term losses, average percentage monthly losses are weighted by monthly long-term gross energy.

Parameters: gross_lt (pandas.Series) – Time series of long-term gross energy
Returns: long-term availability loss expressed as a fraction (float); long-term curtailment loss expressed as a fraction (float)
Return type: float
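
The weighting described above amounts to an energy-weighted mean. A minimal sketch, with a hypothetical function name and only two months of made-up values for brevity:

```python
def long_term_loss(monthly_loss_fraction, monthly_gross_lt):
    """Energy-weighted mean of monthly loss fractions."""
    total = sum(monthly_gross_lt)
    weighted = sum(l * g for l, g in zip(monthly_loss_fraction, monthly_gross_lt))
    return weighted / total


# Hypothetical values: a windy month with low losses, a calm month with high losses.
availability_loss = [0.02, 0.04]   # fraction of gross energy lost in each month
gross_lt = [30.0, 10.0]            # long-term gross energy in each month (GWh)

print(long_term_loss(availability_loss, gross_lt))
```

Here the weighted result is close to 0.025, lower than the unweighted mean of 0.03, because the high-loss month contributes less energy.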

sample_long_term_reanalysis(*args, **kwargs)

This function returns the long-term monthly/daily wind speeds based on the Monte-Carlo generated sample of:

The reanalysis product

The number of years to use in the long-term correction

Parameters: (None)
Returns: the windiness-corrected or ‘long-term’ monthly/daily wind speeds
Return type: pandas.DataFrame

set_regression_data(*args, **kwargs)

This will be called for each iteration of the Monte Carlo simulation and will do the following:

Randomly sample monthly/daily revenue meter, availability, and curtailment data based on specified uncertainties and correlations

Randomly choose one reanalysis product

Calculate gross energy from randomized energy data

Normalize gross energy to 30-day months

Filter results to remove months/days with NaN data and with combined losses that exceed the Monte Carlo sampled max threshold

Return the wind speed and normalized gross energy to be used in the regression relationship

Parameters: n (int) – The Monte Carlo iteration number
Returns: Monte-Carlo sampled wind speeds and other variables (temperature, wind direction) if used in the regression (pandas.Series); Monte-Carlo sampled normalized gross energy (pandas.Series)
Return type: pandas.Series
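
The gross energy, normalization and loss-filter steps for the monthly case can be sketched as below. The function names are hypothetical, and the sketch assumes that gross energy is net energy plus availability and curtailment losses, all in the same energy units.

```python
import calendar


def normalized_gross_energy(year, month, net, availability, curtailment):
    """Gross energy for one month, rescaled to a 30-day month."""
    gross = net + availability + curtailment  # assumed definition of gross energy
    days_in_month = calendar.monthrange(year, month)[1]
    return gross * 30.0 / days_in_month


def keep_month(net, availability, curtailment, loss_max):
    """True if the combined loss fraction stays below the sampled maximum."""
    gross = net + availability + curtailment
    return (availability + curtailment) / gross < loss_max


# Hypothetical February 2021 (28 days): 26 GWh net, 1.5 GWh and 0.5 GWh of losses.
print(normalized_gross_energy(2021, 2, 26.0, 1.5, 0.5))
print(keep_month(26.0, 1.5, 0.5, loss_max=0.15))
# A month that lost 40% of its gross energy is dropped.
print(keep_month(6.0, 3.0, 1.0, loss_max=0.15))
```

Normalizing to 30 days removes the effect of month length, so that February and March values are comparable in the regression.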

setup_monte_carlo_inputs()[source]

Create and populate the data frame defining the simulation parameters. This data frame is stored as self._inputs

Parameters: (None)
Returns: (None)

trim_monthly_df(*args, **kwargs)

Remove first and/or last month of data if the raw data had an incomplete number of days

Parameters: (None)
Returns: (None)

operational_analysis.methods.plant_analysis.get_annual_values(data)[source]

This function returns annual summations of values in a pandas Series (or each column of a pandas DataFrame) with a DatetimeIndex index starting from the first row. The purpose of the function is to correctly resample to annual values when the first index does not fall on the beginning of the month.

Parameters: data (pandas.Series or pandas.DataFrame) – Input data with a DatetimeIndex index.
Returns: Array containing annual summations for each column of the input data.
Return type: numpy.ndarray
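
One plausible reading of "annual summations starting from the first row" is sketched below on a plain list. It assumes monthly rows and sums consecutive 12-row blocks, dropping an incomplete final block; the function name is hypothetical and the actual function operates on pandas objects.

```python
def annual_sums(monthly_values, months_per_year=12):
    """Sum consecutive 12-month blocks, counting from the first row."""
    n_years = len(monthly_values) // months_per_year
    return [
        sum(monthly_values[i * months_per_year:(i + 1) * months_per_year])
        for i in range(n_years)
    ]


# Thirty hypothetical monthly values: two full years plus six trailing months.
values = list(range(1, 31))
print(annual_sums(values))  # [78, 222]
```

Because the blocks are counted from the first row rather than aligned to calendar years, a record starting in, say, April yields April-to-March totals.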

Turbine Level Analysis

class operational_analysis.methods.turbine_long_term_gross_energy.TurbineLongTermGrossEnergy(*args, **kwargs)[source]

Bases: object

A serial (Pandas-driven) implementation of calculating long-term gross energy for each turbine in a wind farm. This module collects standard processing and analysis methods for estimating this metric.

The method proceeds as follows:

Filter turbine data for normal operation

Calculate daily means of wind speed, wind direction, and air density from reanalysis products

Calculate daily sums of energy from each turbine

Fit daily data (features are atmospheric variables, response is turbine power) using a generalized additive model (GAM)

Apply model results to long-term atmospheric variables to calculate long term gross energy for each turbine

A Monte Carlo approach repeats the procedure multiple times to obtain a distribution of results, from which the uncertainty of the long-term gross energy estimate is derived.

The end result is a table of long-term gross energy values for each turbine in the wind farm. Note that this gross energy metric does not back out losses associated with waking or turbine performance. Rather, gross energy in this context is what each turbine would have produced under normal operation (i.e. excluding downtime and underperformance).

Required schema of PlantData:

_scada_freq

reanalysis products [‘merra2’, ‘erai’, ‘ncep2’] with columns [‘time’, ‘u_ms’, ‘v_ms’, ‘windspeed_ms’, ‘rho_kgm-3’]

scada with columns: [‘time’, ‘id’, ‘wmet_wdspd_avg’, ‘wtur_W_avg’, ‘energy_kwh’]

Initialize turbine long-term gross energy analysis with data and parameters.

Parameters

plant (PlantData object) – PlantData object from which TurbineLongTermGrossEnergy should draw data.
UQ (bool) – whether to perform (True) or not (False) uncertainty quantification
num_sim (int) – number of Monte Carlo simulations. Note that this analysis is computationally heavy, so the default num_sim value has been adjusted accordingly.

apply_model_to_lt(n)[source]

Apply model result to the long-term reanalysis data to calculate long-term gross energy for each turbine.

Parameters: n (int) – The Monte Carlo iteration number
Returns: (None)

filter_sum_impute_scada()[source]

Filter SCADA data for unflagged data, gather SCADA energy data into daily sums, and correct daily summed energy based on the amount of missing data and a threshold limit. Finally, impute missing data for each turbine based on reported energy data from other highly correlated turbines.

Parameters: n (int) – The Monte Carlo iteration number
Returns: (None)
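
The correction of a daily sum for missing samples can be sketched as below. The function name is hypothetical, and returning NaN for days below the threshold (so that they can be imputed later) is an assumption of this sketch rather than documented behavior.

```python
import math


def corrected_daily_energy(energy_sum, n_reported, n_expected, correction_threshold=0.90):
    """Scale a daily sum up for missing samples, or return NaN if the day is too sparse."""
    fraction = n_reported / n_expected
    if fraction < correction_threshold:
        return math.nan  # assumption: leave sparse days to the imputation step
    return energy_sum / fraction


# Hypothetical 10-minute SCADA data: 144 samples are expected per day.
print(corrected_daily_energy(6800.0, n_reported=136, n_expected=144))
print(corrected_daily_energy(5000.0, n_reported=100, n_expected=144))
```

In the first call about 94% of the samples are present, so the sum is scaled up to roughly 7200; in the second only about 69% are present and the day is not corrected.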

filter_turbine_data()[source]

Apply a set of filtering algorithms to the turbine wind speed vs power curve to flag data not representative of normal turbine operation

Parameters: n (int) – The Monte Carlo iteration number
Returns: (None)

fit_model()[source]

Fit the daily turbine energy sum and atmospheric variable averages using a GAM model

Parameters: n (int) – The Monte Carlo iteration number
Returns: (None)

plot_daily_fitting_result(save_folder, output_to_terminal=False)[source]

Plot the raw and flagged power curve data and save to file.

Parameters

save_folder (str) – The pathname to where figure files should be saved
output_to_terminal (boolean) – Indicate whether or not to output figures to terminal

Returns

(None)

plot_filtered_power_curves(save_folder, output_to_terminal=False)[source]

Plot the raw and flagged power curve data and save to file.

Parameters

save_folder (str) – The pathname to where figure files should be saved
output_to_terminal (boolean) – Indicate whether or not to output figures to terminal

Returns

(None)

run(*args, **kwargs)

Perform pre-processing of data into an internal representation for which the analysis can run more quickly.

Parameters

reanal_subset (list) – Which reanalysis products to use for long-term correction
uncertainty_scada (float) – uncertainty imposed on SCADA data (used in UQ = True case only)
max_power_filter (tuple) – Maximum power threshold (fraction) to which the bin filter should be applied (default 0.85). This should be a tuple in the UQ = True case, a single value when UQ = False.
wind_bin_thresh (tuple) – The filter threshold for each bin (default is 2 m/s). This should be a tuple in the UQ = True case, a single value when UQ = False.
correction_threshold (tuple) – The threshold (fraction) above which daily SCADA energy data should be corrected (default is 0.90). This should be a tuple in the UQ = True case, a single value when UQ = False.
enable_plotting (boolean) – Indicate whether to output plots
plot_dir (string) – Location to save figures

Returns

(None)
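
The "tuple with UQ, single value without" convention for the filter parameters can be pictured with the sketch below. The helper draw_parameter is hypothetical, and sampling uniformly between the two bounds is an assumption; the documentation above does not state which distribution is used.

```python
import random


def draw_parameter(value, uq, rng):
    """Return `value` unchanged without UQ; otherwise sample between (low, high)."""
    if not uq:
        return float(value)
    low, high = value
    return rng.uniform(low, high)  # assumption: uniform sampling between the bounds


rng = random.Random(0)

# Without UQ a single value is passed and used as-is.
print(draw_parameter(0.85, uq=False, rng=rng))

# With UQ a (low, high) tuple is passed and one value is drawn per simulation.
samples = [draw_parameter((0.80, 0.90), uq=True, rng=rng) for _ in range(5)]
print(samples)
```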

setup_daily_reanalysis_data()[source]

Process reanalysis data to daily means for later use in the GAM model

Parameters: (None)
Returns: (None)

setup_inputs()[source]

Create and populate the data frame defining the simulation parameters. This data frame is stored as self._inputs

Parameters: (None)
Returns: (None)

setup_model_dict()[source]

Setup daily atmospheric variable averages and daily turbine energy sums for use in the GAM model

Parameters: (None)
Returns: (None)

sort_scada_by_turbine()[source]

Take raw SCADA data in plant object and sort into a dictionary using turbine IDs.

Parameters: (None)
Returns: (None)

Electrical Losses Analysis

class operational_analysis.methods.electrical_losses.ElectricalLosses(*args, **kwargs)[source]

Bases: object

A serial (Pandas-driven) implementation of calculating the average monthly and annual electrical losses at a wind plant, and their uncertainty. Energy output from the turbine SCADA meter and the wind plant revenue meter are used to estimate electrical losses.

The approach is to first calculate daily sums of turbine and revenue meter energy over the plant period of record. Only those days where all turbines and the revenue meter were reporting for all timesteps are considered. Electrical loss is then the difference in total turbine energy production and meter production over those concurrent days.

A Monte Carlo approach is applied to sample revenue meter data and SCADA data with a 0.5% imposed uncertainty, and one filtering parameter is sampled too. The uncertainty in estimated electrical losses is quantified as standard deviation of the distribution of losses obtained from the MC sampling.

In the case that meter data is not provided on a daily or sub-daily basis (e.g. monthly), a different approach is implemented. The sum of daily turbine energy is corrected for any missing reported energy data from the turbines based on the ratio of expected number of data counts per day to the actual. Daily corrected sum of turbine energy is then summed on a monthly basis. Electrical loss is then the difference between total corrected turbine energy production and meter production over those concurrent months.
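
For the daily case, the calculation described above can be sketched as follows. The function names and data are hypothetical, the loss is expressed as a fraction of turbine energy (an assumption), and the sketch applies one Gaussian scale factor per source and iteration without sampling the filtering parameter.

```python
import random
import statistics


def electrical_loss(turbine_daily, meter_daily):
    """Fractional loss over the days on which both sources report."""
    days = turbine_daily.keys() & meter_daily.keys()
    turbine_total = sum(turbine_daily[d] for d in days)
    meter_total = sum(meter_daily[d] for d in days)
    return 1.0 - meter_total / turbine_total


def monte_carlo_loss(turbine_daily, meter_daily, uncertainty=0.005, num_sim=1000, seed=0):
    """Mean and standard deviation of the loss under sampled measurement error."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_sim):
        t_scale = rng.gauss(1.0, uncertainty)  # sampled SCADA scale factor
        m_scale = rng.gauss(1.0, uncertainty)  # sampled revenue meter scale factor
        turbine = {d: e * t_scale for d, e in turbine_daily.items()}
        meter = {d: e * m_scale for d, e in meter_daily.items()}
        samples.append(electrical_loss(turbine, meter))
    return statistics.fmean(samples), statistics.stdev(samples)


# Hypothetical daily energy sums (MWh); only d1 and d2 are reported by both sources.
turbine_daily = {"d1": 1000.0, "d2": 1200.0, "d3": 900.0}
meter_daily = {"d1": 980.0, "d2": 1176.0, "d4": 500.0}

loss = electrical_loss(turbine_daily, meter_daily)
mean_loss, std_loss = monte_carlo_loss(turbine_daily, meter_daily)
print(loss, mean_loss, std_loss)
```

With these made-up values the deterministic loss is about 2%, and the standard deviation of the sampled losses serves as its uncertainty.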

Initialize electrical losses class with input parameters

Parameters

plant (PlantData object) – PlantData object from which ElectricalLosses should draw data.
num_sim (int) – number of Monte Carlo simulations
UQ (bool) – whether to perform (True) or not (False) uncertainty quantification

calculate_electrical_losses(*args, **kwargs)

Apply Monte Carlo approach to calculate electrical losses and their uncertainty based on the difference in the sum of turbine and metered energy over the compiled days.

Parameters: (None)
Returns: (None)

process_meter(*args, **kwargs)

Calculate daily sum of meter energy only for days when meter data is reporting at all time steps.

Parameters: (None)
Returns: (None)

process_scada(*args, **kwargs)

Calculate daily sum of turbine energy only for days when all turbines are reporting at all time steps.

Parameters: (None)
Returns: (None)

run(*args, **kwargs)

Run the electrical loss calculation in order by calling this function.

Parameters

uncertainty_meter (float) – uncertainty imposed on revenue meter data (for UQ = True case)
uncertainty_scada (float) – uncertainty imposed on SCADA data (for UQ = True case)
uncertainty_correction_thresh (tuple) – Data availability thresholds (fractions) under which months should be eliminated. This should be a tuple in the UQ = True case, a single value when UQ = False.

Returns

(None)

setup_inputs()[source]

Create and populate the data frame defining the simulation parameters. This data frame is stored as self._inputs

Parameters: (None)
Returns: (None)