Analysis Methods
Analysis Methods are a collection of modules and classes to accomplish high-level analysis of plant data. They integrate the more abstract toolkit modules with specific PlantData classes to create a reproducible process. Analysis modules may respect the engine parameter of PlantData objects to support multi-platform, scalable analysis.
Plant Level Analysis
- class operational_analysis.methods.plant_analysis.MonteCarloAEP(*args, **kwargs)
Bases: object
A serial (Pandas-driven) implementation of the benchmark PRUF operational analysis method. This module collects standard processing and analysis methods for estimating plant level operational AEP and uncertainty.
- The preprocessing should run in this order:
Process revenue meter energy - creates monthly/daily data frame, gets revenue meter on monthly/daily basis, and adds data flag
Process loss estimates - add monthly/daily curtailment and availability losses to monthly/daily data frame
Process reanalysis data - add monthly/daily density-corrected wind speeds, temperature (if used) and wind direction (if used) from several reanalysis products to the monthly/daily data frame
Set up Monte Carlo - create the necessary Monte Carlo inputs to the OA process
Run AEP Monte Carlo - run the OA process iteratively to get distribution of AEP results
The end result is a distribution of AEP results, which we use to assess expected AEP and its associated uncertainty.
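The processing steps above can be condensed into a stdlib-only sketch of the Monte Carlo idea: perturb the metered energy, refit the energy/wind speed relationship, and apply it to long-term wind speeds. This is an illustration under stated assumptions, not the MonteCarloAEP code: it uses ordinary least squares rather than the robust regression the method describes, a single reanalysis product, only a revenue meter uncertainty, and no loss terms. The function names are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def monte_carlo_annual_gross(ws, gross, lt_ws_monthly,
                             uncertainty_meter=0.005, num_sim=1000, seed=0):
    """Return one annual gross energy estimate per Monte Carlo iteration.

    ws, gross: concurrent monthly wind speeds and gross energies (period of record)
    lt_ws_monthly: 12 long-term average monthly wind speeds
    """
    rng = random.Random(seed)
    results = []
    for _ in range(num_sim):
        # Sample one revenue meter error and apply it to every month
        factor = rng.gauss(1.0, uncertainty_meter)
        y = [g * factor for g in gross]
        slope, intercept = fit_line(ws, y)
        # Apply the fitted relationship to the long-term monthly wind speeds
        results.append(sum(slope * w + intercept for w in lt_ws_monthly))
    return results
```

The spread of the returned values (for example `statistics.stdev`) plays the role of the AEP uncertainty; the method itself additionally samples reanalysis products, windiness periods and loss thresholds.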
Initialize MonteCarloAEP analysis with data and parameters.
- Parameters
plant (PlantData object) – PlantData object from which PlantAnalysis should draw data.
reanal_products (list) – List of reanalysis products to use for Monte Carlo sampling. Defaults to [“merra2”, “ncep2”, “erai”].
uncertainty_meter (float) – uncertainty on revenue meter data
uncertainty_losses (float) – uncertainty on long-term losses
uncertainty_windiness (tuple) – range for the number of years to use for the windiness correction
uncertainty_loss_max (tuple) – range for the combined availability and curtailment monthly loss threshold
uncertainty_nan_energy (float) – threshold to flag days/months based on NaNs
time_resolution (string) – whether to perform the AEP calculation at monthly (‘M’), daily (‘D’) or hourly (‘H’) time resolution
reg_model (string) – which model to use for the regression (‘lin’ for linear, ‘gam’, ‘gbm’, ‘etr’). At monthly time resolution only linear regression is allowed because of the reduced number of data points.
ml_setup_kwargs (kwargs) – keyword arguments to MachineLearningSetup class
reg_temperature (bool) – whether to include temperature (True) or not (False) as regression input
reg_winddirection (bool) – whether to include wind direction (True) or not (False) as regression input
- calculate_aggregate_dataframe(*args, **kwargs)
Perform pre-processing of the plant data to produce a monthly/daily data frame to be used in AEP analysis.
- Parameters
(None)
- Returns
(None)
- calculate_long_term_losses(*args, **kwargs)
This function calculates long-term availability and curtailment losses based on the reported data grouped by the time resolution, filtering for those data that are deemed representative of average plant performance.
- Parameters
(None)
- Returns
(None)
- filter_outliers(*args, **kwargs)
This function filters outliers based on a combination of range filter, unresponsive sensor filter, and window filter. We use a memoized function to store the regression data in a dictionary for each combination as it comes up in the Monte Carlo simulation. This saves significant computational time because robust linear regression does not have to be run for each Monte Carlo iteration.
- Parameters
n (int) – Monte Carlo iteration
- Returns
Filtered monthly/daily data ready for linear regression
- Return type
pandas.DataFrame
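The memoization described above can be illustrated with the standard library's `functools.lru_cache`, which caches results in a dictionary keyed by the call arguments. In this sketch the filter is a simple range filter on hypothetical data, and the two arguments stand in for Monte Carlo-sampled filter settings; it is not the OpenOA implementation.

```python
from functools import lru_cache

# Hypothetical monthly records: (wind_speed_ms, gross_energy_mwh)
DATA = ((5.0, 900.0), (6.0, 1100.0), (6.5, 50.0), (7.0, 1400.0))
evaluations = {"count": 0}


@lru_cache(maxsize=None)
def filter_outliers(min_energy, max_wind_speed):
    """Range-filter DATA for one combination of sampled filter settings.

    lru_cache keeps results in a dictionary keyed by the arguments, so a
    combination that reappears in a later Monte Carlo iteration is not recomputed.
    """
    evaluations["count"] += 1  # incremented only on a cache miss
    return tuple(r for r in DATA
                 if r[1] >= min_energy and r[0] <= max_wind_speed)
```

Cached arguments must be hashable, which is why the sketch passes scalar settings rather than data frames.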
- groupby_time_res(*args, **kwargs)
Group pandas dataframe based on the time resolution chosen in the calculation.
- Parameters
df (dataframe) – dataframe that needs to be grouped based on time resolution used
- Returns
None
- plot_aep_boxplot(param, lab)
Plot box plots of AEP results sliced by a specified Monte Carlo parameter
- Parameters
param (list) – The Monte Carlo parameter on which to split the AEP results
lab (str) – The name to use for the parameter when producing the figure
- Returns
(None)
- plot_aggregate_plant_data_timeseries()
Plot timeseries of monthly/daily gross energy, availability and curtailment
- Returns
matplotlib.pyplot object
- plot_reanalysis_gross_energy_data(outlier_thres)
Make a plot of normalized 30-day gross energy vs wind speed for each reanalysis product, including the R² measure.
- Parameters
outlier_thres (float) – outlier threshold (typical range of 1 to 4) which adjusts the sensitivity of outlier detection
- Returns
matplotlib.pyplot object
- plot_reanalysis_normalized_rolling_monthly_windspeed()
Make a plot of annual average wind speeds from reanalysis data to show general trends for each reanalysis product, highlighting the period of record for plant data.
- Returns
matplotlib.pyplot object
- plot_result_aep_distributions()
Plot a distribution of AEP values from the Monte-Carlo OA method
- Returns
matplotlib.pyplot object
- process_loss_estimates(*args, **kwargs)
Append availability and curtailment losses to the monthly/daily data frame
- Parameters
(None)
- Returns
(None)
- process_reanalysis_data(*args, **kwargs)
- Process reanalysis data for use in PRUF plant analysis:
calculate density-corrected wind speed and wind components
get monthly/daily average wind speeds and components
calculate monthly/daily average wind direction
calculate monthly/daily average temperature
append monthly/daily averages to monthly/daily energy data frame
- Parameters
(None)
- Returns
(None)
- process_revenue_meter_energy(*args, **kwargs)
- Initial creation of monthly data frame:
Populate monthly/daily data frame with energy data summed from 10-min QC’d data
For each monthly/daily value, find percentage of NaN data used in creating it and flag if percentage is greater than 0
- Parameters
(None)
- Returns
(None)
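A minimal stdlib sketch of the NaN flagging step, assuming every expected 10-minute timestamp is present with NaN marking a missing value (so the NaN percentage can be computed from the records alone). The function name and record layout are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def monthly_energy_with_flags(records):
    """Sum 10-minute energy into monthly totals and flag months containing NaNs.

    records: iterable of (timestamp, energy_kwh) with NaN for missing values.
    Returns {(year, month): (energy_sum, nan_fraction, flagged)}.
    """
    groups = defaultdict(list)
    for ts, value in records:
        groups[(ts.year, ts.month)].append(value)
    summary = {}
    for key, values in sorted(groups.items()):
        n_nan = sum(1 for v in values if math.isnan(v))
        total = sum(v for v in values if not math.isnan(v))
        nan_fraction = n_nan / len(values)
        summary[key] = (total, nan_fraction, nan_fraction > 0)
    return summary


# Usage: two months of hypothetical 10-minute data
sample = [
    (datetime(2020, 1, 31, 23, 40), 1.0),
    (datetime(2020, 1, 31, 23, 50), float("nan")),
    (datetime(2020, 2, 1, 0, 0), 2.0),
    (datetime(2020, 2, 1, 0, 10), 3.0),
]
summary = monthly_energy_with_flags(sample)
```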
- run(*args, **kwargs)
Perform pre-processing of data into an internal representation for which the analysis can run more quickly.
- Parameters
reanal_subset (list) – list of str data indicating which reanalysis products to use in OA
num_sim (int) – number of simulations to perform
- Returns
None
- run_AEP_monte_carlo(*args, **kwargs)
Loop through OA process a number of times and return array of AEP results each time
- Returns
numpy.ndarray – Array of AEP, long-term availability, and long-term curtailment calculations
- run_regression(*args, **kwargs)
Run robust linear regression between Monte-Carlo generated monthly/daily gross energy, wind speed, temperature and wind direction (if used)
- Parameters
n (int) – The Monte Carlo iteration number
- Returns
trained regression model
- Return type
?
- sample_long_term_losses(*args, **kwargs)
This function calculates long-term availability and curtailment losses based on the Monte Carlo sampled historical availability and curtailment data. To estimate long-term losses, average percentage monthly losses are weighted by monthly long-term gross energy.
- Parameters
gross_lt (pandas.Series) – Time series of long-term gross energy
- Returns
float: long-term availability loss expressed as fraction
float: long-term curtailment loss expressed as fraction
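The gross-energy weighting can be sketched as follows for a single loss type (availability or curtailment); the same function would be called once for each. Grouping by calendar month and the dictionary inputs are assumptions made for a self-contained example, and the function name is hypothetical.

```python
import statistics


def weighted_long_term_loss(monthly_loss_frac, monthly_gross_lt):
    """Average historical loss fractions per calendar month, then weight the
    monthly averages by long-term monthly gross energy.

    monthly_loss_frac: {calendar_month: [historical loss fractions]}
    monthly_gross_lt: {calendar_month: long-term gross energy}
    """
    total_gross = sum(monthly_gross_lt.values())
    weighted = sum(statistics.fmean(monthly_loss_frac[m]) * gross
                   for m, gross in monthly_gross_lt.items())
    return weighted / total_gross
```

For example, an average loss of 3% in a month carrying three quarters of the long-term gross energy and 1% in the remaining month give a long-term loss of 2.5%, rather than the unweighted 2%.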
- sample_long_term_reanalysis(*args, **kwargs)
This function returns the long-term monthly/daily wind speeds based on the Monte-Carlo generated sample of:
The reanalysis product
The number of years to use in the long-term correction
- Parameters
(None)
- Returns
the windiness-corrected or ‘long-term’ monthly/daily wind speeds
- Return type
pandas.DataFrame
- set_regression_data(*args, **kwargs)
This will be called for each iteration of the Monte Carlo simulation and will do the following:
Randomly sample monthly/daily revenue meter, availability, and curtailment data based on specified uncertainties and correlations
Randomly choose one reanalysis product
Calculate gross energy from randomized energy data
Normalize gross energy to 30-day months
Filter results to remove months/days with NaN data and with combined losses that exceed the Monte Carlo sampled max threshold
Return the wind speed and normalized gross energy to be used in the regression relationship
- Parameters
n (int) – The Monte Carlo iteration number
- Returns
pandas.Series: Monte-Carlo sampled wind speeds and other variables (temperature, wind direction) if used in the regression
pandas.Series: Monte-Carlo sampled normalized gross energy
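The gross energy, normalization and filtering steps can be sketched as below for monthly data. Assumptions: gross energy is taken as metered energy plus availability and curtailment losses expressed in energy units, and only the revenue meter is perturbed (the method also samples the loss data and chooses a reanalysis product). The key names and the function are hypothetical.

```python
import calendar
import math
import random


def regression_inputs(records, loss_max, uncertainty_meter, rng):
    """One Monte Carlo draw of (wind speed, 30-day-normalized gross energy).

    records: list of dicts with hypothetical keys year, month, ws,
             meter_kwh, avail_kwh and curt_kwh (losses given as energy).
    loss_max: sampled maximum combined loss fraction for this iteration.
    """
    meter_factor = rng.gauss(1.0, uncertainty_meter)  # one meter error per draw
    ws_out, gross_out = [], []
    for r in records:
        values = (r["ws"], r["meter_kwh"], r["avail_kwh"], r["curt_kwh"])
        if any(math.isnan(v) for v in values):
            continue  # drop months with missing data
        gross = r["meter_kwh"] * meter_factor + r["avail_kwh"] + r["curt_kwh"]
        if gross <= 0 or (r["avail_kwh"] + r["curt_kwh"]) / gross > loss_max:
            continue  # drop months whose combined losses exceed the threshold
        days = calendar.monthrange(r["year"], r["month"])[1]
        ws_out.append(r["ws"])
        gross_out.append(gross * 30 / days)  # normalize to a 30-day month
    return ws_out, gross_out
```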
- setup_monte_carlo_inputs()
Create and populate the data frame defining the simulation parameters. This data frame is stored as self._inputs
- Parameters
(None)
- Returns
(None)
- trim_monthly_df(*args, **kwargs)
Remove first and/or last month of data if the raw data had an incomplete number of days
- Parameters
(None)
- Returns
(None)
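A stdlib sketch of this trimming rule, assuming the monthly values are held in a chronologically ordered dictionary and that the first (last) month counts as complete when the raw data start on its first day (end on its last day). The names are hypothetical.

```python
import calendar
from datetime import date


def trim_incomplete_months(monthly, first_day, last_day):
    """Drop the first/last monthly entries if raw data did not cover the full month.

    monthly: dict (year, month) -> value, in chronological order.
    first_day, last_day: first and last dates present in the raw data.
    """
    keys = list(monthly)
    if first_day.day != 1:
        keys = keys[1:]  # raw data start mid-month
    if last_day.day != calendar.monthrange(last_day.year, last_day.month)[1]:
        keys = keys[:-1]  # raw data end before the month's last day
    return {k: monthly[k] for k in keys}


# Usage: raw data start on 15 January, so January is dropped
example = trim_incomplete_months(
    {(2020, 1): 1.0, (2020, 2): 2.0, (2020, 3): 3.0},
    first_day=date(2020, 1, 15),
    last_day=date(2020, 3, 31),
)
```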
- operational_analysis.methods.plant_analysis.get_annual_values(data)
This function returns annual summations of values in a pandas Series (or each column of a pandas DataFrame) with a DatetimeIndex index starting from the first row. The purpose of the function is to correctly resample to annual values when the first index does not fall on the beginning of the month.
- Parameters
data (pandas.Series or pandas.DataFrame) – Input data with a DatetimeIndex index.
- Returns
Array containing annual summations for each column of the input data.
- Return type
numpy.ndarray
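To illustrate why the summation starts from the first row rather than at calendar-year boundaries, the sketch below sums consecutive 12-value blocks of a monthly series. It assumes monthly input, drops any trailing partial year, and works on a plain list rather than a pandas object; it is not the library's implementation, and the function name is hypothetical.

```python
def annual_sums(monthly_values):
    """Sum consecutive 12-month blocks, starting from the first value.

    Unlike grouping by calendar year, each block covers 12 months even when
    the series starts mid-year. A trailing partial year is dropped (assumption).
    """
    n_years = len(monthly_values) // 12
    return [sum(monthly_values[12 * i:12 * (i + 1)]) for i in range(n_years)]
```

A series starting in July would, under calendar-year grouping, yield a half-year first sum; with 12-value blocks every sum spans a full year.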
Turbine Level Analysis
- class operational_analysis.methods.turbine_long_term_gross_energy.TurbineLongTermGrossEnergy(*args, **kwargs)
Bases: object
A serial (Pandas-driven) implementation of calculating long-term gross energy for each turbine in a wind farm. This module collects standard processing and analysis methods for estimating this metric.
The method proceeds as follows:
Filter turbine data for normal operation
Calculate daily means of wind speed, wind direction, and air density from reanalysis products
Calculate daily sums of energy from each turbine
Fit daily data (features are atmospheric variables, response is turbine power) using a generalized additive model (GAM)
Apply model results to long-term atmospheric variables to calculate long-term gross energy for each turbine
A Monte Carlo approach is implemented to repeat the procedure multiple times to get a distribution of results, from which the uncertainty of the long-term gross energy estimate is derived.
The end result is a table of long-term gross energy values for each turbine in the wind farm. Note that this gross energy metric does not back out losses associated with waking or turbine performance. Rather, gross energy in this context is what the turbine would have produced under normal operation (i.e. excluding downtime and underperformance).
Required schema of PlantData:
_scada_freq
reanalysis products [‘merra2’, ‘erai’, ‘ncep2’] with columns [‘time’, ‘u_ms’, ‘v_ms’, ‘windspeed_ms’, ‘rho_kgm-3’]
scada with columns: [‘time’, ‘id’, ‘wmet_wdspd_avg’, ‘wtur_W_avg’, ‘energy_kwh’]
Initialize turbine long-term gross energy analysis with data and parameters.
- Parameters
plant (PlantData object) – PlantData object from which TurbineLongTermGrossEnergy should draw data.
UQ (bool) – choice whether to perform (True) or not (False) uncertainty quantification
num_sim (int) – number of Monte Carlo simulations. Please note that this script is somewhat computationally heavy, so the default num_sim value has been adjusted accordingly.
- apply_model_to_lt(n)
Apply model result to the long-term reanalysis data to calculate long-term gross energy for each turbine.
- Parameters
n (int) – The Monte Carlo iteration number
- Returns
(None)
- filter_sum_impute_scada()
Filter SCADA data for unflagged data, gather SCADA energy data into daily sums, and correct daily summed energy based on the amount of missing data and a threshold limit. Finally, impute missing data for each turbine based on reported energy data from other highly correlated turbines.
- Parameters
n (int) – The Monte Carlo iteration number
- Returns
(None)
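The daily correction can be sketched for one turbine and one day as follows, scaling the reported sum by the ratio of expected to actual data counts when the reported fraction reaches the threshold (compare correction_threshold under run). The default of 144 ten-minute steps per day, the use of None for missing steps, and returning None for days left to imputation are assumptions; the function name is hypothetical.

```python
def corrected_daily_energy(readings, expected_per_day=144,
                           correction_threshold=0.90):
    """Sum one turbine's energy for one day and scale it up for missing steps.

    readings: the day's reported energy values, with None for missing steps.
    Returns None when the reported fraction is below correction_threshold; in
    this sketch such days are left for imputation from correlated turbines.
    """
    reported = [r for r in readings if r is not None]
    if not reported or len(reported) / expected_per_day < correction_threshold:
        return None
    # Scale by the ratio of expected to actual data counts
    return sum(reported) * expected_per_day / len(reported)
```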
- filter_turbine_data()
Apply a set of filtering algorithms to the turbine wind speed vs power curve to flag data not representative of normal turbine operation
- Parameters
n (int) – The Monte Carlo iteration number
- Returns
(None)
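The filter thresholds listed under run (max_power_filter and wind_bin_thresh) point to a power-binned filter. The sketch below shows one common form of such a filter: points below a fraction of rated power are binned by power and flagged when their wind speed lies more than a fixed distance from the bin median. The bin width, the use of the median and the function name are assumptions, not a description of the toolkit's filter.

```python
import statistics


def bin_filter(ws, power, rated_power, max_power_filter=0.85,
               wind_bin_thresh=2.0, bin_width=100.0):
    """Flag points whose wind speed deviates from their power bin's median.

    Only points below max_power_filter * rated_power are binned; points
    above that limit are never flagged in this sketch.
    """
    limit = max_power_filter * rated_power
    bins = {}
    for i, p in enumerate(power):
        if p < limit:
            bins.setdefault(int(p // bin_width), []).append(i)
    flags = [False] * len(ws)
    for idx in bins.values():
        median_ws = statistics.median([ws[i] for i in idx])
        for i in idx:
            flags[i] = abs(ws[i] - median_ws) > wind_bin_thresh
    return flags
```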
- fit_model()
Fit the daily turbine energy sum and atmospheric variable averages using a GAM model
- Parameters
n (int) – The Monte Carlo iteration number
- Returns
(None)
- plot_daily_fitting_result(save_folder, output_to_terminal=False)
Plot the raw and flagged power curve data and save to file.
- Parameters
save_folder (str) – The pathname to where figure files should be saved
output_to_terminal (boolean) – Indicate whether or not to output figures to terminal
- Returns
(None)
- plot_filtered_power_curves(save_folder, output_to_terminal=False)
Plot the raw and flagged power curve data and save to file.
- Parameters
save_folder (str) – The pathname to where figure files should be saved
output_to_terminal (boolean) – Indicate whether or not to output figures to terminal
- Returns
(None)
- run(*args, **kwargs)
Perform pre-processing of data into an internal representation for which the analysis can run more quickly.
- Parameters
reanal_subset (list) – Which reanalysis products to use for long-term correction
uncertainty_scada (float) – uncertainty imposed on scada data (used in UQ = True case only)
max_power_filter (tuple) – Maximum power threshold (fraction) to which the bin filter should be applied (default 0.85). This should be a tuple in the UQ = True case, a single value when UQ = False.
wind_bin_thresh (tuple) – The filter threshold for each bin (default is 2 m/s). This should be a tuple in the UQ = True case, a single value when UQ = False.
correction_threshold (tuple) – The threshold (fraction) above which daily scada energy data should be corrected (default is 0.90). This should be a tuple in the UQ = True case, a single value when UQ = False.
enable_plotting (boolean) – Indicate whether to output plots
plot_dir (string) – Location to save figures
- Returns
(None)
- setup_daily_reanalysis_data()
Process reanalysis data to daily means for later use in the GAM model
- Parameters
(None)
- Returns
(None)
- setup_inputs()
Create and populate the data frame defining the simulation parameters. This data frame is stored as self._inputs
- Parameters
(None)
- Returns
(None)
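The tuple-versus-scalar convention described for run can be sketched as a parameter table with one row per simulation. Drawing each parameter uniformly between the tuple bounds is an assumption made for illustration; the function name is hypothetical, and the result is a list of dictionaries rather than the data frame stored in self._inputs.

```python
import random


def sample_simulation_inputs(params, uq, num_sim, seed=0):
    """Build one parameter dict per simulation.

    params maps a parameter name to a (low, high) tuple when uq is True,
    or to a single value when uq is False.
    """
    if not uq:
        return [dict(params)]  # a single deterministic run
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in params.items()}
        for _ in range(num_sim)
    ]
```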
Electrical Losses Analysis
- class operational_analysis.methods.electrical_losses.ElectricalLosses(*args, **kwargs)
Bases: object
A serial (Pandas-driven) implementation of calculating the average monthly and annual electrical losses at a wind plant, and their uncertainty. Energy output from the turbine SCADA meter and the wind plant revenue meter are used to estimate electrical losses.
The approach is to first calculate daily sums of turbine and revenue meter energy over the plant period of record. Only those days where all turbines and the revenue meter were reporting for all timesteps are considered. Electrical loss is then the difference in total turbine energy production and meter production over those concurrent days.
A Monte Carlo approach is applied to sample revenue meter data and SCADA data with a 0.5% imposed uncertainty, and one filtering parameter is also sampled. The uncertainty in estimated electrical losses is quantified as the standard deviation of the distribution of losses obtained from the MC sampling.
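Under simplifying assumptions, the daily approach and its Monte Carlo step read as follows: one relative error drawn from a normal distribution with the stated 0.5% standard deviation is applied to each energy total per simulation, and the filtering parameter that the method also samples is omitted. The function name and inputs are hypothetical.

```python
import random
import statistics


def electrical_losses_mc(turbine_daily, meter_daily, uncertainty=0.005,
                         num_sim=1000, seed=0):
    """Monte Carlo estimate of electrical losses over concurrent complete days.

    turbine_daily, meter_daily: daily energy sums for the same days.
    Returns the mean and standard deviation of the sampled loss fraction.
    """
    rng = random.Random(seed)
    total_turbine = sum(turbine_daily)
    total_meter = sum(meter_daily)
    losses = []
    for _ in range(num_sim):
        turbine = total_turbine * rng.gauss(1.0, uncertainty)
        meter = total_meter * rng.gauss(1.0, uncertainty)
        # Loss is the fraction of turbine energy not seen at the revenue meter
        losses.append(1.0 - meter / turbine)
    return statistics.fmean(losses), statistics.stdev(losses)
```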
In the case that meter data is not provided on a daily or sub-daily basis (e.g. monthly), a different approach is implemented. The sum of daily turbine energy is corrected for any missing reported energy data from the turbines based on the ratio of expected number of data counts per day to the actual. Daily corrected sum of turbine energy is then summed on a monthly basis. Electrical loss is then the difference between total corrected turbine energy production and meter production over those concurrent months.
Initialize electrical losses class with input parameters
- Parameters
plant (PlantData object) – PlantData object from which ElectricalLosses should draw data.
num_sim (int) – number of Monte Carlo simulations
UQ (bool) – choice whether to perform (True) or not (False) uncertainty quantification
- calculate_electrical_losses(*args, **kwargs)
Apply Monte Carlo approach to calculate electrical losses and their uncertainty based on the difference in the sum of turbine and metered energy over the compiled days.
- Parameters
(None)
- Returns
(None)
- process_meter(*args, **kwargs)
Calculate daily sum of meter energy only for days when meter data is reporting at all time steps.
- Parameters
(None)
- Returns
(None)
- process_scada(*args, **kwargs)
Calculate daily sum of turbine energy only for days when all turbines are reporting at all time steps.
- Parameters
(None)
- Returns
(None)
- run(*args, **kwargs)
Run the electrical loss calculation in order by calling this function.
- Parameters
uncertainty_meter (float) – uncertainty imposed on revenue meter data (for UQ = True case)
uncertainty_scada (float) – uncertainty imposed on scada data (for UQ = True case)
uncertainty_correction_thresh (tuple) – Data availability thresholds (fractions) under which months should be eliminated. This should be a tuple in the UQ = True case, a single value when UQ = False.
- Returns
(None)