Baselines

Baseline (Interface)

class meta_policy_search.baselines.Baseline[source]

Reward baseline interface

fit(paths)[source]

Fits the baseline model with the provided paths

Parameters: paths (list) – list of paths

get_param_values()[source]

Returns the parameter values of the baseline object

log_diagnostics(paths, prefix)[source]

Log extra information per iteration based on the collected paths

predict(path)[source]

Predicts the reward baseline for a provided trajectory / path

Parameters: path – dict of lists / numpy arrays containing trajectory / path information such as “observations”, “rewards”, …

Returns: numpy array of the same length as path[“observations”] specifying the reward baseline
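
For illustration, a path dict might look like the following. This is a minimal sketch; the exact keys and shapes are determined by the sampler and sample processor, not by this interface:

    import numpy as np

    # Hypothetical path dict (keys and shapes are assumptions for illustration)
    path = {
        "observations": np.zeros((100, 8)),  # (T, obs_dim)
        "actions":      np.zeros((100, 2)),  # (T, act_dim)
        "rewards":      np.zeros(100),       # (T,)
        "returns":      np.zeros(100),       # (T,) discounted returns, filled in by the sample processor
    }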

set_params(value)[source]

Sets the parameter values of the baseline object

Parameters: value – parameter value to be set
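
To make the interface concrete, here is a minimal sketch of a trivial subclass that always predicts a zero baseline. It is illustrative only and not part of the library; log_diagnostics is inherited from Baseline:

    import numpy as np
    from meta_policy_search.baselines import Baseline

    class ZeroBaseline(Baseline):
        """Illustrative baseline that predicts zero for every time step."""

        def fit(self, paths):
            pass  # nothing to fit for a constant-zero baseline

        def predict(self, path):
            # one baseline value per time step in the trajectory
            return np.zeros(len(path["observations"]))

        def get_param_values(self):
            return np.array([])  # no parameters

        def set_params(self, value):
            pass  # no parameters to set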

Linear Feature Baseline

class meta_policy_search.baselines.LinearFeatureBaseline(reg_coeff=1e-05)[source]

Linear (polynomial) time- and state-dependent return baseline model (see Duan et al. 2016, “Benchmarking Deep Reinforcement Learning for Continuous Control”, ICML)

Fits the following linear model

reward = b0 + b1*obs + b2*obs^2 + b3*t + b4*t^2 + b5*t^3

Parameters: reg_coeff (float) – regularization coefficient for the damped least squares fit

fit(paths, target_key='returns')

Fits the linear baseline model with the provided paths via damped least squares

Parameters:
  • paths (list) – list of paths
  • target_key (str) – key in the path dictionary of the target to fit (e.g. “returns”)
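
For intuition, a damped least squares fit of this kind can be sketched as follows. The feature construction mirrors the model equation above; the helper name _features, the observation clipping, and the time scaling are assumptions for illustration, not the library's exact internals:

    import numpy as np

    def _features(path):
        # polynomial time-state features matching the model equation:
        # [obs, obs^2, t, t^2, t^3, 1]
        obs = np.clip(path["observations"], -10, 10)
        t = np.arange(len(obs)).reshape(-1, 1) / 100.0
        return np.concatenate([obs, obs**2, t, t**2, t**3, np.ones_like(t)], axis=1)

    def fit(paths, reg_coeff=1e-5, target_key="returns"):
        feat = np.concatenate([_features(p) for p in paths])
        target = np.concatenate([p[target_key] for p in paths])
        # damped least squares: solve (F^T F + lambda*I) w = F^T y,
        # increasing the damping until the solution is finite
        for _ in range(5):
            coeffs = np.linalg.lstsq(
                feat.T @ feat + reg_coeff * np.eye(feat.shape[1]),
                feat.T @ target,
                rcond=None,
            )[0]
            if not np.any(np.isnan(coeffs)):
                break
            reg_coeff *= 10
        return coeffs
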
get_param_values(**tags)

Returns the parameter values of the baseline object

Returns: numpy array of linear_regression coefficients

log_diagnostics(paths, prefix)

Log extra information per iteration based on the collected paths

predict(path)

Shared implementation for LinearFeatureBaseline and LinearTimeBaseline: predicts the linear reward baseline estimates for a provided trajectory / path. If the baseline has not been fitted yet, a zero baseline is returned.

Parameters: path (dict) – dict of lists / numpy arrays containing trajectory / path information such as “observations”, “rewards”, …
Returns: numpy array of the same length as path[“observations”] specifying the reward baseline
Return type: np.ndarray
set_params(value, **tags)

Sets the parameter values of the baseline object

Parameters: value – numpy array of linear_regression coefficients
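
A typical usage pattern, assuming paths that already carry discounted “returns” (the data here is random and purely illustrative):

    import numpy as np
    from meta_policy_search.baselines import LinearFeatureBaseline

    # hypothetical paths with precomputed discounted returns
    paths = [
        {"observations": np.random.randn(100, 8), "returns": np.random.randn(100)}
        for _ in range(10)
    ]

    baseline = LinearFeatureBaseline(reg_coeff=1e-5)
    baseline.fit(paths, target_key="returns")

    values = baseline.predict(paths[0])  # one baseline value per time step
    assert values.shape == (100,)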

Linear Time Baseline

class meta_policy_search.baselines.LinearTimeBaseline(reg_coeff=1e-05)[source]

Linear (polynomial) time-dependent reward baseline model

Fits the following linear model

reward = b0 + b3*t + b4*t^2 + b5*t^3

Parameters: reg_coeff (float) – regularization coefficient for the damped least squares fit

fit(paths, target_key='returns')

Fits the linear baseline model with the provided paths via damped least squares

Parameters:
  • paths (list) – list of paths
  • target_key (str) – key in the path dictionary of the target to fit (e.g. “returns”)

get_param_values(**tags)

Returns the parameter values of the baseline object

Returns: numpy array of linear_regression coefficients

log_diagnostics(paths, prefix)

Log extra information per iteration based on the collected paths

predict(path)

Shared implementation for LinearFeatureBaseline and LinearTimeBaseline: predicts the linear reward baseline estimates for a provided trajectory / path. If the baseline has not been fitted yet, a zero baseline is returned.

Parameters: path (dict) – dict of lists / numpy arrays containing trajectory / path information such as “observations”, “rewards”, …
Returns: numpy array of the same length as path[“observations”] specifying the reward baseline
Return type: np.ndarray
set_params(value, **tags)

Sets the parameter values of the baseline object

Parameters: value – numpy array of linear_regression coefficients
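
Usage mirrors LinearFeatureBaseline; since the features depend only on the time step, the observations merely determine the trajectory length. A common downstream use is subtracting the predicted baseline from the returns to obtain advantage estimates (sketch with random, purely illustrative data):

    import numpy as np
    from meta_policy_search.baselines import LinearTimeBaseline

    paths = [
        {"observations": np.random.randn(100, 8), "returns": np.random.randn(100)}
        for _ in range(10)
    ]

    baseline = LinearTimeBaseline(reg_coeff=1e-5)
    baseline.fit(paths, target_key="returns")

    # subtract the baseline from the returns to get (unnormalized) advantages
    advantages = paths[0]["returns"] - baseline.predict(paths[0])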