Baselines

Baseline (Interface)

class meta_policy_search.baselines.Baseline[source]

Reward baseline interface

fit(paths)[source]

Fits the baseline model with the provided paths

Parameters: paths (list) – list of paths

get_param_values()[source]

Returns the parameter values of the baseline object

log_diagnostics(paths, prefix)[source]

Log extra information per iteration based on the collected paths

predict(path)[source]

Predicts the reward baseline for a provided trajectory / path

Parameters: path – dict of lists / numpy arrays containing trajectory / path information such as “observations”, “rewards”, …

Returns: numpy array of the same length as path[“observations”] specifying the reward baseline
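
For illustration, a path dict might look like the following. This is a minimal sketch; the exact keys and shapes are determined by the sampler and sample processor, not by this interface:

    import numpy as np

    # Hypothetical path dict (keys and shapes are assumptions for illustration)
    path = {
        "observations": np.zeros((100, 8)),  # (T, obs_dim)
        "actions":      np.zeros((100, 2)),  # (T, act_dim)
        "rewards":      np.zeros(100),       # (T,)
        "returns":      np.zeros(100),       # (T,) discounted returns, filled in by the sample processor
    }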

set_params(value)[source]

Sets the parameter values of the baseline object

Parameters: value – parameter value to be set
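
To make the interface concrete, here is a minimal sketch of a trivial subclass that always predicts a zero baseline. It is illustrative only and not part of the library; log_diagnostics is inherited from Baseline:

    import numpy as np
    from meta_policy_search.baselines import Baseline

    class ZeroBaseline(Baseline):
        """Illustrative baseline that predicts zero for every time step."""

        def fit(self, paths):
            pass  # nothing to fit for a constant-zero baseline

        def predict(self, path):
            # one baseline value per time step in the trajectory
            return np.zeros(len(path["observations"]))

        def get_param_values(self):
            return np.array([])  # no parameters

        def set_params(self, value):
            pass  # no parameters to set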

Linear Feature Baseline

class meta_policy_search.baselines.LinearFeatureBaseline(reg_coeff=1e-05)[source]

Linear (polynomial) time- and state-dependent return baseline model (see Duan et al. 2016, “Benchmarking Deep Reinforcement Learning for Continuous Control”, ICML)

Fits the following linear model

reward = b0 + b1*obs + b2*obs^2 + b3*t + b4*t^2 + b5*t^3

Parameters: reg_coeff (float) – regularization coefficient for the damped least squares fit

fit(paths, target_key='returns')

Fits the linear baseline model with the provided paths via damped least squares

Parameters:
  • paths (list) – list of paths
  • target_key (str) – key in the path dictionary of the target to fit (e.g. “returns”)
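
For intuition, a damped least squares fit of this kind can be sketched as follows. The feature construction mirrors the model equation above; the helper name _features, the observation clipping, and the time scaling are assumptions for illustration, not the library's exact internals:

    import numpy as np

    def _features(path):
        # polynomial time-state features matching the model equation:
        # [obs, obs^2, t, t^2, t^3, 1]
        obs = np.clip(path["observations"], -10, 10)
        t = np.arange(len(obs)).reshape(-1, 1) / 100.0
        return np.concatenate([obs, obs**2, t, t**2, t**3, np.ones_like(t)], axis=1)

    def fit(paths, reg_coeff=1e-5, target_key="returns"):
        feat = np.concatenate([_features(p) for p in paths])
        target = np.concatenate([p[target_key] for p in paths])
        # damped least squares: solve (F^T F + lambda*I) w = F^T y,
        # increasing the damping until the solution is finite
        for _ in range(5):
            coeffs = np.linalg.lstsq(
                feat.T @ feat + reg_coeff * np.eye(feat.shape[1]),
                feat.T @ target,
                rcond=None,
            )[0]
            if not np.any(np.isnan(coeffs)):
                break
            reg_coeff *= 10
        return coeffs
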
get_param_values(**tags)

Returns the parameter values of the baseline object

Returns: numpy array of linear_regression coefficients

log_diagnostics(paths, prefix)

Log extra information per iteration based on the collected paths

predict(path)

Shared implementation for LinearFeatureBaseline and LinearTimeBaseline: predicts the linear reward baseline estimates for a provided trajectory / path. If the baseline has not been fitted yet, a zero baseline is returned.

Parameters: path (dict) – dict of lists / numpy arrays containing trajectory / path information such as “observations”, “rewards”, …
Returns: numpy array of the same length as path[“observations”] specifying the reward baseline
Return type: np.ndarray
set_params(value, **tags)

Sets the parameter values of the baseline object

Parameters: value – numpy array of linear_regression coefficients
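
A typical usage pattern, assuming paths that already carry discounted “returns” (the data here is random and purely illustrative):

    import numpy as np
    from meta_policy_search.baselines import LinearFeatureBaseline

    # hypothetical paths with precomputed discounted returns
    paths = [
        {"observations": np.random.randn(100, 8), "returns": np.random.randn(100)}
        for _ in range(10)
    ]

    baseline = LinearFeatureBaseline(reg_coeff=1e-5)
    baseline.fit(paths, target_key="returns")

    values = baseline.predict(paths[0])  # one baseline value per time step
    assert values.shape == (100,)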

Linear Time Baseline

class meta_policy_search.baselines.LinearTimeBaseline(reg_coeff=1e-05)[source]

Linear (polynomial) time-dependent reward baseline model

Fits the following linear model

reward = b0 + b3*t + b4*t^2 + b5*t^3

Parameters: reg_coeff (float) – regularization coefficient for the damped least squares fit

fit(paths, target_key='returns')

Fits the linear baseline model with the provided paths via damped least squares

Parameters:
  • paths (list) – list of paths
  • target_key (str) – key in the path dictionary of the target to fit (e.g. “returns”)

get_param_values(**tags)

Returns the parameter values of the baseline object

Returns: numpy array of linear_regression coefficients

log_diagnostics(paths, prefix)

Log extra information per iteration based on the collected paths

predict(path)

Shared implementation for LinearFeatureBaseline and LinearTimeBaseline: predicts the linear reward baseline estimates for a provided trajectory / path. If the baseline has not been fitted yet, a zero baseline is returned.

Parameters: path (dict) – dict of lists / numpy arrays containing trajectory / path information such as “observations”, “rewards”, …
Returns: numpy array of the same length as path[“observations”] specifying the reward baseline
Return type: np.ndarray
set_params(value, **tags)

Sets the parameter values of the baseline object

Parameters: value – numpy array of linear_regression coefficients
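
Usage mirrors LinearFeatureBaseline; since the features depend only on the time step, the observations merely determine the trajectory length. A common downstream use is subtracting the predicted baseline from the returns to obtain advantage estimates (sketch with random, purely illustrative data):

    import numpy as np
    from meta_policy_search.baselines import LinearTimeBaseline

    paths = [
        {"observations": np.random.randn(100, 8), "returns": np.random.randn(100)}
        for _ in range(10)
    ]

    baseline = LinearTimeBaseline(reg_coeff=1e-5)
    baseline.fit(paths, target_key="returns")

    # subtract the baseline from the returns to get (unnormalized) advantages
    advantages = paths[0]["returns"] - baseline.predict(paths[0])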