Baselines
Baseline (Interface)
- class meta_policy_search.baselines.Baseline
  Reward baseline interface.
- fit(paths)
  Fits the baseline model to the provided paths.
  Parameters: paths – list of paths
- log_diagnostics(paths, prefix)
  Logs extra information per iteration based on the collected paths.
- predict(path)
  Predicts the reward baseline for a provided trajectory / path.
  Parameters: path – dict of lists/numpy arrays containing trajectory / path information such as “observations”, “rewards”, …
  Returns: numpy array of the same length as path[“observations”] specifying the reward baseline
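The interface can be illustrated with a minimal sketch. To stay self-contained it defines a local abstract `Baseline` that mirrors the three documented methods instead of importing `meta_policy_search.baselines.Baseline`. The subclass `MeanReturnBaseline` is hypothetical, the sketch assumes each path carries a `"returns"` array, and the no-op default for `log_diagnostics` is an assumption, not documented behavior.

```python
from abc import ABC, abstractmethod

import numpy as np


class Baseline(ABC):
    """Local stand-in mirroring the documented interface (not the library class)."""

    @abstractmethod
    def fit(self, paths):
        """Fit the baseline to a list of path dicts."""

    @abstractmethod
    def predict(self, path):
        """Return one baseline value per entry of path["observations"]."""

    def log_diagnostics(self, paths, prefix):
        # Assumed default: nothing extra to log.
        pass


class MeanReturnBaseline(Baseline):
    """Hypothetical baseline: predicts the mean return seen during fit at every step."""

    def __init__(self):
        self._mean = 0.0

    def fit(self, paths):
        # Assumes every path dict stores its returns under "returns".
        returns = np.concatenate([path["returns"] for path in paths])
        self._mean = float(returns.mean())

    def predict(self, path):
        # Same length as path["observations"], as the interface requires.
        return np.full(len(path["observations"]), self._mean)


paths = [
    {"observations": np.zeros((3, 2)), "returns": np.array([3.0, 2.0, 1.0])},
    {"observations": np.zeros((2, 2)), "returns": np.array([1.0, 0.0])},
]
baseline = MeanReturnBaseline()
baseline.fit(paths)
print(baseline.predict(paths[0]))  # three copies of the mean return, 1.4
```

A constant baseline is the simplest case that satisfies the contract; the linear baselines below replace the constant with a fitted function of time (and, for LinearFeatureBaseline, observations).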
Linear Feature Baseline
- class meta_policy_search.baselines.LinearFeatureBaseline(reg_coeff=1e-05)
  Linear (polynomial) time-state dependent return baseline model (see Duan et al. 2016, “Benchmarking Deep Reinforcement Learning for Continuous Control”, ICML).
  Fits the following linear model:
  return = b0 + b1*obs + b2*obs^2 + b3*t + b4*t^2 + b5*t^3
  where t is the time step within the path, obs^2 is taken elementwise, and b1 and b2 are coefficient vectors of the same dimension as the observation.
  Parameters: reg_coeff (float) – regularization coefficient of the damped least-squares fit
- fit(paths, target_key='returns')
  Fits the linear baseline model to the provided paths via damped least squares.
  Parameters:
  - paths (list) – list of paths
  - target_key (str) – path dictionary key of the target to be fitted (e.g. “returns”)
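The damped least-squares fit can be sketched in NumPy as follows. This is an illustrative reconstruction from the model formula, not the library's code: the helper names are hypothetical, the features are ordered [1, obs, obs^2, t, t^2, t^3], and `reg_coeff` is assumed to be the ridge (damping) term added to the diagonal of X^T X. The raw time step is used as-is; an actual implementation may rescale or clip features to keep t^3 well conditioned.

```python
import numpy as np


def feature_baseline_features(path):
    """Hypothetical feature map for b0 + b1*obs + b2*obs^2 + b3*t + b4*t^2 + b5*t^3."""
    obs = np.asarray(path["observations"], dtype=float)
    n = obs.shape[0]
    t = np.arange(n, dtype=float).reshape(-1, 1)  # time step within the path
    return np.concatenate([np.ones((n, 1)), obs, obs ** 2, t, t ** 2, t ** 3], axis=1)


def fit_damped_least_squares(paths, target_key="returns", reg_coeff=1e-5):
    """Solve (X^T X + reg_coeff * I) b = X^T y over all paths stacked together."""
    X = np.concatenate([feature_baseline_features(p) for p in paths], axis=0)
    y = np.concatenate([np.asarray(p[target_key], dtype=float) for p in paths])
    A = X.T @ X + reg_coeff * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


# Constant 2-D observations make several feature columns collinear;
# the damping term keeps the linear system solvable anyway.
paths = [{"observations": np.ones((5, 2)), "returns": np.linspace(1.0, 0.0, 5)}]
coeffs = fit_damped_least_squares(paths)
print(coeffs.shape)  # (8,): 1 bias + 2 obs + 2 obs^2 + 3 time terms
```

The damping is what makes the fit robust to degenerate batches: with reg_coeff > 0 the matrix X^T X + reg_coeff * I is positive definite, so a solution exists even when features are perfectly correlated.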
- get_param_values(**tags)
  Returns the parameter values of the baseline object.
  Returns: numpy array of linear regression coefficients
- log_diagnostics(paths, prefix)
  Logs extra information per iteration based on the collected paths.
- predict(path)
  Predicts the linear reward baseline estimates for a provided trajectory / path. If the baseline has not been fitted yet, returns a zero baseline. Defined in the abstract base class shared by LinearFeatureBaseline and LinearTimeBaseline.
  Parameters: path (dict) – dict of lists/numpy arrays containing trajectory / path information such as “observations”, “rewards”, …
  Returns: numpy array of the same length as path[“observations”] specifying the reward baseline
  Return type: np.ndarray
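The unfitted fallback can be illustrated with a small stand-in class. `LinearBaselineSketch`, its `_coeffs` attribute and the toy feature map are hypothetical; the sketch only assumes, as documented above, that an unfitted baseline returns zeros with one entry per observation and a fitted one returns a linear function of the path features.

```python
import numpy as np


class LinearBaselineSketch:
    """Hypothetical stand-in for the shared predict() behavior of the linear baselines."""

    def __init__(self, feature_fn):
        self._feature_fn = feature_fn  # maps a path dict to an (n, k) feature matrix
        self._coeffs = None            # None means the baseline has not been fitted

    def predict(self, path):
        n = len(path["observations"])
        if self._coeffs is None:
            # Unfitted: zero baseline with one entry per observation.
            return np.zeros(n)
        return self._feature_fn(path) @ self._coeffs


def bias_and_obs_features(path):
    """Hypothetical toy feature map: [1, obs]."""
    obs = np.asarray(path["observations"], dtype=float)
    return np.concatenate([np.ones((obs.shape[0], 1)), obs], axis=1)


baseline = LinearBaselineSketch(bias_and_obs_features)
path = {"observations": np.array([[0.0], [1.0], [2.0]])}
print(baseline.predict(path))            # all zeros: not fitted yet
baseline._coeffs = np.array([1.0, 2.0])  # stand-in for the result of fit()
print(baseline.predict(path))            # 1 + 2*obs, i.e. 1, 3, 5
```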
- set_params(value, **tags)
  Sets the parameter values of the baseline object.
  Parameters: value – numpy array of linear regression coefficients
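A round trip through `get_param_values` and `set_params` can be sketched as below, for example to copy fitted coefficients into a second baseline instance. `CoeffHolder` is a hypothetical stand-in; returning a copy and accepting any array-like `value` are design choices of this sketch, not documented behavior, and the `**tags` arguments are accepted but ignored.

```python
import numpy as np


class CoeffHolder:
    """Hypothetical stand-in for the coefficient accessors of a linear baseline."""

    def __init__(self):
        self._coeffs = None  # no coefficients until fitted or set

    def get_param_values(self, **tags):
        # Return a copy so callers cannot mutate the internal coefficients.
        return None if self._coeffs is None else self._coeffs.copy()

    def set_params(self, value, **tags):
        # Store the coefficients as a float array.
        self._coeffs = np.array(value, dtype=float)


source = CoeffHolder()
source.set_params([0.5, -1.0, 2.0])
target = CoeffHolder()
target.set_params(source.get_param_values())  # copy coefficients across instances
print(target.get_param_values())
```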
Linear Time Baseline
- class meta_policy_search.baselines.LinearTimeBaseline(reg_coeff=1e-05)
  Linear (polynomial) time-dependent return baseline model.
  Fits the following linear model:
  return = b0 + b3*t + b4*t^2 + b5*t^3
  where t is the time step within the path; unlike LinearFeatureBaseline, observations are not used as features.
  Parameters: reg_coeff (float) – regularization coefficient of the damped least-squares fit
- fit(paths, target_key='returns')
  Fits the linear baseline model to the provided paths via damped least squares.
  Parameters:
  - paths (list) – list of paths
  - target_key (str) – path dictionary key of the target to be fitted (e.g. “returns”)
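In matrix form, the LinearTimeBaseline fit can be written as the following ridge-regularized least-squares problem, assuming that damped least squares means adding reg_coeff, written \lambda here, to the diagonal of the normal equations. Each time step contributes one row \phi(t), and \mathbf{y} stacks the fitted targets (returns by default) of all paths.

```latex
\phi(t) = \begin{bmatrix} 1 & t & t^2 & t^3 \end{bmatrix}^\top,
\qquad
\mathbf{b} = \begin{bmatrix} b_0 & b_3 & b_4 & b_5 \end{bmatrix}^\top,
\qquad
\hat{y}(t) = \phi(t)^\top \mathbf{b} = b_0 + b_3 t + b_4 t^2 + b_5 t^3

\Phi = \begin{bmatrix} \phi(t_1)^\top \\ \vdots \\ \phi(t_N)^\top \end{bmatrix},
\qquad
\mathbf{b}^\star = \arg\min_{\mathbf{b}} \lVert \Phi \mathbf{b} - \mathbf{y} \rVert_2^2 + \lambda \lVert \mathbf{b} \rVert_2^2
= \left( \Phi^\top \Phi + \lambda I \right)^{-1} \Phi^\top \mathbf{y}
```

Setting the gradient 2\Phi^\top(\Phi\mathbf{b} - \mathbf{y}) + 2\lambda\mathbf{b} to zero gives the closed form; for \lambda > 0 the matrix \Phi^\top \Phi + \lambda I is always invertible.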
- get_param_values(**tags)
  Returns the parameter values of the baseline object.
  Returns: numpy array of linear regression coefficients
- log_diagnostics(paths, prefix)
  Logs extra information per iteration based on the collected paths.
- predict(path)
  Predicts the linear reward baseline estimates for a provided trajectory / path. If the baseline has not been fitted yet, returns a zero baseline. Defined in the abstract base class shared by LinearFeatureBaseline and LinearTimeBaseline.
  Parameters: path (dict) – dict of lists/numpy arrays containing trajectory / path information such as “observations”, “rewards”, …
  Returns: numpy array of the same length as path[“observations”] specifying the reward baseline
  Return type: np.ndarray
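Because LinearTimeBaseline uses only the time step as input, two paths of the same length receive identical baselines whatever their observations. The sketch below shows this with a hypothetical feature map [1, t, t^2, t^3] matching the model formula and hand-picked coefficients standing in for a fitted model; the helper names are illustrative, not library functions.

```python
import numpy as np


def time_features(path):
    """Hypothetical feature map for b0 + b3*t + b4*t^2 + b5*t^3 (observations ignored)."""
    n = len(path["observations"])
    t = np.arange(n, dtype=float).reshape(-1, 1)
    return np.concatenate([np.ones((n, 1)), t, t ** 2, t ** 3], axis=1)


def predict_time_baseline(coeffs, path):
    """Linear prediction from the time features only."""
    return time_features(path) @ coeffs


coeffs = np.array([1.0, 0.5, 0.0, 0.0])  # hypothetical values for b0, b3, b4, b5
path_a = {"observations": np.zeros((4, 3))}
path_b = {"observations": np.random.default_rng(0).normal(size=(4, 3))}

pred_a = predict_time_baseline(coeffs, path_a)
pred_b = predict_time_baseline(coeffs, path_b)
print(pred_a)                          # values 1.0, 1.5, 2.0, 2.5 for t = 0, 1, 2, 3
print(np.array_equal(pred_a, pred_b))  # True: observations do not enter the model
```

This makes LinearTimeBaseline cheaper and less prone to overfitting than LinearFeatureBaseline, at the cost of ignoring state information.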
- set_params(value, **tags)
  Sets the parameter values of the baseline object.
  Parameters: value – numpy array of linear regression coefficients