Optimizers¶
Conjugate Gradient Optimizer¶
class meta_policy_search.optimizers.ConjugateGradientOptimizer(cg_iters=10, reg_coeff=0, subsample_factor=1.0, backtrack_ratio=0.8, max_backtracks=15, debug_nan=False, accept_violation=False, hvp_approach=<meta_policy_search.optimizers.conjugate_gradient_optimizer.FiniteDifferenceHvp object>)[source]¶

Bases: meta_policy_search.optimizers.base.Optimizer
Performs constrained optimization via line search. The search direction is computed using a conjugate gradient algorithm, which gives x = A^{-1} g, where A is a second-order approximation of the constraint and g is the gradient of the loss function (a sketch of this computation follows the parameter list).
Parameters: - cg_iters (int) – The number of conjugate gradient iterations used to calculate A^{-1} g
- reg_coeff (float) – A small damping value so that A -> A + reg_coeff * I
- subsample_factor (float) – Subsampling factor to reduce the number of samples used in the conjugate gradient computation. Since computing the descent direction dominates the overall computation time, this can greatly reduce it.
- backtrack_ratio (float) – ratio for decreasing the step size for the line search
- max_backtracks (int) – maximum number of backtracking iterations for the line search
- debug_nan (bool) – if set to True, NanGuard will be added to the compilation, and ipdb will be invoked when nan is detected
- accept_violation (bool) – whether to accept the descent step even if it violates the line search condition after the backtracking budget is exhausted
- hvp_approach (obj) – Hessian vector product approach
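The two building blocks the parameters above configure are the conjugate gradient solve and the backtracking line search. Below is a minimal NumPy sketch of both ideas, not the library's implementation; hvp, loss_f, and constraint_f stand in for callables the real optimizer builds internally (the Hessian-vector product is supplied by hvp_approach):

    import numpy as np

    def conjugate_gradient(hvp, g, cg_iters=10, reg_coeff=1e-5):
        """Approximately solve (A + reg_coeff * I) x = g, i.e. x ~= A^{-1} g,
        using only Hessian-vector products hvp(v) ~= A v."""
        x = np.zeros_like(g)
        r = g.copy()                      # residual g - A x, with x = 0 initially
        p = r.copy()                      # conjugate search direction
        r_dot_r = r.dot(r)
        for _ in range(cg_iters):
            Ap = hvp(p) + reg_coeff * p   # damping: A -> A + reg_coeff * I
            alpha = r_dot_r / p.dot(Ap)
            x += alpha * p
            r -= alpha * Ap
            new_r_dot_r = r.dot(r)
            p = r + (new_r_dot_r / r_dot_r) * p
            r_dot_r = new_r_dot_r
        return x

    def backtracking_line_search(loss_f, constraint_f, params, full_step,
                                 max_constraint, backtrack_ratio=0.8,
                                 max_backtracks=15):
        """Geometrically shrink the step until the loss improves and the
        constraint (e.g. a KL bound) is satisfied."""
        loss_before = loss_f(params)
        for i in range(max_backtracks):
            new_params = params - backtrack_ratio ** i * full_step
            if loss_f(new_params) < loss_before and constraint_f(new_params) <= max_constraint:
                return new_params
        return params  # no acceptable step found (cf. accept_violation)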
build_graph(loss, target, input_ph_dict, leq_constraint)[source]¶

Sets the objective function and target weights for the optimize function (a usage sketch follows the parameter list).
Parameters: - loss (tf_op) – minimization objective
- target (Policy) – Policy whose values we are optimizing over
- input_ph_dict (dict) – dict of tf.placeholders for the input data, which may be subsampled. The first dimension of each placeholder corresponds to the number of data points
- leq_constraint (tuple) – A constraint provided as a tuple (f, epsilon), where f is a tf op; the optimizer enforces f <= epsilon
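A hedged usage sketch of build_graph; only the call signature follows the docs above, while the loss, constraint, policy, and placeholder dict are hypothetical objects built elsewhere:

    from meta_policy_search.optimizers import ConjugateGradientOptimizer

    # Hypothetical objects: `surr_loss` and `kl` are tf ops built elsewhere,
    # `policy` is a meta_policy_search Policy, and `input_ph_dict` maps names
    # to the tf.placeholders feeding both ops.
    optimizer = ConjugateGradientOptimizer(cg_iters=10, reg_coeff=1e-5)
    optimizer.build_graph(
        loss=surr_loss,
        target=policy,
        input_ph_dict=input_ph_dict,
        leq_constraint=(kl, 0.01),  # enforce KL <= 0.01
    )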
constraint_val(input_val_dict)[source]¶

Computes the value of the KL-divergence between pre-update policies for the given inputs
Parameters: - input_val_dict (dict) – dict of input values (np.arrays) needed to compute the inner KL
Returns: value of the KL-divergence constraint
Return type: (float)
gradient(input_val_dict)[source]¶

Computes the gradient of the loss function (a combined sketch with constraint_val follows the return values below)
Parameters: - input_val_dict (dict) – dict of input values (np.arrays) needed to compute the gradient of the loss function
Returns: flattened gradient
Return type: (np.ndarray)
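gradient and constraint_val expose the raw pieces of the trust-region update. The sketch below shows how a caller might combine them into a step; input_val_dict and hvp are hypothetical, conjugate_gradient is the helper sketched earlier, and the step-size formula is the standard quadratic-model scaling, assumed rather than taken from this codebase:

    import numpy as np

    g = optimizer.gradient(input_val_dict)    # flattened loss gradient (np.ndarray)
    direction = conjugate_gradient(hvp, g)    # x = A^{-1} g, via the sketch above

    # Scale the step so the quadratic model of the constraint predicts KL ~= max_kl.
    max_kl = 0.01
    step_size = np.sqrt(2.0 * max_kl / (direction.dot(hvp(direction)) + 1e-8))
    full_step = step_size * direction

    kl_before = optimizer.constraint_val(input_val_dict)  # ~0 before any update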
MAML First Order Optimizer¶
class meta_policy_search.optimizers.MAMLFirstOrderOptimizer(tf_optimizer_cls=<class 'tensorflow.python.training.adam.AdamOptimizer'>, tf_optimizer_args=None, learning_rate=0.001, max_epochs=1, tolerance=1e-06, num_minibatches=1, verbose=False)[source]¶

Bases: meta_policy_search.optimizers.base.Optimizer
Optimizer for first-order methods (e.g. SGD, Adam)
Parameters: - tf_optimizer_cls (tf.train.Optimizer) – desired TensorFlow optimizer class for training
- tf_optimizer_args (dict or None) – arguments for the optimizer
- learning_rate (float) – learning rate
- max_epochs (int) – maximum number of training epochs
- tolerance (float) – tolerance for early stopping. If the loss function decreases by less than the specified tolerance after an epoch, training stops (see the sketch after this list)
- num_minibatches (int) – number of mini-batches for performing the gradient step. The mini-batch size is batch_size // num_minibatches
- verbose (bool) – whether to log the optimization process
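A minimal sketch of the epoch loop these parameters describe; first_order_optimize and loss_and_step are hypothetical helpers for illustration, not part of the library:

    import numpy as np

    def first_order_optimize(loss_and_step, data, max_epochs=1,
                             tolerance=1e-6, num_minibatches=1, verbose=False):
        """loss_and_step(batch) performs one SGD/Adam step and returns the loss."""
        last_loss = np.inf
        for epoch in range(max_epochs):
            batches = np.array_split(data, num_minibatches)  # ~batch_size // num_minibatches each
            epoch_loss = float(np.mean([loss_and_step(b) for b in batches]))
            if verbose:
                print(f"epoch {epoch}: loss {epoch_loss:.6f}")
            if abs(last_loss - epoch_loss) < tolerance:      # early stopping
                break
            last_loss = epoch_loss
        return last_loss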
build_graph(loss, target, input_ph_dict)[source]¶

Sets the objective function and target weights for the optimize function
Parameters: - loss (tf_op) – minimization objective
- target (Policy) – Policy whose values we are optimizing over
- input_ph_dict (dict) – dict containing the placeholders of the computation graph corresponding to loss
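A hedged end-to-end sketch; the optimize() call is assumed from the base Optimizer interface that the docstring references, and the loss, policy, and placeholder/value dicts are hypothetical:

    import tensorflow as tf
    from meta_policy_search.optimizers import MAMLFirstOrderOptimizer

    optimizer = MAMLFirstOrderOptimizer(
        tf_optimizer_cls=tf.train.AdamOptimizer,
        learning_rate=1e-3,
        max_epochs=5,
        tolerance=1e-6,
        num_minibatches=4,
        verbose=True,
    )

    # Hypothetical objects: `maml_loss` is a tf op, `policy` a Policy, and
    # `input_ph_dict` the placeholder dict described above.
    optimizer.build_graph(loss=maml_loss, target=policy, input_ph_dict=input_ph_dict)

    # Later, with concrete np.array values keyed like input_ph_dict:
    # optimizer.optimize(input_val_dict)  # optimize() assumed from the base Optimizer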