Optimizers

Conjugate Gradient Optimizer

class meta_policy_search.optimizers.ConjugateGradientOptimizer(cg_iters=10, reg_coeff=0, subsample_factor=1.0, backtrack_ratio=0.8, max_backtracks=15, debug_nan=False, accept_violation=False, hvp_approach=<meta_policy_search.optimizers.conjugate_gradient_optimizer.FiniteDifferenceHvp object>)[source]

Bases: meta_policy_search.optimizers.base.Optimizer

Performs constrained optimization via backtracking line search. The search direction is computed with a conjugate gradient algorithm, which yields x = A^{-1}g, where A is a second-order approximation of the constraint and g is the gradient of the loss function.

Parameters:
  • cg_iters (int) – The number of conjugate gradient iterations used to calculate A^{-1}g
  • reg_coeff (float) – A small regularization coefficient so that A -> A + reg*I
  • subsample_factor (float) – Subsampling factor to reduce the number of samples used in the conjugate gradient computation. Since computing the descent direction dominates the run time, this can greatly reduce the overall computation time.
  • backtrack_ratio (float) – ratio for decreasing the step size for the line search
  • max_backtracks (int) – maximum number of backtracking iterations for the line search
  • debug_nan (bool) – if set to True, NanGuard will be added to the compilation, and ipdb will be invoked when nan is detected
  • accept_violation (bool) – whether to accept the descent step if it violates the line search condition after exhausting all backtracking budgets
  • hvp_approach (obj) – Hessian vector product approach
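The core of the search-direction computation can be sketched in plain NumPy. This is a simplified illustration, not the library's implementation: it solves (A + reg*I) x = g using only Hessian-vector products, mirroring the `cg_iters` and `reg_coeff` parameters above (the function name `conjugate_gradient` and the toy matrix are assumptions for the example).

```python
import numpy as np

def conjugate_gradient(hvp, g, cg_iters=10, reg_coeff=0.0, residual_tol=1e-10):
    """Solve (A + reg*I) x = g given only a Hessian-vector product `hvp`."""
    x = np.zeros_like(g)
    r = g.copy()          # residual g - A x (x starts at 0)
    p = r.copy()          # current search direction
    rdotr = r.dot(r)
    for _ in range(cg_iters):
        Ap = hvp(p) + reg_coeff * p   # regularization: A -> A + reg*I
        alpha = rdotr / p.dot(Ap)
        x += alpha * p
        r -= alpha * Ap
        new_rdotr = r.dot(r)
        if new_rdotr < residual_tol:
            break
        p = r + (new_rdotr / rdotr) * p
        rdotr = new_rdotr
    return x

# Toy example: a 2x2 symmetric positive-definite "Hessian".
A = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: A @ v, g, cg_iters=10)
```

Because only `hvp` is needed, A never has to be formed explicitly; in the optimizer this product is supplied by `hvp_approach` (e.g. finite differences).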
build_graph(loss, target, input_ph_dict, leq_constraint)[source]

Sets the objective function and target weights for the optimize function

Parameters:
  • loss (tf_op) – minimization objective
  • target (Policy) – Policy whose values we are optimizing over
  • input_ph_dict (dict) – dict containing the placeholders of the computation graph corresponding to loss
  • leq_constraint (tuple) – A constraint provided as a tuple (f, epsilon), of the form f(*inputs) <= epsilon.
constraint_val(input_val_dict)[source]

Computes the value of the KL-divergence between pre-update policies for given inputs

Parameters:
  • input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:

value of the KL-divergence

Return type:

(float)

gradient(input_val_dict)[source]

Computes the gradient of the loss function

Parameters:
  • input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:

flattened gradient

Return type:

(np.ndarray)

loss(input_val_dict)[source]

Computes the value of the loss for given inputs

Parameters:
  • input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:

value of the loss

Return type:

(float)

optimize(input_val_dict)[source]

Carries out the optimization step

Parameters:
  • input_val_dict (dict) – dict containing the values to be fed into the computation graph
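The optimization step applies the conjugate gradient direction with a backtracking line search governed by `backtrack_ratio` and `max_backtracks`: the step is shrunk geometrically until the loss improves and the constraint holds. A minimal sketch of that logic, with an illustrative quadratic loss and a stand-in distance constraint (the function and variable names here are hypothetical, not the library's API):

```python
import numpy as np

def backtracking_line_search(loss_fn, constraint_fn, x0, direction, epsilon,
                             backtrack_ratio=0.8, max_backtracks=15):
    """Shrink the step until loss_fn improves and constraint_fn(x) <= epsilon
    (the accept_violation=False behavior)."""
    loss_before = loss_fn(x0)
    for i in range(max_backtracks):
        step = backtrack_ratio ** i
        x = x0 - step * direction
        if loss_fn(x) < loss_before and constraint_fn(x) <= epsilon:
            return x
    return x0  # no acceptable step found: keep the old parameters

x0 = np.array([1.0, 1.0])
loss = lambda x: float(np.sum(x ** 2))
kl = lambda x: float(np.sum((x - x0) ** 2))   # stand-in for the KL constraint
x_new = backtracking_line_search(loss, kl, x0, direction=2 * x0, epsilon=0.5)
```

Here the full step would violate the constraint, so the search backtracks until the update both lowers the loss and stays within `epsilon` of the old parameters.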

MAML First Order Optimizer

class meta_policy_search.optimizers.MAMLFirstOrderOptimizer(tf_optimizer_cls=<class 'tensorflow.python.training.adam.AdamOptimizer'>, tf_optimizer_args=None, learning_rate=0.001, max_epochs=1, tolerance=1e-06, num_minibatches=1, verbose=False)[source]

Bases: meta_policy_search.optimizers.base.Optimizer

Optimizer for first order methods (SGD, Adam)

Parameters:
  • tf_optimizer_cls (tf.train.optimizer) – desired tensorflow optimizer for training
  • tf_optimizer_args (dict or None) – arguments for the optimizer
  • learning_rate (float) – learning rate
  • max_epochs – number of maximum epochs for training
  • tolerance (float) – tolerance for early stopping. If the loss function decreases by less than the specified tolerance after an epoch, training stops.
  • num_minibatches (int) – number of mini-batches for performing the gradient step. The mini-batch size is batch_size // num_minibatches.
  • verbose (bool) – whether to log the optimization process
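The training loop these parameters describe can be sketched without TensorFlow as plain gradient descent (a simplified sketch under assumed names like `first_order_optimize`, not the library's code): run up to `max_epochs` passes over `num_minibatches` splits of the data, and stop early once the per-epoch loss improvement falls below `tolerance`.

```python
import numpy as np

def first_order_optimize(params, loss_fn, grad_fn, data, learning_rate=0.001,
                         max_epochs=100, tolerance=1e-6, num_minibatches=1):
    """Gradient descent with early stopping on the per-epoch loss improvement."""
    batches = np.array_split(data, num_minibatches)  # ~ batch_size // num_minibatches each
    last_loss = loss_fn(params, data)
    for _ in range(max_epochs):
        for batch in batches:
            params = params - learning_rate * grad_fn(params, batch)
        new_loss = loss_fn(params, data)
        if last_loss - new_loss < tolerance:
            break  # loss decreased less than `tolerance` after an epoch
        last_loss = new_loss
    return params

# Toy problem: fit a scalar to the mean of the data.
data = np.array([1.0, 2.0, 3.0, 4.0])
loss_fn = lambda p, d: float(np.mean((p - d) ** 2))
grad_fn = lambda p, d: 2.0 * float(np.mean(p - d))
theta = first_order_optimize(0.0, loss_fn, grad_fn, data,
                             learning_rate=0.1, max_epochs=500, tolerance=1e-12)
```

The actual optimizer delegates the update rule to `tf_optimizer_cls` (Adam by default) instead of the plain SGD step shown here.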
build_graph(loss, target, input_ph_dict)[source]

Sets the objective function and target weights for the optimize function

Parameters:
  • loss (tf_op) – minimization objective
  • target (Policy) – Policy whose values we are optimizing over
  • input_ph_dict (dict) – dict containing the placeholders of the computation graph corresponding to loss
loss(input_val_dict)[source]

Computes the value of the loss for given inputs

Parameters:input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:value of the loss
Return type:(float)
optimize(input_val_dict)[source]

Carries out the optimization step

Parameters:input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:(float) loss before optimization