Optimizers

Conjugate Gradient Optimizer

class meta_policy_search.optimizers.ConjugateGradientOptimizer(cg_iters=10, reg_coeff=0, subsample_factor=1.0, backtrack_ratio=0.8, max_backtracks=15, debug_nan=False, accept_violation=False, hvp_approach=<meta_policy_search.optimizers.conjugate_gradient_optimizer.FiniteDifferenceHvp object>)[source]

Bases: meta_policy_search.optimizers.base.Optimizer

Performs constrained optimization via backtracking line search. The search direction is computed with a conjugate gradient algorithm, which yields x = A^{-1}g, where A is a second-order approximation of the constraint and g is the gradient of the loss function.

Parameters:
  • cg_iters (int) – The number of conjugate gradient iterations used to calculate A^{-1}g
  • reg_coeff (float) – A small regularization coefficient so that A -> A + reg*I
  • subsample_factor (float) – Subsampling factor to reduce the number of samples used in the conjugate gradient computation. Since computing the descent direction dominates the run time, this can greatly reduce the overall computation time.
  • backtrack_ratio (float) – ratio for decreasing the step size for the line search
  • max_backtracks (int) – maximum number of backtracking iterations for the line search
  • debug_nan (bool) – if set to True, NanGuard will be added to the compilation, and ipdb will be invoked when nan is detected
  • accept_violation (bool) – whether to accept the descent step if it violates the line search condition after exhausting all backtracking budgets
  • hvp_approach (obj) – Hessian vector product approach
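The core of the search-direction computation can be sketched in plain NumPy. This is a simplified illustration, not the library's implementation: it solves (A + reg*I) x = g using only Hessian-vector products, mirroring the `cg_iters` and `reg_coeff` parameters above (the function name `conjugate_gradient` and the toy matrix are assumptions for the example).

```python
import numpy as np

def conjugate_gradient(hvp, g, cg_iters=10, reg_coeff=0.0, residual_tol=1e-10):
    """Solve (A + reg*I) x = g given only a Hessian-vector product `hvp`."""
    x = np.zeros_like(g)
    r = g.copy()          # residual g - A x (x starts at 0)
    p = r.copy()          # current search direction
    rdotr = r.dot(r)
    for _ in range(cg_iters):
        Ap = hvp(p) + reg_coeff * p   # regularization: A -> A + reg*I
        alpha = rdotr / p.dot(Ap)
        x += alpha * p
        r -= alpha * Ap
        new_rdotr = r.dot(r)
        if new_rdotr < residual_tol:
            break
        p = r + (new_rdotr / rdotr) * p
        rdotr = new_rdotr
    return x

# Toy example: a 2x2 symmetric positive-definite "Hessian".
A = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: A @ v, g, cg_iters=10)
```

Because only `hvp` is needed, A never has to be formed explicitly; in the optimizer this product is supplied by `hvp_approach` (e.g. finite differences).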
build_graph(loss, target, input_ph_dict, leq_constraint)[source]

Sets the objective function and target weights for the optimize function

Parameters:
  • loss (tf_op) – minimization objective
  • target (Policy) – Policy whose values we are optimizing over
  • input_ph_dict (dict) – dict containing the placeholders of the computation graph corresponding to loss
  • leq_constraint (tuple) – A constraint provided as a tuple (f, epsilon), of the form f(*inputs) <= epsilon.
constraint_val(input_val_dict)[source]

Computes the value of the KL-divergence between pre-update policies for given inputs

Parameters:
  • input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:

value of the KL-divergence

Return type:

(float)

gradient(input_val_dict)[source]

Computes the gradient of the loss function

Parameters:
  • input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:

flattened gradient

Return type:

(np.ndarray)

loss(input_val_dict)[source]

Computes the value of the loss for given inputs

Parameters:
  • input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:

value of the loss

Return type:

(float)

optimize(input_val_dict)[source]

Carries out the optimization step

Parameters:
  • input_val_dict (dict) – dict containing the values to be fed into the computation graph
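The optimization step applies the conjugate gradient direction with a backtracking line search governed by `backtrack_ratio` and `max_backtracks`: the step is shrunk geometrically until the loss improves and the constraint holds. A minimal sketch of that logic, with an illustrative quadratic loss and a stand-in distance constraint (the function and variable names here are hypothetical, not the library's API):

```python
import numpy as np

def backtracking_line_search(loss_fn, constraint_fn, x0, direction, epsilon,
                             backtrack_ratio=0.8, max_backtracks=15):
    """Shrink the step until loss_fn improves and constraint_fn(x) <= epsilon
    (the accept_violation=False behavior)."""
    loss_before = loss_fn(x0)
    for i in range(max_backtracks):
        step = backtrack_ratio ** i
        x = x0 - step * direction
        if loss_fn(x) < loss_before and constraint_fn(x) <= epsilon:
            return x
    return x0  # no acceptable step found: keep the old parameters

x0 = np.array([1.0, 1.0])
loss = lambda x: float(np.sum(x ** 2))
kl = lambda x: float(np.sum((x - x0) ** 2))   # stand-in for the KL constraint
x_new = backtracking_line_search(loss, kl, x0, direction=2 * x0, epsilon=0.5)
```

Here the full step would violate the constraint, so the search backtracks until the update both lowers the loss and stays within `epsilon` of the old parameters.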

MAML First Order Optimizer

class meta_policy_search.optimizers.MAMLFirstOrderOptimizer(tf_optimizer_cls=<class 'tensorflow.python.training.adam.AdamOptimizer'>, tf_optimizer_args=None, learning_rate=0.001, max_epochs=1, tolerance=1e-06, num_minibatches=1, verbose=False)[source]

Bases: meta_policy_search.optimizers.base.Optimizer

Optimizer for first order methods (SGD, Adam)

Parameters:
  • tf_optimizer_cls (tf.train.optimizer) – desired tensorflow optimizer for training
  • tf_optimizer_args (dict or None) – arguments for the optimizer
  • learning_rate (float) – learning rate
  • max_epochs – number of maximum epochs for training
  • tolerance (float) – tolerance for early stopping. If the loss function decreases by less than the specified tolerance after an epoch, training stops.
  • num_minibatches (int) – number of mini-batches for performing the gradient step. The mini-batch size is batch_size // num_minibatches.
  • verbose (bool) – whether to log the optimization process
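The training loop these parameters describe can be sketched without TensorFlow as plain gradient descent (a simplified sketch under assumed names like `first_order_optimize`, not the library's code): run up to `max_epochs` passes over `num_minibatches` splits of the data, and stop early once the per-epoch loss improvement falls below `tolerance`.

```python
import numpy as np

def first_order_optimize(params, loss_fn, grad_fn, data, learning_rate=0.001,
                         max_epochs=100, tolerance=1e-6, num_minibatches=1):
    """Gradient descent with early stopping on the per-epoch loss improvement."""
    batches = np.array_split(data, num_minibatches)  # ~ batch_size // num_minibatches each
    last_loss = loss_fn(params, data)
    for _ in range(max_epochs):
        for batch in batches:
            params = params - learning_rate * grad_fn(params, batch)
        new_loss = loss_fn(params, data)
        if last_loss - new_loss < tolerance:
            break  # loss decreased less than `tolerance` after an epoch
        last_loss = new_loss
    return params

# Toy problem: fit a scalar to the mean of the data.
data = np.array([1.0, 2.0, 3.0, 4.0])
loss_fn = lambda p, d: float(np.mean((p - d) ** 2))
grad_fn = lambda p, d: 2.0 * float(np.mean(p - d))
theta = first_order_optimize(0.0, loss_fn, grad_fn, data,
                             learning_rate=0.1, max_epochs=500, tolerance=1e-12)
```

The actual optimizer delegates the update rule to `tf_optimizer_cls` (Adam by default) instead of the plain SGD step shown here.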
build_graph(loss, target, input_ph_dict)[source]

Sets the objective function and target weights for the optimize function

Parameters:
  • loss (tf_op) – minimization objective
  • target (Policy) – Policy whose values we are optimizing over
  • input_ph_dict (dict) – dict containing the placeholders of the computation graph corresponding to loss
loss(input_val_dict)[source]

Computes the value of the loss for given inputs

Parameters:input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:value of the loss
Return type:(float)
optimize(input_val_dict)[source]

Carries out the optimization step

Parameters:input_val_dict (dict) – dict containing the values to be fed into the computation graph
Returns:(float) loss before optimization