Meta-Policy Search
Meta-Trainer
class meta_policy_search.meta_trainer.Trainer(algo, env, sampler, sample_processor, policy, n_itr, start_itr=0, num_inner_grad_steps=1, sess=None)
Bases: object
Performs steps of meta-policy search.
Pseudocode:
    for iter in n_iter:
        sample tasks
        for task in tasks:
            for adapt_step in num_inner_grad_steps:
                sample trajectories with policy
                perform update/adaptation step
            sample trajectories with post-update policy
        perform meta-policy gradient step(s)
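The loop structure above can be illustrated with a small, self-contained toy. This is only a sketch and not the library's implementation: the "policy" is a single scalar `theta`, each "task" is a target value, and sampling trajectories is replaced by evaluating the gradient of a quadratic loss. The function `meta_train`, the helper `grad`, and the arguments `meta_batch_size`, `inner_lr`, `meta_lr`, and `seed` are hypothetical names introduced for illustration, and the outer update is a first-order approximation of the meta-gradient.

```python
import random


def grad(theta, target):
    # Gradient of the toy per-task loss (theta - target) ** 2.
    return 2.0 * (theta - target)


def meta_train(n_itr, num_inner_grad_steps=1, meta_batch_size=4,
               inner_lr=0.1, meta_lr=0.05, start_itr=0, seed=0):
    """Toy meta-training loop that mirrors the pseudocode (hypothetical helper)."""
    rng = random.Random(seed)
    theta = 0.0  # shared pre-update parameter, standing in for the policy
    for itr in range(start_itr, n_itr):
        # sample tasks: here a task is just a target value
        tasks = [rng.uniform(2.0, 4.0) for _ in range(meta_batch_size)]
        meta_grads = []
        for target in tasks:
            adapted = theta
            for _ in range(num_inner_grad_steps):
                # stand-in for "sample trajectories with policy"
                # followed by "perform update/adaptation step"
                adapted -= inner_lr * grad(adapted, target)
            # stand-in for "sample trajectories with post-update policy"
            meta_grads.append(grad(adapted, target))
        # meta-gradient step (first-order approximation, assumed for simplicity)
        theta -= meta_lr * sum(meta_grads) / len(meta_grads)
    return theta


if __name__ == "__main__":
    print(meta_train(n_itr=200))
```

In this toy, `theta` drifts toward the centre of the task distribution, which is the point from which a few inner steps adapt well to any sampled task.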
Parameters:
- algo (Algo)
- env (Env)
- sampler (Sampler)
- sample_processor (SampleProcessor)
- baseline (Baseline)
- policy (Policy)
- n_itr (int) – Number of iterations to train for
- start_itr (int) – Number of iterations the policy has already been trained for, when reloading
- num_inner_grad_steps (int) – Number of inner gradient (adaptation) steps per MAML iteration
- sess (tf.Session) – Current TensorFlow session (for example, one in which a policy has already been loaded)
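How `start_itr` and `n_itr` combine on a reload is not spelled out in the parameter descriptions. The sketch below shows one plausible reading, stated here as an assumption rather than documented behaviour: `n_itr` is the total iteration count and a resumed run covers only the remaining iterations. The helper `iterations_to_run` is hypothetical and not part of the library.

```python
def iterations_to_run(n_itr, start_itr=0):
    # Assumption: n_itr is the total target, so a resumed run
    # starts at start_itr and stops before n_itr.
    return range(start_itr, n_itr)


# Fresh run: iterations 0, 1, 2, 3, 4.
fresh = list(iterations_to_run(5))

# Reloaded after 3 iterations: only iterations 3 and 4 remain.
resumed = list(iterations_to_run(5, start_itr=3))
```

Under this reading, passing a `start_itr` equal to or greater than `n_itr` would leave nothing to run.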