Course on Machine Learning II


Basic information on the course:

Academic year 2023/2024

Basic study literature for the course, complementing the lectures:

Sutton, Barto: Reinforcement Learning: An Introduction. 2nd ed., 2018 (1st ed., 1998)

Additional material for the course

Lecture presentations (in Slovak)

  • Selected Mathematical Titbits
  • Introduction to Reinforcement Learning
  • Finite Markov Decision Process
  • Dynamic Programming Methods
  • Monte Carlo Methods
  • Temporal-difference Methods
  • Value Function Approximation
  • Policy Approximation
  • Between MC and TD
  • Deep RL (value functions)
  • Planning and learning

Final exam topics

  • Selected Mathematical Titbits
    • Expected value
    • Probability
    • Markov chain
    • Infinite series (worked example after this list)
  • Introduction to Reinforcement Learning (RL)
    • Characterisation of RL
    • Basic elements of RL
    • Interaction between agent and environment
    • Exploration vs exploitation
    • Taxonomy of RL methods
  • Finite Markov Decision Process (MDP)
    • Markov decision process (MDP)
    • Environment dynamics modelling
    • Reward and return modelling
    • Value functions and optimal value functions
    • Bellman expectation equation for v
    • Bellman expectation equation for q
    • Action selection and optimal action selection
    • Bellman optimality equations (worked equations after this list)
    • Optimal policy retrieval
  • Dynamic programming based RL
    • Dynamic programming and RL
    • Policy evaluation
    • Policy improvement and its theorem
    • Policy iteration
    • Value iteration (sketch after this list)
    • Synchronous and asynchronous DP
  • Monte Carlo based RL
    • Action value function estimation (sketch after this list)
    • Generalized policy iteration and convergence
    • On-policy learning with exploring starts
    • On-policy learning with soft policy
    • Off-policy and importance sampling
    • Off-policy action value function estimation
    • Off-policy optimal policy estimation
  • Temporal-difference based RL
    • TD vs MC approach
    • Value function estimation
    • Iterative policy learning
    • On-policy Sarsa algorithm
    • Off-policy Q-learning algorithm (sketch after this list)
    • Expected Sarsa algorithm
    • Double learning
    • Greedy policy as exploration policy
  • Approximation of value functions
    • Scaling problem
    • Parametric vs memory-based approximator
    • Framework for parametric value approximation
    • SGD minimization
    • Semi-gradient value function estimation
    • Linear approximator
    • Episodic semi-gradient Sarsa (sketch after this list)
    • Average reward approach for continuing tasks
    • Differential semi-gradient Sarsa
    • Incremental vs mini-batch based approach
  • Approximation of policy
    • Parametric policy approximator
    • Gradient learning in episodic environment
    • Reinforce algorithm (sketch after this list)
    • Reinforce with baseline algorithm
    • One-step episodic actor-critic algorithm
    • Learning in a continuing environment
    • One-step continuing actor-critic algorithm
    • Continuous action space
  • Deep RL (value functions)
    • DNN as a parametric approximator
    • Deep Q-learning and incompatibilities
    • DQN + stability maintenance (sketch after this list)
    • DQN enhancements - Rainbow
  • Planning and learning
    • Model types
    • Unification of planning and learning
    • Dyna-Q algorithm (sketch after this list)
    • Exploration in planning context, Dyna-Q+
    • Model sampling strategy
    • Prioritized sweeping
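
Illustrative sketches for selected exam topics

The worked examples and code sketches below are unofficial illustrations of selected topics from the list above; they follow the formulations in Sutton and Barto, but all interfaces, helper names, and hyperparameters are assumptions of the sketches, not course material.

As a worked example for the infinite series topic, the geometric series shows why discounted returns stay finite: with a constant reward r at every step and a discount factor 0 <= gamma < 1,

G_t = \sum_{k=0}^{\infty} \gamma^k r = \frac{r}{1-\gamma}.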
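
For the Bellman optimality equations, in the notation of the course text (Sutton and Barto, 2nd ed.):

v_*(s) = \max_a \sum_{s',r} p(s',r \mid s,a)\,\bigl[r + \gamma\, v_*(s')\bigr]

q_*(s,a) = \sum_{s',r} p(s',r \mid s,a)\,\bigl[r + \gamma \max_{a'} q_*(s',a')\bigr]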
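
For value iteration, a minimal Python sketch over an explicitly given finite MDP; the array layout (P[s, a, s'] for transition probabilities, R[s, a] for expected rewards) is an assumption of this sketch, not a fixed convention:

import numpy as np

def value_iteration(P, R, gamma=0.99, theta=1e-8):
    # P[s, a, s2]: transition probabilities; R[s, a]: expected rewards
    # (assumed layout for this sketch)
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup for all states at once
        Q = R + gamma * (P @ V)        # shape (n_states, n_actions)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < theta:
            break
        V = V_new
    policy = Q.argmax(axis=1)          # greedy policy extraction
    return V_new, policy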
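
For Monte Carlo action value estimation, a first-visit sketch; the Gym-style environment interface (env.reset() -> state, env.step(a) -> (state, reward, done)) and the policy callable are assumptions:

from collections import defaultdict

def mc_q_estimation(env, policy, episodes=5000, gamma=0.99):
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    Q = defaultdict(float)
    for _ in range(episodes):
        # generate one episode following the given policy
        episode = []
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            episode.append((s, a, r))
            s = s_next
        # remember the first time step of each (state, action) pair
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        # walk backwards, accumulating the return G
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = r + gamma * G
            if first_visit[(s, a)] == t:       # first-visit update only
                returns_sum[(s, a)] += G
                returns_cnt[(s, a)] += 1
                Q[(s, a)] = returns_sum[(s, a)] / returns_cnt[(s, a)]
    return Q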
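
For the off-policy Q-learning topic, a tabular sketch of the update rule with an epsilon-greedy behaviour policy; the same Gym-style environment interface is assumed:

import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behaviour policy
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # off-policy TD target bootstraps from the greedy action
            target = r + gamma * np.max(Q[s_next]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q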
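
For episodic semi-gradient Sarsa, a sketch with a linear approximator q(s, a; w) = w[a] . x(s); the featurize(state) helper and the environment interface are assumptions:

import numpy as np

def semi_gradient_sarsa(env, featurize, n_features, n_actions,
                        episodes=500, alpha=0.01, gamma=0.99, epsilon=0.1):
    w = np.zeros((n_actions, n_features))

    def eps_greedy(x):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(w @ x))

    for _ in range(episodes):
        s, done = env.reset(), False
        x = featurize(s)
        a = eps_greedy(x)
        while not done:
            s_next, r, done = env.step(a)
            if done:
                target = r                     # no bootstrap at the end
            else:
                x_next = featurize(s_next)
                a_next = eps_greedy(x_next)
                target = r + gamma * (w[a_next] @ x_next)
            # semi-gradient: differentiate q(s, a; w) only, not the target
            w[a] += alpha * (target - w[a] @ x) * x
            if not done:
                x, a = x_next, a_next
    return w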
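
For the Reinforce algorithm, a Monte Carlo policy gradient sketch with a linear softmax policy over discrete actions; featurize and the environment interface are again assumptions:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                    # numerically stable softmax
    return e / e.sum()

def reinforce(env, featurize, n_features, n_actions,
              episodes=1000, alpha=0.01, gamma=0.99):
    theta = np.zeros((n_actions, n_features))
    for _ in range(episodes):
        # generate one episode with the current stochastic policy
        trajectory = []
        s, done = env.reset(), False
        while not done:
            x = featurize(s)
            pi = softmax(theta @ x)
            a = np.random.choice(n_actions, p=pi)
            s_next, r, done = env.step(a)
            trajectory.append((x, a, r))
            s = s_next
        # walk backwards, updating with the return from each step
        G = 0.0
        for t in reversed(range(len(trajectory))):
            x, a, r = trajectory[t]
            G = r + gamma * G
            pi = softmax(theta @ x)
            # grad of log pi(a|s): indicator minus probabilities, times x
            grad = -np.outer(pi, x)
            grad[a] += x
            theta += alpha * (gamma ** t) * G * grad
    return theta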
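
For DQN and its stability maintenance, a compact PyTorch sketch with the two standard mechanisms, experience replay and a periodically synchronised target network; layer sizes, hyperparameters, and the environment interface are assumptions:

import random
from collections import deque
import numpy as np
import torch
import torch.nn as nn

def dqn(env, n_obs, n_actions, episodes=300, gamma=0.99, lr=1e-3,
        batch_size=64, buffer_size=10000, sync_every=500, epsilon=0.1):
    def make_net():                            # small MLP Q-network
        return nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                             nn.Linear(64, n_actions))
    q_net, target_net = make_net(), make_net()
    target_net.load_state_dict(q_net.state_dict())
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    buffer, step = deque(maxlen=buffer_size), 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:      # epsilon-greedy exploration
                a = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    a = int(q_net(torch.as_tensor(s, dtype=torch.float32)).argmax())
            s_next, r, done = env.step(a)
            buffer.append((s, a, r, s_next, done))
            s, step = s_next, step + 1
            if len(buffer) >= batch_size:
                # experience replay breaks correlations between samples
                ss, aa, rr, ss2, dd = zip(*random.sample(buffer, batch_size))
                ss = torch.as_tensor(np.asarray(ss), dtype=torch.float32)
                ss2 = torch.as_tensor(np.asarray(ss2), dtype=torch.float32)
                aa = torch.as_tensor(aa)
                rr = torch.as_tensor(rr, dtype=torch.float32)
                dd = torch.as_tensor(dd, dtype=torch.float32)
                q = q_net(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    # bootstrap from the frozen target network for stability
                    tgt = rr + gamma * (1 - dd) * target_net(ss2).max(1).values
                loss = nn.functional.mse_loss(q, tgt)
                opt.zero_grad(); loss.backward(); opt.step()
            if step % sync_every == 0:         # periodic target sync
                target_net.load_state_dict(q_net.state_dict())
    return q_net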
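
For Dyna-Q, a tabular sketch that interleaves one real Q-learning update per step with n_planning simulated updates from a learned deterministic model; the environment interface is assumed:

import random
import numpy as np

def dyna_q(env, n_states, n_actions, episodes=200, alpha=0.1,
           gamma=0.95, epsilon=0.1, n_planning=10):
    Q = np.zeros((n_states, n_actions))
    model = {}                                 # (s, a) -> (r, s_next, done)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # direct RL: Q-learning update from real experience
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
            model[(s, a)] = (r, s_next, done)
            # planning: replay transitions sampled from the learned model
            for _ in range(n_planning):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps2]) * (not pdone) - Q[ps, pa])
            s = s_next
    return Q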

Additional reading

Morales: Grokking Deep Reinforcement Learning, 2020

Copyright © MM
Last updated 17.4.2024