Estimate the transition and reward functions from samples collected during exploration, then use these estimates to solve the MDP normally with value or policy iteration.
The agent generates an approximation of the transition function, T^(s,a,s′), by keeping counts of the number of times it arrives in each state s′ after entering each q-state (s,a).
Steps:
Step 1: Learn empirical MDP model
Count outcomes s′ for each q-state (s,a)
Normalize to give an estimate of T^(s,a,s′)
Discover each R^(s,a,s′) when we experience (s,a,s′)
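As a concrete sketch of this counting-and-normalization procedure, consider the following Python example. It assumes a simple tabular setting with hashable states and actions; the class and method names are illustrative, not taken from any particular library.

```python
from collections import defaultdict

class EmpiricalMDPModel:
    """Empirical MDP model: estimates T^ and R^ from observed transitions."""

    def __init__(self):
        # counts[(s, a)][s'] = number of times taking a in s led to s'
        self.counts = defaultdict(lambda: defaultdict(int))
        # rewards[(s, a, s')] = reward observed for that transition
        self.rewards = {}

    def record(self, s, a, s_next, r):
        """Record one experienced transition (s, a, s', r)."""
        self.counts[(s, a)][s_next] += 1
        self.rewards[(s, a, s_next)] = r

    def transition_prob(self, s, a, s_next):
        """Estimate T^(s, a, s') by normalizing the counts for q-state (s, a)."""
        total = sum(self.counts[(s, a)].values())
        if total == 0:
            return 0.0  # q-state (s, a) has never been visited
        return self.counts[(s, a)][s_next] / total

    def reward(self, s, a, s_next):
        """Return R^(s, a, s'), discovered when (s, a, s') was experienced."""
        return self.rewards.get((s, a, s_next), 0.0)


# Example usage with hypothetical states and actions:
model = EmpiricalMDPModel()
model.record('A', 'east', 'B', -1.0)
model.record('A', 'east', 'B', -1.0)
model.record('A', 'east', 'C', -1.0)
print(model.transition_prob('A', 'east', 'B'))  # 2/3, since B was reached 2 of 3 times
```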
The agent can then generate the approximate transition function T^ upon request by normalizing the counts it has collected: dividing the count for each observed tuple (s,a,s′) by the sum of the counts for all instances where the agent was in q-state (s,a).
Normalizing the counts scales them so that they sum to one, allowing them to be interpreted as probabilities.
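For example, suppose the agent has entered q-state (s,a) ten times (hypothetical counts), arriving in state x six times and state y four times. Normalizing gives T^(s,a,x) = 6/10 = 0.6 and T^(s,a,y) = 4/10 = 0.4, which can be read directly as estimated transition probabilities.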