Statistics Seminars: Posterior weighted reinforcement learning with state uncertainty
6 March 2009 14:15 in PH30
Reinforcement learning models are, in essence, online algorithms that estimate the expected reward in each of a set of states by allocating observed rewards to states and calculating averages. It is generally assumed that a learner can unambiguously identify the state of nature. However, in any natural environment the state information is noisy, so the learner cannot be certain about the current state of nature. Under state uncertainty it is no longer obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a particular state of the environment. A new technique, posterior weighted reinforcement learning, is introduced. In this process the reinforcement learning updates are weighted according to the posterior state probabilities, calculated after observation of the reward. We show that this modified algorithm can converge to correct reward estimates, and show the procedure to be a variant of an online expectation-maximisation algorithm, allowing further analysis to be carried out.
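As a rough illustration of the idea in the abstract, the sketch below shows one posterior weighted update in Python. It is not the speaker's formulation, which the abstract does not spell out. The function names (posterior_weights, weighted_update), the Gaussian reward likelihood with known standard deviation, and the count-based step size are all assumptions made for this example.

```python
import math


def posterior_weights(prior, reward, estimates, reward_sd=1.0):
    """Posterior state probabilities after the reward is observed (Bayes' rule).

    prior     -- state probabilities from the noisy state information alone
    estimates -- current expected-reward estimate for each state
    Assumes (for illustration only) Gaussian rewards with known reward_sd.
    """
    # Likelihood of the observed reward under each state's current estimate.
    likes = [math.exp(-0.5 * ((reward - m) / reward_sd) ** 2) for m in estimates]
    unnorm = [p * l for p, l in zip(prior, likes)]
    total = sum(unnorm)
    if total == 0.0:
        # The reward is uninformative at machine precision: keep the prior.
        return list(prior)
    return [u / total for u in unnorm]


def weighted_update(estimates, counts, weights, reward):
    """Move each estimate toward the reward in proportion to its posterior weight.

    counts holds the accumulated weight per state, so each estimate is a
    weighted running average of the rewards attributed to that state.
    """
    for s, w in enumerate(weights):
        if w > 0.0:
            counts[s] += w
            estimates[s] += (w / counts[s]) * (reward - estimates[s])


# One step with two states that the noisy observation cannot tell apart.
estimates = [0.0, 5.0]   # current reward estimates per state
counts = [1.0, 1.0]      # pseudo-counts standing in for earlier experience
prior = [0.5, 0.5]       # state probabilities before the reward is seen
reward = 0.2

weights = posterior_weights(prior, reward, estimates)
weighted_update(estimates, counts, weights, reward)
```

Here the reward of 0.2 is far more plausible under the first state's estimate, so nearly all of the update is allocated to that state and the second estimate barely moves. With weights fixed at 1 for the true state and 0 elsewhere, the same update reduces to the ordinary running average used when the state is known.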
Note the unusual time and location! To get to PH30, enter the physics building, take the first left (before the stairs), and at the end of the hall, turn right. PH30 is behind the double doors in front of you.
Contact firstname.lastname@example.org for more information.