Department of Mathematical Sciences

Seminar Archives

Statistics Seminars: Posterior weighted reinforcement learning with state uncertainty

Presented by David Leslie, Bristol University

6 March 2009 14:15 in PH30

Reinforcement learning models are, in essence, online algorithms that estimate the expected reward in each of a set of states by allocating observed rewards to states and calculating averages. It is generally assumed that the learner can unambiguously identify the state of nature. In any natural environment, however, the state information is noisy, so the learner cannot be certain about the current state of nature. Under state uncertainty it is no longer obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a particular state of the environment. A new technique, posterior weighted reinforcement learning, is introduced. In this process the reinforcement learning updates are weighted according to the posterior state probabilities, calculated after observation of the reward. We show that this modified algorithm can converge to correct reward estimates, and that the procedure is a variant of an online expectation-maximisation algorithm, which allows further analysis to be carried out.
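As a rough illustration of the idea in the abstract, the following minimal Python sketch weights each running-average update by the posterior probability of each state given the observed reward. It is not taken from the talk: the two-state setup, the Gaussian reward model with known standard deviation, the binary state signal with known accuracy, and all function and parameter names (`posterior_weights`, `posterior_weighted_update`, `simulate`, `signal_accuracy`) are assumptions made purely for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_weights(prior, reward, estimates, sd):
    """Posterior state probabilities after seeing the reward (Bayes' rule).

    Assumption: rewards in state s are Gaussian around the current
    estimate for s, with a known standard deviation sd.
    """
    joint = [p * gaussian_pdf(reward, m, sd) for p, m in zip(prior, estimates)]
    total = sum(joint)
    if total == 0.0:
        # Reward is numerically impossible under every state: keep the prior.
        return list(prior)
    return [j / total for j in joint]


def posterior_weighted_update(estimates, counts, prior, reward, sd):
    """Update every state's reward estimate, weighted by its posterior."""
    weights = posterior_weights(prior, reward, estimates, sd)
    for s, w in enumerate(weights):
        counts[s] += w  # effective number of rewards credited to state s
        estimates[s] += (w / counts[s]) * (reward - estimates[s])
    return weights


def simulate(true_means=(0.0, 5.0), sd=1.0, signal_accuracy=0.8,
             steps=20000, seed=0):
    """Hypothetical two-state environment with a noisy state signal."""
    rng = random.Random(seed)
    estimates = [1.0, 4.0]  # arbitrary, distinct initial guesses
    counts = [1.0, 1.0]
    for _ in range(steps):
        state = rng.randrange(2)
        # The learner only sees a signal that matches the state with
        # probability signal_accuracy; this gives the prior over states.
        signal = state if rng.random() < signal_accuracy else 1 - state
        prior = [signal_accuracy if s == signal else 1.0 - signal_accuracy
                 for s in range(2)]
        reward = rng.gauss(true_means[state], sd)
        posterior_weighted_update(estimates, counts, prior, reward, sd)
    return estimates


if __name__ == "__main__":
    print(simulate())
```

With a step size of the posterior weight divided by the accumulated weight, each estimate is a posterior-weighted running average of the rewards, which is the sense in which a scheme like this resembles an online expectation-maximisation update for a mixture model. Other step-size choices are possible, and the talk's own formulation may differ.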

Note the unusual time and location! To get to PH30, enter the physics building, take the first left (before the stairs), and at the end of the hall, turn right. PH30 is behind the double doors in front of you.

Contact sunil.chhita@durham.ac.uk for more information