Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor

Thomas Dueholm Hansen, Peter Bro Miltersen, Uri Zwick

Publication: Contribution to book/anthology/report/proceedings › Conference contribution in proceedings › Research › peer-reviewed

Abstract

Ye showed recently that the simplex method with Dantzig's pivoting rule, as well as Howard's policy iteration algorithm, solve discounted Markov decision processes (MDPs) with a constant discount factor in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most O(mn/(1−γ) · log(n/(1−γ))) iterations, where n is the number of states, m is the total number of actions in the MDP, and 0 < γ < 1 is the discount factor. We improve Ye's analysis in two respects. First, we improve the bound given by Ye and show that Howard's policy iteration algorithm actually terminates after at most O(m/(1−γ) · log(n/(1−γ))) iterations. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration (or strategy improvement) algorithm, a generalization of Howard's policy iteration algorithm used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provides the first strongly polynomial algorithm for solving these games, resolving a long-standing open problem.
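To make the algorithm under analysis concrete, here is a minimal sketch of Howard's policy iteration on a toy discounted MDP. The 2-state MDP, its rewards, and all variable names are hypothetical illustrations, not taken from the paper; only the algorithm's overall structure (evaluate the current policy exactly, then greedily improve it in every state until no switch occurs) follows the standard method whose iteration count the paper bounds.

```python
import numpy as np

gamma = 0.9  # constant discount factor, 0 < gamma < 1

# Hypothetical toy MDP: actions[s] is a list of (reward, transition
# probability vector) pairs available in state s.
actions = {
    0: [(0.0, np.array([1.0, 0.0])),   # stay in state 0, no reward
        (1.0, np.array([0.0, 1.0]))],  # move to state 1, reward 1
    1: [(2.0, np.array([0.5, 0.5]))],  # single action: reward 2, 50/50 split
}
n = len(actions)

def evaluate(policy):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    P = np.array([actions[s][policy[s]][1] for s in range(n)])
    r = np.array([actions[s][policy[s]][0] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P, r)

def policy_iteration():
    policy = [0] * n
    while True:
        v = evaluate(policy)
        # Greedy improvement: pick the best one-step lookahead action
        # in every state simultaneously (Howard's "all switches" rule).
        new = [max(range(len(actions[s])),
                   key=lambda a: actions[s][a][0] + gamma * (actions[s][a][1] @ v))
               for s in range(n)]
        if new == policy:       # no state wants to switch: policy is optimal
            return policy, v
        policy = new

policy, v = policy_iteration()
```

For this toy instance the loop stabilizes on the policy that leaves state 0 immediately, since the discounted stream of reward-2 steps from state 1 dominates staying put for reward 0. The paper's result is that the number of such improvement rounds is strongly polynomially bounded when γ is a constant.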
Original language: English
Title: Proceedings of the Second Symposium on Innovations in Computer Science
Number of pages: 11
Publisher: Tsinghua University Press, Beijing
Publication date: 2011
Pages: 253-263
ISBN (print): 978-7-302-24517-9
Status: Published - 2011
Event: Innovations in Computer Science, ICS 2011 - Beijing, China
Duration: 6 Jan 2011 - 9 Jan 2011

Conference

Conference: Innovations in Computer Science, ICS 2011
Country/Territory: China
City: Beijing
Period: 06/01/2011 - 09/01/2011

