A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes

Sanaa Chafik; Cherki Daoui

Research Article

A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes

by Sanaa Chafik, Cherki Daoui

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 133 - Issue 10

Published: January 2016

Authors: Sanaa Chafik, Cherki Daoui

10.5120/ijca2016908033

PDF

Sanaa Chafik, Cherki Daoui . A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes. International Journal of Computer Applications. 133, 10 (January 2016), 28-33. DOI=10.5120/ijca2016908033

                        @article{ 10.5120/ijca2016908033,
                        author  = { Sanaa Chafik,Cherki Daoui },
                        title   = { A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes },
                        journal = { International Journal of Computer Applications },
                        year    = { 2016 },
                        volume  = { 133 },
                        number  = { 10 },
                        pages   = { 28-33 },
                        doi     = { 10.5120/ijca2016908033 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }

                        %0 Journal Article
                        %D 2016
                        %A Sanaa Chafik
                        %A Cherki Daoui
                        %T A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes%T 
                        %J International Journal of Computer Applications
                        %V 133
                        %N 10
                        %P 28-33
                        %R 10.5120/ijca2016908033
                        %I Foundation of Computer Science (FCS), NY, USA

Abstract

The running time of the classical algorithms of the Markov Decision Process (MDP) typically grows linearly with the state space size, which makes them frequently intractable. This paper presents a Modified Policy Iteration algorithm to compute an optimal policy for large Markov decision processes in the discounted reward criteria and under infinite horizon. The idea of this algorithm is based on the topology of the problem; moreover, an Open Multi-Processing (Open-MP) programming model is applied to attain efficient parallel performance in solving the Modified algorithm.

References

Alagoz, O., Maillart, L. M., Schaefer, A. J., & Roberts, M. S. 2007. Determining the acceptance of cadaveric livers using an implicit model of the waiting list. Operations Research, 55(1), 24-36.
Bellman, R. 1957. Dynamic Programming. Princeton University Press.
Benjaafar, S., & ElHafsi, M. 2006. Production and inventory control of a single product assemble-to-order system with multiple customer classes.Management Science, 52(12), 1896-1912.
Chafik, S., & Daoui, C. 2015. A Modified Value Iteration Algorithm for Discounted Markov Decision Processes. Journal of Electronic Commerce in Organizations (JECO), 13(3), 47-57.
Daoui, C., Abbad, M., & Tkiouat, M. 2010. Exact decomposition approaches for Markov decision processes: A survey. Advances in Operations Research,2010.
Dean, T., & Lin, S. H. 1995. Decomposition techniques for planning in stochastic domains. In IJCAI (Vol. 2, p. 3).
Galoppo, N., Govindaraju, N. K., Henson, M., & Manocha, D. 2005. LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware. In Proceedings of the 2005 ACM/IEEE conference on Supercomputing (p. 3). IEEE Computer Society.
Goto, J. H. 1999. A Markov decision process model for airline meal provisioning (Doctoral dissertation, University of British Columbia).
Lewis, M. E., Ayhan, H., & Foley, R. D. 2002. Bias optimal admission control policies for a multiclass nonstationary queueing system. Journal of Applied Probability, 20-37.
Munos, R. Introduction à l’apprentissage par renforcement, from http://researchers.lille.inria.fr/ ∼ munos/master-mva/.
Papadimitriou, C. H., & Tsitsiklis, J. N. 1987. The complexity of Markov decision processes. Mathematics of operations research, 12(3), 441-450.
Pavitsos, A., & Kyriakidis, E. G. 2009. Markov decision models for the optimal maintenance of a production unit with an upstream buffer. Computers & Operations Research, 36(6), 1993-2006.
PDMIA, G. (2008). Processus décisionnels de Markov en intelligence artificielle. Edité par Olivier Buffet et Olivier Sigaud, 1.
Preux, P. (2008). Apprentissage par renforcement, GRAPA.
Puterman, M. L. (2014). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.
Sato, M. (2002, October). OpenMP: parallel programming API for shared memory multiprocessors and on-chip multiprocessors. In Proceedings of the 15th international symposium on System Synthesis (pp. 109-111). ACM.

Index Terms

Computer Science

Information Sciences

No index terms available.

Keywords

Markov Decision Processe Discounted reward criterion Policy Iteration algorithm Open Multi-Processing shared memory Parallelizing.