Stochastic Optimal Control: The Discrete Time Case

Introduction to Operations Research
By Joseph G. Ecker, Michael Kupferschmid
Although this textbook is intended for use in a two-semester sequence of courses introducing the mathematical methods of operations research, Part I can also be used alone for a one-semester course on linear programming.
Heuristic Nonserial Dynamic Programming for Large Problems
By John S. Gero, Michael A. Rosenman
Dynamic programming is an extremely powerful optimization approach used for the solution of problems which can be formulated to exhibit a serial stage-state structure.
Rollout, Policy Iteration, and Distributed Reinforcement Learning: Ce Lüe Qian Zhan, Ce Lüe Die Dai Yu Fen Bu Shi Qiang...
By Dimitri P. Bertsekas