Citation: De-Rong Liu, Hong-Liang Li, Ding Wang. Feature Selection and Feature Learning for High-dimensional Batch Reinforcement Learning: A Survey. International Journal of Automation and Computing, vol. 12, no. 3, pp. 229-242, 2015. https://doi.org/10.1007/s11633-015-0893-y