David Silver - The Predictron: End-To-End Learning and Planning (2016)
Created: June 23, 2017 / Updated: November 2, 2024 / Status: finished / 1 min read (~126 words)
- The predictron is composed of four main components:
  - A state representation s = f(s) that encodes raw input s into an internal state (the paper writes the internal state in bold to distinguish it from the raw input)
  - A model s′, r, γ = m(s, β) that maps from internal state s to subsequent internal state s′, internal reward r, and internal discount γ
  - A value function v that outputs internal values v = v(s), representing the future internal return from internal state s onwards
  - An accumulator that combines internal rewards, discounts, and values into an overall estimate of value g
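How the four components fit together can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the components are toy scalar functions rather than learned networks, the names `k_step_preturn`, `toy_f`, `toy_m`, and `toy_v` are hypothetical, the model's β argument is omitted because these notes do not define it, and the accumulator is assumed to take the nested k-step form g = r₁ + γ₁(r₂ + γ₂(... + γₖ v)).

```python
from typing import Callable, Tuple

State = float  # toy internal state; a real predictron would use a learned vector


def k_step_preturn(
    raw_input: float,
    f: Callable[[float], State],
    m: Callable[[State], Tuple[State, float, float]],
    v: Callable[[State], float],
    k: int,
) -> float:
    """Roll the internal model forward k steps and accumulate a value estimate g."""
    s = f(raw_input)  # state representation: raw input -> internal state
    rewards, discounts = [], []
    for _ in range(k):
        s, r, gamma = m(s)  # model: next internal state, internal reward, internal discount
        rewards.append(r)
        discounts.append(gamma)
    g = v(s)  # value function applied to the final internal state
    # Accumulator (assumed nested form): fold rewards and discounts from the last step back.
    for r, gamma in zip(reversed(rewards), reversed(discounts)):
        g = r + gamma * g
    return g


# Hypothetical toy components, chosen only so the arithmetic is easy to follow.
def toy_f(x: float) -> State:
    return x * 2.0


def toy_m(s: State) -> Tuple[State, float, float]:
    return s + 1.0, 1.0, 0.5  # next state, reward 1.0, discount 0.5


def toy_v(s: State) -> float:
    return s


g = k_step_preturn(1.0, toy_f, toy_m, toy_v, k=2)
print(g)  # 1.0 + 0.5 * (1.0 + 0.5 * 4.0) = 2.5
```

With k = 0 the sketch reduces to v(f(s)), the value of the encoded input with no model rollout. Larger k pushes more of the estimate through the internal model's rewards and discounts.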
- Silver, David, et al. "The predictron: End-to-end learning and planning." arXiv preprint arXiv:1612.08810 (2016).