David Silver - The Predictron: End-To-End Learning and Planning (2016)

Created: June 23, 2017 / Updated: November 2, 2024 / Status: finished / 1 min read (~126 words)
Machine learning

The predictron is composed of four main components:
- A state representation $\textbf{s} = f(s)$ that encodes raw input $s$
- A model $\textbf{s}', \textbf{r}, \boldsymbol{\gamma} = m(\textbf{s}, \beta)$ that maps from internal state $\textbf{s}$ to subsequent internal state $\textbf{s}'$, internal reward $\textbf{r}$, and internal discount $\boldsymbol{\gamma}$
- A value function $v$ that outputs internal values $\textbf{v} = v(\textbf{s})$ representing the future internal return from internal state $\textbf{s}$ onwards
- An accumulator that combines internal rewards, discounts, and values into an overall estimate of value $\textbf{g}$
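
To make the interplay of the four components concrete, here is a minimal sketch of how they could fit together. It assumes a k-step discounted accumulator, $g = r^1 + \gamma^1 (r^2 + \gamma^2 (\dots + \gamma^k v^k))$, which is one of the accumulator variants described in the paper. For readability it uses scalar rewards, discounts, and values, and it drops the $\beta$ argument of the model. The names `k_step_accumulator` and `predictron_rollout`, and the callables `f`, `m`, and `v` passed in, are illustrative placeholders rather than the paper's implementation.

```python
from typing import Callable, List, Tuple


def k_step_accumulator(rewards: List[float],
                       discounts: List[float],
                       final_value: float) -> float:
    """Fold internal rewards and discounts backward onto the final value.

    Computes g = r1 + g1 * (r2 + g2 * (... + gk * final_value)).
    With no steps (k = 0) the estimate is just final_value.
    """
    g = final_value
    for r, gamma in zip(reversed(rewards), reversed(discounts)):
        g = r + gamma * g
    return g


def predictron_rollout(raw_input: float,
                       f: Callable[[float], float],
                       m: Callable[[float], Tuple[float, float, float]],
                       v: Callable[[float], float],
                       k: int) -> float:
    """Unroll the model k internal steps and accumulate a value estimate.

    f: state representation, raw input -> internal state
    m: model, internal state -> (next state, reward, discount)
       (the beta argument is omitted in this sketch)
    v: value function, internal state -> internal value
    """
    s = f(raw_input)
    rewards, discounts = [], []
    for _ in range(k):
        s, r, gamma = m(s)
        rewards.append(r)
        discounts.append(gamma)
    return k_step_accumulator(rewards, discounts, v(s))
```

With toy stand-ins such as `f = lambda x: x`, `m = lambda s: (s + 1.0, 1.0, 0.5)`, and `v = lambda s: s`, calling `predictron_rollout(0.0, f, m, v, k=2)` unrolls two internal steps before bootstrapping from the value function. In the paper these components are learned networks trained end to end; the lambdas here only stand in for them.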

Silver, David, et al. "The predictron: End-to-end learning and planning." arXiv preprint arXiv:1612.08810 (2016).