% Latex Preamble %
% Fixes
\DeclareMathOperator*{\bigtimes}{\Huge\times} % Mathtools \bigtimes currently does not work in Obsidian?
% Absolute values and misc
\newcommand{\qed}[0]{\square}
\DeclarePairedDelimiters{\card}{\lvert}{\rvert}
\DeclarePairedDelimiters{\abs}{\lvert}{\rvert}
\DeclarePairedDelimiters{\norm}{\lVert}{\rVert}
\DeclarePairedDelimiters{\euclidnorm}{\lVert}{\rVert_{2}}
\DeclarePairedDelimiters{\sumnorm}{\lVert}{\rVert_{1}}
\DeclarePairedDelimiters{\maxnorm}{\lVert}{\rVert_{\infty}}
\DeclarePairedDelimiters{\ang}{\langle}{\rangle}
\newcommand{\nnnorm}[1]{\vert\mkern-1.5mu\vert\mkern-1.5mu\vert #1 \vert\mkern-1.5mu\vert\mkern-1.5mu\vert}%
\newcommand{\forbnorm}[1]{\norm{#1}_{F}}%
\newcommand{\opnorm}[1]{\norm{#1}_{op}}%
\newcommand{\complement}[1]{ #1^{c} }
\newcommand{\komplement}[1]{ \overline{#1}}
\DeclareMathOperator{\potenzmenge}{\mathcal{P}}
\newcommand{\scalemid}[0]{\,\middle|\,}
\newcommand{\sdots}[0]{\, \ldots\, } % Not the solution...
\newcommand{\cconj}[1]{\bar{#1}}
\newcommand{\qqquad}{\qquad\quad}
\newcommand{\qqqquad}{\qquad\qquad}
\newcommand{\Nu}{\mathcal{V}}
\newcommand{\set}{\leftarrow}
% row vector alternative for \vert
\newcommand{\hor}{\ \raise{0.5ex}{\rule{5mm}{0.48pt}} \ }
\newcommand{\lhor}{\!\hor\,}
\newcommand{\rhor}{\, \hor\!}
\newcommand{\rowvec}[1]{\lhor #1 \rhor}
\newcommand{\invs}[0]{^{-1}}
\newcommand{\invsa}[1]{#1^{-1}}
\DeclareMathOperator{\symdif}{\triangle}
\DeclareMathOperator{\Cupdot}{\dot{\bigcup}}
\DeclareMathOperator{\cupdot}{\dot{\cup}}
% Disjoint union
\DeclareMathOperator*{\disju}{\dot{\sum}}
\DeclareMathOperator*{\displu}{\dot{+}}
% Divides
\newcommand{\divides}[0]{\vert}
\newcommand{\ndivides}[0]{\not\hspace{0pt}\vert\ }
\newcommand{\slfrac}[2]{\left.#1\middle/#2\right.}
% Argmax
\newcommand{\argmax}{\mathrm{arg}\,\max}
\newcommand{\argmin}{\mathrm{arg}\,\min}
\DeclareMathOperator*{\arginf}{arg\,inf}
% Floor and Ceil
\DeclarePairedDelimiters{\floor}{\lfloor}{\rfloor}
\DeclarePairedDelimiters{\ceil}{\lceil}{\rceil}
% Graph theory
\DeclareMathOperator{\dist}{dist}
\DeclareMathOperator{\ex}{ex}
% Landau symbols
\DeclareMathOperator{\bigO}{\mathcal{O}}
% Linear algebra
\DeclareMathOperator{\bild}{Bild}
\DeclareMathOperator{\kern}{Kern}
\newcommand{\transpose}[1]{ #1^{\top} }
\newcommand{\adjung}[1]{ #1^{*} }
\DeclareMathOperator{\cond}{cond}
\DeclareMathOperator{\Rg}{Rg}
\DeclareMathOperator{\spur}{Spur}
\DeclareMathOperator{\cof}{Cof}
\DeclareMathOperator{\adj}{Adj}
\newcommand{\bigzero}[0]{\huge0}
\newcommand{\fatvec}[1]{\mathbf{#1}}
\DeclareMathOperator{\eigraum}{\mathnormal{E}}
\DeclareMathOperator{\diag}{diag}
\newcommand{\scalprod}[2]{\ang{#1, #2}}
\newcommand{\frobscalprod}[2]{\scalprod{#1}{#2}_{F}}
\DeclareMathOperator{\ortho}{\perp}
\DeclareMathOperator{\notortho}{\not\perp}
\DeclareMathOperator{\aff}{aff}
\DeclareMathOperator{\interior}{int}
\DeclareMathOperator{\relint}{rel int}
% Representation matrix
\newcommand{\dmat}[3]{ \prescript{}{\mathcal{#1}}{#2}_{\mathcal{#3}}}
% Analysis
\newcommand{\nderiv}[2]{#1^{(#2)}}
\newcommand{\newtonderiv}[2]{#1^{(#2)}}
\newcommand{\taylorpol}[2]{P_{#1, #2}}
\newcommand{\taylorrest}[2]{R_{#1, #2}}
\newcommand{\pktw}{\stackrel{ \text{pktw.} }{ \longrightarrow }}
\DeclareMathOperator{\id}{id}
\DeclareMathOperator{\grad}{Grad}
\newcommand{\ddiff}[2]{\,d#1d#2}
\newcommand{\diff}{\,d}
\newcommand{\measurediff}[1]{\,d #1} % Integral with respect to a measure
\newcommand{\brackint}[1]{\Big[#1\Big]}
\newcommand{\gradient}{\nabla}
\newcommand{\totdif}[1]{D #1}
\newcommand{\hessian}[1]{H_{#1}}
\DeclareMathOperator{\jacobian}{J}
\DeclareMathOperator{\logsumexp}{LogSumExp}
\newcommand{\restricted}[1]{|_{#1}}
% Optimization
\newcommand{\feasiset}[0]{\mathcal{F}} % Feasible Region/Set
\DeclareMathOperator{\lagrangian}{\mathcal{L}} % Feasible Region/Set
\DeclareMathOperator{\dualfunc}{\mathnormal{F}}
% Numerics
\DeclareMathOperator{\rd}{rd}
% Machine operations
\newcommand{\mplus}[0]{\oplus}
\newcommand{\mminus}[0]{\ominus}
\newcommand{\mtimes}[0]{\otimes}
\newcommand{\mdiv}[0]{\oslash}
% Number theory
\DeclareMathOperator{\ggt}{ggT}
\DeclareMathOperator{\kgv}{kgV}
% Logic
\DeclareMathOperator{\equi}{\approx}
\newcommand{\notimplies}[0]{\centernot\implies}
% Measure theory
% Ascending/descending with limit A
\DeclareMathOperator{\asc}{\uparrow}
\DeclareMathOperator{\desc}{\downarrow}
\newcommand{\indic}[1]{ \mathbb{1}_{#1} }
\newcommand{\indics}[1]{ \mathbb{1}\left\{ #1 \right\} }
\newcommand{\extreal}[0]{ \overline{ \mathbb{R} } }
\newcommand{\extborel}[0]{ \overline{ \mathcal{B} } }
\newcommand{\lebesgraum}[0]{ { \mathcal{L} } }
% Set systems
\newcommand{\setsys}[1]{ \mathcal{#1} }
% Probability theory / stochastics
\DeclareMathOperator{\prob}{\mathbb{P} }
\newcommand{\probz}[1]{ \prob^{#1} }
\DeclareMathOperator{\wmf}{\mathnormal{p}}
\newcommand{\density}[0]{f}
\DeclareMathOperator{\pex}{\mathbb{E} }
\newcommand{\pexp}[1]{ \pex_{#1} }
\newcommand{\pexn}[0]{ \pex\! }
\DeclareMathOperator{\pfunc}{F}
\newcommand{\pfuncx}[1]{ \pfunc^{#1}}
\DeclareMathOperator{\var}{\mathbb{V} }
\DeclareMathOperator{\std}{S}
\DeclareMathOperator{\cov}{Cov}
\DeclareMathOperator{\covmat}{\mathnormal{\Sigma}}
\DeclareMathOperator{\cor}{\rho}
\DeclareMathOperator{\retvar}{retVar} % retained variance
\newcommand{\uiv}[0]{\stackrel{ \text{u.i.v.} }{ \sim }}
\newcommand{\betafunc}[0]{B}
% Modes of stochastic convergence
\newcommand{\pto}[0]{\stackrel{\prob}{\to}}
\newcommand{\lpto}[1]{\stackrel{\mathcal{L}^{#1}}{\to}}
\newcommand{\fsto}[0]{\stackrel{\text{f.s.}}{\to}}
\newcommand{\dto}[0]{\stackrel{\mathcal{D}}{\to}}
\newcommand{\deq}[0]{\stackrel{\mathcal{D}}{=}}
% Combinatorics
\DeclareMathOperator{\Per}{Per}
\newcommand{\per}[2]{ \Per_{#2}^{#1} }
\DeclareMathOperator{\Kom}{Kom}
\newcommand{\kom}[2]{ \Kom_{#2}^{#1} }
\newcommand{\fallfacun}[2]{ (#1)_{#2} } % fall-fac-under
\newcommand{\fallfacup}[2]{ {#1}^{\underline{#2} } } % fall-fac-upper
\newcommand{\given}[0]{\mid}
% Discrete distributions
\DeclareMathOperator{\hyp}{Hyp}
\DeclareMathOperator{\bin}{Bin}
\DeclareMathOperator{\ber}{Ber}
\DeclareMathOperator{\nbin}{Nb}
\DeclareMathOperator{\pol}{Pol}
\DeclareMathOperator{\geo}{G}
\DeclareMathOperator{\poi}{Po}
\DeclareMathOperator{\mnom}{Mult}
% Continuous distributions
\DeclareMathOperator{\normdist}{\mathcal{N}}
\DeclareMathOperator{\uniform}{\mathcal{U}}
\DeclareMathOperator{\cauchydist}{\mathcal{C}}
\DeclareMathOperator{\expdist}{Exp}
\DeclareMathOperator{\weibull}{Wei}
\DeclareMathOperator{\gammadist}{\Gamma}
\DeclareMathOperator{\gammafunc}{\Gamma}
\newcommand{\chisqdist}[1]{\boldsymbol{\chi}^2_{#1}}
\DeclareMathOperator{\betadist}{BE}
\DeclareMathOperator{\studentt}{\tau}
\DeclareMathOperator{\lognorm}{LN}
% Statistics
\newcommand{\Chi}[0]{\mathcal{X}}
\DeclareMathOperator{\bias}{\mathbb{B} }
\DeclareMathOperator{\estimse}{\mathbb{F} } % Mean squared error (MSE of an estimator)
\DeclareMathOperator{\likelihood}{L}
\DeclareMathOperator{\loglikelihood}{\mathcal{L}}
\DeclareMathOperator{\posterior}{\pi}
\DeclareMathOperator{\intervalset}{\mathcal{I}}
\DeclareMathOperator{\bic}{BIC}
\DeclareMathOperator{\statprob}{P}
\DeclareMathOperator{\estimator}{\mathnormal{T}}
\DeclareMathOperator{\sampex}{\mathnormal{M}}
\DeclareMathOperator{\sampvar}{\mathnormal{V}}
\DeclareMathOperator{\corsampvar}{\sampvar^{*}}
\DeclareMathOperator{\sampstd}{\sigma}
\DeclareMathOperator{\confinterval}{\mathnormal{I}}
\newcommand{\hypo}[1]{ H_{#1}}
\DeclareMathOperator{\stattest}{\varphi}
\newcommand{\testpower}[1]{G_{#1}}
\newcommand{\typetwoerr}[1]{\beta_{#1}}
\newcommand{\lratio}[2]{\mathnormal{R}_{#1:#2}}
\newcommand{\arithmean}[1]{\overline{#1}}
\newcommand{\median}[1]{\tilde{#1}}
\DeclareMathOperator{\quantil}{Q}
% Time series
\newcommand{\stopro}[1]{(#1_{n})_{n\in\mathbb{N}_{0} }}
\newcommand{\mapro}[1]{\stopro{#1}}
\newcommand{\markovback}[1]{\sigma_{#1}}
\DeclareMathOperator{\periode}{\mathnormal{d}}
% Reinforcement Learning
\newcommand{\states}{S}
\newcommand{\startstate}{s^0}
\newcommand{\targetstates}{S^t}
\newcommand{\actions}{A}
\DeclareMathOperator{\transprob}{P}
\DeclareMathOperator{\reward}{R}
\DeclareMathOperator{\rew}{rew}
\DeclareMathOperator{\strat}{\pi}
\DeclareMathOperator{\totalreward}{U}
\newcommand{\generated}{\sim}
\newcommand{\initgenerated}{\generated_0}
\newcommand{\bellmann}{u}
\DeclareMathOperator{\vialgo}{VI}
\DeclareMathOperator{\pialgo}{PI}
% Machine learning
\newcommand{\linmod}[1]{h_{#1}}
\newcommand{\sigmoid}{h^{\text{logit} }}
\newcommand{\logisticmodel}[1]{\sigmoid_{#1}}
\newcommand{\logmod}[1]{\logisticmodel{#1}}
\newcommand{\svm}[1]{\operatorname{clf}_{#1}}
\newcommand{\knn}[1]{\operatorname{nearest}_{#1}}
\newcommand{\maj}[0]{\operatorname{maj}} % for majority
\newcommand{\bayes}[1]{\operatorname{clf}^{\text{Bayes}}_{#1}}
\newcommand{\naivebayes}[1]{\operatorname{clf}^{\text{NaiveBayes}}_{#1}}
\newcommand{\classifier}{\operatorname{clf}}
\newcommand{\clf}{\classifier}
\DeclareMathOperator{\kmeans}{KMEANS}
\DeclareMathOperator{\kmeanspp}{\kmeans\!++}
\DeclareMathOperator{\cluster}{cluster}
\DeclareMathOperator{\cl}{cl}
\DeclareMathOperator{\att}{att}
\DeclareMathOperator{\succ}{succ}
\DeclareMathOperator{\agglo}{Agglo}
\DeclareMathOperator{\diana}{DIANA}
\DeclareMathOperator{\selectfeat}{SelectFeature}
\DeclareMathOperator{\tdidt}{TDIDT}
\DeclareMathOperator{\entropy}{H}
\DeclareMathOperator{\ig}{IG}
\newcommand{\regressor}{\operatorname{regr}}
\newcommand{\regr}{\regressor}
\DeclareMathOperator{\loss}{L}
\DeclareMathOperator{\logitloss}{\loss^{\text{logit}}}
\DeclareMathOperator{\qfloss}{\loss^{\text{qF}}} % squared error
\DeclareMathOperator{\aeloss}{\loss^{\text{auto}}} % autoencoder error
\DeclareMathOperator{\inertia}{inertia}
\DeclareMathOperator{\acc}{acc}
\DeclareMathOperator{\prec}{prec}
\DeclareMathOperator{\rec}{rec}
\DeclareMathOperator{\fscore}{F1}
\DeclareMathOperator{\kernel}{K}
\newcommand{\tp}{\mathcal{TP}}
\newcommand{\tn}{\mathcal{TN}}
\newcommand{\fp}{\mathcal{FP}}
\newcommand{\fn}{\mathcal{FN}}
\newcommand{\overstar}[1]{#1^{*}}
\DeclareMathOperator{\determinationcoeff}{R^2}
\DeclareMathOperator{\regularizer}{\mathnormal{R}}
\newcommand{\tikhonov}{\regularizer_{T}}
\newcommand{\tikhonovloss}{\loss_{\regularizer_{T}}}
\newcommand{\hingeloss}{\loss^{\text{hinge}}}
\newcommand{\dag}[1]{#1^{\dagger}}
\newcommand{\assoc}{\Rightarrow}
\DeclareMathOperator{\support}{support}
\DeclareMathOperator{\conf}{conf}
\DeclareMathOperator{\freqitems}{FreqItems}
\DeclareMathOperator{\apriori}{Apriori}
\DeclareMathOperator{\fptree}{FPTree}
\newcommand{\minsupp}{\mathrm{minsupp}}
\newcommand{\minconf}{\mathrm{minconf}}
\newcommand{\null}{\mathit{null}}
\DeclareMathOperator{\mult}{mult}
\newcommand{\conditioned}[2]{#1 | #2}
\DeclareMathOperator{\fpattern}{\mathcal{fp}}
\DeclareMathOperator{\fpgrowth}{FPGrowth}
\DeclareMathOperator{\iqr}{IQR}
\DeclareMathOperator{\qfunc}{Q}
\DeclareMathOperator{\activation}{act}
\newcommand{\act}{\activation}
\DeclareMathOperator{\signum}{sgn}
\newcommand{\idact}{h^\text{id}}
\newcommand{\thresh}{h^\text{thresh}}
\newcommand{\identity}{h^\text{id}}
\newcommand{\relu}{h^\text{relu}}
\newcommand{\tanh}{h^\text{tanh}}
\newcommand{\softplus}{h^\text{splus}}
\newcommand{\splus}{\softplus}
\newcommand{\softmax}{h^\text{softmax}}
\newcommand{\nnlinears}[1]{z^{(#1)}} % vector of linear parts
\newcommand{\nnlinear}[2]{\nnlinears{#1}_{#2}} % linear part of neuron Wx
\newcommand{\nnneurons}[1]{a^{(#1)}} % Vector of Neurons
\newcommand{\nnneuron}[2]{\nnneurons{#1}_{#2}} % Particular Neuron
\newcommand{\allweights}{\mathbb{W}} % Set of all weights
\newcommand{\nnweights}[1]{W^{(#1)}} % Weightmatrix as a whole
\newcommand{\nnweight}[3]{\nnweights{#1}_{#2,#3}} % Specific Weight within Weightmatrix
\DeclareMathOperator{\cost}{cost}
\DeclareMathOperator{\backprop}{BackProp}
\DeclareMathOperator{\bpdelta}{\delta}
\newcommand{\nnlayerfunc}[1]{f_{#1}}
\DeclareMathOperator{\nnfunc}{\phi_\mathbb{W}}
\DeclareMathOperator{\count}{count}
\DeclareMathOperator{\sumf}{sum}
\newcommand{\featspace}{\mathbb{F}}
\newcommand{\featrep}[1]{R_{#1}}
\newcommand{\feathist}[1]{H_{#1}}
\DeclareMathOperator{\similarity}{sim}
\newcommand{\sigsim}[3]{ {\langle#1 ,#2\rangle}_{#3} }
\DeclareMathOperator{\tf}{tf}
\DeclareMathOperator{\idf}{idf}
\DeclareMathOperator{\tfidf}{\tf-\idf}
\DeclareMathOperator{\convop}{\star}
% Colors
\newcommand{\gray}[1]{ {\color{Grey} #1} }
\newcommand{\blue}[1]{ {\color{ProcessBlue} #1} }
\newcommand{\brown}[1]{ {\color{brown} #1} }
\newcommand{\cyan}[1]{ {\color{cyan} #1} }
\newcommand{\green}[1]{ {\color{green} #1} }
\newcommand{\magenta}[1]{ {\color{magenta} #1} }
\newcommand{\orange}[1]{ {\color{orange} #1} }
\newcommand{\pink}[1]{ {\color{pink} #1} }
\newcommand{\purple}[1]{ {\color{Purple} #1} }
\newcommand{\red}[1]{ {\color{BrickRed} #1} }
\newcommand{\violet}[1]{ {\color{violet} #1} }
% /Latex Preamble %
$$**

- `Properties`:
    - [[Reinforcement Learning/Theorem - Iterative Berechnung des Zustandsnutzens bezüglich einer Strategie konvergiert|Iterative computation of the state utility with respect to a policy converges]]
- `Constructs/Consequences`:
    - [[Reinforcement Learning/Algorithmus - Policy Iteration|Policy Iteration]]
- `Constructs`:
    - [[Reinforcement Learning/Algorithmus - Policy Iteration|Policy Iteration]]
- `Definitions involved`:
    - [[Reinforcement Learning/Definition - Zustandsnutzen bezüglich einer Strategie|State utility with respect to a policy]]
    - [[Reinforcement Learning/Theorem - Rekursive Charakterisierung des Zustandsnutzens bezüglich einer Strategie|Recursive characterization of the state utility with respect to a policy]]
    - [[Reinforcement Learning/Definition - Approximierter Zustandsnutzen bezüglich einer Strategie|Approximated state utility with respect to a policy]]
    - see also [[Reinforcement Learning/Theorem - Bellmann-Update|Iterative computation of the optimal state utility]]
- `Course`: [[Notizen/Einführung in Maschinelles Lernen|EML]]
- `Reference`: [[Notizen/@thimm2024]] (Section 4.1.3)

# ⠀

> [!Theorem] Definition: Iterative computation of the state utility with respect to a policy
>
> Let $D=(\states, \actions, \transprob, \reward, \startstate, \targetstates)$ be a Markov decision process.
>
> The state utility with respect to a policy $\strat$ can be computed iteratively via the Bellman update:
>
> $$
> \bellmann_{i+1}(s\given \strat) := \sum_{s'\in \states} \transprob(s,\strat(s),s')\cdot (\reward(s,\strat(s),s')+\gamma \bellmann_{i}(s'\given \strat))
> $$
>
> Here $\gamma$ denotes the discount factor.

# Remark

> [!TIP] Use for the approximated state utility with respect to a policy
>
> The same method can be applied analogously to the approximated state utility.
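
The Bellman update for a fixed policy can be sketched in a few lines of Python. This is a minimal illustration, not the reference's implementation: the function name `policy_evaluation`, the dictionary encoding of $\transprob$ and $\reward$, the start value $u_0 = 0$, the fixed iteration count, and the two-state example process are all assumptions made for this sketch.

```python
def policy_evaluation(states, policy, P, R, gamma, iterations):
    """Apply the Bellman update for a fixed policy `iterations` times.

    P and R map (s, a, s_next) triples to a transition probability and a
    reward; missing triples count as 0 (an assumption of this sketch).
    """
    # Assumed start value: u_0(s) = 0 for every state.
    u = {s: 0.0 for s in states}
    for _ in range(iterations):
        # Build u_{i+1} from u_i for all states at once (synchronous update).
        u = {
            s: sum(
                P.get((s, policy[s], s_next), 0.0)
                * (R.get((s, policy[s], s_next), 0.0) + gamma * u[s_next])
                for s_next in states
            )
            for s in states
        }
    return u


# Hypothetical two-state process: "a" moves to "b" (reward 1),
# "b" stays in "b" (reward 2).
states = ["a", "b"]
policy = {"a": "go", "b": "stay"}
P = {("a", "go", "b"): 1.0, ("b", "stay", "b"): 1.0}
R = {("a", "go", "b"): 1.0, ("b", "stay", "b"): 2.0}

u = policy_evaluation(states, policy, P, R, gamma=0.5, iterations=60)
print(u)
```

In this example the fixed point can be checked by hand: $u(b) = 2 + 0.5\,u(b)$ gives $u(b) = 4$, and $u(a) = 1 + 0.5 \cdot 4 = 3$. The iterates approach these values as the number of iterations grows.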