Radiation zone

A Bellman equation, also known as a dynamic programming equation, named after its discoverer, Richard Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those initial choices. This breaks a dynamic optimization problem into simpler subproblems, as Bellman's Principle of Optimality prescribes.

The Bellman equation was first applied to engineering control theory and to other topics in applied mathematics, and subsequently became an important tool in economic theory.

Almost any problem which can be solved using optimal control theory can also be solved by analyzing the appropriate Bellman equation. However, the term 'Bellman equation' usually refers to the dynamic programming equation associated with discrete-time optimization problems. In continuous-time optimization problems, the analogous equation is a partial differential equation which is usually called the Hamilton–Jacobi–Bellman equation.

Analytical concepts in dynamic programming

To understand the Bellman equation, several underlying concepts must be understood. First, any optimization problem has some objective – minimizing travel time, minimizing cost, maximizing profits, maximizing utility, et cetera. The mathematical function that describes this objective is called the objective function.

Dynamic programming breaks a multi-period planning problem into simpler steps at different points in time. Therefore, it requires keeping track of how the decision situation is evolving over time. The information about the current situation which is needed to make a correct decision is called the state (See Bellman, 1957, Ch. III.2).^[1]^[2] For example, to decide how much to consume and spend at each point in time, people would need to know (among other things) their initial wealth. Therefore, wealth would be one of their state variables, but there would probably be others.

The variables chosen at any given point in time are often called the control variables. For example, given their current wealth, people might decide how much to consume now. Choosing the control variables now may be equivalent to choosing the next state; more generally, the next state is affected by other factors in addition to the current control. For example, in the simplest case, today's wealth (the state) and consumption (the control) might exactly determine tomorrow's wealth (the new state), though typically other factors will affect tomorrow's wealth too.

The dynamic programming approach describes the optimal plan by finding a rule that tells what the controls should be, given any possible value of the state. For example, if consumption (c) depends only on wealth (W), we would seek a rule $c (W)$ that gives consumption as a function of wealth. Such a rule, determining the controls as a function of the states, is called a policy function (See Bellman, 1957, Ch. III.2).^[1]

Finally, by definition, the optimal decision rule is the one that achieves the best possible value of the objective. For example, if someone chooses consumption, given wealth, in order to maximize happiness (assuming happiness H can be represented by a mathematical function, such as a utility function), then each level of wealth will be associated with some highest possible level of happiness, $H (W)$ . The best possible value of the objective, written as a function of the state, is called the value function.

Richard Bellman showed that a dynamic optimization problem in discrete time can be stated in a recursive, step-by-step form by writing down the relationship between the value function in one period and the value function in the next period. The relationship between these two value functions is called the Bellman equation.

Deriving the Bellman equation

A dynamic decision problem

Let the state at time $t$ be $x_{t}$ . For a decision that begins at time 0, we take as given the initial state $x_{0}$ . At any time, the set of possible actions depends on the current state; we can write this as $a_{t} \in Γ (x_{t})$ , where the action $a_{t}$ represents one or more control variables. We also assume that the state changes from $x$ to a new state $T (x, a)$ when action $a$ is taken, and that the current payoff from taking action $a$ in state $x$ is $F (x, a)$ . Finally, we assume impatience, represented by a discount factor $0 < β < 1$ .

Under these assumptions, an infinite-horizon decision problem takes the following form:

V (x_{0}) = \max_{\{a_{t}\}_{t = 0}^{\infty}} \sum_{t = 0}^{\infty} β^{t} F (x_{t}, a_{t}),

subject to the constraints

a_{t} \in Γ (x_{t}), x_{t + 1} = T (x_{t}, a_{t}), \forall t = 0, 1, 2, \dots

Notice that we have defined notation $V (x_{0})$ to represent the optimal value that can be obtained by maximizing this objective function subject to the assumed constraints. This function is the value function. It is a function of the initial state variable $x_{0}$ , since the best value obtainable depends on the initial situation.

Bellman's Principle of Optimality

The dynamic programming method breaks this decision problem into smaller subproblems. Richard Bellman's Principle of Optimality describes how to do this:

Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. (See Bellman, 1957, Chap. III.3.)^[1]^[2]^[3]

In computer science, a problem that can be broken apart like this is said to have optimal substructure. In the context of dynamic game theory, this principle is analogous to the concept of subgame perfect equilibrium, although what constitutes an optimal policy in this case is conditioned on the decision-maker's opponents choosing similarly optimal policies from their points of view.

As suggested by the Principle of Optimality, we will consider the first decision separately, setting aside all future decisions (we will start afresh from time 1 with the new state $x_{1}$ ). Collecting the future decisions in brackets on the right, the previous problem is equivalent to:

\max_{a_{0}} \{F (x_{0}, a_{0}) + β [\max_{\{a_{t}\}_{t = 1}^{\infty}} \sum_{t = 1}^{\infty} β^{t - 1} F (x_{t}, a_{t}) : a_{t} \in Γ (x_{t}), x_{t + 1} = T (x_{t}, a_{t}), \forall t \geq 1]\}

subject to the constraints

a_{0} \in Γ (x_{0}), x_{1} = T (x_{0}, a_{0}) .

Here we are choosing $a_{0}$ , knowing that our choice will cause the time 1 state to be $x_{1} = T (x_{0}, a_{0})$ . That new state will then affect the decision problem from time 1 on. The whole future decision problem appears inside the square brackets on the right.

The Bellman equation

So far it seems we have only made the problem uglier by separating today's decision from future decisions. But we can simplify by noticing that what is inside the square brackets on the right is the value of the time 1 decision problem, starting from state $x_{1} = T (x_{0}, a_{0})$ .

Therefore we can rewrite the problem as a recursive definition of the value function:

V (x_{0}) = \max_{a_{0}} \{F (x_{0}, a_{0}) + β V (x_{1})\}

subject to the constraints:

a_{0} \in Γ (x_{0}), x_{1} = T (x_{0}, a_{0}) .

This is the Bellman equation. It can be simplified even further if we drop time subscripts and plug in the value of the next state:

V (x) = \max_{a \in Γ (x)} \{F (x, a) + β V (T (x, a))\} .

The Bellman equation is classified as a functional equation, because solving it means finding the unknown function V, which is the value function. Recall that the value function describes the best possible value of the objective, as a function of the state x. By calculating the value function, we will also find the function a(x) that describes the optimal action as a function of the state; this is called the policy function.
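A minimal sketch of how this functional equation can be solved numerically when there are finitely many states: value iteration starts from a guess for V (here V = 0), substitutes it into the right-hand side, and repeats until the guess stops changing; the maximizing actions then give the policy function. Because 0 < β < 1 and the payoffs are bounded, the right-hand side is a contraction, so the iteration converges. The small problem used here (an integer "cake" of size x, with Γ(x) = {0, 1, ..., x}, F(x, a) = √a and T(x, a) = x − a) and the names `value_iteration`, `tol` and `max_iter` are illustrative assumptions, not part of the text above.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def value_iteration(
    states: List[Hashable],
    actions: Callable[[Hashable], Iterable],
    F: Callable[[Hashable, Hashable], float],
    T: Callable[[Hashable, Hashable], Hashable],
    beta: float,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> Tuple[Dict, Dict]:
    """Approximate the fixed point V(x) = max_{a in Γ(x)} {F(x, a) + beta * V(T(x, a))}.

    actions(x) plays the role of Γ(x); T(x, a) must return an element of states.
    """
    V = {x: 0.0 for x in states}  # initial guess V = 0
    for _ in range(max_iter):
        # Apply the right-hand side of the Bellman equation to the current guess.
        V_new = {
            x: max(F(x, a) + beta * V[T(x, a)] for a in actions(x)) for x in states
        }
        gap = max(abs(V_new[x] - V[x]) for x in states)
        V = V_new
        if gap < tol:  # successive guesses have (numerically) stopped changing
            break
    # Policy function: in each state, an action that attains the maximum.
    policy = {
        x: max(actions(x), key=lambda a: F(x, a) + beta * V[T(x, a)]) for x in states
    }
    return V, policy


# Hypothetical example: a "cake" of integer size x; eating a units pays sqrt(a).
states = list(range(11))
V, policy = value_iteration(
    states,
    actions=lambda x: range(x + 1),  # Γ(x) = {0, 1, ..., x}
    F=lambda x, a: math.sqrt(a),     # current payoff
    T=lambda x, a: x - a,            # next state: what is left of the cake
    beta=0.9,
)
print(policy)
```

Storing V as a dictionary keyed by state keeps the sketch close to the notation V(x); for large state spaces an array indexed by state would be the more usual choice.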

The Bellman equation in a stochastic problem

In the deterministic setting, other techniques besides dynamic programming can be used to tackle the above optimal control problem. However, the Bellman equation is often the most convenient method of solving stochastic optimal control problems.

For a specific example from economics, consider an infinitely-lived consumer with initial wealth endowment a₀ at period 0. He has an instantaneous utility function u(c), where c denotes consumption, and discounts next-period utility by a factor β, where 0 < β < 1. Assume that what is not consumed in period t carries over to the next period with interest rate r. Then the consumer's utility maximization problem is to choose a consumption plan {c_t} that solves

\max_{\{c_{t}\}} \sum_{t = 0}^{\infty} β^{t} u (c_{t})

subject to

a_{t + 1} = (1 + r) (a_{t} - c_{t}), c_{t} \geq 0,

and

\lim_{t \to \infty} a_{t} \geq 0 .

The first constraint is the capital accumulation/law of motion specified by the problem, while the second constraint is a transversality condition requiring that the consumer not be left in debt in the limit. The Bellman equation is

V (a) = \max_{0 \leq c \leq a} \{u (c) + β V ((1 + r) (a - c))\} .

Alternatively, one can treat the sequence problem directly using, for example, the Hamiltonian equations.
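A rough numerical sketch of this Bellman equation, under assumptions that are not in the original problem: log utility u(c) = log c, illustrative values β = 0.9 and r = 0.05, and next-period wealth a' = (1 + r)(a − c) restricted to an evenly spaced grid, so that picking a' on the grid is the same as picking c = a − a'/(1 + r). The lowest grid point acts as a crude borrowing limit standing in for the transversality condition. For log utility the exact policy of this problem is c = (1 − β)a, which gives a check on the grid approximation. The function `solve_consumption` and its parameters are hypothetical.

```python
import math
from typing import List, Tuple


def solve_consumption(
    beta: float = 0.9,
    r: float = 0.05,
    a_min: float = 0.1,
    a_max: float = 10.0,
    n: int = 150,
    tol: float = 1e-6,
    max_iter: int = 5_000,
) -> Tuple[List[float], List[float]]:
    """Value iteration for V(a) = max_c {log c + beta * V((1 + r)(a - c))}.

    Next-period wealth a' is restricted to an evenly spaced grid, so choosing
    a' = grid[j] is the same as choosing c = a - grid[j] / (1 + r).
    """
    grid = [a_min + (a_max - a_min) * i / (n - 1) for i in range(n)]
    # util[i][j]: log utility of moving from wealth grid[i] to grid[j];
    # -inf marks choices that would need c <= 0.
    util = []
    for a in grid:
        row = []
        for a_next in grid:
            c = a - a_next / (1 + r)
            row.append(math.log(c) if c > 0 else -math.inf)
        util.append(row)

    V = [0.0] * n
    for _ in range(max_iter):
        # Bellman update on the grid: best utility now plus discounted V(a').
        V_new = [max(u + beta * v for u, v in zip(util[i], V)) for i in range(n)]
        gap = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if gap < tol:
            break

    # Policy function: consumption implied by the maximizing next-period wealth.
    consumption = []
    for i, a in enumerate(grid):
        j = max(range(n), key=lambda k: util[i][k] + beta * V[k])
        consumption.append(a - grid[j] / (1 + r))
    return grid, consumption


grid, c = solve_consumption()
i = min(range(len(grid)), key=lambda k: abs(grid[k] - 5.0))
print(grid[i], c[i], (1 - 0.9) * grid[i])  # grid policy vs. exact (1 - beta) * a
```

The grid policy only approximates (1 − β)a, with an error on the order of the grid spacing; a finer grid or interpolation of V between grid points would reduce it at extra cost.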

Now, if the interest rate varies from period to period, the consumer is faced with a stochastic optimization problem. Let the interest rate r follow a Markov process with probability transition function Q(r, dμ_r), where dμ_r denotes the probability measure governing the distribution of next period's interest rate if the current interest rate is r. The timing of the model is that the consumer decides his current-period consumption after the current-period interest rate is announced.

Rather than simply choosing a single sequence {c_t}, the consumer now must choose a sequence {c_t} for each possible realization of {r_t} in such a way that his lifetime expected utility is maximized:

\max E (\sum_{t = 0}^{\infty} β^{t} u (c_{t})) .

The expectation E is taken with respect to the appropriate probability measure given by Q on the sequences of r's. Because r is governed by a Markov process, dynamic programming simplifies the problem significantly. The Bellman equation is then simply

V (a, r) = \max_{0 \leq c \leq a} \{u (c) + β \int V ((1 + r) (a - c), r^{'}) Q (r, d μ_{r})\} .

Under some reasonable assumptions, the resulting optimal policy function g(a, r) is measurable.
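A sketch of the stochastic equation on a wealth grid, assuming (for illustration only) that r takes finitely many values and that Q is a transition matrix, so the integral against Q(r, dμ_r) becomes a probability-weighted sum over next period's rates. With log utility (again an assumption) one can show that the optimal rule in this model is still c = (1 − β)a for every r, because the logarithm separates the return factor 1 + r from a − c; the sketch uses that as a sanity check. The rates, the matrix `Q` and the function name `solve_stochastic_consumption` are hypothetical.

```python
import math
from typing import List, Tuple


def solve_stochastic_consumption(
    rates: List[float],
    Q: List[List[float]],
    beta: float = 0.9,
    a_min: float = 0.1,
    a_max: float = 10.0,
    n: int = 150,
    tol: float = 1e-6,
    max_iter: int = 5_000,
) -> Tuple[List[float], List[List[float]]]:
    """Value iteration for V(a, r) when r follows a finite Markov chain.

    Q[s][k] is the probability that next period's rate is rates[k] given that
    the current rate is rates[s] (each row sums to 1). Returns the wealth grid
    and policy[i][s], the consumption chosen at wealth grid[i] and rate rates[s].
    """
    grid = [a_min + (a_max - a_min) * i / (n - 1) for i in range(n)]
    m = len(rates)

    def log_or_neg_inf(c: float) -> float:
        return math.log(c) if c > 0 else -math.inf

    # util[s][i][j]: utility when the current rate is rates[s] and wealth moves
    # from grid[i] to grid[j], i.e. c = a - a' / (1 + r) with r already known.
    util = [
        [[log_or_neg_inf(a - a_next / (1 + r)) for a_next in grid] for a in grid]
        for r in rates
    ]

    def expected_next(V: List[List[float]]) -> List[List[float]]:
        # EV[s][j] = sum_k Q[s][k] * V[j][k]: the integral of V(a', r') against Q(r, dμ_r).
        return [[sum(Q[s][k] * V[j][k] for k in range(m)) for j in range(n)] for s in range(m)]

    V = [[0.0] * m for _ in range(n)]  # V[i][s] approximates V(grid[i], rates[s])
    for _ in range(max_iter):
        EV = expected_next(V)
        V_new = [
            [max(u + beta * ev for u, ev in zip(util[s][i], EV[s])) for s in range(m)]
            for i in range(n)
        ]
        gap = max(abs(V_new[i][s] - V[i][s]) for i in range(n) for s in range(m))
        V = V_new
        if gap < tol:
            break

    # Policy function g(a, r): consumption implied by the maximizing a'.
    EV = expected_next(V)
    policy = [[0.0] * m for _ in range(n)]
    for s, r in enumerate(rates):
        for i, a in enumerate(grid):
            j = max(range(n), key=lambda k: util[s][i][k] + beta * EV[s][k])
            policy[i][s] = a - grid[j] / (1 + r)
    return grid, policy


# Hypothetical two-state interest-rate process.
rates = [0.02, 0.08]
Q = [[0.9, 0.1],
     [0.2, 0.8]]
grid, policy = solve_stochastic_consumption(rates, Q)
```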

For a general stochastic sequential optimization problem with Markovian shocks, where the agent makes his decision after observing the current shock (ex post), the Bellman equation takes a very similar form:

V (x, z) = \max_{c \in Γ (x, z)} \{F (x, c, z) + β \int V (T (x, c), z^{'}) d μ_{z} (z^{'})\} .

Solution methods

The method of undetermined coefficients, also known as 'guess and verify', can be used to solve some infinite-horizon, autonomous Bellman equations.
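As an illustration of guess and verify, take the deterministic consumption problem above under the additional assumption (made only for this example) of log utility, u(c) = log c. Guess that the value function has the form V(a) = A + B log a for unknown constants A and B, and check whether the guess can satisfy the Bellman equation:

```latex
% Substitute the guess into the right-hand side of the Bellman equation:
V(a) = \max_{0 \leq c \leq a} \{ \log c + \beta [ A + B \log((1 + r)(a - c)) ] \}

% First-order condition in c:
\frac{1}{c} = \frac{\beta B}{a - c} \quad \Longrightarrow \quad c = \frac{a}{1 + \beta B}

% Substituting c back in, the coefficient on \log a on the right-hand side is 1 + \beta B.
% The guess is verified if this equals the coefficient B on the left-hand side:
B = 1 + \beta B \quad \Longrightarrow \quad B = \frac{1}{1 - \beta},
\qquad c(a) = (1 - \beta)\, a
```

The constant A is then pinned down by matching the remaining constant terms. Under this guess the consumer spends the fixed fraction 1 − β of wealth each period, independent of r.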

The Bellman equation can be solved by backwards induction, either analytically in a few special cases, or numerically on a computer. Numerical backwards induction is applicable to a wide variety of problems, but may be infeasible when there are many state variables, due to the curse of dimensionality. Approximate dynamic programming was introduced by D. P. Bertsekas and J. N. Tsitsiklis, who used artificial neural networks (multilayer perceptrons) to approximate the Bellman function.^[4] This can mitigate the curse of dimensionality, because only the parameters of the neural network need to be stored, rather than the value of the function at every point of the state space.
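A minimal sketch of numerical backwards induction for a finite-horizon problem, assuming a hypothetical setting in which a consumer eats a ≤ x units of an integer "cake" of size x each period, with payoff √a, discount factor β and a horizon of `horizon` periods. Since nothing can be gained after the last period, V is zero there, and each earlier value function follows from the Bellman equation once the next one is known. The table V[t] holds one number per state; with k state variables, each discretized to m values, it would need m^k entries, which is the curse of dimensionality mentioned above. The function name `backward_induction` is illustrative.

```python
import math
from typing import List, Tuple


def backward_induction(size: int, horizon: int, beta: float) -> Tuple[List[List[float]], List[List[int]]]:
    """Finite-horizon Bellman recursion for the hypothetical integer-cake problem.

    V[t][x] is the best discounted payoff from period t on with x units left;
    V[horizon][x] = 0 because nothing can be gained after the last period.
    """
    V = [[0.0] * (size + 1) for _ in range(horizon + 1)]
    policy = [[0] * (size + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):  # work backwards from the last period
        for x in range(size + 1):
            # V[t + 1] is already known, so the Bellman equation gives V[t] directly.
            best = max(range(x + 1), key=lambda a: math.sqrt(a) + beta * V[t + 1][x - a])
            policy[t][x] = best
            V[t][x] = math.sqrt(best) + beta * V[t + 1][x - best]
    return V, policy


V, policy = backward_induction(size=10, horizon=5, beta=0.9)
print(policy[0])  # amount eaten in period 0 for each starting size 0..10
```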

By calculating the first-order conditions associated with the Bellman equation, and then using the envelope theorem to eliminate the derivatives of the value function, it is possible to obtain a system of difference equations or differential equations called the 'Euler equations'. Standard techniques for the solution of difference or differential equations can then be used to calculate the dynamics of the state variables and the control variables of the optimization problem.
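For the deterministic consumption problem above, these steps can be sketched as follows, assuming an interior optimum (0 < c < a), a differentiable value function, and writing a' = (1 + r)(a − c) for next-period wealth (notation introduced here):

```latex
% Bellman equation, with a' = (1 + r)(a - c):
V(a) = \max_{0 \leq c \leq a} \{ u(c) + \beta V((1 + r)(a - c)) \}

% First-order condition for an interior optimum:
u'(c) = \beta (1 + r) V'(a')

% Envelope theorem (differentiate with respect to a, holding the optimal c fixed):
V'(a) = \beta (1 + r) V'(a') = u'(c)

% Hence V'(a') = u'(c'); substituting into the first-order condition gives the Euler equation
u'(c_t) = \beta (1 + r)\, u'(c_{t + 1})

% With u(c) = \log c this becomes c_{t + 1} = \beta (1 + r)\, c_t .
```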

Applications in economics

The first known application of a Bellman equation in economics is due to Martin Beckmann and Richard Muth.^[5] Martin Beckmann also wrote extensively on consumption theory using the Bellman equation in 1959. His work influenced Edmund S. Phelps, among others.

A celebrated economic application of a Bellman equation is Merton's seminal 1973 article on the intertemporal capital asset pricing model.^[6] (See also Merton's portfolio problem.) The solution to Merton's theoretical model, one in which investors choose between income today and future income or capital gains, is a form of Bellman's equation. Because economic applications of dynamic programming usually result in a Bellman equation that is a difference equation, economists refer to dynamic programming as a "recursive method", and recursive economics is now recognized as a subfield of economics.

Stokey, Lucas & Prescott describe stochastic and nonstochastic dynamic programming in considerable detail, and develop theorems for the existence of solutions to problems meeting certain conditions. They also describe many examples of modeling theoretical problems in economics using recursive methods.^[7] This book led to dynamic programming being employed to solve a wide range of theoretical problems in economics, including optimal economic growth, resource extraction, principal–agent problems, public finance, business investment, asset pricing, factor supply, and industrial organization. Ljungqvist & Sargent apply dynamic programming to study a variety of theoretical questions in monetary policy, fiscal policy, taxation, economic growth, search theory, and labor economics.^[8] Dixit & Pindyck showed the value of the method for thinking about capital budgeting.^[9] Anderson adapted the technique to business valuation, including privately held businesses.^[10]

Using dynamic programming to solve concrete problems is complicated by informational difficulties, such as choosing the unobservable discount rate. There are also computational issues, the main one being the curse of dimensionality arising from the vast number of possible actions and potential state variables that must be considered before an optimal strategy can be selected. For an extensive discussion of computational issues, see Miranda & Fackler,^[11] and Meyn 2007.^[12]

Example

In a Markov decision process (MDP), a Bellman equation refers to a recursion for expected rewards. For example, the expected reward for being in a particular state s and following some fixed policy $π$ has the Bellman equation:

V^{π} (s) = R (s) + γ \sum_{s^{'}} P (s^{'} | s, π (s)) V^{π} (s^{'}) .

This equation describes the expected reward for taking the action prescribed by some policy $π$ .
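For a finite MDP, this equation can be solved by iterating it to its fixed point (it is linear in V^π, so solving the linear system directly would also work). The sketch below does this for a small hypothetical two-state MDP: reward 1 in state 0 and 0 in state 1, action 0 keeps the current state, and action 1 moves to the other state with probability 0.8. The rewards, the transition probabilities, the layout P[s][a][s'] = P(s' | s, a) and the name `evaluate_policy` are assumptions made for illustration.

```python
from typing import List


def evaluate_policy(
    R: List[float],
    P: List[List[List[float]]],
    policy: List[int],
    gamma: float,
    tol: float = 1e-12,
    max_iter: int = 100_000,
) -> List[float]:
    """Iterate V(s) <- R(s) + gamma * sum_{s'} P(s' | s, pi(s)) * V(s') to its fixed point.

    P[s][a][s2] is the probability of moving from state s to state s2 under action a,
    and policy[s] is the action pi(s) prescribed in state s.
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iter):
        V_new = [
            R[s] + gamma * sum(P[s][policy[s]][s2] * V[s2] for s2 in range(n))
            for s in range(n)
        ]
        gap = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if gap < tol:
            break
    return V


# Hypothetical two-state MDP.
R = [1.0, 0.0]
P = [
    [[1.0, 0.0], [0.2, 0.8]],  # from state 0: action 0 stays, action 1 switches w.p. 0.8
    [[0.0, 1.0], [0.8, 0.2]],  # from state 1: action 0 stays, action 1 switches w.p. 0.8
]
print(evaluate_policy(R, P, policy=[0, 1], gamma=0.9))
```

For the policy [0, 1] and γ = 0.9 the exact values are V^π(0) = 1/(1 − 0.9) = 10 and V^π(1) = 0.9 · 0.8 · 10 / (1 − 0.9 · 0.2) = 7.2/0.82.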

The equation for the optimal policy is referred to as the Bellman optimality equation:

V^{*} (s) = R (s) + \max_{a} γ \sum_{s^{'}} P (s^{'} | s, a) V^{*} (s^{'}) .

It describes the reward for taking the action giving the highest expected return.
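A minimal sketch of value iteration for the Bellman optimality equation: it repeatedly applies the right-hand side and then reads off a greedy policy from the converged values. The two-state MDP (reward 1 in state 0 and 0 in state 1; action 0 stays put, action 1 switches state with probability 0.8) and the function name `value_iteration_mdp` are hypothetical. For γ = 0.9 the optimal values work out to V*(0) = 10 (stay in state 0 forever) and V*(1) = 7.2/0.82 ≈ 8.78 (try to switch to state 0).

```python
from typing import List, Tuple


def value_iteration_mdp(
    R: List[float],
    P: List[List[List[float]]],
    gamma: float,
    tol: float = 1e-12,
    max_iter: int = 100_000,
) -> Tuple[List[float], List[int]]:
    """Iterate V(s) <- R(s) + max_a gamma * sum_{s'} P(s' | s, a) * V(s').

    P[s][a][s2] is the probability of moving from s to s2 under action a.
    Returns the approximate V* and a greedy (optimal) action for each state.
    """
    n = len(R)

    def bellman_backup(s: int, a: int, V: List[float]) -> float:
        # Reward now plus discounted expected value of the next state under action a.
        return R[s] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n))

    V = [0.0] * n
    for _ in range(max_iter):
        V_new = [max(bellman_backup(s, a, V) for a in range(len(P[s]))) for s in range(n)]
        gap = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if gap < tol:
            break
    policy = [max(range(len(P[s])), key=lambda a: bellman_backup(s, a, V)) for s in range(n)]
    return V, policy


# Hypothetical two-state MDP.
R = [1.0, 0.0]
P = [
    [[1.0, 0.0], [0.2, 0.8]],  # from state 0: action 0 stays, action 1 switches w.p. 0.8
    [[0.0, 1.0], [0.8, 0.2]],  # from state 1: action 0 stays, action 1 switches w.p. 0.8
]
V_opt, pi = value_iteration_mdp(R, P, gamma=0.9)
print(V_opt, pi)
```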

References

[1] Bellman, R. E., 1957. Dynamic Programming. Princeton University Press, Princeton, NJ. Republished 2003: Dover, ISBN 0-486-42809-5.
[2] Dreyfus, S., 2002. "Richard Bellman on the birth of dynamic programming." Operations Research 50 (1), pp. 48–51.
[3] Bellman, R., 1952. "On the Theory of Dynamic Programming." Proceedings of the National Academy of Sciences.
[4] Bertsekas, D. P., & Tsitsiklis, J. N., 1996. Neuro-Dynamic Programming. Athena Scientific.
[5] Beckmann, M., & Muth, R., 1954. "On the solution to the fundamental equation of inventory theory." Cowles Commission Discussion Paper 2116.
[6] Merton, R. C., 1973. "An Intertemporal Capital Asset Pricing Model." Econometrica 41: 867–887.
[7] Stokey, N., & Lucas, R. E., with Prescott, E., 1989. Recursive Methods in Economic Dynamics. Harvard University Press.
[8] Ljungqvist, L., & Sargent, T., 2004. Recursive Macroeconomic Theory. MIT Press.
[9] Dixit, A., & Pindyck, R., 1994. Investment Under Uncertainty. Princeton University Press.
[10] Anderson, P. L., 2004. Business Economics & Finance. CRC Press (chapter 10), ISBN 1-58488-348-0; "The Value of Private Businesses in the United States." Business Economics (2009) 44, 87–108; Economics of Business Valuation. Stanford University Press (2013), ISBN 9780804758307.
[11] Miranda, M., & Fackler, P., 2002. Applied Computational Economics and Finance. MIT Press.
[12] Meyn, S. P., 2007. Control Techniques for Complex Networks. Cambridge University Press. Appendix contains abridged Meyn & Tweedie.
