The meat of yesterday’s class focused on one of the cases of policy optimization highlighted yesterday. We wanted to choose between an infinite set of actions (i.e., the optimization variable was a real-valued vector). We only wanted to make a single decision. And we wanted to do this using samples from a probability distribution rather than a model. This was “Case 4” in yesterday’s blog for those keeping track at home. Simple instances of these problems are solvable by stochastic gradient descent.

Eli Bronstein asked me a compelling question: if binary decision making reduced to hypothesis testing, did continuous decision-making reduce to some sort of prediction problem too? If we were solving these problems using stochastic gradient descent, did that mean that deep down they were machine learning problems?

I think the answer is no. Perhaps one of you, my dear readers, can see a reduction of this general problem to machine learning, and I would be very interested if this was the case! But today let me argue why stochastic optimization is more general through a few examples. I’ll propose three examples of policy optimization from Case 4 that push us away from problems that are simply about prediction. In each example, I will highlight the main components: the action to be taken, the data that the policymaker has on hand, and the random model of uncertainty.

I also want to tie my hands a bit. I only want to propose problems that can be solved by first-order methods like SGD. So these examples will be very simple. Moreover, I am going to restrict to problems with a single decision. This makes the examples a bit less compelling as well. As we’ll see next week, the problem space becomes much more expansive when we make multiple decisions sequentially.

**Portfolio optimization**

Portfolio optimization, formulated by Markowitz in 1952, is one of the first stochastic programming problems ever proposed. You have a bunch of money that you want to invest today and hope to sell at a future time. You’d like to balance your expected return against potential volatility. One way to do this is to sum the expected return with the expected volatility, using a parameter g that balances your risk tolerance:

```
maximize E[return of portfolio - g x (fluctuations of portfolio)]
subject to fixed budget constraint
```

This balances return and risk in one cost function. The return is a *linear* function of your allocation and the fluctuations will equal the variance. The policy here is the portfolio allocation, and the randomness is the natural fluctuation of asset prices.

If you know the mean and variance of w, this problem becomes a *quadratic program*, that can be plugged into a variety of optimization solvers. If you don’t know the mean and variance but have a simulator that you trust, you could run stochastic gradient descent to compute an allocation.

**Stocking with uncertain demand**

Let’s say you are running a shop and want to know how much of every item in your store you’d like to order for the week. The amount of each item sold will equal the minimum of the demand and the amount you have in the shop. You would like to have enough to satisfy demand, but not too many or else you’ll be stuck with excess stock. This problem can be formulated as the stochastic optimization

```
maximize E[sum of profits]
subject to (1) budget constraint
and (2) profit = sale price x minimum(demand,quantity) - cost x quantity
```

Here, the policy is the quantity of each item you purchase. The random variable is demand. The cost here is piecewise linear, which makes it a bit trickier to compute in closed form. But if you could sample from the demand model, you could once again minimize this using SGD.

**Fermat–Weber Location Problem**

For the final example, you want to place a wifi transmitter in a new office. You don’t know exactly where people will be during the day, but you have a reasonable stochastic simulator for them. You’d like to place the transmitter as close to the people as possible to maximize their wifi signal. We can pose this one as a stochastic minimization problem.

`minimize E[distance of people to transmitter]`

If you have a simulator of the random locations of people, you can minimize this using a stochastic gradient method. This problem is a generalization of the *Fermat–Weber Location Problem*. Gradient algorithms for this method are discussed in this paper by Beck and Teboulle.

Can you think of other simple, single-stage, stochastic policy problems? Help us out with other examples in the comments!