Robust Portfolio Optimization in Models with Diminishing Returns

Introduction

In our last post, we demonstrated how uncertainty can lead us to spread our resources across multiple “assets”, even when we have some data indicating which asset has the best return on investment.

Many economic models exhibit diminishing returns; that is, as we increase investment in a particular asset, the return on investment associated with that asset decreases. In models that incorporate production costs, the return may even be negative past a certain point. This phenomenon makes optimization more challenging, but the basic conclusion from the last post still holds: uncertainty introduces additional risk into the portfolio allocation problem that can be mitigated through robust optimization.

Models with Diminishing Returns

Suppose we wish to allocate a budget $C$ across $m \geq 2$ assets whose performances are characterized by (unknown) parameters $\beta$. If we allocate $c_i$ resources to asset $i$, with $c_i \geq 0$ for all $i$ and $\sum_i c_i \leq C$, we suppose the “return” is given by $f(c; \beta)$. “Return” could mean profit, topline revenue, total sales, or anything else we wish to maximize. Likewise, our “budget” could be money or any finite, fungible resource that can be utilized by any asset.

If $\left. \frac{\partial^2 f}{\partial c_i^2} \right|_x \leq 0$ for all $i$ and at all levels of investment $x$, then $f$ has diminishing returns. Notably, we do not consider models that exhibit increasing returns at low levels of investment followed by decreasing returns past some threshold. Such models would have $\left. \frac{\partial^2 f}{\partial c_i^2} \right|_x \leq 0$ for all $x \geq X$, but there would be no such constraint on returns below this level of investment. If realistic levels of investment exceed this threshold, we might as well model the performance as being consistently diminishing, which greatly simplifies the approach and reduces the computational requirements.

We do allow for models where $\left. \frac{\partial^2 f}{\partial c_i^2} \right|_x = 0$ for arbitrarily high $x$, for at least one $i$. For such models, we can continue increasing investment indefinitely without observing diminishing returns. The model considered in our last post, $f(c, \beta) = c^T \beta$, falls under this category. If $i = \mathrm{argmax} \{ \beta \}$ (and $\beta_i > 0$) then we should invest our entire budget in asset $i$. As we saw in the last post, uncertainty in the parameters $\beta$ still encourages diversification.

We assume further that $f$ is concave, or log-concave, in $c$ when holding $\beta$ fixed ($f$ is log-concave in $c$ if $\log f$ is concave in $c$). If $f$ is log-concave but not concave, we focus instead on $g = \log f$ without further distinction. Clearly, maximizing $g$ is the same as maximizing $f$. We also assume $f$ (or $g$, as the case may be) is convex in $\beta$ when holding $c$ fixed. This might seem an odd coincidence, but here are some examples:

  • $f(c, \beta) = c^T \beta$: linear in both $c$ and $\beta$, when holding the other fixed. Since linear functions are both concave and convex, the requirements hold.
  • $f(c, \beta) = \alpha \cdot \prod_i c_i^{\beta_i}$ (the Cobb-Douglas production function) is convex in $\beta$ (let $y_i = \log c_i$; then $f(y, \beta) = \exp(y^T \beta + \log(\alpha))$, which is clearly convex in $\beta$) and concave in $c$ when $\beta \succeq 0$ and $\mathbf{1}^T \beta \leq 1$ (it is log-concave in $c$ for any $\beta \succeq 0$). Typically we would impose these constraints on $\beta$ during model estimation to ensure concavity, or at least log-concavity, in $c$.
  • $f(c, \beta) = \sum_i h_i (c_i; \beta_i)$, where the $h_i$ are nonparametric, smooth functions concave in the first argument. Such functions are typically estimated as a sum of basis functions with weights $\beta$, and they can be estimated directly from data with shape constraints like concavity imposed. This approach is appealing because, while concavity in $c$ is often mandated by economic realities, convexity in $\beta$ might seem too good to be true; this formulation demonstrates we can be perfectly general. Nonparametric Estimation Under Shape Constraints by Piet Groeneboom and Geurt Jongbloed covers this fascinating topic (disclosure: I haven’t read this book). A short code sketch of these model forms follows below.
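
To make the structure concrete, here is a minimal sketch in Python of two of the model forms above, with hypothetical parameter values. The function names and the specific basis choice ($\log(1 + c_i)$ as a stand-in for a shape-constrained per-asset curve) are illustrative, not prescriptive.

```python
import numpy as np

def cobb_douglas(c, beta, alpha=1.0):
    """Cobb-Douglas return: alpha * prod_i c_i^beta_i.

    Concave in c when beta >= 0 and sum(beta) <= 1;
    log-concave in c for any beta >= 0; convex in beta for fixed c > 0.
    """
    return alpha * np.prod(np.power(c, beta))

def separable_concave(c, beta):
    """Separable model: sum_i beta_i * log(1 + c_i).

    A simple stand-in for a sum of shape-constrained (concave) per-asset
    curves, linear in the weights beta, hence both convex and concave in beta.
    """
    return float(np.sum(beta * np.log1p(c)))

# Hypothetical allocation and parameters
c = np.array([2.0, 3.0])
beta = np.array([0.3, 0.5])
print(cobb_douglas(c, beta), separable_concave(c, beta))
```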

Optimization

In this section, we assume we have a point estimate of $\beta$, with negligible uncertainty. In a subsequent section we will show how uncertainty in $\beta$ introduces additional risk to be mitigated. We wish to solve:

$$ \begin{array}{cl} \mbox{maximize} & f(c; \beta) \\\ \mbox{subject to} & c \succeq 0 \\\ & \mathbf{1}^T c \leq C \end{array} $$ We emphasize here that $\beta$ is fixed, so that $c$ is the optimization variable. It is tempting to replace the final inequality with equality, but the optimal allocation need not use the entire budget, and we can characterize the situations where it does not. Such situations are readily interpretable.

The optimality criteria are: $$ \begin{align} -\left.\frac{\partial f}{\partial c_i}\right|_{c^\star} - \lambda_i^\star + \nu^\star &= 0, \; i=1, \ldots, m\\\ c^\star \succeq 0, \; \mathbf{1}^T c^\star &\leq C \quad \textrm{(primal feasibility)} \\\ \lambda^\star \succeq 0, \; \nu^\star &\geq 0 \quad \textrm{(dual feasibility)} \\\ \lambda_i^\star c_i^\star = 0, \; i=1, \ldots, m &\quad \textrm{(complementary slackness)} \\\ \nu^\star \cdot (C - \mathbf{1}^T c^\star) = 0 &\quad \textrm{(complementary slackness)}. \end{align} $$ (See Convex Optimization by Boyd and Vandenberghe for details.) Here, $\lambda_i^\star$ is the Lagrange multiplier associated with the constraint $c_i \geq 0$, and $\nu^\star$ is the multiplier associated with $\mathbf{1}^T c \leq C$.

Let us first suppose that we use our full budget, so that $\mathbf{1}^T c^\star = C$. Suppose that $c_i^\star > 0$ for some $i$. In that case, complementary slackness requires that $\lambda_i^\star = 0$, so the optimality condition is $\left.\frac{\partial f}{\partial c_i}\right|_{c^\star} = \nu^\star$. This is true for all assets with nonzero investment. We therefore interpret $\nu^\star$ as the marginal return on investment. The optimal allocation has the property that the marginal return is equal with respect to all assets with non-zero investment.

Now suppose $c_i^\star = 0$ (but $\mathbf{1}^T c^\star = C$ still). Now the optimality condition is: $\lambda_i^\star = \nu^\star - \left. \frac{\partial f}{\partial c_i}\right|_{c^\star},$ with $\lambda_i^\star \geq 0$. That can only be the case if $\left. \frac{\partial f}{\partial c_i}\right|_{c^\star} \leq \nu^\star.$ That is, the only reason we would choose not to invest in a particular asset is if the marginal return with respect to the asset is less than $\nu^\star$, the marginal return associated with the portfolio, and in this case, the corresponding Lagrange multiplier, $\lambda_i^\star$, is the amount of discrepancy.

Next suppose the optimal allocation spends less than the amount budgeted: $\mathbf{1}^T c^\star < C$. Now complementary slackness requires $\nu^\star = 0$. We will see that $\nu^\star$ is still the marginal return. If $c_i^\star > 0$, then $\lambda_i^\star = 0$ as before. Now the optimality condition is $\left. \frac{\partial f}{\partial c_i} \right|_{c^\star} = 0$. That is, when we spend less than our total budget, it can only be because every asset with non-zero investment has zero marginal return. And finally, when $c_i^\star = 0$, the optimality condition is $\left. \frac{\partial f}{\partial c_i} \right|_{c^\star} = - \lambda_i^\star \leq 0.$ That is, the only reason we wouldn’t invest in an asset at all when spending less than the full budget is that the associated marginal return is non-positive.

In summary, the Lagrange multipliers provide valuable insights about the marginal returns. We would likely miss these insights completely if we just plugged this problem into some solver!
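
For concreteness, here is a minimal sketch of solving the nominal problem with cvxpy (one possible tool) and reading off the multipliers, using a hypothetical separable model $f(c; \beta) = \sum_i \beta_i \log(1 + c_i)$ and made-up numbers. The dual variables reported by the solver correspond to the $\lambda^\star$ and $\nu^\star$ discussed above.

```python
import numpy as np
import cvxpy as cp

# Hypothetical point estimates and budget
beta_hat = np.array([1.0, 2.0, 0.5, 1.5])
C = 10.0

c = cp.Variable(len(beta_hat))
nonneg = c >= 0                  # multipliers lambda_i*
budget = cp.sum(c) <= C          # multiplier nu*

# Separable diminishing-returns model: f(c; beta) = sum_i beta_i * log(1 + c_i)
objective = cp.Maximize(cp.sum(cp.multiply(beta_hat, cp.log(1 + c))))
prob = cp.Problem(objective, [nonneg, budget])
prob.solve()

print("c*      :", np.round(c.value, 3))
print("lambda* :", np.round(nonneg.dual_value, 3))
print("nu*     :", float(budget.dual_value))
# For assets with c_i* > 0, the marginal return beta_i / (1 + c_i*) should equal nu*.
print("marginal returns:", np.round(beta_hat / (1 + c.value), 3))
```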

(This section, and especially the focus on Lagrange multipliers, was influenced by the paper, “Allocation of Fungible Resources via a Fast, Scalable Price Discovery Method” by A. Agrawal, S. Boyd, D. Narayanan, F. Kazhamiaka, and M. Zaharia, 2021.)

Averaging over a Posterior Distribution

Our key theme is that point estimates involve uncertainty often neglected in optimization. Too often, we treat optimization as being separate from estimation.

Now suppose in addition to a point estimate for $\beta$, we have a full posterior distribution. This permits optimizing for the expected return, averaged over the full posterior. When $f(c; \beta)$ is concave in $c$ (the log-concave case will be discussed next), we can simply replace $f$ with $h(c) = E_{\mathcal{B}}[f(c; \beta)]$, where $\mathcal{B}$ is the posterior distribution of $\beta$. Since integration preserves concavity, if $f$ is concave in $c$, then so is $h$. The Lagrange multipliers have a similar interpretation, but with respect to expected marginal returns, averaged over the posterior.

To solve the optimization problem, we have to evaluate the quantities $h$, $\nabla h$, and $\nabla^2 h$. These integrals may be difficult to evaluate. Monte Carlo techniques, such as Importance Sampling, may be useful. (My go-to reference is Monte Carlo Strategies in Scientific Computing by Jun S. Liu.) In the simplest form, we can approximate $h(c) \approx \frac{1}{b} \sum_{k=1}^b f(c; \beta^{(k)})$, where the $\beta^{(k)}$ are samples from the posterior.
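
As a sketch of this sample-average approach, suppose (hypothetically) a saturating model $f(c; \beta) = \sum_i a_i (1 - e^{-k_i c_i})$, which is concave in $c$ but nonlinear in the rate parameters, so the posterior average does not collapse to a single point estimate. With posterior draws in hand (here simulated purely for illustration), the averaged objective is still concave and can be handed to a convex solver directly:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, C, b = 3, 10.0, 50   # assets, budget, number of posterior draws

# Simulated stand-in for posterior draws of (a, k); in practice these come from the fitted model.
a_draws = np.array([3.0, 5.0, 2.0]) * rng.lognormal(0.0, 0.2, size=(b, m))
k_draws = np.array([0.5, 0.2, 0.8]) * rng.lognormal(0.0, 0.3, size=(b, m))

c = cp.Variable(m, nonneg=True)
# h(c) ~ (1/b) * sum_k f(c; beta^(k)); each term is concave in c, so the average is too.
terms = [cp.sum(cp.multiply(a_draws[j], 1 - cp.exp(cp.multiply(-k_draws[j], c))))
         for j in range(b)]
h = cp.sum(cp.hstack(terms)) / b

prob = cp.Problem(cp.Maximize(h), [cp.sum(c) <= C])
prob.solve()
print("allocation maximizing the posterior-averaged return:", np.round(c.value, 3))
```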

When $f$ is log-concave, we run into difficulties, since log-concavity is not necessarily preserved by integration. That is, $h(c) = \int f(c; \beta) d\mathcal{B}$ is not necessarily log-concave. Certainly we can define $\tilde{h}(c) = \int \log f(c; \beta) d\mathcal{B}$, which is concave when $f$ is log-concave in $c$, but then we’re optimizing the wrong thing. Maximizing $\tilde{h}$ amounts to maximizing $\exp(E[\log f])$ rather than $E[f]$, and by Jensen’s inequality the former is never larger and generally differs. This point warrants further consideration in applications.
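
Here is a quick numeric check of that Jensen gap, using a Cobb-Douglas $f$ at a fixed allocation and simulated posterior draws (all numbers hypothetical): the geometric-mean-style quantity $\exp(E[\log f])$ comes out systematically smaller than the expected return $E[f]$.

```python
import numpy as np

rng = np.random.default_rng(1)
c = np.array([4.0, 6.0])                                 # fixed allocation
beta_draws = rng.uniform(0.2, 0.8, size=(10_000, 2))     # stand-in posterior draws

f_vals = np.prod(c ** beta_draws, axis=1)                # Cobb-Douglas with alpha = 1
print("E[f]          :", f_vals.mean())
print("exp(E[log f]) :", np.exp(np.log(f_vals).mean()))  # <= E[f], strictly if f varies
```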

We see that a posterior distribution allows us, in principle, to incorporate uncertainty about the model into the optimization problem, but the resulting problem is tractable only under conditions more restrictive than those required by the point-estimate approach or by the robust approach discussed below.

Robust Optimization

Assessing the worst case performance and incorporating robustness into the optimization problem is analogous to our last post. So I’ll just state the optimization problem and provide some commentary: $$ \begin{array}{cl} \mbox{maximize${}_{c}$ minimize${}_{\beta}$} & f(c, \beta) \\\ \mbox{subject to} & c \succeq 0 \\\ & \mathbf{1}^T c \leq C \\\ & \beta \in S \\\ & f(c, \hat{\beta}) \geq (1 - \theta) \cdot \textrm{perf}_{\textrm{opt}}, \end{array} $$ where $\textrm{perf}_{\textrm{opt}} = \underset{c:\, c \succeq 0, \, \mathbf{1}^T c \leq C}{\sup} { f(c, \hat{\beta}) }$ is the optimal performance if our point estimate is perfectly correct. When we have a posterior on $\beta$, we can also replace $f(c, \hat{\beta})$ in the last constraint with $h(c) = \int f(c, \beta) d\mathcal{B}$, with $\textrm{perf}_{\textrm{opt}}$ becoming $\underset{c:\, c \succeq 0, \, \mathbf{1}^T c \leq C}{\sup} { h(c) }$. In either case, this last constraint posits a minimum acceptable nominal or average performance, with a robustness budget of $\theta$.

When the confidence region, $S$, is convex, and $f$ is convex-concave in $\beta$ and $c$ as we have assumed throughout, we can solve the robust problem efficiently, perhaps in a few milliseconds for problems with a few dozen assets. Depending on the form of $f$, we should be able to solve problems with thousands of assets in less than a few seconds.
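
To make this concrete, here is a minimal sketch for the special case of the linear model from the last post, $f(c, \beta) = c^T \beta$, with a box-shaped confidence region $S = \{\beta : |\beta_i - \hat{\beta}_i| \leq \delta_i\}$. In that case the inner minimization has a closed form (since $c \succeq 0$, the worst case is $\beta = \hat{\beta} - \delta$), and the max-min problem reduces to an ordinary convex program. All numbers are made up.

```python
import numpy as np
import cvxpy as cp

beta_hat = np.array([1.0, 1.2, 0.9])   # hypothetical point estimates
delta    = np.array([0.3, 0.6, 0.2])   # half-widths of the box confidence region S
C, theta = 10.0, 0.1

# Nominal optimum for a linear model: invest the whole budget in the best asset.
perf_opt = C * beta_hat.max()

c = cp.Variable(len(beta_hat), nonneg=True)
worst_case = (beta_hat - delta) @ c     # closed-form inner minimum over beta in the box
constraints = [cp.sum(c) <= C,
               beta_hat @ c >= (1 - theta) * perf_opt]   # nominal-performance floor
prob = cp.Problem(cp.Maximize(worst_case), constraints)
prob.solve()

print("robust allocation      :", np.round(c.value, 3))
print("worst-case performance :", round(prob.value, 3))
print("nominal performance    :", round(float(beta_hat @ c.value), 3))
```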

As in our last post, it may make sense to solve the problem for a range of $\theta$ values, plot the resulting worst-case values against the nominal/average performances, and choose a $\theta$ based on the tradeoff between nominal and worst-case performance. We can use warm starts to speed up the repeated solves, so it should be possible to trace out the entire trade-off curve in a small multiple of the time it takes to solve a single instance.
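
Continuing the same hypothetical linear/box setup, the sweep might look like the following sketch, with $\theta$ as a parameter so the solver can reuse its work between solves:

```python
import numpy as np
import cvxpy as cp

beta_hat = np.array([1.0, 1.2, 0.9])
delta    = np.array([0.3, 0.6, 0.2])
C = 10.0
perf_opt = C * beta_hat.max()

c = cp.Variable(len(beta_hat), nonneg=True)
theta = cp.Parameter(nonneg=True)       # robustness budget, swept below
prob = cp.Problem(
    cp.Maximize((beta_hat - delta) @ c),
    [cp.sum(c) <= C, beta_hat @ c >= (1 - theta) * perf_opt])

for t in np.linspace(0.0, 0.3, 7):
    theta.value = t
    prob.solve(warm_start=True)         # reuse the previous solution as a starting point
    print(f"theta={t:.2f}  worst-case={prob.value:7.3f}  nominal={beta_hat @ c.value:7.3f}")
```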

It is somewhat more challenging to analyze the Lagrange multipliers in this problem, because the objective function, $p(c) = \underset{\beta \in S}{\inf} f(c, \beta)$, is not differentiable in general. In this case, the optimality conditions are based on sub-gradients. Convex Optimization Algorithms by Dimitri Bertsekas gives the sub-gradient for $p(c)$ (in addition to much excellent material on minimax problems), but I have not pursued this. I anticipate the Lagrange multipliers can still be interpreted in terms of marginal returns, perhaps worst-case marginal returns.

Considering the mathematical complexity of the robust approach, it’s understandable why people might want to solve the conceptually-much-simpler problem where we treat the point estimate as being correct. However, solving the robust problem is only slightly more computationally intensive than solving the non-robust problem, and will likely lead to superior performance. If nothing else, it makes sense to assess the worst case performance arising from the model uncertainty. We may identify a meaningful risk in our strategy. There are two ways of mitigating this risk: collecting more data (thus decreasing the uncertainty in $\beta$), or directly managing the uncertainty with robust optimization. While collecting more data is conceptually simpler (and more data is always better!), it is also likely to be much more expensive than implementing and solving the robust problem. No matter how much data we collect, there is always uncertainty in the model. Assessing and mitigating the impact of this uncertainty is computationally efficient and advantageous.

Conclusions

In this post, we discussed three approaches to portfolio optimization in models with diminishing returns. We considered models concave in the resource allocation, $c$, and convex in the model parameters, $\beta$, holding the other fixed.

First we considered the case where we ignore any model uncertainty, treating $\beta$ as fixed. By examining the Lagrange multipliers, we gained insight into the optimal allocation in terms of marginal returns. Next we supposed a posterior distribution on the model parameters and optimized the average return. This incorporates all available information regarding the model parameters, $\beta$, but is not necessarily robust against model uncertainty. Finally, we demonstrated how to incorporate worst-case performance into the optimization problem. This robust approach ensures good performance no matter the true model, without overly detrimental impact to expected performance.

All variants considered are computationally efficient. Robust optimization is both powerful and practical.
