The Right Thing, Not the Easy Thing
New entrants into the field of systematic investing must look for ways to differentiate themselves. Impressive backtested performance is one way to stand out. But are these backtests robust? Are they likely to withstand the challenges of real-world implementation? Historical returns can be noisy, and even small changes to how an experiment is run can produce very different outcomes. While it is easy to find exciting results in a backtest, we believe the right thing to do is to evaluate these findings using a comprehensive research framework and determine whether the discoveries are truly new and can benefit investors going forward.
Setting The Stakes
Let’s say we have two simulated strategies, A and B, that focus on US small cap value stocks and are rebalanced annually. Strategy A returned 15.6% per year from January 1973 to June 2019. Strategy B, which differs from A only by a slight methodology tweak, returned 16.7% per year over the same period. The easy thing to do with this observation is to conclude that B is superior. After all, a pickup of 110 basis points (1.1%) per year is no small feat.
So, what was the special enhancement for B relative to A? Simply changing the rebalancing month from May to January. As we see from Exhibit 1, there was substantial variation in returns based on the choice of rebalancing month for otherwise identical small cap value strategies. However, one should not interpret this 110-basis-point spread as an expected value-add. In fact, this is a cautionary tale about the potential for noise in empirical research.
What’s the right thing to do? Be wary of the potential for such noise and seek to mitigate its influence on the inferences drawn from the empirical findings. In this case, the right approach could be to use annual staggered rebalancing that takes an average of all the bars in Exhibit 1, thus reducing the potential to game the backtest by picking the rebalancing month.
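To make the averaging idea concrete, here is a minimal Python sketch of staggered rebalancing. It assumes the portfolio is split equally across sleeves that differ only in their rebalancing month and that the sleeves are reset to equal weights each year. The function names and all return figures are hypothetical illustrations, not the data behind Exhibit 1.

```python
from statistics import mean

def annualized_return(yearly_returns):
    """Geometric average of a list of yearly returns (0.10 means 10%)."""
    growth = 1.0
    for r in yearly_returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(yearly_returns)) - 1.0

def staggered_returns(sleeves):
    """Equal-weight blend of the sleeves in each year.

    `sleeves` maps a rebalancing month to that sleeve's yearly returns.
    """
    return [mean(year) for year in zip(*sleeves.values())]

# Hypothetical yearly returns for three otherwise identical strategies.
sleeves = {
    "January": [0.20, -0.05, 0.30],
    "May": [0.15, 0.00, 0.25],
    "September": [0.18, -0.02, 0.27],
}

blended = staggered_returns(sleeves)
for month, returns in sleeves.items():
    print(f"{month:>9}: {annualized_return(returns):.2%}")
print(f"Staggered: {annualized_return(blended):.2%}")
```

Because the blended result does not depend on any single rebalancing month, there is no month left to pick after the fact.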
In Search of Anomalies
When it comes to financial research, few things are more exciting than finding a new anomaly, or a variable that appears to drive differences in average returns. Let’s take stock volatility as an example. As shown in Exhibit 2, using Fama/French data on quintiles formed by sorting US stocks on past volatility (standard deviation), average returns for the low volatility quintile have been similar to those of the Fama/French Total US Market Research Index since the 1960s, despite the former having lower-than-market volatility. The easy thing to do? Declare that we have found a new anomaly.
Not so fast. We believe the right way to analyze a new pattern in the historical return data is in the context of known drivers of expected returns. Specifically, is this new observation additive to our understanding of asset pricing? When evaluated against the Fama/French Five-Factor Model in Exhibit 3, the answer appears to be no. The small intercept and t-stat suggest the returns of low volatility stocks are well explained by their exposure to known return drivers. No anomaly here, just a reminder that empirical analysis should not be conducted in a vacuum.
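The kind of test described above can be sketched as a regression of a portfolio's excess returns on factor returns, where the intercept (alpha) and its t-statistic indicate whether anything is left unexplained. The Python sketch below is an illustration under stated assumptions: the five "factor" series are simulated random numbers, not Fama/French data, the betas are made up, and the helper names `invert` and `ols_with_tstats` are hypothetical. It uses plain ordinary least squares; a statistics library would normally do this work.

```python
import random

def invert(matrix):
    """Gauss-Jordan inverse of a square matrix with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def ols_with_tstats(y, X):
    """OLS of y on X with an intercept. Returns (coefficients, t-stats).

    Index 0 of each result is the intercept (alpha).
    """
    rows = [[1.0] + list(x) for x in X]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coefs, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    tstats = [c / (sigma2 * inv[i][i]) ** 0.5 for i, c in enumerate(coefs)]
    return coefs, tstats

# Simulated monthly data: returns fully explained by five factors (true alpha = 0).
random.seed(0)
true_betas = [0.8, 0.1, 0.2, 0.3, 0.3]  # hypothetical exposures
factors = [[random.gauss(0.005, 0.04) for _ in range(5)] for _ in range(600)]
returns = [sum(b * f for b, f in zip(true_betas, row)) + random.gauss(0.0, 0.01)
           for row in factors]

coefs, tstats = ols_with_tstats(returns, factors)
print(f"alpha = {coefs[0]:.4%} per month, t-stat = {tstats[0]:.2f}")
```

In this simulated case the raw average return is positive, yet the estimated alpha is close to zero because the factor exposures account for it.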
Tweaking The Mousetrap
Exceptionally negative recent value premiums have prompted many investors to scramble for ways to improve their process of pursuing the premium. That has led some down the road of tinkering with the metric used to define value versus growth. One example is adjusting book values for intangible assets, such as patents, copyrights, brands, and reputation.
This adjustment is alluring on the surface. Over the 10-year period ending December 2018, using a price-to-book ratio adjusted for internally developed intangibles would have substantially narrowed the annualized return difference between value and growth but not entirely eradicated value’s underperformance. (1) The easy thing to do? Adopt the fix immediately. The right thing to do? Take a step back and ask: “Are internal intangibles a new phenomenon? And, if not new, have they grown in importance over time?” The answer to the first question is no. History buffs will know that the US started issuing patents back in 1790 and registering trademarks in 1870. So intangibles have been part of the economic landscape and capital markets for a long time.
To answer the second question, you have to estimate internally developed intangibles because they are generally expensed on the income statement under US accounting principles, rather than capitalized on the balance sheet. Our recent paper on intangibles did that, finding that internally developed intangibles have been a steady fraction of company assets for a long time. Exhibit 4 shows, for the US market, that they represented about 30% of company assets back in the 1980s and they represent about 30% of company assets today.
Our deep dive into intangibles also highlighted the noise involved in the estimation of internally developed intangibles. Procedures for doing so involve accumulating prior expenditures on research and development (R&D) and selling, general, and administrative expenses (SG&A), relying on assumptions for how much of each component to include and over what time horizon to amortize. Couple these dependencies with data limitations, particularly for R&D, and there is ample susceptibility to noise with this adjustment.
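A minimal Python sketch shows how sensitive such an estimate can be to its assumptions. It assumes a simple straight-line amortization scheme, which is only one of several possible procedures and is not necessarily the one used in the paper. The function name `capitalized_stock`, the spending series, and the parameter values are all hypothetical.

```python
def capitalized_stock(expenses, fraction, life_years):
    """Estimate an intangible stock from a history of yearly expenses.

    Each year, `fraction` of the expense is treated as an investment and
    written off in equal steps over `life_years` (straight-line).
    Returns the estimated stock at the end of each year.
    """
    stock = []
    for t in range(len(expenses)):
        total = 0.0
        for age in range(life_years):
            if t - age < 0:
                break  # no spending history before the first year
            remaining = 1.0 - age / life_years  # share not yet amortized
            total += fraction * expenses[t - age] * remaining
        stock.append(total)
    return stock

# Hypothetical R&D spending, in millions, growing over ten years.
rd_spending = [100, 105, 110, 116, 122, 128, 134, 141, 148, 155]

# Same spending history, different (equally arbitrary) assumptions.
for fraction, life in [(1.0, 5), (1.0, 10), (0.5, 5), (0.3, 3)]:
    estimate = capitalized_stock(rd_spending, fraction, life)[-1]
    print(f"fraction={fraction:.1f}, life={life:>2} years -> stock = {estimate:,.0f}")
```

The spending history is identical in every row, so any difference in the estimated stock comes purely from the choice of capitalization fraction and amortization horizon.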
Perhaps due to this noise and the higher uncertainty around the valuation of internally developed intangibles (which, unlike externally acquired ones, do not go through a market assessment), we find that adjusting for internally developed intangibles does not yield consistently higher value and profitability premiums.
As mentioned earlier, the US value premium over the past decade benefited from the adjustment. But further inspection of this result shows it is mainly driven by different sector exposures: the past decade was a good one for technology stocks, and technology had a higher weight in a value strategy formed on adjusted price-to-book. And, despite an improvement with the adjustment, the value premium was still negative.
Looking over the longer term and assessing value and profitability premiums together, the impact from intangibles adjustments has been minimal. Specifically, we see that the value premium gets a little larger while the profitability premium gets a little smaller. The net effect is near zero. There is no compelling evidence an investor can more effectively pursue higher expected returns by adjusting the value and profitability metrics for internally developed intangibles.
In summary, the easy thing to do in response to the recent dismal performance of the value premium is to immediately adopt an intangibles adjustment. But we believe the right thing to do is to evaluate thoughtfully and fully the impact of such an adjustment. We need to consider the high level of uncertainty around the value of internally developed intangibles (for example, only 8% of drugs that start the research phase end up in the marketplace (2)) and the noise that this adjustment will inject into the implementation process. We also need to consider that the empirical results from this adjustment are not compelling. Moreover, mixing and matching accounting variables to find the optimum backtest risks falling into the trap of overfitting the data (3) —in other words, finding adjustments that make the past returns look great but might have no impact on future returns. The right thing to do is to look beyond an alpha or a Sharpe ratio and assess the rationale behind using each individual variable, how alternative variable specifications may differ, and what consequences this might have for implementation.
Timing Isn’t Everything
It is probably not news to value investors that premiums can be negative even for long periods of time. There is obviously a major incentive to find ways to predict and avoid disappointing performance. Accordingly, substantial research has been conducted on timing markets and premiums.
In great news for investors who own a time machine, Dimensional’s research on the subject has even identified a successful timing strategy for the value premium. A mean reversion approach in Italian stocks that switches in and out of value based on the trailing average of the value premium has outperformed a buy-and-hold value approach by 7.5 percentage points per year. For those of us living in the present, enthusiasm over this result continuing in the future is tempered by the fact that this specific timing strategy is the best one out of 680 strategies tested (4) and actually underperformed buy-and-hold strategies outside of Italian stocks by more than two percentage points per year. As illustrated by the excess returns for all 680 strategies in Exhibit 5, underperformance was by far the most likely outcome for these approaches to timing premiums.
The odds are high that—if you are willing to pore through the data long enough—you will eventually find a timing strategy that worked in the past. There are many inputs to a timing strategy: the premium being pursued, the indicator used for timing (e.g., valuation ratios or past performance), and the threshold for switching (in both directions), to name a few. The vast number of potential combinations of these inputs is ripe for data mining. Of course, the successful outcomes are the ones that will garner attention, but investors should acknowledge these as pyrite (5): attractive at first glance but not robust to further testing.
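A small simulation illustrates why the best of many backtests deserves skepticism. The Python sketch below generates 680 timing "strategies" whose true excess return over buy-and-hold is zero by construction, then reports the best one. The count of 680 mirrors the number of strategies cited above; the 40-year sample, the 5% annual tracking volatility, and the function name are hypothetical assumptions, and nothing here reproduces the results in Exhibit 5.

```python
import random
from statistics import mean

random.seed(42)

N_STRATEGIES = 680  # matches the number of strategies cited in the text
N_YEARS = 40        # hypothetical backtest length
VOL = 0.05          # hypothetical volatility of yearly excess returns

def simulate_excess_returns(n_strategies, n_years, vol):
    """Average yearly excess return for strategies with NO true edge."""
    return [
        mean(random.gauss(0.0, vol) for _ in range(n_years))
        for _ in range(n_strategies)
    ]

results = simulate_excess_returns(N_STRATEGIES, N_YEARS, VOL)
best, worst = max(results), min(results)

print(f"Average across all strategies: {mean(results):.2%}")
print(f"Best strategy:  {best:.2%} per year")
print(f"Worst strategy: {worst:.2%} per year")
```

Every strategy here is pure noise, yet the best one still shows a positive backtested excess return. Reporting only the winner would make chance look like skill.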
Realized premiums can be negative even for long stretches of time. The easy thing to do is to vary exposure to premiums in accordance with the output of a model that appears successful in a backtest. But rigorous research casts doubt on whether this approach effectively reduces the downside risks of the equity, size, value, or profitability premiums. Moreover, this approach can often generate meaningful trading and tax costs. Therefore, we believe the right thing to do is to deliver what you say you will deliver through the continuous pursuit of the reliable equity premiums. In addition to reducing investor surprises, this helps ensure one captures the premiums when they appear.
Finding Wisdom
In the interest of writing a column rather than a textbook, we have touched on only a fraction of the empirical research on expected returns. If all these decades of poking and prodding the data have proved anything, it’s that you can’t prove something with data alone. The endless combinations available to researchers and the noise inherent in stock return data imply historical return comparisons should be evaluated with a heaping spoon of salt.
This is not to dismiss the importance of empirical research. Performance simulations can help us understand the behavior of returns and inform our expectations for future returns. However, investors need to avoid the risk of extrapolating historical patterns that may have occurred by chance. How can investors wade through the noise? Comprehensively evaluating the research may be a tall order, just as it’s probably not realistic to expect a patient to perform his or her own surgery. But a good place to start is asking questions about empirical techniques. The more moving parts in the backtest, the more susceptible the inferences are to noise. An investor’s assessment of the likelihood that simulated investment performance translates into real-world portfolios may depend upon whether a manager can demonstrate an approach to research that mitigates these sensitivities.
From a manager standpoint, it’s important to have a framework that guides research and hypotheses, even before looking at the data. Research must be conducted in a way that yields robust inferences. This includes running robust experiments that attempt to alleviate random sources of noise and outcomes excessively dependent upon a particular set of circumstances. It also means aligning the research with how its insights will be deployed within the investment process so that inferences from the research are more accurate. Deeply understanding empirical financial research requires extensive expertise developed through time. But this is a key part of applying judgment in the investment process: it is easy to find something that looks great in hindsight, but the right thing to do is to find ways to add value going forward.
(1) Savina Rizova and Namiko Saito, “Intangibles and Expected Stock Returns” (2020), available upon request.
(2) Ernst R. Berndt, Adrian H.B. Gottschalk, and Matthew W. Strobeck, “Opportunities for Improving the Drug Development Process: Results From a Survey of Industry and the FDA,” Innovation Policy and the Economy 6 (2006): 91–121.
(3) Robert Novy-Marx, “Backtesting Strategies Based on Multiple Signals,” NBER Working Paper No. w21329 (2015), available at: papers.ssrn.com/sol3/papers.cfm?abstract_id=2629935.
(4) Wei Dai, “Premium Timing with Valuation Ratios” (white paper, Dimensional Fund Advisors, 2016).
(5) Pyrite, or iron sulfide, is more commonly known as “fool’s gold.”
Ryan H. Peters and Lucian A. Taylor, “Intangible Capital and the Investment-q Relation,” Journal of Financial Economics 123, no. 2 (2017): 251–272.