28 Metrics

Marketing research lives and dies by measurement. A construct that cannot be operationalized into a reproducible number cannot enter a regression, cannot be priced by equity markets, and cannot be defended to a sceptical reviewer or a sceptical chief financial officer. This chapter is a working catalogue of the metrics that quantitative marketing scholars actually compute—the financial indicators that connect marketing actions to firm value, the firm-level controls drawn from accounting data, and the increasingly text- and image-based marketing constructs that machine learning has made measurable at scale. It is also a methods chapter: for the harder constructs—marketing capability above all—we give the estimator, the identifying assumptions, and runnable code that reconstructs the measure from raw Compustat and patent data.

The organizing distinction is between outcomes and inputs to outcomes. A metric such as return on marketing investment is an outcome the firm wants to maximize; metrics such as firm size, leverage, and profitability are covariates that confound or moderate the marketing–performance relationship and must be controlled. A third class—capability metrics—are neither directly observed nor simple ratios of accounting items; they are latent efficiencies recovered by econometric frontier estimation, and they demand the most care. We treat the three classes in turn, leading each with the intuition for what the number is supposed to capture and following immediately with its formal definition, its data source, and the assumptions under which it is identified.

Throughout, the practical substrate is the Wharton Research Data Services (WRDS) ecosystem: the Center for Research in Security Prices (CRSP) for market data, Compustat for accounting fundamentals, Thomson Financial 13F filings for institutional ownership, and the WRDS U.S. patents database for innovation output. The reader who works through the code will end with a firm-year panel on which any of the marketing–finance models in Chapter 23 can be estimated.

28.1 Financial Metrics

Financial metrics translate a marketing intervention into the language that capital markets and corporate boards speak. They fall into three families: return measures that scale profit by the capital deployed to earn it, value-creation measures that ask whether the firm earned more than its cost of capital, and covariate measures that characterize the firm’s size, risk, and profitability for use as controls.

28.1.1 Return on Investment and Return on Marketing Investment

The most elementary return metric scales net profit by the capital that produced it. Return on investment (ROI) is

\[ \text{ROI} = \frac{\text{Net Profit}}{\text{Investment}}, \tag{28.1}\]

a unitless ratio interpretable as the profit earned per dollar committed. ROI is attractive precisely because it is dimensionless and therefore comparable across projects of different scale, but that virtue is also its central weakness: it discards the magnitude of the investment, so a tiny project with a spectacular percentage return can dominate a large project that creates far more absolute value. ROI is a ranking device, not a value-maximization objective.

For marketing specifically, the analogue isolates the incremental effect of the marketing dollar. Return on marketing investment (ROMI), sometimes written ROIM, is

\[ \text{ROMI} = \frac{\text{IRAM} \times \text{CM} - \text{MS}}{\text{MS}}, \tag{28.2}\]

where IRAM is the incremental revenue attributable to marketing, CM is the contribution margin (the share of each revenue dollar that remains after variable costs), and MS is marketing spending. A positive ROMI signals that the marketing program returned more in contribution than it cost. The deceptively simple numerator hides the entire identification problem of the field: incremental revenue is a counterfactual quantity, the revenue that would not have occurred absent the marketing, and recovering it requires either an experiment or a credible model of the no-marketing baseline. Naively crediting marketing with all post-campaign revenue ignores that spend is endogenous to demand (firms spend more when they expect sales to be high) and inflates ROMI, which is why the construct is better understood as a target for causal estimation than as an accounting ratio.

Note

ROI and ROMI are summary statistics of a causal effect, not the effect itself. Their managerial appeal—a single comparable percentage—is exactly what makes them easy to game by manipulating the denominator (under-counting the true cost base) or by attributing organic demand to the campaign. Report them alongside the identification strategy that produced the incremental numerator.

28.1.2 Economic Value Added

ROI compares profit to capital but is silent on whether that profit cleared the cost of capital. Economic value added (EVA), also called economic profit, fills that gap. It measures the residual wealth a firm generates after deducting a charge for all the capital it employs, equity as well as debt, on an after-tax basis:

\[ \text{EVA} = \text{NOPAT} - (\text{Invested Capital} \times \text{WACC}). \tag{28.3}\]

Here net operating profit after taxes (NOPAT) equals operating profit times one minus the tax rate; invested capital is the sum of debt, capital leases, and shareholders’ equity (equivalently, equity plus long-term debt measured at the start of the period); and WACC is the weighted average cost of capital, the blended return the firm must pay its providers of capital. The product \(\text{WACC} \times \text{Invested Capital}\) is the finance charge: the opportunity cost of the funds tied up in the business. EVA is positive only when operating performance exceeds that charge, which is the precise sense in which positive EVA means the firm created value rather than merely earned a profit.

The cost of capital itself is a weighted average over the firm’s capital structure,

\[ \text{WACC} = \frac{K_e \, E}{E + D} + \frac{K_d \,(1 - t)\, D}{E + D}, \tag{28.4}\]

where \(E\) and \(D\) are the market values of equity and debt, \(K_e\) is the required return on equity, \(K_d\) is the pre-tax cost of debt (so \(K_d(1-t)\) is its after-tax cost), and \(t\) is the marginal tax rate. The after-tax adjustment on debt reflects the tax deductibility of interest.

Because invested capital can equivalently be written as total assets net of current liabilities, a common balance-sheet implementation of Equation 28.3 is

\[ \text{EVA} = \text{NOPAT} - (\text{total assets} - \text{current liabilities}) \times \text{WACC}. \tag{28.5}\]

EVA rests heavily on the book value of invested capital, which makes it most informative for asset-rich firms whose balance sheets capture the bulk of the resources they deploy. For firms whose value resides in intangibles—software, brands, data, organizational capital—book invested capital understates the true capital base, and EVA correspondingly overstates value creation. The intangibles problem is not a minor caveat; it is the reason marketing scholars increasingly favor market-based metrics over EVA for technology and consumer-brand firms.
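To make Equations 28.3 and 28.4 concrete, the following is a minimal sketch in base R. The function names `wacc()` and `eva()` and all input figures are hypothetical, chosen only so the arithmetic is easy to follow; they do not describe any real firm.

```r
# Hypothetical inputs in $ millions (illustrative, not real data).
equity_mv        <- 600    # E: market value of equity
debt_mv          <- 400    # D: market value of debt
k_e              <- 0.10   # required return on equity
k_d              <- 0.05   # pre-tax cost of debt
tax_rate         <- 0.25   # marginal tax rate
operating_profit <- 120
invested_capital <- 1000   # e.g., total assets minus current liabilities

# Equation 28.4: weighted average cost of capital.
wacc <- function(E, D, k_e, k_d, t) {
  (k_e * E + k_d * (1 - t) * D) / (E + D)
}

# Equation 28.3: NOPAT minus the charge for all capital employed.
eva <- function(operating_profit, t, invested_capital, wacc) {
  nopat <- operating_profit * (1 - t)
  nopat - invested_capital * wacc
}

w <- wacc(equity_mv, debt_mv, k_e, k_d, tax_rate)
e <- eva(operating_profit, tax_rate, invested_capital, w)
print(c(WACC = w, EVA = e))
```

With these inputs the capital charge is 75 against a NOPAT of 90, so the firm earns a profit and also clears its cost of capital. Raising `invested_capital` while holding profit fixed turns EVA negative, which is the sense in which an understated capital base flatters the measure.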

28.1.3 Market Value Added

Where EVA is a flow concept measured each period, market value added (MVA) is the corresponding stock: the cumulative wealth the firm has created for its capital providers since inception. It is the gap between what the market says the firm’s claims are worth and what investors originally contributed:

\[ \text{MVA} = \text{market value of shares (or enterprise value)} - \text{book value of shareholders' equity}. \tag{28.6}\]

A firm with persistently positive EVA accumulates positive MVA; under the residual-income identity, MVA equals the present value of all expected future EVA. MVA thus sidesteps the period-by-period WACC estimation that EVA requires, at the cost of inheriting all the noise of market expectations: it tells you what the market believes the firm has created, not what it has demonstrably created.

28.1.4 Firm-Level Covariates from Accounting Data

Most marketing–finance studies are observational, so the credibility of any estimated marketing effect rests on controlling for the firm characteristics that jointly drive marketing decisions and financial outcomes. The literature has converged on a standard battery of accounting-based controls, each with a canonical operationalization. We collect them here with their definitions, their empirical pedigree, and the source items in Compustat.

Profitability is conventionally measured as operating income before depreciation scaled by total assets, a return-on-assets variant that strips out the financing and depreciation choices that contaminate net income (Grewal et al. 2008; McAlister et al. 2016):

\[ \text{profitability} = \frac{\text{operating income before depreciation}}{\text{total assets}}. \tag{28.7}\]

Firm size enters as the natural logarithm of total assets, the log transformation taming the extreme right skew of the firm-size distribution and rendering coefficients interpretable as elasticities (Grewal, Chandrashekaran, and Citrin 2010; McAlister et al. 2016; Nezami, Worm, and Palmatier 2018):

\[ \text{firm size} = \log(\text{total assets}). \tag{28.8}\]

Sales growth, the percentage change in gross sales, proxies for the firm’s demand-side momentum and growth opportunities (Grewal, Chandrashekaran, and Citrin 2010; Nezami, Worm, and Palmatier 2018; Rao, Agarwal, and Dahlhoff 2004). Cash flow, measured as the log of operating cash flow in millions, captures internal financing capacity and is a standard control in studies linking marketing to firm risk (Chakravarty and Grewal 2011; Malshe and Agarwal 2015).

Financial leverage scales long-term debt by the book value of assets, indexing the firm’s reliance on debt financing and its exposure to financial distress (Kashmiri and Mahajan 2017; Chakravarty and Grewal 2011):

\[ \text{financial leverage} = \frac{\text{long-term debt}}{\text{book value of assets}}. \tag{28.9}\]

Abnormal stock return is frequently dichotomized: a dummy equal to one when a firm’s stock return exceeds its industry-averaged return marks out-performers in a way robust to the heavy tails of raw returns (Markovitch, Steckel, and Yeung 2005; Chakravarty and Grewal 2011, 2016).

Beyond these workhorses, specialized studies build bespoke financial constructs. Unexpected size-adjusted advertising investment—the residual from a model of expected advertising given firm size—isolates the surprise component of marketing spend that markets have not already priced (Chakravarty and Grewal 2016; Kim and McAlister 2011; Liu, Shankar, and Yun 2017). Shareholder complaints, drawn from the RiskMetrics governance database, proxy for investor dissatisfaction and governance friction (Wies et al. 2019).
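The workhorse controls in Equations 28.7 to 28.9, together with sales growth, reduce to a few lines once the Compustat items are in a firm-year panel. The sketch below uses base R on an invented toy panel; the data-frame name `panel`, the helper `lag_by_firm()`, and every number are assumptions for illustration. The lag helper also assumes each firm's rows are consecutive fiscal years with no gaps.

```r
# Toy firm-year panel; the values are invented for illustration.
panel <- data.frame(
  gvkey = c("001", "001", "001", "002", "002"),
  year  = c(2018, 2019, 2020, 2019, 2020),
  oibdp = c(50, 60, 55, 10, 12),     # operating income before depreciation
  at    = c(500, 550, 600, 80, 100), # total assets
  dltt  = c(100, 120, 150, 8, 10),   # long-term debt
  sale  = c(400, 440, 462, 60, 75)   # sales
)
panel <- panel[order(panel$gvkey, panel$year), ]

# Previous row's value within each firm; NA for a firm's first year.
# Assumes consecutive years within firm (no gaps).
lag_by_firm <- function(x, firm) {
  ave(x, firm, FUN = function(v) c(NA, head(v, -1)))
}

panel$profitability <- panel$oibdp / panel$at   # Equation 28.7
panel$firm_size     <- log(panel$at)            # Equation 28.8
panel$leverage      <- panel$dltt / panel$at    # Equation 28.9
panel$sales_growth  <- panel$sale / lag_by_firm(panel$sale, panel$gvkey) - 1

print(panel)
```

Sales growth is missing in each firm's first year by construction, so any regression that includes it drops one observation per firm.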

28.1.5 Book Equity

Book equity (BE) is a deceptively intricate construct because the “right” measure depends on data availability, and the field’s conventions trace to the Fama–French data library. The canonical definition is worth quoting in full:

“BE is the book value of stockholders’ equity, plus balance sheet deferred taxes and investment tax credit (if available), minus the book value of preferred stock. Depending on availability, we use the redemption, liquidation, or par value (in that order) to estimate the book value of preferred stock. Stockholders’ equity is the value reported by Moody’s or Compustat, if it is available. If not, we measure stockholders’ equity as the book value of common equity plus the par value of preferred stock, or the book value of assets minus total liabilities (in that order).” (Davis, Fama, and French 2000)

The hierarchy of fallbacks is the point: preferred stock is valued by redemption, then liquidation, then par; stockholders’ equity is taken from the cleanest available source and reconstructed only when necessary. The Compustat implementation coalesces the preferred-stock items in order of preference and adds deferred taxes:

Code

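library(tidyverse)
# Preferred-stock book value: redemption (pstkrv), then liquidation (pstkl),
# then par (pstk); add deferred taxes and investment tax credit (txditc).
# `funda` is assumed to be a Compustat Fundamentals Annual extract that
# already carries these columns.
funda <- funda |>
  mutate(book_equity = seq - coalesce(pstkrv, pstkl, pstk, 0) +
           coalesce(txditc, 0))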
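The quoted definition also specifies a fallback hierarchy for stockholders’ equity itself, which the one-line version above does not cover. A minimal base-R sketch of the full hierarchy follows. The helper `first_non_na()` is a hypothetical stand-in for `dplyr::coalesce`, the rows are invented, and the deferred-tax column is written `txditc` on the assumption that this is the Compustat mnemonic for deferred taxes and investment tax credit.

```r
# First non-missing value across the supplied vectors, element by element
# (a base-R stand-in for dplyr::coalesce).
first_non_na <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

# Toy Compustat-style rows; all values are invented.
funda <- data.frame(
  seq    = c(100, NA, NA),    # stockholders' equity as reported
  ceq    = c(90, 70, NA),     # common equity
  at     = c(300, 200, 150),  # total assets
  lt     = c(200, 120, 110),  # total liabilities
  pstkrv = c(10, NA, NA),     # preferred stock, redemption value
  pstkl  = c(12, 8, NA),      # preferred stock, liquidation value
  pstk   = c(11, 9, NA),      # preferred stock, par value
  txditc = c(5, NA, 2)        # deferred taxes and investment tax credit
)

# Stockholders' equity: reported value, else common equity plus par value
# of preferred stock, else total assets minus total liabilities.
se <- first_non_na(funda$seq, funda$ceq + funda$pstk, funda$at - funda$lt)

# Preferred stock: redemption, then liquidation, then par, else zero.
ps <- first_non_na(funda$pstkrv, funda$pstkl, funda$pstk, 0)

funda$book_equity <- se + first_non_na(funda$txditc, 0) - ps
print(funda$book_equity)
```

Each of the three rows exercises a different branch of the hierarchy, which makes the block a convenient unit test when porting the definition to a real extract.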

28.1.6 Net Contribution

The bridge from marketing spend to profit is the net contribution function, which underlies the advertising-budgeting and response-modeling literature. Net contribution is gross margin times sales revenue, less the cost of the marketing that produced those sales:

\[ \text{NC} = m \times S(a) - k a, \tag{28.10}\]

where \(m\) is gross margin, \(S(a)\) is the sales-response function mapping marketing effort \(a\) to sales, and \(k\) is the unit cost of effort. The first-order condition \(m \, S'(a^*) = k\) characterizes the optimal effort \(a^*\): spend until the marginal gross profit from an additional unit of effort equals its marginal cost. The shape of \(S(\cdot)\) (concave, S-shaped, or saturating) governs whether an interior optimum exists, which is the recurring substantive question of the advertising-response literature.
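As an illustration of the first-order condition, suppose the response function is the concave form \(S(a) = \beta \sqrt{a}\). This functional form and the parameter values below are assumptions made purely for the example. Setting \(m\,S'(a) = k\) then gives the closed form \(a = (m\beta / 2k)^2\), which the sketch checks against a numerical maximization of Equation 28.10.

```r
# Assumed parameters (illustrative only).
m    <- 0.4   # gross margin
k    <- 1     # unit cost of marketing effort
beta <- 100   # scale of the assumed square-root response

sales            <- function(a) beta * sqrt(a)        # assumed S(a)
net_contribution <- function(a) m * sales(a) - k * a  # Equation 28.10

# Numerical optimum over a bounded search interval.
opt <- optimize(net_contribution, interval = c(0, 5000), maximum = TRUE)

# Closed-form optimum implied by m * S'(a) = k for this S(a).
a_closed <- (m * beta / (2 * k))^2

print(c(numerical = opt$maximum, closed_form = a_closed,
        net_contribution = opt$objective))
```

Because the assumed response is strictly concave, the interior optimum exists and is unique. With an S-shaped response the same call to `optimize()` can settle on a local maximum, so the search interval would need more care.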

28.1.7 A Reference Map of Compustat Items

The metrics above all reduce to combinations of a relatively small set of WRDS data items. Because reproducing any one of them requires knowing the exact item mnemonic and its source file, Table 28.1 assembles the core mapping. All items without a CRSP annotation come from Compustat Fundamentals Annual; the mnemonics follow the WRDS data-items reference.¹

Table 28.1: Core firm-level metrics and their Compustat/CRSP data items.

Metric	Data item	Source file
Book value of equity	CEQ	Balance sheet
Capital intensity	CAPX / AT	Cash flow; balance sheet
Cash flow	(IBC + DP) / AT	Cash flow; income statement
Cash holdings	CHE / AT	Balance sheet
Cost of capital (proxy)	XINT / DLC	Income statement; balance sheet
Earnings per share	NI / CSHO	Income statement; misc.
Firm size	log(AT)	Balance sheet
Leverage	(DLTT + DLC) / SEQ	Balance sheet
Market-to-book ratio	MKVALT / CEQ	Supplemental; balance sheet
Market value	MKVALT or CSHO x PRCC_F	Supplemental; misc.
Payout ratio	(DVP + DVC + PRSTKC) / IB	Income statement; cash flow
R&D intensity	XRD / AT	Income statement; balance sheet
Return on assets (ROA)	NI / AT	Income statement; balance sheet
Return on equity (ROE)	NI / CEQ	Income statement; balance sheet
Return on investment (ROI)	NI / ICAPT	Income statement; balance sheet
Tangibility	PPENT / AT	Balance sheet
Tobin’s Q	(AT + CSHO x PRCC_F - CEQ) / AT	Balance sheet; supplemental
Total equity	SEQ	Balance sheet

A second, finance-oriented battery—used in studies of corporate investment, financing, and payout policy—follows the conventions catalogued by Kahle and Stulz (2017). Table 28.2 reproduces the construction rules, which differ from Table 28.1 chiefly in their use of lagged assets in denominators (to avoid mechanical contemporaneous correlation) and in the explicit treatment of missing R&D as zero.

Table 28.2: Valuation, investment, financing, and payout metrics following Kahle and Stulz (2017).

Category	Metric	Construction
Valuation	Tobin’s Q	(AT + CSHO x PRCC_F - CEQ) / AT
Valuation	Market cap	prc x shrout (CRSP)
Valuation	Revenue Herfindahl	sum_i (revt_i / sum(revt))^2 within 3-digit NAICS x year
Investment	CapEx / assets	capx / lag(at)
Investment	R&D / assets	xrd / lag(at); missing R&D set to 0
Investment	Fixed assets / assets	ppent / at
Investment	Cash / assets	che / at
Profitability	Operating cash flow / assets	(oibdp - xint - txt) / lag(at)
Profitability	ROA	ib / at
Financing	Book leverage	(dltt + dlc) / at
Financing	Market leverage	(dltt + dlc) / (at - ceq + csho x prcc_f)
Financing	Net leverage	(dltt + dlc - che) / at
Financing	Net equity issuance	(sstk - prstkc) / lag(at)
Ownership	Institutional ownership	% shares held by institutions (13F)
Ownership	Blockholder	institution holding >= 10% of shares (13F)
Payout	Dividends / assets	dvc / lag(at)
Payout	Repurchase / assets	(prstkc - pstk) / lag(at)
Payout	Total payout / assets	(dvc + prstkc) / lag(at)

Institutional-ownership and blockholder variables come from Thomson Financial 13F filings; the macro denominator for the market-cap/GDP ratio is series GDPA from the U.S. Bureau of Economic Analysis.
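Two conventions in Table 28.2 are easy to get wrong in code: the lagged-assets denominator and the zero fill for missing R&D. The sketch below shows one way to handle both in base R. It builds the lag by matching on firm and fiscal year minus one rather than by shifting rows, so a gap in the panel yields a missing lag instead of a stale one. The data frame `fin` and its values are invented for illustration.

```r
# Toy panel; firm 002 has a gap (no 2019 record). Values are invented.
fin <- data.frame(
  gvkey = c("001", "001", "001", "002", "002"),
  fyear = c(2018, 2019, 2020, 2018, 2020),
  at    = c(500, 550, 600, 80, 100),
  capx  = c(40, 44, 55, 4, 8),
  xrd   = c(25, NA, 33, NA, NA)
)

# Lagged assets: look up the same firm's record for fyear - 1.
key        <- paste(fin$gvkey, fin$fyear)
prev_key   <- paste(fin$gvkey, fin$fyear - 1)
fin$lag_at <- fin$at[match(prev_key, key)]

# Missing R&D is treated as zero, following the convention in Table 28.2.
fin$xrd0 <- ifelse(is.na(fin$xrd), 0, fin$xrd)

fin$capx_at <- fin$capx / fin$lag_at
fin$rd_at   <- fin$xrd0 / fin$lag_at
print(fin)
```

A row-shift lag would have paired firm 002's 2020 capital expenditure with its 2018 assets; the key-based lookup leaves that ratio missing.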

28.1.8 Industry Concentration and Diversity

Several of the metrics above require an industry-concentration index, and the literature has settled on the Herfindahl form. For revenue shares \(s_i\) within an industry, the Herfindahl–Hirschman index is \(\sum_i s_i^2\); the revenue-Herfindahl in Table 28.2 computes this within each 3-digit NAICS industry and year. The same quantity, applied to sector shares, is Simpson’s index from ecology: both equal the probability that two units drawn at random (with replacement) belong to the same category. Simpson diversity is usually reported as the complement, \(1 - \sum_i s_i^2\), so a high Herfindahl (low Simpson diversity) signals a concentrated industry. The index thus does double duty as a competition control and as a measure of how diversified a firm’s or market’s activity is across sectors.
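A minimal base-R sketch of the revenue Herfindahl within industry-year follows. The data frame `firms`, the column name `naics3`, and the revenue figures are invented for illustration; in practice `naics3` would be the first three digits of the Compustat NAICS code.

```r
# Toy cross-section: two industries in one year. Values are invented.
firms <- data.frame(
  naics3 = c("334", "334", "334", "511", "511"),
  year   = 2020,
  revt   = c(50, 30, 20, 90, 10)
)

# Revenue share of each firm within its industry-year.
ind_total      <- ave(firms$revt, firms$naics3, firms$year, FUN = sum)
firms$share    <- firms$revt / ind_total
firms$share_sq <- firms$share^2

# Herfindahl-Hirschman index: sum of squared shares per industry-year.
hhi <- aggregate(share_sq ~ naics3 + year, data = firms, FUN = sum)
names(hhi)[names(hhi) == "share_sq"] <- "hhi"

# Complement form often reported as Simpson diversity.
hhi$simpson_diversity <- 1 - hhi$hhi
print(hhi)
```

Because Compustat covers only listed firms, an index built this way measures concentration among public firms and can overstate true industry concentration where private firms are important.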

28.2 Marketing Metrics

The financial metrics above are computed from structured accounting data. The marketing constructs in this section are different in kind: trust, sentiment, willingness to pay, purchase intention, and brand reputation are psychological states that classically required surveys to measure but are now increasingly recovered from unstructured text and images by machine learning. The methodological frontier here is the use of pre-trained language and vision models to convert social-media content into validated marketing measures at a scale surveys cannot reach.

28.2.1 Trust

Trust is the willingness to rely on an exchange partner in whom one has confidence, and it has long been measured by multi-item attitudinal scales. The frontier substitutes behavioral and relational signals in social-media data for self-report: Roy et al. (2017) develop an algorithmic measure of brand trust from the structure and content of consumers’ social-media interactions, demonstrating that trust leaves a computational trace that can be extracted without surveying anyone.

28.2.2 Sentiment

Sentiment—the valence of expressed affect—is core to human communication and is the single most demanded text-analytic measure in marketing, applied to social media, news, customer feedback, and corporate communications. The central practical question is which method to use, because the menu ranges from simple lexicons that map words to polarity scores to transfer-learning language models that are far more accurate but far more demanding.

Hartmann et al. (2023) resolve this question empirically with a meta-analysis spanning 272 datasets and roughly twelve million sentiment-labeled documents. Their headline finding is that transfer-learning models (pre-trained transformers fine-tuned on sentiment) deliver the best performance, outperforming lexicons by more than twenty percentage points in accuracy on average. The advantage is not uniform: it widens with the number of sentiment classes and is moderated by text length, and the leaderboard-topping benchmark model is not always the best choice for a given research setting. Crucially for reproducibility, the authors supply a pre-trained model (SiEBERT) and open-source scripts, lowering the barrier to applying state-of-the-art sentiment analysis. The practical lesson is that method choice should be made deliberately against the research question, the data, and the available computational resources, not by reflexively reaching for a lexicon because it is convenient.

28.2.3 Willingness to Pay

Willingness to pay (WTP), the maximum price at which a consumer will still buy a good, is the demand-side primitive that underlies pricing and welfare analysis. Recovering it from naturally occurring text rather than from elicitation experiments is an active frontier; He, Anderson, and Rucker (2023) represents recent work in this direction, extracting WTP signals from expressed consumer language.

28.2.4 Purchase Intention

Purchase intention—the self-reported likelihood of buying—is a leading indicator of behavior and a workhorse dependent variable. Hartmann et al. (2021) show that it can be inferred from images, not just text, and in doing so overturn a common assumption about social-media metrics. Smartphones have made it trivial for consumers to share branded imagery, and the authors classify that imagery into three types using convolutional neural networks: packshots (the product alone), consumer selfies (a consumer’s face shown with the brand), and brand selfies (the product held from the consumer’s own visual perspective, with no consumer face visible). Applying language models to social-media responses across more than 250,000 brand-image posts from 185 brands on Twitter and Instagram, they find a revealing dissociation: consumer selfies generate more likes and comments, but brand selfies induce higher purchase intentions. Engagement metrics and purchase intent diverge.

The dissociation has managerial bite. In a display-advertising field test, brand selfies earned higher click-through rates than consumer selfies, and a laboratory experiment traced the mechanism to self-reference: the first-person perspective of the brand selfie invites the viewer to imagine holding the product themselves. The broader methodological point is that machine learning can decode marketing-relevant constructs from multimedia content, and that traditional engagement counts (likes, comments) may mislead about the constructs managers actually care about. The purchase-intention classifier is released as a fine-tuned RoBERTa model.²

28.2.5 Brand Reputation

Brand reputation—the aggregate esteem in which a brand is held—has migrated from survey trackers to real-time social listening. Rust et al. (2021) measure brand reputation directly from Twitter data, demonstrating that a construct historically captured by periodic, expensive surveys can be estimated continuously from public conversation, with the attendant gains in timeliness and the attendant risks of platform-specific selection.

28.3 Marketing Capability

Capability is the most demanding metric in this chapter and the one that most rewards careful estimation, because it is not observed at all—it is a latent efficiency that must be inferred from the gap between what a firm achieves and what the best firms achieve with comparable inputs. The construct originates with Dutta, Narasimhan, and Rajiv (1999), who define a firm’s capability as “its ability to deploy the resources (inputs) available to it to achieve the desired objective(s) (output).” The higher a firm’s functional capability, the more efficiently it converts its inputs into the relevant functional output; equivalently, the lower its functional inefficiency, the higher its capability. This input–output framing is what makes stochastic frontier analysis the natural estimator.

28.3.1 The Substantive Argument

The motivating insight of Dutta, Narasimhan, and Rajiv (1999) is that in high-technology markets, raw technological prowess is not enough. A firm can possess formidable research and development (R&D) capability—generating a stream of high-quality innovations—yet fail commercially because it lacks the marketing capability to translate those innovations into products consumers value and buy. Marketing capability has its largest effect on quality-adjusted innovation output precisely for firms with a strong technological base: the firms that benefit most from great marketing capability are those that already have a strong R&D foundation, because only they have innovations worth commercializing. The interaction of marketing and R&D capabilities is therefore the single most important determinant of firm performance—high-technology firms must be able both to generate innovation continuously and to commercialize it.

This argument dictates the estimation strategy. Three capabilities are estimated jointly because they feed one another: marketing capability drives sales from a firm’s technological base, advertising stock, marketing stock, customer relationships, and installed base; R&D capability drives quality-adjusted technological output; and operations capability drives the cost of production. The functional relationships Dutta, Narasimhan, and Rajiv (1999) posit are, schematically,

\[ \text{Sales} = f(\text{technological base},\ \text{advertising stock},\ \text{marketing stock},\ \text{customer relationships},\ \text{installed base}), \]

\[ \text{Quality-adjusted output} = f(\text{technological base},\ \text{cumulative R\&D},\ \text{marketing capability}), \]

\[ \text{Cost of production} = f(\text{output},\ \text{cost of capital},\ \text{labor cost},\ \text{technological base},\ \text{marketing capability}). \]

The appearance of marketing capability inside the R&D and operations equations is the formal expression of the substantive claim that the capabilities are interdependent, not separable.

Figure 28.1 makes the recursive structure explicit.

flowchart TD
    A[Advertising stock] --> M[Marketing frontier:<br/>log sales]
    MK[Marketing stock] --> M
    TB[Technological base] --> M
    REC[Receivables / CRM] --> M
    IB[Installed base] --> M
    M --> ME[Marketing efficiency]
    ME --> R[R&D frontier:<br/>log tech output]
    RD[R&D stock] --> R
    TB --> R
    R --> RE[R&D efficiency]
    ME --> O[Operations frontier:<br/>log COGS]
    LC[Labor cost] --> O
    CC[Cost of capital] --> O
    TB --> O
    O --> OE[Operations efficiency]

Figure 28.1: The interdependent capability system of Dutta, Narasimhan, and Rajiv (1999). Stocks of marketing, advertising, and R&D investment feed three stochastic frontiers; estimated marketing efficiency enters the R&D and operations frontiers, capturing the interdependence of capabilities.

28.3.2 Stock Variables and the Koyck Transformation

Marketing and innovation investments do not affect sales only in the year they are incurred; their effect decays geometrically over time. The standard device for turning a flow of expenditure into a stock is the Koyck geometric-lag transformation (Koyck 1954). For an expenditure flow \(x_t\) and a retention rate \(\lambda \in (0,1)\), the stock is

\[ \text{Stock}_t = \sum_{j=0}^{t-1} \lambda^{j} \, x_{t-j}, \tag{28.11}\]

so that each past dollar contributes \(\lambda^j\) of its original force after \(j\) years. The advertising-stock literature anchors \(\lambda\) empirically: weights of 0.4 (Peles 1971) and 0.5 (Z. Wang and Kim 2017) are standard for advertising stock, and Dutta, Narasimhan, and Rajiv (2005) use a weight of 0.5 for marketing expenditure and 0.4 for R&D expenditure (their p. 281). The choice of \(\lambda\) is consequential—too high a retention rate over-credits ancient spending—and should be defended against the estimated carryover in the relevant category rather than imposed by habit.
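Equation 28.11 has the equivalent recursive form \(\text{Stock}_t = x_t + \lambda\,\text{Stock}_{t-1}\), which is how it is usually computed. The sketch below implements that recursion in base R; the function name `koyck_stock()` and the spending series are hypothetical, and the function assumes the series is ordered by year with no missing values.

```r
# Geometric-lag (Koyck) stock of an expenditure flow.
# x: expenditure by year, oldest first, no NAs; lambda: retention rate.
koyck_stock <- function(x, lambda) {
  stopifnot(lambda > 0, lambda < 1, !anyNA(x))
  stock <- numeric(length(x))
  carry <- 0
  for (t in seq_along(x)) {
    carry    <- x[t] + lambda * carry  # Stock_t = x_t + lambda * Stock_{t-1}
    stock[t] <- carry
  }
  stock
}

xad      <- c(100, 80, 0, 120)   # invented advertising outlays
ad_stock <- koyck_stock(xad, lambda = 0.5)
print(ad_stock)

# Direct evaluation of Equation 28.11 for the last year, as a check.
direct_last <- sum(0.5^(0:3) * rev(xad))
print(direct_last)
```

In a firm-year panel the function would be applied within firm after sorting by year. The stock in a firm's first observed year equals that year's flow, so early-sample stocks are understated for firms that were spending before the sample begins.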

28.3.3 Measuring R&D Output: Innovativeness and Width

A raw patent count is a poor measure of technological output because patents differ enormously in quality; Dutta, Narasimhan, and Rajiv (1999) deliberately avoid it in favor of two citation-based, quality-adjusted measures. The first, innovativeness, follows the citation-weighting tradition of Trajtenberg (1990b) and Trajtenberg (1990a): a patent that is cited often is more valuable, so patents are weighted by how far their citation count exceeds the industry norm. The second, width of applicability, follows Jaffe, Trajtenberg, and Henderson (1993): a patent cited by firms in other industries has broader applicability, so patents are weighted by the share of their citations that come from outside the focal industry.

Concretely, the innovativeness-adjusted output is built in three steps. First, compute the average number of citations received by all sample patents within an industry (defined at one-, two-, three-, and four-digit SIC granularity); the original study instead used a single mean across all firms and years. Second, weight each firm’s patent by its citation count divided by that industry-sample average. Third, sum the citation-weighted patents within a firm-year. The width-adjusted output parallels this: for each patent compute the proportion of its citations originating outside the focal SIC code; weight the patent by that proportion divided by the industry-average proportion; and sum the weighted patents within a firm-year.
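The two weighting schemes can be sketched on a toy patent-level table in base R. The column names (`sic2`, `cites_total`, `cites_outside`) and all values are invented for illustration; the sketch uses two-digit SIC as the industry and, as a stated assumption, assigns an outside share of zero to patents with no citations.

```r
# Toy patent-level data; values are invented.
patents <- data.frame(
  patent_id     = 1:6,
  gvkey         = c("001", "001", "002", "002", "003", "003"),
  year          = 2015,
  sic2          = c("35", "35", "35", "35", "36", "36"),
  cites_total   = c(10, 8, 4, 2, 4, 0),  # citations received
  cites_outside = c(5, 2, 1, 0, 4, 0)    # of which from other industries
)

# Innovativeness: citations relative to the industry-sample mean.
ind_mean_cites  <- ave(patents$cites_total, patents$sic2, FUN = mean)
patents$w_innov <- patents$cites_total / ind_mean_cites

# Width: outside-industry citation share relative to the industry mean share.
# Assumption: uncited patents get an outside share of zero.
patents$out_share <- ifelse(patents$cites_total > 0,
                            patents$cites_outside / patents$cites_total, 0)
ind_mean_share  <- ave(patents$out_share, patents$sic2, FUN = mean)
patents$w_width <- patents$out_share / ind_mean_share

# Sum the weighted patents within firm-year.
rd_output <- aggregate(cbind(w_innov, w_width) ~ gvkey + year,
                       data = patents, FUN = sum)
print(rd_output)
```

By construction the weights average to one within an industry, so a firm's weighted count exceeds its raw patent count only when its patents are cited more, or more broadly, than the industry norm. An industry whose mean outside share is zero would need separate handling to avoid division by zero.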

28.3.4 The Stochastic Frontier Estimator

The three capabilities are recovered by stochastic frontier analysis (SFA), which is the right tool precisely because capability is defined as efficiency relative to a best-practice frontier. For firm \(i\) in year \(t\) with output \(y_{it}\) and inputs \(\mathbf{x}_{it}\), the production frontier is

\[ \log y_{it} = \mathbf{x}_{it}' \boldsymbol{\beta} + v_{it} - u_{it}, \tag{28.12}\]

where the composed error decomposes into a symmetric noise term \(v_{it} \sim \mathcal{N}(0, \sigma_v^2)\) and a one-sided inefficiency term \(u_{it} \ge 0\). The frontier \(\mathbf{x}_{it}'\boldsymbol{\beta} + v_{it}\) is the maximum attainable output; the firm falls short of it by \(u_{it}\). Capability is then the technical efficiency

\[ \text{TE}_{it} = \exp(-u_{it}) \in (0, 1], \tag{28.13}\]

recovered as the conditional expectation \(\mathbb{E}[\exp(-u_{it}) \mid v_{it} - u_{it}]\) following Jondrow and others. For a production frontier, inefficiency decreases output, so the sign on \(u_{it}\) is negative (ineffDecrease = TRUE in the code below); for a cost frontier—used for the operations capability, where the output is the cost of goods sold to be minimized—inefficiency increases cost and the sign reverses (ineffDecrease = FALSE).

The identifying assumptions are exactly those that make SFA both powerful and fragile. First, the inefficiency term must be one-sided and distributionally specified (half-normal, truncated-normal, or exponential); the efficiency estimates are not robust to gross misspecification of this distribution. Second, the inputs \(\mathbf{x}_{it}\) must be exogenous to \(u_{it}\)—if firms with high latent capability systematically invest more, the frontier is biased and the recovered efficiencies absorb the endogeneity. Third, the log specification requires strictly positive inputs and outputs, which is why the data pipeline below replaces zeros and missing stocks with small positive constants. None of these assumptions is innocuous, and the interpretation of the recovered efficiencies as “capability” is only as good as the frontier is correctly specified.
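To show what the estimator does under the hood, the following sketch fits a normal/half-normal production frontier by maximum likelihood on simulated data, using only base R. It is an illustration under stated assumptions, not the chapter's estimation code: the data-generating values are invented, inefficiency is assumed half-normal, and the pooled cross-section ignores the panel structure. The composed error \(\varepsilon = v - u\) has density \((2/\sigma)\,\phi(\varepsilon/\sigma)\,\Phi(-\varepsilon\lambda/\sigma)\) with \(\sigma^2 = \sigma_u^2 + \sigma_v^2\) and \(\lambda = \sigma_u/\sigma_v\), and efficiency is computed as the conditional mean of \(\exp(-u)\) given \(\varepsilon\).

```r
set.seed(42)
n <- 500
x <- runif(n, 1, 5)              # log input (simulated)
v <- rnorm(n, sd = 0.2)          # symmetric noise
u <- abs(rnorm(n, sd = 0.4))     # half-normal inefficiency
y <- 1 + 0.6 * x + v - u         # log output, as in Equation 28.12

# Negative log-likelihood; standard deviations are kept positive by
# optimizing over their logs.
negll <- function(par) {
  sigma_v <- exp(par[3])
  sigma_u <- exp(par[4])
  sigma   <- sqrt(sigma_v^2 + sigma_u^2)
  lambda  <- sigma_u / sigma_v
  eps     <- y - par[1] - par[2] * x
  -sum(log(2) - log(sigma) + dnorm(eps / sigma, log = TRUE) +
         pnorm(-eps * lambda / sigma, log.p = TRUE))
}

# Start from OLS coefficients and moderate variance guesses.
start <- c(unname(coef(lm(y ~ x))), log(0.3), log(0.3))
fit   <- optim(start, negll, method = "BFGS")

beta_hat <- fit$par[1:2]
sigma_v  <- exp(fit$par[3])
sigma_u  <- exp(fit$par[4])
sigma2   <- sigma_v^2 + sigma_u^2
eps      <- y - beta_hat[1] - beta_hat[2] * x

# Conditional mean of exp(-u) given eps (Equation 28.13), computed on the
# log scale for numerical stability.
mu_star <- -eps * sigma_u^2 / sigma2
s_star  <- sqrt(sigma_u^2 * sigma_v^2 / sigma2)
te <- exp(-mu_star + s_star^2 / 2 +
            pnorm(mu_star / s_star - s_star, log.p = TRUE) -
            pnorm(mu_star / s_star, log.p = TRUE))

print(round(c(slope = beta_hat[2], sigma_u = sigma_u, sigma_v = sigma_v), 3))
print(summary(te))
```

For a cost frontier the composed error becomes \(v + u\), which in this sketch amounts to replacing `eps` with `-eps` in both the likelihood and the efficiency formula. Rerunning the simulation with a different distribution for `u` is a quick way to see how sensitive the recovered efficiencies are to the first identifying assumption.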

28.3.5 Data Construction

We now build the firm-year panel from WRDS. The pipeline connects to the WRDS PostgreSQL server, pulls Total Q (a refined Tobin’s Q), the Compustat fundamentals needed for the input stocks, and the patent data needed for the R&D-output measures, and assembles them into a single panel. The replication is constrained by the WRDS U.S. patents coverage (2011–2019 for the citation files), so the post-2010 window is used for estimation; the original 1985–1994 study period would require building a name-matching algorithm against raw USPTO bulk data.

Code

library(RPostgres)
library(tidyverse)

# WRDS connection. Supply your own credentials via environment variables;
# never hard-code passwords into a reproducible script.
wrds <- dbConnect(
  Postgres(),
  host = "wrds-pgdata.wharton.upenn.edu",
  port = 9737,
  dbname = "wrds",
  sslmode = "require",
  user = Sys.getenv("wrds_user"),
  password = Sys.getenv("wrds_pass")
)

Total Q provides the valuation outcome used in some replications below.

Code

res <- dbSendQuery(wrds, "SELECT DISTINCT gvkey, fyear, q_tot
                          FROM totalq_all.total_q
                          WHERE fyear >= 2000 AND
                                gvkey IS NOT NULL AND
                                q_tot IS NOT NULL")
totalq <- dbFetch(res, n = -1)
dbClearResult(res)
totalq |> write_rds(file.path("data", "totalq.rds"))

Code

library(tidyverse)
totalq <- read_rds(file.path("data", "totalq.rds")) |>
  rename(year = fyear)

The fundamentals pull retrieves the income-statement and balance-sheet items needed to build the input stocks, joins industry classifications, and interpolates missing firm-years. Spline interpolation is used in preference to linear interpolation because the underlying series (advertising, R&D, sales) are smooth and trending; linear fill would introduce kinks that the frontier would misread as inefficiency.

Code

res <- dbSendQuery(
  wrds,
  "SELECT DISTINCT gvkey, fyear, conm, curcd, cogs, rect, revt, sale,
                   xad, xrd, xsga, ppegt, emp, act, xopr, xint, dlc,
                   xlr, uxintd, invfg
   FROM comp_na_daily_all.funda
   WHERE fyear >= 2000 AND gvkey IS NOT NULL AND
         sale IS NOT NULL AND sale > 0 AND revt IS NOT NULL"
)
capability <- dbFetch(res, n = -1)
dbClearResult(res)

# Industry classifications.
res <- dbSendQuery(wrds,
  "SELECT gvkey, gind, gsubind, naics, sic
   FROM comp_na_daily_all.names")
ind <- dbFetch(res, n = -1)
dbClearResult(res)

capability <- capability |>
  left_join(ind, by = join_by(gvkey)) |>
  rename(year = fyear) |>
  unique() |>
  arrange(gvkey, year) |>
  group_by(gvkey, year) |>
  slice(1) |>           # one record per firm-year
  ungroup()

# Spline interpolation across the non-missing span of each series.
spline_interpolate <- function(x) {
  nz <- which(!is.na(x))
  if (length(nz) == 0) return(x)
  first_non_na <- nz[1]
  last_non_na  <- nz[length(nz)]
  x[first_non_na:last_non_na] <-
    zoo::na.spline(x[first_non_na:last_non_na], na.rm = TRUE)
  x
}
library(zoo)

capability <- capability |>
  group_by(gvkey) |>
  arrange(year, .by_group = TRUE) |>
  fill(conm, curcd, gind, gsubind, naics, sic, .direction = "downup") |>
  ungroup() |>
  group_by(gvkey) |>
  complete(year = min(year):max(year)) |>     # fill gaps within firm
  arrange(year, .by_group = TRUE) |>
  fill(conm, curcd, gind, gsubind, naics, sic, .direction = "downup") |>
  mutate(across(
    c(xrd, xsga, xad, emp, ppegt, xint, act, invfg, xlr,
      cogs, rect, revt, sale, xopr, dlc),
    spline_interpolate
  )) |>
  ungroup() |>
  arrange(gvkey, year)

capability |> write_rds(file.path("data", "capability", "capability.rds"))
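A toy series shows what the interpolation step does and why its output still deserves a check. The sketch below uses base R's spline() and approx() rather than zoo::na.spline(), purely so that it runs without extra packages; the series values and the choice of a natural spline are assumptions made for illustration.

```r
x    <- c(NA, 10, NA, NA, 40, 45, NA, 20, NA)   # hypothetical annual flow
idx  <- seq_along(x)
obs  <- !is.na(x)
span <- min(idx[obs]):max(idx[obs])   # interpolate only inside the observed span

lin <- x
spl <- x
lin[span] <- approx(idx[obs], x[obs], xout = span)$y
spl[span] <- spline(idx[obs], x[obs], xout = span, method = "natural")$y

cbind(year_index = idx, raw = x, linear = lin, spline = spl)

# Splines can overshoot, so flag negative imputed values before taking logs.
any(spl[span] < 0)
```

Leading and trailing gaps stay missing in both versions, mirroring the first-to-last non-missing span used by spline_interpolate().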

Code

capability <- read_rds(file.path("data", "capability", "capability.rds"))

The patent–firm linkage assigns patents to gvkeys, keeping for each patent the match with the highest WRDS confidence score and breaking ties deterministically. Annual patent counts per firm follow.

Code

res <- dbSendQuery(wrds,
  "SELECT gvkey, link_bdate, patnum, wrds_score
   FROM wrdsapps_patents.uspatents_gvkey_linking
   WHERE gvkey IS NOT NULL")
wrdsapps_patents_link <- dbFetch(res, n = -1) |>
  group_by(patnum) |> slice_max(order_by = wrds_score, n = 1) |> ungroup() |>
  group_by(patnum) |> slice(1) |> ungroup()
dbClearResult(res)

res <- dbSendQuery(wrds,
  "SELECT DISTINCT patnum, grantdate, cited_patnum, cited_pat_gdate, cite_type
   FROM wrdsapps_patents.uspatents_citations")
wrdsapps_patents_citations <- dbFetch(res, n = -1)
dbClearResult(res)

patent_output <- wrdsapps_patents_link |>
  mutate(year = year(link_bdate)) |>
  group_by(gvkey, year) |>
  summarise(pat_count = n(), .groups = "drop") |>
  complete(gvkey, year = full_seq(year, 1)) |>   # every year, not just the two endpoints
  mutate(across(-c(gvkey, year), ~ replace_na(., 0)))

patent_output |> write_rds(file.path("data", "capability", "patent_output.rds"))

Code

patent_output <- read_rds(file.path("data", "capability", "patent_output.rds"))

28.3.5.1 Innovativeness-adjusted output

Implementing the three-step innovativeness measure requires the industry citation averages at each SIC granularity. The following computes citations per patent per year, the industry averages at one- through four-digit SIC, the per-patent weights, and finally the firm-year sums.

Code

# Citations received per patent per year, with firm and industry attached.
df_cite_patent_year <- wrdsapps_patents_citations |>
  group_by(year = year(grantdate), patnum = cited_patnum) |>
  summarise(cite_count = n(), .groups = "drop") |>
  inner_join(wrdsapps_patents_link, by = join_by(patnum)) |>
  filter(year(link_bdate) <= year) |>     # patent predates the citing year
  inner_join(ind |> select(gvkey, sic) |> na.omit() |> unique(),
             by = "gvkey") |>
  select(-c(link_bdate, wrds_score)) |>
  unique()

# Industry-year average citations at each SIC granularity.
ind_avg_patent_sic <- df_cite_patent_year |>
  group_by(year, sic) |>
  summarize(ind_avg_year_patent_sic = mean(cite_count, na.rm = TRUE),
            .groups = "drop")
ind_avg_patent_sic3 <- df_cite_patent_year |>
  mutate(sic3 = substr(sic, 1, 3)) |>
  group_by(year, sic3) |>
  summarize(ind_avg_year_patent_sic3 = mean(cite_count, na.rm = TRUE),
            .groups = "drop")
ind_avg_patent_sic2 <- df_cite_patent_year |>
  mutate(sic2 = substr(sic, 1, 2)) |>
  group_by(year, sic2) |>
  summarize(ind_avg_year_patent_sic2 = mean(cite_count, na.rm = TRUE),
            .groups = "drop")
ind_avg_patent_sic1 <- df_cite_patent_year |>
  mutate(sic1 = substr(sic, 1, 1)) |>
  group_by(year, sic1) |>
  summarize(ind_avg_year_patent_sic1 = mean(cite_count, na.rm = TRUE),
            .groups = "drop")

# Per-patent weights = citations / industry average; sum within firm-year.
tech_innv <- df_cite_patent_year |>
  mutate(sic1 = substr(sic, 1, 1),
         sic2 = substr(sic, 1, 2),
         sic3 = substr(sic, 1, 3)) |>
  inner_join(ind_avg_patent_sic,  by = join_by(year, sic)) |>
  inner_join(ind_avg_patent_sic3, by = join_by(year, sic3)) |>
  inner_join(ind_avg_patent_sic2, by = join_by(year, sic2)) |>
  inner_join(ind_avg_patent_sic1, by = join_by(year, sic1)) |>
  mutate(
    weight_year_patent_sic  = cite_count / ind_avg_year_patent_sic,
    weight_year_patent_sic3 = cite_count / ind_avg_year_patent_sic3,
    weight_year_patent_sic2 = cite_count / ind_avg_year_patent_sic2,
    weight_year_patent_sic1 = cite_count / ind_avg_year_patent_sic1
  ) |>
  group_by(gvkey, year) |>
  summarise(
    tech_inv_patent      = sum(weight_year_patent_sic,  na.rm = TRUE),
    tech_inv_patent_sic3 = sum(weight_year_patent_sic3, na.rm = TRUE),
    tech_inv_patent_sic2 = sum(weight_year_patent_sic2, na.rm = TRUE),
    tech_inv_patent_sic1 = sum(weight_year_patent_sic1, na.rm = TRUE),
    .groups = "drop"
  )

tech_innv |> write_rds(file.path("data", "capability", "tech_innv.rds"))

Code

tech_innv <- read_rds(file.path("data", "capability", "tech_innv.rds"))
# Firms with no patents in a year get a small positive value so that
# the log frontier is defined.
tech_innv <- tech_innv |>
  complete(gvkey, year = min(tech_innv$year):max(tech_innv$year)) |>
  mutate(across(-c(gvkey, year), ~ replace_na(., 0.1)))
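The weighting logic is easiest to verify on a handful of patents. The following base R sketch uses made-up firms, patents, and citation counts (all values hypothetical) and reproduces the two steps at four-digit SIC: divide each patent's citations by its industry-year mean, then sum the weights within firm-year.

```r
cites <- data.frame(
  gvkey      = c("A", "A", "B", "C", "C"),
  patnum     = c("p1", "p2", "p3", "p4", "p5"),
  sic        = c("3570", "3570", "3570", "2834", "2834"),
  year       = 2015,
  cite_count = c(6, 2, 4, 5, 15)
)

# Industry-year average citations per patent (4 for SIC 3570, 10 for SIC 2834).
cites$ind_avg <- ave(cites$cite_count, cites$sic, cites$year)

# Per-patent weight, then firm-year sum.
cites$weight <- cites$cite_count / cites$ind_avg
tech_inv_toy <- aggregate(weight ~ gvkey + year, data = cites, FUN = sum)
tech_inv_toy   # A: 2, B: 1, C: 2
```

Firm C's 15-citation patent earns the same weight (1.5) as firm A's 6-citation patent, which is the point of the normalization: citations are judged against the industry's own citing norms.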

28.3.5.2 Width-of-applicability output

The width measure counts, for each patent, the share of its citations that come from firms in a different SIC code, then weights by that share relative to the industry-average share.

Code

tech_width_all <- wrdsapps_patents_citations |>
  inner_join(
    wrdsapps_patents_link |> select(-wrds_score) |> rename(gvkey_cite = gvkey),
    by = join_by(patnum)) |>
  select(-link_bdate) |>
  inner_join(
    wrdsapps_patents_link |> select(-wrds_score) |> rename(gvkey_cited = gvkey),
    by = join_by(cited_patnum == patnum)) |>
  filter(year(cited_pat_gdate) >= year(link_bdate)) |>
  select(-link_bdate) |>
  inner_join(ind |> select(gvkey, sic) |> rename(sic_cite = sic),
             by = join_by(gvkey_cite == gvkey)) |>
  inner_join(ind |> select(gvkey, sic) |> rename(sic_cited = sic),
             by = join_by(gvkey_cited == gvkey)) |>
  mutate(outside = if_else(sic_cite == sic_cited, 0, 1)) |>
  unique()

df_cite_patent_year_outside <- tech_width_all |>
  filter(outside == 1) |>
  group_by(patnum = cited_patnum, year = year(grantdate),
           gvkey = gvkey_cited, sic = sic_cited) |>
  summarise(outside_cite_count = n(), .groups = "drop") |>
  unique()

tech_width <- df_cite_patent_year_outside |>
  inner_join(df_cite_patent_year, by = join_by(patnum, year, gvkey, sic)) |>
  unique() |>
  mutate(prop = outside_cite_count / cite_count) |>
  group_by(sic, year) |>
  mutate(mean_prop = mean(prop)) |>   # industry-year average outside share
  ungroup() |>
  mutate(weight = prop / mean_prop, weighted_patent = weight * cite_count) |>
  group_by(gvkey, year) |>
  summarize(tech_width = sum(weighted_patent), .groups = "drop")

tech_width |> write_rds(file.path("data", "capability", "tech_width.rds"))

Code

tech_width <- read_rds(file.path("data", "capability", "tech_width.rds"))
min_year <- min(tech_width$year); max_year <- max(tech_width$year)
tech_width <- tech_width |>
  complete(gvkey, year = min_year:max_year) |>
  mutate(across(-c(gvkey, year), ~ replace_na(., 0.1)))
rm(min_year, max_year)
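A three-patent example makes the width weighting concrete. The numbers below are hypothetical, and the sketch assumes the benchmark is the industry-year mean of the outside-citation share, as the text describes.

```r
cites <- data.frame(
  gvkey              = c("A", "A", "B"),
  patnum             = c("p1", "p2", "p3"),
  sic                = "3570",
  year               = 2015,
  cite_count         = c(10, 4, 4),
  outside_cite_count = c(5, 1, 3)    # citations from firms in another SIC code
)

# Share of citations that come from outside the patent's own industry.
cites$prop <- cites$outside_cite_count / cites$cite_count

# Benchmark share within industry-year, relative weight, and weighted citations.
cites$mean_prop       <- ave(cites$prop, cites$sic, cites$year)
cites$weight          <- cites$prop / cites$mean_prop
cites$weighted_patent <- cites$weight * cites$cite_count

tech_width_toy <- aggregate(weighted_patent ~ gvkey + year, data = cites, FUN = sum)
tech_width_toy   # A: 12, B: 6
```

Patent p3 has the same citation count as p2 but three times the outside share, so it contributes three times as much to the width measure.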

28.3.5.3 Historical patent output

To extend coverage backward, a historical patent file (1981–2012) drawn from the NBER/Searle dynamic-assignee data supplements the WRDS counts.³ The two sources are reconciled into a single patent-count series, preferring the historical count where both exist.

Code

patent_output_hist <-
  rio::import(file.path("data", "capability", "patents_conveyance.dta")) |>
  select(h_assignee_code, grant_date, patentid) |>
  na.omit() |>
  inner_join(
    rio::import(file.path("data", "capability", "dynass_nber_searle.dta")) |>
      select(gvkey1, h_assignee_code) |> na.omit(),
    by = join_by(h_assignee_code)) |>
  mutate(year = year(grant_date)) |>
  rename(gvkey = gvkey1) |>
  group_by(gvkey, year) |>
  summarise(patent_count = n(), .groups = "drop") |>
  complete(gvkey, year = full_seq(year, 1)) |>   # every year, not just the two endpoints
  mutate(across(-c(gvkey, year), ~ replace_na(., 0)))

patent_output_hist |>
  write_rds(file.path("data", "capability", "patent_output_hist.rds"))

Code

patent_output_hist <-
  read_rds(file.path("data", "capability", "patent_output_hist.rds"))

patent_output_total <- patent_output |>
  full_join(patent_output_hist, by = join_by(gvkey, year)) |>
  mutate(pat_count = case_when(
    !is.na(patent_count) ~ patent_count,
    !is.na(pat_count)    ~ pat_count,
    TRUE                 ~ 0
  )) |>
  select(-patent_count) |>
  arrange(gvkey, year) |>
  filter(year >= 2000)

28.3.5.4 Building the input stocks

The final construction step applies the Koyck transformation of Equation 28.11 (with \(\lambda = 0.4\)) to each investment flow, producing the patent, technology, marketing, advertising, installed-base, and R&D stocks, and imputes the cost-of-capital and labor-cost inputs from industry–year means where firm-level values are missing. The cost of capital is proxied by interest expense over current debt (\(\text{xint}/\text{dlc}\)) and labor cost by per-employee staff expense (\(\text{xlr}/\text{emp}\)); both are filled hierarchically from four-digit down to one-digit SIC industry averages so that no firm-year is dropped for a single missing input.

Code

lambda <- 0.4

df_cap <- capability |>
  mutate(sic1 = substr(sic, 1, 1),
         sic2 = substr(sic, 1, 2),
         sic3 = substr(sic, 1, 3)) |>
  filter(xad >= 0, xrd >= 0) |>
  left_join(tech_innv,           by = join_by(gvkey, year)) |>
  left_join(tech_width,          by = join_by(gvkey, year)) |>
  left_join(patent_output_total, by = join_by(gvkey, year)) |>
  mutate(
    pat_count     = if_else(is.na(pat_count), 0, pat_count),
    dlc           = if_else(dlc == 0, NA, dlc),
    costofcapital = xint / dlc,            # interest expense / current debt
    emp           = if_else(emp == 0, NA, emp),
    xlr           = if_else(xlr <= 0, NA, xlr),
    laborcost     = xlr / emp
  )

# Hierarchical industry-year imputation of cost of capital and labor cost.
impute_grp <- function(df, ...) {
  df |>
    group_by(...) |>
    mutate(
      costofcapital = ifelse(is.na(costofcapital),
                             mean(costofcapital, na.rm = TRUE), costofcapital),
      laborcost     = ifelse(is.na(laborcost),
                             mean(laborcost, na.rm = TRUE), laborcost)
    ) |>
    ungroup()
}
df_cap <- df_cap |>
  impute_grp(sic, year)  |> impute_grp(sic3, year) |>
  impute_grp(sic2, year) |> impute_grp(sic1, year)

# Koyck geometric-lag stocks (Equation 28.11) for each investment flow.
koyck_stock <- function(x, lambda) {
  map_dbl(seq_along(x), ~ {
    if (all(is.na(x[1:.x]))) NA_real_
    else sum(x[1:.x] * lambda ^ (.x - seq_along(x[1:.x])), na.rm = TRUE)
  })
}
df_cap <- df_cap |>
  group_by(gvkey) |>
  arrange(year, .by_group = TRUE) |>
  mutate(
    pat_stock     = koyck_stock(pat_count,      lambda),
    techbase_innv = koyck_stock(tech_inv_patent, lambda),
    techbase_width= koyck_stock(tech_width,     lambda),
    marstock      = koyck_stock(xsga,           lambda),
    adstock       = koyck_stock(xad,            lambda),
    installedbase = koyck_stock(sale,           lambda),
    rdstock       = koyck_stock(xrd,            lambda)
  ) |>
  ungroup()

df_cap |> write_rds(file.path("data", "capability", "df_cap.rds"))
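The geometric-lag sum can equivalently be written as the recursion S_t = x_t + lambda * S_{t-1}, which is a convenient cross-check on a short series. The sketch below is a base R illustration with hypothetical flows; unlike koyck_stock() above, it treats missing flows as zero throughout rather than returning NA for leading gaps.

```r
lambda <- 0.4

# Recursive form: today's stock is today's flow plus the decayed prior stock.
koyck_recursive <- function(x, lambda) {
  x[is.na(x)] <- 0   # assumption: a missing flow contributes nothing
  Reduce(function(stock, flow) flow + lambda * stock, x, accumulate = TRUE)
}

# Direct form: weight the flow k years back by lambda^k.
koyck_direct <- function(x, lambda) {
  x[is.na(x)] <- 0
  sapply(seq_along(x), function(t) sum(x[1:t] * lambda^(t - seq_len(t))))
}

xad_toy <- c(100, 0, 50)            # hypothetical advertising flows
koyck_recursive(xad_toy, lambda)    # 100, 40, 66
all.equal(koyck_recursive(xad_toy, lambda), koyck_direct(xad_toy, lambda))
```

The recursion is linear in the length of the series, whereas the direct sum recomputes every lag at every period; on a long panel that difference can matter.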

28.3.6 Estimating the Frontiers: Dutta, Narasimhan, and Rajiv (1999) Replication

With the panel built, the three frontiers of Equation 28.12 are estimated with the frontier package. Each capability is estimated twice—once with the innovativeness-adjusted technology measure and once with the width-adjusted measure—so that the robustness of the recovered efficiencies to the output-quality definition can be assessed. The marketing frontier is a production frontier (maximize sales), the operations frontier is a cost frontier (minimize cost of goods sold), and the recovered marketing efficiency is fed forward into the R&D and operations frontiers, operationalizing the interdependence of capabilities.

Code

library(plm)
library(tidyverse)
df_cap <- read_rds(file.path("data", "capability", "df_cap.rds"))

df_cap_panel_dutta <- df_cap |>
  filter(year > 2010) |>
  pdata.frame(c("gvkey", "year"))
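Because every frontier below is specified in logs, it is worth counting non-positive and missing values before estimation; pat_count, for example, is set to 0 when missing in the stock-building step, and log(0) is -Inf. The helper names below (count_nonpositive, floor_positive) are hypothetical, the toy data are made up, and flooring at 0.1 simply mirrors the constant used for the patent-output measures above. Whether to floor or to drop such rows is a modelling decision this sketch does not settle.

```r
# Count values that would make log() undefined or infinite.
count_nonpositive <- function(df, vars) {
  sapply(df[vars], function(v) sum(is.na(v) | v <= 0))
}

# Replace missing or non-positive values with a small positive constant.
floor_positive <- function(v, eps = 0.1) {
  ifelse(is.na(v) | v <= 0, eps, v)
}

toy <- data.frame(
  sale      = c(10, 20, 30),
  pat_count = c(0, 2, NA),
  xad       = c(1, 0, 3)
)

bad <- count_nonpositive(toy, c("sale", "pat_count", "xad"))
bad   # sale 0, pat_count 2, xad 1

toy$pat_count_floored <- floor_positive(toy$pat_count)
all(is.finite(log(toy$pat_count_floored)))
```

On the real panel the same check could be run on an ordinary data frame copy, for instance count_nonpositive(as.data.frame(df_cap), c("sale", "adstock", "marstock")).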

Code

# Expensive: 6 stochastic-frontier MLE fits (tens of minutes), so — like every other
# frontier block in this chapter — it is pre-computed and saved to df_cap_panel_dutta.rds
# below, then read back by the analysis chunks. Set eval: true to re-estimate from scratch.
library(frontier)

# Marketing capability: production frontier for log sales.
cap_mar_innv <- frontier::sfa(
  log(sale) ~ log(adstock) + log(marstock) + log(techbase_innv) +
    log(rect) + log(installedbase) | sic1 + factor(year),
  ineffDecrease = TRUE,   # maximize: inefficiency lowers output
  timeEffect = TRUE,
  data = df_cap_panel_dutta
)
df_cap_panel_dutta$mar_eff_innv <-
  frontier::efficiencies(cap_mar_innv, asInData = TRUE)

cap_mar_width <- frontier::sfa(
  log(sale) ~ log(adstock) + log(marstock) + log(techbase_width) +
    log(rect) + log(installedbase) | sic1 + factor(year),
  ineffDecrease = TRUE, timeEffect = TRUE, data = df_cap_panel_dutta
)
df_cap_panel_dutta$mar_eff_width <-
  frontier::efficiencies(cap_mar_width, asInData = TRUE)

# R&D capability: marketing efficiency enters as an input (interdependence).
cap_rd_innv <- sfa(
  log(tech_inv_patent) ~ log(techbase_innv) + log(rdstock) +
    log(mar_eff_innv) + log(mar_eff_innv) * log(rdstock) |
    sic1 + factor(year),
  timeEffect = TRUE, ineffDecrease = TRUE, data = df_cap_panel_dutta
)
df_cap_panel_dutta$rd_eff_innv <-
  frontier::efficiencies(cap_rd_innv, asInData = TRUE)

cap_rd_width <- sfa(
  log(tech_width) ~ log(techbase_width) + log(rdstock) +
    log(mar_eff_width) + log(mar_eff_width) * log(rdstock) |
    sic1 + factor(year),
  timeEffect = TRUE, ineffDecrease = TRUE, data = df_cap_panel_dutta
)
df_cap_panel_dutta$rd_eff_width <-
  frontier::efficiencies(cap_rd_width, asInData = TRUE)

# Operations capability: cost frontier for log COGS (minimize).
cap_op_innv <- sfa(
  log(cogs) ~ log(invfg) + log(laborcost) + log(costofcapital) +
    log(techbase_innv) + log(mar_eff_innv) | sic1 + factor(year),
  timeEffect = TRUE, ineffDecrease = FALSE,  # minimize: inefficiency raises cost
  data = df_cap_panel_dutta
)
df_cap_panel_dutta$op_eff_innv <-
  frontier::efficiencies(cap_op_innv, asInData = TRUE)

cap_op_width <- sfa(
  log(cogs) ~ log(invfg) + log(laborcost) + log(costofcapital) +
    log(techbase_width) + log(mar_eff_width) | sic1 + factor(year),
  timeEffect = TRUE, ineffDecrease = FALSE, data = df_cap_panel_dutta
)
df_cap_panel_dutta$op_eff_width <-
  frontier::efficiencies(cap_op_width, asInData = TRUE)

df_cap_panel_dutta |>
  write_rds(file.path("data", "capability", "df_cap_panel_dutta.rds"))

The recovered efficiencies can then be correlated with each other and with sales to check that they behave as capability measures should—positively associated with performance and only moderately correlated with one another, since a firm strong in marketing need not be strong in R&D.

Code

df_cap_panel_dutta |> select(contains("eff")) |> na.omit() |> cor()

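An alternative implementation uses the sfa package in place of frontier; the two agree up to numerical tolerance. Both packages export a function named sfa(), so attaching sfa after frontier masks frontier::sfa(): the later blocks in this section should either call frontier::sfa() explicitly or be run in a fresh session.

Code

# Attaching sfa masks frontier::sfa(); call frontier::sfa() explicitly afterwards.
library(sfa)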

28.3.7 Alternative Capability Specifications

The Dutta frontier is one of several closely related specifications, and the panel built above supports the others with minor changes to the input set. We give three published variants, each differing in how output is measured and which stocks enter the frontier; the substantive payoff is that the recovered capability efficiencies are reassuringly correlated across specifications, lending the construct convergent validity.

28.3.7.1 Saboo, Kumar, and Anand (2017)

This specification uses lagged sales as the installed-base proxy and patent counts (rather than citation-weighted output) for the R&D frontier; the operations frontier is a standard cost frontier in operating expense. It is computationally heavy because of the lag structure.

Code

library(frontier)
df_cap_panel_saboo <- df_cap |>
  group_by(gvkey) |> arrange(year) |>
  mutate(sale_t_1 = dplyr::lag(sale, n = 1)) |>
  ungroup() |>
  pdata.frame(c("gvkey", "year"))

cap_mar <- sfa(
  log(sale) ~ log(xsga) + log(rect) + log(sale_t_1) | sic1 + factor(year),
  ineffDecrease = TRUE, timeEffect = TRUE, data = df_cap_panel_saboo
)
df_cap_panel_saboo$mar_eff <- efficiencies(cap_mar, asInData = TRUE)

cap_rd <- sfa(
  log(pat_count) ~ log(rdstock) + log(pat_stock) | sic1 + factor(year),
  timeEffect = TRUE, ineffDecrease = TRUE, data = df_cap_panel_saboo
)
df_cap_panel_saboo$rd_eff <- efficiencies(cap_rd, asInData = TRUE)

cap_op <- sfa(
  log(xopr) ~ log(act) + log(ppegt) + log(emp) | sic1 + factor(year),
  timeEffect = TRUE, ineffDecrease = FALSE, data = df_cap_panel_saboo
)
df_cap_panel_saboo$op_eff <- efficiencies(cap_op, asInData = TRUE)

28.3.7.2 Elhelaly and Ray (2023)

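This variant enriches each frontier with three-year accumulated stocks and uses a Koyck weight of 0.5. The code below reuses the stocks built above with \(\lambda = 0.4\), so adstock, marstock, and installedbase would have to be rebuilt with lambda <- 0.5 for an exact replication. The marketing frontier includes both advertising and its stock, both marketing expense and its stock, and both current and three-year receivables; the R&D frontier uses three-year patent and R&D accumulations; and the operations frontier is a cost frontier: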

\[ \begin{aligned} \log(\text{sales}) &= \log(\text{xad}) + \log(\text{adstock}) + \log(\text{xsga}) + \log(\text{marstock}) \\ &\quad + \log(\text{rect}) + \log(\text{rec}) + \log(\text{installedbase}), \end{aligned} \tag{28.14}\]

where \(\text{rec}\) is accounts receivable summed over the three years prior;

\[ \log(\text{patent}) = \log(\text{patstock}) + \log(\text{xrd}) + \log(\text{accumrd}), \tag{28.15}\]

where \(\text{patstock}\) is the three-year patent total, \(\text{xrd}\) is total R&D expense, and \(\text{accumrd}\) is the three-year R&D total; and

\[ \log(\text{cogs}) = \log(\text{output}) + \log(\text{laborcost}) + \log(\text{costofcapital}), \tag{28.16}\]

where \(\text{output}\) is the dollar value of output, \(\text{laborcost}\) is per-employee wages and benefits, and \(\text{costofcapital}\) is the average long-term interest rate.

Code

df_cap_panel_elhelaly <- df_cap |>
  select(gvkey, year, contains("sic"), sale, xad, adstock, invfg, xsga,
         marstock, rect, installedbase, pat_count, pat_stock, xrd, cogs,
         laborcost, costofcapital) |>
  group_by(gvkey) |> arrange(gvkey, year) |>
  mutate(
    rec      = dplyr::lag(rect, 1) + dplyr::lag(rect, 2) + dplyr::lag(rect, 3),
    patstock = dplyr::lag(pat_count, 1) + dplyr::lag(pat_count, 2) + dplyr::lag(pat_count, 3),
    accumrd  = dplyr::lag(xrd, 1) + dplyr::lag(xrd, 2) + dplyr::lag(xrd, 3)
  ) |>
  ungroup() |>
  pdata.frame(c("gvkey", "year"))

Code

library(frontier)

cap_mar <- frontier::sfa(
  log(sale) ~ log(xad) + log(adstock) + log(xsga) + log(marstock) +
    log(rect) + log(rec) + log(installedbase) | sic1 + factor(year),
  ineffDecrease = TRUE, timeEffect = TRUE, data = df_cap_panel_elhelaly
)
df_cap_panel_elhelaly$mar_eff <-
  frontier::efficiencies(cap_mar, asInData = TRUE)

cap_rd <- sfa(
  log(pat_count) ~ log(patstock) + log(xrd) + log(accumrd) | sic1 + factor(year),
  timeEffect = TRUE, ineffDecrease = TRUE, data = df_cap_panel_elhelaly
)
df_cap_panel_elhelaly$rd_eff <-
  frontier::efficiencies(cap_rd, asInData = TRUE)

cap_op <- sfa(
  log(cogs) ~ log(invfg) + log(laborcost) + log(costofcapital) | sic1 + factor(year),
  timeEffect = TRUE, ineffDecrease = FALSE, data = df_cap_panel_elhelaly
)
df_cap_panel_elhelaly$op_eff <-
  frontier::efficiencies(cap_op, asInData = TRUE)

df_cap_panel_elhelaly |>
  write_rds(file.path("data", "capability", "df_cap_panel_elhelaly.rds"))

Code

df_cap_panel_elhelaly <-
  read_rds(file.path("data", "capability", "df_cap_panel_elhelaly.rds"))
df_cap_panel_elhelaly |> select(contains("eff")) |> na.omit() |> cor()
#>             mar_eff      rd_eff      op_eff
#> mar_eff  1.00000000 -0.02846301 -0.05873300
#> rd_eff  -0.02846301  1.00000000 -0.04156963
#> op_eff  -0.05873300 -0.04156963  1.00000000

28.3.7.3 Cao, Feng, and Wiles (2023)

This specification uses a valuation outcome—Total Q in place of Tobin’s Q—for the marketing frontier and a lagged-R&D, lagged-patent-stock structure for the R&D frontier:

\[ \log(\text{Total } Q_t) = \log(\text{xsga}_t) + \log(\text{xsga}_{t-1}) + \log(\text{xad}_t) + \log(\text{pat}_t), \tag{28.17}\]

\[ \log(\text{pat}_t) = \log(\text{xrd}_t) + \log(\text{xrd}_{t-1}) + \log(\text{patstock}_{t-1}), \tag{28.18}\]

where \(\text{pat}_t\) is the number of patents in year \(t\).

Code

df_cap_panel_cao <- df_cap |>
  left_join(totalq, by = join_by(gvkey, year)) |>
  select(gvkey, year, contains("sic"), q_tot, xsga, xad,
         pat_count, xrd, pat_stock) |>
  group_by(gvkey) |> arrange(gvkey, year) |>
  mutate(
    xsga_t_1      = dplyr::lag(xsga, n = 1),
    xrd_t_1       = dplyr::lag(xrd, n = 1),
    pat_stock_t_1 = dplyr::lag(pat_stock, n = 1)
  ) |>
  ungroup() |>
  pdata.frame(c("gvkey", "year"))

Code

library(frontier)

cap_mar <- frontier::sfa(
  log(q_tot) ~ log(xsga) + log(xsga_t_1) + log(xad) + log(pat_count) |
    sic1 + factor(year),
  ineffDecrease = TRUE, timeEffect = TRUE, data = df_cap_panel_cao
)
df_cap_panel_cao$mar_eff <-
  frontier::efficiencies(cap_mar, asInData = TRUE)

cap_rd <- sfa(
  log(pat_count) ~ log(xrd) + log(xrd_t_1) + log(pat_stock_t_1) |
    sic1 + factor(year),
  timeEffect = TRUE, ineffDecrease = TRUE, data = df_cap_panel_cao
)
df_cap_panel_cao$rd_eff <-
  frontier::efficiencies(cap_rd, asInData = TRUE)

df_cap_panel_cao |>
  write_rds(file.path("data", "capability", "df_cap_panel_cao.rds"))

Code

df_cap_panel_cao <-
  read_rds(file.path("data", "capability", "df_cap_panel_cao.rds"))
df_cap_panel_cao |> select(contains("eff")) |> na.omit() |> cor()
#>            mar_eff     rd_eff
#> mar_eff  1.0000000 -0.1092964
#> rd_eff  -0.1092964  1.0000000
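The matrices printed above correlate capabilities within a specification; convergent validity concerns the same capability across specifications. A minimal sketch of that check follows. It runs on simulated placeholder efficiencies, not on results from this chapter, and the object names (eff_dutta, eff_elhelaly, eff_cao) are hypothetical; in the pipeline the inputs would be the gvkey, year, and mar_eff columns of the three saved panels, each converted with as.data.frame().

```r
set.seed(42)
keys <- expand.grid(gvkey = sprintf("%06d", 1:50), year = 2011:2015,
                    stringsAsFactors = FALSE)

# Simulated stand-ins: one latent efficiency observed with different noise.
latent <- runif(nrow(keys), 0.3, 0.95)
noisy  <- function(sd) pmin(pmax(latent + rnorm(nrow(keys), sd = sd), 0.01), 1)

eff_dutta    <- data.frame(keys, mar_eff_dutta    = noisy(0.05))
eff_elhelaly <- data.frame(keys, mar_eff_elhelaly = noisy(0.10))
eff_cao      <- data.frame(keys, mar_eff_cao      = noisy(0.20))

# Align on firm-year, then correlate the same capability across specifications.
merged <- Reduce(function(a, b) merge(a, b, by = c("gvkey", "year")),
                 list(eff_dutta, eff_elhelaly, eff_cao))
cross_spec <- cor(merged[, c("mar_eff_dutta", "mar_eff_elhelaly", "mar_eff_cao")])
cross_spec
```

Rank correlations (cor(..., method = "spearman")) are a reasonable companion check, since efficiency scores from different frontiers need not share a scale.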

28.3.8 The Broader Marketing-Capability Literature

The frontier-estimation tradition above is one strand of a much larger literature that decomposes marketing capability into functional sub-capabilities and links each to performance. The foundational input weights for the stocks (0.5 for marketing expenditure and 0.4 for R&D) come from Dutta, Narasimhan, and Rajiv (2005), and the brand/advertising-stock weighting from Peles (1971) and Z. Wang and Kim (2017); the code above applies a single \(\lambda = 0.4\) to every flow, so a replication that follows these weights exactly would rebuild the marketing stock with 0.5. Building on these primitives, scholars have isolated specialized capabilities: marketing capability broadly (Bahadir, Bharadwaj, and Srivastava 2008; Xiong and Bharadwaj 2013; Wiles, Morgan, and Rego 2012; Mishra and Modi 2016; Dinner, Kushwaha, and Steenkamp 2018), marketing alliance capability (Swaminathan and Moorman 2009), digitized selling capability (D. S. Johnson 2005), big-data capability (J. S. Johnson, Friend, and Lee 2017), social-CRM capability (Trainor et al. 2014), and customer-relationship capability (Z. Wang and Kim 2017).

The single most influential disaggregation is Morgan, Slotegraaf, and Vorhies (2009), who link three core marketing capabilities to profit growth and uncover a tension invisible to coarser analyses. Although profit growth is a primary driver of stock price, the mechanism by which marketing capabilities feed it was poorly understood. Using a cross-industry sample of 114 firms, Morgan, Slotegraaf, and Vorhies (2009) decompose marketing capability into market sensing, brand management, and customer relationship management (CRM), and decompose profit growth into revenue growth and margin growth. The capabilities exert both direct and synergistic effects on the two growth components—but, critically, brand-management and CRM capabilities can counteract each other across the components: a capability that lifts revenue growth may simultaneously depress margin growth, and vice versa. A surface-level analysis that examines only aggregate profit growth would miss these offsetting effects and mis-state the true relationship between capabilities and performance. The methodological moral reinforces the theme of this chapter: the level of aggregation at which a metric is defined determines what relationships it can reveal.
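The offsetting mechanism is easy to see with arithmetic. If profit is written as revenue times margin, log profit growth is the sum of log revenue growth and log margin growth, so a capability that raises one component while lowering the other can leave aggregate profit growth looking modest. The numbers below are hypothetical and the identity is only an intuition aid; it is not the measurement model used by Morgan, Slotegraaf, and Vorhies (2009).

```r
rev0 <- 100;  rev1 <- 120     # revenue grows 20 percent
mar0 <- 0.10; mar1 <- 0.09    # margin falls from 10 to 9 percent

g_rev    <- log(rev1 / rev0)
g_mar    <- log(mar1 / mar0)
g_profit <- log((rev1 * mar1) / (rev0 * mar0))

round(c(revenue = g_rev, margin = g_mar, profit = g_profit), 4)
all.equal(g_profit, g_rev + g_mar)   # the decomposition is exact in logs
```

A regression of profit growth alone would register only the net effect, which is smaller than the revenue effect it conceals.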

A generic frontier implementation, parameterized by a sub_market grouping, ties the strands together; the sfaR package offers a cross-sectional alternative when the panel structure is unavailable.

Code

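library(frontier)
library(plm)

# The generic names in the formula are placeholders. Here they are mapped onto
# the df_cap columns built earlier; treating two-digit SIC as sub_market is an
# illustrative choice, not a prescription.
capability_generic <- df_cap |>
  transmute(gvkey, year, sub_market = factor(sic2), sales = sale, adstock,
            marketingstock = marstock, techbase = techbase_innv,
            receivable = rect, installedbase)

mar_cap <- frontier::sfa(
  log(sales) ~ sub_market +
    log(adstock) + log(marketingstock) + log(techbase) +
    log(receivable) + log(installedbase),
  data = pdata.frame(capability_generic, c("gvkey", "year")),
  ineffDecrease = TRUE,  # production frontier
  timeEffect = TRUE      # time-varying error-components frontier
)
frontier::efficiencies(mar_cap)

Code

library(sfaR)
mar_cap_cross <- sfaR::sfacross(
  log(sales) ~ sub_market +
    log(adstock) + log(marketingstock) + log(techbase) +
    log(receivable) + log(installedbase),
  data = capability_generic   # pooled cross-section; panel structure is ignored
)
sfaR::efficiencies(mar_cap_cross)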

28.3.8.1 Digital, Social-Media, and Organizational Capabilities

The capability lens has been extended to the digital domain. Survey-based studies of digital marketing capability (F. Wang 2020; Homburg and Wielgos 2022) and a synthesizing review (Herhausen et al. 2020) map the organizational routines through which firms convert digital assets into performance, while Nguyen et al. (2015) treats social-media strategic capability as a distinct competence.

A complementary line generalizes from functional capabilities to the embeddedness of capabilities in the organization. Grewal and Slotegraaf (2007) argue that managers must deploy scarce resources to build durable capabilities, and that neglecting the underlying processes obscures how capabilities translate into competitive advantage. Their central construct is capability embeddedness—the depth at which a capability is ingrained in the organization, itself a consequence of managerial resource-allocation decisions. Methodologically, they introduce a hierarchical composed-error framework (a multilevel generalization of the stochastic frontier) that applies to cross-sectional and panel data alike, and they show in a retailing application that capability embeddedness directly improves performance even after controlling for tangible and intangible resources. The framework’s payoff is diagnostic: recognizing whether the objectives of different capabilities are convergent or divergent tells a manager whether deepening embeddedness will amplify or undercut firm performance—an organizational echo of the revenue/margin tension in Morgan, Slotegraaf, and Vorhies (2009).

28.4 Management Metrics

A final, smaller family of metrics characterizes the firm’s strategy and leadership, which moderate how marketing investments convert into performance. Two constructs have proven especially measurable from archival data. Founding strategy, and specifically the degree of market differentiation a firm adopts at founding, can be recovered from the language of its early communications and product positioning; Guzman and Li (2023) develop a scalable measure of this founding differentiation. CEO overconfidence—a behavioral trait with consequences for innovation and investment—is measured from executives’ revealed reluctance to exercise in-the-money stock options, following the option-based approach Galasso and Simcoe (2011) apply to study innovation. Both illustrate the chapter’s recurring method: a psychological or strategic construct, once thought to require surveys or inside access, recovered at scale from the residue firms leave in archival and market data.

28.5 Key Takeaways

A metric is a definition plus an identification argument. The financial measures of Section 28.1 are easy to compute and hard to interpret causally: ROI and ROMI summarize an effect whose incremental numerator is itself the object of inference, and EVA and MVA price value creation only as well as invested capital and market expectations are measured. The accounting covariates are largely settled in their construction—Table 28.1 and Table 28.2 catalogue the conventions—but their role is to neutralize confounding, and they earn that role only when chosen to match the marketing decision under study. The marketing constructs of Section 28.2 exemplify the field’s methodological frontier: sentiment, purchase intention, trust, and reputation, once the province of surveys, are now recovered from text and images by pre-trained models whose accuracy and biases must themselves be validated. Capability, the subject of Section 28.3, is the hardest case and the template for the rest: a latent efficiency, identified only under explicit distributional and exogeneity assumptions, whose credibility rests entirely on the care taken in building the input stocks and specifying the frontier. Across all three families, the lesson is the same—report the metric, but report the assumptions that make it mean what you claim.

Bahadir, S. Cem, Sundar G Bharadwaj, and Rajendra K Srivastava. 2008. “Financial Value of Brands in Mergers and Acquisitions: Is Value in the Eye of the Beholder?” Journal of Marketing 72 (6): 49–64. https://doi.org/10.1509/jmkg.72.6.49.

Cao, Zixia, Hui Feng, and Michael A Wiles. 2023. “When Do Marketing Ideation Crowdsourcing Contests Create Shareholder Value? The Effect of Contest Design and Marketing Resource Factors.” Journal of Marketing, 00222429231191446.

Chakravarty, Anindita, and Rajdeep Grewal. 2011. “The Stock Market in the Driver’s Seat! Implications for R&D and Marketing.” Management Science 57 (9): 1594–1609. https://doi.org/10.1287/mnsc.1110.1317.

———. 2016. “Analyst Earning Forecasts and Advertising and R&D Budgets: Role of Agency Theoretic Monitoring and Bonding Costs.” Journal of Marketing Research 53 (4): 580–96. https://doi.org/10.1509/jmr.14.0204.

Davis, James L., Eugene F. Fama, and Kenneth R. French. 2000. “Characteristics, Covariances, and Average Returns: 1929 to 1997.” The Journal of Finance 55 (1): 389–406. https://doi.org/10.1111/0022-1082.00209.

Dinner, Isaac M, Tarun Kushwaha, and Jan-Benedict E M Steenkamp. 2018. “Psychic Distance and Performance of MNCs During Marketing Crises.” Journal of International Business Studies 50 (3): 339–64. https://doi.org/10.1057/s41267-018-0187-z.

Dutta, Shantanu, Om Narasimhan, and Surendra Rajiv. 1999. “Success in High-Technology Markets: Is Marketing Capability Critical?” Marketing Science 18 (4): 547–68. https://doi.org/10.1287/mksc.18.4.547.

———. 2005. “Conceptualizing and Measuring Capabilities: Methodology and Empirical Application.” Strategic Management Journal 26 (3): 277–85. https://doi.org/10.1002/smj.442.

Elhelaly, Nehal, and Sourav Ray. 2023. “Collaborating to Innovate: Balancing Strategy Dividend and Transactional Efficiencies.” Journal of Marketing. https://doi.org/10.1177/00222429231222269.

Galasso, Alberto, and Timothy S Simcoe. 2011. “CEO Overconfidence and Innovation.” Management Science 57 (8): 1469–84.

Grewal, Rajdeep, Anindita Chakravarty, Min Ding, and John Liechty. 2008. “Counting Chickens Before the Eggs Hatch: Associating New Product Development Portfolios with Shareholder Expectations in the Pharmaceutical Sector.” International Journal of Research in Marketing 25 (4): 261–72. https://doi.org/10.1016/j.ijresmar.2008.07.001.

Grewal, Rajdeep, Murali Chandrashekaran, and Alka V. Citrin. 2010. “Customer Satisfaction Heterogeneity and Shareholder Value.” Journal of Marketing Research 47 (4): 612–26. https://doi.org/10.1509/jmkr.47.4.612.

Grewal, Rajdeep, and Rebecca J Slotegraaf. 2007. “Embeddedness of Organizational Capabilities.” Decision Sciences 38 (3): 451–88.

Guzman, Jorge, and Aishen Li. 2023. “Measuring Founding Strategy.” Management Science 69 (1): 101–18.

Hartmann, Jochen, Mark Heitmann, Christina Schamp, and Oded Netzer. 2021. “The Power of Brand Selfies.” Journal of Marketing Research 58 (6): 1159–77.

Hartmann, Jochen, Mark Heitmann, Christian Siebert, and Christina Schamp. 2023. “More Than a Feeling: Accuracy and Application of Sentiment Analysis.” International Journal of Research in Marketing 40 (1): 75–87.

He, Sharlene, Eric T Anderson, and Derek D Rucker. 2023. “Measuring Willingness to Pay: A Comparative Method of Valuation.” Journal of Marketing, 00222429231195564.

Herhausen, Dennis, Dario Miočević, Robert E. Morgan, and Mirella H. P. Kleijnen. 2020. “The Digital Marketing Capabilities Gap.” Industrial Marketing Management 90 (October): 276–90. https://doi.org/10.1016/j.indmarman.2020.07.022.

Homburg, Christian, and Dominik M. Wielgos. 2022. “The Value Relevance of Digital Marketing Capabilities to Firm Performance.” Journal of the Academy of Marketing Science 50 (4): 666–88. https://doi.org/10.1007/s11747-022-00858-7.

Jaffe, A. B., M. Trajtenberg, and R. Henderson. 1993. “Geographic Localization of Knowledge Spillovers as Evidenced by Patent Citations.” The Quarterly Journal of Economics 108 (3): 577–98. https://doi.org/10.2307/2118401.

Johnson, D. S. 2005. “Digitization of Selling Activity and Sales Force Performance: An Empirical Investigation.” Journal of the Academy of Marketing Science 33 (1): 3–18. https://doi.org/10.1177/0092070304266119.

Johnson, Jeff S., Scott B. Friend, and Hannah S. Lee. 2017. “Big Data Facilitation, Utilization, and Monetization: Exploring the 3Vs in a New Product Development Process.” Journal of Product Innovation Management 34 (5): 640–58. https://doi.org/10.1111/jpim.12397.

Kahle, Kathleen M., and René M. Stulz. 2017. “Is the US Public Corporation in Trouble?” Journal of Economic Perspectives 31 (3): 67–88. https://doi.org/10.1257/jep.31.3.67.

Kashmiri, Saim, and Vijay Mahajan. 2017. “Values That Shape Marketing Decisions: Influence of Chief Executive Officers’ Political Ideologies on Innovation Propensity, Shareholder Value, and Risk.” Journal of Marketing Research 54 (2): 260–78. https://doi.org/10.1509/jmr.14.0110.

Kim, MinChung, and Leigh M. McAlister. 2011. “Stock Market Reaction to Unexpected Growth in Marketing Expenditure: Negative for Sales Force, Contingent on Spending Level for Advertising.” Journal of Marketing 75 (4): 68–85. https://doi.org/10.1509/jmkg.75.4.68.

Koyck, Leendert Marinus. 1954. Distributed Lags and Investment Analysis. Vol. 4. North-Holland Publishing Company.

Liu, Yan, Venkatesh Shankar, and Wonjoo Yun. 2017. “Crisis Management Strategies and the Long-Term Effects of Product Recalls on Firm Value.” Journal of Marketing 81 (5): 30–48. https://doi.org/10.1509/jm.15.0535.

Malshe, Ashwin, and Manoj Agarwal. 2015. “From Finance to Marketing: Impact of Financial Leverage on Customer Satisfaction.” Journal of Marketing, June, 150626124337002. https://doi.org/10.1509/jim.15.0312.

Markovitch, Dmitri G., Joel H. Steckel, and Bernard Yeung. 2005. “Using Capital Markets as Market Intelligence: Evidence from the Pharmaceutical Industry.” Management Science 51 (10): 1467–80. https://doi.org/10.1287/mnsc.1050.0401.

McAlister, Leigh, Raji Srinivasan, Niket Jindal, and Albert A. Cannella. 2016. “Advertising Effectiveness: The Moderating Effect of Firm Strategy.” Journal of Marketing Research 53 (2): 207–24. https://doi.org/10.1509/jmr.13.0285.

Mishra, Saurabh, and Sachin B. Modi. 2016. “Corporate Social Responsibility and Shareholder Wealth: The Role of Marketing Capability.” Journal of Marketing 80 (1): 26–46. https://doi.org/10.1509/jm.15.0013.

Morgan, Neil A, Rebecca J Slotegraaf, and Douglas W Vorhies. 2009. “Linking Marketing Capabilities with Profit Growth.” International Journal of Research in Marketing 26 (4): 284–93.

Nezami, Mehdi, Stefan Worm, and Robert W. Palmatier. 2018. “Disentangling the Effect of Services on B2B Firm Value: Trade-Offs of Sales, Profits, and Earnings Volatility.” International Journal of Research in Marketing 35 (2): 205–23. https://doi.org/10.1016/j.ijresmar.2017.12.002.

Nguyen, Bang, Xiaoyu Yu, T. C. Melewar, and Junsong Chen. 2015. “Brand Innovation and Social Media: Knowledge Acquisition from Social Media, Market Orientation, and the Moderating Role of Social Media Strategic Capability.” Industrial Marketing Management 51 (November): 11–25. https://doi.org/10.1016/j.indmarman.2015.04.017.

Peles, Yoram. 1971. “Rates of Amortization of Advertising Expenditures.” Journal of Political Economy 79 (5): 1032–58. https://doi.org/10.1086/259813.

Rao, Vithala R., Manoj K. Agarwal, and Denise Dahlhoff. 2004. “How Is Manifest Branding Strategy Related to the Intangible Value of a Corporation?” Journal of Marketing 68 (4): 126–41. https://doi.org/10.1509/jmkg.68.4.126.42735.

Roy, Atanu, Jisu Huh, Alexander Pfeuffer, and Jaideep Srivastava. 2017. “Development of Trust Scores in Social Media (TSM) Algorithm and Application to Advertising Practice and Research.” Journal of Advertising 46 (2): 269–82. https://doi.org/10.1080/00913367.2017.1297272.

Rust, Roland T., William Rand, Ming-Hui Huang, Andrew T. Stephen, Gillian Brooks, and Timur Chabuk. 2021. “Real-Time Brand Reputation Tracking Using Social Media.” Journal of Marketing 85 (4): 21–43. https://doi.org/10.1177/0022242921995173.

Saboo, Alok R, V Kumar, and Ankit Anand. 2017. “Assessing the Impact of Customer Concentration on Initial Public Offering and Balance Sheet–Based Outcomes.” Journal of Marketing 81 (6): 42–61.

Swaminathan, Vanitha, and Christine Moorman. 2009. “Marketing Alliances, Firm Networks, and Firm Value Creation.” Journal of Marketing 73 (5): 52–69. https://doi.org/10.1509/jmkg.73.5.52.

Trainor, Kevin J., James (Mick) Andzulis, Adam Rapp, and Raj Agnihotri. 2014. “Social Media Technology Usage and Customer Relationship Performance: A Capabilities-Based Examination of Social CRM.” Journal of Business Research 67 (6): 1201–8. https://doi.org/10.1016/j.jbusres.2013.05.002.

Trajtenberg, Manuel. 1990a. “A Penny for Your Quotes: Patent Citations and the Value of Innovations.” The RAND Journal of Economics 21 (1): 172. https://doi.org/10.2307/2555502.

———. 1990b. “Product Innovations, Price Indices and the (Mis)measurement of Economic Performance.” https://doi.org/10.3386/w3261.

Wang, Fatima. 2020. “Digital Marketing Capabilities in International Firms: A Relational Perspective.” International Marketing Review 37 (3): 559–77. https://doi.org/10.1108/imr-04-2018-0128.

Wang, Zhan, and Hyun Gon Kim. 2017. “Can Social Media Marketing Improve Customer Relationship Capabilities and Firm Performance? Dynamic Capability Perspective.” Journal of Interactive Marketing 39 (August): 15–26. https://doi.org/10.1016/j.intmar.2017.02.004.

Wies, Simone, Arvid Oskar Ivar Hoffmann, Jaakko Aspara, and Joost M. E. Pennings. 2019. “Can Advertising Investments Counter the Negative Impact of Shareholder Complaints on Firm Value?” Journal of Marketing 83 (4): 58–80. https://doi.org/10.1177/0022242919841584.

Wiles, Michael A., Neil A. Morgan, and Lopo L. Rego. 2012. “The Effect of Brand Acquisition and Disposal on Stock Returns.” Journal of Marketing 76 (1): 38–58. https://doi.org/10.1509/jm.09.0209.

Xiong, Guiyang, and Sundar Bharadwaj. 2013. “Asymmetric Roles of Advertising and Marketing Capability in Financial Returns to News: Turning Bad into Good and Good into Great.” Journal of Marketing Research 50 (6): 706–24. https://doi.org/10.1509/jmr.12.0278.

The item definitions follow the WRDS data-items documentation (wrds_data_items). Items prefixed in CRSP/Compustat Merged (e.g., PRCC_C, CSHO) are drawn from the merged fundamentals files; market-price and shares-outstanding items used for market capitalization come from CRSP.
The fine-tuned purchase-intention model is distributed publicly as a large RoBERTa checkpoint, allowing researchers to score new English-language text for expressed purchase intent without retraining.
The two large input files (dynass_nber_searle.dta and patents_conveyance.dta) are too large to redistribute here but are publicly archived; the code reconstructs the firm-year patent counts from them.