```mermaid
flowchart TB
subgraph Funnel["Purchase Funnel (firm's view)"]
direction TB
A1[Awareness] --> A2[Familiarity] --> A3[Consideration] --> A4[Purchase] --> A5[Loyalty]
end
subgraph Journey["Consumer Decision Journey (consumer's view)"]
direction TB
B1[Initial consideration set] --> B2[Active evaluation]
B2 --> B3[Moment of purchase]
B3 --> B4[Post-purchase experience]
B4 -->|loyalty loop| B1
B2 -->|set expands| B1
end
```
13 Advertising
Advertising is paid, mediated communication from an identified sponsor designed to shift what consumers know, feel, and ultimately do. It is among the largest discretionary line items a firm controls and one of the oldest objects of study in marketing, yet two facts about it remain in productive tension. The first is that advertising plainly works: a long tradition of meta-analysis and market-response modeling documents that it moves consumer mindsets, behavior, and—through those channels—firm value (Wang, Zhang, and Ouyang 2008; Hewett et al. 2016). The second is that, measured honestly, its average effect is small, noisy, and frequently indistinguishable from zero. The most rigorous modern field evidence finds that the bulk of brands earn statistically insignificant or negative advertising elasticities, that observational estimates are swamped by selection, and that a majority of observed advertising schedules carry negative marginal returns (B. Shapiro, Hitsch, and Tuchman 2018; B. T. Shapiro, Hitsch, and Tuchman 2021; Gordon et al. 2017; Lewis and Rao 2015). Reconciling “advertising works” with “most advertising doesn’t pay” is the central problem of the field, and it is fundamentally a problem of measurement.
This chapter is organized around three lenses that the literature applies to that problem, and the reader who finishes it should be able to move fluently among them. The behavioral lens asks how an exposure is processed inside a single mind: which psychological stages a message must clear to change a belief, when affect substitutes for cognition, and how involvement moderates the route to persuasion. The theoretical lens asks how rational agents—consumers, advertisers, and intermediaries—interact when advertising is informative, intrusive, and costly. The econometric lens asks the hardest question of all: given non-experimental data in which firms advertise because they expect demand, how does one recover the causal effect of advertising rather than the spurious correlation between spending and the sales that motivated it? We treat the behavioral models as the source of testable mechanisms, the theoretical models as the source of equilibrium discipline, and the econometric models as the arbiter of what is actually true. Throughout, the unifying thread is that advertising’s effect is best understood not as a level but as an increment above a baseline—and that the entire empirical enterprise turns on how credibly that baseline is constructed (Gordon et al. 2019).
A note on scope and ethics frames everything that follows. Audiences dislike intrusive advertising, especially when they are vulnerable (Tybout and Calkins 2005), and the welfare consequences of targeting, deception, and repetition recur as substantive questions rather than asides. Advertising that persuades by exploiting vulnerability is both a behavioral phenomenon and a regulatory one, and the chapter treats it as both.
13.1 The Behavioral Approach
The behavioral tradition opens the consumer’s head and asks what an advertisement has to accomplish there. Its oldest and most durable idea is that persuasion is a sequence of conditional stages, each of which can fail.
13.1.1 Persuasion as a Stochastic Information-Processing Chain
McGuire and Staelin (1983) models the effect of a persuasive communication as a chain of information-processing steps, each of which the receiver must clear before the next becomes possible. Because the steps are sequential and each is probabilistic, the probability that an exposure ultimately changes behavior is the product of the stage-wise conditional probabilities. Writing the stages as presentation (the message reaches the receiver), attention, comprehension, yielding (attitude change), retention (persistence of the new belief), and behavior, the persuasion probability is
\[ P(\text{behavior}) = P(\text{presentation}) \cdot P(\text{attention} \mid \text{presentation}) \cdot P(\text{comprehension} \mid \text{attention}) \cdot P(\text{yielding} \mid \text{comprehension}) \cdot P(\text{retention} \mid \text{yielding}) \cdot P(\text{behavior} \mid \text{retention}). \tag{13.1}\]
The multiplicative form in Equation 13.1 is the source of the model’s most useful implication: because the stages compound, a campaign is only as strong as its weakest link, and very high performance on one stage cannot rescue a near-zero probability on another. Each stage maps onto a distinct managerial lever and a distinct measurement instrument—presentation to reach and media weight, attention to recognition tests, comprehension to recall and semantic-differential profiles, yielding to measured attitude change, retention to delayed attitude measurement, and behavior to observed choice. The independent variables that move these probabilities are organized in McGuire and Staelin (1983) as a communication matrix crossing source, message, channel, receiver, and destination. Advertising appeals operate on the yielding stage through valence: positive (gain-framed) appeals reduce anxiety, negative (loss-framed) appeals raise it. The model’s chief limitation—and the seam along which later work pulled it apart—is its assumption of a single fixed hierarchy in which cognition always precedes affect which always precedes behavior.
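The weakest-link implication of Equation 13.1 can be checked with a few lines of arithmetic. The sketch below uses hypothetical stage probabilities (illustrative assumptions, not estimates from the literature) and asks which single stage, improved by ten percentage points, raises the overall persuasion probability the most in relative terms.

```r
# Hypothetical stage-wise conditional probabilities (illustrative only)
stages <- c(presentation = 0.80, attention = 0.50, comprehension = 0.70,
            yielding = 0.20, retention = 0.60, behavior = 0.30)

# Equation 13.1: the persuasion probability is the product of the stages
p_behavior <- prod(stages)

# Raise one stage at a time by `bump` and recompute the product
lift_one_stage <- function(stage, bump = 0.10) {
  improved <- stages
  improved[stage] <- min(1, improved[stage] + bump)
  prod(improved)
}
p_after <- sapply(names(stages), lift_one_stage)

# Relative gain from each intervention, largest first
relative_gain <- p_after / p_behavior - 1
print(p_behavior)
print(round(sort(relative_gain, decreasing = TRUE), 3))
```

Under these assumed numbers the largest relative gain comes from the stage with the lowest probability, which is the compounding logic stated in prose above: the same absolute improvement is worth more where the chain is weakest.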
The “hierarchy of effects” tradition that Equation 13.1 formalizes traces to practitioner think/feel/do models of the 1960s. Its scientific value is not the specific ordering, which is often wrong, but the discipline of decomposing a single aggregate effect into separately measurable conditional stages. Most modern diagnostics (funnels, brand-lift studies, attention metrics) are descendants of this decomposition.
13.1.2 From Funnel to Journey
The managerial counterpart to the processing chain is the purchase funnel: awareness, familiarity, consideration, purchase, and loyalty (with loyalty split into active and passive forms). The funnel is a firm’s-eye abstraction—a narrowing pipeline the marketer pushes prospects down. The influential reframing associated with McKinsey’s consumer decision journey replaces the narrowing funnel with a cyclical process organized around the consumer’s perspective: an initial consideration set, a phase of active evaluation in which the set can expand, a moment of purchase, and a post-purchase experience that feeds loyalty and future consideration. The two views are complementary rather than competing, as Figure 13.1 contrasts; a useful way to hold them together is that the journey adds a dimension the funnel suppresses, turning a one-directional narrowing into a recurring loop. Batra and Keller (2016) push this further into a dynamic, expanded customer decision journey that integrates paid, owned, and earned media across traditional and digital touchpoints and asks how communications interact across them.
13.1.3 Antecedents, Processing, and Consequences
The decisive theoretical advance over the fixed hierarchy is to treat advertising processing as a contingent system whose route depends on the consumer’s needs, motivation, ability, and opportunity. MacInnis and Jaworski (1989) supply the canonical synthesis, organizing the field’s prior strands (the hierarchy-of-effects models and the multiattribute attitude and cognitive-response models of the 1970s) into a single antecedents → processing → consequences architecture.
On the antecedent side, needs are partitioned into utilitarian needs (functional problem-solving) and expressive needs, the latter splitting into socially expressive needs (to project or reflect a self-image) and experiential needs (to satisfy sensory or cognitive appetites). The pivotal antecedent is motivation to process brand information in the ad, which the consumer-behavior literature calls involvement and which may be situational (tied to the moment) or enduring (tied to the person). Motivation governs attention and processing capacity: higher motivation buys greater attention and greater capacity for analyzing the ad, but capacity is finite, so capacity spent on the ad is unavailable for competing tasks. Utilitarian needs direct attention to brand attributes; expressive needs direct it to symbolic and experiential value. Processing then proceeds through escalating levels (feature analysis, basic categorization, meaning analysis, information integration, role-taking, and constructive processing), moderated by ability and opportunity. The consequences are cognitive responses (thoughts) and emotional responses (feelings), and these can be partitioned by referent into message-related (brand-relevant), execution-related (brand-irrelevant), and viewing-context-related (environmental) responses. The architecture’s value is integrative: it subsumes the elaboration-likelihood model of persuasion (Petty and Cacioppo 1986), Mitchell’s brand-processing model, and Lutz’s typology of ad responses as special cases. A central testable claim is that attitude toward the ad (\(A_{\text{Ad}}\)) mediates attitude toward the brand (\(A_{\text{B}}\)) specifically at the higher processing levels (meaning analysis, information integration, and role-taking), implying a managerial trade-off between investing an ad’s finite real estate in brand/message content versus execution.
The complementary review by Vakratsas and Ambler (1999) abstracts across this entire literature with a single reduced-form schematic—advertising input passes through filters (e.g., motivation, ability) into the consumer’s cognition, affect, and experience, which in turn drive behavior—and then taxonomizes the field’s models by which of these mediators they admit and in what order. Table 13.1 organizes the principal families.
| Family | Mediating sequence | Core mechanism | Representative theory |
|---|---|---|---|
| Cognitive (C) | Cognition only | Information changes beliefs; price vs. non-price ads move price sensitivity asymmetrically | Information-economics models |
| Pure affect (A) | Affect only | Mere exposure; wear-in then wear-out yields an inverted-U | Two-factor / optimal-arousal theories |
| Persuasive hierarchy (CA) | Cognition → affect → behavior | Elaboration drives attitude; involvement moderates | ELM (Petty and Cacioppo 1986); MacInnis and Jaworski (1989) |
| Low-involvement (CEA) | Cognition → experience → affect | Behavior precedes firm attitudes | Low-involvement learning |
| Integrative (CAE) | Cognition → affect → experience | Expectations then trial confirm/disconfirm | Two-stage trial models |
| Hierarchy-free (NH) | No fixed order | Route is contingent on context | Vakratsas and Ambler (1999) |
Two empirical generalizations survive this taxonomy and recur below. First, experience, affect, and cognition all serve as mediators of advertising effects, so a model that omits any one is misspecified for some product class. Second, short-term advertising elasticities are small and decline over the product life cycle—a behavioral fact that anticipates the econometric findings of Section 13.3. At the aggregate level, the venerable finding that roughly three exposures per purchase cycle is “optimal” reflects the same affective dynamics of wear-in and wear-out (Vakratsas and Ambler 1999). The broader program of cataloguing such regularities is synthesized by Wind and Sharp (2009), who compile a set of empirical generalizations about advertising while flagging the field’s persistent gaps in boundary conditions, advertising properties, and measurement.
13.1.4 Cognitive and Affective Mediators
The cognitive route formalizes persuasion as the algebra of the thoughts an ad provokes. Wright (1973) distinguishes three modes of spontaneous cognitive response (support arguments that reinforce the message, counterarguments that resist it, and source derogations that attack the communicator) and models acceptance as their weighted net balance,
\[ \text{Acceptance} = w_{\text{SA}} \sum_{i} \text{SA}_i - w_{\text{CA}} \sum_{j} \text{CA}_j - w_{\text{SD}} \sum_{k} \text{SD}_k, \tag{13.2}\]
where \(\text{SA}\), \(\text{CA}\), and \(\text{SD}\) count support arguments, counterarguments, and source derogations, and the weights \(w\) scale their persuasive impact. Equation 13.2 makes the cognitive-response prediction precise: anything that suppresses counterargument production—distraction, low involvement, or the message-relevant situational factors of content-processing involvement and message modality emphasized by Wright (1973)—mechanically raises acceptance. This is the engine behind Batra and Ray (1986a)’s analysis of repetition: repeated exposure raises brand attitude and purchase intention while support and counterargument production are low, but once such production rises, attitudes level off and then decline. Repetition’s benefit therefore wears out, and the wear-out can be pushed back by varying ad execution; the appropriate dosing interval is the purchase cycle, with roughly three exposures per cycle a common optimum. Ability, motivation, and opportunity are the antecedents that govern when this cognitive processing engages at all.
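Equation 13.2 is simple enough to evaluate directly. The following sketch treats the three response types as counts and sets all weights to one; the function name `acceptance`, the unit weights, and the two viewer profiles are hypothetical choices made for illustration, not values from Wright (1973).

```r
# Equation 13.2: acceptance as the weighted net balance of cognitive responses.
# Inputs are counts of each response type; unit weights are an assumption.
acceptance <- function(n_sa, n_ca, n_sd, w_sa = 1, w_ca = 1, w_sd = 1) {
  w_sa * n_sa - w_ca * n_ca - w_sd * n_sd
}

# Hypothetical attentive viewer: generates many resisting thoughts
attentive <- acceptance(n_sa = 3, n_ca = 4, n_sd = 1)

# Hypothetical distracted viewer: same support arguments, fewer resisting thoughts
distracted <- acceptance(n_sa = 3, n_ca = 1, n_sd = 0)

print(c(attentive = attentive, distracted = distracted))
```

Holding support arguments fixed, the assumed drop in counterarguments and source derogations flips net acceptance from negative to positive, which is the mechanical sense in which suppressing counterargument production raises acceptance.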
The affective route insists that evaluation does not reduce to cognition. The landmark statement is Zajonc (1980)’s, which deserves quotation because it reorients the entire downstream literature.
Preferences need no inferences: affective reactions to stimuli can be acquired first, can occur without extensive perceptual and cognitive encoding, and can influence subsequent judgments independently of and prior to cognition.
On this view affect is a system-1 response that can precede and bias the slower, deliberate system-2 cognition rather than following from it. The mechanisms are classical (Pavlovian) conditioning—an automatic physiological reaction transferred to a previously neutral stimulus—and evaluative conditioning, the direct transfer of affect from one stimulus to another, which Gibson (2008) shows (using implicit-measure designs) reshapes explicit attitudes mainly when no strong prior preference exists. A useful distinction within affect separates diffuse, long-lasting moods from discrete, source-specific emotions. The empirical payoff of taking affect seriously is large and well replicated: ad-evoked feelings raise brand attitudes both directly and indirectly through attitude toward the ad, across involvement levels and product types, with the effect strongest for hedonic rather than utilitarian products (Pham, Geuens, and De Pelsmacker 2013); emotional reactions mediate advertising’s effect on attitudes toward both ad and brand (Holbrook and Batra 1987); and attitude toward the ad itself shifts brand attitudes, with execution cues and source likeability mattering most in low-involvement settings (Batra and Ray 1986b; MacKenzie, Lutz, and Belch 1986; Mitchell and Olson 1981). Validated scales separate the hedonic and utilitarian dimensions of consumer attitude that this literature relies on (Batra and Ahtola 1991; Voss, Spangenberg, and Grohmann 2003).
Two further results sharpen the picture. Counterintuitively, momentary interruptions can promote persuasion by amplifying arousal and the need for completion, an effect concentrated among consumers low in need for cognition (Kupor and Tormala 2015). And when creative strategy is decomposed into composite content metrics, the components rank by their effect on advertising elasticity in a clear order—experiential content first, then cognitive, then affective (Dall’Olio and Vakratsas 2022)—a finding that gives the cognition/affect/experience trichotomy of Table 13.1 direct managerial teeth.
A distinct memory mechanism complicates all of the above: ads do not arrive in isolation. Burke and Srull (1988) document competitive interference, in which one ad degrades memory for another. Memory for a brand’s ad is impaired retroactively by later ads—both from competitors in the same class and from other products in the same manufacturer’s line—and proactively by prior ads, and the memory benefit of repetition is itself blunted by the presence of competing advertising. Interference is why elasticities estimated in clutter-free laboratories overstate field effects, a theme Section 13.3 returns to.
13.1.5 Involvement and the Two Routes to Persuasion
The construct that organizes the behavioral literature’s contingencies is involvement—loosely, how deeply a consumer wants to think about a product. Two frameworks dominate: the elaboration-likelihood model (ELM) (Petty and Cacioppo 1979) and the heuristic-systematic model (Chaiken 1980). The ELM’s core claim is that persuasion travels one of two routes. The central route engages effortful elaboration of message arguments; attitudes formed this way are durable and predictive of behavior. The peripheral route relies on cues outside the argument—source attractiveness, mere repetition, mood—and yields attitudes that are temporary and weakly predictive. Petty, Cacioppo, and Schumann (1983) supply the canonical demonstration via a crossover interaction: argument quality moves attitudes more under high involvement, while a peripheral cue (a celebrity endorser versus an ordinary one) moves attitudes more under low involvement. This single experiment is the field’s cleanest evidence that message content and executional cues are substitutes whose relative potency is governed by involvement.
Petty and Cacioppo (1986) frame the central/peripheral distinction in terms of motivation rather than involvement, and there is a methodological reason to follow them: the involvement construct is notoriously fragmented across the literature, so framing results in terms of motivation invites less definitional dispute. The marketing literature further unbundles the ELM’s “ability” into separate ability and opportunity factors, yielding the motivation–ability–opportunity (MAO) triad as the joint antecedent of elaboration (Petty, Cacioppo, and Schumann 1983). For review and operationalization, Muehling, Laczniak, and Andrews (1993) is the preferable entry point, with Andrews, Durvasula, and Akhter (1990) a complement and Zaichkowsky (1985) the canonical scale.
The most rigorous theoretical treatment of involvement is Greenwald and Leavitt (1984)’s, which recovers the construct from a cognitive-capacity foundation rather than a managerial one. They distinguish attentional capacity (the limited resource Kahneman calls effort, demanded in increasing amounts as task complexity rises) from attentional arousal (a state of wakefulness that facilitates well-learned responses), and array processing into four levels of increasing depth: preattention (little capacity), focal attention (modest capacity to decipher the message), comprehension (capacity to analyze it), and elaboration (capacity to integrate it into existing knowledge). Their principle of higher-level dominance states that when the effects of different processing levels oppose one another, the highest engaged level dominates the net result—deeper, more deliberate judgment wins. They also draw the conceptual line, since muddied, between involvement (the audience/observer’s depth of processing) and what is better termed engagement (the actor/participant’s behavior). That distinction has acquired fresh empirical content online, where Schivinski, Christodoulides, and Dabrowski (2016), building on Muntinga, Moorman, and Smit (2011), scale consumer engagement along three escalating tiers—consumption (viewing, reading), contribution (liking, sharing, commenting), and creation (posting, producing content).
Involvement also moderates the affective route. Batra and Stayman (1990) show that positive mood reduces cognitive elaboration and biases the perceived quality of arguments, allowing mood to influence brand attitudes peripherally. MacInnis, Rao, and Weiss (2002) add a maturity condition: because affective processing requires no motivation while the persuasive use of endorsers does, affectively based executional cues most reliably lift sales for mature, familiar brands whose consumers already possess the relevant product knowledge. A standing caution from the meta-analytic literature is that laboratory advertising studies diverge from the field precisely on the dimensions that involvement implicates: forced exposure, unmeasured choice, and the absence of competitive clutter, decay, repeated exposure, and brand maturity.
13.1.6 Visual Cues
Finally, the visual construal of an advertisement interacts with the brand’s positioning to determine willingness to pay. Chu, Chang, and Lee (2021) show that the effect of the visual distance between the depicted product and the consumer flips with brand image: for prestige brands associated with status and luxury, attitudes and willingness to pay a premium rise as that distance grows (distance connotes exclusivity), whereas for popular brands associated with broad appeal and social connectedness, closeness is what lifts attitudes and willingness to pay. Visual strategy is therefore not separable from positioning—the same image is an asset for one brand archetype and a liability for the other.
13.1.7 Theoretical Foundations
The behavioral results above are not a loose collection of findings but the surface of a small number of governing theories, and naming them clarifies why the mechanisms recur. The organizing theory is dual-process persuasion, formalized for marketing by the elaboration likelihood model (Petty and Cacioppo 1986; Petty, Cacioppo, and Schumann 1983) and its heuristic-systematic cousin (Chaiken 1980): an exposure is processed either through an effortful central route that scrutinizes argument quality or through a peripheral route that leans on cues, and which route engages depends on the receiver’s motivation and ability. This is the same fast/slow distinction that Zajonc (1980) invokes when affect precedes cognition, and it explains why involvement moderates nearly every behavioral result in the chapter. The hierarchy-of-effects tradition supplies the complementary idea that persuasion is a sequence of separately measurable conditional stages (Equation 13.1), so that a single aggregate effect decomposes into stages each governed by its own lever; the contingent-route view of MacInnis and Jaworski (1989) and Vakratsas and Ambler (1999) is what frees that hierarchy from a single fixed ordering.
Four further strands account for effects the route models do not. Mere exposure holds that repetition alone raises liking even absent recognition (Zajonc 1980), grounding the wear-in half of the repetition dynamics. Attitude theory, in the reasoned-action and planned-behavior tradition, links the attitudes an ad forms to intentions and ultimately behavior (Ajzen 1991), supplying the causal chain that the funnel and the processing models presuppose. Attention economics treats the consumer’s processing capacity as the genuinely scarce resource (Greenwald and Leavitt 1984): a message competes not against rival arguments but against everything else contending for finite attention, which is why competitive interference degrades recall and why attention metrics anchor modern diagnostics. Finally, the behavioral-economics strand enters advertising through salience and framing: decision weights tilt toward attributes that stand out in the context (Bordalo, Gennaioli, and Shleifer 2013), and logically equivalent gain- versus loss-framed appeals move choice differently because outcomes are judged against a reference point rather than in absolute terms (Tversky and Kahneman 1981; Kahneman and Tversky 1979). Framing is thus the behavioral-economics formalization of the valence lever that Equation 13.1 locates at the yielding stage.
13.2 The Theoretical Approach
The behavioral models describe a single mind; the theoretical approach asks what happens when consumers, advertisers, and intermediaries are strategic. The exemplar here is the economics of privacy and targeting. Choi, Jerath, and Sarvary (2023) build an analytical model with three classes of actor—consumers, an advertiser, and an ad network—in which advertising is genuinely valuable (it conveys product information that guides the purchase journey) but also costly to the consumer along two margins: a distaste for online tracking and ad wear-out from repetition. Tracking lets the advertiser infer a consumer’s journey stage and target accordingly, so the consumer’s opt-in decision trades the benefit of more relevant information against the cost of wear-out.
The model’s value is that it generates non-obvious comparative statics that no purely behavioral account would predict. Consumers opt in to tracking when ad effectiveness is intermediate or when their sensitivity to wear-out is low. Counterintuitively, opting in can reduce the number of repeat ads a consumer receives when effectiveness is intermediate, because targeting substitutes relevance for repetition. Most strikingly, higher ad effectiveness, though good for the advertiser per exposure, can lower the ad network’s profit by reducing the number of repeated ads it can sell. These results carry directly into policy: by deriving why a rational consumer would or would not consent to tracking, the model gives regulators contemplating consent requirements a structural account of who opts in and what it does to surplus on all sides of the market. The general lesson is that targeting and repetition are equilibrium objects: choosing them well requires modeling the consumer’s reaction, not just the firm’s.
13.3 The Econometric Approach
The behavioral and theoretical lenses generate mechanisms; the econometric lens asks which of them survive contact with real, non-experimental data. Its defining problem is endogeneity. Firms do not assign advertising at random—they spend because they forecast demand, target their best markets and best-timed weeks, and cut spending when they expect to sell poorly anyway. Let \(y_{it}\) be a sales outcome for unit \(i\) in period \(t\) and \(a_{it}\) advertising. The naïve regression
\[ y_{it} = \beta\, a_{it} + \mathbf{x}_{it}^{\top}\boldsymbol{\gamma} + \varepsilon_{it} \tag{13.3}\]
recovers the causal advertising effect \(\beta\) only if \(\mathbb{E}[\varepsilon_{it}\,a_{it}]=0\). That condition almost never holds: any demand shock the firm anticipates and responds to (a holiday, a competitor’s exit, favorable weather) enters \(\varepsilon_{it}\) and is correlated with \(a_{it}\), biasing \(\hat\beta\). Because firms target favorable conditions, the bias is typically positive, so naïve estimates overstate effectiveness—and the bias can be large relative to the true effect.
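The sign and size of the bias follow from standard omitted-variable algebra. As a sketch, suppose the error contains a single demand shock \(d_{it}\) that the firm observes, so that \(\varepsilon_{it} = \delta\, d_{it} + u_{it}\) with \(u_{it}\) uncorrelated with both \(a_{it}\) and \(d_{it}\), and drop the controls \(\mathbf{x}_{it}\) for simplicity. The symbols \(d_{it}\), \(\delta\), and \(u_{it}\) are introduced here for illustration and are not part of Equation 13.3. Then the naïve slope converges to

\[ \operatorname{plim}\hat\beta = \frac{\operatorname{Cov}(a_{it}, y_{it})}{\operatorname{Var}(a_{it})} = \beta + \delta\,\frac{\operatorname{Cov}(a_{it}, d_{it})}{\operatorname{Var}(a_{it})}. \]

The second term is the bias. It is positive whenever the shock raises sales (\(\delta > 0\)) and the firm spends more when the shock is favorable (\(\operatorname{Cov}(a_{it}, d_{it}) > 0\)), and it does not shrink as the sample grows.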
The benchmark that disciplines this literature is the randomized experiment, and where experiments exist they are sobering. Lewis and Rao (2015) and Gordon et al. (2017) demonstrate that advertising effects are so small relative to the variance of sales that observational methods routinely fail to recover them—producing estimates that may be too large or too small by orders of magnitude—and that even large field experiments require enormous samples to detect plausible effects. The correct conceptual target, following Gordon et al. (2019), is the incremental effect: the lift in behavior above the baseline the consumer would have exhibited absent the ad. When randomization is infeasible, the practical remedy is to construct a credible baseline—for instance, panel data across geographic markets that lets one estimate each market’s no-advertising counterfactual and read the advertising effect as the deviation from it. Table 13.2 summarizes the principal strategies and what breaks each.
| Strategy | Identifying assumption | What breaks it |
|---|---|---|
| Naïve regression (Equation 13.3) | Advertising uncorrelated with demand shocks | Targeting/timing of spend (almost always violated) |
| Randomized experiment | Exposure assigned at random | Infeasibility, scale, non-compliance, spillovers |
| Panel / fixed effects + geo baseline | Unobservables constant within unit; cross-market counterfactual valid | Time-varying shocks correlated with spend |
| Instrumental variables | Instrument moves ads but not demand directly | Weak or invalid (non-excludable) instruments |
| Quasi-experiment / regulatory shock | Treatment timing as-good-as-random | Confounding co-events (e.g., algorithm changes) |
| Border / discontinuity designs | Outcomes continuous across an ad-market boundary | Sorting or other discontinuities at the border |
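The scale problem in the randomized-experiment row can be made concrete with a textbook power calculation. The sketch below uses `power.t.test` from base R's `stats` package; the lift and the standard deviation are hypothetical numbers chosen only to mimic an effect that is tiny relative to the noise in individual sales, not figures from Lewis and Rao (2015).

```r
# Hypothetical inputs: an average per-customer sales lift of 0.50 against a
# per-customer sales standard deviation of 50 (both values are assumptions)
lift     <- 0.50
sales_sd <- 50

# Solve for the per-arm sample size of a two-sample t-test
# at 5% significance and 80% power
design <- power.t.test(delta = lift, sd = sales_sd,
                       sig.level = 0.05, power = 0.80)
n_per_arm <- ceiling(design$n)
print(n_per_arm)
```

With a standardized effect of 0.01, the required sample runs well into six figures per arm, which gives a sense of why even large field experiments can struggle to detect plausible advertising effects.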
13.3.1 A Simulation of the Endogeneity Problem
The bias in Equation 13.3 is concrete enough to reproduce. The following simulation constructs markets in which a manager sets advertising as an increasing function of a demand shock she observes but the analyst does not. The naïve regression then recovers a coefficient far above the true effect, while a specification that conditions on the shock recovers the truth—illustrating why a credible baseline is the whole game.
```r
set.seed(10)
n_markets <- 2000
true_beta <- 0.5 # the causal advertising effect we want back

# Demand shock the MANAGER sees but the ANALYST does not (the confounder)
demand_shock <- rnorm(n_markets, 0, 1)

# Manager targets favorable markets: spend rises with the anticipated shock
advertising <- 2 + 1.2 * demand_shock + rnorm(n_markets, 0, 0.5)

# Sales depend on advertising (true_beta) AND the same hidden shock
sales <- 5 + true_beta * advertising + 2.0 * demand_shock + rnorm(n_markets, 0, 1)

naive <- coef(lm(sales ~ advertising))["advertising"]
adjusted <- coef(lm(sales ~ advertising + demand_shock))["advertising"]

cat("True effect :", true_beta, "\n")
#> True effect : 0.5
cat("Naive estimate :", round(naive, 3), " (biased upward)\n")
#> Naive estimate : 1.915 (biased upward)
cat("Baseline-adjusted :", round(adjusted, 3), " (recovers the truth)\n")
#> Baseline-adjusted : 0.484 (recovers the truth)
```

The naïve estimate is biased upward by exactly the mechanism Gordon et al. (2017) warn about: the manager’s targeting loads the omitted shock onto the advertising coefficient. Only when the analyst controls the demand baseline (the empirical analogue of a randomized control group or a cross-market counterfactual) does the estimate return to the truth.
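In practice the analyst rarely observes the shock itself, so the baseline has to be built from the data's structure. The sketch below is one hedged illustration of the panel strategy in Table 13.2: it assumes the hidden confounder is a time-invariant market characteristic (the variable `market_appeal`, like every number here, is hypothetical), in which case market fixed effects absorb it.

```r
set.seed(1)
n_markets <- 50
n_periods <- 40
true_beta <- 0.5

# Market identifier for each market-period observation
market <- rep(seq_len(n_markets), each = n_periods)

# Time-invariant market appeal: known to the manager, unseen by the analyst
market_appeal <- rnorm(n_markets)[market]

# Spend and sales both load on the same hidden market characteristic
advertising <- 2 + 1.0 * market_appeal + rnorm(n_markets * n_periods)
sales <- 5 + true_beta * advertising + 2.0 * market_appeal +
  rnorm(n_markets * n_periods)

# Pooled regression ignores the market structure
pooled <- coef(lm(sales ~ advertising))["advertising"]

# Market dummies absorb anything that is constant within a market
fixed_effects <- coef(lm(sales ~ advertising + factor(market)))["advertising"]

print(round(c(pooled = unname(pooled), fixed_effects = unname(fixed_effects)), 3))
```

The fixed-effects estimate relies only on within-market variation in spend. It would fail in the way Table 13.2 describes if the confounder moved over time together with advertising, since market dummies cannot absorb a time-varying shock.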
13.3.2 Estimating Advertising Effects and Elasticities
Once identification is taken seriously, the estimated magnitudes shrink. B. T. Shapiro, Hitsch, and Tuchman (2021) estimate the distribution of TV advertising elasticities and returns across a large set of product categories and find elasticities below prior benchmarks, a large share of estimates statistically insignificant or negative, and, on the return side, over 80% of brands facing negative marginal returns, with only about a third earning a positive return from their observed schedule. The pattern is robust to functional form and is not an artifact of low power or measurement error, echoing the prior-distribution evidence in B. Shapiro, Hitsch, and Tuchman (2018) that a substantial share of consumer-packaged-goods elasticities are insignificant or negative. The collected experimental and quasi-experimental program (B. Shapiro 2016; Blake and Coey 2014; Lewis, Rao, and Reiley 2011; Guitart and Stremersch 2021) reinforces the same conclusion: credibly identified advertising effects are small, heterogeneous, and frequently unprofitable at the margin.
These small short-run elasticities are consistent with the behavioral prediction of Vakratsas and Ambler (1999) that elasticities decline over the life cycle, and they motivate the search for effects on outcomes other than immediate own-brand sales—word of mouth, website traffic, firm value—where advertising’s influence may be larger or more reliably detected.
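The link between a small elasticity and a negative marginal return is a one-line calculation. Under a constant-elasticity response, an extra dollar of advertising raises revenue by the elasticity times the revenue-to-advertising ratio, and only the margin share of that revenue is profit. The sketch below applies this identity to a hypothetical brand; the function name `marginal_roi` and all inputs are assumptions for illustration, and the calculation ignores carryover.

```r
# Marginal return on one extra advertising dollar under constant elasticity:
#   d(revenue)/d(ad) = elasticity * revenue / ad_spend
# Profit on that revenue is scaled by the margin; subtract the dollar spent.
marginal_roi <- function(elasticity, revenue, ad_spend, margin) {
  elasticity * (revenue / ad_spend) * margin - 1
}

# Hypothetical brand: all four inputs are illustrative assumptions
roi <- marginal_roi(elasticity = 0.02, revenue = 100e6,
                    ad_spend = 5e6, margin = 0.30)

# Elasticity at which the marginal dollar exactly breaks even
break_even_elasticity <- 5e6 / (100e6 * 0.30)

print(c(marginal_roi = roi, break_even_elasticity = break_even_elasticity))
```

For this assumed brand, the marginal dollar pays back only if the elasticity is several times larger than 0.02, which illustrates how elasticities of the size reported above can coexist with negative marginal returns.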
13.3.3 Carryover and the Optimal Data Interval
Every elasticity in the previous section presumes an answer to a prior question: over what horizon does advertising act? Advertising’s effect is not contemporaneous. An exposure today raises sales today and—through memory, deferred purchase, and word of mouth—raises them tomorrow, next week, and progressively less thereafter. The canonical representation of this decay is the geometric distributed lag, in which a single immediate coefficient \(\beta\) governs the impact of current advertising \(a_t\) and a single carryover parameter \(\lambda \in [0,1)\) geometrically discounts all past exposures,
\[ s_t = \mu + \beta \sum_{k=0}^{\infty} \lambda^{k} a_{t-k} + \varepsilon_t . \tag{13.4}\]
The infinite sum in Equation 13.4 is not directly estimable, but the Koyck transformation—subtracting \(\lambda s_{t-1}\) from \(s_t\)—collapses it into a finite autoregressive form,
\[ s_t = \mu(1-\lambda) + \lambda\, s_{t-1} + \beta\, a_t + \underbrace{\varepsilon_t - \lambda \varepsilon_{t-1}}_{\text{MA(1) error}} . \tag{13.5}\]
Equation 13.5 is the workhorse of advertising dynamics: a regression of sales on its own lag and current advertising recovers the carryover \(\lambda\) as the coefficient on \(s_{t-1}\) and the immediate effect \(\beta\) as the coefficient on \(a_t\), and the long-run multiplier (the total cumulative effect of a permanent one-unit increase in advertising) is \(\beta/(1-\lambda)\). The price of the transformation is the moving-average error it induces: because \(\varepsilon_{t-1}\) enters both the error and \(s_{t-1}\), the lagged-sales regressor is correlated with the error, so ordinary least squares is biased and inconsistent, which is the reason instrumental-variables or maximum-likelihood corrections appear throughout this literature.
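The long-run multiplier can be checked numerically. The sketch below, using assumed illustrative values \(\beta = 1\) and \(\lambda = 0.9\), sums the geometric response to a one-time unit pulse and, separately, iterates the noise-free Koyck recursion under a permanent one-unit increase in advertising; both should approach \(\beta/(1-\lambda)\). It also reports how quickly the effect accrues. The variable names are illustrative only.

```r
# Assumed illustrative values, not estimates from any study
beta <- 1; lambda <- 0.9
horizon <- 200

# One-time unit pulse at k = 0: the response k periods later is beta * lambda^k
k <- 0:(horizon - 1)
pulse_response <- beta * lambda^k

# Permanent one-unit increase: iterate s_t = lambda * s_{t-1} + beta * a_t with a_t = 1
s <- numeric(horizon)
s[1] <- beta
for (t in 2:horizon) s[t] <- lambda * s[t - 1] + beta * 1

long_run <- beta / (1 - lambda)   # closed-form long-run multiplier

# Share of the cumulative effect realised in the first 7 periods,
# and the number of periods needed to realise 90% of it
share_7 <- sum(pulse_response[1:7]) / long_run
dur_90  <- log(1 - 0.9) / log(lambda)

round(c(cumulative_pulse = sum(pulse_response),
        permanent_level  = s[horizon],
        closed_form      = long_run,
        share_7          = share_7,
        dur_90           = dur_90), 3)
```

With a carryover of 0.9 per period, only about half of the cumulative effect arrives within the first seven periods, which is why the choice of data interval discussed next matters so much for the estimated split between immediate and delayed response.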
A subtler and more consequential problem precedes estimation, however: at what data interval should Equation 13.5 be fit? Scanner systems deliver sales and advertising at intervals from seconds to months, and the carryover estimate is not invariant to the choice. Tellis and Franses (2006) confront the conventional wisdom directly. It had been “long known in marketing and econometrics that temporally aggregated data biases estimates of the duration of advertising’s effect” (Clarke 1976; Leone 1995), and the received remedy, seeded by Clarke (1976) and hardened into an axiom over the following decades, held three things: (i) the optimal interval is the interpurchase time, (ii) data more disaggregate than this causes a “disaggregation bias,” and (iii) recovering the true parameters requires assuming a model of the underlying advertising process. Tellis and Franses (2006) overturn all three. The optimal interval is instead what they call the unit exposure time: “the largest calendar period in the time frame under study such that advertising exposure occurs at most once in that period, and if it occurs, it does so at the same time in that period.” Data more disaggregate than the unit exposure time carries no disaggregation bias (it merely costs more to store and process), and recovery of the true microparameters requires only data at the unit exposure time together with the correctly adjusted model, with no assumption about the advertising process itself. The result holds for any linear dynamic model linking sales to current and past advertising.
The unit exposure time is governed by the advertising schedule, not the purchase cycle. If a brand runs one television flight a week, the largest period containing at most one exposure is the week, so the unit exposure time is one week; if it advertises on fixed days, the unit exposure time is a day; for monthly magazine insertions it is a month. The mechanism that makes this the right interval is misattribution under aggregation: a data interval coarser than the unit exposure time lumps several exposures into one bucket, assigns them all to a single arbitrary point, and attributes the blended decay of all of them as if it were one average effect—“inappropriate attribution [that] exaggerates the duration interval.” A data interval at or finer than the unit exposure time keeps successive exposures in separate buckets and so is free of this misattribution.
Crucially, recovering the true microparameters from data at the unit exposure time requires the adjusted model, not the naïve one. Aggregating Equation 13.5 over \(K\) microperiods (with the single pulse occurring at position \(i\) within each period) yields an extended Koyck model carrying an additional lagged-advertising term,
\[ S_T = \lambda^{K} S_{T-1} + \beta_1 A_T + \beta_2 A_{T-1} + \varepsilon_T - \lambda^{K}\varepsilon_{T-1}, \qquad \beta_1 = \beta\!\!\sum_{k=0}^{K-i}\!\lambda^{k},\;\; \beta_2 = \beta\!\!\sum_{k=K-i+1}^{K-1}\!\lambda^{k}, \tag{13.6}\]
from which the true microparameters are recovered as \(\lambda = \widehat{\lambda^{K}}^{\,1/K}\) and \(\beta = (\beta_1+\beta_2)(1-\lambda)/(1-\lambda^{K})\). Fitting the original Koyck model (Equation 13.5), the same model used on the microdata, to aggregated data omits the \(A_{T-1}\) term and is exactly the misspecification that produces the classic data-interval bias. The following replication of Tellis and Franses (2006)’s DGP1 makes both the bias and its repair concrete. A daily Koyck process is simulated with a true current effect \(\beta = 1\) and a true daily carryover \(\lambda = 0.9\) (long-run multiplier \(\beta/(1-\lambda)=10\)), with a single advertising pulse each week (every Thursday), so the unit exposure time is one week. The series is then estimated four ways: the original Koyck on the daily microdata; the same (wrong) Koyck on the weekly data; the extended Koyck Equation 13.6 on the weekly data (the unit exposure time); and the original Koyck on monthly data (coarser than the unit exposure time).
Code
library(tidyverse)
set.seed(1)

# --- True DAILY data-generating process: DGP1 of Tellis & Franses (2006) ---
beta <- 1; lambda <- 0.9 # current effect; daily carryover
daily <- expand.grid(
  day = c("mon", "tue", "wed", "thu", "fri", "sat", "sun"),
  week = 1:4, month = 1:12, year = 1:30
) |>
  tibble::rowid_to_column("t") |>
  mutate(
    ad = if_else(day == "thu", runif(n()), 0), # ONE pulse per week (Thursday)
    e = rnorm(n(), 0, 0.1),
    sales = 0
  )
for (t in 2:nrow(daily)) { # generate the Koyck dynamics
  daily$sales[t] <- lambda * daily$sales[t - 1] + beta * daily$ad[t] +
    daily$e[t] - lambda * daily$e[t - 1]
}
daily <- slice(daily, -1) # drop the seeded first row

# Aggregate to weekly and monthly buckets, in true time order
weekly <- daily |> group_by(year, month, week) |>
  summarise(sales = sum(sales), ad = sum(ad), .groups = "drop") |>
  arrange(year, month, week)
monthly <- daily |> group_by(year, month) |>
  summarise(sales = sum(sales), ad = sum(ad), .groups = "drop") |>
  arrange(year, month)

# (a) ORIGINAL Koyck: sales_t = lambda*sales_{t-1} + beta*ad_t
koyck <- function(d) {
  s <- summary(lm(sales ~ lag(sales) + ad - 1, data = d))$coefficients
  c(lambda_hat = s[1, 1], beta_hat = s[2, 1])
}

# (b) EXTENDED Koyck at the unit exposure time (K microperiods per UET):
#     S_T = lambda^K S_{T-1} + b1 A_T + b2 A_{T-1}, then recover
#     lambda = phi^(1/K) and beta = (b1 + b2)(1 - lambda) / (1 - phi).
koyck_ext <- function(d, K) {
  s <- summary(lm(sales ~ lag(sales) + ad + lag(ad) - 1, data = d))$coefficients
  phi <- s[1, 1]; b1 <- s[2, 1]; b2 <- s[3, 1]
  lam <- phi^(1 / K); bet <- (b1 + b2) * (1 - lam) / (1 - phi)
  c(lambda_hat = lam, beta_hat = bet)
}

results <- bind_rows(
  c(interval = "daily (microdata, orig. Koyck)", koyck(daily)),
  c(interval = "weekly (orig. Koyck = wrong model)", koyck(weekly)),
  c(interval = "weekly (extended Koyck = UET)", koyck_ext(weekly, 7)),
  c(interval = "monthly (orig. Koyck, > UET)", koyck(monthly))
) |>
  mutate(
    across(c(lambda_hat, beta_hat), \(x) round(as.numeric(x), 3)),
    long_run_hat = round(beta_hat / (1 - lambda_hat), 2),
    lambda_true = lambda, beta_true = beta, long_run_true = beta / (1 - lambda)
  )
results
#> # A tibble: 4 × 7
#> interval lambda_hat beta_hat long_run_hat lambda_true beta_true long_run_true
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 daily (… 0.885 1.01 8.81 0.9 1 10
#> 2 weekly (… 0.664 3.37 10.0 0.9 1 10
#> 3 weekly (… 0.895 1.04 9.93 0.9 1 10
#> 4 monthly … 0.27 7.26 9.95 0.9 1 10
The pattern reproduces Tellis and Franses (2006)’s Table 1. On the daily microdata the original Koyck comes close to the truth (\(\hat\lambda \approx 0.89\), \(\hat\beta \approx 1.0\)). Fitting that same model to the weekly data is badly biased: \(\hat\lambda\) falls well below the true value (to \(\approx 0.66\)) and \(\hat\beta\) inflates several-fold (to \(\approx 3.4\)). This is precisely the “data-interval bias” that Clarke (1976) documented and that the older literature wrongly blamed on choosing too fine an interval. The repair is not finer data but the right model at the unit exposure time: the extended Koyck Equation 13.6 fitted to the same weekly data recovers the true microparameters (\(\hat\lambda \approx 0.90\), \(\hat\beta \approx 1.0\)). This is Tellis and Franses (2006)’s central result: the week, not some interpurchase cycle, is the optimal interval, and one need not assume anything about the advertising process to get there. Daily data would also recover the truth (with no disaggregation bias), but the week is optimal because it is the largest, and therefore the cheapest to collect, interval that still admits unbiased recovery.
Aggregating beyond the unit exposure time, however, is fatal: a month holds four Thursday pulses, so even the extended single-lag model cannot disentangle them, and the monthly original Koyck is severely biased (\(\hat\lambda \approx 0.27\), \(\hat\beta \approx 7\)). One quantity is notably robust: the long-run multiplier \(\beta/(1-\lambda)\) stays near its true value of \(10\) at every interval (about \(8.8\) on the daily data and within one percent of \(10\) in the weekly and monthly fits), because the cumulative effect is conserved even as its split into a quick hit and a slow decay is scrambled. The managerial lesson is sharp: a firm that fits a naïve carryover model to monthly data will conclude that advertising has almost no memory and a large instantaneous punch, which is precisely backwards, and will flight its budget wrongly, even though its estimate of advertising’s total return is roughly right. The fix is to collect at the unit exposure time and adjust the model, not to chase ever-finer data or to default to the purchase cycle.
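The population coefficients that Equation 13.6 implies for the simulation above can be computed directly, which makes the recovery step transparent without any estimation. The sketch below assumes the simulated design (\(\beta = 1\), \(\lambda = 0.9\), \(K = 7\) days per week) and takes the pulse position as \(i = 4\), Thursday counted from Monday; the variable names are illustrative.

```r
beta <- 1; lambda <- 0.9  # microparameters assumed in the simulation
K <- 7                    # microperiods (days) per unit exposure time (week)
i <- 4                    # pulse position in the week: Thursday, counting from Monday

# Lags of the pulse that fall in the same week (k = 0, ..., K - i)
k_same <- 0:(K - i)
# Lags that spill into the following week (k = K - i + 1, ..., K - 1);
# seq() with length.out keeps this empty when i = 1
k_next <- seq(K - i + 1, length.out = i - 1)

b1  <- beta * sum(lambda^k_same)  # coefficient on A_T
b2  <- beta * sum(lambda^k_next)  # coefficient on A_{T-1}
phi <- lambda^K                   # coefficient on S_{T-1}

# Recovery of the microparameters from the aggregate coefficients
lambda_rec <- phi^(1 / K)
beta_rec   <- (b1 + b2) * (1 - lambda_rec) / (1 - phi)

round(c(b1 = b1, b2 = b2, phi = phi,
        lambda_rec = lambda_rec, beta_rec = beta_rec), 3)
```

About a third of each Thursday pulse's cumulative effect lands in the following week (\(\beta_2\) relative to \(\beta_1 + \beta_2\)), and this is exactly the share that a model without the \(A_{T-1}\) term has nowhere to put.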
13.3.4 Advertising’s Effects on Word of Mouth, Traffic, and Firm Value
A productive response to small sales elasticities is to widen the dependent variable. Advertising shapes price sensitivity through the consideration set and the relative strength of preference (Mitra and Lynch, Jr. 1995), and informative advertising reduces the probability that a horizontally differentiated consumer fails to choose her best alternative by roughly ten percent (Anand and Shachar 2011). A large body of work then traces advertising into online outcomes. TV advertising measurably moves online word of mouth (WOM) about both the brand and the program—the phenomenon of social TV, the joint consumption of a program and social-media activity about it (Fossen and Schweidel 2017)—and ads aired in programs with higher social-TV activity drive more online shopping, with affective (funny, emotional) ads most effective (Fossen and Schweidel 2019b). Crucially, the most-discussed programs are not always the best vehicles for brand engagement, and online program engagement predicts a larger ad audience because engaged viewers are less likely to change channels during breaks (Fossen and Bleier 2021). Product placement follows the same logic: prominent, especially verbal, placements raise online conversation and web traffic, with decreasing returns at high prominence and little enhancement from nearby TV advertising (Fossen and Schweidel 2019a).
When advertising and consumer-generated media are modeled jointly, content classification becomes decisive. For movies, pre-release blog volume and advertising drive opening-day box office while post-release blog valence and advertising drive later performance, with effects heterogeneous across markets in ways tied to demographics and identifiable through instrumental variables (Gopinath, Chintagunta, and Venkataraman 2013). Over a product’s life, only recommendation-oriented WOM valence directly moves sales—not sheer WOM volume—and its influence grows over time while both attribute- and emotion-based advertising wear out, with rational attribute messages wearing out faster than emotional ones (Gopinath, Thomas, and Krishnamurthi 2014). The same architecture lets firms be classified as consumer-driven (WOM dominant) or firm-driven (advertising dominant), which dictates where the marketing dollar should go. The macro view—Hewett et al. (2016)—finds that advertising does not directly move traditional media, online WOM, or consumer sentiment, yet does directly affect firm outcomes, underscoring that the channel through which advertising pays varies by context.
Advertising’s reach extends to capital markets and even to public welfare. Fich, Starks, and Tran (2024) show that target-firm advertising raises takeover premiums and negotiating power while lowering acquirer announcement returns—consistent with targets using advertising to communicate value to investors and acquirers—so that even failed bids leave a permanent value gain. Ownership structure feeds back onto advertising: under common ownership, competing firms with shared institutional blockholders cut advertising, an effect identified using mutual-fund mergers as an exogenous shock and concentrated in competitive, advertising-intensive industries (Lu et al. 2022). The spillovers of a rival’s bankruptcy depend on a firm’s own marketing investments, moderated by industry growth (for advertising) and concentration (for R&D) (Jindal and Slotegraaf 2023). And advertising can move societal outcomes: during the COVID-19 pandemic, TV ads carrying COVID narratives raised social-distancing behavior—eleven times more in counties without government mandates—identified through a border strategy exploiting discontinuities across TV markets, evidence that brand-sponsored advertising can substitute for absent public policy (Ghosh Dastidar, Sunder, and Shah 2023).
13.3.5 Spillovers and Attribution
Two econometric problems deserve separate treatment because they recur across digital channels. The first is spillover: an ad’s effect is not confined to its target. Fossen, Mallapragada, and De (2021) show that political TV ads exert positive ad-to-ad spillovers—the ad following a political ad suffers an 89% smaller audience decline and gains positive online chatter—filling a gap in research that had largely ignored how one ad conditions response to the next. A rich literature on search and display spillovers (Kitts et al. 2014; Joo et al. 2014; Sahni 2016; Yang and Ghose 2010; Ghose and Todri-Adamopoulos 2016; Rutz and Bucklin 2011) establishes that estimating an ad’s effect while ignoring its interaction with adjacent advertising biases the estimate, often badly.
The second is attribution: assigning credit for a conversion across the many touchpoints that preceded it (Simonson et al. 2001; Lambrecht and Tucker 2013; Li and Kannan 2014; Blake, Nosko, and Tadelis 2015; Zantedeschi, Feit, and Bradlow 2017). Attribution is hard for the same reason Equation 13.3 is biased—the touchpoints a consumer encounters are selected, not assigned—so last-click and similar heuristics confound correlation with cause. The methodological lesson is uniform across spillover and attribution: the unit of analysis is rarely a single isolated exposure, and treating it as one reintroduces the endogeneity the field works so hard to remove.
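The selection problem behind attribution can be illustrated with a small simulation. The sketch below is a stylized construction, not a replication of any cited study: a latent purchase `intent` drives both ad exposure and purchase, the ad's true effect on purchase probability is fixed at two percentage points, and all probabilities are assumed values. It compares a last-click style credit and a naïve exposed-versus-unexposed contrast with a randomized holdout.

```r
set.seed(42)
n <- 100000
true_effect <- 0.02                    # assumed true lift in purchase probability

# Latent purchase intent, unobserved by the analyst
intent <- rbinom(n, 1, 0.2)

# SELECTED exposure: high-intent users are far more likely to be served the ad
ad  <- rbinom(n, 1, ifelse(intent == 1, 0.8, 0.1))
buy <- rbinom(n, 1, 0.01 + 0.20 * intent + true_effect * ad)

# Last-click style credit: every purchase by an exposed user is credited to the ad
last_click_rate <- mean(buy[ad == 1])
# Naive contrast: exposed minus unexposed conversion rates
naive_diff <- mean(buy[ad == 1]) - mean(buy[ad == 0])

# ASSIGNED exposure: a randomized holdout breaks the link between intent and exposure
ad_rand  <- rbinom(n, 1, 0.5)
buy_rand <- rbinom(n, 1, 0.01 + 0.20 * intent + true_effect * ad_rand)
exp_diff <- mean(buy_rand[ad_rand == 1]) - mean(buy_rand[ad_rand == 0])

round(c(true_effect = true_effect, last_click_rate = last_click_rate,
        naive_diff = naive_diff, experimental_diff = exp_diff), 3)
```

Under these assumptions both observational measures overstate the ad's effect several-fold because they credit the ad with purchases that high-intent users would have made anyway, while the randomized contrast lands close to the assumed two points.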
13.3.6 Advertising Content and Consumer Demand for Ads
Two further econometric streams round out the picture. A content literature (Xu et al. 2014; Teixeira, Picard, and el Kaliouby 2014; Tucker 2015; Liaukonyte, Teixeira, and Wilbur 2015; Rao and Wang 2015; Sudhir, Roy, and Cherian 2016) measures how what an ad says and shows (its emotional arc, its information content, its persuasive devices) maps to attention, sharing, and purchase, connecting the behavioral creative-strategy results of Dall’Olio and Vakratsas (2022) to field outcomes. And a demand-for-ads literature (Goldstein et al. 2014; Wilbur 2016; Tuchman, Nair, and Gardete 2017) treats advertising as something consumers themselves have preferences over (tolerating, avoiding, or even seeking it), which closes the loop back to the theoretical model of Choi, Jerath, and Sarvary (2023): if consumers choose their exposure, advertising volume and targeting are equilibrium outcomes, not exogenous treatments.
13.3.7 Deceptive Advertising
The welfare and regulatory edge of advertising is sharpest where messages are deceptive. Regulatory reports of deceptive advertising depress stock-market performance, an effect that strong brand reputation partially buffers (Wiles et al. 2010). Rao (2022) studies a particularly modern form, fake-news advertising (promotional content that mimics the format of surrounding news without disclosure), and asks what happens when the Federal Trade Commission shuts the channel down. The accounting framework decomposes a merchant’s demand into organic visits, fake-news-ad referrals, and regular-ad referrals, allowing direct effects and cross-channel spillovers to be separated:
\[ \begin{aligned} Q_{\text{merchant}} &= Q_{\text{org}} + Q_{\text{fn}} + Q_{\text{reg}}, \\ Q_{\text{org}} &= \alpha_{\text{org}} + \gamma_1^{\text{fn}} Ad_{\text{fn}} + \gamma_1^{\text{reg}} Ad_{\text{reg}}, \\ Q_{\text{fn}} &= \alpha_{\text{fn}} Ad_{\text{fn}} + \gamma_2^{\text{reg}} Ad_{\text{reg}}\, \mathbb{I}(Ad_{\text{fn}} > 0), \\ Q_{\text{reg}} &= \alpha_{\text{reg}} Ad_{\text{reg}} + \gamma_3^{\text{fn}} Ad_{\text{fn}}\, \mathbb{I}(Ad_{\text{reg}} > 0), \end{aligned} \tag{13.7}\]
where the \(\alpha\) terms are direct effects and the \(\gamma\) terms cross-channel spillovers, with \(fn\) denoting fake news and \(reg\) regular advertising. Two effects are of interest: a treatment effect (fake news directly drives \(Q_{\text{fn}}\) and spills over to organic and regular channels through \(\gamma_1^{\text{fn}}\) and \(\gamma_3^{\text{fn}}\)) and a selection effect (in the channel’s absence, some consumers substitute to other channels while others stop searching entirely). After the shutdown, regular-ad referrals rise—fake-news and regular ads are substitutes—and the probability that a merchant draws a complaint falls by about 8%, though somewhat mechanically. The selection effect is small: substitution to legitimate channels is dominated by the decline in organic demand and fake-news referrals. The estimated decline is best read as an upper bound on the regulation’s true impact, because a contemporaneous search-algorithm change cannot be ruled out—a textbook illustration of the quasi-experimental confound flagged in Table 13.2. With outcomes that are 95% zeros, the design sits near the boundary where rare-events corrections to logistic regression become relevant (King and Zeng 2001).
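The accounting in Equation 13.7 can be made concrete with a short sketch that evaluates the three demand components before and after the fake-news channel is switched off. The function `merchant_demand` and every parameter value below are hypothetical, chosen only to show the mechanics; none is an estimate from Rao (2022). The negative sign on \(\gamma_3^{\text{fn}}\) is an assumption made here to mimic the substitutes pattern described above.

```r
# HYPOTHETICAL parameter values; none are estimates from Rao (2022)
par <- list(a_org = 100, a_fn = 2.0, a_reg = 1.5,   # direct effects (alpha terms)
            g1_fn = 0.3, g1_reg = 0.2,              # spillovers onto organic visits
            g2_reg = 0.1,                           # regular-ad spillover onto fake-news referrals
            g3_fn = -0.1)                           # fake-news spillover onto regular referrals
                                                    # (negative = substitutes, an assumption)

# Evaluate the accounting identities of Equation 13.7 for given ad levels
merchant_demand <- function(ad_fn, ad_reg, p = par) {
  q_org <- p$a_org + p$g1_fn * ad_fn + p$g1_reg * ad_reg
  q_fn  <- p$a_fn  * ad_fn  + p$g2_reg * ad_reg * (ad_fn  > 0)
  q_reg <- p$a_reg * ad_reg + p$g3_fn  * ad_fn  * (ad_reg > 0)
  c(org = q_org, fn = q_fn, reg = q_reg, total = q_org + q_fn + q_reg)
}

before <- merchant_demand(ad_fn = 20, ad_reg = 30)  # fake-news channel active
after  <- merchant_demand(ad_fn = 0,  ad_reg = 30)  # channel shut down, regular ads unchanged

rbind(before = before, after = after, change = after - before)
```

In this illustration regular-ad referrals rise slightly after the shutdown, but the gain is small next to the loss of fake-news referrals and of the organic visits those ads had been generating, so total demand falls. The sketch holds regular advertising fixed; any reallocation of budget toward regular ads would enter through `ad_reg`.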
13.3.8 A Worked Estimate: Geo-Experimental Lift
To make the recommended remedy concrete, the following example implements the geo-baseline logic of Gordon et al. (2019): estimate each market’s counterfactual from a holdout of untreated markets, then read the advertising effect as the lift above that baseline. The simulation embeds a true incremental effect of a 6% sales lift and recovers it via a difference-in-differences contrast between treated and control geographies.
Code
set.seed(2024)
n_geo <- 200
true_lift <- 0.06 # true incremental effect: +6%

# Baseline sales differ across markets and across pre/post periods
geo_base <- rnorm(n_geo, 100, 15)
period_shock <- 4 # common seasonal lift, both arms
treated <- rep(c(TRUE, FALSE), each = n_geo / 2) # randomized at the GEO level
pre <- geo_base + rnorm(n_geo, 0, 3)
post <- geo_base + period_shock +
  ifelse(treated, true_lift * geo_base, 0) + rnorm(n_geo, 0, 3)
did <- data.frame(
  geo = 1:n_geo, treated = treated,
  delta = post - pre # within-geo change
)

# Difference-in-differences: treated change minus control change
est_lift_pct <- (mean(did$delta[did$treated]) -
  mean(did$delta[!did$treated])) / mean(geo_base)
cat("True incremental lift :", paste0(round(100 * true_lift, 1), "%"), "\n")
#> True incremental lift : 6%
cat("Estimated lift (DiD) :", paste0(round(100 * est_lift_pct, 1), "%"), "\n")
#> Estimated lift (DiD) : 6.4%
By differencing each market against its own pre-period and then against untreated markets, the design strips out both fixed cross-market heterogeneity and the common seasonal shock, isolating the incremental effect the naïve regression could not. This is the empirical embodiment of the chapter’s central claim: advertising’s effect is a credibly constructed increment above a baseline, and the quality of that baseline, not the sophistication of the regression, is what makes the estimate believable.
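A point estimate alone does not say how precisely the lift is measured. As a minimal sketch, the same difference-in-differences contrast can be written as a regression of the within-geo change on the treatment indicator, which returns the identical point estimate together with a confidence interval. The code regenerates the simulated data so that it runs on its own; dividing the interval by the mean baseline to express it in percent ignores sampling error in that denominator and is an approximation adopted here for simplicity.

```r
set.seed(2024)
n_geo <- 200
true_lift <- 0.06

# Same simulated design as above: geo-level randomization, common period shock
geo_base <- rnorm(n_geo, 100, 15)
treated  <- rep(c(TRUE, FALSE), each = n_geo / 2)
pre   <- geo_base + rnorm(n_geo, 0, 3)
post  <- geo_base + 4 + ifelse(treated, true_lift * geo_base, 0) + rnorm(n_geo, 0, 3)
delta <- post - pre                                  # within-geo change

# Regressing the change on treatment reproduces the DiD contrast
fit <- lm(delta ~ treated)
est <- coef(fit)[["treatedTRUE"]]
ci  <- confint(fit)["treatedTRUE", ]                 # 95% interval, in sales units

# Difference in mean changes, computed directly for comparison
diff_means <- mean(delta[treated]) - mean(delta[!treated])

# Express as a percentage of mean baseline sales (approximate; see lead-in)
round(100 * c(estimate = est, lower = ci[[1]], upper = ci[[2]]) / mean(geo_base), 1)
```

If the interval is wide relative to the break-even lift, the remedy in a geo design is more markets or a longer test window rather than a richer regression specification.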
13.4 Key Takeaways
- Advertising’s effect is best defined as an increment above a baseline, not a raw correlation between spending and sales; the entire empirical enterprise turns on how credibly that baseline is built (Gordon et al. 2019).
- The behavioral tradition decomposes persuasion into conditional, separately measurable stages (Equation 13.1) and shows the route (central versus peripheral, cognitive versus affective) is contingent on involvement, motivation, ability, and opportunity (Petty, Cacioppo, and Schumann 1983; MacInnis and Jaworski 1989; Vakratsas and Ambler 1999).
- Affect can precede and bias cognition rather than follow it (Zajonc 1980), and ad-evoked feelings move brand attitudes robustly, especially for hedonic products (Pham, Geuens, and De Pelsmacker 2013).
- Targeting and repetition are equilibrium objects: modeling consumers’ strategic response to tracking and wear-out yields non-obvious comparative statics with direct policy content (Choi, Jerath, and Sarvary 2023).
- Credibly identified advertising effects are small, heterogeneous, and often unprofitable at the margin; naïve regressions overstate them because firms target favorable conditions (B. T. Shapiro, Hitsch, and Tuchman 2021; Gordon et al. 2017; Lewis and Rao 2015).
- Advertising’s effect is dynamic: the geometric distributed lag and its Koyck form split it into a current effect and a carryover, and the optimal data interval for recovering that split is the unit exposure time (the largest period holding at most one ad pulse), not the interpurchase time; at that interval the adjusted (extended) Koyck model (Equation 13.6) recovers the true microparameters, finer data adds cost but no bias, and aggregating beyond it biases the carryover while roughly preserving the long-run multiplier (Tellis and Franses 2006; Clarke 1976).
- Widening the dependent variable—to word of mouth, traffic, firm value, even public health—and respecting spillover and attribution often reveals effects that immediate own-brand sales elasticities miss (Hewett et al. 2016; Fossen, Mallapragada, and De 2021; Ghosh Dastidar, Sunder, and Shah 2023).