69 Reporting Research Results

A research result that no one can read, reuse, or act on is not yet a result; it is private knowledge. Reporting is the discipline that converts an estimate, a test, or a model into an object that a reader can evaluate and a decision-maker can use. This chapter treats reporting as a first-class methodological problem rather than a clerical afterthought. The choices made here—which numbers reach a table, how an effect is drawn, what is held back, how the whole pipeline can be re-run—determine whether a finding survives scrutiny and whether it changes behavior. Two audiences read marketing research and they read it differently: a scholarly audience that wants identification, uncertainty, and the means to replicate, and a managerial audience that wants a decision, its expected payoff, and the risk of being wrong. A good report serves both without lying to either.

The chapter proceeds from substance to surface. It first fixes the object of communication—an estimate together with its uncertainty—and the principle that governs all honest reporting: show the reader enough to second-guess you. It then treats the two workhorses of static communication, tables and figures, formally enough to say when each is the right instrument and what makes one misleading. From there it builds the case for reproducible reporting, where the document and the computation are one artifact, and gives runnable machinery for it. It contrasts the academic and managerial registers, which differ not in honesty but in what they foreground. It closes with dashboards—live, interactive reporting—and the metric-governance problems they create, because a number shown continuously to a manager is a number that will be optimized, sometimes against the firm’s interest.

69.1 What Reporting Communicates

The atomic unit of a quantitative report is not a number but an estimate with its uncertainty. Let $\theta$ be a population quantity of interest—an elasticity, a treatment effect, a market share, a lift. An estimator $\hat\theta$ computed from data is a random variable; a single report of $\hat\theta$ alone is therefore incomplete, because it suppresses the sampling distribution that tells the reader how seriously to take it. The minimal honest report is the triple

\[ \bigl(\hat\theta,\ \widehat{\operatorname{se}}(\hat\theta),\ \text{the model and data that produced them}\bigr), \tag{69.1}\]

from which a reader can reconstruct confidence intervals and test statistics and, crucially, decide whether the precision is adequate for their decision, which is rarely the author’s decision. Reporting only $\hat\theta$ invites the reader to treat noise as signal; reporting only a $p$-value invites them to treat “significant” as “large,” a confusion the field has spent two decades trying to unlearn.¹ The governing principle is sufficiency for second-guessing: a report should contain enough for a competent skeptic to redo the inference, vary the assumptions, and reach their own verdict.
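As a minimal sketch of what the first two elements of the triple make recoverable, the R snippet below rebuilds a Wald interval, a z-statistic, and a two-sided p-value from nothing but a point estimate and its standard error. The helper name `reconstruct_inference` and the input values are illustrative assumptions, and a large-sample normal approximation is assumed to be adequate.

```r
# Hypothetical helper: rebuild standard inferential quantities from (estimate, se).
# Assumes a large-sample normal approximation for the estimator.
reconstruct_inference <- function(estimate, se, level = 0.95) {
  stopifnot(se > 0, level > 0, level < 1)
  crit <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  z    <- estimate / se                # Wald statistic against theta = 0
  c(estimate = estimate,
    lower    = estimate - crit * se,
    upper    = estimate + crit * se,
    z        = z,
    p_value  = 2 * pnorm(-abs(z)))
}

# Illustrative values only, not taken from any real study.
res <- reconstruct_inference(estimate = -1.2, se = 0.3)
print(round(res, 3))
```

A reader who is handed only the p-value cannot run this in reverse to recover the magnitude, which is the asymmetry the paragraph above describes.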

This reframing has an immediate consequence for what belongs in a report. Three quantities travel together and should be reported together: a point estimate (the best single guess), a measure of dispersion (standard error, posterior standard deviation, or interval), and a specification (the estimator, the controls, the sample). A fourth, an economic translation of the estimate into units the audience cares about (dollars, share points, conversions), is what separates a report that informs from one that merely records. The managerial value of marketing research rests on making that translation explicit and defensible (Hanssens and Pauwels 2016).

A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions. The motto of the statistician should be de omnibus dubitandum—everything is to be doubted. Reporting is where that doubt is made visible to the reader rather than hidden from them.

— after the spirit of the exploratory-data-analysis tradition

69.1.1 The Two-Audience Problem, Formalized

Reporting is a communication channel, and like any channel it has a sender, a message, and a receiver who decodes under noise (Chapter 23 develops the marketing–finance translation; the information-theoretic framing recurs in the diffusion and word-of-mouth literatures). The sender (the researcher) holds a posterior over $\theta$; the receiver (reader) holds a prior and a loss function that prices the consequences of acting on a wrong belief. The two canonical receivers price errors differently:

The scholarly receiver is loss-averse over false claims. The cost of asserting an effect that is not there (a Type I error that enters the literature) is high and durable, because others will build on it. This receiver wants conservative inference, full disclosure of researcher degrees of freedom, and replicability—precisely so the channel does not transmit noise as if it were signal.
The managerial receiver is loss-averse over bad decisions. The cost is the expected regret of choosing the wrong action under the estimate, integrated over the estimate’s uncertainty. This receiver may rationally act on a noisy estimate if the downside of inaction is worse, and wants the estimate translated into the decision’s own currency.

These are different loss functions, not different standards of honesty. The same $(\hat\theta, \widehat{\operatorname{se}})$ feeds both; what changes is what is foregrounded and what is relegated to an appendix. A report that confuses the two (burying the decision under robustness tables for a manager, or suppressing identification concerns to impress a reviewer) fails the audience it was written for. The sophistication of marketing research has risen precisely as the field learned to hold rigor and relevance simultaneously rather than trading one for the other (Lehmann, McAlister, and Staelin 2011).
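One way to make the managerial loss function concrete is the sketch below. It assumes, hypothetically, a binary act-or-wait decision whose net payoff is linear in $\theta$, and approximately normal uncertainty about $\theta$; the payoff scale, the cost, and the function name `decision_summary` are invented for illustration.

```r
# Sketch: the manager's view of an uncertain estimate (all inputs hypothetical).
# Net payoff of acting = value_per_unit * theta - cost; not acting pays 0.
decision_summary <- function(theta_hat, se, value_per_unit, cost) {
  mu    <- value_per_unit * theta_hat - cost   # expected net payoff of acting
  sigma <- abs(value_per_unit) * se            # sd of the net payoff
  # Regret of acting is max(0, -payoff); for a normal payoff its expectation is
  # sigma * dnorm(mu / sigma) - mu * pnorm(-mu / sigma).
  regret_act  <- sigma * dnorm(mu / sigma) - mu * pnorm(-mu / sigma)
  # Regret of waiting is max(0, payoff), by the mirror-image formula.
  regret_wait <- sigma * dnorm(mu / sigma) + mu * pnorm(mu / sigma)
  c(expected_payoff = mu,
    prob_loss       = pnorm(0, mean = mu, sd = sigma),
    regret_if_act   = regret_act,
    regret_if_wait  = regret_wait)
}

# A noisy estimate (ratio of estimate to standard error about 1.3).
out <- decision_summary(theta_hat = 0.04, se = 0.03,
                        value_per_unit = 1e6, cost = 1e4)
print(round(out, 2))
```

Under these assumed inputs the expected regret of acting is smaller than that of waiting, even though a conventional test would not reject zero. The scholarly receiver, pricing a false claim rather than a forgone payoff, could reasonably reach the opposite verdict from the same estimate and standard error.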

69.2 Tables and Figures

Tables and figures are the two static encodings of quantitative results, and they are not interchangeable. The operative distinction is between lookup and comparison. A table is a lookup structure: it preserves exact values and lets a reader retrieve a specific number (“what was the coefficient on price?”). A figure is a comparison structure: it maps numbers to position, length, or color so the reader’s visual system can perceive relations (trends, gaps, interactions) that a table forces them to compute by hand. The design rule follows directly: use a table when the reader needs values; use a figure when the reader needs to see a pattern.

69.2.1 When a Table

A table is the right instrument when (i) exact magnitudes matter and will be quoted or recomputed, (ii) the entries are heterogeneous in kind (estimates, standard errors, fit statistics, sample sizes) and cannot share a single visual scale, or (iii) the number of items is small enough that the eye can scan them. Regression results are the paradigm case: a reader needs the coefficient, its standard error, and the model’s fit, to four meaningful digits, in a form they can transcribe. The canonical estimation table reports, for each specification, the coefficient vector $\hat{\boldsymbol\beta}$, dispersion beneath each estimate, and goodness-of-fit at the foot, with specifications across columns so the reader can trace how an estimate moves as controls are added—the single most informative robustness display, because a coefficient that swings when a plausible control enters is a coefficient that is not identified.

A table also imposes obligations. Precision should be meaningful, not maximal: reporting eight digits of a quantity estimated to two is a form of false confidence, and rounding to the precision the data support is itself an honesty requirement. Units, sample size, and the estimation window belong in the caption or notes, not in the reader’s memory. And tables in a published artifact must be generated by code, never typed, because a hand-transcribed table is a table with an unverifiable error rate.
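The rounding obligation can be mechanized. The sketch below assumes one common convention (keep two significant digits of the standard error and report the estimate to the same decimal place); the convention, the helper name `round_to_se`, and the inputs are illustrative choices rather than a rule the chapter prescribes.

```r
# Sketch: round an estimate to the precision its standard error supports.
# Assumed convention: keep `se_digits` significant digits of the SE and
# report the estimate to the same decimal place.
round_to_se <- function(estimate, se, se_digits = 2) {
  stopifnot(se > 0)
  decimals <- max(0, se_digits - 1 - floor(log10(se)))
  paste0(formatC(estimate, format = "f", digits = decimals),
         " (", formatC(se, format = "f", digits = decimals), ")")
}

small_se <- round_to_se(-1.16437291, 0.05431877)  # SE near 0.05: three decimals kept
large_se <- round_to_se(1523.8812, 212.4)         # SE in the hundreds: no decimals kept
print(c(small_se, large_se))
```

Generating the cell text from a function like this, rather than typing it, also keeps the table on the code-generated side of the line drawn above.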

The example below fits two nested specifications and renders a publication-style comparison table directly from the model objects, so the printed numbers are, by construction, the numbers the model produced.

Code

library(dplyr)
library(broom)
library(knitr)
library(kableExtra)
set.seed(43)

# Simulated brand-choice-style data: price, a loyalty cue, and a promotion flag.
n <- 600
promo   <- rbinom(n, 1, 0.4)
loyalty <- rnorm(n)
price   <- 3 + 0.5 * promo + rnorm(n, sd = 0.4)        # promoted observations carry a higher observed price here
# True data-generating process for log-sales:
log_sales <- 6 - 1.2 * price + 0.8 * loyalty + 0.5 * promo + rnorm(n, sd = 0.5)
dat <- data.frame(log_sales, price, loyalty, promo = factor(promo))

m1 <- lm(log_sales ~ price, data = dat)
m2 <- lm(log_sales ~ price + loyalty + promo, data = dat)

Code

fmt <- function(model, label) {
  # Keep cells plain text: the book renders to HTML, PDF, and EPUB from one
  # execution, so HTML-only markup (e.g., <br>) would leak into LaTeX.
  # "(Intercept)" is renamed because booktabs' \midrule treats a row that
  # begins with a literal "(" as a malformed trim spec, breaking the PDF.
  tidy(model) |>
    transmute(term = sub("^\\(Intercept\\)$", "Intercept", term),
              !!label := sprintf("%.3f (%.3f)", estimate, std.error))
}
tab <- full_join(fmt(m1, "Sparse"), fmt(m2, "Full"), by = "term")
gof <- tibble(term = "Adj. R-squared",
              Sparse = sprintf("%.3f", summary(m1)$adj.r.squared),
              Full   = sprintf("%.3f", summary(m2)$adj.r.squared))
bind_rows(tab, gof) |>
  kable(escape = FALSE, align = "lrr",
        caption = "Log-sales regressions. Standard errors in parentheses. Adding the loyalty cue and promotion control shifts the price coefficient, the first thing a careful reader checks.") |>
  kable_styling(full_width = FALSE)

Log-sales regressions. Standard errors in parentheses. Adding the loyalty cue and promotion control shifts the price coefficient, the first thing a careful reader checks.
term	Sparse	Full
Intercept	5.073 (0.270)	5.878 (0.163)
price	-0.852 (0.083)	-1.164 (0.054)
loyalty	NA	0.774 (0.021)
promo1	NA	0.485 (0.051)
Adj. R-squared	0.147	0.752

Notice what the two-column layout buys: the price coefficient is visible before and after controls enter, and its stability (or movement) is the report’s most honest robustness signal—far more informative than any single starred number.
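The movement itself can be accounted for. As a sketch, the block below applies the standard omitted-variable algebra for OLS (an addition here, not something the chapter derives): in sample, the sparse price coefficient equals the full one plus, for each omitted control, its coefficient times its slope on price. The data are re-simulated from the same generating process so the block runs on its own, with `promo` kept numeric for simplicity.

```r
# Sketch: decompose the movement of the price coefficient between the sparse
# and full specifications. Data are re-simulated so the block is self-contained.
set.seed(43)
n <- 600
promo   <- rbinom(n, 1, 0.4)
loyalty <- rnorm(n)
price   <- 3 + 0.5 * promo + rnorm(n, sd = 0.4)
log_sales <- 6 - 1.2 * price + 0.8 * loyalty + 0.5 * promo + rnorm(n, sd = 0.5)

sparse <- lm(log_sales ~ price)
full   <- lm(log_sales ~ price + loyalty + promo)

# Auxiliary regressions: how each omitted control co-moves with price.
d_loyalty <- coef(lm(loyalty ~ price))[["price"]]
d_promo   <- coef(lm(promo ~ price))[["price"]]

# In-sample identity: sparse = full + sum(control effect * control's slope on price).
shift <- coef(full)[["loyalty"]] * d_loyalty + coef(full)[["promo"]] * d_promo
rebuilt_sparse <- coef(full)[["price"]] + shift

print(c(sparse = coef(sparse)[["price"]], full = coef(full)[["price"]],
        shift = shift, rebuilt = rebuilt_sparse))
```

Because promotion raises both price and sales in this simulation, leaving it out pulls the sparse price coefficient toward zero, which is the direction of movement the table displays.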

69.2.2 When a Figure

A figure is the right instrument when the message is a relationship: a trend over time, a dose–response curve, an interaction, a distribution, an uncertainty band. The reason is perceptual. Position along a common scale is the visual encoding humans judge most accurately; length and angle are next; area, color saturation, and hue are progressively less accurate.² A figure that encodes its key comparison as position on a shared axis lets the reader perceive the result; one that encodes it as color or area forces them to estimate, badly. The design corollary is that the variable carrying the result should be encoded by position, and decoration that competes for that channel should be removed.

Three failure modes recur and each has a name and a fix. A truncated axis (starting a bar chart’s value axis above zero) exaggerates differences by mapping a small ratio onto a large length; bars must baseline at zero because their meaning is length. Overplotting (thousands of points collapsing into an ink blob) hides the distribution it purports to show; the fix is transparency, binning, or a density encoding. Uncertainty erasure (drawing a point estimate as a clean line with no band) is the graphical analogue of reporting $\hat\theta$ without $\widehat{\operatorname{se}}(\hat\theta)$, and is arguably the most common dishonesty in applied figures, because a clean line feels certain. The honest default is to draw the interval, not just the estimate.
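The truncated-axis distortion is simple arithmetic, sketched below with hypothetical values. The helper `drawn_ratio` is an illustrative name; it compares the ratio of bar lengths as drawn with the ratio of the underlying values.

```r
# Sketch: how much a truncated baseline exaggerates a bar-chart comparison.
# Drawn bar length is (value - baseline), so the drawn ratio departs from the true one.
drawn_ratio <- function(a, b, baseline = 0) {
  stopifnot(baseline < min(a, b))
  (b - baseline) / (a - baseline)
}

a <- 50; b <- 55                                       # hypothetical values, a 10% true gap
true_ratio      <- drawn_ratio(a, b, baseline = 0)     # bars anchored at zero
truncated_ratio <- drawn_ratio(a, b, baseline = 45)    # axis starts at 45
exaggeration    <- (truncated_ratio - 1) / (true_ratio - 1)
print(c(true = true_ratio, truncated = truncated_ratio, exaggeration = exaggeration))
```

With the axis starting at 45, a 10% gap is drawn as a bar twice as long, so the apparent difference is roughly ten times the real one.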

The figure below makes the last point operational. It plots the model-implied relationship between price and log-sales together with a 95% confidence band, so the reader sees both the estimated slope and how well-pinned it is.

Code

library(ggplot2)
grid <- data.frame(price = seq(min(dat$price), max(dat$price), length.out = 100),
                   loyalty = 0, promo = factor(0, levels = c(0, 1)))
pred <- predict(m2, newdata = grid, interval = "confidence")
grid <- cbind(grid, pred)

ggplot(grid, aes(price, fit)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2) +
  geom_line(linewidth = 1) +
  geom_point(data = dat, aes(price, log_sales), alpha = 0.15, inherit.aes = FALSE) +
  labs(x = "Price", y = "Log-sales (predicted)") +
  theme_minimal(base_size = 12)

Figure 69.1: Model-implied price response with a 95% confidence band. The band, not the line, is the honest object: it shows the reader how precisely the slope is identified.

69.2.3 Choosing Between Them

The choice is not stylistic but functional, and it can be stated as a short decision rule. The table below contrasts the two instruments on the dimensions that actually decide the call.

Table 69.1: Tables versus figures: the choice follows from what the reader must do with the result.

Dimension	Table	Figure
Reader's task	Look up a value	See a relationship
Encodes	Symbols (digits)	Position, length, color
Best when	Few heterogeneous numbers	Many values or a trend
Exact values	Preserved	Approximate
Patterns/interactions	Reader must compute	Perceived directly
Failure mode	False precision; hand-typed errors	Truncated axis; hidden uncertainty

A useful heuristic: if you find yourself asking the reader to compare across rows or columns of a table, the comparison wants to be a figure; if you find yourself annotating a figure with exact numbers the reader will quote, those numbers want to be a table. Many reports need both—a figure for the pattern, a table in the appendix for the values—and that redundancy is a feature, because it serves the two audiences at once.

69.3 Reproducible Reporting

The deepest reporting failure is not a bad table or a misleading axis; it is a result no one can regenerate, including its author six months later. Reproducible reporting is the practice of binding the narrative (prose, tables, figures) to the computation (data, code, environment) in a single executable artifact, so that the reported numbers are produced by running the document rather than pasted into it. Formally, let a report be a function

\[ R = f(D, C, E), \tag{69.2}\]

where $D$ is the data, $C$ the analysis code, $E$ the computational environment (language and package versions, seeds), and $f$ the build process. A report is reproducible when re-evaluating $f$ on the same $(D, C, E)$ returns a byte-identical $R$, and robust when small, defensible perturbations to $C$ or $E$ leave the substantive conclusions intact. The two are distinct: reproducibility is about the pipeline, robustness about the science. A result can be perfectly reproducible and entirely fragile.
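The distinction can be checked mechanically. In the sketch below, `build_report` is a hypothetical stand-in for a real render step, `identical()` on R objects stands in for a byte-level comparison of rendered output, and the 25% tolerance used for the robustness check is an arbitrary judgment call.

```r
# Sketch: treat the build as a function f(D, C, E) and probe both properties.
# `build_report` is a hypothetical stand-in for rendering a document.
build_report <- function(data, seed, n_boot = 500) {
  set.seed(seed)                                   # the seed is part of the environment E
  boot_means <- replicate(n_boot, mean(sample(data, replace = TRUE)))
  list(estimate = mean(data), boot_se = sd(boot_means))
}

set.seed(43)
D <- rnorm(100, mean = 2)                          # stand-in for the data

r1 <- build_report(D, seed = 43)
r2 <- build_report(D, seed = 43)                   # same (D, C, E)
r3 <- build_report(D, seed = 44)                   # a small perturbation of E

reproducible <- identical(r1, r2)                  # same inputs, same output
robust <- abs(r3$boot_se - r1$boot_se) / r1$boot_se < 0.25   # tolerance is a judgment call
print(c(reproducible = reproducible, robust = robust))
```

The first check concerns the pipeline and has a yes-or-no answer; the second concerns the science and depends on which perturbations and tolerances the author is prepared to defend.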

This framing dissolves a category of error that copy-and-paste workflows make nearly inevitable: the transcription gap, where the number in the manuscript no longer matches the number the code produces because the data updated, the model changed, or a digit was mistyped. In a literate-programming workflow—the model this book is written in, where prose and {r}/{python} chunks live in one .qmd file and the build executes the code to insert results—the transcription gap cannot open, because there is no transcription. Marketing’s move into data-rich, code-intensive analysis makes this non-negotiable: when the analysis is a pipeline of joins, models, and simulations, the only credible report is one the pipeline emits (Wedel and Kannan 2016).

Figure 69.2 lays out the dependency structure such a pipeline must make explicit and re-runnable.

flowchart LR
  A["Raw data<br/>(immutable)"] --> B["Cleaning / joins<br/>(scripted)"]
  B --> C["Analysis dataset<br/>(versioned)"]
  C --> D["Models & estimates<br/>(seeded)"]
  D --> E["Tables & figures<br/>(code-generated)"]
  E --> F["Rendered report<br/>(prose + results)"]
  G["Environment<br/>(versions, seeds)"] -.-> B
  G -.-> D
  style A fill:#e8e8e8,stroke:#333
  style F fill:#d8efd8,stroke:#333

Figure 69.2: A reproducible reporting pipeline. Each artifact depends only on declared inputs; re-running from raw data regenerates every downstream table and figure, closing the transcription gap.

69.3.1 The Pillars of Reproducibility

Four practices make Equation 69.2 hold in practice, and each closes a specific leak.

Determinism. Every stochastic step is seeded (set.seed() in R, an explicit RNG seed elsewhere), so that simulations, bootstraps, cross-validation splits, and posterior draws return the same values on re-run. Without seeding, $f$ is not a function and reproducibility is impossible by definition. Every stochastic example in this book is seeded for exactly this reason.

Immutable raw data and scripted transformation. Raw data is never edited in place; all cleaning is code that reads raw inputs and writes derived outputs, so the path from source to analysis dataset is auditable and re-runnable. The analysis dataset becomes a computed artifact, not a hand-curated one.

Environment capture. The language and package versions are recorded (a lockfile, a container, or at minimum a sessionInfo() printed in the appendix), because the same code can give different numbers under different versions. The environment $E$ in Equation 69.2 is part of the report, not background.

Single-source build. The document and the computation are one file; rendering is re-running. This is the practice that eliminates the transcription gap outright rather than merely policing it.

The minimal, self-contained reproducibility footer below records seed and environment so a reader can reconstruct $E$. It is deliberately boring; that is the point.

Code

set.seed(43)            # determinism: re-runs reproduce every stochastic result
cat("R version:", R.version.string, "\n")
#> R version: R version 4.4.3 (2025-02-28 ucrt)
cat("Key packages:\n")
#> Key packages:
for (p in c("dplyr", "ggplot2", "broom")) {
  if (requireNamespace(p, quietly = TRUE))
    cat(sprintf("  %-8s %s\n", p, as.character(packageVersion(p))))
}
#>   dplyr    1.1.4
#>   ggplot2  4.0.0
#>   broom    1.0.10

A subtle but consequential point: reproducibility is a precondition for, not a guarantee of, credibility. A reproducible analysis can still be overfit to researcher degrees of freedom—the many defensible choices of sample, controls, and functional form that, searched over, manufacture significance. The honest defense is to report the specification curve or multiverse: the distribution of the estimate across the reasonable analytic choices, so the reader sees whether the result is a robust feature of the data or an artifact of one path through it. Reproducibility makes the multiverse cheap to compute; reporting it is what turns reproducibility into credibility.
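A very small specification curve can be sketched as follows. The data are simulated from the same process as the regression example, the candidate control set is hypothetical, and a real multiverse would also vary samples and functional forms rather than controls alone.

```r
# Sketch of a small specification curve: the price coefficient across every
# subset of two candidate controls. Data are simulated; the control set is hypothetical.
set.seed(43)
n <- 600
promo   <- rbinom(n, 1, 0.4)
loyalty <- rnorm(n)
price   <- 3 + 0.5 * promo + rnorm(n, sd = 0.4)
log_sales <- 6 - 1.2 * price + 0.8 * loyalty + 0.5 * promo + rnorm(n, sd = 0.5)
dat <- data.frame(log_sales, price, loyalty, promo)

controls <- c("loyalty", "promo")
# All subsets of the controls, including the empty set.
subsets <- c(list(character(0)),
             unlist(lapply(seq_along(controls),
                           function(k) combn(controls, k, simplify = FALSE)),
                    recursive = FALSE))

curve <- do.call(rbind, lapply(subsets, function(s) {
  fit <- lm(reformulate(c("price", s), response = "log_sales"), data = dat)
  est <- summary(fit)$coefficients["price", ]
  data.frame(controls = if (length(s)) paste(s, collapse = " + ") else "(none)",
             estimate = est[["Estimate"]],
             se       = est[["Std. Error"]])
}))
curve <- curve[order(curve$estimate), ]
print(curve, row.names = FALSE)
```

Reporting the whole `curve` table, or plotting its estimates with intervals in sorted order, shows the reader the spread across paths rather than the single path the author happened to take.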

69.4 Academic versus Managerial Reporting

The same study, honestly reported, looks different to the two audiences because they price errors differently (Section 69.1.1 formalizes the two loss functions). The differences are systematic enough to tabulate, and the tabulation is itself a reporting decision: it tells an author what to foreground.

Table 69.2: Academic and managerial registers differ in foregrounding, not in honesty. The same estimate and its uncertainty feed both reports.

Element	Academic	Managerial
Primary question	Is the effect real and why?	What should we do, and what is the payoff?
What leads	Theory and contribution	The decision and its $ impact
Uncertainty	Intervals, tests, full disclosure	Risk of the recommended action
Identification	Foregrounded and defended	Stated as a caveat, not derived
Effect units	Standardized / elasticities	Dollars, share points, conversions
Robustness	Extensive, in-text	Summarized; details on request
Length	Long; appendices expected	Short; one-page summary first
Success criterion	Survives peer review & replication	Changes a decision; ROI realized

The managerial register inverts the academic document’s structure. An academic paper builds to its contribution: motivation, theory, data, identification, results, robustness, then implications. A managerial report leads with the answer—the recommendation and its expected return—and relegates method to an appendix, following the inverted-pyramid logic that the busiest reader should extract the decision from the first paragraph. This is not dumbing down; it is re-sequencing for a different loss function. The discipline of translating an estimate into a decision with a quantified payoff is exactly what gives marketing research standing in the firm, and its absence is why analytically sound work so often fails to move resources (Hanssens and Pauwels 2016). The presence of a senior marketing voice in the firm measurably improves performance, which raises the stakes of reporting in a register that voice can use (Germann, Ebbes, and Grewal 2015).

Three translation errors recur when crossing the academic–managerial boundary, and each has a discipline-specific antidote.

First, statistical significance is reported as managerial importance. A precisely estimated but tiny elasticity is “significant” and irrelevant; the managerial report must state the effect in decision units and let the manager judge materiality. Second, a metric is reported without its construct validity. Many managerial dashboards rest on accounting approximations that do not measure what they claim—Tobin’s $q$ proxies and similar shortcuts can mislead precisely because they are reported as if they were the construct of interest, when they are noisy, biased stand-ins for it (Bendle and Butt 2018b, 2018a). Reporting the proxy without flagging the gap transmits a measurement error as a managerial fact. Third, correlation is reported as a lever. A managerial recommendation implicitly claims that acting on a variable will move the outcome—a causal claim—so a report that recommends action on the basis of a correlational estimate must say so plainly and price the risk that the relationship is not causal. The honest managerial report does not hide the identification problem; it states it as the risk attached to the recommendation.
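The first antidote, stating the effect in decision units, can be sketched in a few lines. The coefficient and standard error below are the "Full" column of the log-sales regression table in Section 69.2.1; the baseline price, weekly volume, and proposed price change are hypothetical, and the calculation covers revenue only, not margin.

```r
# Sketch: translate a log-sales price coefficient into weekly revenue terms.
# Coefficient and SE come from the "Full" column of the earlier regression table;
# baseline price, volume, and the proposed price change are hypothetical.
b  <- -1.164; se <- 0.054
base_price <- 3.00; base_units <- 10000; delta <- -0.10   # a 10-cent price cut

revenue_change <- function(beta) {
  new_units <- base_units * exp(beta * delta)   # log-linear demand response
  (base_price + delta) * new_units - base_price * base_units
}

crit   <- qnorm(0.975)
point  <- revenue_change(b)
bounds <- sort(c(revenue_change(b - crit * se), revenue_change(b + crit * se)))
print(round(c(point = point, lower = bounds[1], upper = bounds[2])))
```

The interval reflects sampling uncertainty in the coefficient only. It inherits the third error in full: the revenue figure is a forecast of what a price cut would do only if the coefficient is causal, and the report should say so next to the number.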

69.5 Dashboards

A dashboard is reporting made continuous and interactive: a live interface that renders current values of a fixed set of metrics, refreshed as data arrives, and typically lets the user filter or drill down. Where a static report answers a question once, a dashboard answers a standing question repeatedly, which changes both its value and its hazards. Its value is monitoring—detecting when a metric leaves its expected range fast enough to act. Its hazard is optimization pressure: a number shown continuously to people whose performance is judged by it will be moved, and not always by improving the underlying reality.

The design problem is to choose a small set of metrics that are decision-relevant, timely, and jointly hard to game, and to present each with enough context that a value is interpretable on sight. A bare number (“conversion: 3.2%”) is nearly useless; the same number against a target, a trend, and a confidence band is a decision input. The reporting principle from Equation 69.1 survives the move to real time: a live point estimate without a sense of its variability invites the user to chase noise, mistaking the ordinary jitter of a ratio for a signal worth acting on.
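As a sketch of what that context might look like in code, the block below wraps a single day's conversion rate in an interval, a target, and control limits implied by binomial sampling. The counts, the target, and the historical rate are hypothetical, and the three-standard-error limits are a conventional choice rather than a rule from this chapter.

```r
# Sketch: give a live conversion rate the context that makes it interpretable.
# Counts, target, and historical rate are hypothetical.
conversions <- 32; visitors <- 1000                 # today's 3.2%
target <- 0.035

rate <- conversions / visitors
ci   <- prop.test(conversions, visitors)$conf.int   # score interval with continuity correction

# Ordinary day-to-day jitter implied by binomial sampling at the historical rate.
hist_rate <- 0.034
limits <- hist_rate + c(-3, 3) * sqrt(hist_rate * (1 - hist_rate) / visitors)

status <- if (rate < limits[1] || rate > limits[2]) "investigate" else "within noise"
cat(sprintf("conversion %.1f%% (95%% CI %.1f%% to %.1f%%), target %.1f%%: %s\n",
            100 * rate, 100 * ci[1], 100 * ci[2], 100 * target, status))
```

With a thousand visitors a day, a reading of 3.2% against a 3.4% history sits well inside ordinary sampling variation, so a user who reacts to it is chasing noise.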

Figure 69.3 sketches the layering that distinguishes a dashboard from a wall of numbers.

flowchart TD
  S1["Data streams<br/>(sales, web, CRM, spend)"] --> M["Metric layer<br/>(defined, versioned KPIs)"]
  M --> C["Context layer<br/>(target, trend, uncertainty band)"]
  C --> A["Alert layer<br/>(out-of-range triggers)"]
  A --> D["Decision<br/>(act / investigate / ignore)"]
  M -.governance.-> G["Metric definitions<br/>owned & documented"]
  style D fill:#d8efd8,stroke:#333
  style G fill:#fff3cd,stroke:#333

Figure 69.3: Anatomy of a decision-oriented dashboard. Each tier narrows attention: from raw streams, to a few governed metrics, to the small set that triggers action when out of range.

69.5.1 The Metric-Governance Problem

The defining risk of dashboards is captured by a regularity old enough to be folklore and sharp enough to be a design constraint:

When a measure becomes a target, it ceases to be a good measure.

— Goodhart’s law, as commonly stated

The mechanism is mundane. A metric is a proxy for an unobserved goal; agents optimize the proxy; the gap between proxy and goal widens exactly where the incentive bites. A dashboard that surfaces “leads generated” will produce more leads, of falling quality; one that surfaces “average review rating” will produce review-solicitation tactics that move the rating without moving satisfaction. The defense is not to hide metrics but to govern them: pair every gameable metric with a guardrail metric that moves in the opposite direction under gaming (leads and lead-to-sale conversion; rating and response rate), so that gaming is visible in the pairing. Reporting metrics in tension, not in isolation, is the dashboard analogue of reporting an estimate with its uncertainty—both refuse to let a single number stand unchallenged.

Three further governance practices keep a dashboard honest. Definitions are owned and versioned: every metric has a single documented definition and a person accountable for it, because the most common dashboard failure is two teams reading the same label as two different quantities. Latency is disclosed: a metric that is two days stale must say so, or it will be acted on as if current. And the metric set is small: a dashboard that shows everything directs attention to nothing, defeating the monitoring purpose that justified it. The discipline is the same one that governs a good table—report what supports a decision and suppress what merely fills space.

The simulation below makes the guardrail logic concrete. A team is incentivized on a headline metric; over time they game it, lifting the headline while the guardrail quietly deteriorates. A dashboard that showed only the headline would report improvement; one that reports the pair reveals the truth.

Code

set.seed(43)
weeks <- 1:26
gaming <- pmin((weeks - 1) / 25, 1)            # incentive pressure ramps in
leads  <- 100 + 60 * gaming + rnorm(26, 0, 4)  # headline climbs as it is gamed
conv   <- 0.22 - 0.12 * gaming + rnorm(26, 0, 0.01)  # quality erodes in step

dash <- data.frame(
  week  = rep(weeks, 2),
  value = c(leads / max(leads), conv / max(conv)),  # scaled to a common axis
  metric = rep(c("Leads (headline)", "Conversion (guardrail)"), each = 26)
)
library(ggplot2)
ggplot(dash, aes(week, value, linetype = metric)) +
  geom_line(linewidth = 1) +
  labs(x = "Week", y = "Scaled value", linetype = NULL,
       title = "What the headline hides") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")

Figure 69.4: Goodhart’s law on a dashboard. Under incentive pressure the headline metric (leads) rises while the guardrail (lead-to-sale conversion) falls. Reporting the pair, not the headline alone, exposes the gaming.
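The visual check can also be turned into a rule an alert layer might apply. The sketch below re-simulates the same series so it runs on its own; the flagging rule (headline trending up while the guardrail trends down) and the helper `pct_trend` are hypothetical illustrations, not an established detection method.

```r
# Sketch: flag a headline/guardrail pair that is diverging.
# Re-simulates the series so the block is self-contained; the rule is hypothetical.
set.seed(43)
weeks  <- 1:26
gaming <- pmin((weeks - 1) / 25, 1)
leads  <- 100 + 60 * gaming + rnorm(26, 0, 4)
conv   <- 0.22 - 0.12 * gaming + rnorm(26, 0, 0.01)

# Weekly linear trend of a metric, expressed in percent of its own mean.
pct_trend <- function(y, t) 100 * coef(lm(y ~ t))[["t"]] / mean(y)
headline_trend  <- pct_trend(leads, weeks)
guardrail_trend <- pct_trend(conv, weeks)

# The quantity the firm cares about: expected sales = leads x conversion.
sales_trend <- pct_trend(leads * conv, weeks)

gaming_flag <- headline_trend > 0 && guardrail_trend < 0
print(round(c(headline = headline_trend, guardrail = guardrail_trend,
              sales = sales_trend), 2))
print(gaming_flag)
```

Multiplying the pair back into expected sales is the useful step: it shows the goal falling while the headline rises, which neither series reveals alone.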

69.5.2 Static Report or Dashboard?

The choice between a one-time report and a standing dashboard is a choice about the question’s tense. A question asked once (“did this campaign lift sales?”) wants a report: it permits depth, identification, and a considered narrative, and it is read once and archived. A question asked continuously (“is the funnel healthy right now?”) wants a dashboard: it trades depth for timeliness because its value is in early detection. Confusing the two is a common and expensive error. A causal question forced onto a dashboard becomes a wall of correlations that invites spurious action; a monitoring question forced into a quarterly report arrives too late to matter. The deeper analyses that justify changing what a dashboard monitors (why a metric moved, whether an intervention worked) remain the province of the static, reproducible report that this chapter began with, because they require the identification and uncertainty machinery a glance-able interface cannot carry.

69.6 Key Takeaways

The unit of an honest report is an estimate with its uncertainty and its specification (Equation 69.1), not a point estimate or a $p$-value alone; the governing principle is sufficiency for second-guessing.
Tables serve lookup (exact, heterogeneous values); figures serve comparison (patterns via position). Encode the result in the most accurately perceived channel, baseline bars at zero, and never erase uncertainty.
Reproducible reporting binds narrative to computation, $R = f(D, C, E)$ (Equation 69.2), eliminating the transcription gap; determinism, immutable data, environment capture, and single-source builds are its pillars. Reproducibility enables but does not guarantee credibility—report the multiverse.
Academic and managerial reports differ in foregrounding, not honesty: the same uncertainty feeds both, but the managerial report leads with the decision and its payoff and translates effects into decision units (Hanssens and Pauwels 2016).
Dashboards make reporting continuous; their governing hazard is Goodhart’s law. Pair gameable metrics with guardrails, govern definitions, disclose latency, and keep the set small. Use a dashboard for standing questions, a reproducible report for causal ones.

69.7 Further Reading

The reproducibility and literate-programming practices in this chapter sit within the broader move of marketing into data-rich, code-intensive analysis (Wedel and Kannan 2016) and the long argument over how to demonstrate marketing’s value to the firm in a register management can act on (Hanssens and Pauwels 2016; Germann, Ebbes, and Grewal 2015). The measurement-validity cautions that dashboards make urgent connect to the construct-and-scale material in Chapter 35 and to documented misuses of accounting-based proxies (Bendle and Butt 2018b, 2018a). The graphical-perception and exploratory-data-analysis traditions that underwrite the figures section, and the Goodhart/metric-governance literature behind the dashboard section, are flagged above as citation gaps to be filled before publication.

Bendle, Neil Thomas, and Moeen Naseer Butt. 2018a. “The Misuse of Accounting-Based Approximations of Tobin’s q in a World of Market-Based Assets.” Marketing Science 37 (3): 484–504.

———. 2018b. “The Misuse of Accounting-Based Approximations of Tobin’s q in a World of Market-Based Assets.” Marketing Science 37 (3): 484–504. https://doi.org/10.1287/mksc.2018.1093.

Germann, Frank, Peter Ebbes, and Rajdeep Grewal. 2015. “The Chief Marketing Officer Matters!” Journal of Marketing 79 (3): 1–22. https://doi.org/10.1509/jm.14.0244.

Hanssens, Dominique M., and Koen H. Pauwels. 2016. “Demonstrating the Value of Marketing.” Journal of Marketing 80 (6): 173–90. https://doi.org/10.1509/jm.15.0417.

Lehmann, Donald R., Leigh McAlister, and Richard Staelin. 2011. “Sophistication in Research in Marketing.” Journal of Marketing 75 (4): 155–65. https://doi.org/10.1509/jmkg.75.4.155.

Wedel, Michel, and P.K. Kannan. 2016. “Marketing Analytics for Data-Rich Environments.” Journal of Marketing 80 (6): 97–121. https://doi.org/10.1509/jm.15.0413.

¹ A $p$-value answers “how surprising is this data under the null?” It does not answer “how big is the effect?” or “how probable is the hypothesis?” An effect can be precisely estimated and economically trivial, or imprecisely estimated and economically enormous. Reporting effect sizes with intervals, rather than stars, keeps the magnitude in view. The replication and credibility debates in marketing and the adjacent social sciences turn substantially on having confused the two.
² This ordering of elementary perceptual tasks by accuracy is the empirical foundation of statistical graphics: encode the most important comparison in the most accurately perceived channel. A practical consequence is the suspicion of pie charts and 3-D bars: they encode quantity as angle, area, or foreshortened length, exactly the channels the eye reads worst.
