# Methodology: from raw orders to lift

> A walkthrough of how Dema turns raw orders into an incrementality result — the level inference runs at, how regional data is aggregated, how each metric is measured, and why some numbers can look different depending on where you read them.

<Info>
  This page explains *how* Dema computes an incrementality result, end to end. It
  answers three questions customers ask most often: **at what level is inference
  applied**, **how is regional data aggregated**, and **why can numbers differ
  depending on the metric or the chart you're looking at**.
</Info>

## From raw orders to a result

Every incrementality result is produced by the same pipeline. Understanding the
steps makes the outputs — and any apparent discrepancies — much easier to reason
about.

<Steps>
  <Step title="Orders are located">
    Each order is mapped to a geographic location from its delivery or billing
    postal code. Postal codes are normalized first (for example, UK postcodes
    are reduced to the outward code) so that every order lands in a consistent
    geographic unit. Orders can alternatively be mapped by the location where
    they were placed, which in some cases improves model quality.
  </Step>

  <Step title="Locations are grouped into zones">
    Postal codes are clustered into **zones** — commute zones, DMAs, or the
    regional units appropriate for your market. Zones reflect real-world
    shopping behavior (people shop across city boundaries), so they make better
    treatment and control units than rigid administrative lines.
  </Step>

  <Step title="Data is aggregated by zone and period">
    Within each zone, the metric being measured is summed over each period
    (**daily by default**, with a weekly option). The result is a
    balanced panel: one value per zone per period. Periods with no orders in a
    zone are filled with zero so the time series is continuous.
  </Step>

  <Step title="A synthetic control is built">
    The treatment zones are combined into a single treated series. Dema then
    builds a **synthetic control**: a weighted blend of the remaining (control)
    zones chosen to track the treated series as closely as possible during the
    pre-test period. See [How incrementality testing
    works](/guides/incrementality-testing/how-it-works) for the intuition behind
    synthetic control.
  </Step>

  <Step title="Lift is measured">
    During the test, the lift in each period is the gap between what the treatment
    zones actually did and what the synthetic control predicts they *would* have
    done without the change. Summed over the test, that gap is your
    **incremental value**. Set against spend, it yields your incremental
    ROAS, epROAS, or CAC (for example, incremental ROAS is incremental value
    divided by spend).
  </Step>
</Steps>
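
The first three steps can be illustrated with a minimal sketch. It is not Dema's implementation: the `ZONE_BY_OUTWARD` lookup, the function names, and the order values are hypothetical, and it assumes UK-style postcodes whose inward code is the last three characters.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical postcode-to-zone lookup; real zones would come from commute
# zones, DMAs, or another regional definition.
ZONE_BY_OUTWARD = {"SW1A": "london", "EC1A": "london", "M1": "manchester"}


def outward_code(postcode: str) -> str:
    """Reduce a UK postcode to its outward code, e.g. 'sw1a 1aa' -> 'SW1A'."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]  # the inward code is the last three characters


def build_panel(orders, start: date, end: date):
    """Sum order values per (zone, day) and zero-fill days with no orders."""
    totals = defaultdict(float)
    for postcode, day, value in orders:
        zone = ZONE_BY_OUTWARD.get(outward_code(postcode))
        if zone is not None and start <= day <= end:
            totals[(zone, day)] += value
    zones = sorted(set(ZONE_BY_OUTWARD.values()))
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Balanced panel: every zone gets one value for every day.
    return {zone: [totals.get((zone, d), 0.0) for d in days] for zone in zones}


# Made-up orders: (postcode, order date, order value).
orders = [
    ("SW1A 1AA", date(2024, 1, 1), 100.0),
    ("ec1a1bb", date(2024, 1, 1), 50.0),
    ("M1 1AE", date(2024, 1, 3), 80.0),
]
panel = build_panel(orders, date(2024, 1, 1), date(2024, 1, 3))
print(panel)
```

Each zone ends up with exactly one value per day, including zeros for days without orders, which is what makes the panel balanced.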

## At what level inference runs

Inference always runs at the **geographic-zone level**, on data aggregated by
period (daily by default, with a weekly option). It does **not** run at the
individual-user or individual-campaign level.

This is deliberate. A geo experiment compares whole regions, so it captures the
*total* effect of a marketing change — including effects that user-level platform
studies miss (people who were influenced but never clicked, cross-device journeys,
and offline or word-of-mouth spillover). For why this is more trustworthy than
platform-reported, user-level lift studies, see [Platform lift
studies](/guides/incrementality-testing/platform-lift-studies).
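
To make zone-level inference concrete, here is a minimal sketch of a synthetic control fitted on zone-period series and then used to measure lift. It assumes the classic formulation (non-negative weights that sum to one, fitted by least squares on the pre-test period) and solves it with a simple Frank-Wolfe loop. Dema's actual estimator may differ; `fit_weights`, the series, and the spend figure are all hypothetical.

```python
def fit_weights(treated, controls, iterations=200):
    """Fit non-negative weights summing to 1 (Frank-Wolfe, exact line search).

    Minimizes the squared pre-period gap between the treated series and the
    weighted blend of control-zone series.
    """
    n, periods = len(controls), range(len(treated))
    weights = [1.0 / n] * n
    for _ in range(iterations):
        blend = [sum(weights[j] * controls[j][t] for j in range(n)) for t in periods]
        resid = [b - y for b, y in zip(blend, treated)]
        grad = [2.0 * sum(resid[t] * controls[j][t] for t in periods) for j in range(n)]
        k = min(range(n), key=lambda j: grad[j])  # simplex vertex with steepest descent
        direction = [controls[k][t] - blend[t] for t in periods]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        step = -sum(r * d for r, d in zip(resid, direction)) / denom
        step = max(0.0, min(1.0, step))  # stay inside the simplex
        if step < 1e-12:
            break
        weights = [(1.0 - step) * w for w in weights]
        weights[k] += step
    return weights


# Made-up numbers for illustration only: one treated series, two control zones.
pre_treated = [100.0, 122.0, 89.0, 111.0]
pre_controls = [[100.0, 140.0, 80.0, 120.0], [100.0, 80.0, 110.0, 90.0]]
weights = fit_weights(pre_treated, pre_controls)

test_treated = [114.0, 136.0]
test_controls = [[110.0, 130.0], [90.0, 100.0]]
synthetic = [
    sum(w * zone[t] for w, zone in zip(weights, test_controls))
    for t in range(len(test_treated))
]
lift = [actual - expected for actual, expected in zip(test_treated, synthetic)]
incremental_value = sum(lift)
spend = 10.0  # hypothetical test spend
incremental_roas = incremental_value / spend
print(weights, lift, incremental_value, incremental_roas)
```

The weights are learned only from pre-test periods. The test-period lift is then the treated series minus the blend those weights predict.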

## Each metric is measured independently

When you switch the metric dropdown (Gross Sales, Net Sales, Net Gross Profit 2,
New / Returning Customer profit, New Customer Count), Dema does **not** re-scale a
single shared result. Each metric is measured by its **own, independent synthetic
control model**, fitted on that metric's own zone-period panel.

<Note>
  Because each metric is a separate model, the lift, confidence interval, and
  p-value for one metric do **not** mechanically follow from another. It is normal
  and expected for, say, Gross Sales and Net Gross Profit 2 to show different
  magnitudes — and occasionally different signs — for the same test. A profit
  metric can move differently from a revenue metric because returns, discounts,
  and margins behave differently across regions. This is a feature of measuring
  what you actually care about, not an inconsistency in the data.
</Note>

## How significance is established

A measured gap between treatment and control is only meaningful if it is unlikely
to have happened by chance. Dema establishes this with a **randomization
(permutation) test** rather than a textbook formula that assumes a particular data
distribution:

* Dema repeatedly reshuffles the timing of the observed differences to simulate
  what "no real effect" would look like for your specific data.
* The **p-value** is the share of those random arrangements that look at least as
  extreme as your actual result. A low p-value means a gap this large rarely
  appears by chance.
* The **confidence interval** shown on the charts expresses the same uncertainty
  as a range around the estimate. See [Understand test
  results](/guides/incrementality-testing/understand-test-results#what-is-a-confidence-interval)
  for how to read it.
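
The idea behind the bullets above can be sketched as a generic permutation test. It is a simplified illustration, not Dema's exact procedure: it assumes the test statistic is the mean treated-minus-synthetic gap over the test window and that shuffling the gaps across periods represents "no real effect". The function name and the gap values are hypothetical.

```python
import random


def permutation_p_value(gaps, n_test, n_permutations=5000, seed=0):
    """Share of shuffled arrangements at least as extreme as the observed one.

    `gaps` holds treated-minus-synthetic differences per period, with the
    test window in the last `n_test` positions.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = sum(gaps[-n_test:]) / n_test
    shuffled = list(gaps)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # reshuffle the timing of the gaps
        stat = sum(shuffled[-n_test:]) / n_test
        if abs(stat) >= abs(observed):
            extreme += 1
    return extreme / n_permutations


# Made-up gaps: ten pre-test periods near zero, then three test periods.
gaps = [1.0, -2.0, 0.5, -1.0, 2.0, -0.5, 1.5, -1.5, 0.0, 1.0, 10.0, 12.0, 9.0]
p_value = permutation_p_value(gaps, n_test=3)
print(p_value)
```

Because the three test-period gaps are far larger than anything in the pre-test period, very few shuffles match them and the p-value comes out small. If every gap were identical, every shuffle would tie the observed value and the p-value would be 1.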

The quality of the underlying match is reported separately, so you can judge
*before* trusting a result whether the synthetic control was a good fit. Those
diagnostics are covered in [Analyze a suggested
experiment](/guides/incrementality-testing/analyze-suggestion#model-quality-diagnostics).

## Why numbers can look different

It's common to compare two views and notice they don't tie out exactly. In almost
every case this traces to one of the following — none of which means a result is
wrong.

### The chart is a focused view; the model learns from much more

The treatment and control lines on the results charts are a **zoom around the test
window** — they show roughly the test period plus a short lead-in, because that's
the part you want to inspect. The synthetic control model itself is fitted on a far
longer stretch of history (on the order of a year) so it can learn each zone's
seasonality and trend.

So if you sum the values visible on the chart, you are summing the *display
window*, not the full history the model used. The headline incremental value is
computed by the model over the test window using that longer-trained baseline — it
is not meant to equal a hand-sum of the visible chart points.

### Results settle for a few days after the test

Order data is not final the moment an order is placed. Refunds, cancellations,
late-arriving conversions, and geographic enrichment all continue to settle for a
few days. As a result:

* **Pre-test history is stable** — it has long since finalized, so the model's
  baseline and its fit quality barely move between runs.
* **The most recent test/post-test days keep adjusting** as data finalizes, which
  is why the incremental value and ROAS can shift slightly if you re-open a result
  immediately after the test versus a few days later.

<Tip>
  If you need to compare two numbers exactly, compare results computed at the same
  time and over the same window. Reading the headline metric a few days after the
  test period closes gives the most stable figure.
</Tip>

### Aggregation cadence

Inference aggregates orders into periods (daily by default, or weekly). If you
compare against a report built on a different cadence, totals can differ slightly
even though they describe the same underlying orders.
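
One plausible way a cadence mismatch arises is boundary alignment: a weekly report counts whole weeks, while a daily total stops exactly at the window edge. The sketch below illustrates that assumption only; the Monday-to-Sunday week definition, function names, and values are hypothetical, not a description of any specific report.

```python
from datetime import date, timedelta


def daily_total(values_by_day, start, end):
    """Sum only the days inside [start, end]."""
    return sum(v for d, v in values_by_day.items() if start <= d <= end)


def weekly_total(values_by_day, start, end):
    """Sum every whole Monday-to-Sunday week that overlaps [start, end]."""
    first_monday = start - timedelta(days=start.weekday())
    last_sunday = end + timedelta(days=6 - end.weekday())
    return sum(v for d, v in values_by_day.items() if first_monday <= d <= last_sunday)


# Made-up data: a flat 10.0 per day from Monday 2024-01-01 for five weeks.
values = {date(2024, 1, 1) + timedelta(days=i): 10.0 for i in range(35)}
start, end = date(2024, 1, 1), date(2024, 1, 30)  # window ends on a Tuesday

print(daily_total(values, start, end))
print(weekly_total(values, start, end))
```

Both totals describe the same orders. The weekly figure is larger only because its last bucket runs through the Sunday after the window closes.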

### Each metric is its own model

As covered [above](#each-metric-is-measured-independently), switching the metric
re-fits a separate model. Differences between metrics are expected, not a sign of
instability.

<Note>
  For any result you're unsure about, the fastest check is the model-quality
  diagnostics on the experiment — they tell you directly whether the synthetic
  control was a trustworthy match. See [Analyze a suggested
  experiment](/guides/incrementality-testing/analyze-suggestion#model-quality-diagnostics).
</Note>
