What is Bayesian hierarchical modeling for marketing forecasts?
Bayesian hierarchical modeling treats each client's effect as a draw from a shared population distribution. Two extremes bracket the design space: complete pooling (all clients share one effect) and complete separation (each client is modeled in isolation). Hierarchical modeling sits in between — partial pooling — and the data decides where on the spectrum each client lands.
For NoptiK, the simplest version of the model looks like this:
θᵢ ∼ Normal(μ, τ²)
μ ∼ Normal(0, 1)
τ ∼ HalfNormal(1)
Three properties matter. First, partial pooling shares strength across clients where data is sparse: a brand-new client benefits immediately from the agency's portfolio history. Second, shrinkage toward the population mean fades where data is rich, so established clients with their own pattern keep estimates close to their own data. Third, heteroscedastic likelihoods mean noisy measurements don't pretend to be clean ones, which matters in marketing where data quality varies wildly across platforms and verticals.
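To make the pooling behaviour concrete, here is a minimal sketch of the closed-form shrinkage for a single client. It assumes a Normal likelihood with a known per-client noise level and treats μ and τ as fixed, which the full model does not do. The function name partial_pool and all numbers are illustrative, not part of NoptiK.

```python
def partial_pool(ybar, n, sigma, mu, tau):
    """Conditional posterior of one client's effect theta_i.

    Assumes y_ij ~ Normal(theta_i, sigma^2) and theta_i ~ Normal(mu, tau^2),
    with mu, tau and sigma treated as known (an illustrative simplification).
    Returns (posterior mean, posterior sd, weight on the client's own data).
    """
    data_precision = n / sigma**2      # information in the client's own data
    prior_precision = 1.0 / tau**2     # information from the population
    weight = data_precision / (data_precision + prior_precision)
    mean = weight * ybar + (1.0 - weight) * mu
    sd = (data_precision + prior_precision) ** -0.5
    return mean, sd, weight


mu, tau = 0.0, 1.0
# Same observed average (2.0) under three different data situations.
new_client = partial_pool(ybar=2.0, n=2, sigma=2.0, mu=mu, tau=tau)
established = partial_pool(ybar=2.0, n=200, sigma=2.0, mu=mu, tau=tau)
noisy = partial_pool(ybar=2.0, n=200, sigma=20.0, mu=mu, tau=tau)

for name, (mean, sd, weight) in [
    ("new", new_client), ("established", established), ("noisy", noisy)
]:
    print(f"{name:12s} mean={mean:.3f} sd={sd:.3f} weight_on_own_data={weight:.3f}")
```

The new client is pulled strongly toward the population mean, the established client barely moves, and the noisy client with 200 observations is shrunk as much as the clean client with 2, because both carry the same amount of information.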
Inference is built on NumPyro — NUTS sampling for production runs, Pathfinder VI for warm starts and rapid iteration. Both are deterministic given fixed random seeds.
What is conformal prediction calibration?
Bayesian credible intervals reflect the model's beliefs about uncertainty. They are calibrated only if the model is correctly specified. In practice, marketing models are misspecified — the world is non-stationary, the data-generating process drifts, and the model's posterior may be over- or under-confident.
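Conformal prediction is a distribution-free framework that wraps any underlying model and produces prediction intervals with guaranteed marginal coverage. For a target miscoverage level α and exchangeable data, the interval C(x) satisfies:
P(y_new ∈ C(x_new)) ≥ 1 − α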
NoptiK's calibration layer is built on split conformal prediction with a hybrid weighting scheme that combines mild time decay (recent observations weighted slightly higher) and moment-based distributional similarity (observations from regimes similar to the current one weighted higher). Geometric time decay alone is insufficient for marketing data — old observations from a similar past regime can be more relevant than recent observations from a different regime. The hybrid scheme handles both.
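The weighting idea can be sketched with a weighted quantile over calibration residuals. This is a generic weighted split-conformal sketch, not NoptiK's scheme: the weight formula in hybrid_weights (geometric decay times a Gaussian kernel on the gap between regime means), the decay rate, and the bandwidth are all assumptions chosen for illustration.

```python
import math


def hybrid_weights(ages, mean_gaps, decay=0.99, bandwidth=1.0):
    """Hypothetical hybrid weight per calibration point.

    ages:      time steps since each observation (0 = most recent)
    mean_gaps: gap between each observation's regime mean and the current one
    """
    return [
        (decay ** age) * math.exp(-0.5 * (gap / bandwidth) ** 2)
        for age, gap in zip(ages, mean_gaps)
    ]


def weighted_conformal_radius(scores, weights, alpha=0.1):
    """Weighted (1 - alpha) quantile of calibration scores.

    The test point is given weight 1 and score +infinity, so the radius is
    infinite when the calibration weights cannot reach the requested level.
    """
    threshold = (1.0 - alpha) * (sum(weights) + 1.0)
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= threshold:
            return score
    return math.inf


# Scores are absolute residuals |y - prediction| on a held-out calibration set.
scores = [1.0, 2.0, 3.0, 4.0]
uniform = weighted_conformal_radius(scores, [1.0] * 4, alpha=0.3)
strict = weighted_conformal_radius(scores, [1.0] * 4, alpha=0.1)

# An old observation from a similar regime versus a recent one from a different regime.
weights = hybrid_weights(ages=[30, 20, 10, 0], mean_gaps=[0.0, 2.0, 2.0, 0.1])
radius = weighted_conformal_radius(scores, weights, alpha=0.3)
```

The interval for a new point is the model's prediction plus or minus the returned radius. In the example weights, the 30-step-old observation from a matching regime outweighs the 10-step-old observation from a different one, which is the behaviour pure time decay cannot produce.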
The coverage guarantee is what enables the calibration plot. Coverage will be measured on held-out data — whether the 90% interval actually contains the truth 90% of the time. If it doesn't, that's a bug, not a footnote.
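A held-out coverage check of this kind can be sketched in a few lines. The function names and the toy intervals are illustrative assumptions; the z-score uses a plain binomial standard error, which is one reasonable way to judge whether a gap from nominal coverage is larger than sampling noise.

```python
import math


def empirical_coverage(intervals, truths):
    """Fraction of held-out truths that fall inside their intervals."""
    hits = sum(1 for (low, high), y in zip(intervals, truths) if low <= y <= high)
    return hits / len(truths)


def coverage_z_score(coverage, nominal, n):
    """Gap from nominal coverage, in binomial standard errors."""
    standard_error = math.sqrt(nominal * (1.0 - nominal) / n)
    return (coverage - nominal) / standard_error


# Toy held-out set: four intervals and the values that were later observed.
intervals = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0), (3.0, 5.0)]
truths = [1.0, 3.5, 2.5, 4.0]

coverage = empirical_coverage(intervals, truths)
z = coverage_z_score(coverage, nominal=0.9, n=len(truths))
print(f"coverage={coverage:.2f}, z={z:.2f}")
```

With only four points the gap is within about one standard error, so it would not count as evidence of miscalibration; a real check needs a much larger held-out set.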
How does NoptiK detect regime shifts?
Marketing data is non-stationary. Platform algorithms change. Audiences saturate. Macro conditions shift. Creatives fatigue. Forecasting through these regime changes with a stale prior gives confident wrong answers — the worst possible failure mode.
Bayesian online change-point detection (BOCPD) maintains a probability distribution over the time since the last regime change. As evidence accumulates that a change has occurred — observations that are improbable under the current regime — the posterior probability of "we just had a change-point" rises. When it crosses a threshold, the pipeline re-anchors the prior and widens the interval.
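The run-length recursion can be sketched as follows, assuming a Gaussian observation model with known variance, a conjugate Normal prior on the regime mean, and a constant hazard. All parameter values, the window, and the function names are illustrative assumptions. One detail worth knowing: with a constant hazard, the posterior probability of run length exactly zero always equals the hazard, so this sketch tracks the mass on short run lengths instead.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_short_run_mass(xs, hazard=0.02, prior_mean=0.0, prior_var=1.0,
                         obs_var=1.0, window=3):
    """For each observation, return P(run length <= window | data so far).

    Plain probabilities are used for readability; a production version
    would work in log space to avoid underflow.
    """
    run_probs = [1.0]                      # P(run length = r), starting at r = 0
    means, variances = [prior_mean], [prior_var]
    out = []
    for x in xs:
        # Predictive density of x under each run-length hypothesis.
        pred = [normal_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]
        joint = [p * q for p, q in zip(run_probs, pred)]
        change = hazard * sum(joint)                    # run length resets to 0
        growth = [(1.0 - hazard) * j for j in joint]    # run length grows by 1
        run_probs = [change] + growth
        norm = sum(run_probs)
        run_probs = [p / norm for p in run_probs]
        # Conjugate update of the regime mean for every surviving hypothesis.
        new_means, new_vars = [prior_mean], [prior_var]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_means.append(post_var * (m / v + x / obs_var))
            new_vars.append(post_var)
        means, variances = new_means, new_vars
        out.append(sum(run_probs[: window + 1]))
    return out


# 30 points around 0, then 10 points around 5 (a level shift at index 30).
xs = [0.1 * ((i % 5) - 2) for i in range(30)]
xs += [5.0 + 0.1 * ((i % 5) - 2) for i in range(10)]
mass = bocpd_short_run_mass(xs)
```

Before the shift the short-run mass stays low; in the first few steps after it, most of the posterior moves onto run lengths that start at the shift. A pipeline would compare this quantity to a threshold to decide when to re-anchor the prior.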
The hazard parameter — the prior probability of a regime change at any time step — will be calibrated against labeled out-of-sample events: a manually curated registry of real regime changes (verified platform changes, documented macro events) versus data artifacts and transient promotions. Building that registry is part of v1 work. The calibration itself is designed to be audit-traceable end-to-end.
Calibrated honestly, this means the forecast widens when the world has changed, not just when the data is noisy. Distinguishing these two cases is non-trivial and is one of the harder pieces of the system to get right.
How does the hive-mind work without leaking client data?
Three architectural commitments enforce tenant isolation:
Database-layer isolation. The architecture provisions each agency tenant with its own Postgres schema. Row-level security policies are enforced at the database layer, not at the application layer. There is no application code path that can query across tenants.
Aggregation-layer privacy. The hive-mind doesn't pool raw observations. It pools aggregated posterior parameters (population-level effects, hierarchical hyperpriors) that pass through a differentially-private aggregation layer with bounded epsilon. The DP guarantee is mathematical: the probability of any released output changes by at most a factor of e^epsilon whether or not any single record is included, which bounds what the shared posterior parameters can reveal about an individual record.
Audit trail. Every cross-tenant flow is logged. Every aggregation step records the epsilon budget consumed. Every model update is reproducible from the audit trail.
The DP aggregation layer and the full audit trail are part of active development. The tenant provisioning, schema isolation, and row-level security are the foundation we are building on. If you need a deeper read on the privacy architecture — specific epsilon values, the noise mechanisms used, current implementation state — we provide it under NDA during the design partner onboarding.
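As a generic illustration of bounded-epsilon aggregation with budget logging, here is a textbook Laplace mechanism applied to a clipped mean, plus a simple ledger using basic sequential composition. This is not NoptiK's mechanism: the clipping bound, the epsilon values, the choice of one tenant summary as the unit of privacy, and the names dp_mean and EpsilonLedger are all assumptions for the sketch.

```python
import random


def dp_mean(values, clip, epsilon, rng):
    """Laplace-mechanism mean of values clipped to [-clip, clip].

    Replacing one value moves the clipped mean by at most 2 * clip / n,
    so Laplace noise with scale sensitivity / epsilon gives epsilon-DP.
    """
    clipped = [max(-clip, min(clip, v)) for v in values]
    sensitivity = 2.0 * clip / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clipped) / len(clipped) + noise


class EpsilonLedger:
    """Audit log of privacy spend under basic sequential composition."""

    def __init__(self, budget):
        self.budget = budget
        self.spent = 0.0
        self.entries = []

    def spend(self, epsilon, label):
        if self.spent + epsilon > self.budget:
            raise ValueError(f"epsilon budget exceeded by step {label!r}")
        self.spent += epsilon
        self.entries.append((label, epsilon))


rng = random.Random(0)                      # fixed seed so the run is replayable
ledger = EpsilonLedger(budget=1.0)
tenant_effects = [0.4, 0.7, -0.2, 1.9, 0.3]  # hypothetical per-tenant posterior means

ledger.spend(0.5, "population_mean")
shared_mean = dp_mean(tenant_effects, clip=1.0, epsilon=0.5, rng=rng)
```

Each released aggregate records the epsilon it consumed, and the ledger refuses any step that would exceed the total budget.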
What does byte-exact reproducible actually mean?
Given the same inputs, NoptiK produces bit-identical outputs. This is a strong claim and requires four things:
Deterministic inference. NumPyro with fixed seeds for both NUTS and Pathfinder VI. No nondeterministic GPU operations in the prediction loop.
Frozen feature extraction. Language and vision models for creative feature extraction run with frozen DSPy prompts (optimized via MIPROv2 and GEPA, then committed). No live prompt tuning at inference time.
Complete audit-trail logging. Inputs, priors, posterior samples, conformal calibration data, and feature extractions are all logged. A forecast can be replayed end-to-end from the audit trail alone.
CI tests for determinism. Every commit runs a determinism test that re-executes a fixed forecast and asserts byte-exact output match against a committed reference.
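A determinism test of this shape can be sketched with a seeded stand-in forecast and a hash over a canonical serialization. The toy_forecast function is a hypothetical placeholder for the real pipeline, and in a real CI setup the reference digest would be a committed constant rather than computed in the same run.

```python
import hashlib
import json
import random


def toy_forecast(seed, horizon=5):
    """Hypothetical stand-in for a forecast run: seeded, no external state."""
    rng = random.Random(seed)
    return {"seed": seed, "mean": [rng.gauss(0.0, 1.0) for _ in range(horizon)]}


def fingerprint(forecast):
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    payload = json.dumps(forecast, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_determinism(seed, reference_digest):
    """Re-run the forecast and compare its digest with the reference."""
    return fingerprint(toy_forecast(seed)) == reference_digest


# In CI the reference would be committed; here it is produced by a first run.
reference = fingerprint(toy_forecast(seed=7))
assert check_determinism(seed=7, reference_digest=reference)
```

Hashing a canonical encoding rather than comparing objects in memory means the check also catches changes in serialization, key ordering, or float formatting.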
Honest disclosure: we are pre-launch. The architecture supports byte-exact reproducibility; the CI enforcement is being built. We will not claim as shipped what isn't shipped. Ask us about current state during onboarding.
References
Jin, Y., Wang, Y., Sun, Y., Chan, D., Koehler, J. (2017). Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects. Google Inc.
Ng, E., Wang, Z., Dai, A. (2021). Bayesian Time Varying Coefficient Model with Applications to Marketing Mix Modeling. arXiv:2106.03322.
Vovk, V., Gammerman, A., Shafer, G. (2005). Algorithmic Learning in a Random World. Springer.
Romano, Y., Patterson, E., Candès, E. (2019). Conformalized Quantile Regression. arXiv:1905.03222.
Adams, R. P., MacKay, D. J. C. (2007). Bayesian Online Changepoint Detection. arXiv:0710.3742.
Piironen, J., Vehtari, A. (2017). Sparsity Information and Regularization in the Horseshoe and Other Shrinkage Priors. arXiv:1707.01694.
Zhang, L., Carpenter, B., Gelman, A., Vehtari, A. (2022). Pathfinder: Parallel Quasi-Newton Variational Inference. arXiv:2108.03782.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., Rubin, D. B. (2013). Bayesian Data Analysis, 3rd ed. CRC Press.
Dwork, C., Roth, A. (2014). The Algorithmic Foundations of Differential Privacy. Foundations and Trends in TCS.