Reliable demand forecasts are not born from clever math alone. They come from teams that ask precise questions, clean messy inputs, and accept that uncertainty never fully yields. Forecasting lives at the intersection of statistics, business rhythm, and operational constraint. The logic looks obvious on a whiteboard, then promptly falls apart when holiday calendars shift, a channel runs out of stock, or a promotion pulls volume forward and poisons next month’s baseline. The uncommon part of demand planning is not the model, it is the discipline to build a system that keeps learning.

I have yet to see a forecasting challenge that failed purely because the chosen algorithm was imperfect. More often, the trouble started earlier: sales data did not align to the right sell-in or sell-through definition, returns were netted at the wrong time, promotions were logged after the fact with coded descriptions, or price changes were stored in a spreadsheet tab hidden behind a password. When you fix those defects, even a plain model will perform decently. When you ignore them, even a sophisticated one will look silly.
The trap of averages and the myth of one number
Many teams aim for a single number per SKU per week. It feels decisive. But a single point hides risk. The retailer that orders to a point forecast gets burned in volatile weeks. The manufacturer that staffs to a point forecast spends overtime when the high side hits and eats idle time when the low side shows up. Good forecasts respect spread. The next logical step is to generate a distribution, not just a mean, and to tie inventory and staffing decisions to service levels. When a team moves from a single weekly number to a P70 and P90 band, planners stop debating whose number is “right” and start discussing outcomes.
Averages also mask shape. Consider a seasonal business where average weekly demand is roughly 1,000 units. Across 52 weeks, it peaks at 2,500 for six weeks, hovers around 1,200 for twenty weeks, then droops to 400 for the remainder. Planning around 1,000 units dilutes the very edge cases that hurt the most. Be explicit about the weeks that make or break the year.
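To make the band concrete, here is a minimal sketch of turning a distributional forecast into the percentiles planners discuss and an order quantity tied to a service level. The simulated demand samples and the 90 percent target are illustrative assumptions, not figures from any real category.

```python
import numpy as np

# Minimal sketch: turn simulated weekly demand samples for one SKU
# into the percentile band planners discuss instead of a single number.
# The samples below are illustrative placeholders.
rng = np.random.default_rng(7)
weekly_demand_samples = rng.gamma(shape=4.0, scale=250.0, size=10_000)

p50, p70, p90 = np.percentile(weekly_demand_samples, [50, 70, 90])
print(f"P50={p50:.0f}  P70={p70:.0f}  P90={p90:.0f}")

# Tie an ordering decision to a service level rather than to the mean:
# ordering to the P90 targets roughly a 90 percent in-stock probability
# for the week, assuming the samples describe true demand.
target_service_level = 0.90
order_quantity = np.quantile(weekly_demand_samples, target_service_level)
print(f"Order-up-to quantity at {target_service_level:.0%} service: {order_quantity:.0f}")
```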
What actually moves demand
Demand responds to a bundle of causes that often pull in opposite directions. Some are slow and structural, others are sharp and episodic. The trick is to isolate them enough to model and monitor them separately.
Price elasticity sits at the core. If you raise price by 5 percent, do you expect unit demand to fall by 1 percent, 3 percent, or 8 percent? The answer varies by category, channel, and customer segment. It is also asymmetric. Unit demand tends to react more strongly to price increases than to equivalent decreases, especially when substitutes are easy to find.
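One common way to put a number on elasticity is a log-log regression of units on net price: the slope approximates the percent change in demand per percent change in price. The sketch below uses synthetic data and a plain least-squares fit; it is an illustration of the idea, not a recommendation to stop there, and in practice you would fit by category, channel, and segment and test the asymmetry directly.

```python
import numpy as np

# Sketch: estimate price elasticity with a log-log regression.
# Synthetic data; in practice use net price and units at the grain you plan at.
rng = np.random.default_rng(0)
n = 200
log_price = np.log(rng.uniform(8.0, 12.0, size=n))
true_elasticity = -2.0
log_units = 10.0 + true_elasticity * log_price + rng.normal(0, 0.1, size=n)

# Slope of log(units) on log(price) ~ elasticity:
# a +5% price change implies roughly elasticity * 5% change in units.
X = np.column_stack([np.ones(n), log_price])
beta, *_ = np.linalg.lstsq(X, log_units, rcond=None)
elasticity = beta[1]
print(f"estimated elasticity: {elasticity:.2f}")
print(f"expected unit change for a +5% price move: {elasticity * 5:.1f}%")
```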
Promotions matter, but not just through uplift. They create pull-forward and post-promo dips. A calendar packed with discounts misleads naive models into seeing a higher baseline. If your system treats every spike as a permanent level shift, it will keep overshooting after the promotion and get “fixed” with manual overrides that hide the core issue.
Availability drives demand more than many admit. Stockouts depress reported sales whether or not customers buy the product elsewhere. A top seller that is unavailable for three days can halve the week’s sales, then show a compensating spike next week as inventory returns and backorders flow. Without a reliable indicator for stock status, your model learns the wrong lessons.
Competitor actions, delivery fees, and shipping thresholds sway behavior as well. A region that suddenly offers free next-day shipping will steal share from standard options even if your price stands still. Marketing reach, creative quality, and channel mix tilt conversion long before the shopper hits a cart.
Macro factors and weather show up, though less consistently than people think. Ice melt sells in cold snaps, but so do comfort foods and pipe insulation. Heat waves push air conditioner demand, then strain installation capacity, which caps sales independent of intent. Filtering out one-off meteorological events from real seasonal shape takes patience and several years of aligned data.
Put the calendar and granularity right before anything else
I once walked into a forecasting review and found three teams arguing about accuracy. Each used a different calendar. Finance closed on a 4-4-5 retail calendar, supply chain tracked ISO weeks, and marketing ran monthly. Their models were fine, but their dates did not line up. Reconciliation devoured hours. When they aligned to a common calendar and pushed all data into one grain, MAPE dropped without a single model change. Calendar debt is real and expensive.
Granularity cuts two ways. Finer grain allows you to pick up dynamics like weekday shape or intra-month push. Yet too fine, and noise swamps signal. A good rule is to model at the grain where your decisions happen, not where your data is available. If your production slots are weekly, model weekly and only use daily data as features if it consistently improves weekly forecasts. If your e-commerce promotions flex hour by hour, keep a separate short-term layer for intraday shape that flows into your weekly plan rather than trying to make one model do both jobs.
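Aggregating to the decision grain is usually one line of data work, and it forces the week-close rule into the open. A sketch with pandas, where the column names and the Monday week close are illustrative assumptions:

```python
import pandas as pd

# Sketch: push daily sales to the weekly decision grain before modeling.
# Column names and the Monday week close are illustrative assumptions.
daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=28, freq="D"),
    "units": range(28),
})
weekly = (
    daily.set_index("date")["units"]
    .resample("W-MON", label="left", closed="left")  # make the week close explicit
    .sum()
)
print(weekly)
```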
Hierarchy matters too. You sell items into stores, clusters, regions, and countries. You manage to categories, brands, and lines. Forecast accuracy tends to climb as you aggregate, while usefulness often lives in the detail. You want a system that honors both. Bottom-up, top-down, and middle-out approaches each have trade-offs. Bottom-up captures item-level quirks but can be fragile for slow movers. Top-down is stable but can hide mix shifts. Reconciliation methods like MinT or Bayesian hierarchical models blend information so that child and parent sums align while preserving as much signal as possible.
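Reconciliation is easier to reason about with the summing matrix in front of you. The sketch below shows the simplest member of the MinT family, the OLS projection (identity error covariance), on a toy hierarchy of two regions rolling into one total; the hierarchy and the base forecasts are assumptions for illustration.

```python
import numpy as np

# Sketch of OLS reconciliation (MinT with identity covariance) for a tiny
# hierarchy: total = region_A + region_B. Numbers are illustrative.
# S maps bottom-level series to every series in the hierarchy.
S = np.array([
    [1, 1],   # total
    [1, 0],   # region A
    [0, 1],   # region B
])

# Base forecasts produced independently at each level; note they do not add up.
y_hat = np.array([1000.0, 620.0, 430.0])  # total, A, B

# Reconciled forecasts: project y_hat onto the column space of S.
P = np.linalg.inv(S.T @ S) @ S.T
bottom_tilde = P @ y_hat          # reconciled bottom-level forecasts
y_tilde = S @ bottom_tilde        # coherent forecasts at every level
print(y_tilde)                    # now total == A + B by construction
```

Full MinT replaces the identity with an estimate of the forecast error covariance, which is what lets strong child series keep more of their signal.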
The dull work of clean features beats dazzling algorithms
You cannot model what you cannot see. Key features that should be painstakingly curated rather than casually inferred include:
- A definitive promotion calendar with type, depth, mechanic, and expected uplift. Treat it like a product, not a file. Without type codes, your model cannot learn that BOGO behaves differently from 20 percent off, or that a display endcap decays more slowly than a search ad.
- Net price the customer sees, not the list price, including fees, shipping thresholds, and discounts. Many teams model price as a single field and wonder why lift curves wobble.
- Stock availability flags and lost sales estimates. If you are blind to outages, you will systematically understate demand during constrained periods.
- Competitor price indices, even if approximate. A simple ratio of your price to a basket of alternatives does more than an absolute price line.
- Channel-level traffic and conversion, preferably at weekly cadence. For online sales, sessions and conversion rate explain more variance than you might think. For retail, footfall proxies and basket size help.
None of this sounds glamorous. It pays. I have watched a model’s WAPE improve by ten points after the team fixed net price and promotion coding, before any change to the algorithm.
Causal, time series, or both
Pure time series methods like exponential smoothing and ARIMA variants handle seasonality and trend with elegance. They work well when demand is steady and promotions are rare. Causal or machine learning models shine when exogenous variables drive a large share of the variance. The sweet spot for many businesses is a hybrid. Decompose demand into a baseline and event layers, fit a robust time series to the baseline, then superimpose causal effects for price, promotions, and media. Let the error terms teach you where the structure is incomplete.
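A minimal way to express that decomposition is to fit a seasonal smoothing model for the baseline, then regress its residuals on the event features. The sketch below leans on statsmodels' Holt-Winters exponential smoothing plus a plain least-squares promo layer; the synthetic series, promo flag, and uplift size are assumptions.

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Sketch of a hybrid decomposition: seasonal baseline + causal promo layer.
# Synthetic weekly data; swap in your own series and promotion flags.
rng = np.random.default_rng(1)
weeks = 156
season = 200 * np.sin(2 * np.pi * np.arange(weeks) / 52)
promo = (rng.uniform(size=weeks) < 0.1).astype(float)   # ~10% of weeks on promo
demand = 1000 + season + 300 * promo + rng.normal(0, 50, size=weeks)

# 1) Baseline: seasonal smoothing, fit without knowledge of the promo flag.
baseline_fit = ExponentialSmoothing(
    demand, trend="add", seasonal="add", seasonal_periods=52
).fit()
baseline = baseline_fit.fittedvalues

# 2) Event layer: explain what the baseline missed with the promo feature.
residual = demand - baseline
X = np.column_stack([np.ones(weeks), promo])
coef, *_ = np.linalg.lstsq(X, residual, rcond=None)
print(f"estimated promo uplift: {coef[1]:.0f} units per promoted week")

# Forecast = baseline projection + learned uplift on future promoted weeks.
```

Note that a baseline fit on promo-contaminated history will absorb part of the uplift; in practice you de-promote the history first or iterate between the two layers.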
Avoid the temptation to feed every available feature into a complex model and declare victory. High dimensional setups often chase noise, inflate variance, and rot when inputs drift. Parsimony forces clarity. Keep the backbone simple, then add features deliberately and monitor their incremental value over rolling windows.
The promotion problem and its quiet aftershocks
Promotions create artificial mountains and valleys. Two pitfalls show up repeatedly. First, models confuse the uplift with a true shift in baseline, especially if the promotion repeats in a similar week each year. Second, planners overestimate the halo and cannibalization effects. A deep discount on a 12-pack will spike that SKU, but will it pull volume from the 6-pack enough to reduce category volume? Or did you just accelerate purchases, leaving the next week thin?
A practical approach is to tag promotions as exogenous events, fit uplift coefficients by type and depth at the right level of aggregation, and force post-event decay terms. When a brand ran an every-third-week discount cadence, their baseline began to drift upward in naive models, then sagged without the promo. After they layered a learned decay that tapered 50 to 80 percent of uplift over a 2 to 3 week range, the baseline stabilized and planners stopped panic-overriding.
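One concrete way to force a post-event decay term is to turn the binary promo calendar into a regressor that carries a shrinking fraction of the effect into the following weeks. The geometric taper and the specific decay rate below are illustrative assumptions, chosen to echo the 2 to 3 week fade described above.

```python
import numpy as np

# Sketch: turn a binary promo calendar into a feature with post-event decay,
# so the model can learn a taper instead of treating the spike as a new baseline.
def promo_with_decay(promo_flags: np.ndarray, decay: float = 0.4, tail_weeks: int = 3) -> np.ndarray:
    """Return a regressor equal to 1.0 in promo weeks, then decay, decay**2, ...
    for tail_weeks after the promotion ends. decay=0.4 fades roughly 60 percent
    of the effect each week, in the spirit of a 2-3 week taper."""
    feature = promo_flags.astype(float).copy()
    for t in np.flatnonzero(promo_flags):
        for k in range(1, tail_weeks + 1):
            if t + k < len(feature) and promo_flags[t + k] == 0:
                feature[t + k] = max(feature[t + k], decay ** k)
    return feature

promo = np.array([0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0])
print(promo_with_decay(promo))
```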
Cross-elasticity is tricky but important. Introduce it only where products are clear substitutes and keep it sparse. For many categories, a small set of cross effects at the brand or size level captures most of the action without exploding complexity.
Intermittent and long-tail demand deserves its own toolkit
A catalog with thousands of SKUs will have a fat head and a long tail. Tail items are intermittent. You can go weeks with zeros, then ship a batch of 30 units to a single customer. Classical methods that assume normal error behave badly here. Tools like Croston’s method, SBA adjustment, or newer bootstrapped intermittent models perform better because they separate the size of a demand event from the time between events.
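Croston's idea is to smooth the size of demand events and the interval between them separately, then forecast their ratio; the Syntetos-Boylan (SBA) correction multiplies by (1 - alpha/2) to remove the method's upward bias. A compact sketch under those definitions, with the initialization and the toy series as assumptions:

```python
import numpy as np

def croston_sba(demand: np.ndarray, alpha: float = 0.1) -> float:
    """Croston's method with the Syntetos-Boylan (SBA) bias correction.
    Smooths nonzero demand sizes and the intervals between them separately,
    then returns the estimated demand rate per period."""
    nonzero_idx = np.flatnonzero(demand)
    if len(nonzero_idx) == 0:
        return 0.0
    # Initialize with the first demand event.
    size = float(demand[nonzero_idx[0]])
    interval = float(nonzero_idx[0] + 1)
    prev = nonzero_idx[0]
    for t in nonzero_idx[1:]:
        q = t - prev                                  # periods since last event
        size = size + alpha * (demand[t] - size)      # smooth event size
        interval = interval + alpha * (q - interval)  # smooth inter-event gap
        prev = t
    return (1 - alpha / 2) * size / interval          # SBA-adjusted rate

# Intermittent series: mostly zeros, occasional bursts.
history = np.array([0, 0, 12, 0, 0, 0, 8, 0, 0, 15, 0, 0, 0, 0, 10, 0])
print(f"expected demand per period: {croston_sba(history):.2f}")
```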
For spare parts, industrial supplies, or specialty SKUs, err toward probabilistic forecasts and inventory policies that target service levels explicitly. If item XYZ sells four times a year in bursts of 5 to 20 units, treat it like a Bernoulli arrival with a size distribution. Safety stock on a normal approximation will fool you in both directions.
Reconcile the hierarchy without breaking signals
Once you have reasonable item-level and aggregate forecasts, you need them to add up. Reconciliation ensures that the store forecasts roll to the region, regions to the country, and items to categories. Simple proportional scaling back to parent totals is blunt and often damages well-performing children. Statistical reconciliation methods distribute adjustments based on historical covariance, effectively preserving stronger signals and nudging weaker ones. The results feel more natural to planners, which reduces the urge to “fix” the numbers by hand.
Forecast distributions, not just means
Operations live in the tails. The difference between the 50th and 90th percentile of demand for a promoted week might be twice the gap in a normal week. Your planning system should generate and store forecast distributions or, at the very least, prediction intervals. Quantile regression, bootstrapping residuals, or Bayesian models can produce these bands. With percentiles in hand, safety stock becomes a business choice: pick a target service level by item class, then compute required buffers given lead time and forecast error. When a retailer switched to percentile-based planning, they cut stockouts on A items while reducing overall inventory by a mid-single digit percent because they stopped hoarding on C items with wide but inconsequential uncertainty.
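With stored forecast errors, the buffer for a target service level can be read straight off the empirical quantile of lead-time error instead of a normal-theory formula. A sketch, where the residuals, lead time, and point forecast are illustrative placeholders:

```python
import numpy as np

# Sketch: percentile-based safety stock from stored forecast residuals.
# residuals = actual - forecast, collected at the weekly grain over history.
rng = np.random.default_rng(3)
weekly_residuals = rng.normal(0, 120, size=200)       # illustrative placeholder

lead_time_weeks = 3
target_service = 0.95

# Bootstrap lead-time error by summing sampled weekly residuals over the
# lead time, then take the service-level quantile as the required buffer.
draws = rng.choice(weekly_residuals, size=(10_000, lead_time_weeks), replace=True)
lead_time_error = draws.sum(axis=1)
safety_stock = np.quantile(lead_time_error, target_service)

point_forecast_per_week = 1000                        # from the planning system
reorder_point = point_forecast_per_week * lead_time_weeks + safety_stock
print(f"safety stock: {safety_stock:.0f}, reorder point: {reorder_point:.0f}")
```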
Measure accuracy in ways that promote learning
MAPE is a blunt instrument. It over-penalizes misses on small denominators and can be gamed by sandbagging. WAPE and bias complement it. If you stock to a service level, the weighted absolute percentage error aligns better with cost, and bias tells you whether you are consistently over or under. Segment accuracy by item velocity and margin. It is often acceptable to have higher error on C items if your inventory policy reflects that tolerance. Time-align the measurement window with lead times. A one-week-ahead forecast has different value than a twelve-week-ahead forecast.
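For concreteness, the three measures differ only in how they weight and sign errors. A small sketch, assuming aligned arrays of actuals and forecasts; the toy numbers deliberately include a small-denominator week to show why MAPE misleads:

```python
import numpy as np

def mape(actual: np.ndarray, forecast: np.ndarray) -> float:
    # Over-penalizes misses on small actuals; shown here for comparison only.
    mask = actual != 0
    return float(np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])))

def wape(actual: np.ndarray, forecast: np.ndarray) -> float:
    # Weighted absolute percentage error: total absolute miss over total volume.
    return float(np.sum(np.abs(actual - forecast)) / np.sum(actual))

def bias(actual: np.ndarray, forecast: np.ndarray) -> float:
    # Positive means the forecast runs hot, negative means it runs cold.
    return float(np.sum(forecast - actual) / np.sum(actual))

actual = np.array([120.0, 3.0, 80.0, 200.0, 15.0])
forecast = np.array([100.0, 9.0, 90.0, 180.0, 10.0])
print(f"MAPE={mape(actual, forecast):.1%}  "
      f"WAPE={wape(actual, forecast):.1%}  "
      f"bias={bias(actual, forecast):+.1%}")
```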
A good practice is to host a monthly forecast review that asks three questions. Where did we miss most on absolute dollars? Where are we persistently biased? Where did the shape change from what we had learned, and why? The output should be defect fixes such as a mis-coded promotion type, not finger-pointing at a single bad week.
Human judgment as a feature, not a panic button
Judgment matters. A top account sends an email hinting at a reset. A competitor’s plant goes offline. A weather forecast leans toward a hurricane track. Models do not see these fast enough. Build a structured override mechanism where planners can apply annotated adjustments with expected duration and magnitude. Force a sunset date and require a reason code. Feed those overrides back into the feature store as candidate signals if they recur.
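A structured override is mostly a data contract: magnitude, scope, duration, reason, and a hard sunset. A minimal sketch of such a record, with the field names and reason codes as assumptions rather than a prescribed schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ForecastOverride:
    """An annotated planner adjustment that can be audited and learned from."""
    sku: str
    region: str
    start_week: date
    sunset_week: date          # required: overrides must expire
    uplift_pct: float          # e.g. +15.0 for a 15 percent lift
    reason_code: str           # e.g. "REGIONAL_DISPLAY", "COMPETITOR_OUTAGE"
    author: str
    realized_uplift_pct: float | None = None   # filled in after the fact

    def is_active(self, week: date) -> bool:
        return self.start_week <= week <= self.sunset_week

override = ForecastOverride(
    sku="SKU-1234", region="Southeast",
    start_week=date(2024, 6, 3), sunset_week=date(2024, 7, 29),
    uplift_pct=15.0, reason_code="REGIONAL_DISPLAY", author="planner_a",
)
print(override.is_active(date(2024, 6, 17)))
```

Comparing realized_uplift_pct against uplift_pct across recurring reason codes is what lets the override migrate from a nudge into a learned prior.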
Here is where (un)Common Logic earns its place. The common logic says accept the model where it performs and override the rest. The uncommon logic says design the override to teach the model, not silence it. Treat human insight as data. If the sales team calls for a 15 percent lift in the Southeast for eight weeks due to a regional display program, log it, track its realization, and let the system learn whether similar programs in the future warrant a prior of 10, 15, or 20 percent. Over time, human judgment migrates from ad hoc nudges to codified signals.
Scenario planning beats perfect precision
No model will predict the exact path of demand during a supply disruption or a viral social trend. You can still prepare to make better decisions. Build a small set of coherent scenarios that stress the assumptions that matter: lead times, substitution rates, promotion cadence, and channel mix.
A beverage brand once laid out three scenarios for a summer heat wave, each with a distribution by region and week, plus constraints from bottling and trucking. When the heat arrived, they were wrong on magnitude but right on ordering. They pulled forward packaging, flexed co-packers, and allocated by projected margin contribution rather than last year’s share. Their service level dipped for two weeks instead of six.
Signals can speed your movement between scenarios. Web search trends, add-to-cart rates, and competitor stock status provide early warnings. Calibrate their thresholds with backtests. Avoid the trap of treating every blip as actionable; you want a small number of robust triggers.
From forecast to action: inventory, capacity, and service
A forecast that lives in a slide deck does not change outcomes. The forecast should feed inventory targets, staffing schedules, production plans, and procurement. Tie each decision to a forecast horizon. A 26-week horizon informs long-lead packaging orders. A 12-week horizon sets production plans. A 2-week horizon governs labor and logistics. Each horizon has different accuracy, so each decision should use different percentiles or buffers.
Translate service levels into cost and margin. An extra point of service on a high-margin A item pays for more safety stock than the same point on a low-margin C item. Use a simple value segmentation like ABC by margin dollars and an uncertainty segmentation like XYZ by coefficient of variation, then set differentiated policies. You do not need a complicated matrix. You need a clear rule that says what you will protect and what you will risk.
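The two-way segmentation falls out of history directly: rank items by margin dollars for ABC and by coefficient of variation of weekly demand for XYZ. A sketch with pandas on synthetic data; the 80/95 percent and 0.5/1.0 cut points are common conventions, not fixed rules.

```python
import numpy as np
import pandas as pd

# Sketch: ABC by margin dollars, XYZ by demand variability (coefficient of variation).
# `weekly` is assumed to have one row per SKU-week: sku, units, margin_per_unit.
rng = np.random.default_rng(5)
weekly = pd.DataFrame({
    "sku": np.repeat([f"SKU-{i}" for i in range(20)], 52),
    "units": rng.poisson(lam=rng.uniform(1, 50, size=20).repeat(52)),
    "margin_per_unit": np.repeat(rng.uniform(2, 20, size=20), 52),
})
weekly["margin_dollars"] = weekly["units"] * weekly["margin_per_unit"]

per_sku = weekly.groupby("sku").agg(
    margin=("margin_dollars", "sum"),
    mean_units=("units", "mean"),
    std_units=("units", "std"),
)
per_sku["cv"] = per_sku["std_units"] / per_sku["mean_units"]

# ABC: top 80% of cumulative margin = A, next 15% = B, rest = C (common convention).
per_sku = per_sku.sort_values("margin", ascending=False)
cum_share = per_sku["margin"].cumsum() / per_sku["margin"].sum()
per_sku["abc"] = np.where(cum_share <= 0.80, "A", np.where(cum_share <= 0.95, "B", "C"))

# XYZ: stable (X), moderate (Y), erratic (Z) by coefficient of variation.
per_sku["xyz"] = pd.cut(per_sku["cv"], bins=[0, 0.5, 1.0, np.inf], labels=["X", "Y", "Z"])
print(per_sku[["abc", "xyz"]].head())
```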
Do not forget capacity. If your plant can swing only 15 percent week to week, a forecast that hops by 40 percent is of little operational use. Apply smoothing or freeze windows where appropriate. If the demand curve calls for more than you can produce, face that gap early and allocate with intention. Nothing frustrates teams more than scrambling in the last two weeks to rebalance orders they could have shaped with promotions or pricing months earlier.
A short field story about deny, detect, and decide
A consumer electronics company launched a variant with a new colorway. Demand surprised to the upside on launch week, then fell off a cliff. The initial model treated the spike as the new baseline. Warehouses filled. Weeks later, the team flipped to manual overrides, but they disagreed on how fast the novelty would fade. Then came returns, which posted with a lag and corrupted net sales for a month.
What worked was simple. They separated sell-in from sell-through, tracked returns in a different stream, and excluded the first two weeks from baseline estimation. They introduced a decay curve on launch events calibrated from prior colorways, which suggested 60 to 70 percent of the initial lift would evaporate in four weeks. They set inventory targets to the P60, not the mean, for eight weeks while they learned. The result was not perfect, but it avoided another two months of overproduction. The uncommon logic was not the model choice. It was the refusal to chase the first spike and the discipline to codify learning for the next launch.

Two compact tools you can apply this quarter
Checklist for data and feature hygiene before modeling:
- Align on a single business calendar and grain across teams, with explicit time zone and week close rules.
- Build a canonical promotion table with type, depth, mechanic, and start and stop times, and keep it versioned.
- Store net transaction price per SKU and channel, including discounts, fees, and shipping thresholds, not just list price.
- Record stock availability and estimated lost sales where possible, with clear flags for constrained periods.
- Capture competitor price indices and traffic or conversion proxies at the same grain as the forecast.
A stepwise path to a more robust forecasting system in six sprints:
- Sprint 1: Clean and align the calendar, net price, and promotion tables, then re-baseline a simple seasonal model to establish a fresh benchmark.
- Sprint 2: Layer structured promotion effects with post-event decay, and implement a lightweight override mechanism with reason codes and sunset dates.
- Sprint 3: Introduce probabilistic outputs, at least P50, P70, and P90, and connect these percentiles to inventory targets by item class.
- Sprint 4: Reconcile forecasts across the product and geography hierarchy using a statistical method, and publish both child and parent views.
- Sprint 5: Segment items into ABC by margin dollars and XYZ by variability, then assign differentiated service levels and safety stock rules.
- Sprint 6: Stand up a monthly forecast review ritual focused on error by dollars, bias, and shape changes, and feed recurring overrides back into the feature store.
Judgment, humility, and the habit of postmortem
Forecasting rewards teams that treat misses as data. After a quarter, pick three large deviations and dissect them. Was the cause an input defect, a modeling miss, a process gap, or a late decision elsewhere in the chain? Write down what you will change. Feed the change into the system. The hardest part is resisting the urge to personalize the miss. The second hardest is avoiding silver bullets that promise universal fixes.
I prefer the frame of deny, detect, and decide. Deny bad inputs from entering the system by hardening the data contracts. Detect shifts with monitors on baseline, uplift, and residual variance. Decide with explicit policies tied to service levels, capacity, and margin. A forecast is not a prophecy. It is a disciplined starting point for action.
That is the spirit behind forecasting with (un)Common Logic. Common logic says get a better algorithm. Uncommon logic says start by making hidden assumptions explicit, then make uncertainty visible, then make decisions that respect both. When you do, the numbers begin to tell the truth, and the business gets quieter in the best possible way.