Restaurant Demand Forecasting: From Spreadsheets to Production
Problem
A restaurant chain was planning prep and ordering using spreadsheets and gut feel. Waste was high and stockouts happened on busy days. We needed a system that: (1) predicted demand (by category or key items) for the next 1–3 days, (2) ran daily with minimal manual work, (3) integrated with existing POS and inventory data, and (4) produced outputs that managers could use (prep lists, suggested order quantities) without needing to understand the model.
Architecture
Historical data (sales, events, day of week, weather if available) is aggregated to daily or meal-period level. Features are built in a batch job; the model (gradient boosting) is trained periodically (e.g. weekly) and used for inference in a daily run. Predictions are written to a store that the ordering/prep UI and reports consume via a simple REST API. There is no real-time inference; batch is sufficient and keeps the system simple.
Flow: POS/Inventory → Aggregate → Features → Model (train weekly, infer daily) → Predictions → Prep/Order UI.
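The aggregation step in this flow can be sketched with Pandas. Column names here are illustrative, not the actual POS schema:

```python
import pandas as pd

# Hypothetical POS export: one row per line item.
pos = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-03-01 12:10", "2024-03-01 19:30", "2024-03-02 13:05"]),
    "location_id": [1, 1, 1],
    "category": ["burgers", "burgers", "salads"],
    "qty": [2, 3, 1],
})

# Aggregate to daily grain, per location and category: the level
# the features and model operate on.
daily = (
    pos.assign(date=pos["timestamp"].dt.date)
       .groupby(["location_id", "category", "date"], as_index=False)["qty"]
       .sum()
       .rename(columns={"qty": "units_sold"})
)
```

Meal-period grain is the same idea with a time-of-day bucket added to the group keys.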
Tech stack
- Data: POS and inventory exports (CSV/API) pulled into a staging DB or files; Python + Pandas for aggregation and feature engineering. Features include day of week, lagged sales, rolling averages, and (where available) events or weather.
- Model: LightGBM for regression (demand per category or item). Trained weekly on a sliding window (e.g. last 12 months); inference runs daily for the next 1–3 days.
- Scheduling: Cron or a lightweight scheduler for daily batch (aggregate → features → predict → write). Training on a weekly schedule.
- Serving: Predictions stored in DB or blob store; REST API for the prep/order UI to fetch by location and date. No real-time serving required.
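The lag and rolling-average features above are a few lines of Pandas; this sketch uses illustrative column names and a single location:

```python
import pandas as pd

# Daily demand for one category at one location (toy numbers).
daily = pd.DataFrame({
    "date": pd.date_range("2024-03-01", periods=10, freq="D"),
    "units_sold": [120, 135, 150, 90, 95, 110, 160, 125, 130, 145],
})

daily["day_of_week"] = daily["date"].dt.dayofweek
# Lagged sales: demand 1 day and 7 days back.
daily["lag_1"] = daily["units_sold"].shift(1)
daily["lag_7"] = daily["units_sold"].shift(7)
# Rolling average over the previous 7 days; the shift(1) keeps the
# day being predicted out of its own feature window (no leakage).
daily["roll_7"] = daily["units_sold"].shift(1).rolling(7).mean()
```

In practice these transforms run per (location, category) group, e.g. via `groupby(...).shift(...)`.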
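The weekly-train / daily-infer loop looks roughly like this. Production used LightGBM; scikit-learn's HistGradientBoostingRegressor stands in here so the sketch runs without extra dependencies, and the data and feature values are synthetic:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
n = 365  # roughly the 12-month sliding window, at daily grain

# Synthetic training frame with the feature set from the text.
df = pd.DataFrame({
    "day_of_week": rng.integers(0, 7, n),
    "lag_1": rng.uniform(80, 160, n),
    "roll_7": rng.uniform(80, 160, n),
})
# Toy target: weekend bump plus persistence plus noise.
df["units_sold"] = (
    100 + 10 * (df["day_of_week"] >= 5) + 0.3 * df["lag_1"]
    + rng.normal(0, 5, n)
)

features = ["day_of_week", "lag_1", "roll_7"]
model = HistGradientBoostingRegressor(random_state=0)
model.fit(df[features], df["units_sold"])  # the weekly retrain

# Daily inference: feature rows for the next 1-3 days
# (values illustrative).
future = pd.DataFrame({
    "day_of_week": [4, 5, 6],
    "lag_1": [120.0, 125.0, 140.0],
    "roll_7": [118.0, 121.0, 124.0],
})
preds = model.predict(future)
```

With LightGBM the shape is the same: `lgb.LGBMRegressor().fit(X, y)` weekly, `predict` daily, one model (or one per category) depending on volume.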
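The scheduling bullet amounts to two crontab entries; paths and script names here are hypothetical:

```shell
# Daily batch at 03:00: aggregate -> features -> predict -> write.
0 3 * * * /opt/forecast/venv/bin/python /opt/forecast/daily_batch.py
# Weekly retrain, Mondays at 02:00, before that day's batch run.
0 2 * * 1 /opt/forecast/venv/bin/python /opt/forecast/train_weekly.py
```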
Tradeoffs
- Granularity vs. noise: Item-level prediction was noisier; category-level was more stable. We shipped category-level first and added item-level for a subset of high-volume items once we had enough history.
- Freshness vs. cost: Daily batch was enough for prep and ordering; real-time would have added complexity without clear benefit. We kept the pipeline simple.
- Override and trust: Managers could override predictions. We made overrides visible and fed them back (as “actuals”) for future training so the model could learn from corrections.
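The override mechanism in the last bullet can be as simple as preferring the manager's number where one exists. A minimal sketch, with hypothetical column names:

```python
import pandas as pd

predictions = pd.DataFrame({
    "location_id": [1, 1, 2],
    "date": ["2024-03-01", "2024-03-02", "2024-03-01"],
    "category": ["burgers", "burgers", "salads"],
    "predicted": [130.0, 125.0, 60.0],
})
overrides = pd.DataFrame({
    "location_id": [1],
    "date": ["2024-03-02"],
    "category": ["burgers"],
    "override": [150.0],
})

merged = predictions.merge(
    overrides, on=["location_id", "date", "category"], how="left")
# The number managers acted on: the override where present, else the
# model's prediction. Logged so overrides stay visible and can feed
# future training.
merged["used"] = merged["override"].fillna(merged["predicted"])
```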
Metrics
- Accuracy: MAPE (mean absolute percentage error) < 18% on holdout; varied by category and location.
- Runtime: Daily batch (aggregate + features + predict) under 5 minutes; weekly training under 30 minutes.
- Adoption: Rolled out to 40+ locations; prep and order suggestions used daily. Qualitative feedback: less waste, fewer stockouts on peak days.
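The MAPE figure above is the standard definition; a minimal sketch, noting the caveat that zero-demand days need masking or a different metric:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Assumes no zero values in `actual`."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

# Two days, each off by 25% -> MAPE of 25.0.
print(mape([100, 100], [75, 125]))  # -> 25.0
```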
Lessons
- Start with category-level. Item-level forecasting is harder and noisier; category-level gave most of the value with less complexity.
- Data quality is the bottleneck. Aligning POS and inventory data, handling missing days, and defining “demand” (sold? requested?) took more time than the model. Invest in the pipeline first.
- Human override is a feature. Let managers adjust; surface overrides and use them as signal. The system is decision support, not autopilot.
- Simple models first. LightGBM with a small feature set beat our early attempts with fancier models. Ship simple, iterate with data.