Ai26.10 — Grape-Pomace Fermentation: Science & Algorithm Documentation

This is the science and modeling foundation under Project Ai26.10. The Fermentation Data workbook is Crush Dynamics' historical record of sixteen production batches (“totes”) of its patented grape-pomace biotransformation — a controlled, non-sterile fermentation that upgrades winery by-product into polyphenol- and fibre-rich functional food ingredients. Each tab logs four operator-controlled input_ conditions against four measured output_ variables over the fermentation cycle. The headline output, titratable acidity (TA), is the process KPI: the run is complete at TA ≥ 3% (HACCP-validated, pH ≤ 4.4). This document characterises that baseline and builds the predictive soft-sensor / digital-twin models the project is built on.

01 The Process in One Picture

CDI's process is an aerobic acidogenic fermentation of grape pomace. A microbial community consumes residual sugars and ethanol from the pomace, drawing down dissolved oxygen as it does so, and progressively acidifies the broth: titratable acidity climbs while pH falls. TA is both the product-quality endpoint and the clock: the cycle ends when TA reaches target. Because the broth is open and adjusted during the run (volume and substrate vary within and between batches), it behaves as an open, fed system rather than a sealed batch, a fact that shapes every model below.

Figure: process overview. Operator-controlled inputs act on the microbial community in the grape-pomace fermenter: residual ethanol (0–4.8%, the substrate), temperature (15.6–37.8 °C), aeration (OFF·LOW·MED) and volume (650–975 L). Aeration acts through dissolved O₂ (OUR, $k_La$). The community converts sugars + ethanol + O₂ into biomass and organic acids. These drive the measured outputs: titratable acidity in g/L (the KPI), Brix and pH, with fermentation day as the cycle clock.

16 production batches · 308 time-point rows · 0.57 g/L·day mean TA rate

02 Fermentation Biochemistry & the TA KPI

The grape-pomace fermentation is a microbial acidification: a community of acid-forming organisms metabolises the carbon available in the pomace — residual grape sugars plus residual winemaking ethanol — and excretes organic acids. Those acids are what the process is measured by, and what protects the non-sterile product (HACCP envelope: pH ≤ 4.4, TA ≥ 3%). The exact microbial consortium and acid profile are CDI background IP; for modeling, the relevant quantity is the cumulative titratable acidity.

What titratable acidity actually measures

TA is a titration result — the total neutralisable acid in the broth, reported as g/L (acid equivalents). It aggregates every organic acid present rather than naming one, which is why it is a robust, instrument-light progress KPI:

$$\text{TA} \;=\; \frac{C_{\text{base}}\,V_{\text{base}}}{V_{\text{sample}}}\times E_{\text{acid}}\qquad\left[\tfrac{\text{g}}{\text{L}}\right]$$

Here $C_{\text{base}}V_{\text{base}}$ is the amount (mol) of standard base needed to reach the titration endpoint, $V_{\text{sample}}$ is the sample volume, and $E_{\text{acid}}$ is the acid equivalent weight (g per mole of titratable H⁺). The soft sensor's job (Project Phase 3) is to estimate TA continuously from fast in-line signals (pH, DO, temperature, airflow, agitation), so operators no longer wait on this offline lab titration.
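The titration arithmetic can be sketched as follows. The sketch assumes a monoprotic titrant such as NaOH, so moles of base equal acid equivalents. It reports TA as tartaric-acid equivalents (75.04 g/eq). Both the acid basis and the example volumes are illustrative assumptions, not CDI's lab protocol.

```python
def titratable_acidity_g_per_l(c_base_mol_per_l: float,
                               v_base_ml: float,
                               v_sample_ml: float,
                               eq_weight_g_per_eq: float = 75.04) -> float:
    """TA = C_base * V_base / V_sample * E_acid, in g/L.

    Assumes a monoprotic base (e.g. NaOH), so mol of base = eq of acid.
    The default equivalent weight is tartaric acid (150.09 g/mol / 2 H+);
    the acid basis is an assumption, not stated in the source document.
    """
    if v_sample_ml <= 0:
        raise ValueError("sample volume must be positive")
    # mL / mL cancels, leaving eq/L of titratable H+ in the sample
    acid_eq_per_l = c_base_mol_per_l * v_base_ml / v_sample_ml
    return acid_eq_per_l * eq_weight_g_per_eq


# 0.1 M NaOH, 20 mL to the endpoint, 5 mL broth sample (hypothetical values)
ta = titratable_acidity_g_per_l(0.1, 20.0, 5.0)
print(f"TA = {ta:.1f} g/L")  # prints: TA = 30.0 g/L
```

With these hypothetical volumes the sample sits right at the ~30 g/L completion target.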

Carbon & the substrate balance

Carbon for acid and biomass comes from two pools — pomace sugars (tracked indirectly by Brix) and residual ethanol. Generically:

$$\underset{\text{sugars / ethanol}}{\text{substrate}} \;+\; \mathrm{O_2} \;\xrightarrow{\text{microbial community}}\; \underset{\text{product}}{\text{biomass}} \;+\; \underset{\text{TA}}{\text{organic acids}} \;+\; \mathrm{CO_2} + \mathrm{H_2O}$$

Why acidity gained ≫ ethanol consumed — and why that's expected

Across batches the TA gained per unit ethanol lost ranges from ~4 to >25 g/L per %. Far from an anomaly, this is the tell that ethanol is only a minor carbon source — most acid is built from pomace sugars, and the broth is topped up and adjusted during the run (volume swings 650→975 L). So TA accumulates against an open, fed system; it is not tied to a simple closed mass balance on ethanol. This is the single most important modeling insight in the dataset, and it directly motivates the high-frequency, instrumented pilot trials (Milestone 3): the legacy data records concentrations, not the feed events that drive them.
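A stoichiometric yardstick helps interpret these ratios. CDI's acid profile is background IP, so this sketch assumes, for illustration only, that the acid is acetic acid formed 1:1 from ethanol (C₂H₅OH + O₂ → CH₃COOH + H₂O). Under that assumption, 1 % v/v ethanol can yield at most about 10.3 g/L of acid. A tote whose ratio sits well above that level gained more acid than its net ethanol drawdown could supply. That points to other carbon sources (pomace sugars) or to unlogged ethanol feeds. The function names and the example tote are hypothetical.

```python
from typing import Optional

# Ethanol density near 20 °C and molar masses (g/mol).
ETHANOL_DENSITY_G_PER_ML = 0.789
ETHANOL_MOLAR_MASS = 46.07
ACETIC_MOLAR_MASS = 60.05


def ta_gain_per_pct_ethanol(ta_start: float, ta_end: float,
                            etoh_start: float, etoh_end: float) -> Optional[float]:
    """Observed TA gained (g/L) per percentage point (% v/v) of ethanol lost."""
    etoh_lost = etoh_start - etoh_end
    if etoh_lost <= 0:
        return None  # net ethanol did not fall (re-feeding); ratio undefined
    return (ta_end - ta_start) / etoh_lost


def acetic_ceiling_per_pct_ethanol() -> float:
    """g/L acetic acid from 1 % v/v ethanol, assuming complete 1:1 conversion."""
    ethanol_g_per_l = 10.0 * ETHANOL_DENSITY_G_PER_ML  # 1 % v/v = 10 mL/L
    return ethanol_g_per_l / ETHANOL_MOLAR_MASS * ACETIC_MOLAR_MASS


ceiling = acetic_ceiling_per_pct_ethanol()
observed = ta_gain_per_pct_ethanol(2.0, 30.0, 4.0, 2.5)  # hypothetical tote
print(f"ceiling {ceiling:.1f} g/L per %, observed {observed:.1f} g/L per %")
```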

Oxygen transfer is the binding constraint — $k_La$ and OUR

The fermentation is aerobic, so its rate is frequently limited by how fast oxygen can be delivered, not by the microbes' appetite. Two quantities govern this, and the proposal's digital twin estimates both:

$$\underbrace{\text{OTR}=k_La\,(C^*-C_L)}_{\text{oxygen supplied by aeration}}\qquad\qquad \underbrace{\text{OUR}=q_{O_2}\,X}_{\text{oxygen demanded by the culture}}$$

Aeration level (OFF / LOW⁻ / LOW / MED, or LPM in the bioreactor/drum vessels) sets $k_La$ and therefore the oxygen-transfer ceiling on the whole reaction. MED aeration produces the highest mean TA in the data (§5.4), which is consistent with this ceiling, with the caveat that MED is engaged late in runs. This oxygen-transfer limit is also why the MPC layer is designed to optimise aeration first.
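The oxygen balance can be given illustrative numbers. The saturation value, specific uptake rate, biomass and per-class $k_La$ values below are all hypothetical. They were chosen only to show two cases. A low $k_La$ caps transfer below the culture's demand and drives dissolved O₂ to zero. A higher $k_La$ meets the demand with O₂ to spare.

```python
def otr(kla_per_h: float, c_sat_mg_l: float, c_l_mg_l: float) -> float:
    """Oxygen transfer rate, mg O2/(L·h): kLa * (C* - C_L)."""
    return kla_per_h * (c_sat_mg_l - c_l_mg_l)


def our(q_o2_mg_per_g_h: float, biomass_g_l: float) -> float:
    """Oxygen uptake rate demanded by the culture, mg O2/(L·h): q_O2 * X."""
    return q_o2_mg_per_g_h * biomass_g_l


def steady_state_do(kla_per_h: float, c_sat_mg_l: float, demand_mg_l_h: float) -> float:
    """Dissolved O2 where OTR = OUR; 0 when demand exceeds the transfer ceiling."""
    return max(c_sat_mg_l - demand_mg_l_h / kla_per_h, 0.0)


C_SAT = 7.5  # mg/L, illustrative saturation near 30 °C
demand = our(q_o2_mg_per_g_h=20.0, biomass_g_l=2.0)  # hypothetical culture

for label, kla in [("LOW", 4.0), ("MED", 12.0)]:      # hypothetical kLa per class, 1/h
    ceiling = otr(kla, C_SAT, 0.0)                     # maximum transfer, reached at C_L = 0
    print(label, "ceiling", ceiling, "demand", demand,
          "steady-state DO", steady_state_do(kla, C_SAT, demand))
```

At steady state OTR = OUR, so $C_L = C^* - \text{OUR}/k_La$. When that expression comes out negative, the culture is oxygen-limited and its actual uptake is capped at the transfer ceiling.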

03 The Dataset: Schema & Variables

The workbook is organised as one tab per vessel. Tab names are tote identifiers (e.g. 23_982). Two auxiliary tabs — bio_reactor_0_045 and drum_0_045 — track the same tote through two vessel types whose aeration is logged in litres-per-minute (LPM) rather than categorical levels.

Variable dictionary (from the LEGEND tab)

Field	Role	Meaning	Units / domain
`input_etoh_percent`	input	Ethanol content of tote	% v/v · 0–4.8
`input_temperature_c`	input	System temperature	°C · 15.6–37.8
`input_aeration_level`	input	System aeration (categorical)	OFF·LOW⁻·LOW·MED
`input_reactor_aeration_level`	input	Bioreactor aeration	LPM
`input_drum_aeration_level`	input	Drum aeration	LPM
`input_volume_l`	input	Tote working volume	L · 650–975
`output_time_day`	output	Fermentation day	day · 0–70
`output_brix`	output	Dissolved solids (refractometric)	°Bx · 1.8–8.8
`output_total_acidity_g_per_l`	output	Titratable acidity (TA) — process KPI	g/L · 0.15–40.8
`output_ph`	output	Broth pH	– · 3.42–4.81
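The variable dictionary can double as a plausibility check on incoming rows. This sketch assumes the observed domains above are reasonable flags for new data. They are observed ranges, not process limits, so a flagged value calls for review rather than rejection. The function name is hypothetical.

```python
import math

# Observed domains from the LEGEND table, used here as plausibility bounds (assumption).
NUMERIC_RANGES = {
    "input_etoh_percent": (0.0, 4.8),
    "input_temperature_c": (15.6, 37.8),
    "input_volume_l": (650.0, 975.0),
    "output_time_day": (0.0, 70.0),
    "output_brix": (1.8, 8.8),
    "output_total_acidity_g_per_l": (0.15, 40.8),
    "output_ph": (3.42, 4.81),
}
AERATION_LEVELS = {"OFF", "LOW-", "LOW", "MED"}


def out_of_range(row: dict) -> list:
    """Return the names of fields in `row` whose values fall outside the observed domain."""
    flagged = []
    for field, (lo, hi) in NUMERIC_RANGES.items():
        value = row.get(field)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # blanks are allowed (e.g. day-0 seed rows)
        if not lo <= value <= hi:
            flagged.append(field)
    level = row.get("input_aeration_level")
    if level is not None and level not in AERATION_LEVELS:
        flagged.append("input_aeration_level")
    return flagged


example = {"input_etoh_percent": 5.2, "input_temperature_c": 29.0,
           "input_aeration_level": "HIGH", "output_ph": None}
print(out_of_range(example))
```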

Reading the legend's "output = model prediction value"

The LEGEND defines output_ fields as “Model prediction value.” The recorded numbers are physical measurements, but the schema is explicitly framed as the target variables a predictive model should reproduce from the input_ controls. Sections 6–8 build exactly that model.

Per-tote coverage

Figure: acidity trajectory of every tote.

Each line is one tote's titratable acidity over its run. The common shape — slow start, steep middle, tapering plateau — is the logistic signature modeled in §6.

04 Data Cleaning Algorithm

The raw sheets contain three classes of artefact that must be resolved before any analysis: (1) Excel carry-down formula references like =C6 in the aeration column, (2) inconsistent category spellings ("LOW ", "LOW=", "LOW-"), and (3) day formulas such as =E5+5. The normalisation algorithm:

ALGORITHM 1 — normalize_workbook(wb)
    # Produce one tidy long-format row per (tote, timepoint)
    for sheet in tote_tabs(wb):
        prev_aer ← "OFF"
        for row in sheet.rows[2:]:
            if all_empty(row): continue
            # 1 · resolve categorical aeration
            raw ← row.aeration
            if raw is formula ("=Cx") or raw is null:
                aer ← prev_aer                               # carry forward last real value
            else:
                aer ← canonicalize(upper(strip(raw)))        # LOW-, LOW, MED, OFF, HIGH
            prev_aer ← aer
            # 2 · resolve day / output formulas via cached workbook values
            day ← cached_value(row.day)                      # evaluates =E5+5 etc.
            # 3 · coerce numerics, keep NaN for blanks (e.g. day-0 seed rows)
            emit {tote, etoh, temp, aer, vol, day, brix, acidity, ph}
    return dataframe(rows)

308 rows after tidy · 297 complete acidity points · 4 canonical aeration classes

Canonicalisation maps the observed strings OFF · LOW- · LOW · MED to an ordinal intensity scale OFF=0 < LOW⁻=1 < LOW=2 < MED=3 used as a model feature. Day-0 seed rows often carry only aeration+volume (no measured outputs yet) and are retained as NaN-output anchors.
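The per-sheet part of Algorithm 1 can be sketched in runnable Python. The sketch assumes rows have already been read into dicts with the day formulas resolved to cached values, for example via openpyxl's `load_workbook(path, data_only=True)`. If the aeration column still holds raw carry-down references such as `=C6`, they are handled too. The field names, the `normalize_sheet` helper and the mapping of "LOW=" to "LOW-" are illustrative assumptions, not taken from the workbook.

```python
import math
import re

# Ordinal scale from the text: OFF=0 < LOW-=1 < LOW=2 < MED=3.
AERATION_ORDINAL = {"OFF": 0, "LOW-": 1, "LOW": 2, "MED": 3}
# Hypothetical spelling fix: reading "LOW=" as a key slip for "LOW-" is an assumption.
SPELLING_FIXES = {"LOW=": "LOW-"}
# Excel carry-down references such as "=C6" (optionally absolute, e.g. "=$C$6").
CARRY_DOWN = re.compile(r"^=\$?[A-Z]+\$?\d+$")
NUMERIC_FIELDS = ("etoh", "temp", "vol", "day", "brix", "acidity", "ph")


def is_blank(value) -> bool:
    return value is None or (isinstance(value, str) and not value.strip())


def canonicalize(raw: str) -> str:
    """Strip, upper-case and map a raw aeration label onto a canonical class."""
    label = raw.strip().upper()
    label = SPELLING_FIXES.get(label, label)
    if label not in AERATION_ORDINAL:
        raise ValueError(f"unrecognised aeration label: {raw!r}")
    return label


def to_float(value) -> float:
    """Coerce a cell to float, keeping blanks as NaN (e.g. day-0 seed rows)."""
    return math.nan if is_blank(value) else float(value)


def normalize_sheet(tote: str, rows: list) -> list:
    """Emit one tidy record per non-empty row of a single tote sheet."""
    prev_aer = "OFF"
    tidy = []
    for row in rows:
        if all(is_blank(v) for v in row.values()):
            continue  # skip spacer rows
        raw = row.get("aeration")
        if is_blank(raw) or CARRY_DOWN.match(str(raw).strip()):
            aer = prev_aer  # carry forward the last real value
        else:
            aer = canonicalize(str(raw))
        prev_aer = aer
        record = {"tote": tote, "aer": aer, "aer_ord": AERATION_ORDINAL[aer]}
        record.update({f: to_float(row.get(f)) for f in NUMERIC_FIELDS})
        tidy.append(record)
    return tidy
```

Raising on an unknown label, rather than guessing, keeps any new spelling visible instead of silently mapping it to the wrong class.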

05 Exploratory Metrics & Trends

Before modeling, five empirical regularities emerge from the pooled data. They are summarised here and each one constrains the model form in §6–7.

5.1 · Acidity rises near-linearly within a tote

Regressing TA on day within each batch gives a mean slope of 0.57 g/L·day (median 0.47) and a mean linear $R^2=0.876$. The acidification is steady and well-behaved over a run.

Tote	n	Acidity rate (g/L·day)	Linear R²	EtOH rate (%/day)	Run (days)	Final acidity (g/L)
23_053	11	1.092	0.987	−0.121	27	33.5
23_995	17	1.032	0.947	−0.105	25	32.9
23_978	17	0.882	0.929	−0.061	44	40.8
23_980	12	0.799	0.955	−0.079	34	27.7
23_037	15	0.784	0.899	−0.057	35	29.6
23_982	24	0.723	0.939	−0.126	34	31.3
23_998	22	0.547	0.968	−0.041	51	28.2
23_013	12	0.486	0.793	−0.107	33	21.1
23_014	15	0.444	0.907	−0.037	42	22.3
23_055	15	0.407	0.891	−0.043	42	27.7
23_036	20	0.374	0.952	+0.012	52	27.8
23_986	22	0.365	0.971	−0.038	51	19.6
23_991	18	0.350	0.849	−0.081	43	23.9
23_045	27	0.315	0.822	−0.025	67	32.4
0_045	23	0.268	0.612	−0.043	49	25.2
23_032	27	0.212	0.590	+0.018	70	22.5
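Each per-tote rate in the table is an ordinary least-squares line of TA on day. Below is a minimal NumPy sketch, assuming NaN marks the day-0 seed rows that have no outputs; the helper names are illustrative. Applied to the sixteen rates in the table, `summarise` reproduces the mean of 0.57 and the median of about 0.47 quoted in §5.1.

```python
import numpy as np


def linear_rate(days, acidity):
    """OLS slope (g/L·day) and R² of acidity on day for a single tote."""
    days = np.asarray(days, dtype=float)
    acidity = np.asarray(acidity, dtype=float)
    mask = ~(np.isnan(days) | np.isnan(acidity))  # drop NaN-output seed rows
    x, y = days[mask], acidity[mask]
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    return float(slope), float(r2)


def summarise(rates):
    """Mean and median of the per-tote rates."""
    return float(np.mean(rates)), float(np.median(rates))


# Acidity rates from the table above, one per tote.
TABLE_RATES = [1.092, 1.032, 0.882, 0.799, 0.784, 0.723, 0.547, 0.486,
               0.444, 0.407, 0.374, 0.365, 0.350, 0.315, 0.268, 0.212]
mean_rate, median_rate = summarise(TABLE_RATES)
print(f"mean {mean_rate:.3f}, median {median_rate:.3f} g/L·day")
```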

5.2 · Ethanol is held in a band, not depleted

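Mean net ethanol change is only −0.058 %/day. Ethanol also fluctuates up and down (feeding events) rather than falling steadily, which confirms fed-batch control. Because re-feeding offsets consumption, this net rate understates how much ethanol the culture actually metabolises and cannot close the acid balance on its own.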

Figure: ethanol % over time (banded, not monotonic).

5.3 · Correlation structure

Pearson correlations on the pooled complete-case data:

	etoh	temp	vol	day	brix	acidity	pH
etoh	1.00	−0.14	0.33	−0.41	0.18	−0.58	0.08
temp	−0.14	1.00	−0.15	−0.16	−0.14	0.04	0.27
vol	0.33	−0.15	1.00	0.29	0.20	−0.07	−0.20
day	−0.41	−0.16	0.29	1.00	0.49	0.68	−0.62
brix	0.18	−0.14	0.20	0.49	1.00	0.45	−0.41
acidity	−0.58	0.04	−0.07	0.68	0.45	1.00	−0.54
pH	0.08	0.27	−0.20	−0.62	−0.41	−0.54	1.00

Key reads: day↔acidity = +0.68 (time is the dominant driver); etoh↔acidity = −0.58 (acid accumulates as the ethanol band is consumed and re-fed); pH↔acidity = −0.54 and day↔pH = −0.62 (acid drives pH down, partially buffered).
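This sketch reads "complete-case" as listwise deletion: a row enters the correlation only if every column is present. Note that pandas' `DataFrame.corr` drops missing values pairwise by default, so it can give slightly different numbers on the same data. The column order shown in the comment is illustrative.

```python
import numpy as np


def complete_case_pearson(data: np.ndarray) -> np.ndarray:
    """Pearson correlation matrix over rows with no missing value in any column.

    `data` is an (n_rows, n_columns) float array with NaN for blanks, e.g. columns
    ordered etoh, temp, vol, day, brix, acidity, pH.
    """
    complete = data[~np.isnan(data).any(axis=1)]  # listwise deletion
    return np.corrcoef(complete, rowvar=False)
```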

5.4 · Aeration stratification — MED wins

Mean acidity by aeration class

Aeration	Mean acidity (g/L)	Mean pH	n rows
OFF	14.65	4.07	82
LOW⁻	15.76	4.10	122
LOW	14.90	3.96	53
MED	20.28	3.81	51

MED aeration lifts mean acidity ~35% above the other classes and pushes pH lowest — consistent with oxygen-transfer-limited kinetics (§2). The effect appears late in runs (MED is typically engaged in the high-acid finishing phase), so it is partly confounded with time.

5.5 · Temperature effect is positive but weak

Across totes, the within-tote acidity rate rises by 0.064 g/L·day per °C of mean temperature (r = 0.39, p = 0.13). The effect is directionally Arrhenius-like but not statistically resolved at the tote level. Most totes cluster in the 28–31 °C mesophilic optimum, which limits the temperature range over which the effect can be estimated.
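The p-value follows from r and the number of totes through the usual t-test for a Pearson correlation: t = r√((n − 2)/(1 − r²)), with n − 2 degrees of freedom. Below is a SciPy sketch; the helper names are illustrative. With r = 0.39 and n = 16 totes it gives p ≈ 0.135, in line with the p = 0.13 quoted above.

```python
import math

from scipy import stats


def pearson_p_value(r: float, n: int) -> float:
    """Two-sided p-value for a Pearson correlation r from n points (t-test, n - 2 df)."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return float(2.0 * stats.t.sf(abs(t), df=n - 2))


def temperature_effect(mean_temps_c, rates_g_l_day):
    """Across-tote slope (g/L·day per °C), r and p via scipy.stats.linregress."""
    fit = stats.linregress(mean_temps_c, rates_g_l_day)
    return fit.slope, fit.rvalue, fit.pvalue


p = pearson_p_value(0.39, 16)  # r and tote count quoted in §5.5
print(f"p = {p:.3f}")
```

The p-value reported by `linregress` comes from the same t-test, so either route gives the same answer on per-tote data.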

Acidity production rate vs mean temperature (one point per tote)

06 Governing Kinetic Equations

The model is built bottom-up from microbial growth and substrate kinetics, then reduced to the regime the data actually occupy.

6.1 · Microbial-kinetic core

Let $X$ be active biomass, $S$ the carbon substrate (sugars + ethanol), $O$ dissolved oxygen, and $A$ the titratable acidity. Microbial growth follows a double-Monod law (dual limitation by carbon and oxygen), with product (acid) inhibition:

$$\mu(S,O,A)=\mu_{\max}\,\underbrace{\frac{S}{K_S+S}}_{\text{carbon}}\;\underbrace{\frac{O}{K_O+O}}_{\text{oxygen}}\;\underbrace{\left(1-\frac{A}{A_{\max}}\right)}_{\text{acid inhibition}}$$

Acid (TA) is produced coupled to growth and maintenance (Luedeking–Piret), while oxygen is supplied by aeration-driven transfer ($k_La$) and consumed by the culture (OUR):

$$\frac{dX}{dt}=\mu X - k_d X,\qquad \frac{dA}{dt}=\alpha\,\frac{dX}{dt}+\beta X$$

$$\frac{dS}{dt}=\underbrace{F(t)}_{\text{feeding}}-\frac{1}{Y_{A/S}}\frac{dA}{dt},\qquad \frac{dO}{dt}=\underbrace{k_La\,(O^*-O)}_{\text{aeration}}-\frac{1}{Y_{A/O}}\frac{dA}{dt}$$
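The four ODEs can be integrated numerically to show how the full system behaves. This is a sketch only. Every parameter value below is hypothetical and in arbitrary units, the feed F(t) is held constant, and the Monod and inhibition terms are clipped at zero to keep the solver physical. None of this is CDI's calibrated model.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters in arbitrary but self-consistent units; NOT fitted to CDI data.
PARAMS = dict(
    mu_max=0.6,   # maximum specific growth rate, 1/day
    K_S=2.0,      # substrate half-saturation constant
    K_O=0.5,      # oxygen half-saturation constant
    A_max=35.0,   # acid ceiling, g/L
    k_d=0.01,     # death rate, 1/day
    alpha=1.5,    # growth-associated acid yield (Luedeking-Piret alpha)
    beta=0.02,    # non-growth acid rate (Luedeking-Piret beta), 1/day
    Y_AS=0.8,     # acid formed per unit substrate consumed
    Y_AO=2.0,     # acid formed per unit oxygen consumed
    kLa=40.0,     # volumetric O2 transfer coefficient, 1/day
    O_sat=7.5,    # O2 saturation concentration O*
    F=1.0,        # constant substrate feed rate F(t)
)


def rhs(t, y, p):
    """Right-hand side of the four coupled ODEs; state y = [X, S, O, A]."""
    X, S, O, A = y
    S_pos, O_pos = max(S, 0.0), max(O, 0.0)     # guard Monod terms against solver overshoot
    inhibition = max(1.0 - A / p["A_max"], 0.0)  # no negative growth past the ceiling
    mu = (p["mu_max"] * S_pos / (p["K_S"] + S_pos)
          * O_pos / (p["K_O"] + O_pos) * inhibition)
    dX = mu * X - p["k_d"] * X
    dA = p["alpha"] * dX + p["beta"] * X              # growth- plus non-growth-associated acid
    dS = p["F"] - dA / p["Y_AS"]                      # feeding minus consumption
    dO = p["kLa"] * (p["O_sat"] - O) - dA / p["Y_AO"]  # transfer minus uptake
    return [dX, dS, dO, dA]


t_eval = np.linspace(0.0, 60.0, 61)
sol = solve_ivp(rhs, (0.0, 60.0), [0.1, 30.0, 7.5, 0.5], method="LSODA",
                t_eval=t_eval, args=(PARAMS,), rtol=1e-6, atol=1e-8)
X, S, O, A = sol.y
print(f"day 60: A = {A[-1]:.1f} g/L, minimum dissolved O2 = {O.min():.2f}")
```

With these values β > α·k_d, so dA/dt stays positive throughout. Lowering `kLa` in `PARAMS` by one or two orders of magnitude shows the oxygen term starting to throttle growth.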

6.2 · Reduction to the observed regime

Three empirical facts (§5) collapse this system to a tractable form:

Observation	Consequence	Simplification
Carbon kept in surplus by feeding $F(t)$	$S\gg K_S$, so carbon term ≈ 1	Drop substrate limitation
Aeration sets a rate ceiling	$O/(K_O+O)$ becomes a fixed factor $\phi_{\text{aer}}$	Oxygen → aeration multiplier
Acid climbs then plateaus (logistic shape)	Acid-inhibition term dominates the curvature	Keep $(1-A/A_{\max})$

With biomass quasi-proportional to acid-producing capacity, the TA balance reduces to a logistic (Verhulst) law — the canonical model for a batch filling toward its acid ceiling, and the mechanistic backbone of the hybrid digital twin's acidification-kinetics term:

$$\boxed{\;\dfrac{dA}{dt}= k\,\phi_{\text{aer}}\;A\left(1-\dfrac{A}{A_{\max}}\right)\;}\qquad\Longrightarrow\qquad A(t)=\dfrac{A_{\max}}{1+e^{-k\,\phi_{\text{aer}}\,(t-t_0)}}$$

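Here $A_{\max}$ is the TA carrying capacity (g/L), $k$ is the intrinsic acidification rate (day⁻¹), $\phi_{\text{aer}}$ is the dimensionless aeration factor, and $t_0$ is the inflection day, at which $A=A_{\max}/2$. The per-tote fits of §7–8 use a single rate per tote, so the $k$ reported there is the effective rate $k\,\phi_{\text{aer}}$ averaged over the run. Around $t_0$ the curve is close to a straight line with maximum slope $k\,\phi_{\text{aer}}A_{\max}/4$, which explains the strong within-batch linear fits of §5.1. Only in the early phase ($A\ll A_{\max}$) does the law reduce to exponential growth, $dA/dt\approx k\,\phi_{\text{aer}}A$.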
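A quick numerical check confirms both the closed form and the inflection-point slope. It uses the tote 23_053 fit from §8.1 as an example; that fit's k is the effective rate, with any aeration factor absorbed. The peak slope kA_max/4 ≈ 1.25 g/L·day is close to the tote's 1.092 g/L·day linear rate in §5.1. The function name is illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp


def logistic(t, a_max, k, t0):
    """Closed-form solution A(t) = A_max / (1 + exp(-k (t - t0)))."""
    return a_max / (1.0 + np.exp(-k * (np.asarray(t, dtype=float) - t0)))


# Tote 23_053 fit from §8.1 (k is the effective rate, aeration factor absorbed).
A_MAX, K, T0 = 39.6, 0.126, 14.0

# Integrate dA/dt = k A (1 - A/A_max) from the closed form's day-0 value.
sol = solve_ivp(lambda t, a: K * a * (1.0 - a / A_MAX), (0.0, 40.0),
                [float(logistic(0.0, A_MAX, K, T0))],
                t_eval=np.linspace(0.0, 40.0, 81), rtol=1e-9, atol=1e-12)
max_abs_err = float(np.max(np.abs(sol.y[0] - logistic(sol.t, A_MAX, K, T0))))

# Slope at the inflection day (central difference) vs the analytic k * A_max / 4.
h = 1e-4
slope_t0 = float((logistic(T0 + h, A_MAX, K, T0) - logistic(T0 - h, A_MAX, K, T0)) / (2 * h))
print(f"max |ODE - closed form| = {max_abs_err:.2e}; "
      f"slope at t0 = {slope_t0:.3f} vs k*A_max/4 = {K * A_MAX / 4:.3f} g/L·day")
```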

Figure: model-reduction flow. The full mechanistic model (X, S, O, A: four coupled ODEs) is reduced in three steps:
- S ≫ K_S under fed-batch operation, so ethanol (substrate) limitation is dropped.
- O ⇒ aeration factor φ, so oxygen becomes the φ_aer multiplier.
- The observed acid plateau means the (1 − A/A_max) term is kept.

Together these give the logistic acid law dA/dt = k·φ·A(1 − A/A_max), with closed form A(t) = A_max / (1 + e^(−kφ(t − t₀))).

07 Soft Sensor & Model Derivation

These are the TA soft-sensor prototypes, trained on the legacy data, that the project develops toward its R² ≥ 0.9 target (Milestone 5). Three complementary predictors are built:
- a mechanistic per-batch logistic, best for trajectory shape and serving as the digital twin's acidification core;
- a pooled multivariate regression, a transparent input→TA estimator;
- a gradient-boosted tree ensemble, a flexible nonparametric benchmark.

All three target output_total_acidity_g_per_l (TA), and pH is a downstream cross-check (§9). On live, instrumented pilot data, these prototypes give way to neural-network soft sensors with the same target.

7.1 · Mechanistic estimator — nonlinear least squares

For each tote, fit $\theta=(A_{\max},k,t_0)$ by minimising squared residuals on the closed-form logistic. $A_{\max}$ is bounded relative to the tote's maximum observed acidity $A_{\text{obs}}$, which keeps fits physical when a tote is still in its rising phase:

$$\hat\theta=\arg\min_{\theta}\sum_{i}\Big(A_i-\tfrac{A_{\max}}{1+e^{-k(t_i-t_0)}}\Big)^2,\quad A_{\max}\in[0.95\,A_{\text{obs}},\,2.5\,A_{\text{obs}}]$$
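Below is a minimal sketch of the bounded fit using SciPy's `curve_fit`, which switches to its trust-region reflective method when bounds are given. The $A_{\max}$ bounds follow the equation above, with $A_{\text{obs}}$ taken as the tote's maximum observed TA. The bounds on k and t₀ and the starting guess are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, a_max, k, t0):
    """A(t) = A_max / (1 + exp(-k (t - t0)))."""
    return a_max / (1.0 + np.exp(-k * (t - t0)))


def fit_logistic(days, acidity):
    """Bounded nonlinear least-squares fit of (A_max, k, t0) for one tote; returns params and R²."""
    t = np.asarray(days, dtype=float)
    a = np.asarray(acidity, dtype=float)
    keep = ~(np.isnan(t) | np.isnan(a))  # ignore seed rows without outputs
    t, a = t[keep], a[keep]
    a_obs = a.max()
    lower = [0.95 * a_obs, 1e-4, t.min() - 100.0]  # k and t0 bounds are assumptions
    upper = [2.5 * a_obs, 5.0, t.max() + 100.0]
    p0 = [1.2 * a_obs, 0.1, float(np.median(t))]   # starting guess inside the bounds
    params, _ = curve_fit(logistic, t, a, p0=p0, bounds=(lower, upper))
    resid = a - logistic(t, *params)
    r2 = 1.0 - np.sum(resid**2) / np.sum((a - a.mean()) ** 2)
    return params, float(r2)


# Noise-free synthetic tote generated from the 23_053 parameters, for illustration.
days = np.arange(0.0, 28.0, 2.0)
params, r2 = fit_logistic(days, logistic(days, 39.6, 0.126, 14.0))
print(params, r2)
```

A fit that returns $A_{\max}$ at exactly $2.5\,A_{\text{obs}}$ has hit the bound, which means the data do not yet show a plateau.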

7.2 · Statistical estimator — multivariate linear model

A transparent, deployable input→acidity map using the operator controls plus elapsed day:

$$\widehat{A}=\beta_0+\beta_1\,\text{day}+\beta_2\,\text{etoh}+\beta_3\,\text{temp}+\beta_4\,\text{aer}_{\text{ord}}+\beta_5\,\text{vol}$$

Fitted coefficients (ordinary least squares, all 297 complete points):

Term	Coefficient	Interpretation
intercept $\beta_0$	20.05	baseline offset (g/L)
day $\beta_1$	+0.276	+0.28 g/L per day — the dominant driver
etoh $\beta_2$	−2.157	high residual ethanol ⇒ acid not yet formed
temp $\beta_3$	+0.183	warmer ⇒ faster (Arrhenius-like)
aer_ord $\beta_4$	+0.587	each aeration step adds ~0.6 g/L
vol $\beta_5$	−0.016	dilution / larger headspace, minor
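The sketch below shows the OLS fit with NumPy's least-squares solver, plus a prediction from the rounded coefficients in the table. The feature order and the example tote (day 20, 2 % ethanol, 29 °C, LOW aeration, 800 L) are hypothetical. With the reported coefficients, that tote predicts roughly 14.9 g/L.

```python
import numpy as np

# Feature order assumed for this sketch: day, etoh, temp, aer_ord, vol.


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return [beta0, beta1..beta5] minimising ||y - [1, X] beta||^2."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate the intercept-plus-linear model row by row."""
    return beta[0] + X @ beta[1:]


# Coefficients as reported in the table above (rounded).
BETA_REPORTED = np.array([20.05, 0.276, -2.157, 0.183, 0.587, -0.016])
x_new = np.array([[20.0, 2.0, 29.0, 2.0, 800.0]])  # hypothetical tote
print(predict(BETA_REPORTED, x_new))
```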

7.3 · Nonparametric estimator — gradient boosting

A gradient-boosted tree ensemble (200 trees of depth 3, learning rate η = 0.05) on the same features provides a flexible benchmark and a read on feature importance:

Feature	Importance
day	0.614
etoh	0.149
vol	0.119
temp	0.064
aer_ord	0.054

Both estimators agree that elapsed day carries the most predictive signal, with ethanol the strongest control variable. This matches the correlation structure of §5.3 and the time-driven logistic accumulation of §6.

08 Model Fits & Validation

8.1 · Mechanistic logistic — per-tote fits

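Mean fit quality across all 16 totes: R² = 0.904. The % saturation column (final acidity ÷ fitted $A_{\max}$) shows totes finishing at 40–105% of fitted capacity, so several are harvested while still climbing. The five totes at exactly 40% sit on the $2.5\,A_{\text{obs}}$ upper bound of §7.1. For them the plateau is not identified by the data, so their $A_{\max}$, $k$ and $t_0$ are limited by the bound rather than estimated.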

Tote	A_max (g/L)	k (day⁻¹)	t₀ (day)	R²	% of A_max reached
23_013	20.4	0.237	6.7	0.992	103
23_053	39.6	0.126	14.0	0.991	85
23_986	28.2	0.065	38.0	0.982	69
23_978	102.0	0.070	50.9	0.977	40
23_998	39.9	0.070	38.5	0.971	71
23_995	82.2	0.078	30.4	0.966	40
23_036	69.5	0.032	69.3	0.959	40
23_980	32.2	0.117	15.2	0.956	86
23_982	78.2	0.059	41.9	0.955	40
23_037	36.0	0.104	16.7	0.914	82
23_014	36.3	0.056	33.4	0.898	61
23_055	52.8	0.035	41.9	0.884	52
23_991	28.5	0.053	14.1	0.846	84
23_045	30.8	0.048	20.3	0.815	105
23_032	21.8	0.145	14.9	0.755	103
0_045	63.0	0.028	79.7	0.611	40

Worked example — tote 23_053: data vs fitted logistic

Fit: $A_{\max}=39.6$ g/L, $k=0.126$ day⁻¹, $t_0=14.0$ day, R² = 0.991. The S-curve captures lag, exponential rise, and onset of plateau.

8.2 · Pooled regression — honest cross-tote validation

The acid-prediction models are validated with leave-one-tote-out (LOTO) cross-validation: each tote is predicted by a model trained on the other fifteen. This is the realistic test of generalising to a new vessel.

Model	In-sample R²	LOTO R²	LOTO RMSE (g/L)	LOTO MAE (g/L)
Per-tote logistic (mechanistic)	0.904	—†	—	—
Multivariate linear	0.595	0.450	5.53	4.19
Gradient-boosted trees	—	0.349	6.02	4.75
Zero-order time model $A=8.22+0.32\,\text{day}$	0.455	—	5.50	4.32

† The logistic is fit per tote and characterises a known vessel's trajectory; it is not a blind cross-tote predictor. For predicting a brand-new tote from inputs, the linear model has the highest LOTO R² (0.45, RMSE 5.5 g/L).
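Leave-one-tote-out splitting takes only a few lines. This sketch pools all held-out predictions before computing R², RMSE and MAE. The figures above do not say whether they were pooled this way or averaged per fold, so the aggregation here is an assumption. scikit-learn's `LeaveOneGroupOut` produces the same splits if a library version is preferred.

```python
import numpy as np


def loto_cv(X, y, groups):
    """Leave-one-tote-out evaluation of an intercept-plus-linear OLS model.

    Returns (R², RMSE, MAE) over all held-out predictions pooled together.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    preds = np.empty_like(y)
    for tote in np.unique(groups):
        test = groups == tote
        # Train on every other tote only.
        design = np.column_stack([np.ones(int((~test).sum())), X[~test]])
        beta, *_ = np.linalg.lstsq(design, y[~test], rcond=None)
        preds[test] = beta[0] + X[test] @ beta[1:]
    err = y - preds
    r2 = 1.0 - np.sum(err**2) / np.sum((y - y.mean()) ** 2)
    return float(r2), float(np.sqrt(np.mean(err**2))), float(np.mean(np.abs(err)))
```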

Reading these numbers honestly

Two prediction problems live in this data, and they differ sharply in difficulty.

(A) Trajectory tracking within a known vessel is easy: the logistic fits it well (mean in-sample R² ≈ 0.90).

(B) Cold-start prediction of a never-seen tote from its inputs is genuinely hard (LOTO R² ≈ 0.45). Each tote's feeding history is its own latent variable, which the four logged inputs only partly capture.

Closing that gap is the chief opportunity in §12.

09 The pH Sub-Model

pH is the dissociation read-out of the organic acids produced. For weak organic acids (acetic/lactic/tartaric, pKₐ ≈ 3–4.8) the Henderson–Hasselbalch relation predicts a logarithmic dependence on acid concentration:

$$\text{pH}=\text{p}K_a+\log_{10}\!\frac{[\text{A}^-]}{[\text{HA}]}\;\approx\;a+b\,\log_{10}(A)$$

Fitting both the log form and a linear form to the data:

pH model	Equation	R²	RMSE	MAE
Log (Henderson–Hasselbalch)	pH = 4.523 − 0.443·log₁₀(A)	0.198	0.249	0.201
Linear	pH = 4.340 − 0.0201·A	0.291	0.234	0.192
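Both pH forms are straight-line fits, one against log₁₀(A) and one against A. This sketch uses `numpy.polyfit` and drops rows with A ≤ 0 or missing pH before fitting, since the log form needs A > 0. The function name is illustrative.

```python
import numpy as np


def fit_ph_models(acidity, ph):
    """Fit pH = a + b*log10(A) and pH = c + d*A; return (intercept, slope, R²) for each."""
    A = np.asarray(acidity, dtype=float)
    y = np.asarray(ph, dtype=float)
    keep = (A > 0) & ~np.isnan(y)  # log10 needs A > 0; NaN acidity also fails A > 0
    A, y = A[keep], y[keep]

    def r2(pred):
        return float(1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2))

    b_log, a_log = np.polyfit(np.log10(A), y, 1)  # polyfit returns [slope, intercept]
    d_lin, c_lin = np.polyfit(A, y, 1)
    return {
        "log": (a_log, b_log, r2(a_log + b_log * np.log10(A))),
        "linear": (c_lin, d_lin, r2(c_lin + d_lin * A)),
    }
```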

Figure: pH vs total acidity, measured points.

Why pH is only weakly predictable here

The broth is a buffered system — organic-acid/conjugate-base buffering plus pomace solids flatten the pH response, so TA can climb from 10 to 30 g/L while pH barely moves (≈3.6–4.1). pH is therefore a poor stand-alone progress indicator; TA is the reliable KPI — which is precisely why the project builds a soft sensor to estimate TA rather than relying on the cheap pH probe alone. The negative slope is real and directionally correct, but ±0.23 pH unit scatter limits pH to a coarse cross-check.

10 Inference Pipeline & Pseudocode

End-to-end, the deployable predictor chains cleaning → feature build → dual estimator → pH sub-model → horizon forecast.

Figure: inference pipeline.
1. The raw tote sheet is normalised by Algorithm 1.
2. A feature vector is built (day, etoh, temp, aer_ord, vol).
3. Known vessel? If yes, use the per-tote logistic A(t) = A_max/(1 + e^(−k(t − t₀))). If no, use the pooled linear model Â = βᵀx.
4. Either path yields the acidity forecast.
5. The forecast feeds the pH sub-model (pH = 4.34 − 0.0201·A) and the harvest-day estimate (solve A(t*) = A_target).

ALGORITHM 2 — predict_and_forecast(tote_history, controls, A_target)
    # Returns acidity trajectory, pH, and estimated harvest day
    df ← normalize(tote_history)                          # Algorithm 1
    x ← [day, etoh, temp, aer_ord, vol]                   # current feature vector, from controls
    mechanistic ← len(df) ≥ 6                             # enough history for a per-tote fit?
    if mechanistic:
        Amax, k, t0 ← fit_logistic(df.day, df.acidity, bounds=[0.95·max, 2.5·max])
        A_hat(t) ← Amax / (1 + exp(−k·(t − t0)))
    else:                                                 # cold start → pooled regression
        A_hat ← β0 + βᵀx                                  # β from §7.2
    pH_hat ← 4.340 − 0.0201·A_hat                         # §9 sub-model
    # solve for the day acidity crosses the target
    if mechanistic:
        t_star ← t0 + (1/k)·ln(A_target / (Amax − A_target))   # requires A_target < Amax
    else:
        t_star ← (A_target − β0 − Σ_{j≠day} β_j·x_j) / β_day
    return {trajectory: A_hat, pH: pH_hat, harvest_day: t_star}

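The endpoint-day inversion is the model's most useful output. Given the TA target $A^\star$ (the HACCP completion spec, TA ≥ 3% ≈ 30 g/L), solve the logistic for the day the batch reaches it, turning a 45-day wait into a forecast. The inversion is defined only when $A^\star < A_{\max}$; a batch whose fitted ceiling sits below target never reaches it on its current trajectory: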

$$t^\star=t_0+\frac{1}{k}\,\ln\!\frac{A^\star}{A_{\max}-A^\star}$$
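Both harvest-day branches of Algorithm 2 are closed-form once the parameters are known. This sketch uses the §7.2 coefficients for the cold-start branch and the tote 23_053 fit for the mechanistic one. The cold-start tote's inputs are hypothetical, and returning `None` for an unreachable target is a design choice of this sketch.

```python
import math
from typing import Optional

# Pooled linear coefficients reported in §7.2 (rounded).
BETA0 = 20.05
BETA = {"day": 0.276, "etoh": -2.157, "temp": 0.183, "aer_ord": 0.587, "vol": -0.016}


def harvest_day_logistic(a_target: float, a_max: float, k: float, t0: float) -> Optional[float]:
    """t* = t0 + (1/k) ln(A*/(A_max - A*)); None if the target is at or above the plateau."""
    if not 0.0 < a_target < a_max:
        return None
    return t0 + math.log(a_target / (a_max - a_target)) / k


def harvest_day_linear(a_target: float, controls: dict) -> float:
    """Cold start: solve beta0 + beta_day*t + sum_j beta_j*x_j = A* for t."""
    other = sum(BETA[name] * controls[name] for name in BETA if name != "day")
    return (a_target - BETA0 - other) / BETA["day"]


# Tote 23_053 (A_max = 39.6, k = 0.126, t0 = 14.0) against a 30 g/L target.
print(harvest_day_logistic(30.0, 39.6, 0.126, 14.0))
# Hypothetical new tote: 2 % ethanol, 29 °C, LOW aeration (ordinal 2), 800 L.
print(harvest_day_linear(30.0, {"etoh": 2.0, "temp": 29.0, "aer_ord": 2, "vol": 800.0}))
```

For tote 23_053 the 30 g/L target is crossed at about day 23. The hypothetical cold-start tote comes out near day 75, beyond the longest observed run of 70 days. This is a reminder that the pooled linear map can be pushed outside the data it was fitted on.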

11 Process Control Levers

Translating the fitted model into operating guidance — what each input does and how strongly the data support it.

Lever	Effect on acidification	Evidence strength	Operating note
Aeration ↑	Raises rate ceiling; MED ≈ +35% mean TA	Strong (β₄>0, stratified means)	O₂-transfer limited — the MPC's first lever; step to MED for the finishing phase
Carbon / substrate feed	Sustains carbon so TA keeps climbing	Strong (fed-system signature)	Keep substrate in surplus; don't let the culture starve
Temperature	+0.064 g/L·day per °C, optimum ~28–31 °C	Moderate (r=0.39, p=0.13)	Stay mesophilic; >35 °C risks culture stress
Volume	Mild dilution / headspace effect	Weak (β₅ small)	Secondary; affects O₂ surface ratio
Time	Dominant — logistic accumulation	Very strong (r=0.68)	Use $t^\star$ inversion to forecast the endpoint day

12 Limitations & Next Steps

Limitations

Feeding history is unlogged. Ethanol/sugar additions are the hidden driver behind cold-start error; only the post-addition concentration is recorded, not the dose.
Aeration is confounded with time — MED is engaged late, so its standalone effect is partly absorbed by the day term.
Temperature range is narrow (mostly 28–31 °C), so the Arrhenius coefficient is under-identified.
pH is buffered and only weakly predictable (R² ≈ 0.29); it should not be used as a primary endpoint.
Brix is noisy — it tracks dissolved solids/feeding rather than a clean reaction coordinate, so it was not modeled as a target.

Next steps

Log feed events (volume + ethanol dose + timestamp) to move cold-start prediction from R² ≈ 0.45 toward the within-tote R² ≈ 0.90.
Fit the full ODE (§6.1) with a dissolved-O₂ probe to identify $k_La$ per aeration class explicitly.
Hierarchical / mixed-effects model: shared population kinetics + per-tote random effects on $A_{\max},k,t_0$ for principled cold-start priors.
Online re-fit: update logistic parameters as each new daily measurement arrives (recursive least squares) for live harvest-day estimates.
Incorporate the LPM bioreactor/drum tabs to calibrate the aeration→$\phi_{\text{aer}}$ map against true airflow.
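One way to realise the online re-fit is to hold $A_{\max}$ fixed, for example from a prior or an earlier fit. With $A_{\max}$ fixed, the logit $\ln\big(A/(A_{\max}-A)\big) = k t - k t_0$ is linear in t, so ordinary recursive least squares applies. This is a sketch under that assumption. Updating all three parameters at once would instead need a nonlinear filter such as an extended Kalman filter. The class name and defaults are illustrative.

```python
import numpy as np


class LogitRLS:
    """Recursive least squares on the logit-linearised logistic, with A_max held fixed.

    ln(A / (A_max - A)) = k*t - k*t0 is linear in phi = [1, t] with
    theta = [-k*t0, k]. Fixing A_max is an assumption of this sketch.
    """

    def __init__(self, a_max: float, forgetting: float = 1.0, p0: float = 1e6):
        self.a_max = a_max
        self.lam = forgetting          # 1.0 means no forgetting of old points
        self.theta = np.zeros(2)
        self.P = np.eye(2) * p0        # large initial covariance = weak prior

    def update(self, t: float, a: float) -> None:
        """Fold one new (day, TA) measurement into the estimate."""
        if not 0.0 < a < self.a_max:
            return                     # logit undefined; skip this point
        phi = np.array([1.0, t])
        y = np.log(a / (self.a_max - a))
        gain = self.P @ phi / (self.lam + phi @ self.P @ phi)
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = (self.P - np.outer(gain, phi @ self.P)) / self.lam

    @property
    def k(self) -> float:
        return float(self.theta[1])

    @property
    def t0(self) -> float:
        return float(-self.theta[0] / self.theta[1])
```

A forgetting factor slightly below 1 lets the estimate track a batch whose kinetics drift mid-run, at the cost of noisier parameters.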

13 How This Maps to Ai26.10

This analysis is the historical-data leg of the project. Each result below feeds a specific proposal component and, in several cases, independently corroborates a proposal claim using CDI's own numbers.

This document	Ai26.10 component	What it establishes
Logistic TA law $A(t)=A_{\max}/(1+e^{-k(t-t_0)})$, mean fit R²=0.90	Hybrid digital twin — acidification-kinetics mechanistic core	The mechanistic backbone the NN residual-learning layer corrects against
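Linear / GBM TA estimators from inputs	NN soft sensor (Milestone 5, target R²≥0.9)	The within-batch logistic fit already reaches mean R² ≈ 0.90 on legacy low-frequency data, a strong feasibility signal; the cross-tote linear and GBM estimators (LOTO R² 0.45 and 0.35) set the cold-start baseline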
Oxygen-transfer-limited finding; MED → +35% TA	MPC optimising aeration/mixing/temperature	Confirms aeration is the highest-value control lever, grounding the $k_La$/OUR twin terms
Endpoint-day inversion $t^\star=t_0+\tfrac1k\ln\frac{A^\star}{A_{\max}-A^\star}$	ETA-to-target dashboard; cycle-time KPI	The mechanism behind the 45 → 10–15 day claim, expressed per-batch
Per-batch rate spread (k ≈ 0.03 → 0.24/day)	RSM optimal-window targeting	Quantifies the gap between slow and fast batches that closed-loop control closes
Cold-start LOTO R²≈0.45 vs within-batch ≈0.90; feed events unlogged	Pilot Phase 2 / Milestone 3 rationale	Independently justifies why high-frequency instrumented pilot trials are necessary
pH weakly predictive (R²≈0.29), buffered	Sensor-fusion soft sensor design	Shows why TA can't be read off a cheap pH probe — the soft sensor earns its keep
Autoencoder-ready residual structure (off-trend points)	Anomaly detection (Milestone 5)	Trend model provides the baseline against which drift/contamination is flagged

Bottom line for reviewers

The historical dataset does more than establish a baseline — it de-risks the central technical bet. A simple kinetic model already clears the project's R²≥0.9 soft-sensor bar within a batch, the dominant control lever (aeration) is identified and physically explained, and the one thing the legacy data can't do (cold-start a new batch) is precisely what the funded pilot instrumentation is designed to fix. The model's structure maps one-to-one onto the digital twin, soft sensor, MPC, and anomaly-detection deliverables.

