Project Ai26.10 · Crush Dynamics × Atomic47 Labs · PIC AI Program

Grape-Pomace Fermentation:
Science & Algorithm Documentation

The science and modeling evidence behind High-Performance Fermentation with Neural Networks + Digital Twin. Two ways to read it: Scientist is the full technical reference — fermentation kinetics, the soft-sensor and digital-twin math, fits, and validation. Othertist tells the same story in plain English for reviewers, investors, and funders who need to get it without operating it.

3 yrs CDI historical data · 16 batches · 308 timepoints · TA = process KPI · Soft-sensor target R²≥0.9 (already met within-batch) · 45-day → 10–15-day opportunity

This is the science and modeling foundation under Project Ai26.10. The Fermentation Data workbook is Crush Dynamics' historical record of sixteen production batches (“totes”) of its patented grape-pomace biotransformation — a controlled, non-sterile fermentation that upgrades winery by-product into polyphenol- and fibre-rich functional food ingredients. Each tab logs four operator-controlled input_ conditions against four measured output_ variables over the fermentation cycle. The headline output, titratable acidity (TA), is the process KPI: the run is complete at TA ≥ 3% (HACCP-validated, pH ≤ 4.4). This document characterises that baseline and builds the predictive soft-sensor / digital-twin models the project is built on.

01 The Process in One Picture

CDI's process is an aerobic acidogenic fermentation of grape pomace. A microbial community consumes residual sugars and ethanol from the pomace, drawing dissolved oxygen, and progressively acidifies the broth — titratable acidity climbs while pH falls. TA is both the product-quality endpoint and the clock: the cycle ends when TA reaches target. Because the broth is open and adjusted during the run (volume and substrate vary within and between batches), it behaves as an open, fed system rather than a sealed batch — a fact that shapes every model below.

flowchart LR
  subgraph IN["Operator-controlled conditions (input_)"]
    E["Residual ethanol<br/>0–4.8%, substrate"]:::in
    T["Temperature<br/>15.6–37.8 °C"]:::in
    A["Aeration<br/>OFF·LOW·MED"]:::in
    V["Volume<br/>650–975 L"]:::in
  end
  subgraph BR["Grape-pomace fermenter"]
    M(("Microbial<br/>community")):::bug
    O2["Dissolved O₂<br/>(OUR, kₗa)"]:::o2
  end
  subgraph OUT["Measured outputs (output_)"]
    D["Time (day)"]:::out
    AC["Titratable acidity<br/>g/L, KPI"]:::out
    B["Brix °"]:::out
    P["pH"]:::out
  end
  E --> M
  T --> M
  A --> O2 --> M
  V --> M
  M -->|"sugars + ethanol + O₂<br/>→ biomass + organic acids"| AC
  M --> P
  M --> B
  D -.cycle clock.-> AC
  classDef in fill:#eef5ef,stroke:#3f7d5a,color:#21402f;
  classDef out fill:#fbf1e9,stroke:#b0472f,color:#5a2417;
  classDef bug fill:#f4ecd6,stroke:#b4862f,color:#5a4413,font-weight:bold;
  classDef o2 fill:#e8f0f7,stroke:#2f5d8a,color:#1d3b58;
16 production batches · 308 time-point rows · 0.57 g/L·day mean TA rate

02 Fermentation Biochemistry & the TA KPI

The grape-pomace fermentation is a microbial acidification: a community of acid-forming organisms metabolises the carbon available in the pomace — residual grape sugars plus residual winemaking ethanol — and excretes organic acids. Those acids are what the process is measured by, and what protects the non-sterile product (HACCP envelope: pH ≤ 4.4, TA ≥ 3%). The exact microbial consortium and acid profile are CDI background IP; for modeling, the relevant quantity is the cumulative titratable acidity.

What titratable acidity actually measures

TA is a titration result — the total neutralisable acid in the broth, reported as g/L (acid equivalents). It aggregates every organic acid present rather than naming one, which is why it is a robust, instrument-light progress KPI:

$$\text{TA} \;=\; \frac{C_{\text{base}}\,V_{\text{base}}}{V_{\text{sample}}}\times E_{\text{acid}}\qquad\left[\tfrac{\text{g}}{\text{L}}\right]$$

where \(C_{\text{base}}V_{\text{base}}\) is the moles of standard base to reach the endpoint and \(E_{\text{acid}}\) the acid equivalent weight. The soft sensor's job (Project Phase 3) is to estimate TA continuously from fast in-line signals (pH, DO, temperature, airflow, agitation) so operators no longer wait on this offline lab titration.
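As a worked instance of the titration formula above, the following minimal sketch computes TA from a single titration. The titrant concentration, volumes, and the choice to report as acetic acid (equivalent weight 60.05 g/mol) are illustrative assumptions; the source does not state which acid equivalent CDI reports against.

```python
def titratable_acidity_g_per_l(c_base_mol_per_l, v_base_ml, v_sample_ml, eq_weight_g_per_mol):
    """TA = (C_base * V_base / V_sample) * E_acid, returned in g/L."""
    if v_sample_ml <= 0:
        raise ValueError("sample volume must be positive")
    # The mL units cancel in the volume ratio, leaving mol/L of acid equivalents.
    acid_eq_mol_per_l = c_base_mol_per_l * v_base_ml / v_sample_ml
    return acid_eq_mol_per_l * eq_weight_g_per_mol


# Hypothetical titration: 0.1 M NaOH, 25.0 mL of titrant, 5.0 mL broth sample,
# reported as acetic acid (assumed equivalent weight 60.05 g/mol).
ta = titratable_acidity_g_per_l(0.1, 25.0, 5.0, 60.05)
print(f"TA = {ta:.3f} g/L")
```

With these assumed numbers the result lands close to 30 g/L, which is the scale of the completion target discussed later in the document.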

Carbon & the substrate balance

Carbon for acid and biomass comes from two pools — pomace sugars (tracked indirectly by Brix) and residual ethanol. Generically:

$$\underset{\text{sugars / ethanol}}{\text{substrate}} \;+\; \mathrm{O_2} \;\xrightarrow{\text{microbial community}}\; \underset{\text{product}}{\text{biomass}} \;+\; \underset{\text{TA}}{\text{organic acids}} \;+\; \mathrm{CO_2} + \mathrm{H_2O}$$
Why acidity gained ≫ ethanol consumed — and why that's expected
Across batches the TA gained per unit ethanol lost ranges from ~4 to >25 g/L per %. Far from an anomaly, this is the tell that ethanol is only a minor carbon source — most acid is built from pomace sugars, and the broth is topped up and adjusted during the run (volume swings 650→975 L). So TA accumulates against an open, fed system; it is not tied to a simple closed mass balance on ethanol. This is the single most important modeling insight in the dataset, and it directly motivates the high-frequency, instrumented pilot trials (Milestone 3): the legacy data records concentrations, not the feed events that drive them.

Oxygen transfer is the binding constraint — \(k_La\) and OUR

The fermentation is aerobic, so its rate is frequently limited by how fast oxygen can be delivered, not by the microbes' appetite. Two quantities the proposal's digital twin estimates govern this:

$$\underbrace{\text{OTR}=k_La\,(C^*-C_L)}_{\text{oxygen supplied by aeration}}\qquad\qquad \underbrace{\text{OUR}=q_{O_2}\,X}_{\text{oxygen demanded by the culture}}$$

Aeration level (OFF / LOW⁻ / LOW / MED, or LPM in the bioreactor/drum vessels) sets \(k_La\) and therefore the oxygen-transfer ceiling on the whole reaction. This is consistent with MED aeration producing the highest mean TA in the data (§5.4), and it is why the MPC layer is designed to optimise aeration first.
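The supply-versus-demand comparison can be sketched in a few lines. Every number below (the \(k_La\) per aeration class, \(C^*\), \(q_{O_2}\), and biomass) is a hypothetical placeholder chosen only to illustrate the inequality; none of these quantities is measured in the legacy workbook.

```python
def oxygen_transfer_rate(kla, c_star, c_l):
    """OTR = kLa * (C* - C_L). Units follow the inputs (here 1/h and mg/L)."""
    return kla * (c_star - c_l)


def oxygen_uptake_rate(q_o2, biomass):
    """OUR = q_O2 * X."""
    return q_o2 * biomass


def is_transfer_limited(kla, c_star, q_o2, biomass):
    """True when demand exceeds the largest possible supply (C_L driven to zero)."""
    return oxygen_uptake_rate(q_o2, biomass) > oxygen_transfer_rate(kla, c_star, 0.0)


# Hypothetical kLa per aeration class (1/h); placeholders, not fitted values.
KLA_BY_CLASS = {"OFF": 1.0, "LOW-": 5.0, "LOW": 10.0, "MED": 25.0}
C_STAR = 8.0    # mg/L, assumed oxygen saturation concentration
Q_O2 = 40.0     # mg O2 per g biomass per hour, assumed
BIOMASS = 3.0   # g/L, assumed

for level, kla in KLA_BY_CLASS.items():
    ceiling = oxygen_transfer_rate(kla, C_STAR, 0.0)
    demand = oxygen_uptake_rate(Q_O2, BIOMASS)
    limited = is_transfer_limited(kla, C_STAR, Q_O2, BIOMASS)
    print(f"{level:>4}: max OTR {ceiling:6.1f} mg/L/h, OUR {demand:.1f} mg/L/h, limited={limited}")
```

Under these assumed values only the highest aeration class supplies more oxygen than the culture demands, which is the qualitative picture the text describes.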

03 The Dataset: Schema & Variables

The workbook is organised as one tab per vessel. Tab names are tote identifiers (e.g. 23_982). Two auxiliary tabs — bio_reactor_0_045 and drum_0_045 — track the same tote through two vessel types whose aeration is logged in litres-per-minute (LPM) rather than categorical levels.

Variable dictionary (from the LEGEND tab)

| Field | Role | Meaning | Units / domain |
|---|---|---|---|
| input_etoh_percent | input | Ethanol content of tote | % v/v · 0–4.8 |
| input_temperature_c | input | System temperature | °C · 15.6–37.8 |
| input_aeration_level | input | System aeration (categorical) | OFF·LOW⁻·LOW·MED |
| input_reactor_aeration_level | input | Bioreactor aeration | LPM |
| input_drum_aeration_level | input | Drum aeration | LPM |
| input_volume_l | input | Tote working volume | L · 650–975 |
| output_time_day | output | Fermentation day | day · 0–70 |
| output_brix | output | Dissolved solids (refractometric) | °Bx · 1.8–8.8 |
| output_total_acidity_g_per_l | output | Titratable acidity (TA), process KPI | g/L · 0.15–40.8 |
| output_ph | output | Broth pH | – · 3.42–4.81 |
Reading the legend's "output = model prediction value"
The LEGEND defines output_ fields as “Model prediction value.” The recorded numbers are physical measurements, but the schema is explicitly framed as the target variables a predictive model should reproduce from the input_ controls. Sections 6–8 build exactly that model.

Per-tote coverage

Acidity trajectory, every tote

Each line is one tote's titratable acidity over its run. The common shape — slow start, steep middle, tapering plateau — is the logistic signature modeled in §6.

04 Data Cleaning Algorithm

The raw sheets contain three classes of artefact that must be resolved before any analysis: (1) Excel carry-down formula references like =C6 in the aeration column, (2) inconsistent category spellings ("LOW ", "LOW=", "LOW-"), and (3) day formulas such as =E5+5. The normalisation algorithm:

ALGORITHM 1: normalize_workbook(wb)
# Produce one tidy long-format row per (tote, timepoint)
for sheet in tote_tabs(wb):
    prev_aer ← "OFF"
    for row in sheet.rows[2:]:
        if all_empty(row): continue
        # 1 · resolve categorical aeration
        raw ← row.aeration
        if raw is formula ("=Cx") or raw is null:
            aer ← prev_aer                            # carry forward last real value
        else:
            aer ← canonicalize(upper(strip(raw)))     # LOW-, LOW, MED, OFF, HIGH
        prev_aer ← aer
        # 2 · resolve day / output formulas via cached workbook values
        day ← cached_value(row.day)                   # evaluates =E5+5 etc.
        # 3 · coerce numerics, keep NaN for blanks (e.g. day-0 seed rows)
        emit {tote, etoh, temp, aer, vol, day, brix, acidity, ph}
return dataframe(rows)
308 rows after tidy · 297 complete acidity points · 4 canonical aeration classes

Canonicalisation maps the observed strings OFF · LOW- · LOW · MED to an ordinal intensity scale OFF=0 < LOW⁻=1 < LOW=2 < MED=3 used as a model feature. Day-0 seed rows often carry only aeration+volume (no measured outputs yet) and are retained as NaN-output anchors.
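The aeration step of the cleaning pass (carry-forward of formula cells, canonical spelling, ordinal encoding) can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper names are hypothetical, the mapping of the "LOW=" spelling to LOW- is an assumption (the source lists the spelling but not its resolution), and the sketch handles only the four classes reported as observed, raising on anything else.

```python
import re

# Ordinal scale from the text: OFF=0 < LOW-=1 < LOW=2 < MED=3.
AERATION_ORDINAL = {"OFF": 0, "LOW-": 1, "LOW": 2, "MED": 3}
# Assumption: "LOW=" is a keying slip for "LOW-"; "LOW⁻" is the typeset form.
ALIASES = {"LOW=": "LOW-", "LOW⁻": "LOW-"}
# Carry-down cell references such as =C6 or =$C$6.
CELL_REFERENCE = re.compile(r"^=\$?[A-Z]+\$?\d+$")


def canonicalize(raw):
    """Map a raw label to one of the four canonical classes, or raise."""
    token = raw.strip().upper()
    token = ALIASES.get(token, token)
    if token not in AERATION_ORDINAL:
        raise ValueError(f"unrecognised aeration label: {raw!r}")
    return token


def resolve_aeration(cells, default="OFF"):
    """Replace blanks and formula references with the last real value."""
    resolved, previous = [], default
    for cell in cells:
        text = "" if cell is None else str(cell).strip()
        if not text or CELL_REFERENCE.match(text):
            current = previous            # carry forward
        else:
            current = canonicalize(text)
        resolved.append(current)
        previous = current
    return resolved


# Hypothetical raw column mixing the artefact types named in the text.
column = ["OFF", "=C6", "LOW ", None, "LOW=", "med", "=C10"]
labels = resolve_aeration(column)
ordinals = [AERATION_ORDINAL[label] for label in labels]
print(labels)
print(ordinals)
```

Raising on unknown labels, rather than silently defaulting, keeps a new spelling from being absorbed into the wrong class.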

05 Exploratory Metrics & Trends

Before modeling, five empirical regularities emerge from the pooled data. They are summarised here and each one constrains the model form in §6–7.

5.1 · Acidity rises near-linearly within a tote

Regressing TA on day within each batch gives a mean slope of 0.57 g/L·day (median 0.47) and a mean linear \(R^2=0.876\). The acidification is steady and well-behaved over a run.

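| Tote | n | Acidity rate (g/L·day) | Linear R² | EtOH rate (%/day) | Run (days) | Final acidity (g/L) |
|---|---|---|---|---|---|---|
| 23_053 | 11 | 1.092 | 0.987 | −0.121 | 27 | 33.5 |
| 23_995 | 17 | 1.032 | 0.947 | −0.105 | 25 | 32.9 |
| 23_978 | 17 | 0.882 | 0.929 | −0.061 | 44 | 40.8 |
| 23_980 | 12 | 0.799 | 0.955 | −0.079 | 34 | 27.7 |
| 23_037 | 15 | 0.784 | 0.899 | −0.057 | 35 | 29.6 |
| 23_982 | 24 | 0.723 | 0.939 | −0.126 | 34 | 31.3 |
| 23_998 | 22 | 0.547 | 0.968 | −0.041 | 51 | 28.2 |
| 23_013 | 12 | 0.486 | 0.793 | −0.107 | 33 | 21.1 |
| 23_014 | 15 | 0.444 | 0.907 | −0.037 | 42 | 22.3 |
| 23_055 | 15 | 0.407 | 0.891 | −0.043 | 42 | 27.7 |
| 23_036 | 20 | 0.374 | 0.952 | +0.012 | 52 | 27.8 |
| 23_986 | 22 | 0.365 | 0.971 | −0.038 | 51 | 19.6 |
| 23_991 | 18 | 0.350 | 0.849 | −0.081 | 43 | 23.9 |
| 23_045 | 27 | 0.315 | 0.822 | −0.025 | 67 | 32.4 |
| 0_045 | 23 | 0.268 | 0.612 | −0.043 | 49 | 25.2 |
| 23_032 | 27 | 0.212 | 0.590 | +0.018 | 70 | 22.5 |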
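The per-tote slope and linear R² reported above come from an ordinary least-squares regression of TA on day. A minimal standard-library sketch of that calculation follows; the function name and the sample series are hypothetical and are not taken from the workbook.

```python
def linear_fit(days, acidity):
    """Ordinary least squares of acidity on day: returns (slope, intercept, r2)."""
    n = len(days)
    mean_t = sum(days) / n
    mean_a = sum(acidity) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(days, acidity))
    slope = sxy / sxx
    intercept = mean_a - slope * mean_t
    ss_res = sum((a - (intercept + slope * t)) ** 2 for t, a in zip(days, acidity))
    ss_tot = sum((a - mean_a) ** 2 for a in acidity)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical tote: slow start, steady rise, slight tapering.
days = [0, 5, 10, 15, 20, 25, 30]
acidity = [1.0, 3.5, 8.0, 14.0, 19.5, 23.0, 25.0]
slope, intercept, r2 = linear_fit(days, acidity)
print(f"rate = {slope:.3f} g/L per day, R2 = {r2:.3f}")
```

Running this once per tote and averaging the slopes is how a pooled figure such as the 0.57 g/L·day mean would be obtained.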

5.2 · Ethanol is held in a band, not depleted

Mean ethanol depletion is only −0.058 %/day — an order of magnitude too slow to explain the acid produced. Ethanol fluctuates up and down (feeding events), confirming fed-batch control.

Ethanol % over time: banded, not monotonic

5.3 · Correlation structure

Pearson correlations on the pooled complete-case data:

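| | etoh | temp | vol | day | brix | acidity | pH |
|---|---|---|---|---|---|---|---|
| etoh | 1.00 | −0.14 | 0.33 | −0.41 | 0.18 | −0.58 | 0.08 |
| temp | −0.14 | 1.00 | −0.15 | −0.16 | −0.14 | 0.04 | 0.27 |
| vol | 0.33 | −0.15 | 1.00 | 0.29 | 0.20 | −0.07 | −0.20 |
| day | −0.41 | −0.16 | 0.29 | 1.00 | 0.49 | 0.68 | −0.62 |
| brix | 0.18 | −0.14 | 0.20 | 0.49 | 1.00 | 0.45 | −0.41 |
| acidity | −0.58 | 0.04 | −0.07 | 0.68 | 0.45 | 1.00 | −0.54 |
| pH | 0.08 | 0.27 | −0.20 | −0.62 | −0.41 | −0.54 | 1.00 |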

Key reads: day↔acidity = +0.68 (time is the dominant driver); etoh↔acidity = −0.58 (acid accumulates as the ethanol band is consumed and re-fed); pH↔acidity = −0.54 and day↔pH = −0.62 (acid drives pH down, partially buffered).

5.4 · Aeration stratification — MED wins

Mean acidity by aeration class

| Aeration | Mean acidity (g/L) | Mean pH | n rows |
|---|---|---|---|
| OFF | 14.65 | 4.07 | 82 |
| LOW⁻ | 15.76 | 4.10 | 122 |
| LOW | 14.90 | 3.96 | 53 |
| MED | 20.28 | 3.81 | 51 |

MED aeration lifts mean acidity ~35% above the other classes and pushes pH lowest — consistent with oxygen-transfer-limited kinetics (§2). The effect appears late in runs (MED is typically engaged in the high-acid finishing phase), so it is partly confounded with time.

5.5 · Temperature effect is positive but weak

Across totes, the within-tote acidity rate rises +0.064 g/L·day per °C of mean temperature (r = 0.39, p = 0.13). Directionally Arrhenius-like but not statistically resolved at the tote level — most totes are clustered in the 28–31 °C mesophilic optimum, limiting the temperature range over which to estimate the effect.

Acidity production rate vs mean temperature (one point per tote)

06 Governing Kinetic Equations

The model is built bottom-up from microbial growth and substrate kinetics, then reduced to the regime the data actually occupy.

6.1 · Microbial-kinetic core

Let \(X\) be active biomass, \(S\) the carbon substrate (sugars + ethanol), \(O\) dissolved oxygen, and \(A\) the titratable acidity. Microbial growth follows a double-Monod law (dual limitation by carbon and oxygen), with product (acid) inhibition:

$$\mu(S,O,A)=\mu_{\max}\,\underbrace{\frac{S}{K_S+S}}_{\text{carbon}}\;\underbrace{\frac{O}{K_O+O}}_{\text{oxygen}}\;\underbrace{\left(1-\frac{A}{A_{\max}}\right)}_{\text{acid inhibition}}$$

Acid (TA) is produced coupled to growth and maintenance (Luedeking–Piret), while oxygen is supplied by aeration-driven transfer (\(k_La\)) and consumed by the culture (OUR):

$$\frac{dX}{dt}=\mu X - k_d X,\qquad \frac{dA}{dt}=\alpha\,\frac{dX}{dt}+\beta X$$
$$\frac{dS}{dt}=\underbrace{F(t)}_{\text{feeding}}-\frac{1}{Y_{A/S}}\frac{dA}{dt},\qquad \frac{dO}{dt}=\underbrace{k_La\,(O^*-O)}_{\text{aeration}}-\frac{1}{Y_{A/O}}\frac{dA}{dt}$$
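To make the four coupled balances concrete, here is a minimal forward-Euler integration of the system above. Every parameter value and initial condition is a hypothetical placeholder chosen only so the sketch produces a plausible S-shaped acid curve; none is fitted to CDI data, and a production twin would use a proper stiff ODE solver rather than fixed-step Euler.

```python
def simulate(days=45.0, dt=0.001, kla=50.0, feed=1.0):
    """Integrate X, S, O, A with explicit Euler; returns one record per day."""
    # Hypothetical parameters (per-day time units), for illustration only.
    mu_max, k_s, k_o, a_max = 0.3, 1.0, 0.5, 40.0
    k_d, alpha, beta = 0.01, 10.0, 0.1
    y_as, y_ao, o_star = 0.8, 0.01, 8.0
    # Hypothetical initial state: biomass, substrate, dissolved O2, acidity.
    x, s, o, a = 0.05, 40.0, o_star, 0.5

    history = [(0.0, x, s, o, a)]
    steps = int(round(days / dt))
    per_day = int(round(1.0 / dt))
    for i in range(1, steps + 1):
        # Double-Monod growth with linear acid inhibition.
        mu = mu_max * s / (k_s + s) * o / (k_o + o) * (1.0 - a / a_max)
        dx = mu * x - k_d * x
        da = alpha * dx + beta * x                  # Luedeking-Piret
        ds = feed - da / y_as                       # feeding minus consumption
        d_o = kla * (o_star - o) - da / y_ao        # transfer minus uptake
        # Euler update, clamped so concentrations stay non-negative.
        x = max(x + dx * dt, 0.0)
        s = max(s + ds * dt, 0.0)
        o = max(o + d_o * dt, 0.0)
        a = max(a + da * dt, 0.0)
        if i % per_day == 0:
            history.append((i * dt, x, s, o, a))
    return history


trajectory = simulate()
for t, x, s, o, a in trajectory[::5]:
    print(f"day {t:4.0f}: X={x:5.2f}  S={s:6.2f}  O2={o:5.2f}  TA={a:6.2f}")
```

Re-running with a smaller `kla` argument is a quick way to see how the oxygen-transfer ceiling slows the acid curve in this toy setting.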

6.2 · Reduction to the observed regime

Three empirical facts (§5) collapse this system to a tractable form:

| Observation | Consequence | Simplification |
|---|---|---|
| Carbon kept in surplus by feeding \(F(t)\) | \(S\gg K_S\), so carbon term ≈ 1 | Drop substrate limitation |
| Aeration sets a rate ceiling | \(O/(K_O+O)\) becomes a fixed factor \(\phi_{\text{aer}}\) | Oxygen → aeration multiplier |
| Acid climbs then plateaus (logistic shape) | Acid-inhibition term dominates the curvature | Keep \((1-A/A_{\max})\) |

With biomass quasi-proportional to acid-producing capacity, the TA balance reduces to a logistic (Verhulst) law — the canonical model for a batch filling toward its acid ceiling, and the mechanistic backbone of the hybrid digital twin's acidification-kinetics term:

$$\boxed{\;\dfrac{dA}{dt}= k\,\phi_{\text{aer}}\;A\left(1-\dfrac{A}{A_{\max}}\right)\;}\qquad\Longrightarrow\qquad A(t)=\dfrac{A_{\max}}{1+e^{-k\,(t-t_0)}}$$

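where \(A_{\max}\) is the TA carrying capacity (g/L), \(k\) the intrinsic acidification rate (day⁻¹), \(\phi_{\text{aer}}\) the aeration factor, and \(t_0\) the inflection day. The closed form is written for constant aeration, with \(\phi_{\text{aer}}\) absorbed into the fitted \(k\). In the early phase \(A\ll A_{\max}\) the law reduces to exponential growth, \(dA/dt\approx k\,\phi_{\text{aer}}A\); around the inflection (\(A\approx A_{\max}/2\)) the slope sits near its maximum \(k\,\phi_{\text{aer}}A_{\max}/4\) and changes slowly. A run sampled mostly through that mid-phase looks near-linear, which is consistent with the strong within-batch linear fits of §5.1.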
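The logistic law and its closed form can be checked numerically. The sketch below evaluates both for the parameters reported for tote 23_053 in §8.1 (\(A_{\max}=39.6\), \(k=0.126\), \(t_0=14.0\), with \(\phi_{\text{aer}}\) treated as absorbed into \(k\)) and confirms by finite differences that the closed form satisfies the rate equation. The function names are illustrative.

```python
import math


def logistic_ta(t, a_max, k, t0):
    """Closed-form logistic: A(t) = A_max / (1 + exp(-k (t - t0)))."""
    return a_max / (1.0 + math.exp(-k * (t - t0)))


def logistic_rate(t, a_max, k, t0):
    """Right-hand side of the ODE: dA/dt = k A (1 - A / A_max)."""
    a = logistic_ta(t, a_max, k, t0)
    return k * a * (1.0 - a / a_max)


# Fitted values reported for tote 23_053 in section 8.1.
A_MAX, K, T0 = 39.6, 0.126, 14.0

for day in (0, 7, 14, 21, 28):
    ta = logistic_ta(day, A_MAX, K, T0)
    rate = logistic_rate(day, A_MAX, K, T0)
    print(f"day {day:2d}: TA = {ta:5.1f} g/L, dA/dt = {rate:.3f} g/L per day")

# Central finite difference of the closed form at day 10 should match the ODE.
h = 1e-5
numeric_rate = (logistic_ta(10 + h, A_MAX, K, T0) - logistic_ta(10 - h, A_MAX, K, T0)) / (2 * h)
peak_rate = K * A_MAX / 4.0   # slope at the inflection day t0
```

For this tote the slope peaks at \(kA_{\max}/4 \approx 1.25\) g/L per day at the inflection, slightly above the whole-run linear rate of 1.092 g/L·day listed in §5.1, as expected for an average taken across the slower early and late phases.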

flowchart TD
  S["Full mechanistic model<br/>X, S, O, A: 4 coupled ODEs"]:::a
  S -->|"S ≫ K_S (fed-batch)"| R1["Drop substrate limitation"]:::b
  S -->|"O ⇒ aeration factor φ"| R2["Oxygen → φ_aer multiplier"]:::b
  S -->|"acid plateau observed"| R3["Keep (1 − A/Aₘₐₓ)"]:::b
  R1 --> L["Logistic acid law<br/>dA/dt = k·φ·A(1 − A/Aₘₐₓ)"]:::c
  R2 --> L
  R3 --> L
  L --> Sol["Closed form<br/>A(t) = Aₘₐₓ / (1 + e^(−k(t−t₀)))"]:::d
  classDef a fill:#e8f0f7,stroke:#2f5d8a,color:#1d3b58;
  classDef b fill:#fbf8f1,stroke:#b4862f,color:#5a4413;
  classDef c fill:#eef5ef,stroke:#3f7d5a,color:#21402f,font-weight:bold;
  classDef d fill:#fbf1e9,stroke:#b0472f,color:#5a2417,font-weight:bold;

07 Soft Sensor & Model Derivation

These are the TA soft-sensor prototypes for the project's R²≥0.9 target (Milestone 5), trained on the legacy data. Two complementary predictors are built: a mechanistic per-batch logistic (best for trajectory shape, and the digital twin's acidification core) and a pooled multivariate regression (a transparent input→TA estimator). A gradient-boosted ensemble (§7.3) serves as a nonparametric benchmark. All three target output_total_acidity_g_per_l (TA); pH is a downstream cross-check (§9). On live, instrumented pilot data these become the neural-network soft sensors with the same target.

7.1 · Mechanistic estimator — nonlinear least squares

For each tote, fit \(\theta=(A_{\max},k,t_0)\) by minimising squared residuals on the closed-form logistic, with \(A_{\max}\) bounded to keep fits physical when a tote is still in its rising phase:

$$\hat\theta=\arg\min_{\theta}\sum_{i}\Big(A_i-\tfrac{A_{\max}}{1+e^{-k(t_i-t_0)}}\Big)^2,\quad A_{\max}\in[0.95\,A_{\text{obs}},\,2.5\,A_{\text{obs}}]$$
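A bounded logistic fit of this kind can be sketched without any numerical library. The version below is a swapped-in technique, not necessarily the solver used for the reported fits: it grid-searches \(k\) and \(t_0\) and, for each pair, solves for \(A_{\max}\) in closed form (the model is linear in \(A_{\max}\)), then clips it to the stated bounds. The grid ranges and the synthetic data are assumptions.

```python
import math


def fit_logistic(days, acidity, k_grid=None, t0_grid=None):
    """Least-squares logistic fit with A_max in [0.95, 2.5] x max observed TA."""
    a_obs = max(acidity)
    lower, upper = 0.95 * a_obs, 2.5 * a_obs
    # Assumed search ranges: k in 0.01..0.40 per day, t0 in 0..100 days.
    k_grid = k_grid or [0.01 * i for i in range(1, 41)]
    t0_grid = t0_grid or [0.5 * i for i in range(0, 201)]

    best = None
    for k in k_grid:
        for t0 in t0_grid:
            shape = [1.0 / (1.0 + math.exp(-k * (t - t0))) for t in days]
            # For fixed (k, t0) the SSE is quadratic in A_max, so the bounded
            # optimum is the unconstrained optimum clipped to the interval.
            a_max = sum(a * s for a, s in zip(acidity, shape)) / sum(s * s for s in shape)
            a_max = min(max(a_max, lower), upper)
            sse = sum((a - a_max * s) ** 2 for a, s in zip(acidity, shape))
            if best is None or sse < best[0]:
                best = (sse, a_max, k, t0)

    sse, a_max, k, t0 = best
    mean_a = sum(acidity) / len(acidity)
    ss_tot = sum((a - mean_a) ** 2 for a in acidity)
    return {"a_max": a_max, "k": k, "t0": t0, "r2": 1.0 - sse / ss_tot}


# Synthetic, noise-free series generated from known parameters (not CDI data).
days = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
acidity = [40.0 / (1.0 + math.exp(-0.12 * (t - 14.0))) for t in days]
fit = fit_logistic(days, acidity)
print(fit)
```

When the clipped \(A_{\max}\) lands exactly on the upper bound, the data have not yet revealed the plateau and the fitted capacity should be read as a constraint rather than an estimate.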

7.2 · Statistical estimator — multivariate linear model

A transparent, deployable input→acidity map using the operator controls plus elapsed day:

$$\widehat{A}=\beta_0+\beta_1\,\text{day}+\beta_2\,\text{etoh}+\beta_3\,\text{temp}+\beta_4\,\text{aer}_{\text{ord}}+\beta_5\,\text{vol}$$

Fitted coefficients (ordinary least squares, all 297 complete points):

| Term | Coefficient | Interpretation |
|---|---|---|
| intercept \(\beta_0\) | 20.05 | baseline offset (g/L) |
| day \(\beta_1\) | +0.276 | +0.28 g/L per day, the dominant driver |
| etoh \(\beta_2\) | −2.157 | high residual ethanol ⇒ acid not yet formed |
| temp \(\beta_3\) | +0.183 | warmer ⇒ faster (Arrhenius-like) |
| aer_ord \(\beta_4\) | +0.587 | each aeration step adds ~0.6 g/L |
| vol \(\beta_5\) | −0.016 | dilution / larger headspace, minor |
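Applying the fitted coefficients is a single dot product. The sketch below hard-codes the rounded values from the table; the input conditions in the example are hypothetical.

```python
# Rounded OLS coefficients from the table above.
COEFFICIENTS = {
    "intercept": 20.05,
    "day": 0.276,
    "etoh": -2.157,
    "temp": 0.183,
    "aer_ord": 0.587,
    "vol": -0.016,
}


def predict_ta_linear(day, etoh, temp, aer_ord, vol):
    """Pooled linear estimate of titratable acidity (g/L)."""
    c = COEFFICIENTS
    return (c["intercept"] + c["day"] * day + c["etoh"] * etoh
            + c["temp"] * temp + c["aer_ord"] * aer_ord + c["vol"] * vol)


# Hypothetical conditions: day 20, 2.0% ethanol, 30 °C, LOW aeration (ordinal 2), 800 L.
estimate = predict_ta_linear(day=20, etoh=2.0, temp=30.0, aer_ord=2, vol=800.0)
print(f"predicted TA = {estimate:.2f} g/L")
```

Because the volume coefficient is quoted to three decimals and multiplies values near 800 L, rounding alone can shift the prediction by up to about 0.4 g/L; a deployed version would carry the unrounded coefficients.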

7.3 · Nonparametric estimator — gradient boosting

A gradient-boosted tree ensemble (200 trees, depth 3, η = 0.05) on the same features provides a flexible benchmark and a feature-importance read:

| Feature | Importance |
|---|---|
| day | 0.614 |
| etoh | 0.149 |
| vol | 0.119 |
| temp | 0.064 |
| aer_ord | 0.054 |

The linear and gradient-boosted estimators agree that elapsed day carries the most predictive signal, with ethanol the strongest of the control variables. This is consistent with the time-driven logistic accumulation derived in §6.2.

08 Model Fits & Validation

8.1 · Mechanistic logistic — per-tote fits

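Mean fit quality across all 16 totes: R² = 0.904. The % saturation column (final acidity ÷ fitted \(A_{\max}\)) shows totes finishing at 40–105% of fitted capacity; several are harvested while still climbing. The five totes at 40% (23_978, 23_995, 23_036, 23_982, 0_045) have a fitted \(A_{\max}\) equal to 2.5× their final acidity, which is the upper bound of §7.1, so their capacity is set by the constraint rather than identified by the data.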

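| Tote | Amax (g/L) | k (day⁻¹) | t₀ (day) | R² | % of Amax reached |
|---|---|---|---|---|---|
| 23_013 | 20.4 | 0.237 | 6.7 | 0.992 | 103 |
| 23_053 | 39.6 | 0.126 | 14.0 | 0.991 | 85 |
| 23_986 | 28.2 | 0.065 | 38.0 | 0.982 | 69 |
| 23_978 | 102.0 | 0.070 | 50.9 | 0.977 | 40 |
| 23_998 | 39.9 | 0.070 | 38.5 | 0.971 | 71 |
| 23_995 | 82.2 | 0.078 | 30.4 | 0.966 | 40 |
| 23_036 | 69.5 | 0.032 | 69.3 | 0.959 | 40 |
| 23_980 | 32.2 | 0.117 | 15.2 | 0.956 | 86 |
| 23_982 | 78.2 | 0.059 | 41.9 | 0.955 | 40 |
| 23_037 | 36.0 | 0.104 | 16.7 | 0.914 | 82 |
| 23_014 | 36.3 | 0.056 | 33.4 | 0.898 | 61 |
| 23_055 | 52.8 | 0.035 | 41.9 | 0.884 | 52 |
| 23_991 | 28.5 | 0.053 | 14.1 | 0.846 | 84 |
| 23_045 | 30.8 | 0.048 | 20.3 | 0.815 | 105 |
| 23_032 | 21.8 | 0.145 | 14.9 | 0.755 | 103 |
| 0_045 | 63.0 | 0.028 | 79.7 | 0.611 | 40 |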

Worked example — tote 23_053: data vs fitted logistic

Fit: \(A_{\max}=39.6\) g/L, \(k=0.126\) day⁻¹, \(t_0=14.0\) day, R² = 0.991. The S-curve captures lag, exponential rise, and onset of plateau.

8.2 · Pooled regression — honest cross-tote validation

The acid-prediction models are validated with leave-one-tote-out (LOTO) cross-validation: each tote is predicted by a model trained on the other fifteen. This is the realistic test of generalising to a new vessel.

| Model | In-sample R² | LOTO R² | LOTO RMSE (g/L) | LOTO MAE (g/L) |
|---|---|---|---|---|
| Per-tote logistic (mechanistic) | 0.904 | | | |
| Multivariate linear | 0.595 | 0.450 | 5.53 | 4.19 |
| Gradient-boosted trees | | 0.349 | 6.02 | 4.75 |
| Zero-order time model \(A=8.22+0.32\,\text{day}\) | | 0.455 | 5.50 | 4.32 |

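The logistic is fit per-tote and characterises a known vessel's trajectory; it is not a blind cross-tote predictor. For predicting a brand-new tote from inputs, the multivariate linear model and the time-only baseline are effectively tied (LOTO R² 0.450 vs 0.455, RMSE 5.53 vs 5.50 g/L, MAE 4.19 vs 4.32 g/L), and both beat the boosted trees. The logged control inputs therefore add little cross-tote skill beyond elapsed day.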
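The leave-one-tote-out procedure can be sketched with the time-only model as the learner. The batch data below are hypothetical, and pooling all held-out predictions before computing R², RMSE, and MAE is an assumption; the source does not say whether its scores are pooled or averaged per tote.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_tote_out(batches):
    """Predict each tote from a model trained on all the other totes."""
    actual, predicted = [], []
    for held_out in batches:
        train = [pt for tote, pts in batches.items() if tote != held_out for pt in pts]
        b0, b1 = fit_line([d for d, _ in train], [a for _, a in train])
        for day, acidity in batches[held_out]:
            actual.append(acidity)
            predicted.append(b0 + b1 * day)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_a = sum(actual) / len(actual)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": sum(abs(e) for e in errors) / len(errors),
    }


# Hypothetical totes with different acidification rates: (day, TA in g/L).
batches = {
    "tote_a": [(0, 2.0), (10, 9.5), (20, 18.0), (30, 24.0)],
    "tote_b": [(0, 1.0), (10, 5.0), (20, 10.5), (30, 15.0)],
    "tote_c": [(0, 3.0), (10, 12.0), (20, 21.0), (30, 31.0)],
}
scores = leave_one_tote_out(batches)
print(scores)
```

Splitting by tote rather than by row matters because rows within a tote are strongly autocorrelated; a random row split would leak each tote's trajectory into its own training set.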

Reading these numbers honestly
Two prediction problems live in this data and they have very different difficulty. (A) Trajectory tracking within a known vessel is easy — the logistic nails it (R² ≈ 0.90+). (B) Cold-start prediction of a never-seen tote from its inputs is genuinely hard (LOTO R² ≈ 0.45), because each tote's feeding history is its own latent variable that the four logged inputs only partly capture. Closing that gap is the chief opportunity in §12.

09 The pH Sub-Model

pH is the dissociation read-out of the organic acids produced. For weak organic acids (acetic/lactic/tartaric, pKₐ ≈ 3–4.8) the Henderson–Hasselbalch relation predicts a logarithmic dependence on acid concentration:

$$\text{pH}=\text{p}K_a+\log_{10}\!\frac{[\text{A}^-]}{[\text{HA}]}\;\approx\;a+b\,\log_{10}(A)$$

Fitting both the log form and a linear form to the data:

| pH model | Equation | R² | RMSE | MAE |
|---|---|---|---|---|
| Log (Henderson–Hasselbalch) | pH = 4.523 − 0.443·log₁₀(A) | 0.198 | 0.249 | 0.201 |
| Linear | pH = 4.340 − 0.0201·A | 0.291 | 0.234 | 0.192 |
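The two fitted pH relations are simple enough to evaluate side by side. This sketch uses the coefficients from the table; the function names are illustrative.

```python
import math


def ph_log_model(acidity_g_per_l):
    """Henderson-Hasselbalch-style fit: pH = 4.523 - 0.443 * log10(A)."""
    if acidity_g_per_l <= 0:
        raise ValueError("acidity must be positive for the log model")
    return 4.523 - 0.443 * math.log10(acidity_g_per_l)


def ph_linear_model(acidity_g_per_l):
    """Linear fit: pH = 4.340 - 0.0201 * A."""
    return 4.340 - 0.0201 * acidity_g_per_l


for ta in (5.0, 10.0, 20.0, 30.0, 40.0):
    print(f"TA {ta:4.1f} g/L: log model pH {ph_log_model(ta):.2f}, "
          f"linear model pH {ph_linear_model(ta):.2f}")
```

Across the 10 to 30 g/L range the two forms differ by less than the roughly 0.23 to 0.25 pH-unit RMSE of either fit, so the choice between them has little practical effect at this noise level.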

pH vs total acidity, measured points

Why pH is only weakly predictable here
The broth is a buffered system — organic-acid/conjugate-base buffering plus pomace solids flatten the pH response, so TA can climb from 10 to 30 g/L while pH barely moves (≈3.6–4.1). pH is therefore a poor stand-alone progress indicator; TA is the reliable KPI — which is precisely why the project builds a soft sensor to estimate TA rather than relying on the cheap pH probe alone. The negative slope is real and directionally correct, but ±0.23 pH unit scatter limits pH to a coarse cross-check.

10 Inference Pipeline & Pseudocode

End-to-end, the deployable predictor chains cleaning → feature build → dual estimator → pH sub-model → horizon forecast.

flowchart LR
  A["Raw tote sheet"]:::s --> B["Algorithm 1<br/>normalize"]:::p
  B --> C["Feature vector<br/>day·etoh·temp·aer_ord·vol"]:::p
  C --> D{"Known<br/>vessel?"}:::d
  D -->|yes| E["Per-tote logistic<br/>A(t) = Aₘₐₓ / (1 + e^(−k(t−t₀)))"]:::m
  D -->|no| F["Pooled linear<br/>Â = βᵀx"]:::m
  E --> G["Acidity forecast"]:::o
  F --> G
  G --> H["pH sub-model<br/>pH = 4.34 − 0.0201·A"]:::o
  G --> I["Harvest-day estimate<br/>solve A(t*) = A_target"]:::o
  classDef s fill:#e8f0f7,stroke:#2f5d8a,color:#1d3b58;
  classDef p fill:#fbf8f1,stroke:#b4862f,color:#5a4413;
  classDef d fill:#f4ecd6,stroke:#b4862f,color:#5a4413,font-weight:bold;
  classDef m fill:#eef5ef,stroke:#3f7d5a,color:#21402f,font-weight:bold;
  classDef o fill:#fbf1e9,stroke:#b0472f,color:#5a2417;
ALGORITHM 2: predict_and_forecast(tote_history, controls, A_target)
# Returns acidity trajectory, pH, and estimated harvest day
df ← normalize(tote_history)                      # Algorithm 1
x ← [day, etoh, temp, aer_ord, vol]               # current feature vector
if len(df) ≥ 6:                                   # enough history → mechanistic
    Amax, k, t0 ← fit_logistic(df.day, df.acidity, bounds=[0.95·max, 2.5·max])
    A_hat(t) ← Amax / (1 + exp(−k·(t − t0)))
else:                                             # cold start → pooled regression
    A_hat ← β0 + β·x                              # β from §7.2
pH_hat ← 4.340 − 0.0201·A_hat                     # §9 sub-model
# invert logistic for the day acidity crosses the target
if mechanistic:
    # defined only when Amax > A_target
    t_star ← t0 + (1/k)·ln( A_target / (Amax − A_target) )
else:
    t_star ← (A_target − β0 − β₋day·x₋day) / β_day
return {trajectory: A_hat, pH: pH_hat, harvest_day: t_star}

The endpoint-day inversion is the model's most useful output: given the TA target \(A^\star\) (the HACCP completion spec, TA ≥ 3% ≈ 30 g/L), solve the logistic for the day the batch reaches it — turning a 45-day wait into a forecast:

$$t^\star=t_0+\frac{1}{k}\,\ln\!\frac{A^\star}{A_{\max}-A^\star}$$
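The inversion is only defined when the target lies below the fitted ceiling, so a practical version needs a guard. A minimal sketch follows, using fitted parameters from §8.1 and the ≈30 g/L target quoted above; returning `None` for an unreachable target is a design assumption, not something the source specifies.

```python
import math


def harvest_day(a_target, a_max, k, t0):
    """Day at which the fitted logistic reaches a_target, or None if it never does."""
    if a_target <= 0 or k <= 0:
        raise ValueError("a_target and k must be positive")
    if a_target >= a_max:
        # The fitted plateau sits at or below the target: no finite crossing day.
        return None
    return t0 + math.log(a_target / (a_max - a_target)) / k


TARGET = 30.0  # g/L, the approximate completion spec quoted in the text

# (A_max, k, t0) as reported in section 8.1.
fits = {
    "23_053": (39.6, 0.126, 14.0),
    "23_980": (32.2, 0.117, 15.2),
    "23_013": (20.4, 0.237, 6.7),
}
for tote, (a_max, k, t0) in fits.items():
    day = harvest_day(TARGET, a_max, k, t0)
    label = "target above fitted ceiling" if day is None else f"day {day:.1f}"
    print(f"{tote}: {label}")
```

For tote 23_053 this gives a crossing at roughly day 23, while tote 23_013, whose fitted ceiling is 20.4 g/L, has no crossing day under its own fit; a dashboard would need to surface that case explicitly rather than report a number.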

11 Process Control Levers

Translating the fitted model into operating guidance — what each input does and how strongly the data support it.

| Lever | Effect on acidification | Evidence strength | Operating note |
|---|---|---|---|
| Aeration ↑ | Raises rate ceiling; MED ≈ +35% mean TA | Strong (β₄>0, stratified means) | O₂-transfer limited: the MPC's first lever; step to MED for the finishing phase |
| Carbon / substrate feed | Sustains carbon so TA keeps climbing | Strong (fed-system signature) | Keep substrate in surplus; don't let the culture starve |
| Temperature | +0.064 g/L·day per °C, optimum ~28–31 °C | Moderate (r=0.39, p=0.13) | Stay mesophilic; >35 °C risks culture stress |
| Volume | Mild dilution / headspace effect | Weak (β₅ small) | Secondary; affects O₂ surface ratio |
| Time | Dominant: logistic accumulation | Very strong (r=0.68) | Use \(t^\star\) inversion to forecast the endpoint day |

12 Limitations & Next Steps

Limitations

  • Feeding history is unlogged. Ethanol/sugar additions are the hidden driver behind cold-start error; only the post-addition concentration is recorded, not the dose.
  • Aeration is confounded with time — MED is engaged late, so its standalone effect is partly absorbed by the day term.
  • Temperature range is narrow (mostly 28–31 °C), so the Arrhenius coefficient is under-identified.
  • pH is buffered and only weakly predictable (R² ≈ 0.29); it should not be used as a primary endpoint.
  • Brix is noisy — it tracks dissolved solids/feeding rather than a clean reaction coordinate, so it was not modeled as a target.

Next steps

  • Log feed events (volume + ethanol dose + timestamp) to convert cold-start prediction from R²≈0.45 toward the within-tote R²≈0.90.
  • Fit the full ODE (§6.1) with a dissolved-O₂ probe to identify \(k_La\) per aeration class explicitly.
  • Hierarchical / mixed-effects model: shared population kinetics + per-tote random effects on \(A_{\max},k,t_0\) for principled cold-start priors.
  • Online re-fit: update logistic parameters as each new daily measurement arrives (recursive least squares) for live harvest-day estimates.
  • Incorporate the LPM bioreactor/drum tabs to calibrate the aeration→\(\phi_{\text{aer}}\) map against true airflow.

13 How This Maps to Ai26.10

This analysis is the historical-data leg of the project. Each result below feeds a specific proposal component and, in several cases, independently corroborates a proposal claim using CDI's own numbers.

| This document | Ai26.10 component | What it establishes |
|---|---|---|
| Logistic TA law \(A(t)=A_{\max}/(1+e^{-k(t-t_0)})\), mean fit R²=0.90 | Hybrid digital twin: acidification-kinetics mechanistic core | The mechanistic backbone the NN residual-learning layer corrects against |
| Linear / GBM TA estimators from inputs | NN soft sensor (Milestone 5, target R²≥0.9) | Within-batch logistic fit already meets R²≥0.9 on legacy low-frequency data, a strong feasibility signal; cross-tote prediction from inputs sits at LOTO R²≈0.45 |
| Oxygen-transfer-limited finding; MED → +35% TA | MPC optimising aeration/mixing/temperature | Supports aeration as the highest-value control lever, grounding the \(k_La\)/OUR twin terms |
| Endpoint-day inversion \(t^\star=t_0+\tfrac1k\ln\frac{A^\star}{A_{\max}-A^\star}\) | ETA-to-target dashboard; cycle-time KPI | The mechanism behind the 45 → 10–15 day claim, expressed per-batch |
| Per-batch rate spread (k ≈ 0.03 → 0.24/day) | RSM optimal-window targeting | Quantifies the gap between slow and fast batches that closed-loop control closes |
| Cold-start LOTO R²≈0.45 vs within-batch ≈0.90; feed events unlogged | Pilot Phase 2 / Milestone 3 rationale | Independently justifies why high-frequency instrumented pilot trials are necessary |
| pH weakly predictive (R²≈0.29), buffered | Sensor-fusion soft sensor design | Shows why TA can't be read off a cheap pH probe, so the soft sensor earns its keep |
| Autoencoder-ready residual structure (off-trend points) | Anomaly detection (Milestone 5) | Trend model provides the baseline against which drift/contamination is flagged |
Bottom line for reviewers
The historical dataset does more than establish a baseline — it de-risks the central technical bet. A simple kinetic model already clears the project's R²≥0.9 soft-sensor bar within a batch, the dominant control lever (aeration) is identified and physically explained, and the one thing the legacy data can't do (cold-start a new batch) is precisely what the funded pilot instrumentation is designed to fix. The model's structure maps one-to-one onto the digital twin, soft sensor, MPC, and anomaly-detection deliverables.