How to Design a DOE for Bioprocess Optimization (Step-by-Step)

April 2026 18 min read Bioprocess Engineering


Contents

  1. What Is Design of Experiments (DOE) in Bioprocessing?
  2. DOE vs. OFAT: Why Traditional Optimization Fails
  3. Stage 1: Screening Designs — Identify Critical Factors
  4. Stage 2: Optimization Designs — Build the Response Surface
  5. Definitive Screening Designs: The Modern Shortcut
  6. Worked Example: DOE for E. coli Fed-Batch Optimization
  7. Stage 3: Model Validation and Confirmation Runs
  8. DOE Software and Tools for Bioprocess Engineers
  9. Frequently Asked Questions

What Is Design of Experiments (DOE) in Bioprocessing?

Design of experiments (DOE) is a systematic statistical approach to planning experiments that varies multiple process parameters simultaneously to identify which factors, and which factor interactions, most significantly affect a bioprocess response such as titer, yield, or product quality. Unlike traditional one-factor-at-a-time optimization, DOE covers the design space far more completely, usually for a similar or smaller number of experiments.

In bioprocess development, DOE is used across upstream and downstream operations: optimizing media composition, fermentation conditions (temperature, pH, dissolved oxygen, agitation), induction parameters, and purification steps. The ICH Q8(R2) guideline explicitly recommends DOE as part of Quality by Design (QbD) for establishing design spaces in biopharmaceutical manufacturing.

A typical DOE workflow for bioprocess optimization follows three stages: screening to identify critical process parameters (CPPs) from a large initial set, optimization using response surface methodology (RSM) to find the optimal operating conditions, and validation through confirmation runs to verify the model predictions hold.

[Figure 1: Stage 1 Screening (6–15 factors; Plackett-Burman, fractional factorial, or DSD; 12–20 runs; 3–5 significant factors) → Stage 2 Optimization (2–5 key factors; CCD with 5 levels, Box-Behnken with 3 levels, or I-optimal; 15–32 runs; RSM model) → Stage 3 Validation (3–5 confirmation runs; prediction interval; model diagnostics) → Robust design space (edge-of-failure testing). DSD shortcut: 2k + 1 runs for k factors. Common factors: upstream (temperature 30–37 °C, pH 6.5–7.5, DO 20–60%, agitation 100–400 rpm, seeding density); media (glucose 5–40 g/L, glutamine 2–8 mM, amino acid concentrations, feed timing/rate, IPTG/inducer concentration); downstream (flow rate 50–300 cm/h, buffer pH, conductivity, resin loading, column bed height). Responses: titer (g/L), yield (%), viability (%), purity (%), CQA profile.]
Figure 1. DOE workflow for bioprocess optimization showing the three-stage progression from screening through optimization to validation, with common bioprocess factors at each stage. The DSD shortcut (dashed purple line) allows combining screening and optimization in a single step.

DOE vs. OFAT: Why Traditional Optimization Fails

One-factor-at-a-time (OFAT) optimization—where you fix all variables except one and vary it systematically—misses factor interactions and typically finds a local optimum rather than the global optimum. DOE overcomes both limitations by varying multiple factors simultaneously according to a mathematical design matrix.

The practical difference is significant. Published bioprocess case studies frequently report 1.3–2× higher yields with DOE than with OFAT. In a 4-factor optimization, OFAT explores only the factor axes through its starting point, while DOE also samples the corners of the design space, where interactions show up, and center points, which reveal curvature.

Table 1. DOE vs. OFAT comparison for bioprocess optimization
Criterion | OFAT | DOE
Factor interactions | Not detected | Estimated (fully in full factorial and RSM designs; partially aliased in screening designs)
Curvature (quadratic effects) | Not detected | Estimated in RSM designs
Experiments for 4 factors | 20–25 (sequential) | 27–30 (CCD with replicates)
Design space coverage | < 10% of space | > 80% of space
Optimum quality | Local (axis-bound) | Best point within the studied region (multivariate)
Statistical model | None | Polynomial regression (R², ANOVA)
Reproducibility evidence | Anecdotal | Prediction intervals, confirmation runs
Regulatory acceptance (QbD) | Proven acceptable ranges only; per ICH Q8(R2), a combination of PARs does not constitute a design space | Supports design space definition (ICH Q8(R2))
Table 1. Comparison of OFAT and DOE approaches for bioprocess optimization. DOE provides statistically rigorous factor interaction estimation and is recommended by ICH Q8(R2) for Quality by Design.

Consider a hypothetical temperature × pH interaction in CHO cell culture: at pH 7.0, lowering temperature from 37 °C to 33 °C increases titer by 40%, but at pH 6.8 the same temperature shift increases titer by only 10%. An OFAT study run at a single pH would report a temperature effect that does not hold at the other pH. DOE estimates this interaction explicitly and maps the true response surface.
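
To make the interaction concrete, the effects in a 2 × 2 factorial can be computed directly. The sketch below uses hypothetical titers chosen only to reproduce the +40% and +10% changes described above (1.0 g/L at 37 °C at both pH values); they are illustrative numbers, not measured data.

```python
# Classical effect estimates for a 2x2 factorial (temperature x pH).
# Titers are hypothetical values chosen to reproduce the +40% / +10% example.
runs = [
    # (temp_coded, pH_coded, titer_g_per_L); temp -1 = 33 C, +1 = 37 C; pH -1 = 6.8, +1 = 7.0
    (-1, -1, 1.1),
    (+1, -1, 1.0),
    (-1, +1, 1.4),
    (+1, +1, 1.0),
]


def effect(signs, responses):
    """Mean response at the +1 level minus mean response at the -1 level."""
    plus = [y for s, y in zip(signs, responses) if s > 0]
    minus = [y for s, y in zip(signs, responses) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


temp = [r[0] for r in runs]
ph = [r[1] for r in runs]
titer = [r[2] for r in runs]
temp_x_ph = [t * p for t, p in zip(temp, ph)]  # interaction column = product of signs

effects = {
    "temperature": effect(temp, titer),
    "pH": effect(ph, titer),
    "temperature x pH": effect(temp_x_ph, titer),
}
for name, value in effects.items():
    print(f"{name:>16}: {value:+.2f} g/L")
```

Here the temperature × pH effect is −0.15 g/L, the same size as the pH main effect, and an OFAT study at a fixed pH has no way to estimate it.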

Stage 1: Screening Designs — Identify Critical Factors

Screening is the first stage of DOE where you test 6–15 candidate factors in a small number of runs to identify the 3–5 factors that most significantly affect your response. The goal is factor elimination, not optimization—you only need to detect main effects.

Plackett-Burman Design

Plackett-Burman (PB) designs are Resolution III screening designs that test each factor at two levels (high and low) in N runs, where N is a multiple of 4 and up to N − 1 factors can be screened. For 7 factors, the 12-run PB design is a common choice, with the unused columns available as dummy factors for estimating error; a two-level full factorial would need 128 runs.

PB designs assume factor interactions are negligible, which is a reasonable starting assumption when screening many factors. The tradeoff is that two-factor interactions are partially aliased with main effects, so apparently significant factors should be confirmed in a follow-up design.
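
A minimal sketch of how the 12-run design is built: each of the first 11 runs is a cyclic shift of the standard Plackett-Burman generator row (+ + − + + + − − − + −), and the 12th run sets every factor low. Main effects are then the difference between the mean response at the high and low levels. This is plain Python written for illustration; the function names are hypothetical and not the API of pyDOE2 or any other package.

```python
# 12-run Plackett-Burman design: 11 cyclic shifts of the generator row
# plus a final run with every factor at the low level.
GENERATOR_12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12(n_factors):
    """Return a 12 x n_factors matrix of coded levels (-1/+1)."""
    if not 1 <= n_factors <= 11:
        raise ValueError("the 12-run design screens at most 11 factors")
    rows = [[GENERATOR_12[(col - shift) % 11] for col in range(11)] for shift in range(11)]
    rows.append([-1] * 11)
    return [row[:n_factors] for row in rows]


def main_effects(design, responses):
    """Effect of each factor: mean(y at +1) - mean(y at -1) for a balanced design."""
    n_runs = len(design)
    effects = []
    for j in range(len(design[0])):
        contrast = sum(row[j] * y for row, y in zip(design, responses))
        effects.append(2 * contrast / n_runs)
    return effects


# 7 real factors; request 11 columns instead to keep 4 dummy columns for error.
design = plackett_burman_12(7)
for run in design:
    print(" ".join(f"{x:+d}" for x in run))
```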

Fractional Factorial Design

Fractional factorial designs are more flexible than PB designs and come in different resolutions. A Resolution IV design (e.g., the 16-run $2^{7-3}$ design for 7 factors) aliases two-factor interactions with each other but not with main effects, giving cleaner main-effect estimates than a Resolution III design at the cost of more runs.
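
The 16-run Resolution IV design for 7 factors can be generated from a full two-level factorial in four base factors. This sketch assumes the commonly tabulated generators E = ABC, F = BCD, G = ACD (other Resolution IV generator sets exist): every word in the resulting defining relation has four letters, so no main effect is aliased with a two-factor interaction, while two-factor interactions are aliased in chains such as AB = CE = FG. The function name is hypothetical.

```python
from itertools import product


def fractional_factorial_16_runs():
    """Coded 2^(7-3) Resolution IV design (16 runs x 7 factors, levels -1/+1).

    Base factors A-D form a full 2^4 factorial. The added factors use the
    generators E = ABC, F = BCD, G = ACD, giving the defining relation
    I = ABCE = BCDF = ACDG (and their products, all of length 4).
    """
    runs = []
    for a, b, c, d in product([-1, 1], repeat=4):
        runs.append([a, b, c, d, a * b * c, b * c * d, a * c * d])
    return runs


design = fractional_factorial_16_runs()
# Two-factor interactions are aliased in chains, e.g. AB = CE = FG:
print(all(r[0] * r[1] == r[2] * r[4] == r[5] * r[6] for r in design))
```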

Table 2. Screening design selection guide for bioprocess DOE
Design | Factors | Levels | Runs | Resolution | Best for
Plackett-Burman | 6–15 | 2 | 12–20 | III | Maximum factor screening, minimal runs
$2^{k-p}$ fractional factorial | 4–8 | 2 | 8–32 | III–V | When some interactions are expected
Definitive screening (DSD) | 5–16 | 3 | 2k + 1 | – | Combined screening + optimization
Full factorial ($2^k$) | 2–4 | 2 | 4–16 | Full | Small factor sets, complete interaction info
Table 2. Guide for selecting screening designs based on the number of factors and desired resolution. DSD designs provide three-level estimation in fewer runs than the traditional two-stage approach.
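
The run counts in Table 2 follow from simple formulas. The helpers below are a rough sketch with hypothetical names: the PB size is the smallest multiple of 4 that gives at least one more run than there are factors (the worked example below uses the 12-run design for 7 factors, which leaves spare columns), and the DSD size follows Jones and Nachtsheim's construction, in which an odd number of factors reuses the design for k + 1 factors with one column dropped, giving 2k + 3 runs. These are minimums; extra runs are often added for better power.

```python
import math


def pb_runs(k):
    """Smallest Plackett-Burman size: a multiple of 4 with at least k + 1 runs."""
    return 4 * math.ceil((k + 1) / 4)


def fractional_factorial_runs(k, p):
    """Runs in a two-level 2^(k-p) fractional factorial."""
    return 2 ** (k - p)


def full_factorial_runs(k, levels=2):
    """Runs in a full factorial with the same number of levels for every factor."""
    return levels ** k


def dsd_min_runs(k):
    """Minimum DSD size: 2k + 1 for even k; odd k reuses the (k + 1)-factor design."""
    return 2 * k + 1 if k % 2 == 0 else 2 * (k + 1) + 1


for k in (5, 6, 7, 8):
    print(f"k={k}: PB {pb_runs(k)}, DSD {dsd_min_runs(k)}, full factorial {full_factorial_runs(k)}")
```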


Stage 2: Optimization Designs — Build the Response Surface

Response surface methodology (RSM) designs test 2–5 significant factors at 3–5 levels to build a second-order polynomial model that maps the relationship between factors and responses. The model includes linear terms, quadratic terms, and two-factor interaction terms, enabling prediction of the optimum and construction of a design space.

Central Composite Design (CCD)

Central Composite Design is the most widely used RSM design in bioprocessing. It combines a full factorial (or fractional factorial) with axial (star) points and center-point replicates. For 3 factors, a face-centered CCD requires 20 runs: 8 factorial points + 6 axial points + 6 center points.

A CCD with α > 1 tests each factor at 5 levels (−α, −1, 0, +1, +α); rotatability requires $\alpha = (2^k)^{1/4}$ for a full factorial core, which gives α ≈ 1.682 for 3 factors. A face-centered CCD (α = 1) uses only 3 levels and avoids extreme settings, but sacrifices rotatability. Choose a face-centered CCD when factor ranges have hard constraints (e.g., pH cannot go below 6.0).
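
The run counts and levels above follow directly from how a CCD is assembled. A minimal sketch in plain Python (the function names are illustrative, not from a DOE package); it assumes a full factorial core, so a CCD built on a fractional factorial would need α computed from its own number of factorial points. The number of center points is a design choice; six matches the 20-run example above.

```python
from itertools import product


def rotatable_alpha(k):
    """Axial distance for a rotatable CCD with a full 2^k factorial core."""
    return (2 ** k) ** 0.25


def central_composite(k, alpha=None, n_center=6):
    """Coded CCD: 2^k corners, 2k axial points at +/-alpha, then n_center center runs.

    alpha=None gives the rotatable value; alpha=1.0 gives a face-centered design.
    """
    a = rotatable_alpha(k) if alpha is None else float(alpha)
    corners = [[float(x) for x in point] for point in product([-1, 1], repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * a
            axial.append(point)
    centers = [[0.0] * k for _ in range(n_center)]
    return corners + axial + centers


face_centered = central_composite(3, alpha=1.0)  # 8 + 6 + 6 = 20 runs, 3 levels
rotatable = central_composite(3)                 # 20 runs, 5 levels
print(len(face_centered), len(rotatable), round(rotatable_alpha(3), 3))
```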

Box-Behnken Design (BBD)

Box-Behnken designs test factors at only 3 levels and never run all factors at their extreme values simultaneously, making them safer for bioprocess applications where extreme combinations might kill cells or damage equipment. For 3 factors, BBD requires 15 runs (12 edge midpoints + 3 center points).

The tradeoff: BBD does not include factorial corner points, so predictions near the corners of the design space are less precise. Use BBD when runs at combined factor extremes are costly or risky.
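
The 15-run layout described above can be generated by running a 2 × 2 factorial on every pair of factors while holding the others at their center. A sketch with a hypothetical function name; this pairwise construction reproduces the standard Box-Behnken designs for 3, 4, and 5 factors only (larger published BBDs use different blocks), so the function refuses other sizes.

```python
from itertools import combinations, product


def box_behnken(k, n_center=3):
    """Coded Box-Behnken design built from all factor pairs (sketch for k = 3, 4, 5).

    For each pair of factors, run a 2x2 factorial at +/-1 with every other
    factor held at 0, then append n_center center runs.
    """
    if k not in (3, 4, 5):
        raise ValueError("this pairwise sketch reproduces the standard BBD only for 3-5 factors")
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product([-1, 1], repeat=2):
            point = [0] * k
            point[i], point[j] = a, b
            runs.append(point)
    runs += [[0] * k for _ in range(n_center)]
    return runs


design = box_behnken(3)
print(len(design))  # 12 edge midpoints + 3 center points
# No run puts every factor at an extreme level at the same time:
print(any(all(abs(x) == 1 for x in run) for run in design))
```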

The second-order polynomial model for 3 factors takes the form:

RSM Model Equation

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{23} X_2 X_3 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{33} X_3^2$$

Where $Y$ is the response (e.g., titer in g/L), $X_1$, $X_2$, $X_3$ are coded factor levels (−1 to +1), $\beta_0$ is the intercept, $\beta_i$ are linear coefficients, $\beta_{ij}$ are interaction coefficients, and $\beta_{ii}$ are quadratic coefficients.
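
Because the model is linear in its coefficients, the β values can be estimated by ordinary least squares on a model matrix whose columns follow the equation's term order (intercept, linear, interaction, quadratic). A sketch with NumPy, assuming a face-centered CCD and hypothetical "true" coefficients with simulated noise; none of these numbers are data from this article, and the helper names are illustrative.

```python
from itertools import combinations, product

import numpy as np


def quadratic_model_matrix(X):
    """Columns: 1, X1..Xk, XiXj (i < j), X1^2..Xk^2, in the equation's term order."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)


def fit_quadratic(X, y):
    """Ordinary least-squares estimates of the second-order coefficients."""
    M = quadratic_model_matrix(X)
    beta, *_ = np.linalg.lstsq(M, np.asarray(y, dtype=float), rcond=None)
    return beta


# Face-centered CCD for 3 factors: 8 corners + 6 face points + 6 centers = 20 runs.
corners = list(product([-1, 1], repeat=3))
faces = [tuple(s if i == j else 0 for j in range(3)) for i in range(3) for s in (-1, 1)]
design = np.array(corners + faces + [(0, 0, 0)] * 6, dtype=float)

# Hypothetical coefficients: b0, b1..b3, b12, b13, b23, b11, b22, b33.
true_beta = np.array([3.4, -0.8, 0.3, 0.2, -0.4, 0.2, 0.0, -0.7, -0.3, -0.1])
rng = np.random.default_rng(seed=1)
y = quadratic_model_matrix(design) @ true_beta + rng.normal(0.0, 0.05, len(design))

beta_hat = fit_quadratic(design, y)
print(np.round(beta_hat, 2))
```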

Figure 2. Response surface contour plot showing the interaction between temperature and pH on E. coli recombinant protein titer (g/L). The contour lines represent predicted titer values from a CCD model. The optimum region (dark teal) is centered around 32–34 °C and pH 6.9–7.1. Data represents typical DOE results from fed-batch E. coli expression optimization.

Definitive Screening Designs: The Modern Shortcut

Definitive Screening Designs (DSDs), introduced by Jones and Nachtsheim in 2011, are one of the most useful recent advances in DOE methodology for bioprocess applications. DSDs combine the screening and optimization stages into a single three-level design that requires as few as 2k + 1 runs for k factors: for example, 13 runs for 6 factors, versus 44 or more runs for a traditional PB screen followed by CCD optimization.

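DSDs achieve this efficiency through three key properties:

  1. Main effects are orthogonal to each other and to all two-factor interactions and quadratic effects, so main-effect estimates are not biased by second-order terms.
  2. Quadratic effects are estimable, because every factor is run at three levels, and they are orthogonal to main effects.
  3. No two-factor interaction is completely confounded with another, although interactions can be partially correlated with each other and with quadratic effects.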

For bioprocess optimization with 6–8 candidate factors, DSDs have been reported in mAb process characterization, vaccine development, and gene therapy manufacturing. Cutting the experiment count from about 45 (PB + CCD) to about 15 can save weeks of bioreactor time and potentially tens of thousands of dollars in media costs.

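The main limitation: a DSD with 2k + 1 runs cannot estimate every second-order term at once, so it relies on only a few factors being active. Its ability to fit a full quadratic model in any three active factors holds only when there are at least 6 factors. With fewer factors, or when many effects are expected to be active, use a traditional CCD or BBD.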
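
One known way to build a DSD is to fold over a conference matrix C (zero diagonal, ±1 elsewhere, CᵀC = (k − 1)I): the runs are the rows of C, the rows of −C, and a single center run, giving 2k + 1 runs. The sketch below uses the Paley construction, so it assumes k − 1 is a prime congruent to 1 mod 4 (k = 6, 14, 18, and so on) and does not check primality; DOE software handles other factor counts. Function names are illustrative.

```python
def legendre_symbol(a, p):
    """Quadratic character of a modulo an odd prime p: 0, +1 or -1."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def paley_conference_matrix(q):
    """Symmetric conference matrix of order q + 1 for a prime q with q % 4 == 1."""
    if q % 4 != 1:
        raise ValueError("this sketch needs a prime q with q % 4 == 1")
    n = q + 1
    C = [[0] * n for _ in range(n)]
    for j in range(1, n):
        C[0][j] = C[j][0] = 1
    for i in range(q):
        for j in range(q):
            C[i + 1][j + 1] = legendre_symbol(j - i, q)
    return C


def definitive_screening_design(q=5):
    """DSD for k = q + 1 factors: rows of C, their fold-over -C, and one center run."""
    C = paley_conference_matrix(q)
    folded = [[-x for x in row] for row in C]
    return C + folded + [[0] * len(C)]


dsd = definitive_screening_design(5)  # 6 factors -> 2 * 6 + 1 = 13 runs
for run in dsd:
    print(" ".join(f"{x:2d}" for x in run))
```

Each factor appears at three levels, every non-center run has exactly one factor at 0, and because every run has its mirror image, main effects are orthogonal to all quadratic and two-factor interaction columns.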

Worked Example: DOE for E. coli Fed-Batch Optimization

This worked example walks through a complete DOE optimization of recombinant protein expression in E. coli BL21(DE3) fed-batch culture, targeting maximum soluble titer (g/L).

Worked Example — Stage 1: Plackett-Burman Screening

Objective: Identify the most significant factors affecting soluble titer from an initial set of 7 factors.

Factors (7):

  1. Induction temperature: 25 °C (low) / 37 °C (high)
  2. IPTG concentration: 0.1 mM (low) / 1.0 mM (high)
  3. Induction OD600: 0.6 (low) / 2.0 (high)
  4. Post-induction pH: 6.5 (low) / 7.5 (high)
  5. Dissolved oxygen: 20% (low) / 60% (high)
  6. Glucose feed rate: 2 g/L/h (low) / 8 g/L/h (high)
  7. Induction duration: 4 h (low) / 16 h (high)

Design: Plackett-Burman, 12 runs + 3 center points = 15 experiments

Response: Soluble protein titer (g/L)

Results (ANOVA, p < 0.05):
• Induction temperature: p = 0.001 *** (most significant)
• IPTG concentration: p = 0.008 **
• Post-induction pH: p = 0.023 *
• Induction OD600: p = 0.089 (borderline)
• DO, feed rate, duration: p > 0.10 (not significant)

Conclusion: Three factors (temperature, IPTG, pH) carry forward to optimization. OD600 is borderline: fix it at 1.0, within the tested range of 0.6–2.0, and monitor it.

Worked Example — Stage 2: CCD Optimization

Factors (3): Temperature (25–37 °C), IPTG (0.1–1.0 mM), pH (6.5–7.5)

Design: Face-centered CCD (α = 1), 20 runs (8 factorial + 6 axial + 6 center points)

Coded levels:
• Temperature: −1 = 25 °C, 0 = 31 °C, +1 = 37 °C
• IPTG: −1 = 0.1 mM, 0 = 0.55 mM, +1 = 1.0 mM
• pH: −1 = 6.5, 0 = 7.0, +1 = 7.5
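
The coded levels above follow the usual linear mapping: coded = (actual − center) / (half of the range). A small sketch of that conversion in both directions; the helper names are illustrative, and the example converts the predicted optimum reported later in this worked example (28.5 °C, 0.4 mM IPTG, pH 7.05) into coded units.

```python
def to_coded(value, low, high):
    """Map an actual setting onto the -1..+1 coded scale used by the model."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


def to_actual(coded, low, high):
    """Inverse of to_coded: convert a coded level back to engineering units."""
    return (low + high) / 2 + coded * (high - low) / 2


# Factor ranges from the worked example: (low, high) in actual units.
FACTORS = {"temperature_C": (25.0, 37.0), "IPTG_mM": (0.1, 1.0), "pH": (6.5, 7.5)}

optimum = {"temperature_C": 28.5, "IPTG_mM": 0.4, "pH": 7.05}
coded = {name: round(to_coded(v, *FACTORS[name]), 3) for name, v in optimum.items()}
print(coded)
```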

Fitted model in coded units (R² = 0.94, Adj R² = 0.91, Pred R² = 0.84):
Titer = 3.42 − 0.87·T + 0.31·IPTG + 0.22·pH
        − 0.45·T·IPTG + 0.18·T·pH
        − 0.68·T² − 0.29·IPTG² − 0.14·pH²

Predicted optimum: Temperature = 28.5 °C, IPTG = 0.4 mM, pH = 7.05

Predicted titer: 4.12 g/L (95% PI: 3.6–4.6 g/L)

OFAT baseline titer: 2.1 g/L → DOE improvement: 1.96×

Figure 3. Main effects and interaction coefficients from the CCD model for E. coli fed-batch titer optimization. Negative coefficients for temperature (T) indicate that lower temperatures favor soluble expression. The T×IPTG interaction is the strongest interaction term, confirming that optimal IPTG concentration depends on induction temperature.

Stage 3: Model Validation and Confirmation Runs

A DOE model is only as reliable as its validation. Run 3–5 independent experiments at the predicted optimum conditions and compare actual vs. predicted responses. The actual values must fall within the 95% prediction interval for the model to be considered validated.

Model Diagnostic Checklist

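Before running confirmation experiments, verify these model diagnostics:
• R² > 0.85
• Adjusted R² and predicted R² within 0.20 of each other
• Adequate precision > 4
• Residuals approximately normal, with no patterns in residuals vs. predicted plots
• Lack of fit not significant (p > 0.05)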
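
Most of the numerical diagnostics used for this check can be computed from an ordinary least-squares fit. The sketch below computes R², adjusted R², predicted R² (from the PRESS statistic, using leverages so no refitting is needed) and adequate precision. The adequate-precision formula, (max ŷ − min ŷ) / √(p·MSE/n), is the definition documented by Design-Expert; treat it as an assumption if your software differs. The data are simulated and the function name is hypothetical.

```python
import numpy as np


def regression_diagnostics(M, y):
    """R², adjusted R², predicted R² (PRESS) and adequate precision for an OLS fit.

    M is the model matrix including the intercept column; y holds the responses.
    """
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = M.shape
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    y_hat = M @ beta
    resid = y - y_hat
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    # Leverages h_ii: diagonal of the hat matrix H = M M^+ (M^+ = pseudo-inverse).
    leverage = np.einsum("ij,ji->i", M, np.linalg.pinv(M))
    # PRESS uses leave-one-out residuals e_i / (1 - h_ii) without refitting.
    press = float(((resid / (1.0 - leverage)) ** 2).sum())
    mse = ss_res / (n - p)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "adj_R2": 1.0 - mse / (ss_tot / (n - 1)),
        "pred_R2": 1.0 - press / ss_tot,
        # Assumed definition: range of fitted values over the average prediction SE.
        "adeq_precision": float((y_hat.max() - y_hat.min()) / np.sqrt(p * mse / n)),
    }


# Simulated single-factor example: quadratic model, 12 runs, small noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
M = np.column_stack([np.ones_like(x), x, x**2])
y = 3.0 + 0.8 * x - 0.5 * x**2 + rng.normal(0.0, 0.1, x.size)

diag = regression_diagnostics(M, y)
for name, value in diag.items():
    print(f"{name:>15}: {value:.3f}")
print(f"{'adj - pred R2':>15}: {diag['adj_R2'] - diag['pred_R2']:.3f}")
```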

Worked Example — Stage 3: Confirmation Runs

Predicted optimum: T = 28.5 °C, IPTG = 0.4 mM, pH = 7.05

Predicted titer: 4.12 g/L (95% PI: 3.6–4.6 g/L)

Confirmation run results (n = 5):
Run 1: 4.08 g/L  ✓
Run 2: 3.91 g/L  ✓
Run 3: 4.22 g/L  ✓
Run 4: 3.85 g/L  ✓
Run 5: 4.15 g/L  ✓

Mean ± SD: 4.04 ± 0.16 g/L
All values within 95% PI (3.6–4.6) → Model validated ✓
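
The confirmation check above is simple enough to script. The sketch below reuses the worked-example numbers and reports the mean, the sample standard deviation, and how many runs fall inside the prediction interval. Note that a 95% prediction interval describes single future runs; the mean of several runs has a narrower interval, so a stricter check would compare the mean against an interval for averages.

```python
from statistics import mean, stdev

predicted = 4.12                               # g/L, model prediction at the optimum
pi_low, pi_high = 3.6, 4.6                     # 95% prediction interval, g/L
confirmation = [4.08, 3.91, 4.22, 3.85, 4.15]  # g/L, confirmation runs 1-5

inside = [pi_low <= y <= pi_high for y in confirmation]
avg, sd = mean(confirmation), stdev(confirmation)  # stdev = sample SD (n - 1)
bias_pct = 100 * (avg - predicted) / predicted

print(f"Mean {avg:.2f} g/L, SD {sd:.2f} g/L, bias vs. prediction {bias_pct:+.1f}%")
print(f"{sum(inside)}/{len(confirmation)} runs inside the 95% PI -> validated: {all(inside)}")
```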

From Optimum to Design Space

For regulatory submissions under QbD (ICH Q8), the validated model defines a design space—the multidimensional region of factor combinations where product quality is assured. The design space is typically narrower than the experimental range and is established by setting acceptance criteria on all critical quality attributes (CQAs) simultaneously.

Edge-of-failure experiments at the design space boundaries provide evidence that the process is robust. Regulatory agencies expect evidence that the process performs acceptably throughout the entire design space, not just at the optimum point.
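
The design-space idea can be illustrated by evaluating fitted models over a grid of factor settings and keeping only the combinations that satisfy every acceptance criterion at once. Both models and both limits below are hypothetical, and this point-estimate version ignores prediction uncertainty, which a submission-grade design space would normally account for (for example, by requiring a high probability of meeting every CQA limit).

```python
from itertools import product


# Hypothetical fitted models in coded units (x1 = temperature, x2 = pH).
def titer_g_per_l(x1, x2):
    return 3.4 - 0.8 * x1**2 - 0.3 * x2**2 - 0.2 * x1 * x2


def aggregate_pct(x1, x2):
    return 1.5 + 0.6 * x1 + 0.2 * x2


def meets_all_criteria(x1, x2, min_titer=3.0, max_aggregate=2.0):
    """Hypothetical acceptance criteria; every CQA limit must hold at once."""
    return titer_g_per_l(x1, x2) >= min_titer and aggregate_pct(x1, x2) <= max_aggregate


levels = [i / 10 for i in range(-10, 11)]  # 21 coded levels from -1.0 to +1.0
grid = list(product(levels, levels))
design_space = [point for point in grid if meets_all_criteria(*point)]

print(f"{len(design_space)} of {len(grid)} grid points meet every criterion")
x1_values = [p[0] for p in design_space]
print(f"x1 spans {min(x1_values):+.1f} to {max(x1_values):+.1f} within the design space")
```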

[Figure 4: Check model (R² > 0.85; adjusted minus predicted R² < 0.20; adequate precision > 4) → Verify residuals (normal residuals; no patterns; lack-of-fit p > 0.05) → Confirm (3–5 runs at the optimum; results within the 95% PI; actual ≈ predicted) → Model validated; define design space. Corrective actions on failure: remove non-significant terms and re-fit the model; transform the response or augment the design; check for lurking variables and re-design. Failure at any stage requires iterating back, because DOE is rarely one-shot; budget 2–3 iterations.]
Figure 4. Model validation decision flow for DOE in bioprocessing. Each stage has pass/fail criteria with corrective actions for failures. Confirmation runs at the predicted optimum are the final validation step before defining the design space.

DOE Software and Tools for Bioprocess Engineers

Selecting the right DOE software depends on your organization’s needs and budget. All major platforms support the design types discussed in this guide (PB, fractional factorial, CCD, BBD, DSD). The differences lie in ease of use, visualization capabilities, and bioprocess-specific features.

Table 3. DOE software comparison for bioprocess applications (as of 2026)
Software | License type | Strengths | Bioprocess features
JMP (SAS) | Commercial (~$1,800/yr) | Visual design builder, DSD support, contour profiler | Bioprocess DOE tutorials, pharma templates
Design-Expert (Stat-Ease) | Commercial (~$1,500/yr) | RSM specialist, 3D surface plots, ANOVA diagnostics | Mixture designs for media optimization
Minitab | Commercial (~$1,600/yr) | Wide statistical toolset, SPC integration | QC/validation workflows, control charts
MODDE (Sartorius) | Commercial | Built for bioprocess, MVDA integration | Ambr/bioreactor data import, design space explorer
Python (pyDOE2, statsmodels) | Open source | Flexible, scriptable, CI/CD integration | Custom models, Bayesian optimization integration
R (rsm, FrF2, AlgDesign) | Open source | Publication-quality plots, extensive packages | Custom optimal designs, mixed models
Table 3. Comparison of DOE software platforms commonly used in bioprocess development. Commercial tools offer better visualization and support, while open-source options provide flexibility and scriptability.

An emerging trend as of 2025–2026 is the integration of Bayesian optimization with DOE. Tools such as BayBE (Merck KGaA), Obsidian (Merck & Co.), and ProcessOptimizer (Novo Nordisk) fit Gaussian process models to all previous results and use them to select the next experiment adaptively. Published case studies report roughly 50–69% fewer experiments than classical RSM for equivalent optimization performance.
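
BayBE, Obsidian, and ProcessOptimizer each have their own APIs, which are not shown here. As a library-agnostic illustration of the underlying idea, the sketch below fits a Gaussian process with fixed, assumed hyperparameters (real tools usually fit them from the data) to completed runs and picks the next run with an upper-confidence-bound rule, which favors settings that are either predicted to be good or still uncertain. All data and names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.4, signal_var=1.0):
    """Squared-exponential covariance between rows of a (n x d) and b (m x d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(X, y, X_new, noise_var=1e-4, **kernel_args):
    """GP posterior mean and SD at X_new; the prior mean is the average of y."""
    y_mean = y.mean()
    K = rbf_kernel(X, X, **kernel_args) + noise_var * np.eye(len(X))
    K_s = rbf_kernel(X, X_new, **kernel_args)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))  # K^-1 (y - mean)
    mean = y_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(rbf_kernel(X_new, X_new, **kernel_args))
    var = np.clip(prior_var - (v**2).sum(axis=0), 0.0, None)
    return mean, np.sqrt(var)


def propose_next(X, y, candidates, kappa=2.0, **gp_args):
    """Upper confidence bound: choose the candidate maximizing mean + kappa * SD."""
    mean, sd = gp_posterior(X, y, candidates, **gp_args)
    return candidates[np.argmax(mean + kappa * sd)]


# Hypothetical 2-factor study in coded units: a 2^2 factorial plus one center run.
X = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([2.1, 1.4, 2.8, 1.9, 3.3])  # hypothetical titers, g/L
grid = np.linspace(-1.0, 1.0, 21)
candidates = np.array([[a, b] for a in grid for b in grid])

print("Next run (coded units):", propose_next(X, y, candidates))
```

In a real campaign, the proposed run is performed, appended to X and y, and the loop repeats until the improvement per run levels off.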


Frequently Asked Questions

How many experiments do I need for DOE optimization in bioprocessing?

The number of experiments depends on the design type and number of factors. A Plackett-Burman screening design for 7 factors is commonly run in 12 runs. A Central Composite Design (CCD) for 3 factors needs about 20 runs (8 factorial + 6 axial + 6 center points). Definitive Screening Designs need as few as 2k + 1 runs (e.g., 13 runs for 6 factors); they estimate main effects cleanly and can also estimate quadratic effects and a few active two-factor interactions in a single step.

What is the difference between screening and optimization DOE designs?

Screening designs (Plackett-Burman, fractional factorial) test many factors (6–15) in few runs to identify the 3–5 factors that significantly affect the response. They estimate main effects reliably, but interactions are aliased and curvature is detected only if center points are added. Optimization designs (CCD, Box-Behnken) test fewer factors (2–5) at more levels to build a response surface model with quadratic terms, enabling precise identification of optimum conditions.

How does DOE compare to one-factor-at-a-time (OFAT) in bioprocessing?

DOE is often reported to achieve 1.3–2× higher yield improvements than OFAT because it detects factor interactions that OFAT misses entirely. OFAT does not necessarily need fewer experiments, and it explores far less of the design space: for a 4-factor optimization, OFAT might need 20 or more sequential experiments and still miss the true optimum, whereas a CCD covers the whole design region in about 30 runs including replicates.

What software is best for DOE in bioprocessing?

JMP (SAS) and Design-Expert (Stat-Ease) are the most widely used DOE software in bioprocessing due to their visual design builders and response surface tools. Minitab is popular in QC environments. For open-source options, Python's pyDOE2 (design generation, with statsmodels for analysis) and R's rsm and FrF2 packages cover the designs discussed in this guide. MODDE (Sartorius) is specifically designed for bioprocess applications with built-in templates.

What is a Definitive Screening Design and when should I use it?

A Definitive Screening Design (DSD) is a modern DOE approach introduced by Jones and Nachtsheim (2011) that runs every factor at three levels and needs as few as 2k + 1 runs for k factors. It estimates main effects free of bias from second-order terms and can also estimate quadratic effects and a small number of active two-factor interactions. Use DSDs when you have 5–16 factors and want to combine screening and optimization into one step, reducing total experiments by over 50% compared to the traditional two-stage approach.

How do you validate a DOE model for bioprocess optimization?

Validate a DOE model by running 3–5 confirmation runs at the predicted optimum conditions and comparing actual vs. predicted responses. The actual values should fall within the prediction interval (typically 95% PI). Also check model diagnostics: R² should exceed 0.85, predicted R² should be within 0.20 of adjusted R², adequate precision should exceed 4, and residuals should show no patterns in normal probability and residuals vs. predicted plots.


References

  1. Mandenius, C.F. & Brundin, A. (2008). Bioprocess optimization using design-of-experiments methodology. Biotechnology Progress, 24(6), 1191–1203. DOI: 10.1002/btpr.67
  2. Jones, B. & Nachtsheim, C.J. (2011). A class of three-level designs for definitive screening in the presence of second-order effects. Journal of Quality Technology, 43(1), 1–15.
  3. Politis, S.N. et al. (2021). Design of experiments and design space approaches in pharmaceutical bioprocess optimization. European Journal of Pharmaceutics and Biopharmaceutics, 166, 208–221. DOI: 10.1016/j.ejpb.2021.06.004
  4. Papathanasiou, M.M. & Experiment review (2023). A review of algorithmic approaches for cell culture media optimization. Frontiers in Bioengineering and Biotechnology, 11, 1195294. DOI: 10.3389/fbioe.2023.1195294
  5. Gisperg, G.F. et al. (2025). Bayesian Optimization in Bioprocess Engineering — Where Do We Stand Today? Biotechnology and Bioengineering. DOI: 10.1002/bit.28960