Fusion Tag Selection for Recombinant Protein Expression in E. coli: His, MBP, SUMO, GST, and NusA Compared

June 2026 · 14 min read · Bioprocess Engineering

Contents

  1. Why Fusion Tags Matter
  2. Six Fusion Tags Compared
  3. Decision Framework: Choosing the Right Tag
  4. Protease Cleavage: Removing the Tag
  5. Tandem-Tag Strategies
  6. Worked Example: Tag Selection for a 15 kDa Cytokine
  7. Frequently Asked Questions

Why Fusion Tags Matter for E. coli Expression

Roughly 40-50% of recombinant proteins expressed in E. coli form insoluble inclusion bodies, making fusion tag selection one of the most consequential decisions in early-stage protein production. The right tag can shift a protein from entirely insoluble to >90% soluble in the cytoplasmic fraction, eliminating the need for denaturation-refolding workflows that typically recover only 10-30% of the starting material.

Fusion tags serve two distinct purposes that are often conflated. Purification tags like His6 provide affinity handles for single-step capture on IMAC resin but contribute negligibly to solubility. Solubility-enhancing tags like MBP, SUMO, NusA, GST, and thioredoxin (TRX) are large fusion partners that actively promote folding of the target protein during translation. Most practical constructs combine both functions, pairing a His6 purification handle with a solubility-enhancing partner and a protease cleavage site for downstream tag removal.

The challenge is that no single fusion tag works universally. Marblestone et al. (2006) tested six tags on three model proteins and found that SUMO and NusA were the top performers, but each protein responded differently. Costa et al. (2014) extended this comparison to eight tags across six difficult-to-express targets and reached the same conclusion: tag selection is empirical. The practical response is to screen 3-4 tags in parallel using standardized expression vectors, evaluate total expression, soluble fraction, and purification yield, then commit to the best performer for scale-up.

Six Fusion Tags Compared

The six most commonly used fusion tags for E. coli expression differ in size, solubility enhancement, purification mechanism, and downstream processing. The table below summarizes their key properties side by side.

Table 1. Properties of six common fusion tags for recombinant protein expression in E. coli.

| Tag | Size (kDa) | Solubility enhancement | Purification method | Typical soluble fraction (%) | Key limitation |
|------|------|------|------|------|------|
| His6 | 0.8 | None | Ni/Co-IMAC | 20-30 | No solubility benefit; co-purifies His-rich host proteins |
| GST | 26 | Moderate | Glutathione resin | 40-50 | Forms homodimers; may alter target oligomeric state |
| TRX | 12 | Moderate | Requires His6 addition | 50-60 | Cytoplasmic redox sensitivity; limited to reducing environments |
| SUMO | 12 | Good | Requires His6 addition | 50-70 | SUMO protease not commercially universal; yeast Ulp1 inactive on some mutants |
| MBP | 43 | Excellent | Amylose resin or His6-IMAC | 70-80 | Large tag reduces target:total protein ratio |
| NusA | 55 | Excellent | Requires His6 addition | 70-80 | Largest tag; highest metabolic burden; may reduce expression level |

His6: The Universal Purification Handle

The His6 tag is a string of six histidine residues (0.8 kDa) that binds Ni-NTA or Co-TALON IMAC resin with micromolar affinity. It enables one-step capture from crude lysate with 80-95% purity in a single pass. Because of its small size, His6 rarely interferes with protein folding or function and often does not need to be removed. However, His6 provides no solubility enhancement: it is a purification tag, not a solubility tag. When a target protein forms inclusion bodies with His6 alone, adding a solubility-enhancing fusion partner is the standard next step.

MBP: The Solubility Gold Standard

Maltose-binding protein (MBP) is a 43 kDa periplasmic E. coli protein and the most reliable solubility-enhancing fusion tag available. Kapust and Waugh (1999) demonstrated that MBP was "uncommonly effective" at promoting the solubility of fused polypeptides, producing soluble protein in cases where GST, TRX, and His6 all failed. MBP acts as an intramolecular chaperone: it folds rapidly during translation, creating a soluble folding nucleus that keeps the downstream target protein in a folding-competent state rather than aggregating. N-terminal placement is critical. MBP at the N-terminus is roughly twice as effective as C-terminal MBP, because the tag must fold before the target to exert its chaperone effect.

SUMO: Native N-Terminus After Cleavage

Small ubiquitin-like modifier (SUMO) is a 12 kDa globular protein that enhances both expression level and solubility when fused to the N-terminus of a target. SUMO's unique advantage is its protease: SUMO protease (Ulp1 in yeast, SENP in mammals) recognizes the three-dimensional fold of SUMO rather than a linear peptide sequence, giving it exceptional specificity with virtually no off-target cleavage. Cleavage occurs directly at the SUMO-target junction, so the target protein is released with its native N-terminal residue and no residual linker amino acids. This matters for proteins where the N-terminus affects activity, stability, or immunogenicity.

GST: Affinity Purification with Moderate Solubility

Glutathione S-transferase (GST) is a 26 kDa enzyme that binds glutathione-Sepharose resin under mild conditions and elutes with reduced glutathione. GST provides moderate solubility enhancement (40-50% soluble fraction for typical targets) and has the advantage of a built-in affinity purification handle. The main limitation is that GST naturally forms homodimers, which can force dimerization of the fused target protein. This is acceptable for pull-down assays and interaction studies but problematic when the target must be monomeric.

NusA: Maximum Solubility at Maximum Size

N-utilization substance A (NusA) is a 55 kDa E. coli transcription factor and one of the most effective solubility-enhancing fusion tags, matching MBP in head-to-head comparisons. De Marco et al. (2004) showed that NusA fusion increased both solubility and stability of recombinant proteins, yielding 13-20 mg of fusion protein per litre of culture. NusA is particularly effective for toxic proteins that kill cells when expressed at high levels, because its large size dilutes the toxic target within the fusion. The trade-off is metabolic burden: at 55 kDa, NusA is the largest common fusion tag, and the target protein represents a smaller fraction of the total expressed protein mass.
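The size trade-off described above can be made concrete with a small calculation. The sketch below uses the tag masses from Table 1 and an illustrative 15.4 kDa target; the function name is hypothetical, linker and cleavage-site residues are ignored, and any additional His6 handle on the larger tags is left out for simplicity.

```python
# Tag masses in kDa, taken from Table 1 of this article.
TAG_KDA = {"His6": 0.8, "GST": 26, "TRX": 12, "SUMO": 12, "MBP": 43, "NusA": 55}


def target_mass_fraction(tag_kda: float, target_kda: float) -> float:
    """Fraction of the fusion protein's mass contributed by the target.

    Assumption: linker and protease-site residues are ignored.
    """
    if tag_kda < 0 or target_kda <= 0:
        raise ValueError("tag mass must be >= 0 and target mass must be > 0")
    return target_kda / (tag_kda + target_kda)


if __name__ == "__main__":
    target = 15.4  # kDa, illustrative target size
    # Print tags from smallest to largest to show the trend.
    for tag, kda in sorted(TAG_KDA.items(), key=lambda item: item[1]):
        frac = target_mass_fraction(kda, target)
        print(f"{tag:5s} {kda:5.1f} kDa  target = {frac:.0%} of fusion mass")
```

For this target size the fraction falls from roughly 56% with SUMO to about 26% with MBP and 22% with NusA, which is why a fusion yield in mg/L should be converted to target-equivalent mass before constructs are compared.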

TRX: Small Tag for Cytoplasmic Targets

Thioredoxin (TRX) is a 12 kDa E. coli oxidoreductase that enhances solubility through its highly soluble, compact fold. TRX is most effective for small target proteins (<30 kDa) that lack disulfide bonds and fold in the reducing cytoplasmic environment. For targets requiring disulfide bonds, TRX fusion in E. coli trxB/gor mutant strains (Origami, SHuffle) can promote oxidative folding in the cytoplasm. TRX lacks its own affinity handle, so constructs typically include an N-terminal His6 for IMAC purification.

Figure 1. Comparative performance of six fusion tags across three metrics: total expression level, soluble fraction, and purified yield, normalized to His6-only construct (set to 100%). Data represent typical ranges from published multi-tag screening studies.

Decision Framework: Choosing the Right Fusion Tag

Fusion tag selection follows a structured decision tree that begins with the target protein's properties and ends with 2-3 candidate tags for parallel screening. The primary branch points are whether the protein is likely soluble without a tag, whether a native N-terminus is required after tag removal, and how large the tag can be relative to the target.

Fusion tag selection decision tree. Start with the target protein expressed in E. coli and take the first branch that answers YES:

  1. Is simple affinity purification sufficient (protein already soluble)? YES: His6 (0.8 kDa).
  2. Is a native N-terminus required after cleavage (no residual linker amino acids)? YES: SUMO (12 kDa).
  3. Is the target <15 kDa or highly aggregation-prone (needs maximum solubility enhancement)? YES: MBP (43 kDa, best solubility) or NusA (55 kDa).
  4. Is a built-in affinity handle needed without a His tag (pull-down assays, interaction studies)? YES: GST (26 kDa, dimerizes).
  5. Is a small tag preferred for a cytoplasmic target (reducing environment, no disulfides needed)? YES: TRX (12 kDa, compact).
  6. DEFAULT: screen His6-MBP + His6-SUMO + one additional tag in parallel.

When in doubt, screen 3 tags in parallel using standardized vectors (pET, pOPIN, pNX series).

Figure 2. Fusion tag selection decision tree. Start with the target protein's properties and follow the branches. When no single tag is obvious, screen His6-MBP and His6-SUMO in parallel with one additional candidate.
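The branch order of the decision tree can be written as a short function. This is a minimal sketch, not a validated selection tool: the `Target` fields and the `suggest_tags` name are hypothetical, and the thresholds are simply the ones stated in the figure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Target:
    """Answers to the decision-tree questions for one target protein."""
    size_kda: float
    already_soluble: bool = False
    needs_native_n_terminus: bool = False
    highly_aggregation_prone: bool = False
    needs_builtin_affinity_handle: bool = False
    prefers_small_cytoplasmic_tag: bool = False


def suggest_tags(t: Target) -> list[str]:
    """Return candidate tags for the first branch that answers YES."""
    if t.already_soluble:
        return ["His6"]
    if t.needs_native_n_terminus:
        return ["SUMO"]
    if t.size_kda < 15 or t.highly_aggregation_prone:
        return ["MBP", "NusA"]
    if t.needs_builtin_affinity_handle:
        return ["GST"]
    if t.prefers_small_cytoplasmic_tag:
        return ["TRX"]
    # Default branch: parallel screen, plus one additional tag of choice.
    return ["His6-MBP", "His6-SUMO"]


if __name__ == "__main__":
    cytokine = Target(size_kda=15.4, needs_native_n_terminus=True)
    print(suggest_tags(cytokine))
```

Because the branches are checked in order, a target that needs a native N-terminus is routed to SUMO even if it is also aggregation-prone. In practice the output is a starting shortlist for a parallel screen rather than a final answer.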

Several practical rules sharpen the decision beyond the tree above:

Protease Cleavage: Removing the Fusion Tag

After purification, most applications require removal of the fusion tag to obtain the native target protein. Five site-specific proteases dominate this step, each with distinct trade-offs between specificity, speed, cost, and N-terminal scar.

Table 2. Comparison of five proteases for fusion tag removal.

| Protease | Recognition sequence | Cleavage temp. | Residual scar | Specificity | Typical usage |
|------|------|------|------|------|------|
| TEV | ENLYFQ/S | 4-30 °C | Ser (or Gly) | Very high | His6, MBP, GST constructs |
| SUMO protease (Ulp1) | SUMO 3D fold | 4-30 °C | None (native N-term) | Exceptional | SUMO constructs only |
| PreScission (3C) | LEVLFQ/GP | 4-5 °C | Gly-Pro | High | GST constructs |
| Thrombin | LVPR/GS | 20-37 °C | Gly-Ser | Moderate | GST, His6 constructs |
| Factor Xa | IEGR/ | 20-37 °C | None | Low | GST constructs; declining use |

TEV protease is the most widely used tag-removal enzyme because of its very high specificity. It cleaves the sequence ENLYFQ/S with minimal off-target activity, even during overnight incubation. The main drawback is slow kinetics: TEV requires 1:10 to 1:50 (w/w) protease:substrate ratios and 4-16 hours at 4-25 °C for complete cleavage. A single serine residue remains at the N-terminus of the target after cleavage.

SUMO protease recognizes the three-dimensional fold of the SUMO domain rather than a linear peptide sequence. This structural recognition makes it exceptionally specific, with no reported off-target cleavage in E. coli lysates. It is also substantially faster than TEV, achieving >95% cleavage at 1:100 (w/w) enzyme:substrate ratios in 1-2 hours. The critical advantage is scarless cleavage: the target protein is released with its native N-terminus.
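The w/w ratios quoted above translate directly into protease mass. The helper below is a small sketch; the function name and the 40 mg batch size are illustrative assumptions.

```python
def protease_mg(substrate_mg: float, ratio_denominator: float) -> float:
    """Protease mass (mg) for a 1:ratio_denominator (w/w) protease:substrate digest."""
    if substrate_mg < 0 or ratio_denominator <= 0:
        raise ValueError("substrate must be >= 0 and ratio denominator > 0")
    return substrate_mg / ratio_denominator


if __name__ == "__main__":
    fusion_mg = 40.0  # illustrative amount of purified fusion protein
    print(f"TEV at 1:20:            {protease_mg(fusion_mg, 20):.1f} mg")
    print(f"SUMO protease at 1:100: {protease_mg(fusion_mg, 100):.1f} mg")
```

For the same batch, a 1:20 TEV digest needs five times more enzyme by mass than a 1:100 SUMO protease digest, which is part of the cost difference between the two workflows.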

Thrombin and Factor Xa are serine proteases with broader substrate specificity. Both can cleave at secondary sites in the target protein, particularly at Arg-rich sequences. Their use is declining in favour of TEV and SUMO proteases, though thrombin remains common in legacy GST-fusion workflows.

Figure 3. Protease performance comparison. Cleavage completeness (% at 4 hours), specificity score (10 = no off-target sites), and cost index (relative $/mg protein processed). SUMO protease combines the highest specificity with efficient cleavage kinetics.

Tandem-Tag Strategies

Tandem-tag constructs combine a purification tag with a solubility-enhancing tag in a single fusion, providing both IMAC capture and folding assistance. The two most popular configurations are His6-MBP and His6-SUMO, both with a protease cleavage site between the combined tag and the target protein.

His6-MBP: The Industry Workhorse

The His6-MBP construct places a His6 tag at the N-terminus followed by MBP (43 kDa), a TEV protease site, and the target protein. First-pass purification uses Ni-IMAC to capture the His6-tagged fusion. After TEV cleavage, subtractive IMAC removes the His6-MBP fragment and uncleaved fusion, leaving pure target in the flow-through. This workflow typically achieves >95% purity in two chromatography steps. His6-MBP is the default choice when solubility is a concern and no specific constraint favours another tag.
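The construct layout and the origin of the serine scar can be illustrated at the sequence level. In this sketch the tag and target sequences are placeholders (`X` repeats and the string `PEPTIDE`), not real MBP or target sequences, and the function names are hypothetical. Only the His6 stretch and the TEV site ENLYFQ/S come from the text.

```python
from __future__ import annotations

import re

HIS6 = "HHHHHH"
TEV_SITE = "ENLYFQS"  # TEV cleaves between Q and S

# Placeholders only: NOT real MBP or target sequences.
MBP_PLACEHOLDER = "X" * 10
TARGET_PLACEHOLDER = "PEPTIDE"


def build_fusion(tag: str, target: str) -> str:
    """Assemble Met-His6-tag-TEV site-target as one polypeptide string."""
    return "M" + HIS6 + tag + TEV_SITE + target


def tev_cleave(fusion: str) -> tuple[str, str]:
    """Cut at the first ENLYFQ/(S or G) site; return (tag fragment, target fragment)."""
    match = re.search(r"ENLYFQ(?=[SG])", fusion)
    if match is None:
        raise ValueError("no TEV site found")
    cut = match.end()  # position immediately after the Q
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    fusion = build_fusion(MBP_PLACEHOLDER, TARGET_PLACEHOLDER)
    tag_fragment, target_fragment = tev_cleave(fusion)
    print(tag_fragment)     # keeps His6, so it rebinds Ni-IMAC
    print(target_fragment)  # starts with the Ser left by the TEV site
```

The tag fragment retains the His6 handle while the released target does not, which is what lets subtractive IMAC separate them. The leading serine on the target fragment is the scar referred to above.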

His6-SUMO: Scarless Cleavage

The His6-SUMO construct works identically to His6-MBP in the purification workflow, but SUMO protease replaces TEV for tag removal. The advantage is a clean, native N-terminus on the target protein. The trade-off is that SUMO (12 kDa) provides less solubility enhancement than MBP (43 kDa), so His6-SUMO is best suited for targets that are moderately aggregation-prone rather than severely insoluble.

Table 3. Tandem-tag constructs: purification workflow comparison.

| Step | His6-MBP-TEV-Target | His6-SUMO-Target |
|------|------|------|
| 1. Capture | Ni-IMAC (bind, wash 20-40 mM imidazole, elute 250-300 mM) | Ni-IMAC (same conditions) |
| 2. Tag removal | TEV protease, 1:20 w/w, 16 h at 4 °C | SUMO protease, 1:100 w/w, 1-2 h at 25 °C |
| 3. Subtraction | Subtractive Ni-IMAC: target in flow-through | Subtractive Ni-IMAC: target in flow-through |
| N-terminal scar | Ser (from TEV site) | None (native N-terminus) |
| Typical yield | 5-50 mg/L culture | 5-30 mg/L culture |

Worked Example: Tag Selection for a 15 kDa Human Cytokine

Worked Example: Selecting a Fusion Tag for IL-33 (15.4 kDa)

Target: Human interleukin-33 (IL-33), 15.4 kDa, no disulfide bonds in the active form, requires native N-terminus (Ser112) for receptor binding. Previously expressed as His6-IL-33 in BL21(DE3): 100% inclusion bodies at 37 °C, 80% insoluble at 18 °C.

Step 1: Apply the decision tree.

Step 2: Design the parallel screen.

Step 3: Expression conditions.

Strain: BL21(DE3)
Media: TB + 50 μg/mL kanamycin
Growth: 37 °C to OD600 = 0.6-0.8
Induction: 0.1 mM IPTG, 18 °C, 16 h
Lysis: 50 mM Tris pH 8.0, 300 mM NaCl, 10 mM imidazole, 1 mg/mL lysozyme, sonication
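The buffer and induction conditions above can be turned into weigh-out amounts with a short calculation. This sketch assumes Tris is weighed as Tris base (with the pH adjusted to 8.0 using HCl) and that IPTG is kept as a 1 M stock; neither detail is stated in the protocol. Formula weights are standard values, and the function names are hypothetical.

```python
# Formula weights in g/mol (Tris taken as Tris base, an assumption).
FORMULA_WEIGHT = {"Tris base": 121.14, "NaCl": 58.44, "imidazole": 68.08}

# Target concentrations in mM, from the lysis buffer recipe above.
LYSIS_BUFFER_MM = {"Tris base": 50, "NaCl": 300, "imidazole": 10}


def grams_needed(conc_mm: float, formula_weight: float, volume_l: float) -> float:
    """Mass of solid (g) to reach conc_mm in volume_l litres."""
    return conc_mm / 1000 * formula_weight * volume_l


def stock_volume_ul(final_mm: float, stock_mm: float, culture_ml: float) -> float:
    """Stock volume (uL) from C1*V1 = C2*V2, ignoring the added volume."""
    return final_mm * culture_ml / stock_mm * 1000


if __name__ == "__main__":
    volume_l = 1.0
    for reagent, conc in LYSIS_BUFFER_MM.items():
        grams = grams_needed(conc, FORMULA_WEIGHT[reagent], volume_l)
        print(f"{reagent:10s} {conc:4d} mM -> {grams:7.3f} g per {volume_l} L")
    # 0.1 mM IPTG in a 1 L culture from an assumed 1 M (1000 mM) stock.
    print(f"IPTG stock: {stock_volume_ul(0.1, 1000, 1000):.0f} uL")
```

Lysozyme at 1 mg/mL is added by mass to the final lysate volume and needs no conversion.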

Expected results:

His6-IL33:       Total = 80 mg/L, Soluble = 16 mg/L (20%)
His6-SUMO-IL33: Total = 60 mg/L, Soluble = 42 mg/L (70%)
His6-MBP-IL33:  Total = 45 mg/L, Soluble = 38 mg/L (84%)
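The percentages above are soluble mass divided by total mass, and the ranking of constructs depends on which metric is used. The sketch below recomputes both from the expected results; the function names are hypothetical.

```python
from __future__ import annotations

# (total mg/L, soluble mg/L) from the expected results above.
SCREEN = {
    "His6-IL33": (80, 16),
    "His6-SUMO-IL33": (60, 42),
    "His6-MBP-IL33": (45, 38),
}


def soluble_fraction(total_mg_l: float, soluble_mg_l: float) -> float:
    """Soluble protein as a fraction of total expressed protein."""
    if total_mg_l <= 0:
        raise ValueError("total expression must be > 0")
    return soluble_mg_l / total_mg_l


def rank(screen: dict[str, tuple[float, float]], by: str) -> list[str]:
    """Rank constructs, best first, by soluble 'mass' or soluble 'fraction'."""
    if by == "mass":
        return sorted(screen, key=lambda name: screen[name][1], reverse=True)
    if by == "fraction":
        return sorted(
            screen, key=lambda name: soluble_fraction(*screen[name]), reverse=True
        )
    raise ValueError("by must be 'mass' or 'fraction'")


if __name__ == "__main__":
    for name, (total, soluble) in SCREEN.items():
        print(f"{name:15s} {soluble_fraction(total, soluble):.0%} soluble")
    print("by mass:    ", rank(SCREEN, "mass"))
    print("by fraction:", rank(SCREEN, "fraction"))
```

His6-MBP-IL33 has the highest soluble fraction, while His6-SUMO-IL33 gives the most soluble protein per litre, so the choice between them rests on the additional requirement for a native N-terminus.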

Decision: His6-SUMO-IL33 gives the best balance of yield and native N-terminus. After Ni-IMAC capture, SUMO protease cleavage (1:100 w/w, 2 h, 25 °C), and subtractive IMAC, expected final yield is 25-30 mg pure native IL-33 per litre of culture.

Frequently Asked Questions

Which fusion tag is best for insoluble proteins in E. coli?

MBP (maltose-binding protein) is the most consistently effective solubility-enhancing fusion tag for insoluble proteins in E. coli. In head-to-head comparisons across multiple target proteins, MBP produces soluble fusion protein in 70 to 80% of cases compared to 40 to 50% for GST and 50 to 60% for thioredoxin. NusA performs comparably to MBP but at the cost of a much larger tag (55 kDa vs 43 kDa), which increases metabolic burden and reduces target protein yield.

What is the difference between a His tag and a solubility tag?

A His tag (typically 6 histidines, 0.8 kDa) is a purification tag that binds nickel or cobalt IMAC resins but does not meaningfully improve protein solubility. Solubility tags like MBP (43 kDa), SUMO (12 kDa), and NusA (55 kDa) are large fusion partners that actively promote folding of the attached target protein and prevent aggregation into inclusion bodies. Many expression constructs combine both: a His tag for purification plus a solubility tag for folding, with a protease site between the tag and target for downstream removal.

How do you remove a fusion tag after purification?

Fusion tags are removed by site-specific proteases that cleave an engineered recognition sequence between the tag and target protein. TEV protease is the most widely used (recognition sequence ENLYFQ/S), offering high specificity with minimal off-target cleavage but relatively slow kinetics. SUMO protease (Ulp1 or SENP) recognizes the three-dimensional structure of SUMO rather than a linear sequence, giving it exceptional specificity and the unique advantage of generating a native N-terminus on the target protein. After cleavage, subtractive IMAC removes the His-tagged protease and cleaved tag, leaving pure target in the flow-through.

Can you use multiple fusion tags on the same protein?

Yes. Tandem-tag constructs are common and often more effective than single tags. The most popular combination is His6-MBP, which provides both IMAC purification (His tag) and solubility enhancement (MBP) in a single fusion. His6-SUMO is increasingly used because SUMO protease cleavage generates a native N-terminus, eliminating residual cleavage-site amino acids. Dual-tag strategies typically use a protease site (TEV, SUMO protease, or PreScission) between the combined tag and the target protein, followed by subtractive IMAC to separate the cleaved tag from the purified target.

Does the position of the fusion tag matter?

Tag position significantly affects both expression and solubility. N-terminal tags are more effective solubility enhancers because the tag folds first during translation, creating a soluble folding nucleus that promotes downstream folding of the target. MBP at the N-terminus produces soluble protein roughly twice as often as when placed at the C-terminus. C-terminal tags can be useful for proteins with essential N-terminal residues or signal peptides. For purification-only tags like His6, C-terminal placement avoids interference with N-terminal methionine processing but may be cleaved by carboxypeptidases in some expression hosts.

References

  1. Marblestone JG, Edavettal SC, Lim Y, Lim P, Zuo X, Butt TR. Comparison of SUMO fusion technology with traditional gene fusion systems: Enhanced expression and solubility with SUMO. Protein Science. 2006;15(1):182-189. doi:10.1110/ps.051812706
  2. Costa S, Almeida A, Castro A, Domingues L. Fusion tags for protein solubility, purification and immunogenicity in Escherichia coli: the novel Fh8 system. Frontiers in Microbiology. 2014;5:63. doi:10.3389/fmicb.2014.00063
  3. Kapust RB, Waugh DS. Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Science. 1999;8(8):1668-1674. doi:10.1110/ps.8.8.1668
  4. De Marco V, Stier G, Blandin S, de Marco A. The solubility and stability of recombinant proteins are increased by their fusion to NusA. Biochemical and Biophysical Research Communications. 2004;322(3):766-771. doi:10.1016/j.bbrc.2004.07.189
  5. Malhotra A. Tagging for protein expression. Methods in Enzymology. 2009;463:239-258. doi:10.1016/S0076-6879(09)63016-0
