Modeling a reaction by mass action cascades using PySCeS#

This notebook is part of the publication “EnzymeML at Work” from Lauterbach et al. 2022 and adds a given micro-kinetic model to an EnzymeML document. Prior to this, experimental data was collected using the EnzymeML spreadsheet and converted to EnzymeML. The following notebook adresses the following key procedures using PyEnzyme:

  • Editing existing species

  • Adding new species from scratch or by DB-fetch

  • Adding reactions via an equation

  • Setting up a model generator, and assigning rate laws to a kinetic model

  • Fitting parameters in this kinetic model to experimental data with PySCeS

We demonstrate how to open an EnzymeML document, extend this document with a micro-kinetic model using PyEnzyme, and then use PySCeS to estimate the parameters of this micro-kinetic model based on the measurement data included in the EnzymeML document.

For this to work, you will have to have PySCeS and PyEnzyme installed, which can be done using:

!pip install pysces
!pip install git+git://github.com/EnzymeML/PyEnzyme.git

This is not needed when running this notebook via Binder, as the environment is already set up.

For the parameter estimation with PySCeS, the CVODE algorithm is needed; this is provided by Assimulo. If you are using the Anaconda Python Distribution (and when running this notebook via Binder), this can easily be achieved by uncommenting and running the following line of code. Alternatively, refer to the Assimulo documentation: https://jmodelica.org/assimulo/

# !conda install -y -c conda-forge assimulo

We are now ready to import the required modules.

from pyenzyme import EnzymeMLDocument, EnzymeReaction, Complex, Reactant, Protein, Creator
from pyenzyme.enzymeml.models import KineticModel, KineticParameter

Editing an EnzymeML document#

Since the previously created EnzymeML document only includes the macro-kinetic reaction, the micro-kinetic model found in Lagerman et al. will be introduced to the EnzymeML document.

For this, intermediate species have to be added to the document, which, aside from small-molecules also include protein-substrate complexes. Macromolecular structures can be described in EnzymeML by using the Complex class, which basically acts as a reference container to participating species that is compatible to SBML.

In addition, the following cell also demonstrates how an exitsing EnzymeML document can be edited, where we are simplifying names for model creation. Furthermore, the methanol species will be added using the fromChebiID-Initializer that fetches and fills in data from the CHEBI-Database. Ultimately, using the initializer allows for data consistency and should always be chosen over manual initialization. The same is available for the Protein class, namely fromUniProtID.

# Load the dataset that was generated using the EnzymeML spreadsheet
enzmldoc = EnzymeMLDocument.fromFile("EnzymeML_Lagerman.omex")

# Rename entities for simplicity
enzmldoc.getReactant("s0").name = "PGME"
enzmldoc.getReactant("s1").name = "7-ADCA"
enzmldoc.getReactant("s2").name = "CEX"
enzmldoc.getReactant("s3").name = "PG"
enzmldoc.getProtein("p0").name = "E"

# Set missing initial concentration
enzmldoc.getReactant("CEX").init_conc = 0.0
enzmldoc.getReactant("CEX").unit = "mmole / l"

enzmldoc.getReactant("PG").init_conc = 0.0
enzmldoc.getReactant("PG").unit = "mmole / l"

# Change initial concentration of E to mmole / l
e = enzmldoc.getProtein("E")
e.init_conc = 0.0002
e.unit = "mmole / l"

# Add EA enzyme instance
ea = enzmldoc.getProtein("E").copy(deep=True)
ea.name = "EA"
ea.init_conc = 0.0
enzmldoc.addProtein(ea)

# Add deactivated enzyme instance
ed = enzmldoc.getProtein("E").copy(deep=True)
ed.name = "ED"
ed.init_conc = 0.0
enzmldoc.addProtein(ed)

# Set proteins to not constant for machnistic modeling
enzmldoc.getProtein("E").constant = False
enzmldoc.getProtein("EA").constant = False
enzmldoc.getProtein("ED").constant = False

# Add model intermediates
enzmldoc.addComplex("E·PGME", ["E", "PGME"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·7-ADCA", ["E", "7-ADCA"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·PG", ["E", "PG"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·PGME·PGME", ["E", "PGME", "PGME"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("EA·7-ADCA", ["EA", "7-ADCA"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("EA·PGME", ["EA", "PGME"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·CEX", ["E", "CEX"], "v0", 0.0, "mmole/l")

# Remove the old reaction since we are going to expand it
# to a micro-kinetic model for modeling
del enzmldoc.reaction_dict["r0"]

Adding the micro-kinetic model#

Now that alls necessary species are defined, these will be set in relation by adding the reaction network defined in Lagerman et al. For this, the EnzymeReaction class can either be constructed using deignated addEduct methods (see examples on PyEnzyme Git) or the fromEquation initializer inspired by BasiCo. The latter will infer the corresponding ID as well as reversibility attribute from the equation string and parses educts and products to the object. Furthermore, both kinds of cunstructors require the EnzymeMLDocument for consistency checks.

In this case, we prefer the fromEquation-method since there are a lot of reactions which essentially are only made up of educts and products but no modifiers. At this point, please note the downside of fromEquation not being able to include modifiers, but since the enzymes/complexes are either consumed or produced, this justifies our usage.

# Now define the micro-kinetic model by adding each sub-reaction to the document
r1 = EnzymeReaction.fromEquation("E·7-ADCA = E + 7-ADCA", "reaction-1", enzmldoc)
r2 = EnzymeReaction.fromEquation("E·PGME = E + PGME", "reaction-2", enzmldoc)
r3 = EnzymeReaction.fromEquation("E·PGME -> EA", "reaction-3", enzmldoc)
r4 = EnzymeReaction.fromEquation("E·PGME·PGME = E·PGME + PGME", "reaction-4", enzmldoc)
r5 = EnzymeReaction.fromEquation("EA·PGME = EA + PGME", "reaction-5", enzmldoc)
r6 = EnzymeReaction.fromEquation("EA·PGME -> E + PG", "reaction-6", enzmldoc)
r7 = EnzymeReaction.fromEquation("EA -> E + PG", "reaction-7", enzmldoc)
r8 = EnzymeReaction.fromEquation("E·PG = E + PG", "reaction-8", enzmldoc)
r9 = EnzymeReaction.fromEquation("EA·7-ADCA = EA + 7-ADCA", "reaction-9", enzmldoc)
r10 = EnzymeReaction.fromEquation("EA·7-ADCA -> E + PG + 7-ADCA", "reaction-10", enzmldoc)
r11 = EnzymeReaction.fromEquation("EA·7-ADCA = E·CEX", "reaction-11", enzmldoc)
r12 = EnzymeReaction.fromEquation("E·CEX = E + CEX", "reaction-12", enzmldoc)
r13 = EnzymeReaction.fromEquation("E -> ED", "Enzyme deactivation", enzmldoc)

Setting up rate laws#

Given the now added reaction network, it is necessary to add appropriate rate laws to each reaction to facilitate parameter estimation using SBML-based modeling platforms. Since most of the reactions follow normal Mass Action laws, we can set up so called ModelFactories that represent an abstract model.

Previous PyEnzyme versions would require to implement an explicit model for each reaction in the form of

s0 * vmax / (Km + s0)

which requires an equation string for each model. This can become quite tedious and for models, where most reactions share the same rate law quite error prone if done manually. Hence, the createGenerator-initializer of the KineticModel offers a convinient way to generalize models and ensure consistency as well as re-usability.

Excursion: Setting up a model generator#

In order to set up a model generator it requires a name, an equation and an explicit description of the used parameters occuring in the equation. For instance, lets set up an Example-Model with equation param * substrate and parameter param for simplicity.

The algorithm will detect parameters based on the keyword arguments passed to the generator. In addtion, these keyword arguments should include a dicitonary that can optionally be equipped with all prossible attributes the KineticParameter class can hold.

# Setting up the generator
example_gen = KineticModel.createGenerator(name="Example-Model", equation="param * substrate", param={"unit": "1 / s"})

The generator can now be applied to any type of reaction by calling the object using the variables as keyword arguments and the corresponding actual species as values. For instance, in our example, substrate is the variable and thus has to be provided as a keyword argument. If for instance the reaction we want the model has a substrate called Pyruvate we can explicitly include this in our call.

# Using the generator with a single species
single_example = example_gen(substrate="Pyruvate")
print("One substrate --", single_example, end="\n\n")

# Generators can also take lists and convert them to the model
multi_example = example_gen(substrate=["Pyruvate", "H2O"])
print("Multiple substrate --", multi_example, end="\n\n")
One substrate -- name='Example-Model' equation="param * 'Pyruvate'" parameters=[KineticParameter(name='param', value=None, unit='1 / s', initial_value=None, upper=None, lower=None, is_global=False, stdev=None, constant=False, ontology=None)] ontology=None

Multiple substrate -- name='Example-Model' equation="param * ('Pyruvate' * 'H2O')" parameters=[KineticParameter(name='param', value=None, unit='1 / s', initial_value=None, upper=None, lower=None, is_global=False, stdev=None, constant=False, ontology=None)] ontology=None

Such a generated model can now be added to a reaction by assigning it to the model attribute.

Adding rate laws#

As previously discussed, all rate laws will be set up as generator objects that are assigned to each reaction using the corrsponding educts/products. In addition, parameters that occur in more than one reaction, are defined as gobal parameters.

Finally, after that has been done, all reactions will be added to the EnzymeMLDocument object and an overview generated to control the assignment using the printReactionSchemes method of the EnzymeML document.

# Set up generators for kinetic models
eq_rev = KineticModel.createGenerator(
    name="MassAction Reversible", equation="K_1 * substrate - K_2 * product",
    K_1={"unit": "1 / min"}, K_2={"unit": "1 / min"}
)

eq_irrev = KineticModel.createGenerator(
    name="MassAction Irreversible", equation="K_1 * substrate",
    K_1={"unit": "1 / min"}
)

mass_eq = KineticModel.createGenerator(
    name="MassAction Keq", equation=f"v_r*(K_eq * substrate - product)",
    K_eq={"unit": "mmole / l"}, v_r={"unit": "l / mmole min", "constant": True}
)
# Set up global parameters
v_r = enzmldoc.addGlobalParameter("v_r", unit="l / mmole min", constant=True)
K_si = enzmldoc.addGlobalParameter("K_si", unit="mmole / l")
K_n = enzmldoc.addGlobalParameter("K_n", unit="mmole / l")
r1.model = mass_eq(product=["E", "7-ADCA"], substrate="E·7-ADCA", mapping={"K_eq": "K_n"})
r2.model = mass_eq(product=["E", "PGME"], substrate="E·PGME", mapping={"K_eq": "K_s"})
r3.model = eq_irrev(substrate="E·PGME", mapping={"K_1": "k_2"})
r4.model = mass_eq(product=["E·PGME", "PGME"], substrate="E·PGME·PGME", mapping={"K_eq": "K_si"})
r5.model = mass_eq(product=["EA", "PGME"], substrate="EA·PGME", mapping={"K_eq": "K_si"})
r6.model = eq_irrev(substrate="EA·PGME", mapping={"K_1": "k_6"})
r7.model = eq_irrev(substrate="EA", mapping={"K_1": "k_3"})
r8.model = mass_eq(product=["E", "PG"], substrate="E·PG", mapping={"K_eq": "K_pg"})
r9.model = mass_eq(product=["EA", "7-ADCA"], substrate="EA·7-ADCA", mapping={"K_eq": "K_n"})
r10.model = eq_irrev(substrate="EA·7-ADCA", mapping={"K_1": "k_5"})
r11.model = eq_rev(substrate="EA·7-ADCA", product="E·CEX", mapping={"K_1": "k_4", "K_2": "k_4b"})
r12.model = mass_eq(substrate="E·CEX", product=["E", "CEX"], mapping={"K_eq": "K_p"})
r13.model = eq_irrev(substrate="E", mapping={"K_1": "k_d"} )
# Finally, add all reactions to the EnzymeML document
reaction_ids = enzmldoc.addReactions(
    [r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13]
)
enzmldoc.printReactionSchemes(by_name=True)
Name equation kinetic law
ID
r0 reaction-1 1.0 E·7-ADCA <=> 1.0 E + 1.0 7-ADCA v_r*(K_n * E·7-ADCA - (E * 7-ADCA))
r1 reaction-2 1.0 E·PGME <=> 1.0 E + 1.0 PGME v_r*(K_s * E·PGME - (E * PGME))
r2 reaction-3 1.0 E·PGME -> 1.0 EA k_2 * E·PGME
r3 reaction-4 1.0 E·PGME·PGME <=> 1.0 E·PGME + 1.0 PGME v_r*(K_si * E·PGME·PGME - (E·PGME * PGME))
r4 reaction-5 1.0 EA·PGME <=> 1.0 EA + 1.0 PGME v_r*(K_si * EA·PGME - (EA * PGME))
r5 reaction-6 1.0 EA·PGME -> 1.0 E + 1.0 PG k_6 * EA·PGME
r6 reaction-7 1.0 EA -> 1.0 E + 1.0 PG k_3 * EA
r7 reaction-8 1.0 E·PG <=> 1.0 E + 1.0 PG v_r*(K_pg * E·PG - (E * PG))
r8 reaction-9 1.0 EA·7-ADCA <=> 1.0 EA + 1.0 7-ADCA v_r*(K_n * EA·7-ADCA - (EA * 7-ADCA))
r9 reaction-10 1.0 EA·7-ADCA -> 1.0 E + 1.0 PG + 1.0 7-ADCA k_5 * EA·7-ADCA
r10 reaction-11 1.0 EA·7-ADCA <=> 1.0 E·CEX k_4 * EA·7-ADCA - k_4b * E·CEX
r11 reaction-12 1.0 E·CEX <=> 1.0 E + 1.0 CEX v_r*(K_p * E·CEX - (E * CEX))
r12 Enzyme deactivation 1.0 E -> 1.0 ED k_d * E
# Finally, write the document to an OMEX archive
enzmldoc.toFile(".", name="Model_4")
Archive was written to ./Model_4.omex

Now that the EnzymeMLDocument has been adapted to the micro-kinetic model, it can be modeled and optimized using PySCeS and COPASI. Since both modeling package interfaces are an integral part of PyEnzyme, called Thin Layer, a simple call to the corresponding Thin Layer object is necessary.

But before optimization, it might be necessary to define initial values. Since manipulating the KineticParameter initial_values attributes inside the script that generates the EnzymeMLDocument can get quite tedious, PyEnzyme offers an external data structure from within initial values can be applied. This way, the EnzymeML document is only modifed at optimization and remains untouched until then.

The initialization file is in the YAML format and contains all reactions and their parameters. In addtion, PyEnzyme offers a method to generate such a YAML file, which can be edited manually with the initial parameter values for the optimization.

# In addition, generate a blank YAML file to manually enter initial values for modeling
enzmldoc.generateInitialValueTemplate()

Using the PySCeS thin layer#

The PySCeS thin layer can be used in conjunction with the initialization YAML file. The thin layer will extract all necessary data, feed it into the simulation framework and iteratively optimizes the given parameters until convergence. At the end, the estimated parameters will be written to a new EnzymeMLDocument and saved.

from pyenzyme.thinlayers import ThinLayerPysces

First, the SBML model is converted to a PySCeS input file, this happens automatically. The warning about SBML Level 3 can be safely ignored as PySCeS has all the functionality required for this simulation.

# Initialize the layer
tl_pysces = ThinLayerPysces(
    "Model_4.omex", init_file="EnzymeML_Lagerman_init_values_.yaml",
    model_dir="pySCeS"
)
Check SBML support is at action level 2
SBML file is L3V2



*******ERRORS*********

WARNING: Model is encoded as SBML Level 3, PySCeS only officially supports L2V5.

*******ERRORS*********


Possible errors detected in SBML conversion, Model may be incomplete. Please check the error log file "EnzymeML_Lagerman.xml-sbml_conversion_errors.txt" for details.


***************************************************************
Issues encountered in SBML translation (model processed anyway)
SBML source: pySCeS/EnzymeML_Lagerman.xml
***************************************************************

Parameter units ignored for parameters:
['v_r', 'K_si', 'K_n']

Parameter units ignored for (local) parameters:
['K_s', 'k_2', 'k_6', 'k_3', 'K_pg', 'k_5', 'k_4', 'k_4b', 'K_p', 'k_d']

***************************************************************

Info: single compartment model: locating "r0" in default compartment
Info: single compartment model: locating "r1" in default compartment
Info: single compartment model: locating "r2" in default compartment
Info: single compartment model: locating "r3" in default compartment
Info: single compartment model: locating "r4" in default compartment
Info: single compartment model: locating "r5" in default compartment
Info: single compartment model: locating "r6" in default compartment
Info: single compartment model: locating "r7" in default compartment
Info: single compartment model: locating "r8" in default compartment
Info: single compartment model: locating "r9" in default compartment
Info: single compartment model: locating "r10" in default compartment
Info: single compartment model: locating "r11" in default compartment
Info: single compartment model: locating "r12" in default compartment
Writing file: pySCeS/EnzymeML_Lagerman.xml.psc

SBML2PSC
in : pySCeS/EnzymeML_Lagerman.xml
out: pySCeS/EnzymeML_Lagerman.xml.psc
Assuming extension is .psc
Using model directory: pySCeS
pySCeS/EnzymeML_Lagerman.xml.psc loading .....
Parsing file: pySCeS/EnzymeML_Lagerman.xml.psc
Info: No reagents have been fixed

Calculating L matrix . . . . . . .  done.
Calculating K matrix . . . . . . . . .  done.

Now, the optimization is run. Two different numerical integration algorithms are available in PySCeS, i.e. LSODA and CVODE. Here we choose CVODE. The optimization algorithm can also be specified, any of the algorithms available in LMFIT (https://lmfit.github.io/lmfit-py/fitting.html#choosing-different-fitting-methods) can be chosen.

# Run optimization
tl_pysces.model.mode_integrator='CVODE'
tl_opt = tl_pysces.optimize(method="least_squares")
tl_opt

Fit Statistics

fitting methodleast_squares
# function evals18
# data points320
# variables12
chi-square 2951.22432
reduced chi-square 9.58189715
Akaike info crit. 734.929405
Bayesian info crit. 780.149257

Variables

name value standard error relative error initial value min max vary
v_r 1.0000e+09 0.00000000 (0.00%) 1000000000.0 -inf inf False
K_si 5.39337342 0.93246815 (17.29%) 6.346729249 0.01000000 1000.00000 True
K_n 6.50049502 1.55960885 (23.99%) 11.00542741 0.01000000 10000.0000 True
r1_K_s 5.12950319 1.03650938 (20.21%) 5.676255758 0.01000000 1000.00000 True
r2_k_2 569452.681 92198.7563 (16.19%) 652085.5139 1.00000000 1000000.00 True
r5_k_6 249555.287 77303.5694 (30.98%) 393169.3931 1.00000000 10000000.0 True
r6_k_3 15.8995797 6.25028953 (39.31%) 8.93817854 1.00000000 1000000.00 True
r7_K_pg 129.952300 29.9876732 (23.08%) 47.34446037 0.01000000 1000.00000 True
r9_k_5 884605.533 204522.629 (23.12%) 672295.823 1.00000000 1000000.00 True
r10_k_4 1577461.07 345178.098 (21.88%) 1870570.524 1.00000000 1.0000e+08 True
r10_k_4b 36802.5461 5963.80588 (16.20%) 42451.1374 1.00000000 1.0000e+08 True
r11_K_p 1.29574677 0.34043006 (26.27%) 0.9433184993 0.01000000 1000.00000 True
r12_k_d 0.32920628 0.23138589 (70.29%) 0.5149464784 1.0000e-03 1.00000000 True

Correlations (unreported correlations are < 0.100)

K_sir2_k_2-0.9611
r9_k_5r10_k_40.8630
r1_K_sr2_k_20.6379
r1_K_sr5_k_6-0.6288
K_sir1_K_s-0.5700
K_nr5_k_6-0.5434
r10_k_4r11_K_p-0.5423
K_nr10_k_4b-0.4756
r5_k_6r10_k_40.4694
r9_k_5r10_k_4b-0.4659
r6_k_3r10_k_4-0.4555
r6_k_3r9_k_5-0.4515
K_nr1_K_s0.4407
r5_k_6r11_K_p-0.4210
r7_K_pgr9_k_5-0.4191
K_nr12_k_d-0.3832
r1_K_sr12_k_d-0.3732
r1_K_sr6_k_3-0.3514
r7_K_pgr10_k_4-0.3464
K_sir11_K_p0.3307
r1_K_sr10_k_4b-0.3208
r2_k_2r5_k_6-0.3110
r7_K_pgr10_k_4b0.3106
r2_k_2r11_K_p-0.2926
r5_k_6r12_k_d0.2885
r5_k_6r9_k_50.2838
r9_k_5r11_K_p-0.2721
r5_k_6r10_k_4b0.2640
K_sir10_k_4-0.2627
r10_k_4r10_k_4b-0.2584
r5_k_6r7_K_pg-0.2483
r10_k_4br12_k_d0.2313
K_sir6_k_30.2301
r7_K_pgr11_K_p0.2282
K_sir5_k_60.2018
K_nr7_K_pg0.1983
r2_k_2r6_k_3-0.1950
K_nr11_K_p0.1950
r11_K_pr12_k_d-0.1878
r10_k_4br11_K_p0.1768
K_nr6_k_3-0.1702
K_sir12_k_d0.1581
r7_K_pgr12_k_d-0.1562
r6_k_3r12_k_d-0.1559
K_sir10_k_4b0.1410
r2_k_2r10_k_40.1405
K_sir9_k_5-0.1372
r1_K_sr7_K_pg0.1354
r1_K_sr11_K_p0.1068

Finally, the result is written to a new EnzymeML Document and saved to file.

# Write to new EnzymeMLDocument and save
nu_doc = tl_pysces.write()
nu_doc.toFile(".", name="EnzymeML_Lagerman_M4_PySCeS_Modeled")
Archive was written to ./EnzymeML_Lagerman_M4_PySCeS_Modeled.omex