Modeling a reaction by mass action cascades using PySCeS#

This notebook is part of the publication “EnzymeML at Work” from Lauterbach et al. 2022 and adds a given micro-kinetic model to an EnzymeML document. Prior to this, experimental data was collected using the EnzymeML spreadsheet and converted to EnzymeML. The following notebook adresses the following key procedures using PyEnzyme:

Editing existing species
Adding new species from scratch or by DB-fetch
Adding reactions via an equation
Setting up a model generator, and assigning rate laws to a kinetic model
Fitting parameters in this kinetic model to experimental data with PySCeS

We demonstrate how to open an EnzymeML document, extend this document with a micro-kinetic model using PyEnzyme, and then use PySCeS to estimate the parameters of this micro-kinetic model based on the measurement data included in the EnzymeML document.

For this to work, you will have to have PySCeS and PyEnzyme installed, which can be done using:

!pip install pysces
!pip install git+git://github.com/EnzymeML/PyEnzyme.git

This is not needed when running this notebook via Binder, as the environment is already set up.

For the parameter estimation with PySCeS, the CVODE algorithm is needed; this is provided by Assimulo. If you are using the Anaconda Python Distribution (and when running this notebook via Binder), this can easily be achieved by uncommenting and running the following line of code. Alternatively, refer to the Assimulo documentation: https://jmodelica.org/assimulo/

# !conda install -y -c conda-forge assimulo

We are now ready to import the required modules.

from pyenzyme import EnzymeMLDocument, EnzymeReaction, Complex, Reactant, Protein, Creator
from pyenzyme.enzymeml.models import KineticModel, KineticParameter

Editing an EnzymeML document#

Since the previously created EnzymeML document only includes the macro-kinetic reaction, the micro-kinetic model found in Lagerman et al. will be introduced to the EnzymeML document.

For this, intermediate species have to be added to the document, which, aside from small-molecules also include protein-substrate complexes. Macromolecular structures can be described in EnzymeML by using the Complex class, which basically acts as a reference container to participating species that is compatible to SBML.

In addition, the following cell also demonstrates how an exitsing EnzymeML document can be edited, where we are simplifying names for model creation. Furthermore, the methanol species will be added using the fromChebiID-Initializer that fetches and fills in data from the CHEBI-Database. Ultimately, using the initializer allows for data consistency and should always be chosen over manual initialization. The same is available for the Protein class, namely fromUniProtID.

# Load the dataset that was generated using the EnzymeML spreadsheet
enzmldoc = EnzymeMLDocument.fromFile("EnzymeML_Lagerman.omex")

# Rename entities for simplicity
enzmldoc.getReactant("s0").name = "PGME"
enzmldoc.getReactant("s1").name = "7-ADCA"
enzmldoc.getReactant("s2").name = "CEX"
enzmldoc.getReactant("s3").name = "PG"
enzmldoc.getProtein("p0").name = "E"

# Set missing initial concentration
enzmldoc.getReactant("CEX").init_conc = 0.0
enzmldoc.getReactant("CEX").unit = "mmole / l"

enzmldoc.getReactant("PG").init_conc = 0.0
enzmldoc.getReactant("PG").unit = "mmole / l"

# Change initial concentration of E to mmole / l
e = enzmldoc.getProtein("E")
e.init_conc = 0.0002
e.unit = "mmole / l"

# Add EA enzyme instance
ea = enzmldoc.getProtein("E").copy(deep=True)
ea.name = "EA"
ea.init_conc = 0.0
enzmldoc.addProtein(ea)

# Add deactivated enzyme instance
ed = enzmldoc.getProtein("E").copy(deep=True)
ed.name = "ED"
ed.init_conc = 0.0
enzmldoc.addProtein(ed)

# Set proteins to not constant for machnistic modeling
enzmldoc.getProtein("E").constant = False
enzmldoc.getProtein("EA").constant = False
enzmldoc.getProtein("ED").constant = False

# Add model intermediates
enzmldoc.addComplex("E·PGME", ["E", "PGME"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·7-ADCA", ["E", "7-ADCA"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·PG", ["E", "PG"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·PGME·PGME", ["E", "PGME", "PGME"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("EA·7-ADCA", ["EA", "7-ADCA"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("EA·PGME", ["EA", "PGME"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·CEX", ["E", "CEX"], "v0", 0.0, "mmole/l")

# Remove the old reaction since we are going to expand it
# to a micro-kinetic model for modeling
del enzmldoc.reaction_dict["r0"]

Adding the micro-kinetic model#

Now that alls necessary species are defined, these will be set in relation by adding the reaction network defined in Lagerman et al. For this, the EnzymeReaction class can either be constructed using deignated addEduct methods (see examples on PyEnzyme Git) or the fromEquation initializer inspired by BasiCo. The latter will infer the corresponding ID as well as reversibility attribute from the equation string and parses educts and products to the object. Furthermore, both kinds of cunstructors require the EnzymeMLDocument for consistency checks.

In this case, we prefer the fromEquation-method since there are a lot of reactions which essentially are only made up of educts and products but no modifiers. At this point, please note the downside of fromEquation not being able to include modifiers, but since the enzymes/complexes are either consumed or produced, this justifies our usage.

# Now define the micro-kinetic model by adding each sub-reaction to the document
r1 = EnzymeReaction.fromEquation("E·7-ADCA = E + 7-ADCA", "reaction-1", enzmldoc)
r2 = EnzymeReaction.fromEquation("E·PGME = E + PGME", "reaction-2", enzmldoc)
r3 = EnzymeReaction.fromEquation("E·PGME -> EA", "reaction-3", enzmldoc)
r4 = EnzymeReaction.fromEquation("E·PGME·PGME = E·PGME + PGME", "reaction-4", enzmldoc)
r5 = EnzymeReaction.fromEquation("EA·PGME = EA + PGME", "reaction-5", enzmldoc)
r6 = EnzymeReaction.fromEquation("EA·PGME -> E + PG", "reaction-6", enzmldoc)
r7 = EnzymeReaction.fromEquation("EA -> E + PG", "reaction-7", enzmldoc)
r8 = EnzymeReaction.fromEquation("E·PG = E + PG", "reaction-8", enzmldoc)
r9 = EnzymeReaction.fromEquation("EA·7-ADCA = EA + 7-ADCA", "reaction-9", enzmldoc)
r10 = EnzymeReaction.fromEquation("EA·7-ADCA -> E + PG + 7-ADCA", "reaction-10", enzmldoc)
r11 = EnzymeReaction.fromEquation("EA·7-ADCA = E·CEX", "reaction-11", enzmldoc)
r12 = EnzymeReaction.fromEquation("E·CEX = E + CEX", "reaction-12", enzmldoc)
r13 = EnzymeReaction.fromEquation("E -> ED", "Enzyme deactivation", enzmldoc)

Setting up rate laws#

Given the now added reaction network, it is necessary to add appropriate rate laws to each reaction to facilitate parameter estimation using SBML-based modeling platforms. Since most of the reactions follow normal Mass Action laws, we can set up so called ModelFactories that represent an abstract model.

Previous PyEnzyme versions would require to implement an explicit model for each reaction in the form of

s0 * vmax / (Km + s0)

which requires an equation string for each model. This can become quite tedious and for models, where most reactions share the same rate law quite error prone if done manually. Hence, the createGenerator-initializer of the KineticModel offers a convinient way to generalize models and ensure consistency as well as re-usability.

Excursion: Setting up a model generator#

In order to set up a model generator it requires a name, an equation and an explicit description of the used parameters occuring in the equation. For instance, lets set up an Example-Model with equation param * substrate and parameter param for simplicity.

The algorithm will detect parameters based on the keyword arguments passed to the generator. In addtion, these keyword arguments should include a dicitonary that can optionally be equipped with all prossible attributes the KineticParameter class can hold.

# Setting up the generator
example_gen = KineticModel.createGenerator(name="Example-Model", equation="param * substrate", param={"unit": "1 / s"})

The generator can now be applied to any type of reaction by calling the object using the variables as keyword arguments and the corresponding actual species as values. For instance, in our example, substrate is the variable and thus has to be provided as a keyword argument. If for instance the reaction we want the model has a substrate called Pyruvate we can explicitly include this in our call.

# Using the generator with a single species
single_example = example_gen(substrate="Pyruvate")
print("One substrate --", single_example, end="\n\n")

# Generators can also take lists and convert them to the model
multi_example = example_gen(substrate=["Pyruvate", "H2O"])
print("Multiple substrate --", multi_example, end="\n\n")

One substrate -- name='Example-Model' equation="param * 'Pyruvate'" parameters=[KineticParameter(name='param', value=None, unit='1 / s', initial_value=None, upper=None, lower=None, is_global=False, stdev=None, constant=False, ontology=None)] ontology=None

Multiple substrate -- name='Example-Model' equation="param * ('Pyruvate' * 'H2O')" parameters=[KineticParameter(name='param', value=None, unit='1 / s', initial_value=None, upper=None, lower=None, is_global=False, stdev=None, constant=False, ontology=None)] ontology=None

Such a generated model can now be added to a reaction by assigning it to the model attribute.

Adding rate laws#

As previously discussed, all rate laws will be set up as generator objects that are assigned to each reaction using the corrsponding educts/products. In addition, parameters that occur in more than one reaction, are defined as gobal parameters.

Finally, after that has been done, all reactions will be added to the EnzymeMLDocument object and an overview generated to control the assignment using the printReactionSchemes method of the EnzymeML document.

# Set up generators for kinetic models
eq_rev = KineticModel.createGenerator(
    name="MassAction Reversible", equation="K_1 * substrate - K_2 * product",
    K_1={"unit": "1 / min"}, K_2={"unit": "1 / min"}
)

eq_irrev = KineticModel.createGenerator(
    name="MassAction Irreversible", equation="K_1 * substrate",
    K_1={"unit": "1 / min"}
)

mass_eq = KineticModel.createGenerator(
    name="MassAction Keq", equation=f"v_r*(K_eq * substrate - product)",
    K_eq={"unit": "mmole / l"}, v_r={"unit": "l / mmole min", "constant": True}
)

# Set up global parameters
v_r = enzmldoc.addGlobalParameter("v_r", unit="l / mmole min", constant=True)
K_si = enzmldoc.addGlobalParameter("K_si", unit="mmole / l")
K_n = enzmldoc.addGlobalParameter("K_n", unit="mmole / l")

r1.model = mass_eq(product=["E", "7-ADCA"], substrate="E·7-ADCA", mapping={"K_eq": "K_n"})
r2.model = mass_eq(product=["E", "PGME"], substrate="E·PGME", mapping={"K_eq": "K_s"})
r3.model = eq_irrev(substrate="E·PGME", mapping={"K_1": "k_2"})
r4.model = mass_eq(product=["E·PGME", "PGME"], substrate="E·PGME·PGME", mapping={"K_eq": "K_si"})
r5.model = mass_eq(product=["EA", "PGME"], substrate="EA·PGME", mapping={"K_eq": "K_si"})
r6.model = eq_irrev(substrate="EA·PGME", mapping={"K_1": "k_6"})
r7.model = eq_irrev(substrate="EA", mapping={"K_1": "k_3"})
r8.model = mass_eq(product=["E", "PG"], substrate="E·PG", mapping={"K_eq": "K_pg"})
r9.model = mass_eq(product=["EA", "7-ADCA"], substrate="EA·7-ADCA", mapping={"K_eq": "K_n"})
r10.model = eq_irrev(substrate="EA·7-ADCA", mapping={"K_1": "k_5"})
r11.model = eq_rev(substrate="EA·7-ADCA", product="E·CEX", mapping={"K_1": "k_4", "K_2": "k_4b"})
r12.model = mass_eq(substrate="E·CEX", product=["E", "CEX"], mapping={"K_eq": "K_p"})
r13.model = eq_irrev(substrate="E", mapping={"K_1": "k_d"} )

# Finally, add all reactions to the EnzymeML document
reaction_ids = enzmldoc.addReactions(
    [r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13]
)

enzmldoc.printReactionSchemes(by_name=True)

	Name	equation	kinetic law
ID
r0	reaction-1	1.0 E·7-ADCA <=> 1.0 E + 1.0 7-ADCA	v_r(K_n E·7-ADCA - (E * 7-ADCA))
r1	reaction-2	1.0 E·PGME <=> 1.0 E + 1.0 PGME	v_r(K_s E·PGME - (E * PGME))
r2	reaction-3	1.0 E·PGME -> 1.0 EA	k_2 * E·PGME
r3	reaction-4	1.0 E·PGME·PGME <=> 1.0 E·PGME + 1.0 PGME	v_r(K_si E·PGME·PGME - (E·PGME * PGME))
r4	reaction-5	1.0 EA·PGME <=> 1.0 EA + 1.0 PGME	v_r(K_si EA·PGME - (EA * PGME))
r5	reaction-6	1.0 EA·PGME -> 1.0 E + 1.0 PG	k_6 * EA·PGME
r6	reaction-7	1.0 EA -> 1.0 E + 1.0 PG	k_3 * EA
r7	reaction-8	1.0 E·PG <=> 1.0 E + 1.0 PG	v_r(K_pg E·PG - (E * PG))
r8	reaction-9	1.0 EA·7-ADCA <=> 1.0 EA + 1.0 7-ADCA	v_r(K_n EA·7-ADCA - (EA * 7-ADCA))
r9	reaction-10	1.0 EA·7-ADCA -> 1.0 E + 1.0 PG + 1.0 7-ADCA	k_5 * EA·7-ADCA
r10	reaction-11	1.0 EA·7-ADCA <=> 1.0 E·CEX	k_4 * EA·7-ADCA - k_4b * E·CEX
r11	reaction-12	1.0 E·CEX <=> 1.0 E + 1.0 CEX	v_r(K_p E·CEX - (E * CEX))
r12	Enzyme deactivation	1.0 E -> 1.0 ED	k_d * E

# Finally, write the document to an OMEX archive
enzmldoc.toFile(".", name="Model_4")

Archive was written to ./Model_4.omex

Now that the EnzymeMLDocument has been adapted to the micro-kinetic model, it can be modeled and optimized using PySCeS and COPASI. Since both modeling package interfaces are an integral part of PyEnzyme, called Thin Layer, a simple call to the corresponding Thin Layer object is necessary.

But before optimization, it might be necessary to define initial values. Since manipulating the KineticParameter initial_values attributes inside the script that generates the EnzymeMLDocument can get quite tedious, PyEnzyme offers an external data structure from within initial values can be applied. This way, the EnzymeML document is only modifed at optimization and remains untouched until then.

The initialization file is in the YAML format and contains all reactions and their parameters. In addtion, PyEnzyme offers a method to generate such a YAML file, which can be edited manually with the initial parameter values for the optimization.

# In addition, generate a blank YAML file to manually enter initial values for modeling
enzmldoc.generateInitialValueTemplate()

Using the PySCeS thin layer#

The PySCeS thin layer can be used in conjunction with the initialization YAML file. The thin layer will extract all necessary data, feed it into the simulation framework and iteratively optimizes the given parameters until convergence. At the end, the estimated parameters will be written to a new EnzymeMLDocument and saved.

from pyenzyme.thinlayers import ThinLayerPysces

First, the SBML model is converted to a PySCeS input file, this happens automatically. The warning about SBML Level 3 can be safely ignored as PySCeS has all the functionality required for this simulation.

# Initialize the layer
tl_pysces = ThinLayerPysces(
    "Model_4.omex", init_file="EnzymeML_Lagerman_init_values_.yaml",
    model_dir="pySCeS"
)

Check SBML support is at action level 2
SBML file is L3V2



*******ERRORS*********

WARNING: Model is encoded as SBML Level 3, PySCeS only officially supports L2V5.

*******ERRORS*********


Possible errors detected in SBML conversion, Model may be incomplete. Please check the error log file "EnzymeML_Lagerman.xml-sbml_conversion_errors.txt" for details.


***************************************************************
Issues encountered in SBML translation (model processed anyway)
SBML source: pySCeS/EnzymeML_Lagerman.xml
***************************************************************

Parameter units ignored for parameters:
['v_r', 'K_si', 'K_n']

Parameter units ignored for (local) parameters:
['K_s', 'k_2', 'k_6', 'k_3', 'K_pg', 'k_5', 'k_4', 'k_4b', 'K_p', 'k_d']

***************************************************************

Info: single compartment model: locating "r0" in default compartment
Info: single compartment model: locating "r1" in default compartment
Info: single compartment model: locating "r2" in default compartment
Info: single compartment model: locating "r3" in default compartment
Info: single compartment model: locating "r4" in default compartment
Info: single compartment model: locating "r5" in default compartment
Info: single compartment model: locating "r6" in default compartment
Info: single compartment model: locating "r7" in default compartment
Info: single compartment model: locating "r8" in default compartment
Info: single compartment model: locating "r9" in default compartment
Info: single compartment model: locating "r10" in default compartment
Info: single compartment model: locating "r11" in default compartment
Info: single compartment model: locating "r12" in default compartment
Writing file: pySCeS/EnzymeML_Lagerman.xml.psc

SBML2PSC
in : pySCeS/EnzymeML_Lagerman.xml
out: pySCeS/EnzymeML_Lagerman.xml.psc
Assuming extension is .psc
Using model directory: pySCeS
pySCeS/EnzymeML_Lagerman.xml.psc loading .....
Parsing file: pySCeS/EnzymeML_Lagerman.xml.psc
Info: No reagents have been fixed

Calculating L matrix . . . . . . .  done.
Calculating K matrix . . . . . . . . .  done.

Now, the optimization is run. Two different numerical integration algorithms are available in PySCeS, i.e. LSODA and CVODE. Here we choose CVODE. The optimization algorithm can also be specified, any of the algorithms available in LMFIT (https://lmfit.github.io/lmfit-py/fitting.html#choosing-different-fitting-methods) can be chosen.

# Run optimization
tl_pysces.model.mode_integrator='CVODE'
tl_opt = tl_pysces.optimize(method="least_squares")

tl_opt

Fit Statistics

fitting method	least_squares
# function evals	18
# data points	320
# variables	12
chi-square	2951.22432
reduced chi-square	9.58189715
Akaike info crit.	734.929405
Bayesian info crit.	780.149257

Variables

name	value	standard error	relative error	initial value	min	max	vary
v_r	1.0000e+09	0.00000000	(0.00%)	1000000000.0	-inf	inf	False
K_si	5.39337342	0.93246815	(17.29%)	6.346729249	0.01000000	1000.00000	True
K_n	6.50049502	1.55960885	(23.99%)	11.00542741	0.01000000	10000.0000	True
r1_K_s	5.12950319	1.03650938	(20.21%)	5.676255758	0.01000000	1000.00000	True
r2_k_2	569452.681	92198.7563	(16.19%)	652085.5139	1.00000000	1000000.00	True
r5_k_6	249555.287	77303.5694	(30.98%)	393169.3931	1.00000000	10000000.0	True
r6_k_3	15.8995797	6.25028953	(39.31%)	8.93817854	1.00000000	1000000.00	True
r7_K_pg	129.952300	29.9876732	(23.08%)	47.34446037	0.01000000	1000.00000	True
r9_k_5	884605.533	204522.629	(23.12%)	672295.823	1.00000000	1000000.00	True
r10_k_4	1577461.07	345178.098	(21.88%)	1870570.524	1.00000000	1.0000e+08	True
r10_k_4b	36802.5461	5963.80588	(16.20%)	42451.1374	1.00000000	1.0000e+08	True
r11_K_p	1.29574677	0.34043006	(26.27%)	0.9433184993	0.01000000	1000.00000	True
r12_k_d	0.32920628	0.23138589	(70.29%)	0.5149464784	1.0000e-03	1.00000000	True

Correlations (unreported correlations are < 0.100)

K_si	r2_k_2	-0.9611
r9_k_5	r10_k_4	0.8630
r1_K_s	r2_k_2	0.6379
r1_K_s	r5_k_6	-0.6288
K_si	r1_K_s	-0.5700
K_n	r5_k_6	-0.5434
r10_k_4	r11_K_p	-0.5423
K_n	r10_k_4b	-0.4756
r5_k_6	r10_k_4	0.4694
r9_k_5	r10_k_4b	-0.4659
r6_k_3	r10_k_4	-0.4555
r6_k_3	r9_k_5	-0.4515
K_n	r1_K_s	0.4407
r5_k_6	r11_K_p	-0.4210
r7_K_pg	r9_k_5	-0.4191
K_n	r12_k_d	-0.3832
r1_K_s	r12_k_d	-0.3732
r1_K_s	r6_k_3	-0.3514
r7_K_pg	r10_k_4	-0.3464
K_si	r11_K_p	0.3307
r1_K_s	r10_k_4b	-0.3208
r2_k_2	r5_k_6	-0.3110
r7_K_pg	r10_k_4b	0.3106
r2_k_2	r11_K_p	-0.2926
r5_k_6	r12_k_d	0.2885
r5_k_6	r9_k_5	0.2838
r9_k_5	r11_K_p	-0.2721
r5_k_6	r10_k_4b	0.2640
K_si	r10_k_4	-0.2627
r10_k_4	r10_k_4b	-0.2584
r5_k_6	r7_K_pg	-0.2483
r10_k_4b	r12_k_d	0.2313
K_si	r6_k_3	0.2301
r7_K_pg	r11_K_p	0.2282
K_si	r5_k_6	0.2018
K_n	r7_K_pg	0.1983
r2_k_2	r6_k_3	-0.1950
K_n	r11_K_p	0.1950
r11_K_p	r12_k_d	-0.1878
r10_k_4b	r11_K_p	0.1768
K_n	r6_k_3	-0.1702
K_si	r12_k_d	0.1581
r7_K_pg	r12_k_d	-0.1562
r6_k_3	r12_k_d	-0.1559
K_si	r10_k_4b	0.1410
r2_k_2	r10_k_4	0.1405
K_si	r9_k_5	-0.1372
r1_K_s	r7_K_pg	0.1354
r1_K_s	r11_K_p	0.1068

Finally, the result is written to a new EnzymeML Document and saved to file.

# Write to new EnzymeMLDocument and save
nu_doc = tl_pysces.write()
nu_doc.toFile(".", name="EnzymeML_Lagerman_M4_PySCeS_Modeled")

Archive was written to ./EnzymeML_Lagerman_M4_PySCeS_Modeled.omex