Modeling a reaction by mass action cascades using PySCeS#
This notebook is part of the publication “EnzymeML at Work” from Lauterbach et al. 2022 and adds a given micro-kinetic model to an EnzymeML document. Prior to this, experimental data was collected using the EnzymeML spreadsheet and converted to EnzymeML. The following notebook adresses the following key procedures using PyEnzyme:
Editing existing species
Adding new species from scratch or by DB-fetch
Adding reactions via an equation
Setting up a model generator, and assigning rate laws to a kinetic model
Fitting parameters in this kinetic model to experimental data with PySCeS
We demonstrate how to open an EnzymeML document, extend this document with a micro-kinetic model using PyEnzyme, and then use PySCeS to estimate the parameters of this micro-kinetic model based on the measurement data included in the EnzymeML document.
For this to work, you will have to have PySCeS and PyEnzyme installed, which can be done using:
!pip install pysces
!pip install git+git://github.com/EnzymeML/PyEnzyme.git
This is not needed when running this notebook via Binder, as the environment is already set up.
For the parameter estimation with PySCeS, the CVODE algorithm is needed; this is provided by Assimulo. If you are using the Anaconda Python Distribution (and when running this notebook via Binder), this can easily be achieved by uncommenting and running the following line of code. Alternatively, refer to the Assimulo documentation: https://jmodelica.org/assimulo/
# !conda install -y -c conda-forge assimulo
We are now ready to import the required modules.
from pyenzyme import EnzymeMLDocument, EnzymeReaction, Complex, Reactant, Protein, Creator
from pyenzyme.enzymeml.models import KineticModel, KineticParameter
Editing an EnzymeML document#
Since the previously created EnzymeML document only includes the macro-kinetic reaction, the micro-kinetic model found in Lagerman et al. will be introduced to the EnzymeML document.
For this, intermediate species have to be added to the document, which,
aside from small-molecules also include protein-substrate complexes.
Macromolecular structures can be described in EnzymeML by using the
Complex
class, which basically acts as a reference container to
participating species that is compatible to SBML.
In addition, the following cell also demonstrates how an exitsing
EnzymeML document can be edited, where we are simplifying names for
model creation. Furthermore, the methanol species will be added using
the fromChebiID
-Initializer that fetches and fills in data from the
CHEBI-Database. Ultimately, using the initializer allows for data
consistency and should always be chosen over manual initialization. The
same is available for the Protein
class, namely fromUniProtID
.
# Load the dataset that was generated using the EnzymeML spreadsheet
enzmldoc = EnzymeMLDocument.fromFile("EnzymeML_Lagerman.omex")
# Rename entities for simplicity
enzmldoc.getReactant("s0").name = "PGME"
enzmldoc.getReactant("s1").name = "7-ADCA"
enzmldoc.getReactant("s2").name = "CEX"
enzmldoc.getReactant("s3").name = "PG"
enzmldoc.getProtein("p0").name = "E"
# Set missing initial concentration
enzmldoc.getReactant("CEX").init_conc = 0.0
enzmldoc.getReactant("CEX").unit = "mmole / l"
enzmldoc.getReactant("PG").init_conc = 0.0
enzmldoc.getReactant("PG").unit = "mmole / l"
# Change initial concentration of E to mmole / l
e = enzmldoc.getProtein("E")
e.init_conc = 0.0002
e.unit = "mmole / l"
# Add EA enzyme instance
ea = enzmldoc.getProtein("E").copy(deep=True)
ea.name = "EA"
ea.init_conc = 0.0
enzmldoc.addProtein(ea)
# Add deactivated enzyme instance
ed = enzmldoc.getProtein("E").copy(deep=True)
ed.name = "ED"
ed.init_conc = 0.0
enzmldoc.addProtein(ed)
# Set proteins to not constant for machnistic modeling
enzmldoc.getProtein("E").constant = False
enzmldoc.getProtein("EA").constant = False
enzmldoc.getProtein("ED").constant = False
# Add model intermediates
enzmldoc.addComplex("E·PGME", ["E", "PGME"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·7-ADCA", ["E", "7-ADCA"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·PG", ["E", "PG"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·PGME·PGME", ["E", "PGME", "PGME"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("EA·7-ADCA", ["EA", "7-ADCA"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("EA·PGME", ["EA", "PGME"], "v0", 0.0, "mmole/l")
enzmldoc.addComplex("E·CEX", ["E", "CEX"], "v0", 0.0, "mmole/l")
# Remove the old reaction since we are going to expand it
# to a micro-kinetic model for modeling
del enzmldoc.reaction_dict["r0"]
Adding the micro-kinetic model#
Now that alls necessary species are defined, these will be set in
relation by adding the reaction network defined in Lagerman et al. For
this, the EnzymeReaction
class can either be constructed using
deignated addEduct
methods (see examples on PyEnzyme Git) or the
fromEquation
initializer inspired by BasiCo
. The latter will
infer the corresponding ID as well as reversibility
attribute from
the equation string and parses educts and products to the object.
Furthermore, both kinds of cunstructors require the EnzymeMLDocument
for consistency checks.
In this case, we prefer the fromEquation
-method since there are a
lot of reactions which essentially are only made up of educts and
products but no modifiers. At this point, please note the downside of
fromEquation
not being able to include modifiers, but since the
enzymes/complexes are either consumed or produced, this justifies our
usage.
# Now define the micro-kinetic model by adding each sub-reaction to the document
r1 = EnzymeReaction.fromEquation("E·7-ADCA = E + 7-ADCA", "reaction-1", enzmldoc)
r2 = EnzymeReaction.fromEquation("E·PGME = E + PGME", "reaction-2", enzmldoc)
r3 = EnzymeReaction.fromEquation("E·PGME -> EA", "reaction-3", enzmldoc)
r4 = EnzymeReaction.fromEquation("E·PGME·PGME = E·PGME + PGME", "reaction-4", enzmldoc)
r5 = EnzymeReaction.fromEquation("EA·PGME = EA + PGME", "reaction-5", enzmldoc)
r6 = EnzymeReaction.fromEquation("EA·PGME -> E + PG", "reaction-6", enzmldoc)
r7 = EnzymeReaction.fromEquation("EA -> E + PG", "reaction-7", enzmldoc)
r8 = EnzymeReaction.fromEquation("E·PG = E + PG", "reaction-8", enzmldoc)
r9 = EnzymeReaction.fromEquation("EA·7-ADCA = EA + 7-ADCA", "reaction-9", enzmldoc)
r10 = EnzymeReaction.fromEquation("EA·7-ADCA -> E + PG + 7-ADCA", "reaction-10", enzmldoc)
r11 = EnzymeReaction.fromEquation("EA·7-ADCA = E·CEX", "reaction-11", enzmldoc)
r12 = EnzymeReaction.fromEquation("E·CEX = E + CEX", "reaction-12", enzmldoc)
r13 = EnzymeReaction.fromEquation("E -> ED", "Enzyme deactivation", enzmldoc)
Setting up rate laws#
Given the now added reaction network, it is necessary to add appropriate
rate laws to each reaction to facilitate parameter estimation using
SBML-based modeling platforms. Since most of the reactions follow normal
Mass Action laws, we can set up so called ModelFactories
that
represent an abstract model.
Previous PyEnzyme versions would require to implement an explicit model for each reaction in the form of
s0 * vmax / (Km + s0)
which requires an equation string for each model. This can become quite
tedious and for models, where most reactions share the same rate law
quite error prone if done manually. Hence, the
createGenerator
-initializer of the KineticModel
offers a
convinient way to generalize models and ensure consistency as well as
re-usability.
Excursion: Setting up a model generator#
In order to set up a model generator it requires a name
, an
equation
and an explicit description of the used parameters occuring
in the equation. For instance, lets set up an Example-Model
with
equation param * substrate
and parameter param
for simplicity.
The algorithm will detect parameters based on the keyword arguments
passed to the generator. In addtion, these keyword arguments should
include a dicitonary that can optionally be equipped with all prossible
attributes the KineticParameter
class can hold.
# Setting up the generator
example_gen = KineticModel.createGenerator(name="Example-Model", equation="param * substrate", param={"unit": "1 / s"})
The generator can now be applied to any type of reaction by calling the
object using the variables as keyword arguments and the corresponding
actual species as values. For instance, in our example, substrate
is
the variable and thus has to be provided as a keyword argument. If for
instance the reaction we want the model has a substrate called
Pyruvate
we can explicitly include this in our call.
# Using the generator with a single species
single_example = example_gen(substrate="Pyruvate")
print("One substrate --", single_example, end="\n\n")
# Generators can also take lists and convert them to the model
multi_example = example_gen(substrate=["Pyruvate", "H2O"])
print("Multiple substrate --", multi_example, end="\n\n")
One substrate -- name='Example-Model' equation="param * 'Pyruvate'" parameters=[KineticParameter(name='param', value=None, unit='1 / s', initial_value=None, upper=None, lower=None, is_global=False, stdev=None, constant=False, ontology=None)] ontology=None
Multiple substrate -- name='Example-Model' equation="param * ('Pyruvate' * 'H2O')" parameters=[KineticParameter(name='param', value=None, unit='1 / s', initial_value=None, upper=None, lower=None, is_global=False, stdev=None, constant=False, ontology=None)] ontology=None
Such a generated model can now be added to a reaction by assigning it to
the model
attribute.
Adding rate laws#
As previously discussed, all rate laws will be set up as generator objects that are assigned to each reaction using the corrsponding educts/products. In addition, parameters that occur in more than one reaction, are defined as gobal parameters.
Finally, after that has been done, all reactions will be added to the
EnzymeMLDocument
object and an overview generated to control the
assignment using the printReactionSchemes
method of the EnzymeML
document.
# Set up generators for kinetic models
eq_rev = KineticModel.createGenerator(
name="MassAction Reversible", equation="K_1 * substrate - K_2 * product",
K_1={"unit": "1 / min"}, K_2={"unit": "1 / min"}
)
eq_irrev = KineticModel.createGenerator(
name="MassAction Irreversible", equation="K_1 * substrate",
K_1={"unit": "1 / min"}
)
mass_eq = KineticModel.createGenerator(
name="MassAction Keq", equation=f"v_r*(K_eq * substrate - product)",
K_eq={"unit": "mmole / l"}, v_r={"unit": "l / mmole min", "constant": True}
)
# Set up global parameters
v_r = enzmldoc.addGlobalParameter("v_r", unit="l / mmole min", constant=True)
K_si = enzmldoc.addGlobalParameter("K_si", unit="mmole / l")
K_n = enzmldoc.addGlobalParameter("K_n", unit="mmole / l")
r1.model = mass_eq(product=["E", "7-ADCA"], substrate="E·7-ADCA", mapping={"K_eq": "K_n"})
r2.model = mass_eq(product=["E", "PGME"], substrate="E·PGME", mapping={"K_eq": "K_s"})
r3.model = eq_irrev(substrate="E·PGME", mapping={"K_1": "k_2"})
r4.model = mass_eq(product=["E·PGME", "PGME"], substrate="E·PGME·PGME", mapping={"K_eq": "K_si"})
r5.model = mass_eq(product=["EA", "PGME"], substrate="EA·PGME", mapping={"K_eq": "K_si"})
r6.model = eq_irrev(substrate="EA·PGME", mapping={"K_1": "k_6"})
r7.model = eq_irrev(substrate="EA", mapping={"K_1": "k_3"})
r8.model = mass_eq(product=["E", "PG"], substrate="E·PG", mapping={"K_eq": "K_pg"})
r9.model = mass_eq(product=["EA", "7-ADCA"], substrate="EA·7-ADCA", mapping={"K_eq": "K_n"})
r10.model = eq_irrev(substrate="EA·7-ADCA", mapping={"K_1": "k_5"})
r11.model = eq_rev(substrate="EA·7-ADCA", product="E·CEX", mapping={"K_1": "k_4", "K_2": "k_4b"})
r12.model = mass_eq(substrate="E·CEX", product=["E", "CEX"], mapping={"K_eq": "K_p"})
r13.model = eq_irrev(substrate="E", mapping={"K_1": "k_d"} )
# Finally, add all reactions to the EnzymeML document
reaction_ids = enzmldoc.addReactions(
[r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13]
)
enzmldoc.printReactionSchemes(by_name=True)
Name | equation | kinetic law | |
---|---|---|---|
ID | |||
r0 | reaction-1 | 1.0 E·7-ADCA <=> 1.0 E + 1.0 7-ADCA | v_r*(K_n * E·7-ADCA - (E * 7-ADCA)) |
r1 | reaction-2 | 1.0 E·PGME <=> 1.0 E + 1.0 PGME | v_r*(K_s * E·PGME - (E * PGME)) |
r2 | reaction-3 | 1.0 E·PGME -> 1.0 EA | k_2 * E·PGME |
r3 | reaction-4 | 1.0 E·PGME·PGME <=> 1.0 E·PGME + 1.0 PGME | v_r*(K_si * E·PGME·PGME - (E·PGME * PGME)) |
r4 | reaction-5 | 1.0 EA·PGME <=> 1.0 EA + 1.0 PGME | v_r*(K_si * EA·PGME - (EA * PGME)) |
r5 | reaction-6 | 1.0 EA·PGME -> 1.0 E + 1.0 PG | k_6 * EA·PGME |
r6 | reaction-7 | 1.0 EA -> 1.0 E + 1.0 PG | k_3 * EA |
r7 | reaction-8 | 1.0 E·PG <=> 1.0 E + 1.0 PG | v_r*(K_pg * E·PG - (E * PG)) |
r8 | reaction-9 | 1.0 EA·7-ADCA <=> 1.0 EA + 1.0 7-ADCA | v_r*(K_n * EA·7-ADCA - (EA * 7-ADCA)) |
r9 | reaction-10 | 1.0 EA·7-ADCA -> 1.0 E + 1.0 PG + 1.0 7-ADCA | k_5 * EA·7-ADCA |
r10 | reaction-11 | 1.0 EA·7-ADCA <=> 1.0 E·CEX | k_4 * EA·7-ADCA - k_4b * E·CEX |
r11 | reaction-12 | 1.0 E·CEX <=> 1.0 E + 1.0 CEX | v_r*(K_p * E·CEX - (E * CEX)) |
r12 | Enzyme deactivation | 1.0 E -> 1.0 ED | k_d * E |
# Finally, write the document to an OMEX archive
enzmldoc.toFile(".", name="Model_4")
Archive was written to ./Model_4.omex
Now that the EnzymeMLDocument has been adapted to the micro-kinetic model, it can be modeled and optimized using PySCeS and COPASI. Since both modeling package interfaces are an integral part of PyEnzyme, called Thin Layer, a simple call to the corresponding Thin Layer object is necessary.
But before optimization, it might be necessary to define initial values. Since manipulating the KineticParameter initial_values attributes inside the script that generates the EnzymeMLDocument can get quite tedious, PyEnzyme offers an external data structure from within initial values can be applied. This way, the EnzymeML document is only modifed at optimization and remains untouched until then.
The initialization file is in the YAML format and contains all reactions and their parameters. In addtion, PyEnzyme offers a method to generate such a YAML file, which can be edited manually with the initial parameter values for the optimization.
# In addition, generate a blank YAML file to manually enter initial values for modeling
enzmldoc.generateInitialValueTemplate()
Using the PySCeS thin layer#
The PySCeS thin layer can be used in conjunction with the initialization YAML file. The thin layer will extract all necessary data, feed it into the simulation framework and iteratively optimizes the given parameters until convergence. At the end, the estimated parameters will be written to a new EnzymeMLDocument and saved.
from pyenzyme.thinlayers import ThinLayerPysces
First, the SBML model is converted to a PySCeS input file, this happens automatically. The warning about SBML Level 3 can be safely ignored as PySCeS has all the functionality required for this simulation.
# Initialize the layer
tl_pysces = ThinLayerPysces(
"Model_4.omex", init_file="EnzymeML_Lagerman_init_values_.yaml",
model_dir="pySCeS"
)
Check SBML support is at action level 2 SBML file is L3V2 *******ERRORS********* WARNING: Model is encoded as SBML Level 3, PySCeS only officially supports L2V5. *******ERRORS********* Possible errors detected in SBML conversion, Model may be incomplete. Please check the error log file "EnzymeML_Lagerman.xml-sbml_conversion_errors.txt" for details. *************************************************************** Issues encountered in SBML translation (model processed anyway) SBML source: pySCeS/EnzymeML_Lagerman.xml *************************************************************** Parameter units ignored for parameters: ['v_r', 'K_si', 'K_n'] Parameter units ignored for (local) parameters: ['K_s', 'k_2', 'k_6', 'k_3', 'K_pg', 'k_5', 'k_4', 'k_4b', 'K_p', 'k_d'] *************************************************************** Info: single compartment model: locating "r0" in default compartment Info: single compartment model: locating "r1" in default compartment Info: single compartment model: locating "r2" in default compartment Info: single compartment model: locating "r3" in default compartment Info: single compartment model: locating "r4" in default compartment Info: single compartment model: locating "r5" in default compartment Info: single compartment model: locating "r6" in default compartment Info: single compartment model: locating "r7" in default compartment Info: single compartment model: locating "r8" in default compartment Info: single compartment model: locating "r9" in default compartment Info: single compartment model: locating "r10" in default compartment Info: single compartment model: locating "r11" in default compartment Info: single compartment model: locating "r12" in default compartment Writing file: pySCeS/EnzymeML_Lagerman.xml.psc SBML2PSC in : pySCeS/EnzymeML_Lagerman.xml out: pySCeS/EnzymeML_Lagerman.xml.psc Assuming extension is .psc Using model directory: pySCeS pySCeS/EnzymeML_Lagerman.xml.psc loading ..... Parsing file: pySCeS/EnzymeML_Lagerman.xml.psc Info: No reagents have been fixed Calculating L matrix . . . . . . . done. Calculating K matrix . . . . . . . . . done.
Now, the optimization is run. Two different numerical integration
algorithms are available in PySCeS, i.e. LSODA
and CVODE
. Here
we choose CVODE
. The optimization algorithm can also be specified,
any of the algorithms available in LMFIT
(https://lmfit.github.io/lmfit-py/fitting.html#choosing-different-fitting-methods)
can be chosen.
# Run optimization
tl_pysces.model.mode_integrator='CVODE'
tl_opt = tl_pysces.optimize(method="least_squares")
tl_opt
Fit Statistics
fitting method | least_squares | |
# function evals | 18 | |
# data points | 320 | |
# variables | 12 | |
chi-square | 2951.22432 | |
reduced chi-square | 9.58189715 | |
Akaike info crit. | 734.929405 | |
Bayesian info crit. | 780.149257 |
Variables
name | value | standard error | relative error | initial value | min | max | vary |
---|---|---|---|---|---|---|---|
v_r | 1.0000e+09 | 0.00000000 | (0.00%) | 1000000000.0 | -inf | inf | False |
K_si | 5.39337342 | 0.93246815 | (17.29%) | 6.346729249 | 0.01000000 | 1000.00000 | True |
K_n | 6.50049502 | 1.55960885 | (23.99%) | 11.00542741 | 0.01000000 | 10000.0000 | True |
r1_K_s | 5.12950319 | 1.03650938 | (20.21%) | 5.676255758 | 0.01000000 | 1000.00000 | True |
r2_k_2 | 569452.681 | 92198.7563 | (16.19%) | 652085.5139 | 1.00000000 | 1000000.00 | True |
r5_k_6 | 249555.287 | 77303.5694 | (30.98%) | 393169.3931 | 1.00000000 | 10000000.0 | True |
r6_k_3 | 15.8995797 | 6.25028953 | (39.31%) | 8.93817854 | 1.00000000 | 1000000.00 | True |
r7_K_pg | 129.952300 | 29.9876732 | (23.08%) | 47.34446037 | 0.01000000 | 1000.00000 | True |
r9_k_5 | 884605.533 | 204522.629 | (23.12%) | 672295.823 | 1.00000000 | 1000000.00 | True |
r10_k_4 | 1577461.07 | 345178.098 | (21.88%) | 1870570.524 | 1.00000000 | 1.0000e+08 | True |
r10_k_4b | 36802.5461 | 5963.80588 | (16.20%) | 42451.1374 | 1.00000000 | 1.0000e+08 | True |
r11_K_p | 1.29574677 | 0.34043006 | (26.27%) | 0.9433184993 | 0.01000000 | 1000.00000 | True |
r12_k_d | 0.32920628 | 0.23138589 | (70.29%) | 0.5149464784 | 1.0000e-03 | 1.00000000 | True |
Correlations (unreported correlations are < 0.100)
K_si | r2_k_2 | -0.9611 |
r9_k_5 | r10_k_4 | 0.8630 |
r1_K_s | r2_k_2 | 0.6379 |
r1_K_s | r5_k_6 | -0.6288 |
K_si | r1_K_s | -0.5700 |
K_n | r5_k_6 | -0.5434 |
r10_k_4 | r11_K_p | -0.5423 |
K_n | r10_k_4b | -0.4756 |
r5_k_6 | r10_k_4 | 0.4694 |
r9_k_5 | r10_k_4b | -0.4659 |
r6_k_3 | r10_k_4 | -0.4555 |
r6_k_3 | r9_k_5 | -0.4515 |
K_n | r1_K_s | 0.4407 |
r5_k_6 | r11_K_p | -0.4210 |
r7_K_pg | r9_k_5 | -0.4191 |
K_n | r12_k_d | -0.3832 |
r1_K_s | r12_k_d | -0.3732 |
r1_K_s | r6_k_3 | -0.3514 |
r7_K_pg | r10_k_4 | -0.3464 |
K_si | r11_K_p | 0.3307 |
r1_K_s | r10_k_4b | -0.3208 |
r2_k_2 | r5_k_6 | -0.3110 |
r7_K_pg | r10_k_4b | 0.3106 |
r2_k_2 | r11_K_p | -0.2926 |
r5_k_6 | r12_k_d | 0.2885 |
r5_k_6 | r9_k_5 | 0.2838 |
r9_k_5 | r11_K_p | -0.2721 |
r5_k_6 | r10_k_4b | 0.2640 |
K_si | r10_k_4 | -0.2627 |
r10_k_4 | r10_k_4b | -0.2584 |
r5_k_6 | r7_K_pg | -0.2483 |
r10_k_4b | r12_k_d | 0.2313 |
K_si | r6_k_3 | 0.2301 |
r7_K_pg | r11_K_p | 0.2282 |
K_si | r5_k_6 | 0.2018 |
K_n | r7_K_pg | 0.1983 |
r2_k_2 | r6_k_3 | -0.1950 |
K_n | r11_K_p | 0.1950 |
r11_K_p | r12_k_d | -0.1878 |
r10_k_4b | r11_K_p | 0.1768 |
K_n | r6_k_3 | -0.1702 |
K_si | r12_k_d | 0.1581 |
r7_K_pg | r12_k_d | -0.1562 |
r6_k_3 | r12_k_d | -0.1559 |
K_si | r10_k_4b | 0.1410 |
r2_k_2 | r10_k_4 | 0.1405 |
K_si | r9_k_5 | -0.1372 |
r1_K_s | r7_K_pg | 0.1354 |
r1_K_s | r11_K_p | 0.1068 |
Finally, the result is written to a new EnzymeML Document and saved to file.
# Write to new EnzymeMLDocument and save
nu_doc = tl_pysces.write()
nu_doc.toFile(".", name="EnzymeML_Lagerman_M4_PySCeS_Modeled")
Archive was written to ./EnzymeML_Lagerman_M4_PySCeS_Modeled.omex