Getting started with PyEnzyme
[1]:
from pyenzyme import (
    EnzymeMLDocument, Creator, Vessel, Protein, Complex,
    Reactant, EnzymeReaction, Measurement, MichaelisMenten, Replicate
)
Initializing an EnzymeML document
To write an EnzymeML document, first initialize it by creating an EnzymeMLDocument object. At this point it is possible to add metadata such as a name, URL, DOI or PubMed ID to the document. In addition, it is recommended but not mandatory to add author information. Please note that a Dataverse upload does require at least one author.
EnzymeMLDocument is the container object that stores all experiment information based on sub classes.
Creator carries the metadata about an author.
addCreator adds a Creator object to the EnzymeMLDocument.
[2]:
# Initialize the EnzymeML document
enzmldoc = EnzymeMLDocument(name="Experiment")
# Add authors to the document
author = Creator(
    given_name="Jan", family_name="Range",
    mail="jan.range@simtech.uni-stuttgart.de")
author_id = enzmldoc.addCreator(author)
Simple single substrate reaction
PyEnzyme can document complete experiments from planning to execution, modelling and ultimately database upload. For this, consider a simple single-substrate enzyme-catalyzed reaction, given in the following:
In order to properly document each step, it is necessary to start with the definition of all entities. This is done by initializing the appropriate objects and their metadata. Since PyEnzyme can report micro-kinetic models, it is possible to define intermediates that may not be directly observable, such as enzyme-substrate complexes. This facilitates mathematical modeling based on differential equations and time-course data and offers a flexible approach that is independent of existing models.
The next steps involve definition of the following entities:
Type | Name |
---|---|
Vessel | Eppendorf Tube |
Protein | Enzyme |
Reactant | Substrate |
Reactant | Product |
Complex | Enzyme-Substrate-Complex |
Complex | Enzyme-Product-Complex |
Tips and hints:
Use the addXYZ-functions to append information to an EnzymeML document.
Add-methods return the identifier, which can later be used to build reactions and models. Thus it is best to store these in a variable or data structure.
PyEnzyme takes care of type checking and validation. Furthermore, technicalities such as unit-decomposition (used to convert unit scales properly) and identifier assignment are done within the backend. Hence, you can focus on what matters.
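The advice about keeping identifiers can be illustrated without PyEnzyme itself. The following is a minimal sketch, assuming identifiers are built from a type prefix and a running counter (which matches the s0 and p0 style IDs printed later in this guide). The ToyDocument class and its add method are hypothetical stand-ins and not part of the PyEnzyme API.

```python
from collections import defaultdict


class ToyDocument:
    """Hypothetical stand-in for a document that hands out identifiers."""

    def __init__(self):
        self._counters = defaultdict(int)  # one running counter per prefix
        self.entities = {}                 # identifier -> entity name

    def add(self, prefix, name):
        """Register an entity and return its newly assigned identifier."""
        identifier = f"{prefix}{self._counters[prefix]}"
        self._counters[prefix] += 1
        self.entities[identifier] = name
        return identifier


doc = ToyDocument()

# Keep every returned identifier in one dictionary, keyed by a readable name
ids = {
    "substrate": doc.add("s", "Substrate"),
    "product": doc.add("s", "Product"),
    "enzyme": doc.add("p", "Enzyme"),
}

print(ids)
```

Collecting the identifiers in a dictionary like this keeps later calls readable, since reactions and measurements can refer to ids["substrate"] instead of a hard-coded string.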
Vessels
Vessel carries the metadata for vessels that are used.
addVessel adds a Vessel object to the document and returns the ID.
[3]:
vessel = Vessel(name="Eppendorf Tube", volume=10.0, unit="ml")
vessel_id = enzmldoc.addVessel(vessel)
Proteins
Protein carries the metadata for proteins that are part of the experiment.
addProtein adds a Protein object to the document and returns the ID.
[4]:
enzyme = Protein(name="Enzyme", vessel_id=vessel_id, sequence="MAVKLT")
enzyme_id = enzmldoc.addProtein(enzyme)
Reactants
Reactant carries the metadata for reactants that are part of the experiment.
addReactant adds a Reactant object to the document and returns the ID.
[5]:
substrate = Reactant(name="Substrate", vessel_id=vessel_id)
substrate_id = enzmldoc.addReactant(substrate)
[6]:
product = Reactant(name="Product", vessel_id=vessel_id)
product_id = enzmldoc.addReactant(product)
Complexes
Complex carries the metadata for complexes that are part of the experiment.
addComplex adds a Complex object to the document and returns the ID.
[7]:
es_complex_id = enzmldoc.addComplex(
    name="Enzyme-Substrate-Complex",
    vessel_id=vessel_id,
    participants=[enzyme_id, substrate_id]
)
[8]:
ep_complex_id = enzmldoc.addComplex(
    name="Enzyme-Product-Complex",
    vessel_id=vessel_id,
    participants=[enzyme_id, product_id]
)
Building the reaction network
In order for the micro-kinetic model to be accessible to various modeling platforms and EnzymeML, each reaction in the model has to be documented. Similar to the previous step, this involves the creation of EnzymeReaction objects which will be added to the EnzymeML document. Hence, the following part-reactions need to be defined:
\(\ce{Substrate + Enzyme \rightleftharpoons [ES] }\)
\(\ce{[ES] \rightleftharpoons [EP]}\)
\(\ce{[EP] \rightleftharpoons Product + Enzyme}\)
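To make the link between these part-reactions and differential equations concrete, here is a minimal sketch that turns the three reactions into mass-action rate equations and integrates them with the explicit Euler method. The rate constants, initial concentrations and function names are hypothetical and chosen for illustration only; none of this is part of PyEnzyme.

```python
def rates(state, k):
    """Mass-action right-hand side for E + S <=> ES <=> EP <=> E + P."""
    s, e, es, ep, p = state
    v1 = k["k1f"] * e * s - k["k1r"] * es   # Substrate + Enzyme <=> [ES]
    v2 = k["k2f"] * es - k["k2r"] * ep      # [ES] <=> [EP]
    v3 = k["k3f"] * ep - k["k3r"] * e * p   # [EP] <=> Product + Enzyme
    return (-v1, -v1 + v3, v1 - v2, v2 - v3, v3)


def simulate(state, k, dt=0.001, steps=5000):
    """Integrate the system with the explicit Euler method."""
    for _ in range(steps):
        derivatives = rates(state, k)
        state = tuple(x + dt * dx for x, dx in zip(state, derivatives))
    return state


# Hypothetical rate constants and initial concentrations (arbitrary units)
constants = {"k1f": 1.0, "k1r": 0.5, "k2f": 2.0, "k2r": 0.1, "k3f": 1.5, "k3r": 0.05}
initial = (10.0, 1.0, 0.0, 0.0, 0.0)  # S, E, ES, EP, P

final = simulate(initial, constants)
print(dict(zip(("S", "E", "ES", "EP", "P"), final)))
```

A useful sanity check for such a model is conservation: the total enzyme (E + ES + EP) and the total substrate-derived material (S + ES + EP + P) should stay constant over the whole simulation.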
Tips and hints:
By using the addEduct/Product/Modifier-methods the reaction will be successively created.
Add-methods require the EnzymeMLDocument object to be passed. This is necessary to check whether the given identifiers already exist, which mitigates later errors.
Similar to the other add-methods, addReaction returns the given identifier. Thus it is best to store these in variables or data structures too.
At this point, kinetic laws can be added to the reaction, but in this example we’ll add them afterwards.
Creating Reactions
EnzymeReaction carries the metadata for reactions that are part of the experiment.
[9]:
reaction_1 = EnzymeReaction(name="Reaction 1", reversible=True)
# Add elements
reaction_1.addEduct(species_id=substrate_id, stoichiometry=1.0, enzmldoc=enzmldoc)
reaction_1.addEduct(species_id=enzyme_id, stoichiometry=1.0, enzmldoc=enzmldoc)
reaction_1.addProduct(species_id=es_complex_id, stoichiometry=1.0, enzmldoc=enzmldoc)
[10]:
reaction_2 = EnzymeReaction(name="Reaction 2", reversible=True)
# Add elements
reaction_2.addEduct(species_id=es_complex_id, stoichiometry=1.0, enzmldoc=enzmldoc)
reaction_2.addProduct(species_id=ep_complex_id, stoichiometry=1.0, enzmldoc=enzmldoc)
Alternatively, it is also possible to initialize a reaction by supplying a reaction equation that contains either the names or IDs of the species and optionally their stoichiometric coefficients. In addition, reversibility will be determined from the arrow in the equation. While -> denotes an irreversible reaction, the string <=> represents a reversible reaction. It is advised though to mainly use the addXYZ-methods in production-based deployments, since these offer more flexibility. For a single use case, e.g. in a Jupyter notebook, adding a reaction by equation may be preferable in regard to readability and line efficiency.
[11]:
reaction_3 = EnzymeReaction.fromEquation(
    equation="Enzyme-Product-Complex -> Enzyme + Product",
    name="Reaction 3",
    vessel_id=vessel_id,
    enzmldoc=enzmldoc
)
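To illustrate how an equation string of this kind can be interpreted, the following is a minimal sketch of a parser that splits the string at the arrow, reads optional stoichiometric coefficients and derives reversibility. The function parse_equation is hypothetical and written for illustration; it is not PyEnzyme's own parser and it assumes that arrows and plus signs are surrounded by spaces.

```python
import re


def parse_equation(equation):
    """Split an equation string into educts, products and reversibility."""
    if " <=> " in equation:
        left, right = equation.split(" <=> ")
        reversible = True
    elif " -> " in equation:
        left, right = equation.split(" -> ")
        reversible = False
    else:
        raise ValueError(f"No supported arrow found in: {equation!r}")

    def parse_side(side):
        species = []
        for term in side.split(" + "):
            # An optional leading number is read as the stoichiometric coefficient
            match = re.fullmatch(r"(\d+(?:\.\d+)?)\s+(.+)", term.strip())
            if match:
                species.append((float(match.group(1)), match.group(2)))
            else:
                species.append((1.0, term.strip()))
        return species

    return parse_side(left), parse_side(right), reversible


educts, products, reversible = parse_equation(
    "Enzyme-Product-Complex -> Enzyme + Product"
)
print(educts, products, reversible)
```

Requiring spaces around the arrow is what allows hyphenated names such as Enzyme-Product-Complex to pass through unharmed.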
Adding reactions
addReaction adds an EnzymeReaction object to the document and returns the ID.
[12]:
# Finally, add all reactions to the document
reaction_1_id = enzmldoc.addReaction(reaction_1)
reaction_2_id = enzmldoc.addReaction(reaction_2)
reaction_3_id = enzmldoc.addReaction(reaction_3)
Documenting measurement setups
Now that the theoretical foundation of the experiment has been laid out, it is time to specify the setup of the measurement. PyEnzyme offers a lab-like system to document such setups. Typically, experiments involve multiple runs with varying initial concentrations of every element that occurs in the reaction network and/or varying conditions such as temperature and pH. Hence, PyEnzyme builds on top of a measurement system, where each measurement represents a ‘run’.
In this example, the following setups will be tracked, including changing initial concentrations and temperatures:
Measurement Name | Species | Initial concentration | Unit | Temperature | pH |
---|---|---|---|---|---|
Run 1 | Substrate | 10.0 | mmole / l | 37.0 °C | 7.4 |
Run 1 | Enzyme | 20.0 | fmole / l | 37.0 °C | 7.4 |
Run 1 | Product | 0.0 | mmole / l | 37.0 °C | 7.4 |
Run 2 | Substrate | 100.0 | mmole / l | 39.0 °C | 7.4 |
Run 2 | Enzyme | 40.0 | fmole / l | 39.0 °C | 7.4 |
Run 2 | Product | 0.0 | mmole / l | 39.0 °C | 7.4 |
Measurement carries the metadata for measurements that are conducted in the experiment.
addData appends measurement data to the Measurement object and checks consistency.
[13]:
measurement_1 = Measurement(name="Run 1", temperature=37.0, temperature_unit="C", ph=7.4, global_time_unit="mins")
# Add each entity that will be measured
measurement_1.addData(reactant_id=substrate_id, init_conc=10.0, unit="mmole / l")
measurement_1.addData(reactant_id=product_id, unit="mmole / l")
measurement_1.addData(protein_id=enzyme_id, init_conc=20.0, unit="fmole / l")
# Add it to the EnzymeML document
meas_1_id = enzmldoc.addMeasurement(measurement_1)
[14]:
measurement_2 = Measurement(name="Run 2", temperature=39.0, temperature_unit="C", ph=7.4, global_time_unit="mins")
# Add each entity that will be measured
measurement_2.addData(reactant_id=substrate_id, init_conc=100.0, unit="mmole / l")
measurement_2.addData(reactant_id=product_id, unit="mmole / l")
measurement_2.addData(protein_id=enzyme_id, init_conc=40.0, unit="fmole / l")
# Add it to the EnzymeML document
meas_2_id = enzmldoc.addMeasurement(measurement_2)
[15]:
# Check the measurement table
print(enzmldoc.printMeasurements())
ID Species Conc Unit
===============================
m0 p0 20 fmole / l
m0 s0 10 mmole / l
m0 s1 0 mmole / l
m1 p0 40 fmole / l
m1 s0 100 mmole / l
m1 s1 0 mmole / l
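The measurement table mixes two unit scales, fmole / l for the enzyme and mmole / l for the reactants. As a minimal sketch of what converting between such scales involves, the snippet below rescales a concentration by the difference of the SI prefix exponents. The helper convert_molar and the prefix table are hypothetical illustrations and do not reproduce PyEnzyme's internal unit-decomposition.

```python
# SI prefixes that appear in the unit strings of this guide, plus a few neighbours
PREFIX_EXPONENTS = {"": 0, "m": -3, "u": -6, "n": -9, "p": -12, "f": -15}


def convert_molar(value, from_prefix, to_prefix):
    """Rescale a molar concentration between two SI prefixes."""
    shift = PREFIX_EXPONENTS[from_prefix] - PREFIX_EXPONENTS[to_prefix]
    return value * 10.0 ** shift


# Enzyme concentration of 'Run 1': 20 fmole / l expressed in mmole / l
enzyme_mmol = convert_molar(20.0, "f", "m")
print(enzyme_mmol)
```

Expressed on the same scale, the enzyme concentration is about twelve orders of magnitude below that of the substrate, which is why consistent unit handling matters before any model fitting.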
Adding experimental raw data
After the setup has been defined in terms of measurements, the actual time-course data can be generated and added to the document. PyEnzyme offers a Replicate class as a container for raw data that, aside from the raw data, carries metadata describing the replicate itself.
In the following example, replicate data will be hard-coded and added to our measurement of choice. For this, our digital lab measured the product formation as well as substrate depletion for each measurement setup.
Replicate carries the time-courses and metadata for each measured entity.
addReplicates adds Replicate objects to a measurement, placing them in the corresponding MeasurementData container where the concentrations are also stored.
[16]:
repl_substrate_1 = Replicate(
    id="repl_substrate_1",
    species_id=substrate_id,
    data_unit="mmole / l",
    time_unit="min",
    time=[1,2,3,4,5,6],
    data=[5,4,3,2,1,0]
)
repl_product_1 = Replicate(
    id="repl_product_1",
    species_id=product_id,
    data_unit="mmole / l",
    time_unit="min",
    time=[1,2,3,4,5,6],
    data=[0,1,2,3,4,5]
)
[17]:
# Add it to the first measurement 'Run 1'
meas = enzmldoc.getMeasurement(meas_1_id)
meas.addReplicates([repl_product_1, repl_substrate_1], enzmldoc=enzmldoc)
[18]:
repl_substrate_2 = Replicate(
    id="repl_substrate_2", species_id=substrate_id,
    data_unit="mmole / l", time_unit="min",
    time=[1,2,3,4,5,6], data=[50,40,30,20,10,0]
)
repl_product_2 = Replicate(
    id="repl_product_2", species_id=product_id,
    data_unit="mmole / l", time_unit="min",
    time=[1,2,3,4,5,6], data=[0,10,20,30,40,50]
)
[19]:
# Add it to the second measurement 'Run 2'
meas = enzmldoc.getMeasurement(meas_2_id)
meas.addReplicates([repl_product_2, repl_substrate_2], enzmldoc=enzmldoc)
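Time-course data of this kind is typically the starting point for rate estimates. As a minimal sketch, the snippet below checks that time and data have the same length and then computes the least-squares slope of the product time-courses used in this example. The function least_squares_slope is a hypothetical helper written in plain Python and is not part of PyEnzyme.

```python
def least_squares_slope(time, data):
    """Return the slope of the best-fit line through (time, data)."""
    if len(time) != len(data):
        raise ValueError("time and data must have the same length")
    n = len(time)
    mean_t = sum(time) / n
    mean_y = sum(data) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in zip(time, data))
    denominator = sum((t - mean_t) ** 2 for t in time)
    return numerator / denominator


time = [1, 2, 3, 4, 5, 6]
product_run_1 = [0, 1, 2, 3, 4, 5]
product_run_2 = [0, 10, 20, 30, 40, 50]

# Slopes are in data unit per time unit, here mmole / l per min
print(least_squares_slope(time, product_run_1))
print(least_squares_slope(time, product_run_2))
```

Because the hard-coded data is perfectly linear, the slopes simply recover the step size of each series; real measurements would of course scatter around the fitted line.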
Saving and distributing an EnzymeML document
Finally, the experiment has been finished and both metadata and raw data have been documented. In order to make the data exchangeable, PyEnzyme offers several options for data export. First and foremost, the complete experiment can be exported to EnzymeML, which is SBML compatible and thus accessible by SBML-based modeling tools (e.g. COPASI, PySCeS). Furthermore, with web applications in mind, PyEnzyme offers a JSON export too.
Apart from raw exports, PyEnzyme can also interface with the federated database system Dataverse by providing a simple upload method that automatically converts the document contents to a Dataverse-compatible format and uploads them. Please note that the Dataverse must support the ‘EnzymeML’ metadatablock for a successful upload.
Export
toFile writes the EnzymeML document to an OMEX archive at the specified path.
json converts the EnzymeML document to a JSON string, which in turn can be used for REST interfaces or data storage.
toXMLString returns the XML representation that is also found in the OMEX archive.
[20]:
# To an OMEX archive
enzmldoc.toFile(".", name="My_Experiment")
# To a JSON string
with open("My_Experiment.json", "w") as file_handle:
    file_handle.write(enzmldoc.json(indent=2))
# To an XML string
xml_string = enzmldoc.toXMLString()
Archive was written to ./My_Experiment.omex
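Since an OMEX file is a ZIP-based COMBINE archive, its contents can be inspected with the Python standard library. The following is a minimal sketch; the helper list_archive_entries is hypothetical, and which files PyEnzyme places inside the archive is not assumed here beyond what the listing reveals.

```python
import os
import zipfile


def list_archive_entries(path):
    """Return the names of all files stored in a ZIP-based archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Only inspect the archive if the export above has actually been run
if os.path.exists("My_Experiment.omex"):
    print(list_archive_entries("My_Experiment.omex"))
```

This can be handy for a quick check that an export succeeded before the archive is passed on to other tools.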
Upload
uploadToDataverse uploads the document to a Dataverse installation.
Please note that, in order to work, your environment should contain these variables:
DATAVERSE_URL: The URL to your installation.
DATAVERSE_API_TOKEN: The API Token to access the Dataverse.
[21]:
# enzmldoc.uploadToDataverse(dataverse_name="playground")
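Before attempting an upload, it can help to verify that both environment variables are present. The following is a minimal sketch using only the standard library; the helper missing_dataverse_variables is hypothetical and not part of PyEnzyme.

```python
import os

REQUIRED_VARIABLES = ("DATAVERSE_URL", "DATAVERSE_API_TOKEN")


def missing_dataverse_variables(environment=None):
    """Return the required variable names that are unset or empty."""
    if environment is None:
        environment = os.environ
    return [name for name in REQUIRED_VARIABLES if not environment.get(name)]


missing = missing_dataverse_variables()
if missing:
    print("Please set:", ", ".join(missing))
else:
    print("Dataverse settings found, ready to upload.")
```

Passing the environment as an argument keeps the check easy to test with a plain dictionary.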
Loading an EnzymeML document
An EnzymeML document is not expected to be created in a single session, but to evolve over the course of an experiment. Thus it is necessary to load and edit an EnzymeML document without re-creating everything from the start. PyEnzyme’s EnzymeMLDocument object offers an initialization method fromFile to edit an already existing document. In addition, it is also possible to use the aforementioned JSON export as a starting point by loading it with fromJSON.
Tips and hints:
PyEnzyme stores a history in the document, which keeps track of what has been changed and added in the course of an experiment. This is done to spot potential errors and facilitate the documentation of an experiment’s lifeline.
[22]:
# Load an EnzymeML document from OMEX
enzmldoc = EnzymeMLDocument.fromFile("./My_Experiment.omex")
# Load an EnzymeML document from JSON
with open("My_Experiment.json") as file_handle:
    json_string = file_handle.read()
enzmldoc = EnzymeMLDocument.fromJSON(json_string)
print(enzmldoc)
>>> Units
ID: u0 Name: ml
ID: u1 Name: mmole / l
ID: u2 Name: min
ID: u3 Name: K
ID: u4 Name: fmole / l
>>> Reactants
ID: s0 Name: Substrate
ID: s1 Name: Product
>>> Proteins
ID: p0 Name: Enzyme
>>> Reactions
ID: r0 Name: Reaction 1
ID: r1 Name: Reaction 2
ID: r2 Name: Reaction 3
>>> Measurements
ID Species Conc Unit
===============================
m0 p0 20 fmole / l
m0 s0 10 mmole / l
m0 s1 0 mmole / l
m1 p0 40 fmole / l
m1 s0 100 mmole / l
m1 s1 0 mmole / l
Apart from programmatic creation of an EnzymeML document, PyEnzyme offers a way to convert the ‘EnzymeML spreadsheet template’ to an OMEX file. Since spreadsheets are the bread and butter of current lab documentation, the template widely covers the data model and thus provides easy access to EnzymeML’s capabilities.
[23]:
# Similar to the OMEX and JSON loaders, it's a simple call
enzmldoc = EnzymeMLDocument.fromTemplate("EnzymeML_Template_Example.xlsm")
print(enzmldoc)
>>> Units
ID: u0 Name: ul
ID: u1 Name: K
ID: u2 Name: mmole / l
ID: u3 Name: sec
ID: u4 Name: umole / l
>>> Reactants
ID: s0 Name: Pyruvate
ID: s1 Name: Acetaldehyde
ID: s2 Name: CO2
>>> Proteins
ID: p0 Name: Pyruvate decarboxylase isozyme 1
>>> Reactions
ID: r0 Name: Pyruvate decarboxylation
>>> Measurements
ID Species Conc Unit
===============================
m0 p0 100 umole / l
m0 s0 1000 mmole / l
m0 s2 0 mmole / l
m0 s1 0 mmole / l
m1 p0 100 umole / l
m1 s0 1000 mmole / l
m1 s2 0 mmole / l
m1 s1 0 mmole / l
m2 p0 1 umole / l
m2 s0 1000 mmole / l
m2 s2 0 mmole / l
m2 s1 0 mmole / l
Editing EnzymeML: Kinetic Modeling
Building on the previous section about loading an EnzymeML document, this example will demonstrate how to interact with an already created EnzymeML document using the OMEX loader. Since the purpose of an experiment is to generate data from a theory, modeling takes care of interpreting the experiment’s outcome. PyEnzyme and EnzymeML are not modeling platforms, but they provide a convenient way to interface with them. Hence, this example will demonstrate what such an interface could look like.
The enzyme-catalyzed reaction that has been reported in the course of this example is assumed to follow simple Michaelis-Menten kinetics and thus will be reported as such. But first of all, the next part will demonstrate how measurement data can be exported to be used by a modeling framework/platform.
First, load the EnzymeML document:
[24]:
# Load the EnzymeML document
enzmldoc = EnzymeMLDocument.fromFile("My_Experiment.omex")
In order to get the measurement data given in the document, the EnzymeMLDocument object offers the exportMeasurementData-method, which will export the data of ‘all’ or specified measurements to a pandas DataFrame object.
[25]:
# Get the data from measurement "m0" ...
meas_data = enzmldoc.exportMeasurementData(measurement_ids=["m0"])["m0"]
# Which is a dict containing "data" and "initConc" information, where data is the part we want
meas_data = meas_data["data"]
meas_data
[25]:
 | time/min | repl_substrate_1/s0/mmole / l | repl_product_1/s1/mmole / l |
---|---|---|---|
0 | 2.0 | 4.0 | 1.0 |
1 | 3.0 | 3.0 | 2.0 |
2 | 4.0 | 2.0 | 3.0 |
3 | 5.0 | 1.0 | 4.0 |
4 | 6.0 | 0.0 | 5.0 |
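The column labels of the exported table appear to combine the replicate ID, the species ID and the unit, separated by slashes. Assuming that layout (inferred only from the header shown here), a label can be split into its parts as in the following minimal sketch; split_column_label is a hypothetical helper and not part of PyEnzyme.

```python
def split_column_label(label):
    """Split 'replicate/species/unit' into its three parts."""
    # Limit the split to two cuts, since the unit itself contains a slash
    replicate_id, species_id, unit = label.split("/", 2)
    return {"replicate": replicate_id, "species": species_id, "unit": unit}


columns = ["repl_substrate_1/s0/mmole / l", "repl_product_1/s1/mmole / l"]
for column in columns:
    print(split_column_label(column))
```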
The given output could now be used for an optimizer in conjunction with the metadata that is given in the EnzymeML document. In order to gather stoichiometries and such, one can access other data in the document by specifically exporting the desired reactions:
[26]:
for reaction in enzmldoc.reaction_dict.values():
    # Every entity of an EnzymeML document is stored in its corresponding
    # dictionary. This example serves as a get-go solution to access all
    # other objects
    educts = reaction.educts
    products = reaction.products
    # From this point, a modeling framework/platform could derive important metadata
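One piece of metadata a modeling framework might derive from educts and products is the stoichiometric matrix. The following minimal sketch builds it from a plain-Python description of the three part-reactions. The dictionary layout and the complex IDs c0 and c1 are assumptions made for illustration and do not reflect how PyEnzyme stores reaction elements.

```python
# Plain-Python description of the three part-reactions (species ID -> coefficient);
# the IDs c0 and c1 for the complexes are assumed for illustration
reactions = {
    "r0": {"educts": {"s0": 1.0, "p0": 1.0}, "products": {"c0": 1.0}},
    "r1": {"educts": {"c0": 1.0}, "products": {"c1": 1.0}},
    "r2": {"educts": {"c1": 1.0}, "products": {"p0": 1.0, "s1": 1.0}},
}


def stoichiometric_matrix(reactions):
    """Return (species, matrix) with one row per species, one column per reaction."""
    species = sorted(
        {sid for r in reactions.values() for side in r.values() for sid in side}
    )
    matrix = [[0.0] * len(reactions) for _ in species]
    for column, reaction in enumerate(reactions.values()):
        for sid, coefficient in reaction["educts"].items():
            matrix[species.index(sid)][column] -= coefficient  # consumed
        for sid, coefficient in reaction["products"].items():
            matrix[species.index(sid)][column] += coefficient  # formed
    return species, matrix


species, matrix = stoichiometric_matrix(reactions)
for sid, row in zip(species, matrix):
    print(sid, row)
```

Multiplying this matrix by a vector of reaction rates yields the time derivatives of all species, which is the form most ODE-based modeling tools expect.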
Assuming the modeling has now been done, the estimated parameters for the desired reactions can now be added to each reaction. Since this is an example for demonstration, this will only be carried out for the first reaction by using a Michaelis-Menten-Model.
[27]:
# Get the appropriate IDs by using the getter-methods
substrate_id = enzmldoc.getReactant("Substrate", by_id=False).id
enzyme_id = enzmldoc.getProtein("Enzyme", by_id=False).id
# Create the model
model = MichaelisMenten(
    kcat_val=10.0,
    kcat_unit="1 / s",  # a turnover number is given per unit of time
    km_val=20.0,
    km_unit="mmole / l",
    protein_id=enzyme_id,
    substrate_id=substrate_id,
    enzmldoc=enzmldoc
)
# Add it to 'Reaction 1'
reaction = enzmldoc.getReaction("Reaction 1", by_id=False)
reaction.model = model
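For reference, the Michaelis-Menten rate law behind such a model is v = kcat * [E] * [S] / (Km + [S]). The following minimal sketch evaluates it in plain Python for the parameter values used above. The function michaelis_menten_rate is a hypothetical helper, and treating kcat as a turnover number per second is an assumption of this sketch.

```python
def michaelis_menten_rate(substrate, enzyme, kcat, km):
    """Reaction rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


# Parameter values from the model above; concentrations in mmole / l.
# kcat is treated as a turnover number per second here (an assumption).
kcat, km = 10.0, 20.0
enzyme = 2e-11  # 20 fmole / l expressed in mmole / l

for substrate in (10.0, 20.0, 100.0):
    print(substrate, michaelis_menten_rate(substrate, enzyme, kcat, km))
```

At a substrate concentration equal to Km the rate is half of its maximum kcat * [E], which offers a quick plausibility check for estimated parameters.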
Finally, write the EnzymeML document to an OMEX file or upload it to a database, as described in the corresponding section. Please note that this is a minimal example to demonstrate the capabilities of PyEnzyme. If you would like to inspect an actual interface implementation to modeling platforms, please see the Thin Layer implementations for COPASI and PySCeS in the GitHub repository.