Setting up an EnzymeML document#
In order to write an EnzymeML document it needs to be initialized by
calling the EnzymeMLDocument
object. At this point it is possible to
add metadata such as a name, URL, DOI or PubmedID to the document. In
addition, it is necessary but not mandatory to add author information.
Please note, that for a Dataverse upload adding an author is mandatory
though.
EnzymeMLDocument
is the container object that stores all experiment information based on sub classes.Creator
carries the metadata about an author.addCreator
adds aCreator
object to theEnzymeMLdocument
import pyenzyme as pe
# Initialize the EnzymeML document
enzmldoc = pe.EnzymeMLDocument(name="Experiment")
# Add authors to the document
author = pe.Creator(given_name="Max", family_name="Mustermann", mail="max@mustermann.de")
author_id = enzmldoc.addCreator(author)
Documentation of a simple single substrate reaction#
PyEnzyme is capable to document complete experiments from planning to execution, modelling and ultimately database upload. For this, consider a simple single substrate enzyme-catalyzed reaction, given in the following:
In order to properly document each step, it is necessary to start with the definition of all entities. This is done by initializing the appropriate objects and their metadata. Since pyEnzyme is capable to report micro-kinetic models, it is possible to define intermediates that may not be directly observable, such as Enzyme-Substrate complexes. This facilitates mathematical modeling based on differential equations and time-course data and offers a flexible way that is independent of existing models.
The next steps involve definition of the following entities:
Type |
Name |
---|---|
Vessel |
Eppendorf Tube |
Protein |
Enzyme |
Reactant |
Substrate |
Reactant |
Product |
Complex |
ES |
Complex |
EP |
Tips and hints:
Use the addXYZ-functions to append information to an EnzymeML document
Add-Methods return the identifier, which can later be used to build reactions and models. Thus it is best when these are stored in a variable or data structure
PyEnzyme takes care of type checking and validation. Furthermore, technicalities such as unit-decomposition (used to convert unit scales properly) and identifier assignment are done within the backend. Hence, focus on what matters.
Vessels
Vessel
carries the metadata for vessels that are used.addVessel
adds aVessel
object to the document and returns the ID.
vessel = pe.Vessel(name="Eppendorf Tube", volume=10.0, unit="ml")
vessel_id = enzmldoc.addVessel(vessel)
Proteins
Protein
carries the metadata for proteins that are part of the experiment.addProtein
adds aProtein
object to the document and returns the ID.
enzyme = pe.Protein(name="Enzyme", vessel_id=vessel_id,
sequence="MAVKLT", constant=False)
enzyme_id = enzmldoc.addProtein(enzyme)
Reactants
Reactant
carries the metadata for reactants that are part of the experiment.addReactant
adds aReactant
object to the document and returns the ID.
substrate = pe.Reactant(name="Substrate", vessel_id=vessel_id)
substrate_id = enzmldoc.addReactant(substrate)
product = pe.Reactant(name="Product", vessel_id=vessel_id)
product_id = enzmldoc.addReactant(product)
Complexes
Complex
carries the metadata for complexes that are part of the experiment.addComplex
adds aComplex
object to the document and returns the ID.
es_complex_id = enzmldoc.addComplex(
name="ES",
vessel_id=vessel_id,
participants=[enzyme_id, substrate_id]
)
ep_complex_id = enzmldoc.addComplex(
name="EP",
vessel_id=vessel_id,
participants=[enzyme_id, product_id]
)
Building the reaction network#
In order for the micro-kinetic model to be accessible to various
modeling platforms and EnzymeML, each reaction in the model has to be
documented. Similar to the previous step, this involves the creation of
EnzymeReaction
objects which will be added to the EnzymeML document.
Hence, the following part-reactions need to be defined:
\(\ce{Substrate + Enzyme \rightleftharpoons [ES] }\)
\(\ce{[ES] \rightleftharpoons [EP]}\)
\(\ce{[EP] \rightleftharpoons Product + Enzyme}\)
Tips and hints:
Add-methods require the
EnzymeMLDocument
object to be added. This is necessary to check, whether given identifiers already exist to mitigate later errors.Similar to the other add-methods,
addReaction
returns the given identifier. Thus it is best to store these in variables or data structures too.At this point, kinetic laws can be added to the reaction, but in this example we’ll add them afterwards.
Creating reactions from equations#
PyEnzyme offers two ways of creating EnzymeReaction
objects:
By using an equation that either includes the ID or name of a Reactant/Protein/Complex
By using
add
-methods to build up all elements
Both methods will result in the same EnzymeReaction
object, but both
might shine individually in different contexts. For instance, when
working on a single experiment using a Jupyter Notebook, initialization
by equation improves readability and reduces boilerplate code. On the
other hand, if your application is meant to maintain a variety of
reaction i.e. using PyEnzyme as a backend for an Electronic Lab
Notebook, using the add
-methods should prove to be more flexible and
safe than manipulating string.
# Setting up by equation
reaction_1 = pe.EnzymeReaction.fromEquation("Substrate + Enzyme = ES", "reaction-1", enzmldoc)
reaction_2 = pe.EnzymeReaction.fromEquation("ES = EP", "reaction-2", enzmldoc)
reaction_3 = pe.EnzymeReaction.fromEquation("EP = Product", "reaction-3", enzmldoc)
# Setting up via add-methods (only for reaction 1 for demo)
reaction_1 = pe.EnzymeReaction(name="reaction-1", reversible=True)
# Add each element
reaction_1.addEduct(species_id=substrate_id, stoichiometry=1.0,
enzmldoc=enzmldoc)
reaction_1.addEduct(species_id=enzyme_id, stoichiometry=1.0,
enzmldoc=enzmldoc)
reaction_1.addProduct(species_id=product_id, stoichiometry=1.0,
enzmldoc=enzmldoc)
Adding the reaction to the document#
addReaction
adds anEnzymeReaction
object to the document and returns the ID. (not used here)addReactions
adds multiples ofEnzymeReaction
objects to the document and returns an ID mapping. (used here)
# Finally, add al reactions to the document
reaction_ids = enzmldoc.addReactions([reaction_1, reaction_2, reaction_3])
reaction_ids
{'reaction-1': 'r0', 'reaction-2': 'r1', 'reaction-3': 'r2'}
Documenting measurement setups#
Now that the theoretical foundation of the experiment has been layed out, it is time to specify the setup of teh measurement. PyEnzyme offers a lab-like system to document such setups. Typically, experiments involve multiple runs with varying initial concentrations of every element that occurs in the reaction network or/and varying conditions such as temperature and pH. Hence, PyEnzyme builts on top of a measurement system, where each of these represent a ‘run’.
In this example, the following setups will be tracked including changing inital concentrations and temperatures:
Measurement Name |
Species |
Initial concentration |
Unit |
Temperature |
pH |
---|---|---|---|---|---|
Run 1 |
Substrate |
10.0 |
mmole / l |
37.0 °C |
7.4 |
Run 1 |
Enzyme |
20.0 |
fmole / l |
37.0 °C |
7.4 |
Run 1 |
Product |
0.0 |
mmole / l |
37.0 °C |
7.4 |
Run 2 |
Substrate |
100.0 |
mmole / l |
39.0 °C |
7.4 |
Run 2 |
Enzyme |
40.0 |
fmole / l |
39.0 °C |
7.4 |
Run 2 |
Product |
0.0 |
mmole / l |
39.0 °C |
7.4 |
Measurement 1: ‘Run 1’
Measurement
carries the metadata for measurements that are conducted in the experiment.addData
appends measurement data to theMeasurement
object and checks consistency.
measurement_1 = pe.Measurement(
name="Run 1", temperature=37.0, temperature_unit="C",
ph=7.4, global_time_unit="mins"
)
# Add each entity that will be measured
measurement_1.addData(reactant_id=substrate_id,
init_conc=10.0, unit="mmole / l")
measurement_1.addData(reactant_id=product_id,
unit="mmole / l")
measurement_1.addData(protein_id=enzyme_id, init_conc=20.0,
unit="fmole / l")
# Add it to the EnzymeML document
meas_1_id = enzmldoc.addMeasurement(measurement_1)
Measurement 2: ‘Run 2’
Measurement
carries the metadata for measurements that are conducted in the experiment.addData
appends measurement data to theMeasurement
object and checks consistency.
measurement_2 = pe.Measurement(
name="Run 2", temperature=39.0, temperature_unit="C",
ph=7.4, global_time_unit="mins"
)
# Add each entity that will be measured
measurement_2.addData(reactant_id=substrate_id,
init_conc=100.0, unit="mmole / l")
measurement_2.addData(reactant_id=product_id,
unit="mmole / l")
measurement_2.addData(protein_id=enzyme_id,
init_conc=40.0, unit="fmole / l")
# Add it to the EnzymeML document
meas_2_id = enzmldoc.addMeasurement(measurement_2)
# Check the measurement table
print(enzmldoc.printMeasurements())
>>> Measurement m0: Run 1
s0 | initial conc: 10.0 mmole / l | #replicates: 0
s1 | initial conc: 0.0 mmole / l | #replicates: 0
p0 | initial conc: 20.0 fmole / l | #replicates: 0
>>> Measurement m1: Run 2
s0 | initial conc: 100.0 mmole / l | #replicates: 0
s1 | initial conc: 0.0 mmole / l | #replicates: 0
p0 | initial conc: 40.0 fmole / l | #replicates: 0
None
Adding experimental raw data#
After the setup has been defined in terms of measurements, the actual
time-course data can be generated and added to the documemnt. PyEnzyme
offers a Replicate
class as a container for raw data that aside from
raw data carries metadata describing the replicate itself.
In the following example, replication data will be hard-coded and added to our measurement of choice. For this our digital lab measured the product formation as well as substrate depletion for each measurement setup.
Data for ‘Run 1’
Replicate
carries the tim-courses and metadata for each measured entity.addReplicates
addsReplicate
objects to a measurement to the correspondingMeasurementData
container where the concentrations are also stored.
repl_substrate_1 = pe.Replicate(
id="repl_substrate_1",
species_id=substrate_id,
data_unit="mmole / l",
time_unit="min",
time=[1,2,3,4,5,6],
data=[5,4,3,2,1,0]
)
repl_product_1 = pe.Replicate(
id="repl_product_1",
species_id=product_id,
data_unit="mmole / l",
time_unit="min",
time=[1,2,3,4,5,6],
data=[0,1,2,3,4,5]
)
# Add it to the first measurement 'Run 1'
measurement = enzmldoc.getMeasurement(meas_1_id)
measurement.addReplicates([repl_product_1, repl_substrate_1], enzmldoc=enzmldoc)
Data for ‘Run 2’
Replicate
carries the tim-courses and metadata for each measured entity.addReplicates
addsReplicate
objects to a measurement to the correspondingMeasurementData
container where the concentrations are also stored.
repl_substrate_2 = pe.Replicate(
id="repl_substrate_2",
species_id=substrate_id,
data_unit="mmole / l",
time_unit="min",
time=[1,2,3,4,5,6],
data=[50,40,30,20,10,0]
)
repl_product_2 = pe.Replicate(
id="repl_product_2",
species_id=product_id,
data_unit="mmole / l",
time_unit="min",
time=[1,2,3,4,5,6],
data=[0,10,20,30,40,50]
)
# Add it to the first measurement 'Run 2'
measurement = enzmldoc.getMeasurement(meas_2_id)
measurement.addReplicates([repl_product_2, repl_substrate_2], enzmldoc=enzmldoc)
Saving and distributing an EnzymeML document#
Finally, the experiment has been finished and meta- as well as raw-data been documented. In order to make the data exchangable, PyEnzyme offers several options for data export. First and foremost, the complete experiment can be exported to EnzymeML which is SBML compatible and thus accessible by SBML-based modeling tools (e.g. COPASI, PySCeS). Furthermore, in regard of the web, PyEnzyme offers a JSON export too.
Apart from raw exports, PyEnzyme can also interface with the federated databases system Dataverse by providing a simple upload method that automatically uploads and processes the document contents to a Dataverse compatible format. Please note, the Dataverse must support the ‘EnzymeML’ metadatablock for a successful upload.
Export
toFile
writes the EnzymeML document to an OMEX archive at the specified path.json
converts the EnzymeML document to a JSON string, which in turn can be used for REST interfaces or data storage.toXMLString
returns the XML representation that is also found in the OMEX archive.
# To an OMEX archive
enzmldoc.toFile(".", name="My_Experiment")
# To a JSON string
with open("My_Experiment.json", "w") as file_handle:
file_handle.write(enzmldoc.json(indent=2))
# To an XML string
xml_string = enzmldoc.toXMLString()
Archive was written to ./My_Experiment.omex
Upload
uploadToDataverse
uploads the document to a Dataverse installation.Please note, that in order to work, your environment should contain these variables
DATAVERSE_URL
: The URL to your installation.DATAVERSE_API_TOKEN
: The API Token to access the dataverse.
# Uncomment if you want to use this on your own Dataverse
# enzmldoc.uploadToDataverse(dataverse_name="playground")
Loading and editing#
It is not expected to create an EnzymeML document in a single session,
but to let it evolve over the course of an experiment. Thus it is
necessary to load and edit an EnzymeML document, without re-creating
everything from start. PyEnzyme’s EnzymeMLDocument
object offers an
initialization method fromFile
to edit an already existing document.
In addition, it is also possible to use the aforementioned JSON
Tips and hints:
PyEnzyme stores a history in the document, which keeps track of what has been changed and added in the course of an experiment. This is done, to spot potential errors and facilitate teh documentation of an experiment’s lifeline.
# Load an EnzymeML document from OMEX
enzmldoc = pe.EnzymeMLDocument.fromFile("./My_Experiment.omex")
# Load an EnzymeML document from JSON
json_string = open("My_Experiment.json").read()
enzmldoc = enzmldoc.fromJSON(json_string)
enzmldoc.printDocument(measurements=True)
Experiment
>>> Reactants
ID: s0 Name: Substrate
ID: s1 Name: Product
>>> Proteins
ID: p0 Name: Enzyme
>>> Complexes
ID: c0 Name: ES
ID: c1 Name: EP
>>> Reactions
ID: r0 Name: reaction-1
ID: r1 Name: reaction-2
ID: r2 Name: reaction-3
>>> Measurements
>>> Measurement m0: Run 1
s0 | initial conc: 10.0 mmole / l | #replicates: 1
s1 | initial conc: 0.0 mmole / l | #replicates: 1
p0 | initial conc: 20.0 fmole / l | #replicates: 0
>>> Measurement m1: Run 2
s0 | initial conc: 100.0 mmole / l | #replicates: 1
s1 | initial conc: 0.0 mmole / l | #replicates: 1
p0 | initial conc: 40.0 fmole / l | #replicates: 0
Apart from programmatic creation of an EnzymeML document, PyEnzyme offers a way to convert the ‘EnzymeML spreadhseet template’ to an OMEX file. Since spreadsheets are the bread and butter of current lab documentation, the template widely covers teh data model and thus provides an easy access to EnzymeML’s capabilities.
# Similar to the OMEX and JSON loaders, its a simple call
enzmldoc = pe.EnzymeMLDocument.fromTemplate("EnzymeML_Template_Example.xlsm")
enzmldoc.printDocument()
Building on top of the previous section about loading an EnzymeML document, this example will demonstrate how to interact with an already created EnzymeML document using the OMEX loader. Since the purpose of an experiment is to generate data from a theory, modeling takes care of the interpretation of an experiment outcome. However, PyEnzyme and EnzymeML are no modeling platforms, but provides a convinient way to interface to such. Hence, this example will demonstrate how such an interfacing could look like.
The enzyme-catalyzed reaction that has been reported in the course of this example obviously follows a simple Michaelis-Menten-Kinetic and thus will be reported as such. But first of all, the next part will demonstrate how measurement data can be exported to be used by a modeling framework/platform.
First, load the EnzymeML document:
# Load the EnzymeML document
enzmldoc = pe.EnzymeMLDocument.fromFile("My_Experiment.omex")
In order to get the measurement data given in the document, the
EnzymeMLDocument
object offers the exportMeasurementData
-method
which will export the data of ‘all’ or specified measurements to a
Pandas DataFrame object.
# Get the data from measurement "m0" ...
meas_data = enzmldoc.exportMeasurementData(measurement_ids="m0")
# Which is a dict containing "data" and "initConc" information, where data is the part we want
meas_data = meas_data["data"]
meas_data
s0 | s1 | time | |
---|---|---|---|
0 | 5.0 | 0.0 | 1.0 |
1 | 4.0 | 1.0 | 2.0 |
2 | 3.0 | 2.0 | 3.0 |
3 | 2.0 | 3.0 | 4.0 |
4 | 1.0 | 4.0 | 5.0 |
5 | 0.0 | 5.0 | 6.0 |
The given output could now be used for an optimizer in conjunction with the metadata that is given in the EnzymeML document. In order to gather stoichiometries and such, one can access other data in the document by specifically exporting the desired reactions:
for reaction in enzmldoc.reaction_dict.values():
# Every entity of an EnzymeML document is stored in its corresponding
# dictionary. This example serves as a get-go solution to access all
# other objects
educts = reaction.educts
products = reaction.products
# From this point on, a modeling framework/platform could derive important metadata
Assuming the modeling has now been done, the estimated parameters for the desired reactions can now be added to each reaction. Since this is an example for demonstration, this will only be carried out for the first reaction by using a Michaelis-Menten-Model.
from pyenzyme.enzymeml.models import MichaelisMentenKCat
# Get the appropriate IDs by using the getter-methods
substrate = enzmldoc.getReactant("Substrate")
enzyme = enzmldoc.getProtein("Enzyme")
# Create the model
model = MichaelisMentenKCat(
substrate=substrate.id,
protein=enzyme.id,
k_cat={"value": 10.0, "unit": "1 / s"},
k_m={"value": 20.0, "unit": "mmole / l"},
enzmldoc=enzmldoc
)
# Add it to 'Reaction 1'
reaction = enzmldoc.getReaction("reaction-1")
reaction.model = model
Finally, write the EnzymeML document to an OMEX file or upload it to a database, such as described in the corresponding section. Please note, that this is a minimal example to demonstrate the capabilities of PyEnzyme. However, if you like to inspect an actual interface implementation to modeling platforms, please inspect the Thin Layer implementations for COPASI and PySCeS in the GitHub repository and examples in the “EnzymeML at Work” repository.