Graphein - Atom Graph Tutorial#
In this notebook, we’ll run through residue-level graph construction in Graphein. We start by discsussing the config, the high-level API and spend the bulk of the tutorial running through the low-level API. First, let’s take a look at our protein of interest: here’s one I prepared earlier ;)
[1]:
# Install Graphein if necessary
# !pip install graphein
# Install pymol if necessary - in this tutorial PyMol is only used for the initial plot. Feel free to skip!
# sudo apt-get install pymol (recommended for colab) OR conda install -c schrodinger pymol
[2]:
# NBVAL_SKIP
from graphein.utils.pymol import MolViewer
pymol = MolViewer()
pymol.delete("all") # delete all objects from other sessions if necessary.
pymol.fetch('3eiy')
pymol.display()
[2]:
 
Config#
As per the residue graph tutorial, we first define a config object which parameterises the graph construction.
[3]:
from graphein.protein.config import ProteinGraphConfig
config = ProteinGraphConfig()
config.dict()
[3]:
{'granularity': 'CA',
 'keep_hets': False,
 'insertions': False,
 'pdb_dir': PosixPath('../examples/pdbs'),
 'verbose': False,
 'exclude_waters': True,
 'deprotonate': False,
 'protein_df_processing_functions': None,
 'edge_construction_functions': [<function graphein.protein.edges.distance.add_peptide_bonds(G: 'nx.Graph') -> 'nx.Graph'>],
 'node_metadata_functions': [<function graphein.protein.features.nodes.amino_acid.meiler_embedding(n, d, return_array: bool = False) -> Union[pandas.core.series.Series, <built-in function array>]>],
 'edge_metadata_functions': None,
 'graph_metadata_functions': None,
 'get_contacts_config': None,
 'dssp_config': None}
Let’s run through the config:
- granularity: specifies the granularity of the graph (i.e. what should the nodes be). Possible values are: atom identifiers (e.g.- "CA"for \(\alpha\) carbon,- "CB"for \(\beta\) carbon),- "centroid"to use residue centroids (under the hood, this is the same as “CA”, but we use the average x,y,z coordinates for the atoms in the residue) or- "atom"for atom-level construction. This is discussed here.
- keep_hets: this is a boolean specifying whether or not to keep heteroatoms present in the .pdb file. Heteroatoms are typically non-protein atoms (waters, metal ions, ligands) but can sometimes contain non-standard or modified residues.
- insertions: boolean specifying whether or not to keep insertions in the PDB file
- pdb_diroptional path to a folder in which to save pdb files. Otherwise,- /tmp/will be used
- verbose: bool controlling amount of info printed
- exclude_waters: not implemented
- deprotonate: bool indicating whether or not to remove Hydrogen atoms
- protein_df_processing_functions: list of functions with which to process the PDB dataframe. Discussed in the low-level API.
- edge_construction_functions: list of functions to compute edges with
- node_metadata_functions: list of functions to annotate nodes with
- edge_metadata_functions: list of functions to annotate edges with
- graph_meta_functions: list of functions to annotate graph with
- get_contacts_config: A separate config object if using GetContacts edge construction functions
Config for Atom-level Graphs#
We’ll now make the necessary changes to compute an atomic graph.
[4]:
from graphein.protein.edges.atomic import add_atomic_edges
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges]}
config = ProteinGraphConfig(**params_to_change)
config.dict()
[4]:
{'granularity': 'atom',
 'keep_hets': False,
 'insertions': False,
 'pdb_dir': PosixPath('../examples/pdbs'),
 'verbose': False,
 'exclude_waters': True,
 'deprotonate': False,
 'protein_df_processing_functions': None,
 'edge_construction_functions': [<function graphein.protein.edges.atomic.add_atomic_edges(G: networkx.classes.graph.Graph, tolerance: float = 0.56) -> networkx.classes.graph.Graph>],
 'node_metadata_functions': [<function graphein.protein.features.nodes.amino_acid.meiler_embedding(n, d, return_array: bool = False) -> Union[pandas.core.series.Series, <built-in function array>]>],
 'edge_metadata_functions': None,
 'graph_metadata_functions': None,
 'get_contacts_config': None,
 'dssp_config': None}
High-level API#
Graphein features a high-level API which should be applicable for most simple graph constructions. This can be used on either a .pdb file (so you can run whatever pre-processing you wish), or we can provide a PDB accession code and retrieve a structure from the PDB itself. If a path is provided, it takes precedence over the PDB code.
To use it we do as follows:
[5]:
from graphein.protein.graphs import construct_graph
g = construct_graph(config=config, pdb_code="3eiy")
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1330 total nodes
DEBUG:graphein.protein.features.nodes.amino_acid:Reading meiler embeddings from: /Users/arianjamasb/github/graphein/graphein/protein/features/nodes/meiler_embeddings.csv
To use a local file, you can run:
g = construct_graph(config=config, pdb_path="../examples/pdbs/3eiy.pdb")
[6]:
from graphein.protein.visualisation import plotly_protein_structure_graph
p = plotly_protein_structure_graph(
    g,
    colour_edges_by="kind",
    colour_nodes_by="element_symbol",
    label_node_ids=False,
    node_size_min=5,
    node_alpha=0.85,
    node_size_multiplier=1,
    plot_title="Atom-level graph (PDB: 3eiy). Nodes coloured by their Element"
    )
p.show()
As you can see, all the bond types are the same. In order to add bond order assignment, we need to pass the add_bond_order function to the list of edge functions. We do the same for assigning ring status with add_ring_status
[7]:
from graphein.protein.edges.atomic import add_atomic_edges, add_bond_order, add_ring_status
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges, add_bond_order, add_ring_status]}
config = ProteinGraphConfig(**params_to_change)
g = construct_graph(config=config, pdb_code="3eiy")
p = plotly_protein_structure_graph(
    g,
    colour_edges_by="kind",
    colour_nodes_by="element_symbol",
    label_node_ids=False,
    node_size_min=5,
    node_alpha=0.85,
    node_size_multiplier=1,
    plot_title="Atom-level graph (PDB: 3eiy). Nodes coloured by their Element"
    )
p.show()
# Print some edges to verify
for i, (u, v, a) in enumerate(g.edges(data=True)):
    if i % 30 == 0:
        print(u, v, a)
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1330 total nodes
A:SER:2:N A:SER:2:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.4964471256947236}
A:ASN:5:CG A:ASN:5:OD1 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2476606109034614}
A:LYS:10:C A:ASP:11:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3200488627319846}
A:GLN:14:CA A:GLN:14:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5587077981456363}
A:ASN:17:CA A:ASN:17:CB {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5510625390357422}
A:GLU:21:CA A:GLU:21:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.541464563329304}
A:GLN:25:CA A:GLN:25:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5395742918092665}
A:PRO:28:CG A:PRO:28:CD {'kind': {'RING', 'SINGLE', 'covalent'}, 'bond_length': 1.5008284378968835}
A:GLU:32:N A:GLU:32:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.4628581612719675}
A:LYS:35:CE A:LYS:35:NZ {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5094432748533508}
A:LEU:40:C A:VAL:41:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.327494256108103}
A:ARG:44:C A:PHE:45:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3319241720158084}
A:GLY:47:C A:THR:48:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3300627804731626}
A:ARG:51:CZ A:ARG:51:NH2 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3355399657067553}
A:ASN:55:CA A:ASN:55:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.509162681754352}
A:PHE:58:CG A:PHE:58:CD1 {'kind': {'RING', 'SINGLE', 'covalent'}, 'bond_length': 1.3973292382255516}
A:GLN:61:CD A:GLN:61:NE2 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3256794484338952}
A:ASP:66:C A:GLY:67:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3207834796059468}
A:VAL:70:CB A:VAL:70:CG2 {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.544997734626169}
A:VAL:74:CB A:VAL:74:CG2 {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5372358309641367}
A:PHE:78:CG A:PHE:78:CD1 {'kind': {'RING', 'SINGLE', 'covalent'}, 'bond_length': 1.4093633314372853}
A:ALA:82:N A:ALA:82:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.4588163009782962}
A:ARG:87:CA A:ARG:87:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5217493223261191}
A:ALA:90:C A:LEU:91:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3322357148793165}
A:LYS:95:CA A:LYS:95:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5207899263211861}
A:ASP:98:CG A:ASP:98:OD2 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2490592459927585}
A:ASP:103:C A:ASP:103:O {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2269449865417776}
A:VAL:107:C A:VAL:107:O {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2373241289169141}
A:HIS:111:CG A:HIS:111:ND1 {'kind': {'RING', 'SINGLE', 'covalent'}, 'bond_length': 1.3857918314090316}
A:CYS:115:CB A:CYS:115:SG {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.8180277775655709}
A:ASN:120:CA A:ASN:120:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5075738787867117}
A:ILE:124:N A:ILE:124:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.449215994943473}
A:VAL:127:CB A:VAL:127:CG2 {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5337962706956894}
A:LEU:131:C A:LEU:131:O {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2503483514605058}
A:GLN:134:CD A:GLN:134:NE2 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.315718434924434}
A:PHE:138:CA A:PHE:138:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5332312284844685}
A:GLU:140:CD A:GLU:140:OE1 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2552963793463292}
A:LYS:143:CG A:LYS:143:CD {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.532886819044384}
A:GLY:148:N A:GLY:148:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.4515767289399488}
A:VAL:151:CA A:VAL:151:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5020672421699368}
A:GLY:155:C A:TRP:156:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3299751877384798}
A:ILE:159:CA A:ILE:159:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5274563823559741}
A:HIS:163:CB A:HIS:163:CG {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.497479883003441}
A:ILE:166:CB A:ILE:166:CG2 {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5218219343931156}
A:ALA:171:CA A:ALA:171:CB {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5277568523819485}
A:LYS:174:CD A:LYS:174:CE {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.513726857791723}
We can also include some of the distance-based functions we used in the residue graph construction. Here we add the residue-level delaunay triangulation to the graph.
[8]:
from functools import partial
from graphein.protein.edges.distance import add_delaunay_triangulation
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges, add_bond_order, partial(add_delaunay_triangulation)]}
config = ProteinGraphConfig(**params_to_change)
g = construct_graph(config=config, pdb_code="3eiy")
p = plotly_protein_structure_graph(
    g,
    colour_edges_by="kind",
    colour_nodes_by="element_symbol",
    label_node_ids=False,
    node_size_min=5,
    node_alpha=0.85,
    node_size_multiplier=1,
    plot_title="Atom-level graph (PDB: 3eiy) with Delaunay Triangulation. Nodes coloured by their Element"
    )
p.show()
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1330 total nodes
DEBUG:graphein.protein.edges.distance:Detected 8582 simplices in the Delaunay Triangulation.
It’s also possible to apply the triangulation to certain atomtypes, such as \(C\alpha\) only partialling the add_delaunay_triangulation and specifying a list of allowable_nodes.
[9]:
from functools import partial
from graphein.protein.edges.distance import add_delaunay_triangulation
params_to_change = {
    "granularity": "atom",
    "edge_construction_functions": [add_atomic_edges,
                                    add_bond_order,
                                    partial(add_delaunay_triangulation, allowable_nodes=["CA"])
                                    ]
    }
config = ProteinGraphConfig(**params_to_change)
g = construct_graph(config=config, pdb_code="3eiy")
p = plotly_protein_structure_graph(
    g,
    colour_edges_by="kind",
    colour_nodes_by="element_symbol",
    label_node_ids=False,
    node_size_min=5,
    node_alpha=0.85,
    node_size_multiplier=1,
    plot_title="Atom-level graph (PDB: 3eiy) with CA triangulation. Nodes coloured by their Element"
    )
p.show()
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1330 total nodes
DEBUG:graphein.protein.edges.distance:Detected 954 simplices in the Delaunay Triangulation.