graphein.utils
Utils
Utilities for working with graph objects.
- graphein.utils.utils.annotate_edge_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph
- Annotates graph edges with edge metadata. Each function in funcs must take the three arguments u, v and d, where u and v are the nodes of the edge, and d is the edge data dictionary. Additional parameters can be provided by using partial functions.
- Parameters
- G (nx.Graph) – Graph to add edge metadata to 
- funcs (List[Callable]) – List of edge metadata annotation functions 
 
- Returns
- Graph with edge metadata added 
- Return type
- nx.Graph 
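As a rough illustration of the (u, v, d) contract described above, the sketch below writes an edge annotation function and binds an extra parameter with functools.partial. The names add_scaled_span and annotate_edges_sketch are hypothetical, and the loop is an assumed stand-in for how annotate_edge_metadata applies each function, not Graphein's actual source.

```python
from functools import partial

import networkx as nx


def add_scaled_span(u, v, d, scale=1.0):
    """Hypothetical annotator: store a scaled index distance on the edge."""
    d["span"] = scale * abs(u - v)


def annotate_edges_sketch(G, funcs):
    """Assumed stand-in for annotate_edge_metadata: call each func on every edge."""
    for func in funcs:
        for u, v, d in G.edges(data=True):
            func(u, v, d)
    return G


G = nx.Graph()
G.add_edge(0, 3)
G.add_edge(3, 4)

# partial fixes the extra parameter so the callable matches (u, v, d)
G = annotate_edges_sketch(G, [partial(add_scaled_span, scale=2.0)])
```

Because d is the live edge data dictionary, writing to it inside the function is enough; nothing needs to be returned.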
 
- graphein.utils.utils.annotate_graph_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph
- Annotates graph with graph-level metadata.
- Parameters
- G (nx.Graph) – Graph to add graph-level metadata to 
- funcs (List[Callable]) – List of graph metadata annotation functions 
 
- Returns
- Graph with graph-level metadata added 
- Return type
- nx.Graph 
 
- graphein.utils.utils.annotate_node_features(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph
- Annotates nodes with feature data. Note: passes whole graph to function.
- Parameters
- G (nx.Graph) – Graph to add node features to 
- funcs (List[Callable]) – List of node feature annotation functions 
 
- Returns
- Graph with node features added 
- Return type
- nx.Graph 
 
- graphein.utils.utils.annotate_node_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph
- Annotates nodes with metadata. Each function in funcs must take two arguments n and d, where n is the node and d is the node data dictionary. Additional parameters can be provided by using partial functions.
- Parameters
- G (nx.Graph) – Graph to add node metadata to 
- funcs (List[Callable]) – List of node metadata annotation functions 
 
- Returns
- Graph with node metadata added 
- Return type
- nx.Graph 
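The (n, d) signature can be sketched as follows. The function tag_chain, the node naming scheme and the chain_id key are all hypothetical, and annotate_nodes_sketch is an assumed stand-in for the library function rather than its real implementation.

```python
import networkx as nx


def tag_chain(n, d):
    """Hypothetical annotator: derive a chain ID from a node name like 'A:GLY:1'."""
    d["chain_id"] = str(n).split(":")[0]


def annotate_nodes_sketch(G, funcs):
    """Assumed stand-in for annotate_node_metadata: call each func on every node."""
    for func in funcs:
        for n, d in G.nodes(data=True):
            func(n, d)
    return G


G = nx.Graph()
G.add_nodes_from(["A:GLY:1", "B:PHE:7"])
G = annotate_nodes_sketch(G, [tag_chain])
```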
 
- graphein.utils.utils.compute_edges(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph
- Computes edges for a graph from a list of edge construction functions. Each function in funcs must take an nx.Graph and return an nx.Graph.
- Parameters
- G (nx.Graph) – Graph to add features to 
- funcs (List[Callable]) – List of edge construction functions 
 
- Returns
- Graph with edges added 
- Return type
- nx.Graph 
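A minimal sketch of the graph-in, graph-out contract follows. Both edge construction functions are hypothetical, and compute_edges_sketch is only an assumption about how the functions are chained.

```python
import networkx as nx


def add_sequential_edges(G):
    """Hypothetical edge function: connect nodes adjacent in insertion order."""
    nodes = list(G.nodes)
    G.add_edges_from(zip(nodes, nodes[1:]), kind="sequential")
    return G


def add_closing_edge(G):
    """Hypothetical edge function: join the last node back to the first."""
    nodes = list(G.nodes)
    if len(nodes) > 2:
        G.add_edge(nodes[-1], nodes[0], kind="closing")
    return G


def compute_edges_sketch(G, funcs):
    """Assumed stand-in for compute_edges: apply each function in order."""
    for func in funcs:
        G = func(G)
    return G


G = nx.Graph()
G.add_nodes_from(range(4))
G = compute_edges_sketch(G, [add_sequential_edges, add_closing_edge])
```

Each function sees the edges added by the ones before it, so the order of funcs can matter.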
 
- graphein.utils.utils.filter_dataframe(df: pandas.core.frame.DataFrame, funcs: List[Callable]) → pandas.core.frame.DataFrame
- Applies transformation functions to a dataframe. Each function in funcs must accept a pd.DataFrame and return a pd.DataFrame. Additional parameters can be provided by using partial functions.
- Parameters
- df (pd.DataFrame) – Dataframe to apply transformations to. 
- funcs (List[Callable]) – List of transformation functions. 
 
- Return type
- pd.DataFrame 
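The dataframe-in, dataframe-out pattern might look like the sketch below. The column names, the two filter functions and filter_dataframe_sketch are illustrative assumptions, not part of the documented API.

```python
from functools import partial

import pandas as pd


def select_chain(df, chain):
    """Hypothetical filter: keep rows belonging to one chain."""
    return df[df["chain_id"] == chain]


def drop_hetatms(df):
    """Hypothetical filter: keep only standard ATOM records."""
    return df[df["record_name"] == "ATOM"]


def filter_dataframe_sketch(df, funcs):
    """Assumed stand-in for filter_dataframe: pipe df through each function."""
    for func in funcs:
        df = func(df)
    return df


df = pd.DataFrame(
    {
        "chain_id": ["A", "A", "B"],
        "record_name": ["ATOM", "HETATM", "ATOM"],
    }
)

# partial binds the extra argument so each callable takes only a DataFrame
filtered = filter_dataframe_sketch(
    df, [partial(select_chain, chain="A"), drop_hetatms]
)
```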
 
- graphein.utils.utils.format_adjacency(G: networkx.classes.graph.Graph, adj: numpy.ndarray, name: str) → xarray.core.dataarray.DataArray
- Format adjacency matrix nicely.
- Intended to be used when computing an adjacency-like matrix of a graph object G. For example, in defining a func:
      def my_adj_matrix_func(G):
          adj = some_adj_func(G)
          return format_adjacency(G, adj, "xarray_coord_name")
- Assumptions
  1. adj should be a 2D matrix of shape (n_nodes, n_nodes).
  2. name is something that is unique amongst all names used in the final adjacency tensor.
- Parameters
- G – NetworkX-compatible Graph 
- adj (np.ndarray) – 2D numpy array of shape (n_nodes, n_nodes)
- name (str) – A unique name for the kind of adjacency matrix being constructed. Gets used in xarray as a coordinate in the "name" dimension.
 
- Returns
- An xarray DataArray of shape (n_nodes, n_nodes, 1)
- Return type
- xr.DataArray 
 
- graphein.utils.utils.generate_adjacency_tensor(G: networkx.classes.graph.Graph, funcs: List[Callable], return_array=False) → xarray.core.dataarray.DataArray
- Generate adjacency tensor for a graph.
- Uses the collection of functions in funcs to build an xarray DataArray that houses the resulting “adjacency tensor”.
- A key design choice: we default to returning xarray DataArrays, to make inspecting the data easy, but for consumption in tensor libraries, you can return a NumPy array instead by setting return_array=True.
- Parameters
- G (nx.Graph) – NetworkX Graph. 
- funcs (List[Callable]) – A list of functions that take in G and return an xr.DataArray 
 
- Returns
- xr.DataArray, which is of shape (n_nodes, n_nodes, n_funcs).
- Return type
- xr.DataArray 
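The shape bookkeeping described above, where each function contributes one (n_nodes, n_nodes, 1) layer and the layers are stacked into (n_nodes, n_nodes, n_funcs), can be sketched with plain NumPy. This is a simplified stand-in that deliberately skips xarray and its named coordinates; the functions two_hop and adjacency_tensor_sketch are hypothetical.

```python
import networkx as nx
import numpy as np


def adjacency(G):
    """Plain adjacency matrix of shape (n_nodes, n_nodes)."""
    return nx.to_numpy_array(G)


def two_hop(G):
    """Hypothetical adjacency-like matrix: 1 where a walk of length two exists."""
    a = nx.to_numpy_array(G)
    return ((a @ a) > 0).astype(float)


def adjacency_tensor_sketch(G, funcs):
    """NumPy-only stand-in: stack one (n, n, 1) layer per function."""
    n = G.number_of_nodes()
    layers = []
    for func in funcs:
        adj = func(G)
        if adj.shape != (n, n):
            raise ValueError(f"expected shape {(n, n)}, got {adj.shape}")
        layers.append(adj[:, :, np.newaxis])  # add the trailing "name" axis
    return np.concatenate(layers, axis=-1)


G = nx.path_graph(4)
tensor = adjacency_tensor_sketch(G, [adjacency, two_hop])
```

In the xarray version the last axis carries the unique name given to each layer, which is why the names must not collide.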
 
- graphein.utils.utils.generate_feature_dataframe(G: networkx.classes.graph.Graph, funcs: List[Callable], return_array=False) → pandas.core.frame.DataFrame
- Return a pandas DataFrame representation of node metadata.
- funcs has to be a list of callables whose signature is
      f(n, d) -> pd.Series
- where n is the graph node and d is the node metadata dictionary. The function must return a pandas Series whose name is the node.
- Example function:
      def x_vec(n: Hashable, d: Dict[Hashable, Any]) -> pd.Series:
          return pd.Series({"x_coord": d["x_coord"]}, name=n)
- One fairly strong assumption is that each func has all the information it needs to act stored on the metadata dictionary. If you need to reference an external piece of information, such as a dictionary to look up values, set up the function to accept the dictionary, and use functools.partial to “reduce” the function signature to just (n, d). An example below:
      from functools import partial

      def get_molweight(n, d, mw_dict):
          return pd.Series({"mw": mw_dict[d["amino_acid"]]}, name=n)

      mw_dict = {"PHE": 165, "GLY": 75, ...}
      get_molweight_func = partial(get_molweight, mw_dict=mw_dict)

      generate_feature_dataframe(G, [get_molweight_func])
- The name=n piece is important; the name becomes the row index in the resulting dataframe.
- The series that is returned from each function need not contain only one key-value pair. You can have two or more, and that’s completely fine; each key becomes a column in the resulting dataframe.
- A key design choice: we default to returning DataFrames, to make inspecting the data easy, but for consumption in tensor libraries, you can return a NumPy array instead by setting return_array=True.
- Parameters
- G (nx.Graph) – A NetworkX-compatible graph object. 
- funcs (List[Callable]) – A list of functions. 
- return_array (bool) – Whether or not to return a NumPy array version of the data. Useful for consumption in tensor libs, like PyTorch or JAX. 
 
- Returns
- pandas DataFrame representation of node metadata. 
- Return type
- pd.DataFrame 
 
- graphein.utils.utils.import_message(submodule: str, package: str, conda_channel: Optional[str] = None, pip_install: bool = False) → str
- Return warning if package is not found. Generic message for indicating to the user when a function relies on an optional module / package that is not currently installed. Includes installation instructions. Typically used in conjunction with optional featurisation libraries.
- Parameters
- submodule (str) – graphein submodule that needs an external dependency. 
- package (str) – External package this submodule relies on. 
- conda_channel (str, optional) – Conda channel package can be installed from, if at all. Defaults to None 
- pip_install (bool) – Whether package can be installed via pip. Defaults to False 
 
 
- graphein.utils.utils.onek_encoding_unk(x: Iterable[Any], allowable_set: List[Any]) → List[bool]
- Function for performing one-hot encoding.
- Parameters
- x (Iterable[Any]) – values to one-hot 
- allowable_set (List[Any]) – set of options to encode 
 
- Returns
- one-hot encoding as list 
- Return type
- List[bool] 
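A minimal sketch of one-hot encoding with an unknown bucket follows. It assumes, as the _unk suffix suggests but the description above does not state, that a value outside allowable_set is mapped to the last entry of the set. It also encodes a single value rather than an iterable. The name one_hot_with_unknown is hypothetical.

```python
from typing import Any, List


def one_hot_with_unknown(x: Any, allowable_set: List[Any]) -> List[bool]:
    """One-hot encode x; unseen values fall back to the last entry (assumption)."""
    if x not in allowable_set:
        x = allowable_set[-1]
    return [x == s for s in allowable_set]


residues = ["ALA", "GLY", "UNK"]
gly = one_hot_with_unknown("GLY", residues)
other = one_hot_with_unknown("XYZ", residues)
```

Under that assumption, the last element of allowable_set should be a placeholder such as "UNK" so that unseen values do not collide with a real category.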
 
- graphein.utils.utils.ping(host: str) → bool
- Returns True if host (str) responds to a ping request. Remember that a host may not respond to a ping (ICMP) request even if the host name is valid.
- graphein.utils.utils.protein_letters_3to1_all_caps(amino_acid: str) → str
- Converts capitalised 3 letter amino acid code to single letter. Not provided in default biopython. 
Testing utilities for the Graphein library.
- graphein.testing.utils.compare_approximate(first, second)
- Return whether two dicts of arrays are approximately equal. 
- graphein.testing.utils.compare_exact(first: Dict[str, Any], second: Dict[str, Any]) → bool
- Return whether two dicts of arrays are exactly equal. 
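The difference between the exact and approximate comparisons can be sketched with NumPy. The two helpers below are hypothetical re-implementations built on np.array_equal and np.allclose; the tolerances are NumPy's defaults, and whether Graphein uses the same ones is an assumption.

```python
import numpy as np


def dicts_exactly_equal(first, second):
    """Same keys, and every pair of arrays matches element for element."""
    if first.keys() != second.keys():
        return False
    return all(np.array_equal(first[k], second[k]) for k in first)


def dicts_approximately_equal(first, second, rtol=1e-5, atol=1e-8):
    """Same keys, and every pair of arrays matches within a tolerance."""
    if first.keys() != second.keys():
        return False
    return all(
        np.allclose(first[k], second[k], rtol=rtol, atol=atol) for k in first
    )


a = {"coords": np.array([1.0, 2.0])}
b = {"coords": np.array([1.0, 2.0 + 1e-9])}

exact = dicts_exactly_equal(a, b)
approximate = dicts_approximately_equal(a, b)
```

The approximate form is the one to reach for when features come from floating-point computation, where exact equality rarely survives a round trip.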
- graphein.testing.utils.edge_data_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph, comparison_func: typing.Callable = <function compare_exact>) → bool
- Checks whether two graphs have the same edge features.
- Parameters
- g (networkx.Graph) – The first graph.
- h (networkx.Graph) – The second graph.
- comparison_func – Matching function for edge features. Takes two edge feature dictionaries and returns True if they are equal. Defaults to compare_exact()
 
- Returns
- True if the graphs have the same edge features, False otherwise.
- Return type
- bool
 
- graphein.testing.utils.edges_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) → bool
- Checks whether two graphs have the same edges.
- Parameters
- g (networkx.Graph) – The first graph.
- h (networkx.Graph) – The second graph.
 
- Raises
- AssertionError – If the graphs do not contain the same nodes 
 
- graphein.testing.utils.graphs_isomorphic(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) → bool
- Checks for structural isomorphism between two graphs: g and h.
- Parameters
- g (networkx.Graph) – The first graph.
- h (networkx.Graph) – The second graph.
 
- Returns
- True if the graphs are isomorphic, False otherwise.
- Return type
- bool
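Structural isomorphism ignores node labels, which is what separates it from the node and edge equality checks. The sketch below calls networkx.is_isomorphic directly; whether graphs_isomorphic wraps this exact call is an assumption.

```python
import networkx as nx

g = nx.cycle_graph(4)

# Same structure, different node labels
h = nx.relabel_nodes(g, {0: "a", 1: "b", 2: "c", 3: "d"})

same_structure = nx.is_isomorphic(g, h)
same_labels = set(g.nodes) == set(h.nodes)

# A path on four nodes has one edge fewer than the cycle
different_structure = nx.is_isomorphic(g, nx.path_graph(4))
```

Here g and h are isomorphic even though they share no node names, while the cycle and the path are not.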
 
- graphein.testing.utils.nodes_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) → bool
- Checks whether two graphs have the same nodes.
- Parameters
- g (networkx.Graph) – The first graph.
- h (networkx.Graph) – The second graph.
 
- Raises
- AssertionError – If the graphs do not contain the same nodes 
 
CLI & Config
Yaml parser for config objects
- graphein.utils.config_parser.config_constructor(loader: yaml.loader.FullLoader, node: yaml.nodes.MappingNode) → pydantic.main.BaseModel
- Construct a BaseModel config.
- Parameters
- loader (yaml.FullLoader) – Given YAML loader 
- node (yaml.nodes.MappingNode) – A mapping node 
 
 
- graphein.utils.config_parser.function_constructor(loader: yaml.loader.FullLoader, tag_suffix: str, node: Union[yaml.nodes.MappingNode, yaml.nodes.ScalarNode]) → Callable
- Construct a Callable. If function parameters are given, this returns a partial function.
- Parameters
- loader (yaml.FullLoader) – Given YAML loader 
- tag_suffix (str) – The name after the !func: tag 
- node (Union[yaml.nodes.MappingNode, yaml.nodes.ScalarNode]) – A mapping node if function parameters are given, a scalar node if not 
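The mapping-node versus scalar-node behaviour can be sketched with PyYAML's multi-constructor mechanism. Everything below is an illustrative assumption rather than Graphein's code: SketchLoader and construct_function are hypothetical names, the tag suffix is treated as a dotted import path, and standard-library functions stand in for real config entries.

```python
import importlib
from functools import partial

import yaml


class SketchLoader(yaml.SafeLoader):
    """Loader subclass so the constructor is not registered globally."""


def construct_function(loader, tag_suffix, node):
    """Resolve the dotted path after !func: and bind any given parameters."""
    module_name, _, attr = tag_suffix.rpartition(".")
    func = getattr(importlib.import_module(module_name), attr)
    if isinstance(node, yaml.nodes.MappingNode):
        # Parameters were given, so return a partial with them bound
        kwargs = loader.construct_mapping(node, deep=True)
        return partial(func, **kwargs)
    # Scalar node: no parameters, return the bare callable
    return func


SketchLoader.add_multi_constructor("!func:", construct_function)

text = """
dump: !func:json.dumps
  indent: 2
root: !func:math.sqrt
"""
cfg = yaml.load(text, Loader=SketchLoader)
```

In this sketch cfg["dump"] is a partial with indent bound, while cfg["root"] is the plain function.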
 
 
- graphein.utils.config_parser.get_loader() → yaml.loader.Loader
- Add constructors to PyYAML loader. 
- graphein.utils.config_parser.parse_config(path: pathlib.Path) → pydantic.main.BaseModel
- Parses a YAML configuration file into a config object.
- Parameters
- path (pathlib.Path) – Path to configuration file 
 
Yaml parser for config objects
- class graphein.utils.config.PartialMatchOperator(regex_paths=None, types=None)
- Custom operator for deepdiff comparison. This operator compares whether the two partials are equal. 
- class graphein.utils.config.PathMatchOperator(regex_paths=None, types=None)
- Custom operator for deepdiff comparison. This operator compares whether the two pathlib Paths are equal. 
- graphein.utils.config.partial_functions_equal(func1: functools.partial, func2: functools.partial) → bool
- Determine whether two partial functions are equal.
- Parameters
- func1 (partial) – Partial function to check 
- func2 (partial) – Partial function to check 
 
- Returns
- Whether the two functions are equal 
- Return type
- bool
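A dedicated comparison is useful because functools.partial objects compare by identity, so two partials built from the same function and arguments are not == to each other. The sketch below compares the wrapped function, positional arguments and keyword arguments; partials_equal_sketch and scale are hypothetical names, and this is an assumption about the approach rather than the library's implementation.

```python
from functools import partial


def scale(x, factor=1.0):
    """Hypothetical function to wrap in partials."""
    return x * factor


def partials_equal_sketch(func1: partial, func2: partial) -> bool:
    """Equal if the wrapped function and all bound arguments match."""
    return (
        func1.func == func2.func
        and func1.args == func2.args
        and func1.keywords == func2.keywords
    )


a = partial(scale, factor=2.0)
b = partial(scale, factor=2.0)
c = partial(scale, factor=3.0)

identity_equal = a == b
structurally_equal = partials_equal_sketch(a, b)
different_kwargs = partials_equal_sketch(a, c)
```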