{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# obonet tutorial: import and analyze the Gene Ontology in Python\n", "\n", "This tutorial shows:\n", "\n", "1. How to read the Gene Ontology OBO export into `networkx` using the [`obonet`](https://github.com/dhimmel/obonet) package.\n", "2. Simple tasks you can do with the `networkx.MultiDiGraph` data structure." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "import networkx\n", "import obonet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read the Gene Ontology\n", "\n", "Learn more about the Gene Ontology (GO) downloads [here](http://geneontology.org/page/download-ontology). Note that the OBO file can be read directly from a URL: `obonet.read_obo` automatically detects whether it is passed a local path, a URL, or an open file. In addition, `obonet.read_obo` automatically decompresses files ending in `.gz`, `.bz2`, or `.xz`." 
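, "\n", "\n", "Conceptually, this kind of decompression works by matching the file suffix to a decompressor. The code cell below is a minimal, self-contained sketch of that idea, assuming a mapping of `.gz`, `.bz2`, and `.xz` to the standard-library `gzip`, `bz2`, and `lzma` openers. `open_maybe_compressed` is a hypothetical helper written for illustration, not part of obonet's API." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import bz2\n", "import gzip\n", "import lzma\n", "import os\n", "import tempfile\n", "\n", "# Hypothetical helper (not obonet's API): pick a decompressor from the file suffix\n", "OPENERS = {'.gz': gzip.open, '.bz2': bz2.open, '.xz': lzma.open}\n", "\n", "\n", "def open_maybe_compressed(path):\n", "    # Open path for text reading, decompressing it if the suffix is known\n", "    for suffix, opener in OPENERS.items():\n", "        if str(path).endswith(suffix):\n", "            return opener(path, 'rt')\n", "    return open(path, 'rt')\n", "\n", "\n", "# Round trip: write a tiny gzip-compressed OBO header, then read it back as text\n", "obo_path = os.path.join(tempfile.mkdtemp(), 'example.obo.gz')\n", "with gzip.open(obo_path, 'wt') as write_file:\n", "    write_file.write('format-version: 1.2\\n')\n", "with open_maybe_compressed(obo_path) as read_file:\n", "    first_line = read_file.readline()\n", "first_line" 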
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 4.46 s, sys: 209 ms, total: 4.67 s\n", "Wall time: 6.8 s\n" ] } ], "source": [ "%%time\n", "url = 'http://purl.obolibrary.org/obo/go/go-basic.obo'\n", "graph = obonet.read_obo(url)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "44264" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Number of nodes\n", "len(graph)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "87642" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Number of edges\n", "graph.number_of_edges()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check whether the ontology is a directed acyclic graph (DAG)\n", "networkx.is_directed_acyclic_graph(graph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Look up node properties\n", "\n", "`graph.nodes[node_id]` returns a dictionary of the node's properties." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'name': 'phagocytosis',\n", " 'namespace': 'biological_process',\n", " 'def': '\"A vesicle-mediated transport process that results in the engulfment of external particulate material by phagocytes and their delivery to the lysosome. 
The particles are initially contained within phagocytic vacuoles (phagosomes), which then fuse with primary lysosomes to effect digestion of the particles.\" [ISBN:0198506732]',\n", " 'xref': ['Wikipedia:Phagocytosis'],\n", " 'is_a': ['GO:0016192']}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Retrieve properties of phagocytosis\n", "graph.nodes['GO:0006909']" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'name': 'pilus shaft',\n", " 'namespace': 'cellular_component',\n", " 'def': '\"The long, slender, mid section of a pilus.\" [GOC:jl]',\n", " 'synonym': ['\"fimbrial shaft\" EXACT []'],\n", " 'is_a': ['GO:0110165'],\n", " 'relationship': ['part_of GO:0009289']}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Retrieve properties of pilus shaft\n", "graph.nodes['GO:0009418']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create name mappings\n", "\n", "Note that in some OBO ontologies, some nodes have only an id and no name ([see issue](https://github.com/dhimmel/obonet/issues/11))." 
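, "\n", "\n", "One way to handle such nameless nodes is to fall back to the node's id as its label. The code cell below is a minimal sketch on a hypothetical two-node toy graph, so it runs without downloading GO; `toy` and `id_to_label` are illustrative names, and the second identifier is made up." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkx\n", "\n", "# Hypothetical toy ontology: the second node has no 'name' attribute\n", "toy = networkx.MultiDiGraph()\n", "toy.add_node('GO:0042552', name='myelination')\n", "toy.add_node('EXAMPLE:0000001')  # made-up id for a nameless node\n", "\n", "# Use the name when present, otherwise fall back to the node id\n", "id_to_label = {id_: data.get('name', id_) for id_, data in toy.nodes(data=True)}\n", "id_to_label" 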
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "id_to_name = {id_: data.get('name') for id_, data in graph.nodes(data=True)}\n", "name_to_id = {data['name']: id_ for id_, data in graph.nodes(data=True) if 'name' in data}" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'myelination'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the name for GO:0042552\n", "id_to_name['GO:0042552']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'GO:0042552'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the id for myelination\n", "name_to_id['myelination']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find parent or child relationships" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "• pilus ⟶ is_a ⟶ cell projection\n" ] } ], "source": [ "# Find edges to parent terms (edges point from child to parent)\n", "node = name_to_id['pilus']\n", "for child, parent, key in graph.out_edges(node, keys=True):\n", "    print(f'• {id_to_name[child]} ⟶ {key} ⟶ {id_to_name[parent]}')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "• pilus ⟵ part_of ⟵ pilus shaft\n", "• pilus ⟵ part_of ⟵ pilus tip\n", "• pilus ⟵ is_a ⟵ type IV pilus\n" ] } ], "source": [ "# Find edges from child terms\n", "node = name_to_id['pilus']\n", "for child, parent, key in graph.in_edges(node, keys=True):\n", "    print(f'• {id_to_name[parent]} ⟵ {key} ⟵ {id_to_name[child]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find all superterms of myelination" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Edges point from child to parent, so networkx descendants are superterms\n", "sorted(id_to_name[superterm] for superterm in networkx.descendants(graph, 'GO:0042552'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find all subterms of myelination" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['central nervous system myelin formation',\n", " 'central nervous system myelin maintenance',\n", " 'central nervous system myelination',\n", " 'myelin assembly',\n", " 'myelin maintenance',\n", " 'myelination in peripheral nervous system',\n", " 'myelination of anterior lateral line nerve axons',\n", " 'myelination of lateral line nerve axons',\n", " 'myelination of posterior lateral line nerve axons',\n", " 'negative regulation of myelination',\n", " 'paranodal junction assembly',\n", " 'peripheral nervous system myelin formation',\n", " 'peripheral nervous system myelin maintenance',\n", " 'positive regulation of myelination',\n", " 'regulation of myelination']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Likewise, networkx ancestors are subterms\n", "sorted(id_to_name[subterm] for subterm 
in networkx.ancestors(graph, 'GO:0042552'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find all paths to the root" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "• starch binding ⟶ polysaccharide binding ⟶ carbohydrate binding ⟶ binding ⟶ molecular_function\n" ] } ], "source": [ "paths = networkx.all_simple_paths(\n", "    graph,\n", "    source=name_to_id['starch binding'],\n", "    target=name_to_id['molecular_function']\n", ")\n", "for path in paths:\n", "    print('•', ' ⟶ '.join(id_to_name[node] for node in path))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### See the ontology metadata" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "{'name': 'go',\n", " 'typedefs': [{'id': 'negatively_regulates',\n", " 'name': 'negatively regulates',\n", " 'namespace': 'external',\n", " 'xref': ['RO:0002212'],\n", " 'is_a': ['regulates']},\n", " {'id': 'part_of',\n", " 'name': 'part of',\n", " 'namespace': 'external',\n", " 'xref': ['BFO:0000050'],\n", " 'is_transitive': 'true'},\n", " {'id': 'positively_regulates',\n", " 'name': 'positively regulates',\n", " 'namespace': 'external',\n", " 'xref': ['RO:0002213'],\n", " 'holds_over_chain': ['negatively_regulates negatively_regulates'],\n", " 'is_a': ['regulates']},\n", " {'id': 'regulates',\n", " 'name': 'regulates',\n", " 'namespace': 'external',\n", " 'xref': ['RO:0002211'],\n", " 'is_transitive': 'true'},\n", " {'id': 'term_tracker_item',\n", " 'name': 'term tracker item',\n", " 'namespace': 'external',\n", " 'xref': ['IAO:0000233'],\n", " 'is_metadata_tag': 'true',\n", " 'is_class_level': 'true'}],\n", " 'instances': [],\n", " 'format-version': '1.2',\n", " 'data-version': 'releases/2020-10-09',\n", " 'subsetdef': ['chebi_ph7_3 \"Rhea list of ChEBI 
terms representing the major species at pH 7.3.\"',\n", " 'gocheck_do_not_annotate \"Term not to be used for direct annotation\"',\n", " 'gocheck_do_not_manually_annotate \"Term not to be used for direct manual annotation\"',\n", " 'goslim_agr \"AGR slim\"',\n", " 'goslim_aspergillus \"Aspergillus GO slim\"',\n", " 'goslim_candida \"Candida GO slim\"',\n", " 'goslim_chembl \"ChEMBL protein targets summary\"',\n", " 'goslim_drosophila \"Drosophila GO slim\"',\n", " 'goslim_flybase_ribbon \"FlyBase Drosophila GO ribbon slim\"',\n", " 'goslim_generic \"Generic GO slim\"',\n", " 'goslim_metagenomics \"Metagenomics GO slim\"',\n", " 'goslim_mouse \"Mouse GO slim\"',\n", " 'goslim_pir \"PIR GO slim\"',\n", " 'goslim_plant \"Plant GO slim\"',\n", " 'goslim_pombe \"Fission yeast GO slim\"',\n", " 'goslim_synapse \"synapse GO slim\"',\n", " 'goslim_yeast \"Yeast GO slim\"'],\n", " 'synonymtypedef': ['syngo_official_label \"label approved by the SynGO project\"',\n", " 'systematic_synonym \"Systematic synonym\" EXACT'],\n", " 'default-namespace': ['gene_ontology'],\n", " 'ontology': 'go'}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graph.graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dictionary mapping obsolete terms to their replacements" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "47281" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Re-read the ontology, this time keeping obsolete terms\n", "graph_with_obs = obonet.read_obo(url, ignore_obsolete=False)\n", "len(graph_with_obs)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[('GO:0000108', 'GO:0000109'),\n", " ('GO:0000174', 'GO:0000750'),\n", " ('GO:0000260', 'GO:0046961'),\n", " ('GO:0000261', 'GO:0046962'),\n", " ('GO:0000284', 'GO:0000753')]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" 
} ], "source": [ "old_to_new = dict()\n", "for node, data in graph_with_obs.nodes(data=True):\n", "    for replaced_by in data.get(\"replaced_by\", []):\n", "        old_to_new[node] = replaced_by\n", "list(old_to_new.items())[:5]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5" } }, "nbformat": 4, "nbformat_minor": 2 }