{ "cells": [ { "cell_type": "markdown", "id": "c1dc1edc", "metadata": {}, "source": [ "# Load a knowledge base into an empty DAS" ] }, { "cell_type": "markdown", "id": "96908c71", "metadata": {}, "source": [ "This notebook shows how to start an empty DAS and load a knowledge base into it.\n", "\n", "The first cell just imports the relevant class and instantiates a DAS object." ] }, { "cell_type": "code", "execution_count": null, "id": "755c65e6", "metadata": {}, "outputs": [], "source": [ "from das.distributed_atom_space import DistributedAtomSpace\$n", "import warnings\$n", "# avoids an annoying warning message from the Couchbase lib\$n", "warnings.filterwarnings('ignore')\n", "das = DistributedAtomSpace()" ] }, { "cell_type": "markdown", "id": "70362877", "metadata": {}, "source": [ "Point `KNOWLEDGE_BASE` to the file or folder where the knowledge base is. No tarballs or zip files here, only plain `.metta` or `.scm` files or folders with multiple files." ] }, { "cell_type": "code", "execution_count": null, "id": "50b6daa4", "metadata": {}, "outputs": [], "source": [ "KNOWLEDGE_BASE = \"/tmp/samples\"" ] }, { "cell_type": "markdown", "id": "5efb59cb", "metadata": {}, "source": [ "Select between the two load methods according to the knowledge base format. `load_canonical_knowledge_base()` is a lot faster but can be used only for `.metta` files which follows some extra assumptions:\n", "\n", "- The DBs are empty.\n", "- All MeTTa files have exactly one toplevel expression per line.\n", "- There are no empty lines.\n", "- Every \"named\" expressions (e.g. nodes) mentioned in a given\$n", " expression is already mentioned in a typedef (i.e. something\$n", " like (: \"my_node_name\" my_type)' previously IN THE SAME FILE).\n", "- Every type mentioned in a typedef is already defined IN THE SAME FILE.\n", "- All expressions are normalized (regarding separators, parenthesis etc)\n", " like (: \"my_node_name\" my_type)' or\$n", " (Evaluation \"name\" (Evaluation \"name\" (List \"name\" \"name\")))'. No tabs,\n", " no double spaces, no spaces after (', etc.\n", "- All typedefs appear before any regular expressions\$n", "- Among typedefs, any terminal types (e.g. (: \"my_node_name\" my_type)') appear\$n", " after all actual type definitions (e.g. (: Concept Type)')\n", "- No \"(\" or \")\" in atom names\$n", "- Flat type hierarchy (i.e. all types inherit from Type)\n", "\n", "Usually, \"canonical\" files are generated automatically by some conversion tool (e.g. `flybase2metta`)\n", "\n", "If the knowledge base is `.scm` or regular `.metta` file(s) then you should select `das.load_knowledge_base()`" ] }, { "cell_type": "code", "execution_count": null, "id": "9480f15f", "metadata": {}, "outputs": [], "source": [ "das.clear_database()\n", "das.load_knowledge_base(KNOWLEDGE_BASE)\n", "#das.load_canonical_knowledge_base(KNOWLEDGE_BASE)" ] }, { "cell_type": "markdown", "id": "14317d6c", "metadata": {}, "source": [ "If the knowledge base is large, the load can take a time to finish. You can follow the progress by looking at `/tmp/das.log`. Once it's done, you should execute a count just to make sure it worked OK." ] }, { "cell_type": "code", "execution_count": null, "id": "8443903f", "metadata": {}, "outputs": [], "source": [ "das.count_atoms()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5" } }, "nbformat": 4, "nbformat_minor": 5 }