---
title: Clean notebooks
keywords: fastai
sidebar: home_sidebar
summary: "Strip notebooks from superfluous metadata"
description: "Strip notebooks from superfluous metadata"
nb_path: "nbs/07_clean.ipynb"
---

To avoid pointless conflicts while working with Jupyter notebooks (caused by differing execution counts or cell metadata), clean the notebooks before committing anything. This is done automatically if you install the git hooks with nbdev_install_git_hooks. The following functions do the cleaning.

Utils

{% raw %}

rm_execution_count

rm_execution_count(o)

Remove execution count in o
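
As a rough illustration (not the library's actual source), the behaviour described here can be sketched as follows. It assumes the function resets the key to None only when the key is already present, which matches the tests further down. The name rm_execution_count_sketch is hypothetical.

```python
def rm_execution_count_sketch(o):
    """Hypothetical stand-in: reset 'execution_count' to None if the key exists."""
    if 'execution_count' in o:
        o['execution_count'] = None
    return o

# A code cell carries a counter that changes on every run
cell = {'cell_type': 'code', 'execution_count': 26, 'source': 'awesome_code'}
rm_execution_count_sketch(cell)
print(cell['execution_count'])  # None

# A dict without the key is left untouched (no key is added)
markdown_cell = {'cell_type': 'markdown', 'source': 'text'}
rm_execution_count_sketch(markdown_cell)
print('execution_count' in markdown_cell)  # False
```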

{% endraw %} {% raw %}
{% endraw %} {% raw %}

clean_output_data_vnd

clean_output_data_vnd(o)

Remove application/vnd.google.colaboratory.intrinsic+json in data entries
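
A minimal sketch of this step, assuming the output dict stores its MIME bundle under the 'data' key as in the tests further down. The names COLAB_KEY and clean_output_data_vnd_sketch are hypothetical, and the real implementation may differ.

```python
COLAB_KEY = 'application/vnd.google.colaboratory.intrinsic+json'

def clean_output_data_vnd_sketch(o):
    """Hypothetical stand-in: drop the Colab-specific entry from an output's 'data'."""
    data = o.get('data')
    if isinstance(data, dict):
        data.pop(COLAB_KEY, None)  # no error if the entry is absent
    return o

output = {'data': {COLAB_KEY: {'type': 'string'},
                   'text/plain': ['sample output']}}
clean_output_data_vnd_sketch(output)
print(list(output['data']))  # ['text/plain']
```

Outputs without a 'data' entry (for example stream outputs) pass through unchanged in this sketch.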

{% endraw %} {% raw %}
{% endraw %} {% raw %}

clean_cell_output

clean_cell_output(cell)

Remove execution count in cell
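
Judging from the tests further down, cleaning a cell's outputs appears to cover more than the execution count: the Colab-specific data entry and output-level tags also disappear. The sketch below is written under that assumption. clean_cell_output_sketch and COLAB_KEY are hypothetical names, not the library's code.

```python
COLAB_KEY = 'application/vnd.google.colaboratory.intrinsic+json'

def clean_cell_output_sketch(cell):
    """Hypothetical stand-in: clean every entry of cell['outputs'] in place."""
    for o in cell.get('outputs', []):
        if 'execution_count' in o:
            o['execution_count'] = None        # volatile counter
        if isinstance(o.get('data'), dict):
            o['data'].pop(COLAB_KEY, None)     # Colab-only payload
        if isinstance(o.get('metadata'), dict):
            o['metadata'].pop('tags', None)    # output-level tags
    return cell

cell = {'outputs': [{'execution_count': 2,
                     'data': {COLAB_KEY: {'type': 'string'},
                              'text/plain': ['sample output']},
                     'metadata': {'tags': []}}]}
clean_cell_output_sketch(cell)
print(cell['outputs'][0]['execution_count'])  # None
```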

{% endraw %} {% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

clean_cell

clean_cell(cell, clear_all=False)

Clean cell by removing superfluous metadata or everything except the input if clear_all

{% endraw %} {% raw %}
{% endraw %} {% raw %}
tst = {'cell_type': 'code',
       'execution_count': 26,
       'metadata': {'hide_input': True, 'meta': 23},
       'outputs': [{'execution_count': 2, 
                    'data': {
                        'application/vnd.google.colaboratory.intrinsic+json': {
                            'type': 'string'},
                        'plain/text': ['sample output',]
                    },
                    'output': 'super'}],
       
       'source': 'awesome_code'}
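tst1 = tst.copy()  # shallow copy: the nested metadata and outputs are shared with tst

# Default cleaning keeps the outputs but drops execution counts and unlisted metadata
clean_cell(tst)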
test_eq(tst, {'cell_type': 'code',
              'execution_count': None,
              'metadata': {'hide_input': True},
              'outputs': [{'execution_count': None, 
                           'data': {'plain/text': ['sample output',]},
                           'output': 'super'}],
              'source': 'awesome_code'})

# clear_all=True also drops the outputs and all cell metadata
clean_cell(tst1, clear_all=True)
test_eq(tst1, {'cell_type': 'code',
               'execution_count': None,
               'metadata': {},
               'outputs': [],
               'source': 'awesome_code'})
{% endraw %} {% raw %}
tst2 = {
       'metadata': {'tags':[]},
       'outputs': [{
                    'metadata': {
                        'tags':[]
                    }}],
       
          "source": [
    ""
   ]}
clean_cell(tst2, clear_all=False)
test_eq(tst2, {
               'metadata': {},
               'outputs': [{
                    'metadata':{}}],
               'source': []})
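
The two modes exercised by the tests above can be sketched as follows. This is a simplified illustration, not the library's source: it assumes an allow-list of cell metadata keys containing only 'hide_input' (the one key the tests show surviving), and it leaves out the per-output cleaning that the default mode also performs. CELL_METADATA_KEEP and clean_cell_sketch are hypothetical names.

```python
CELL_METADATA_KEEP = ['hide_input']  # assumption: the only key the tests show surviving

def clean_cell_sketch(cell, clear_all=False):
    """Hypothetical stand-in for the cell-level cleaning logic."""
    if 'execution_count' in cell:
        cell['execution_count'] = None
    if clear_all and 'outputs' in cell:
        cell['outputs'] = []                  # clear_all drops the outputs entirely
    if cell.get('source') == ['']:
        cell['source'] = []                   # normalise an empty source
    meta = cell.get('metadata', {})
    kept = {k: v for k, v in meta.items() if k in CELL_METADATA_KEEP}
    cell['metadata'] = {} if clear_all else kept
    return cell

cell = {'cell_type': 'code', 'execution_count': 26,
        'metadata': {'hide_input': True, 'meta': 23},
        'outputs': [{'output': 'super'}], 'source': ['']}
clean_cell_sketch(cell)
print(cell['metadata'])  # {'hide_input': True}
```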
{% endraw %} {% raw %}

clean_nb

clean_nb(nb, clear_all=False)

Clean nb from superfluous metadata, passing clear_all to clean_cell

{% endraw %} {% raw %}
{% endraw %} {% raw %}
tst = {'cell_type': 'code',
       'execution_count': 26,
       'metadata': {'hide_input': True, 'meta': 23},
       'outputs': [{'execution_count': 2,
                    'data': {
                        'application/vnd.google.colaboratory.intrinsic+json': {
                            'type': 'string'},
                        'plain/text': ['sample output',]
                    },
                    'output': 'super'}],
       'source': 'awesome_code'}
nb = {'metadata': {'kernelspec': 'some_spec', 'jekyll': 'some_meta', 'meta': 37},
      'cells': [tst]}

clean_nb(nb)
test_eq(nb['cells'][0], {'cell_type': 'code',
              'execution_count': None,
              'metadata': {'hide_input': True},
              'outputs': [{'execution_count': None, 
                           'data': { 'plain/text': ['sample output',]},
                           'output': 'super'}],
              'source': 'awesome_code'})
test_eq(nb['metadata'], {'kernelspec': 'some_spec', 'jekyll': 'some_meta'})
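
The notebook-level filtering checked by the last assertion can be sketched like this. The allow-list below contains only the two keys the test shows surviving, so it is an assumption: the real list may be longer. The per-cell step is replaced by a small stub, and all names (NB_METADATA_KEEP, cell_cleaner_stub, clean_nb_sketch) are hypothetical.

```python
NB_METADATA_KEEP = ['kernelspec', 'jekyll']  # assumption: keys the test shows surviving

def cell_cleaner_stub(cell, clear_all=False):
    """Placeholder for the per-cell cleaning step."""
    if 'execution_count' in cell:
        cell['execution_count'] = None
    if clear_all and 'outputs' in cell:
        cell['outputs'] = []

def clean_nb_sketch(nb, clear_all=False, cell_cleaner=cell_cleaner_stub):
    """Hypothetical stand-in: clean each cell, then filter notebook metadata."""
    for cell in nb.get('cells', []):
        cell_cleaner(cell, clear_all)         # clear_all is passed through unchanged
    meta = nb.get('metadata', {})
    nb['metadata'] = {k: v for k, v in meta.items() if k in NB_METADATA_KEEP}
    return nb

nb_demo = {'metadata': {'kernelspec': 'some_spec', 'jekyll': 'some_meta', 'meta': 37},
           'cells': [{'cell_type': 'code', 'execution_count': 26, 'outputs': []}]}
clean_nb_sketch(nb_demo)
print(sorted(nb_demo['metadata']))  # ['jekyll', 'kernelspec']
```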
{% endraw %} {% raw %}
{% endraw %}

Main function

{% raw %}

nbdev_clean_nbs

nbdev_clean_nbs(fname:str=None, clear_all:bool_arg=False, disp:bool_arg=False, read_input_stream:bool_arg=False)

Clean all notebooks in fname to avoid merge conflicts

| | Type | Default | Details |
|---|---|---|---|
| fname | str | None | A notebook name or glob to convert |
| clear_all | bool_arg | False | Clean all metadata and outputs |
| disp | bool_arg | False | Print the cleaned outputs |
| read_input_stream | bool_arg | False | Read input stream and not nb folder |
{% endraw %} {% raw %}
{% endraw %}

By default (fname left as None), all the notebooks in lib_folder are cleaned. You can opt in to fully clean the notebooks, removing every bit of metadata and the cell outputs, by passing clear_all=True. disp is only for internal use with the git hooks: it prints the cleaned notebook instead of saving it. Likewise, read_input_stream reads the notebook from the input stream instead of from the file names.
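
The control flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual command: the cleaner is a stub that only resets execution counts, the JSON formatting (indent=1) is a guess, and the names minimal_clean and clean_notebooks_sketch, as well as the stream and out parameters, are hypothetical.

```python
import glob
import io
import json
import sys

def minimal_clean(nb, clear_all=False):
    """Placeholder cleaner: resets execution counts, drops outputs if clear_all."""
    for cell in nb.get('cells', []):
        if 'execution_count' in cell:
            cell['execution_count'] = None
        if clear_all and 'outputs' in cell:
            cell['outputs'] = []
    return nb

def clean_notebooks_sketch(pattern, clear_all=False, disp=False, stream=None, out=None):
    """Hypothetical driver mirroring the options described above."""
    out = sys.stdout if out is None else out
    if stream is not None:
        # Stream mode: read one notebook from the stream, print the cleaned JSON
        nb = minimal_clean(json.load(stream), clear_all)
        json.dump(nb, out, indent=1)
        return
    for path in glob.glob(pattern):
        if not path.endswith('.ipynb'):
            continue
        with open(path, encoding='utf-8') as f:
            nb = json.load(f)
        minimal_clean(nb, clear_all)
        if disp:
            json.dump(nb, out, indent=1)      # print instead of saving
        else:
            with open(path, 'w', encoding='utf-8') as f:
                json.dump(nb, f, indent=1)    # overwrite the notebook in place

# Stream mode demo with in-memory buffers standing in for stdin and stdout
nb_in = {
    'cells': [{'cell_type': 'code', 'execution_count': 3, 'outputs': [], 'source': []}],
    'metadata': {},
}
buf = io.StringIO()
clean_notebooks_sketch(None, stream=io.StringIO(json.dumps(nb_in)), out=buf)
print(json.loads(buf.getvalue())['cells'][0]['execution_count'])  # None
```

Keeping the print-only and stream paths separate from the overwrite path reflects why those options suit git hooks: the hook can compare or capture the cleaned JSON without touching the working copy.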