---
title: Clean notebooks
keywords: fastai
sidebar: home_sidebar
summary: "Strip notebooks of superfluous metadata"
description: "Strip notebooks of superfluous metadata"
nb_path: "nbs/07_clean.ipynb"
---
To avoid pointless conflicts when working with Jupyter notebooks (caused by differing execution counts or cell metadata), clean the notebooks before committing anything. This is done automatically if you install the git hooks with nbdev_install_git_hooks. The following functions do the cleaning.
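# Assumed import paths: test_eq from fastcore, the cleaning functions from nbdev.clean
from fastcore.test import test_eq
from nbdev.clean import clean_cell, clean_nb

# Sample code cell with an execution count, extra metadata and a Colab-specific output entry
tst = {'cell_type': 'code',
       'execution_count': 26,
       'metadata': {'hide_input': True, 'meta': 23},
       'outputs': [{'execution_count': 2,
                    'data': {
                        'application/vnd.google.colaboratory.intrinsic+json': {
                            'type': 'string'},
                        'plain/text': ['sample output',]
                    },
                    'output': 'super'}],
       'source': 'awesome_code'}
tst1 = tst.copy()

# Default cleaning: execution counts are reset and only hide_input is kept in the metadata
clean_cell(tst)
test_eq(tst, {'cell_type': 'code',
              'execution_count': None,
              'metadata': {'hide_input': True},
              'outputs': [{'execution_count': None,
                           'data': {'plain/text': ['sample output',]},
                           'output': 'super'}],
              'source': 'awesome_code'})

# clear_all=True also removes all metadata and all outputs
clean_cell(tst1, clear_all=True)
test_eq(tst1, {'cell_type': 'code',
               'execution_count': None,
               'metadata': {},
               'outputs': [],
               'source': 'awesome_code'})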
# A cell whose metadata holds only empty tags and whose source is a single empty string
tst2 = {
    'metadata': {'tags': []},
    'outputs': [{
        'metadata': {
            'tags': []
        }}],
    'source': [
        ''
    ]}
clean_cell(tst2, clear_all=False)
test_eq(tst2, {
    'metadata': {},
    'outputs': [{
        'metadata': {}}],
    'source': []})
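The behaviour exercised by the tests above can be sketched in plain Python. This is a minimal sketch inferred from the expected values in those tests, not the library's implementation: the names sketch_clean_output, sketch_clean_cell, KEEP_CELL_META and COLAB_JSON are hypothetical, and treating hide_input as the only cell metadata key worth keeping is an assumption.

```python
# Hypothetical names throughout: a sketch inferred from the tests, not nbdev's code.
COLAB_JSON = 'application/vnd.google.colaboratory.intrinsic+json'
KEEP_CELL_META = {'hide_input'}  # assumption: the only cell metadata key to keep

def sketch_clean_output(out):
    "Reset the execution count and drop Colab-specific data and tags from one output."
    if 'execution_count' in out:
        out['execution_count'] = None
    if 'data' in out:
        out['data'] = {k: v for k, v in out['data'].items() if k != COLAB_JSON}
    if 'metadata' in out:
        out['metadata'].pop('tags', None)

def sketch_clean_cell(cell, clear_all=False):
    "Clean `cell` in place; with `clear_all`, also drop all outputs and metadata."
    if 'execution_count' in cell:
        cell['execution_count'] = None
    if 'outputs' in cell:
        if clear_all:
            cell['outputs'] = []
        else:
            for out in cell['outputs']:
                sketch_clean_output(out)
    # A source made of one empty string is normalised to an empty list
    if cell.get('source') == ['']:
        cell['source'] = []
    meta = cell.get('metadata', {})
    cell['metadata'] = {} if clear_all else {k: v for k, v in meta.items() if k in KEEP_CELL_META}

cell = {'cell_type': 'code', 'execution_count': 3,
        'metadata': {'hide_input': True, 'meta': 23},
        'outputs': [{'execution_count': 3,
                     'data': {COLAB_JSON: {}, 'plain/text': ['out']}}],
        'source': ['']}
sketch_clean_cell(cell)
```

Because the sketch mutates the cell in place, callers that need the original should copy it first.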
# Rebuild the sample cell, since the earlier tst was cleaned in place
tst = {'cell_type': 'code',
       'execution_count': 26,
       'metadata': {'hide_input': True, 'meta': 23},
       'outputs': [{'execution_count': 2,
                    'data': {
                        'application/vnd.google.colaboratory.intrinsic+json': {
                            'type': 'string'},
                        'plain/text': ['sample output',]
                    },
                    'output': 'super'}],
       'source': 'awesome_code'}
nb = {'metadata': {'kernelspec': 'some_spec', 'jekyll': 'some_meta', 'meta': 37},
      'cells': [tst]}

# Cleaning the notebook cleans every cell and filters the notebook-level metadata
clean_nb(nb)
test_eq(nb['cells'][0], {'cell_type': 'code',
                         'execution_count': None,
                         'metadata': {'hide_input': True},
                         'outputs': [{'execution_count': None,
                                      'data': {'plain/text': ['sample output',]},
                                      'output': 'super'}],
                         'source': 'awesome_code'})
test_eq(nb['metadata'], {'kernelspec': 'some_spec', 'jekyll': 'some_meta'})
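The notebook-level step shown in the test above can be sketched as a loop over the cells followed by a filter on the notebook metadata. This is a hedged sketch rather than the library's code: sketch_clean_nb, reset_count and KEEP_NB_META are hypothetical names, and the keep-list contains only the two keys that survive in the test (the real list may be longer).

```python
# Hypothetical sketch; the keep-list is inferred from the test above, where
# 'kernelspec' and 'jekyll' survive and 'meta' is dropped.
KEEP_NB_META = {'kernelspec', 'jekyll'}

def sketch_clean_nb(nb, clean_cell_fn, clear_all=False):
    "Apply `clean_cell_fn` to every cell, then filter the notebook-level metadata."
    for cell in nb['cells']:
        clean_cell_fn(cell, clear_all=clear_all)
    nb['metadata'] = {k: v for k, v in nb['metadata'].items() if k in KEEP_NB_META}

def reset_count(cell, clear_all=False):
    "Stand-in cell cleaner for the demo: it only resets the execution count."
    cell['execution_count'] = None

nb = {'metadata': {'kernelspec': 'some_spec', 'jekyll': 'some_meta', 'meta': 37},
      'cells': [{'cell_type': 'code', 'execution_count': 26,
                 'metadata': {}, 'outputs': [], 'source': 'awesome_code'}]}
sketch_clean_nb(nb, reset_count)
```

Passing the cell cleaner as an argument keeps the sketch self-contained; any function that accepts a cell and a clear_all keyword can be plugged in.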
By default (fname left to None), all the notebooks in lib_folder are cleaned. You can opt in to fully clean the notebook, removing every bit of metadata and the cell outputs, by passing clear_all=True. disp is only for internal use with git hooks: it prints the cleaned notebook instead of saving it. Likewise, read_input_stream reads the notebook from the input stream instead of from the file names.
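To illustrate what reading from a stream and printing instead of saving might look like in a git hook, here is a minimal sketch using only the standard library. It is an assumption about how the two options combine, not the library's implementation: sketch_clean_stream and drop_counts are hypothetical names, and the JSON formatting options are illustrative.

```python
import io
import json

def sketch_clean_stream(inp, out, clean_fn):
    "Read a notebook as JSON from `inp`, clean it with `clean_fn`, write it to `out`."
    nb = json.load(inp)
    clean_fn(nb)
    # Write the cleaned notebook to the output stream; nothing is saved to disk
    json.dump(nb, out, indent=1, sort_keys=True)
    out.write('\n')

def drop_counts(nb):
    "Stand-in cleaner for the demo: resets every cell's execution count."
    for cell in nb['cells']:
        if 'execution_count' in cell:
            cell['execution_count'] = None

raw = json.dumps({'cells': [{'cell_type': 'code', 'execution_count': 7,
                             'metadata': {}, 'outputs': [], 'source': []}],
                  'metadata': {}, 'nbformat': 4, 'nbformat_minor': 4})
buf = io.StringIO()
sketch_clean_stream(io.StringIO(raw), buf, drop_counts)
cleaned = json.loads(buf.getvalue())
```

In a real filter the in-memory buffers would be replaced by sys.stdin and sys.stdout, so the file in the working tree is left untouched.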