---
title: Fix merge conflicts
keywords: fastai
sidebar: home_sidebar
summary: "Fix merge conflicts in jupyter notebooks"
description: "Fix merge conflicts in jupyter notebooks"
nb_path: "nbs/05_merge.ipynb"
---
When working with Jupyter notebooks (which are JSON files behind the scenes) and GitHub, a merge conflict will often break the notebooks you are working on, because git adds new lines to the notebook source file. This module defines the function `fix_conflicts` to repair those notebooks and to attempt an automatic merge of standard conflicts. The remaining ones will be delimited by markdown cells like this:
{% include image.html alt="Fixed notebook" width="700" caption="A notebook fixed after a merged conflict. The file couldn't be opened before the command was run, but after it the conflict is highlighted by markdown cells." max-width="700" file="/images/merge.PNG" %}
This is an example of a broken notebook, defined in `tst_nb`. The JSON format is broken by the lines automatically added by git. Such a file can't be opened in Jupyter Notebook, leaving the user with no choice but to fix the text file manually.
print(tst_nb)
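To make the failure concrete, here is a minimal sketch that is separate from `tst_nb`: a hypothetical one-cell notebook, written the way git leaves it after a conflict on `execution_count`. The branch label `a7ec1b0` and the helper `is_valid_json` are made up for illustration.

```python
import json

# Lines shared by both versions of a hypothetical one-cell notebook.
top = ['{', ' "cells": [', '  {', '   "cell_type": "code",']
bottom = ['   "metadata": {},', '   "outputs": [],', '   "source": [',
          '    "z=3"', '   ]', '  }', ' ],', ' "metadata": {},',
          ' "nbformat": 4,', ' "nbformat_minor": 2', '}']

# The only difference between the two branches (illustrative values).
ours = ['   "execution_count": 6,']
theirs = ['   "execution_count": 5,']

# What git writes to disk: both versions, separated by marker lines.
conflicted = '\n'.join(top + ['<<<<<<< HEAD'] + ours + ['======='] +
                       theirs + ['>>>>>>> a7ec1b0'] + bottom)
# What a manual resolution keeping the local side would look like.
resolved = '\n'.join(top + ours + bottom)

def is_valid_json(txt):
    "Return True if `txt` parses as JSON (hypothetical helper)."
    try:
        json.loads(txt)
    except json.JSONDecodeError:
        return False
    return True

print(is_valid_json(conflicted), is_valid_json(resolved))
```

The conflicted text does not parse because the marker lines are not JSON, while the hand-resolved text loads normally.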
Note that in `tst_nb`, the second conflict is easily solved: it only concerns the execution count of the second cell and can be solved by choosing either option without really impacting your notebook. This is the kind of conflict `fix_conflicts` will (by default) fix automatically. The first conflict is more complicated, as it spans two cells and one cell is present in one version but not the other. Such conflicts (and generally those where the inputs of the cells change from one version to the other) aren't fixed automatically, but `fix_conflicts` will return a proper JSON file in which the annotations introduced by git are placed in markdown cells.
The first step is to walk the raw text file to extract the cells. We can't read it as JSON since it's broken, so we have to parse the text.
`extract_cells` returns the beginning of the text (before the cells are defined), the list of cells and the end of the text (after the cells are defined).
start,cells,end = extract_cells(tst_nb)
test_eq(len(cells), 3)
test_eq(cells[0], """ {
"cell_type": "code",
<<<<<<< HEAD
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"z=3\n",
"z"
]
},""")
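The text walk described above can be sketched as follows. This is not the implementation of `extract_cells`; it is a simplified illustration that assumes the one-space-per-level indentation Jupyter uses when saving notebooks, so that a cell opens on a line equal to two spaces plus `{` and the `cells` list closes on a line equal to one space plus `],`. The name `split_cells_sketch` and the sample notebook are hypothetical.

```python
def split_cells_sketch(raw_txt):
    "Split notebook text into (start, cells, end) without parsing it as JSON."
    lines = raw_txt.split('\n')
    # Index of the line that opens the "cells" list.
    first = next(i for i, l in enumerate(lines) if l.strip().startswith('"cells"'))
    # Index of the line that closes it (one-space indent in the assumed layout).
    last = next(i for i in range(first + 1, len(lines)) if lines[i] in (' ],', ' ]'))
    chunks = []
    for line in lines[first + 1:last]:
        # A two-space-indented '{' opens a new cell; any other line
        # (including git marker lines) is attached to the current chunk.
        if line == '  {' or not chunks:
            chunks.append([])
        chunks[-1].append(line)
    start = '\n'.join(lines[:first + 1])
    end = '\n'.join(lines[last:])
    return start, ['\n'.join(c) for c in chunks], end

# A hypothetical two-cell notebook with a conflict inside the first cell.
sample = '\n'.join([
    '{', ' "cells": [',
    '  {', '   "cell_type": "code",',
    '<<<<<<< HEAD', '   "execution_count": 1,',
    '=======', '   "execution_count": 2,',
    '>>>>>>> a7ec1b0',
    '   "metadata": {},', '   "outputs": [],',
    '   "source": [', '    "z=3"', '   ]', '  },',
    '  {', '   "cell_type": "markdown",', '   "metadata": {},',
    '   "source": [', '    "Some text"', '   ]', '  }',
    ' ],', ' "metadata": {},', ' "nbformat": 4,', ' "nbformat_minor": 2', '}',
])

start, cells, end = split_cells_sketch(sample)
print(len(cells))
```

Because every line ends up in exactly one of the three parts, joining `start`, the cells and `end` with newlines gives back the original text, which makes it possible to rewrite individual cells and reassemble the file afterwards.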
When walking the broken cells, we will add conflict markers, as markdown cells, before and after the cells with conflicts. To do that we use `get_md_cell`.
tst = ''' {
"cell_type": "markdown",
"metadata": {},
"source": [
"A bit of markdown"
]
},'''
assert get_md_cell("A bit of markdown") == tst
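As an illustration of what such a marker cell looks like as text, here is a minimal sketch that builds one with the standard `json` module. It is not the code of `get_md_cell`: the name `md_cell_sketch` is hypothetical, and the two-space shift assumes the cell sits directly inside the `cells` list of a notebook saved with one space of indentation per level.

```python
import json

def md_cell_sketch(txt):
    "Text of a markdown cell holding `txt`, indented like an entry of the `cells` list."
    cell = {"cell_type": "markdown", "metadata": {}, "source": [txt]}
    body = json.dumps(cell, indent=1)
    # Shift every line by two spaces (the assumed depth of a cell)
    # and append the comma that separates it from the next cell.
    return '\n'.join('  ' + line for line in body.split('\n')) + ','

marker = md_cell_sketch('`<<<<<<< HEAD`')
print(marker)
```

Going through `json.dumps` rather than string concatenation means quotes or backslashes in `txt` are escaped, at the cost of depending on the serializer's layout.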
ts = [''' {
"cell_type": "code",
"source": [
"'''+code+'''"
]
},''' for code in ["a=1", "b=1", "a=1"]]
assert same_inputs(ts[0],ts[2])
assert not same_inputs(ts[0], ts[1])
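The assertions above compare only the `source` of two cells. One way to perform such a comparison is sketched below; it is an assumption about the approach, not the code of `same_inputs`, and the names `_source` and `same_source_sketch` are hypothetical. Each cell text is parsed on its own after dropping the trailing comma, and anything that fails to parse is treated as different.

```python
import json

def _source(cell_txt):
    "Return the `source` of a cell given as text, or None if it can't be read."
    try:
        cell = json.loads(cell_txt.rstrip().rstrip(','))
    except json.JSONDecodeError:
        return None
    return cell.get('source') if isinstance(cell, dict) else None

def same_source_sketch(t1, t2):
    "True only if both cell texts parse and hold identical `source` lists."
    s1, s2 = _source(t1), _source(t2)
    return s1 is not None and s1 == s2

# Hypothetical cells: the first two differ only in their execution count.
cell_a = '{"cell_type": "code", "execution_count": 1, "source": ["a=1"]},'
cell_b = '{"cell_type": "code", "execution_count": 2, "source": ["a=1"]},'
cell_c = '{"cell_type": "code", "execution_count": 1, "source": ["b=1"]},'

print(same_source_sketch(cell_a, cell_b), same_source_sketch(cell_a, cell_c))
```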
`analyze_cell` is the main function used to walk through the cells of a notebook. `cell` is the cell we're at and `cf` the conflict state: `0` if we're not in any conflict, `1` if we are inside the first part of a conflict (between `<<<<<<<` and `=======`) and `2` for the second part of a conflict. `names` contains the names of the branches (they start at `[None,None]` and get updated as we pass along conflicts). `prev` contains a copy of what should be included at the start of the second version (if `cf=1` or `cf=2`). `added` starts at `False` and keeps track of whether we added any markdown cells (this flag tells us whether a fast merge left any conflicts at the end). `fast` and `trust_us` are passed along by `fix_conflicts`: if `fast` is `True`, we don't point out a conflict between cells if the inputs in the two versions are the same. Instead we merge using the local or remote branch, depending on `trust_us`.
The function then returns the updated text (with one or several cells, depending on the conflicts to solve), along with the updated `cf`, `names`, `prev` and `added`.
tst = '\n'.join(['a', f'{conflicts[0]} HEAD', 'b', conflicts[1], 'c'])
c,cf,names,prev,added = analyze_cell(tst, 0, [None,None], None, False,fast=False)
test_eq(c, get_md_cell('`<<<<<<< HEAD`')+'\na\nb')
test_eq(cf, 2)
test_eq(names, ['HEAD', None])
test_eq(prev, ['a\nc'])
test_eq(added, True)
In this example, we entered cell `tst` with no conflict state. At the end of the cell, we are still in the second part of the conflict, hence `cf=2`. The result contains a marker for the branch head, then the whole cell in version 1 (a + b). We save a (prior to the conflict, hence common to the two versions) and c (only in version 2) in `prev` for the next cell (which should contain the resolution of this conflict).
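The three-state walk over a cell (`0` outside a conflict, `1` in the first part, `2` in the second) can be sketched as a small state machine. This is a simplified illustration and not the code of `analyze_cell`: it only splits the lines between the two versions and ignores branch names, `prev`, `fast` and `trust_us`. The names `MARKERS` and `split_versions_sketch` are hypothetical.

```python
# Marker expected next in each state: 0 waits for '<<<<<<<',
# 1 waits for '=======', 2 waits for '>>>>>>>'.
MARKERS = ['<<<<<<<', '=======', '>>>>>>>']

def split_versions_sketch(cell, cf=0):
    "Split the lines of `cell` between version 1 and version 2, starting in state `cf`."
    v1, v2 = [], []
    for line in cell.split('\n'):
        if line.startswith(MARKERS[cf]):
            # Crossing a marker moves to the next state and drops the line.
            cf = (cf + 1) % 3
            continue
        if cf < 2:
            v1.append(line)   # common lines and the first part of a conflict
        if cf != 1:
            v2.append(line)   # common lines and the second part of a conflict
    return '\n'.join(v1), '\n'.join(v2), cf

# Same shape as the test above: 'a' is common, 'b' is in version 1 only, 'c' in version 2 only.
tst_sketch = '\n'.join(['a', '<<<<<<< HEAD', 'b', '=======', 'c'])
v1, v2, state = split_versions_sketch(tst_sketch)
print(repr(v1), repr(v2), state)
```

Version 1 comes out as a + b and version 2 as a + c, with the walk ending in state `2` because the closing `>>>>>>>` has not been seen yet.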
`fix_conflicts` begins by backing up the notebook `fname` to `fname.bak` in case something goes wrong. Then it parses the broken JSON, solving conflicts in cells. If `fast=True`, every conflict that only involves metadata or outputs of cells will be solved automatically by using the local (`trust_us=True`) or the remote (`trust_us=False`) branch. Otherwise, or for conflicts involving the inputs of cells, the JSON will be repaired by including the two versions of the conflicted cell(s), with markdown cells indicating the conflicts. You will be able to open the notebook again and search for the conflicts (look for `<<<<<<<`), then fix them as you wish.
If `fast=True`, the function will print a message indicating whether the notebook was fully merged or if conflicts remain.
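The backup step mentioned above is worth doing by hand whenever you experiment with a broken notebook. A minimal sketch, assuming "`fname.bak`" means the full file name with `.bak` appended (so `example.ipynb` becomes `example.ipynb.bak`); the helper `backup_sketch` and the file name are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path

def backup_sketch(fname):
    "Copy `fname` next to itself with `.bak` appended and return the copy's path."
    fname = Path(fname)
    bak = fname.with_name(fname.name + '.bak')
    shutil.copy(fname, bak)
    return bak

# Try it on a throwaway file so nothing real is touched.
with tempfile.TemporaryDirectory() as d:
    nb = Path(d) / 'example.ipynb'
    nb.write_text('{"cells": []}')
    bak = backup_sketch(nb)
    backup_name = bak.name
    backup_matches = bak.read_text() == nb.read_text()

print(backup_name, backup_matches)
```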