Here is how I might describe language models to a 5th grader:   

Before we describe language models, let's first understand image repair.


Imagine a painting that is missing pieces of information: time has flaked the paint off the canvas, leaving only bare black patches. If the missing areas are small enough, you could probably mix small amounts of paint that average the colors around each gap and fill in the missing paint. When mixing the paint you are, in effect, doing a little statistical averaging to make it work out. This entire process can be done by a computer that averages out the surrounding colors.
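The averaging idea above can be sketched in a few lines of code. This is a minimal toy, not a real inpainting algorithm: it assumes a grayscale image as a NumPy array and repeatedly replaces each missing pixel with the average of its four neighbors until the hole is filled in.

```python
import numpy as np

def repair_small_holes(image, mask, passes=50):
    """Fill masked (missing) pixels by repeatedly averaging their
    neighbors -- the 'statistical averaging' described above.
    image: 2D float array (grayscale for simplicity).
    mask:  2D bool array, True where paint is missing."""
    img = image.copy()
    img[mask] = 0.0  # the "black surface" where paint flaked off
    for _ in range(passes):
        # average of the four neighbors for every pixel
        up    = np.roll(img, -1, axis=0)
        down  = np.roll(img,  1, axis=0)
        left  = np.roll(img, -1, axis=1)
        right = np.roll(img,  1, axis=1)
        avg = (up + down + left + right) / 4.0
        img[mask] = avg[mask]  # only overwrite the missing pixels
    return img

# a flat gray canvas with a small hole in the middle
canvas = np.full((9, 9), 0.5)
hole = np.zeros_like(canvas, dtype=bool)
hole[4, 4] = True
fixed = repair_small_holes(canvas, hole)
print(round(float(fixed[4, 4]), 3))  # the hole converges back to 0.5
```

This works well only for small, smooth gaps, which is exactly the point: anything with structure (fences, brushstrokes, clouds) needs the smarter guessing described next.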

If larger areas of the painting are missing, it is not so easy. You might have to guess more of the pattern by looking at the area around it. If parts of the white picket fence are missing, and in general the fence is straight, whatever you fill in will need to have the same "straightness". The grass in a particular area is still green, and the artist used a particular style of brushstroke to paint it. If an area of cloudy sky is missing, you might have to draw some clouds. The clouds you draw are probably not the same as those in the original picture; however, no one other than the original artist would be able to tell the difference. Such a repair system is an ideal application for a neural network.

Nowadays we have better and better ways of applying neural networks to such a problem. We can train a network on millions of pictures, and it will very expertly fill in gaps on pictures it has never seen before. Given a blue sky, it will draw correct-looking clouds where nothing is noticeably wrong. Of course, nowhere in such a network is our conceptualization of clouds and fences, nothing beyond patterns in the visual spectrum of color, especially if we curated its training environment. Yet it is so good at predicting where white should be applied that its clouds look better than some of us can draw. We can also use this to start paintings on a fresh canvas!

Okay, if painting images is not a big deal to you, what if we did the same thing with missing pieces of text?

Three encodings combined make up the "language models" that fill in missing text:

1) Positional coding: the relative position of each word token from the start of the document.

2) "Attention": a mechanism for learning which "groups of words" are statistically related to other "groups of words".

3) "Self-attention": "If we pay attention to this word, the next words we pay attention to are these ones." The patterns of how we swap out and transition between groups of words (how we pay attention to what we are paying attention to) are encoded as self-attention.
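The three pieces above can be sketched with toy numbers. This is a minimal illustration under simplifying assumptions, not a real transformer: the token vectors and weight matrices are random stand-ins, and the positional coding is just a number added per position. It shows the core mechanic: every token scores every other token, and the scores decide whose information gets mixed in.

```python
import numpy as np

np.random.seed(0)

def softmax(x, axis=-1):
    # turn raw scores into weights that sum to 1
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# a toy "sentence" of 4 tokens, each an 8-dim vector (stand-in embeddings)
tokens = np.random.randn(4, 8)

# 1) positional coding: shift each token by a value that depends on
#    its position from the start of the "document"
positions = np.arange(4)[:, None] / 4.0
x = tokens + positions

# 2) + 3) self-attention: project tokens into queries, keys, values,
#    then let every token score every other token
Wq, Wk, Wv = (np.random.randn(8, 8) * 0.1 for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = softmax(q @ k.T / np.sqrt(8))  # "how much do I attend to you?"
out = scores @ v                        # weighted mix of the other tokens

print(scores.shape, out.shape)  # (4, 4) (4, 8)
```

Each row of `scores` is one token's attention over all four tokens, and each row sums to 1; stacking many such layers is what a transformer does.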

Just like in image learning, with a large enough body of text (with not too much "2+2 = 5" lying around), phrases like "2+2 = 4" get trained into our models. The transformer model, like the picture-repair network, can now fill in missing parts of text. But we can also start out with a prompt and complete the missing parts (autocomplete) to find new answers. We have effectively made a search engine over millions of pieces of text that indexes things in such a cool way that it performs a sort of reasoning for us. This on its own is quite a feat! Of course, no one would assume AGI would emerge from the image processor described above, or from a text processor like GPT.
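The "autocomplete from statistics" idea can be shown with the simplest possible language model: count which word follows which pair of words in a tiny corpus, then repeatedly pick the most common continuation. This is a toy sketch (a trigram counter, nowhere near a transformer), but it shows how "2+2 = 4" gets trained in simply by appearing often.

```python
from collections import Counter, defaultdict

# a tiny training corpus -- real models see millions of documents
corpus = ("two plus two equals four . two plus three equals five . "
          "two plus two equals four .")
words = corpus.split()

# count which word follows each pair of words (a trigram model)
follows = defaultdict(Counter)
for a, b, c in zip(words, words[1:], words[2:]):
    follows[(a, b)][c] += 1

def autocomplete(prompt, steps=3):
    """Repeatedly append the statistically most common next word."""
    out = prompt.split()
    for _ in range(steps):
        nxt = follows[tuple(out[-2:])].most_common(1)
        if not nxt:
            break
        out.append(nxt[0][0])
    return " ".join(out)

print(autocomplete("two plus two"))  # → "two plus two equals four ."
```

Because "two plus two equals four" outnumbers everything else in the corpus, the model completes the prompt correctly; with not too much "2+2 = 5" lying around, the averages come out right.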

Copyright © 2020 LOGICMOO (Unless otherwise credited in page)