Imagine a game of chance where you can only be in one of three positions: 'A', 'B', or 'C'. Suppose these are the rules:
 | A | B | C |
---|---|---|---|
If you're currently on 'A' | .5 | .3 | .2 |

 | A | B | C |
---|---|---|---|
If you're currently on 'B' | .4 | .2 | .4 |

 | A | B | C |
---|---|---|---|
If you're currently on 'C' | .3 | .1 | .6 |
These could be combined into one matrix:
The probability you'll end up here → | A | B | C |
---|---|---|---|
If you're currently on 'A' | .5 | .3 | .2 |
If you're currently on 'B' | .4 | .2 | .4 |
If you're currently on 'C' | .3 | .1 | .6 |
Or, just...
 | A | B | C |
---|---|---|---|
A | .5 | .3 | .2 |
B | .4 | .2 | .4 |
C | .3 | .1 | .6 |
where each row represents an 'input state' (where you are now) and each column represents an 'output state' (where you'll end up next). Let's call this "Matrix M".
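In code, Matrix M is just a 3×3 grid of numbers. Here's a minimal sketch in Python (using NumPy; the variable names are mine), which also checks a property every Markov matrix must have: each row sums to 1, because wherever you are now, you must end up *somewhere*.

```python
import numpy as np

# Rows are input states (where you are now); columns are output
# states (where you'll end up next), in the order A, B, C.
M = np.array([
    [0.5, 0.3, 0.2],  # currently on 'A'
    [0.4, 0.2, 0.4],  # currently on 'B'
    [0.3, 0.1, 0.6],  # currently on 'C'
])

# Each row sums to 1: wherever you are, you must land somewhere.
print(M.sum(axis=1))  # [1. 1. 1.]
```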
The thing is, once you've applied Matrix M to move from position A, B, or C to your next position, you can turn around and apply Matrix M again. And again, as many times as you like. This kind of process was first studied by the Russian mathematician Andrey Markov (1856–1922), and repeatedly applying a matrix like the one above is called a 'Markov chain'. It turns out that Markov chains (or 'Markov processes') are good at modeling all sorts of real-world processes.
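You can watch the chaining happen in code. A small sketch, not part of the original game: raising M to the nth power gives the probabilities of where you'll be after n moves, and you can also just simulate the walk one move at a time.

```python
import numpy as np

states = ["A", "B", "C"]
M = np.array([
    [0.5, 0.3, 0.2],
    [0.4, 0.2, 0.4],
    [0.3, 0.1, 0.6],
])

# Where might you be after 10 moves, starting from 'A'?
# Row 0 of M^10 is the resulting distribution over A, B, C.
print(np.linalg.matrix_power(M, 10)[0])

# Or just play the game: simulate one move at a time.
rng = np.random.default_rng()
position = 0  # start on 'A'
for move in range(10):
    position = rng.choice(3, p=M[position])
    print(states[position], end=" ")
```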
We could imagine a similar game where the probabilities of your next position depend on both your current position and your previous position. This is called a '2nd-order' Markov matrix because it depends on two states: where you are now and where you were just before now. By that logic, Matrix M is '1st-order', because it depends on only one 'input' state: where you are right now. A '3rd-order' Markov matrix depends on three input states. And so forth.
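One common way to represent a higher-order chain in code is to key the transition table on a tuple of recent states. Here's a hypothetical 2nd-order sketch; the probabilities below are invented for illustration, not taken from the game above.

```python
import random

# A 2nd-order chain keys its transition table on the pair
# (previous position, current position). These probabilities
# are made up purely for illustration.
M2 = {
    ("A", "A"): {"A": 0.6, "B": 0.3, "C": 0.1},
    ("A", "B"): {"A": 0.2, "B": 0.5, "C": 0.3},
    ("A", "C"): {"A": 0.1, "B": 0.2, "C": 0.7},
    ("B", "A"): {"A": 0.4, "B": 0.4, "C": 0.2},
    ("B", "B"): {"A": 0.3, "B": 0.3, "C": 0.4},
    ("B", "C"): {"A": 0.2, "B": 0.1, "C": 0.7},
    ("C", "A"): {"A": 0.5, "B": 0.3, "C": 0.2},
    ("C", "B"): {"A": 0.1, "B": 0.6, "C": 0.3},
    ("C", "C"): {"A": 0.3, "B": 0.2, "C": 0.5},
}

def step(prev, now):
    """Pick the next position given the last two positions."""
    dist = M2[(prev, now)]
    return random.choices(list(dist), weights=dist.values())[0]

print(step("A", "B"))
```

Notice the table now has 3² = 9 rows instead of 3: each extra order multiplies the table size by the number of states, which is the price you pay for the better approximation.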
Brain Bend Moment:
Generally speaking, the higher the order, the better the approximation. In Claude Shannon's seminal paper *A Mathematical Theory of Communication*¹, he briefly considers probabilistic approximations of English, first using sequences of letters, then of words (with probabilities derived by analysing English text):
The point is that as the order becomes larger, the results become more organic and lifelike.
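To get a feel for what Shannon was doing, here's a minimal sketch of a 1st-order word approximation: record which words follow which in some source text, then generate new text by sampling from those records. The corpus here is a toy placeholder; Shannon-style results need far more text.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words seen to follow it."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, start, length=20):
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        if not options:
            break  # dead end: this word was never followed by anything
        out.append(random.choice(options))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
print(generate(chain, "the"))
```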