The Fibonacci Code

David MacKay, Seb Wills, Philip Sterne, and Graeme Mitchison

A standard ternary code for integers is the binary-plus-termination code, which uses the characters 1, 0, and X (for 'end of number').
ncB(n)
11X
210X
311X
4100X
5101X
Since every cB(n) starts with a 1, we can remove the head from these codes and obtain the headless binary code,
ncb(n)
1X
20X
31X
400X
501X
This code cb(n) defines an implicit distribution over integers n of the form \[ Pb(n) = \frac{1}{Z(\alpha)} \exp(-\alpha l(n)) \frac{1}{2^{l(n)}} \] where l(n) is the length of the headless binary code (roughly \log_2(n)), and \alpha is a parameter associated with the 'duration' of the X character relative to 0 and 1, whose 'durations' are equal.

If the duration of the X character is the same as the duration of the 0 and 1 then the probabilities are
ncb(n)Pb(n)
1X1/3
20X1/9
31X1/9
400X1/27
501X1/27

Asymptotically, \[ Pb(n) ~ 1/n^r \] for some $r$.

Here's an alternative ternary code for integers based on different character durations. Let the durations of 0 and 1 be
characterduration
01 unit
12 units
Then imagine a code c(n) that assigns a binary string, followed by an X, to each integer, such that the probabilities are of the form \[ P(n) ~ 1/n^r \] asymptotically; and such that \[ log 1/P(n) = 2a 1(c(n)) + a 0(c(n)) + delta \label{eq.2} \] where 1(c) is the function that counts how many 1s there are in the string c, and 0(c) counts how many 0s. (Note the multipliers $2a$ and $a$ corresponding to the durations of 0 and 1 in the preceding table.

Here is such a code.
nc(n)
1X
20X
300X
41X
5000X
601X
710X
80000X
9001X
10010X
11100X
1211X
1300000X
140001X
150010X
160100X
17011X
181000X
19101X
20110X
21000000X
The fibonacci numbers 1, 2, 3, 5, 8, 13, 21 are the strings of zeroes terminated by an X.

The code can be generated by the fibonacci game that Graeme learned from John Conway. Each 0 means "move along the row". Each 1 means "transfer to the row labelled by the current integer". X means "read out the current integer". Notice the patterns in this code.

This code assigns a non-increasing probability to the integers. Jumps in the probability occur at the fibonacci numbers. In the following table the last column is the count $ 2 1(c(n)) + 0(c(n))$ that appeared in equation \ref{eq.2}:
Table X
nc(n) log 1/P (scaled)
1X0
20X1
300X2
41X2
5000X3
601X3
710X3
80000X4
9001X4
10010X4
11100X4
1211X4
1300000X5
140001X5
150010X5
160100X5
17011X5
181000X5
19101X5
20110X5
21000000X6

Note:

  1. The codewords with a given 'duration' are sorted alphabetically!
  2. If you behead a bunch of codewords with one duration d (eg 5), they fall into two blocks of duration d-1 and d-2 respectively. This identity is highlighted in the copy of the table below. Once we notice this, a recursive encoding algorithm becomes evident (from number, n, to binary string)

Table Y
nc(n) log 1/P (scaled)
1X0
20X1
300X2
41X2
5000X3
601X3
710X3
80000X4
9001X4
10010X4
11100X4
1211X4
Table Z
1300000X5
140001X5
150010X5
160100X5
17011X5
181000X5
19101X5
20110X5
21000000X6

Connections:

  1. this is reminiscent of the Elias code for integers (page 135 of ITILA). That code has no "X" character, but it does have the recursive property, with integers being used to point you to rows in which integers can be found.
  2. It's also possibly related to the capacity of the constrained binary channels presented on page 248 of ITILA.

The fibonacci table. Each row is a fibonacci series. Each integer from 1 upwards appears exactly once in the table proper, and can be located by a unique binary string as described above.

 
Row number  Reminder number  The table proper
----------  ---------------  ----------------------------------------------------
  0             1             1  2  3  5  8 13 21	...   
  1             3             4  7 11 18 29 ...
  2             4             6 10 16 26 42 68
  3             6             9 15 24 39 ...
  4             8            12 20 32 52 ...
  5             9            14 23 37 60 ...
  6            11            17 28 45 .... 
  7            12            19 ......
  8            14            22 ...
  9            16            25 41 ...

Note that we can associate with each row a count, which is the weighted number of digits required to get to that row. Then if we sort the rows by that count, the sequence of integers moves through the table strictly in accordance with a triangular expanding region where the triangle has slope 1. If you add diagonal lines to the table below, it might look right. Slide a piece of paper over the table and see the integers appear almost in sequence. [You need to superpose rows with equal count, for best effect!]

 
Row number  Reminder number  The table proper
 and COUNT	   
----------  ---------------  ----------------------------------------------------
  0   0         1             1  2  3  5  8 13 21	...   


	   
  1   2         3             4  7 11 18 29 ...
	   
  2   3         4             6 10 16 26 42 68
  3   4         6             9 15 24 39 ...
  4   4         8            12 20 32 52 ...
  5   5         9            14 23 37 60 ...
  6   5        11            17 28 45 .... 
  7   5        12            19 ......
  8   6        14            22 ...
  9   6        16            25 41 ...

Non-constructive proof of the theorem that the Fibonacci table contains every integer exactly once.

  1. Generate all strings in the set X, 0X, 1X, 00X, 01X, 10X, 11X, ... and compute their associated integers. Call the string the code, c, and the integer n(c). This map from c to n is well-defined. It is conceivable that a particular integer is generated more than once. We want to prove that this is not so.
  2. Sort all the codes by their counts, as indicated in the recent table (Table X).
  3. Prove lemma 1, that the count is monotonic in n. I.e., if c(b) > c(a) then certainly b > a. (This should be provable, I think, by using inequalities that relate successive F.N.'s. Appending a zero scales the number up by a factor of golden ratio (insert inequality here); and appending a 1 scales up by (golden ratio + 1) and adds 1.)
  4. Now the count increases at each Fibonacci number, and the number of numbers that ought to have a particular count is a Fibonacci number (the preceding F.N.).
  5. Lemma 2: count the number of binary strings with count w and prove that it is a Fibonacci number. (Remember that the count is the number of 0s in the string plus twice the number of 1s; this is exactly what comes out of the trellis of the constrained channel, QED.)
  6. Since the number of strings that have count w is the F.N., and since all strings map to distinct pigeonholes, and the number of candidate pigeonholes is the F.N., every integer must be mapped to by exactly one code. QED

Smoother probability distributions

When we started thinking about this code for integers, one of our motivations was the idea that this code might provide a way to assign a smoother monotonic distribution on integers than \[ Pb(n) = \frac{1}{Z(\alpha)} \exp(-\alpha l(n)) \frac{1}{2^{l(n)}}, \] which has jumps between plateaus at the powers of two.

We have hunted through hundreds of rules for generating fibonacci-code strings, either starting from the head or from the tail, trying to find a generative probabilistic model that gives a smooth monotone distribution. We got close, but never found a solution that was monotone for all n.

What we did find, however, was a smooth generative model for the standard binary code! It is very simple. The resulting distribution is the classic

1/(n*(n+1))
distribution.

Generative model for binary numbers

The string is generated from the head, extending its tail by one bit at a time. The initial number is n=1, and the probability is 1.

When in state n, the choices are:

symbol to append state change probability
X output n 1/2
0 n → 2n (1/2) (n+1)/(2n+1)
1 n → 2n + 1 (1/2) n/(2n+1)

This code is demonstrated by BinaryDecode.py.


Other encoders and decoders

For completeness, here are rules for generating and decoding the standard binary code and the fibonacci code from the head and from the tail.
Encoder for binary, from head
Decoder for binary, from tail

The string is generated from the tail, extending its head by one bit at a time. The initial number is n=1, and length is L=0.

When in state n, the choices are:

symbol to prepend state change
(Root) output n
0 nn + 2L
LL +1
1 nn + 2L+1
LL +1

Encoder for binary, from tail
Encoder for fibo-binary, from head
Decoder for fibo-binary, from head

This is the 'fibonacci table' algorithm. Each 0 means 'move right along the current row'; each 1 means 'change to row n'.

The initial state is (n,p)=(1,1). [p denotes 'predecessor'.]

symbol to append state change
X output n
0 (n, p) → (n+p, n)
1 (n, p) → (2n+p+1, n+p+1)
This algorithm is implemented by Decode2.py

Encoder for fibo-binary, from tail

The string is created from the tail. The procedure follows the colour table Y,Z given earlier.

We make use of the standard fibonacci sequence F(0) = 1; F(1) = 1; F(2)=2; F(3)=3; F(4)=5...

The initial state is the integer to be encoded, n. The algorithm is given in Encode2.py.

ENCODE(n): (recursive algorithm)

  1. If n==1 print "X" and halt.
  2. Find the largest fibonacci number F(i) such that F(i)≤ n
  3. Compute how far we are above that number:
    p = n-F(i)
  4. If* p ≥ F(i-2)
    • return 1 + ENCODE(p)
  5. Else
    • return 0 + ENCODE(p+F(i-1))

* We assert at this point that i is ≥ 2, so it is ok to look for F(i-2)

Decoder for fibo-binary, from tail

The string is generated from the tail, extending its head by one bit at a time. The initial number is n=1, and length is L=0. This generator makes use of the standard fibonacci sequence F[0] = 1; F[1] = 1; F[2]=2; F[3]=3; F[4]=5...

When in state n, the choices are:

symbol to prepend state change
(Root) output n
0 nn + F[L]
LL +1
1 nn + F[L+3]
LL +2
This algorithm is implemented by the rather clumsy TailDecode2.py


David MacKay