Computer Algebra and Secret Codes
David Bowers, Suffolk College, Ipswich, UK
First published in Micromath, Vol 13/2, Summer 1997
In a recent Micromath article [1], Jan Winter describes how the high decimal precision of computer algebra systems such as Derive can benefit an investigation of the cycles of recurring digits in decimal fractions. There are also ways of exploiting Derive’s ability to work with very large numbers to generate interesting classroom activities.
By Key Stage 4, pupils are expected to know about prime numbers. A common task is to express an integer as the product of its prime factors. This is normally carried out routinely by repeatedly dividing by 2, 3, 5, . . . until unity is reached. While this procedure is not too demanding in simple cases, it can soon become time-consuming for large numbers. The "Factor" command available in Derive and on the hand-held Texas TI-92 can be carried out at the press of a button for very large numbers, including those with high factors (see Fig. 1). This can provide new motivation for pupils to appreciate the concept of numerical factors, and in particular the uniqueness of prime factorisation.

Figure 1: A Derive session demonstrating numerical factorisation
Gödel numbering
We introduce the idea of coding secret messages based on a simple type of Gödel numbering [see reference 2]. The letters of the alphabet are numbered in the obvious way, with A=1, B=2, . . . , Z=26. A word of n letters is encoded by representing it as the product of the first n primes each raised to a power corresponding to the letters of the word. For example,
CAT Þ 23.31.520 Þ 2288818359375000.
Decoding a message requires the ability to factorise large numbers. For example,
430519397099904000000000000 Þ 219.35.512.712 Þ SELL.
This is a laborious exercise, to say the least, for which traditional calculators offer no direct assistance due to the size of the numbers. (We note in passing that graphical calculators such as the Casio fx-7700 appear to accept input to a large number of digits, but in fact only work to about ten significant figures. Thus the calculation
12345678901234567890 - 12345678900000000000
returns 0 as the "answer".)
Derive, the TI-92 and other symbolic manipulators will accept and work exactly with very large numbers, and factorise ones of the "Gödel-type" almost instantaneously. (It takes a little longer if the prime factors themselves are large - try factorising the eleven-digit repunit 11111111111.) Thus new technology is a key to secret codes! Pupils can be asked to decode messages such as
1836660096 1262524628803026963045247176133890
or 30774536132812500000000 4086075469921875000000
or 9155273437500000 7536314998406250000000000000 138511447855224609375000000
and then make up and send similar coded messages to their friends. (The alert teacher will be on the lookout for numbers of the form 64(2k + 1) . . . )
Cracking the code by inspection
Perhaps inevitably, the initial novelty soon wears thin as pupils tire of typing in and copying down large numbers, with the associated risk of typing mistakes ("transmission errors"?). They can then be encouraged to look more closely at the encoded messages. Are they really so "secret", or are there ways of deducing some things about the letters that make up the word by just looking at the large number, without any calculating tools?
Take, for example, the word encoded as 19695703125000. Like most of the encoded words that the pupils will already have experienced, it ends with a string of zeros, which implies so many factors of 10 (or 2 ´ 5). Therefore the word above contains factors of 2p and 5r where p and r are each at least 3, at least one of them being exactly 3. When the factor of 23 ´ 53 is extracted (by knocking off the three zeros), we are left with an odd number ending in 5. Therefore the factors must be 23 and 5r, where r > 3. So the first letter of the word is C, and the third letter comes after C in the alphabet.
What else? Having knocked off the three zeros, we notice that the remaining string of digits ends with not just 5, not just 25 (or 52) but 125 (or 53). This string can therefore be expressed as 1000k + 125 or (8k + 1).53 meaning now that there is a factor of 5r where r > 6, implying that the third letter comes after F in the alphabet. (In fact, since the final four digits of the string are 3125, or 55, and the preceeding digit is 0, the string is of the form 100000k + 3125 or (32k + 1).55 , giving a factor of 5r where r > 8. However, it is unlikely that this would be noticed immediately upon inspection.)
Anything else? How many letters does our mystery word have? Since it is divisible by 5, it must have at least three letters. If it is divisible by 7, it would have at least four letters. Unfortunately, a test for divisibility by 7 is difficult to do "in your head" for large numbers (for details see [3]), so let us move on and test for divisibility for 11, the next prime. The procedure is quite simple, although not so well known. Alternately add and subtract the consecutive digits of the number; the number is divisible by 11 if and only if the final result is divisible by 11. Applied to our word, we get
+ 1 - 9 + 6 - 9 + 5 - 7 + 0 - 3 + 1 - 2 + 5 - 0 + 0 - 0 = -12
which is not divisible by 11. Thus there is no factor of 11, meaning our word has at most four letters.
Another well-known test for divisibility is that by 9. If the sum of the digits is a multiple of 9, then the number itself is divisible by 9. Admittedly 9 is not a prime number, but since 9 = 32 we might get some clues about the second letter of our word, since 3 is the second prime. For a large number, we would not actually add up all the digits, but "cast out" subgroups which total 9. Here, this means casting out 9, 9, 6 and 3, 7 and 2, as shown:
1 9 6 9 5 7 0 3 1 2 5 0 0 0
leaving 1 + 5 + 1 + 5 = 12, which is not divisible by 9. We deduce that the prime factor of the form 3q must have q=1, and so the second letter of our mystery word is A.
By reviewing and applying some fairly basic number facts, we have come to the stage where we know that the fourteen-digit code given above actually represents a word of the form CA _ _ or CA _ , with the third letter coming after F. And without using a calculator! Derive allows us to check our supposition, giving instantly 23.31.511.75 which corresponds to the word CAKE. Obviously, examples which allow this much progress need to be well chosen. How far can you get with decoding these:
7500
77760000000
91552734375000
3296164134046140
by inspection?
At this point, the Micromath reader might be perplexed at being asked to perform mental arithmetic rather than using the available technology to solve the problem immediately. This does, however, appear to be the way a "revised" mathematics curriculum might be developed in the current political climate. Let us not forget that this "secret code" problem was motivated by the technology, and technology provides the only efficient way of checking the hypotheses (namely the partial decoding) resulting from inspection. This kind of interplay between mental and technology-assisted mathematics could form the basis of a new style of pedagogy.
Developing the investigation
Pupils soon discover that a Gödel-type coding of words which are longer than the fairly trivial examples used above results in extremely large numbers which frequently scroll off the end of the computer screen. The challenge can now be set to modify the coding technique to minimise the length of the coded words.
It is quickly established that high powers generally make the number bigger than high bases. For example, 725 >> 257. Thus the coded words can be kept small if the powers used are kept small. Since the powers correspond to the letters, this would imply allocating low values to the most commonly used letters rather than the original arbitrary A=1, B=2, etc. How can we rank the letters of the alphabet according to their frequency of occurrence in English text? This is a perfect opportunity to allow pupils to meet various attainment targets in "Handling Data" by counting the letters on randomly chosen pages of text and pooling their results. This will probably result in the new allocation E=1, T=2, A=3, O=4 and so on. The task can be set to establish what order of improvement in coded word length is achieved using this new allocation.
Even so, long words will still result in long numbers. One way of reducing word length is to split up the message into chunks of three (say) consecutive letters,
FOR EXA MPL ELI KET HIS
which is not difficult to reconstruct. This means that no coded chunk will be greater than 224.325.526 , which with amazing good fortune is a 38-digit number which just fits exactly on the TI-92 screen! Furthermore, three-letter chunks ensure that all coded chunks will end with a string of zeros and can be expressed compactly using standard form. For example, using the original A=1, B=2, . . . allocation, the word ROT would be coded as 358722675E18. To decode a word in this format, simply factorise the number before the E and add the number after the E to the powers of 2 and 5. Thus
669462604992E7 Þ 26+7.321.50+7 Þ MUG.
Considerable classroom discussion can be generated by seeking ever more efficient ways of coding messages along these lines. Of course, the simpler the code, the easier it is to crack. We are moving away from the "hard" problem of factorising huge numbers towards an unambiguous numerical representation of symbolic strings, and consolidating an awareness of prime numbers and factorisation on the way.
References
1. Jan Winter, 1996, Using Derive as a giant calculator, Micromath 12.2
2. Douglas Hofstadter, 1985, Metamagical Themas, Penguin
3. David Wells, 1986, The Penguin Dictionary of Curious and Interesting Numbers