
Yesterday I attended a talk on the Voynich Manuscript (VM), at MIT, by Kevin Knight of USC's Information Sciences Institute. Here's a brief summary of his talk:
The manuscript consists of 235 pages on vellum, with color drawings of plants, nymphs, stars, etc. It contains about 30,000 words written in an unknown script, and is owned by Yale University.
It has a character set that has not been observed in any other document. It is broken up into sections called "herbal", "astrological", "biological", "cosmological", "pharmacological", and a pure text section at the end. These names reflect the pictures in each section. For example, the "herbal" section contains pictures of unknown plants being grafted onto other plants. The "biological" section depicts small nudes in baths with interconnecting tubes of liquids. The "pharmacological" section shows something that has been interpreted as a medicine jar.
A cover letter from Joannes Marcus Marci of Cronland was found tucked into the manuscript. The letter claims that the book once belonged to Emperor Rudolf II and that Rudolf believed Roger Bacon was the author.
There have been many attempts to decipher the book. One was made by William Newbold at the University of Pennsylvania. He claimed that each letter was made up of many smaller Greek letters, which were anagrams holding the real meaning of the manuscript, and "deciphered" it on this basis. His decipherment is now regarded as completely bogus.
Athanasius Kircher once owned the book, from 1665 to 1680.
The Voynich script consists of between 23 and 40 distinct characters. (It is hard to say for sure, since some characters appear to be compounds of others.) There are no signs of corrections, which suggests that the manuscript was copied from some other source. The distribution of word lengths is unusual: most "words" are 3, 4, or 5 letters long. Many words are doubled, and some are tripled.
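The word-length and repetition statistics mentioned above are easy to compute once you have a transliteration of the text. Here is a minimal sketch, assuming the text is already split into a list of tokens. The function names and the sample tokens are my own, made up for illustration, and are not from the talk.

```python
from collections import Counter


def word_length_histogram(words):
    """Map each word length to the number of tokens with that length."""
    return Counter(len(w) for w in words)


def count_repeats(words):
    """Count runs where the same token appears exactly twice in a row
    (doubles) and three or more times in a row (triples)."""
    doubles = triples = 0
    i = 0
    while i < len(words):
        run = 1
        while i + run < len(words) and words[i + run] == words[i]:
            run += 1
        if run == 2:
            doubles += 1
        elif run >= 3:
            triples += 1
        i += run  # skip past the whole run
    return doubles, triples


# Made-up token sequence in a Latin-letter transliteration style,
# for illustration only; it is not a passage from the manuscript.
sample = "daiin daiin chol shedy qokedy qokedy qokedy chor ol".split()
hist = word_length_histogram(sample)
doubles, triples = count_repeats(sample)
print(sorted(hist.items()), doubles, triples)
```

Running the same two functions on an ordinary English text of similar size would give a baseline to compare against.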
The cryptographer William Friedman worked on the manuscript during World War II. There are many claimed decipherments. A 2004 Scientific American article by Gordon Rugg, however, suggests that the manuscript is just gibberish. Perhaps Voynich faked it himself.
Kevin Knight discussed some of his own attacks on the manuscript using clustering techniques. For example, if you try breaking up the English alphabet into two types, say a and b, and use expectation maximization to generate two clusters, you get AEIOUy as one cluster and the consonants as the other. Doing the same for the Voynich manuscript, however, doesn't generate anything particularly meaningful.
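To make the letter-clustering idea concrete, here is a rough sketch of one way to do it: fit a two-state hidden Markov model to the letter sequence with EM (Baum-Welch), then assign each letter to the state that emits it most often. The talk summary above doesn't specify the exact model, so the two-state HMM, the function name, the smoothing constant, and the choice to drop spaces and punctuation are all my assumptions.

```python
import random


def two_state_letter_clusters(text, iterations=30, seed=0):
    """Fit a 2-state HMM to the letters of `text` with EM (Baum-Welch) and
    return two strings: the letters assigned to each hidden state."""
    seq = [c for c in text.lower() if c.isalpha()]  # assumption: letters only
    if not seq:
        raise ValueError("text contains no letters")
    letters = sorted(set(seq))
    index = {c: i for i, c in enumerate(letters)}
    obs = [index[c] for c in seq]
    K, V, T = 2, len(letters), len(obs)
    rng = random.Random(seed)
    eps = 1e-6  # small smoothing constant to avoid zero probabilities

    def random_dist(n):
        w = [rng.random() + 0.5 for _ in range(n)]
        total = sum(w)
        return [x / total for x in w]

    def normalize(row):
        total = sum(row) + eps * len(row)
        return [(x + eps) / total for x in row]

    pi = random_dist(K)                     # initial state probabilities
    A = [random_dist(K) for _ in range(K)]  # state transition probabilities
    B = [random_dist(V) for _ in range(K)]  # letter emission probabilities
    emit = [[0.0] * V for _ in range(K)]

    for _ in range(iterations):
        # E-step: scaled forward pass.
        alpha, scale = [], []
        for t, o in enumerate(obs):
            if t == 0:
                a = [pi[k] * B[k][o] for k in range(K)]
            else:
                prev = alpha[-1]
                a = [sum(prev[j] * A[j][k] for j in range(K)) * B[k][o]
                     for k in range(K)]
            c = sum(a)
            alpha.append([x / c for x in a])
            scale.append(c)
        # E-step: scaled backward pass.
        beta = [[1.0] * K for _ in range(T)]
        for t in range(T - 2, -1, -1):
            o = obs[t + 1]
            beta[t] = [sum(A[k][j] * B[j][o] * beta[t + 1][j] for j in range(K))
                       / scale[t + 1] for k in range(K)]
        # Expected emission and transition counts.
        emit = [[0.0] * V for _ in range(K)]
        trans = [[0.0] * K for _ in range(K)]
        for t in range(T):
            for k in range(K):
                emit[k][obs[t]] += alpha[t][k] * beta[t][k]
            if t + 1 < T:
                o = obs[t + 1]
                for j in range(K):
                    for k in range(K):
                        trans[j][k] += (alpha[t][j] * A[j][k] * B[k][o]
                                        * beta[t + 1][k] / scale[t + 1])
        # M-step: re-estimate the parameters from the expected counts.
        pi = normalize([alpha[0][k] * beta[0][k] for k in range(K)])
        A = [normalize(row) for row in trans]
        B = [normalize(row) for row in emit]

    # Assign each letter to the state with the larger expected emission count.
    clusters = ["", ""]
    for v, c in enumerate(letters):
        clusters[0 if emit[0][v] >= emit[1][v] else 1] += c
    return clusters


sample = "the quick brown fox jumps over the lazy dog and then rests"
group_a, group_b = two_state_letter_clusters(sample * 5)
print(group_a, group_b)
```

With a toy input this short, and a random starting point, there is no guarantee that the two groups line up with vowels and consonants. The result reported in the talk presumably needs far more text than this.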
You could also try this kind of clustering with the words of the manuscript instead of the letters. When you do so, you get two clusters: the words in the "herbal", "astrological", and "pharmacological" sections predominantly fall into one cluster, and the words in the "biological" and "cosmological" sections predominantly fall into another. [To me, this suggests that the manuscript probably had at least two authors.]
Voynich "B" is the "biological" + "astrological" sections. You can then try to divide the words in these sections into more classes. If you do this for English, you get a cluster with words like "my, a, an, the,..."; another with "and, but, next,..."; another with "had, asked, could, have, are, is, would,..."; another with "for, at, in, no, that, be, but,..."; etc. If you do this for Voynich you also get clusters, but their meaning is less clear.
My guess is that the manuscript is some form of hoax, but I'd be delighted to be proved wrong.