Category Archives: Voynich “Solutions”

Cheshire Reprised

16 May 2019         

A week ago I posted commentary on Gerard Cheshire’s “proto-Italic” and “proto-Romance” solution for the VMS. At the time, his most recent paper was pay-to-view, so I had to restrict my comments to the previous open-access paper. Now the most recent version is open-access. Unfortunately, not much has changed from the previous version. You can see his April 2019 proto-Romance theory here.

What exactly do the terms “proto-Romance” and “proto-Italic” mean?

Proto-Romance

If you search for “proto-Romance”, you will find many references to “vulgar Latin” (also called colloquial Latin)—variations of Latin spoken by the common people (most of whom were illiterate) during the classical period of the Roman Empire.

The “classical period” of the Greeks and Romans spanned approximately 14 centuries, up to about the 6th century C.E., when the Roman Empire was no longer dominant. As Rome lost its grip, vernacular languages and local versions of Latin had the opportunity to evolve into modern languages such as Italian/Sardinian, Spanish, Portuguese, French (with Gaulish influence), and Romanian.

Extinct Languages and Undocumented Scripts

The prefix “proto-” comes from Greek πρωτο-. This refers to the first, or to something that comes before. So proto-Romance means before the Romance languages had fully emerged (from vulgar Latin), and proto-Italian script means an alphabet that was used before the script that became standard for writing medieval Italian. Medieval Italian script is essentially the same alphabet we use now except that the letterforms are more calligraphic than modern computer users are accustomed to seeing.

This brings us back to Cheshire, who is claiming that Voynichese is an extinct proto-Romance language in an undocumented proto-Italian script… something that existed about 1,000 years before the creation of the VMS.

How is that possible when the radiocarbon dating and many of the iconographical and palaeographical features of the VMS point to the early 15th century?

Cheshire’s Interpretation of Medieval Characters

Cheshire’s descriptions of individual glyphs, and his interpretations of the annotations on folio 116v, suggest that he is not familiar with medieval scripts.

It also seems that he hasn’t studied the frequency or distribution of the Voynich glyphs in the larger body of the main text, because he associates common letters and letter combinations with glyphs that are rare, or that have unusual positional characteristics. This point is so important, it bears repeating… Cheshire assigned substitution values for common letters to rare VMS glyphs, or glyphs that have positional characteristics that are not consistent with Romance languages.

Is it possible he never tested his system to see if it would generalize to larger chunks of text? Did he prematurely assume he had solved it?

Let’s look at some examples…

Cheshire’s Analysis and Transliteration of Voynich Glyphs

In his first example, Cheshire takes a glyph-shape that is known to palaeographers as the Latin “-cis” abbreviation (the letter c plus a loop that usually represents “is” and its homonyms). This shape is both a ligature and an abbreviation in languages that use Latin scribal conventions. It has not yet been determined what it means in the VMS, but its positional characteristics are similar to those of the same shape in texts that use the Latin alphabet.

VMS researchers know this shape as EVA-g.

Cheshire transliterates it as a “ta” diphthong. It’s not a diphthong. A diphthong is a combination of two vowel sounds and “t” is clearly not a vowel. The terminology is wrong.

He then gives an explanation of the shape that doesn’t mesh with medieval interpretations of letter shapes. This is figure 26 from his paper (Source: tandfonline):

To say that this can be confused with the letter r and the letter n makes no sense to anyone accustomed to reading medieval manuscripts. It looks nothing like r or n. If Cheshire means it can be confused with his transliterated r or n, he should clarify and provide examples.

To get a sense of how this character was used in the medieval period, I have created a chart with examples of the “-cis” ligature/abbreviation that was common to languages that used Latin scribal conventions. I have sorted them by date.

This is not to imply that the Latin meaning and the VMS meaning are the same. The VMS designer may only have borrowed the shape, but it is important to note that the position of this glyph in the VMS is very similar to how it is positioned in Latin languages:

More important than the mistakes in reading medieval characters and linguistic terminology is that Cheshire did not address the basic statistics of VMS text and the fact that this glyph occurs primarily at the ends of words and sometimes the ends of lines. Thus, transliterating EVA-g as “ta” is highly questionable.
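To make the positional argument concrete, here is a minimal sketch of how one might tally where a glyph falls inside words (initial, medial, final, or standing alone). It assumes a transcription already split into space-delimited EVA tokens, treats each EVA letter as one glyph, and runs on a small hypothetical token list rather than real counts from the manuscript. Line-final positions would need line breaks preserved in the input.

```python
from collections import Counter

def positional_profile(tokens, glyph):
    """Tally whether `glyph` appears word-initially, medially, finally, or alone."""
    counts = Counter()
    for token in tokens:
        last = len(token) - 1
        for i, ch in enumerate(token):
            if ch != glyph:
                continue
            if last == 0:
                counts["alone"] += 1      # single-glyph token
            elif i == 0:
                counts["initial"] += 1
            elif i == last:
                counts["final"] += 1
            else:
                counts["medial"] += 1
    return counts

# Hypothetical toy tokens in EVA-style spelling, not a real transcription sample.
tokens = "okeeg sheg daiin ykeeg gal chedy".split()
print(positional_profile(tokens, "g"))
```

Run over a full transcription, a profile dominated by "final" would be hard to square with a value like “ta” that Romance words use freely at the start and middle of words.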

Perhaps Cheshire can justify this mismatch between letter frequency and position by saying that separate glyphs also exist for “t” and “a”, but when the various transliterations are put together, the character distribution of his transliterations is significantly out of sync with that of Romance languages.
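One way to put a number on “out of sync” is the total variation distance between two letter distributions: half the sum of the absolute differences in relative frequency. This is only an illustrative sketch; the choice of metric is my assumption (Cheshire doesn’t propose one), and in practice the inputs would be a full transliteration and a reference Romance-language text of similar length, not the toy strings shown.

```python
from collections import Counter

def relative_freqs(text):
    """Relative frequency of each alphabetic character (case-insensitive)."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    return {c: n / len(letters) for c, n in counts.items()}

def total_variation(p, q):
    """0.0 means identical distributions; 1.0 means no letters in common."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Toy strings standing in for a transliteration and a reference text.
print(total_variation(relative_freqs("aab"), relative_freqs("abb")))
```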

For example, as in his previous paper, he chose one of the rarest glyphs in the VMS repertoire (EVA-x) to represent the letter “v”. In classical Latin and Romance languages, the letters “u” and “v” are essentially synonymous and very frequent. In this brief excerpt in modern characters, from Pliny the Younger, note how often u/v occurs:

[Image: frequency of the letter U/V in a classical Latin text by Pliny the Younger]

If Voynichese were a proto-Romance language (some form of classical vulgar Latin), and EVA-x were transliterated to U/V and also F/PH, as per Cheshire’s system, one would expect to see this character more than 40,000 times in 200+ pages. Instead, it occurs fewer than 50 times. That alone should create doubt in people’s minds about Cheshire’s “solution”.
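The expectation above is simple arithmetic: the share of u/v in a reference Latin text multiplied by the number of plaintext letters. As a rough sketch, this uses the opening sentence of Caesar’s De Bello Gallico as a stand-in sample (not the Pliny excerpt) and a hypothetical corpus size; a proper estimate would use a much longer sample and the actual letter count of a full transliteration. Adding “f” to the letter set would account for Cheshire’s F/PH value as well.

```python
def letter_share(text, letters):
    """Fraction of alphabetic characters in `text` that belong to `letters`."""
    alpha = [c for c in text.lower() if c.isalpha()]
    if not alpha:
        return 0.0
    return sum(1 for c in alpha if c in letters) / len(alpha)

def expected_count(share, corpus_letters):
    """Expected occurrences if the corpus had the same letter share."""
    return share * corpus_letters

# Stand-in Latin sample: the opening of Caesar's De Bello Gallico.
latin = ("Gallia est omnis divisa in partes tres, quarum unam incolunt "
         "Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, "
         "nostra Galli appellantur.")
share = letter_share(latin, "uv")
print(f"u/v share in sample: {share:.3f}")
# Hypothetical corpus size; substitute the real count of plaintext letters.
print(f"expected u/v in 100,000 letters: {expected_count(share, 100_000):.0f}")
```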

So what has Cheshire done? He has assigned a different letter to represent “u”, but we know that in classical Latin, Etruscan, and Old Italic, “v” and “u” did not represent different letters even if both shapes were used (which they usually weren’t).

Even in the Middle Ages, when there were different shapes for “u” and “v”, most scribes used them interchangeably. In other words, “verba” might be written with the “v” shape in one phrase and with a “u” shape (uerba) in the next, just as “s” was written with several different shapes (without indicating any difference in sound).

This is the 23-character Latin alphabet in use around the time vulgar Latin was evolving into Romance languages:

[Image: example of the Roman alphabet]

Perhaps Cheshire didn’t know that they were interchangeable shapes rather than two different letters when he created his transcription system. But if he did know, if he actually believes that “u” and “v” were distinct letters in proto-Romance languages, he will have to provide evidence, because historians, palaeographers, and linguists are going to be skeptical.

Beginning-Paragraph Glyphs

Voynich scholars have noticed there are disproportionate numbers of EVA-p/r and EVA-t/k characters at the beginnings of paragraphs. There is a possibility that some are pilcrows, or serve some other special function when found in this position.

Cheshire doesn’t appear to have noticed this unusual distribution (at least he doesn’t comment on this important dynamic in his paper) and translates the leading glyph in the same ways as the others. In his system, a very large number of paragraphs inexplicably begin with the letter “P”.

Some of his translations cannot be verified. For example, he used a drawing on f75r to demonstrate a single transliterated word “palina” on f79v. There’s no apparent relationship between them (other than what he contends), so how does an independent party determine if the translation is correct?

Tenuous Assertions

On f70r, he uses a circular argument to explain the transliteration of “opat” (which he says is “abbot”). He says the use of “opat” indicates “that proto-Romance reached as far as eastern Europe” because “opát survives to mean abbot in Polish, Czech and Slovak”.

We don’t need a dubious transliteration to tell us that proto-Romance languages reached eastern Europe. The existence of Romanian demonstrates this rather well. Romania borders Ukraine, and its Transylvanian region was long part of the Kingdom of Hungary, which bordered both the Bohemian lands and Poland, so the transmission of vulgar Latin to Polish through Czech was a natural process.

Palaeographical Interpretations

There are problems with the way Cheshire describes the text on folio 116v. He refers to the script as “conventional Italics”. It is, in fact, a fairly conventional Gothic script, not “conventional Italics”.

Then he makes a strange statement that the second line on 116v is hybrid writing, that it is Voynichese symbols mixed with “prototype Italic symbols, as if the calligrapher had been experimenting with a crossover writing system”. It’s hard to respond to that because his statement is based on misreading the letters. Here is the text he referenced in his paper:

[Image: second line of VMS folio 116v (“anchiton mehiton”)]

Cheshire interprets this as “mériton o’pasaban + mapeós”

He misread a normal Gothic h as the letter “r” and a normal Gothic “l” as the letter “P”. In Gothic scripts, the figure-8 character is variously used to represent “s”, “d”, and the number 8, so it’s very familiar to medieval eyes, but he doesn’t seem to know that and instead interprets it as a Voynich character, which he transliterates as “n”.

If his reading of the letters is wrong, then his transliteration is going to be wrong, as well.

Zodiac Gemini Figures

Cheshire mentions the Gemini zodiac figures (the male/female pair), and states: “Both figures are wearing typical aristocratic attire from the mid 15th century Mediterranean.”

It takes research to determine the location and time period of specific clothing styles; it’s not something people just automatically know. Since Cheshire didn’t credit a source for this observation, I will: it’s possible he got the information from K. Gheuen’s blog. Even if he didn’t, Gheuen’s blog is worth reading.

Flora and Fauna

I’m not going to deal with Cheshire’s fish identification. It’s just as dubious as the Janick and Tucker alligator gar. There are fish that are more similar to the VMS Pisces than Cheshire’s sea bass, and pointing out the fact that sea bass has “scales” is like pointing out that a bird has wings.

I was hopeful that Cheshire’s latest paper would be an improvement over his previous efforts, but I was disappointed.

Summary

It’s possible there is a Romance language buried somewhere in the cryptic VMS text (it was, after all, discovered in Italy, and the binding is probably Italian), but that is not what Cheshire is suggesting. He’s saying it’s an extinct proto-Romance language, without providing a credible explanation of how this information could have been transmitted a thousand years into the future.

There is a relentless publicity campaign going on right now to catapult Cheshire into the limelight. I’m not going to repeat the claims in the news release (they’re pretty outrageous), but even Superman would blush at the accolades being heaped on this unverified theory.

When I checked Cheshire’s doctoral research, I discovered it was in belief systems. Somehow that seems fitting.

J.K. Petersen

© Copyright 2019 J.K. Petersen, All Rights Reserved

Postscript 16 May 2019: The University of Bristol has retracted the Cheshire news release. You can see the retraction here for as long as they decide to make it available.

Cheshire reCAsT

7 May 2019

You may remember an announcement by Gerard Cheshire that he had found a proto-Italic solution for the VMS. There was no corroboration for his theory by any of the scholars who are well-acquainted with the text and, to date, I haven’t seen Cheshire provide an objective verifiable solution.

He has now completed his Ph.D. and is making the bold and possibly preposterous claim that he solved the Voynich Manuscript shortly after discovering it and that his so-called solution “was developed over a 2-week period in May 2017” [Tandfonline.com 2019 Apr 29].

Who would claim to solve the VMS and then post a series of papers (Jan. to Apr. 2018) based on a few isolated sections that do not provide a convincing solution? Proposing that it is an extinct language is no more valid than any other VMS theory.

Since I am not willing to pay $43 (or even $4) to download the current version of his paper, I will restrict my remarks to the last of the previous papers, dated April 2018, which I only just read for the first time today (the link to Cheshire’s paper redirects from The Bronx High School of Science student newspaper’s site to sites.google.com).

Cheshire’s “Linguistic Dating” Theory

In the introductory section Cheshire states, “…in this regard, manuscript MS408 is ‘manna from heaven’ to the linguistic community, as it offers the components necessary to compile a lexicon of proto-Romance words, thanks to the accompanying visual information.”

He then claims that his “proto-Italic alphabet is shown to be correct, so we know that the spelling of the words is also correct, even if unknown”, and then goes on to say that pages without illustrations “will, of course, be more of a challenge…”

Besides the dubious claim that the “proto-Italic alphabet is shown to be correct…”, I’d like to point out that most VMS folios include illustrations. If you can decipher 200 pages with help from illustrations, then the ones without shouldn’t be too difficult, considering that Voynichese is reasonably consistent from beginning to end.

Cheshire then claims labels are easier to interpret (personally I haven’t seen anyone translate the labels in any verifiable way, but let’s continue):

“The longer sentences are filled with conversational connectives, pronoun variants, singular-plural terms, gender specifics and so on, that make it necessary to identify the unambiguous marker words and then make sense of the equivocal words by a process of sequential logic.”

This stopped me in my tracks. One of the characteristics of Voynichese that truly stands out is the similarity and repetitiveness of beginnings and endings. How can one identify singulars, plurals, and gender specifics in text where the beginnings and ends appear to be stripped of their diversity? I guessed that Cheshire must be either shuffling spaces or breaking up tokens (or both).

The 9-Rotum Foldout as Example

[Image: thumbnail of the VMS 9-rotum foldout]

To demonstrate his claim that the VMS uses a proto-Romance language and proto-Italic alphabet, Cheshire presents a partial analysis of the 9-rotum foldout folio, which he refers to as the Tabula regio novem.

He claims the correlations, “…are beyond reasonable doubt in scientific terms. Most of the annotations are translated and transliterated with entire accuracy…”

Another bold claim that, in my opinion, doesn’t live up to scrutiny. But let’s look at his analysis…

Cheshire identifies Rotum7 as a volcanic eruption. I think this is possible, based on visual similarity alone, and others have suggested this possibility. However, it could just as easily be an image of mountain springs (the source of water) or a river delta as it spreads out in an alluvial fan or… something else.

So how does Cheshire support his claim?

Rotum7 Translation

Cheshire transliterates the text around the circumference as follows [I’ve added a Voynichese transcript to make it easier for readers to compare them and to see how Cheshire has broken up VMS tokens to create “words”]:

om é naus o’monas o’menas omas o’naus orlaus omr vasaæe or as a ele/elle a inaus o ele e na æina olina omina olinar n os aus omo na moos é ep as or e ele a opénas os as ar vas opas a réina ol ar sa os aquar aisu na

Note that EVA-ot is alternately translated as part of a word or as a separate letter with apostrophe to separate it from the following chunk. The breaking of words in various ways is, of course, subjective interpretation, and would have to be verified by testing the more common divisions on larger chunks of text.
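The number of possible word divisions grows quickly. A string of n glyphs can be split into contiguous chunks in 2^(n-1) ways, so even a six-glyph token like EVA-otodey has 32 possible divisions before any vowel or language choices are made (and more if spaces between tokens may also be ignored). A minimal sketch, treating each EVA letter as one glyph (an assumption, since some EVA letter pairs are often read as a single glyph):

```python
def segmentations(s):
    """Yield every way to split s into contiguous, non-empty chunks."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        for rest in segmentations(s[i:]):
            yield [s[:i]] + rest

splits = list(segmentations("otodey"))
print(len(splits))   # 32, i.e. 2 ** (6 - 1)
print(splits[:3])    # a few of the candidate divisions
```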

Cheshire translates the above passage as follows:

people and ship in unity take charge mothers/babies of ship to protect life-force pots [he says this is pregnant bellies] yet in he/she at inauspicious/unfavourable he/she is in a/one omen to look it is man not mouse epousee and embrace an opening thus you go but carefully to the queen to facilitate not getting wet with seawater

So before we look into the details of the translation, this supposed narrative seems to me to relate more to river basins and seaports than it does to volcanoes. Cheshire’s contention that this text helps pinpoint the location and time period of the VMS’s creation via a volcanic eruption can definitely be challenged.

But let’s look at the interpretation. Here are some observations:

  • Cheshire has chosen a rare character to represent f/ph, and u/v. Fewer than 50 instances of one of the most common letters in Latin and Italian in c. 38,000 words of text is hard to believe. In classical Latin versions of Ovid’s Metamorphoses, the u/v character would occur about 15,000 times in 38,000 words (that’s not even including the f).
  • There’s no word “inaus” in Latin, Italian, French, or Spanish (in fact, it’s more Germanic than Romance), so Cheshire has expanded it to mean inauspicious via Latin inauspicatus. Presumably he feels it’s acceptable to subjectively choose which tokens might be truncated.
  • Obviously Cheshire is using variations of “om” to mean homo/people, thus om (people) omas (mothers/babies), omo (man), but he chose to interpret “omenas” as o’menas (take charge) rather than as om enas (people swim). People swimming is arguably more consistent with the surrounding subject matter. This illustrates that his interpretation has a strong element of choice. I’m not even sure why o’menas would mean “take charge”.
  • Some of the translation seems rather nonsensical and hard to relate to volcanoes, such as “to look it is man not mouse and marry and embrace an opening thus you go carefully to the queen to avoid not getting wet with seawater”. Consider that “aisu” is neither Italian nor Latin and the grammar is seriously questionable.
  • I’m not sure why Cheshire segued to Persian for “moos” (mouse). Moos is an acceptable alternate spelling for “mus” in western languages. Perhaps it was to justify his choice of Persian to explain another word, “omr”, which has no equivalents in Romance languages. Going to non-Romance languages when a word doesn’t fit his theoretical framework introduces yet another level of subjective interpretation.
  • The choice of phrase-breaks is clearly also subjective. Cheshire separated “opénas” from “os” even though they go together better than combining “os” with the following phrase. The word “opénas” itself is questionable—it’s not likely to be expressed this way and it could be interpreted quite differently as a penalty, punishment, or even as sympathy.

Overall, there is only a vague coherence to it, one that does not evoke thoughts of volcanoes, and one that makes little grammatical sense.

In his summation of the text, Cheshire does not explain why text unrelated to volcanoes would confirm that the Rotum7 IS a volcano and avoids any explanation of why marriage and the queen would be included.

Confirmation Bias?

In the next section Cheshire identifies the symbol bottom-left as a compass (I personally think it looks more like a sextant, which was used for surveying as well as navigation, but I’m not sure what it represents). His transliteration is “op a æequ ena tas o’naus os o n as aus[pex]”, which he translates to “necessary to equal water balance of ship as it is propitious”.

A compass doesn’t really have anything to do with a ship’s water balance (and doesn’t relate to volcanoes either) and I would like to know why he says “op” means “necessary” when the root “neces-” is common to all major Romance languages. In Romance languages “op” is more likely to equate to “work/produce” than to “necessary”, and once again the grammar is abnormal.

From these two pieces of “translation”, Cheshire makes the logical leap that only two volcanoes, Stromboli and Vulcano, are plausible candidates for Rotum7, and states:

“…Vulcano is known to have erupted very violently in the year 1444, which corresponds with the carbon-dating of the manuscript velum: 1404-1438.”

He further translates the Rotum7 inner annotations as “of rock, both directions, not so hot, veers here, it twists, reducing, it slows, middling/forming, of rock it is”.

This could describe mountain springs (the source of water) just as easily as a volcanic eruption. I’m not denying that Rotum7 might be volcanic flow, it’s on my list of possibilities, only that Cheshire’s argument is not as definitive or scientific as he claims. Also, I would like an explanation of how he turned “oqunas asa” into “both directions”.

Origins of Glyph Shapes

Cheshire has this to say about VMS glyph shapes:

“…the symbol is an inverted v with a bar above. It seems to derive from the Greek letter Pi in lowercase (π),…”

I disagree. Pi was rarely written like EVA-x in medieval manuscripts. However, alpha and lambda are sometimes written this way in scripts including Greek, Coptic, and Old Russian (I have collected many samples). I think it’s unlikely that EVA-x is based on the shape of Pi.

Rotum7 Side Labels

I can’t go through every translation point-by-point, but if you are reading along, on page 7 of his paper, you’ll notice Cheshire inserted the word “lava” many times when it wasn’t part of the translation. I don’t know if he was trying to convince us or himself.

Note that in two places, he translated “omon” (EVA-otod) as lava. Now take a look at this:

Cheshire translates EVA-otodey as omon ena and EVA-otody as omon ea. In his system, this translates to “lava largest” and “lava smaller”. If this system were applied consistently throughout the manuscript then we are looking at root-suffix constructions, with EVA-ey as largest and EVA-edy as smaller. This has significant implications for interpretation of the rest of the text but Cheshire didn’t address this.

If you’ve been paying attention to the translations, you might have noticed certain inconsistencies. Cheshire presents omo as people/humans and omon as lava, and now omona as “big man” (it’s not hard to follow the logic) but does not explain why these words would occur in other places in the manuscript where the context does not seem relevant. He also inserts increasing levels of subjective interpretation to explain the “story” behind the rosettes folio and asserts that Rotum8 depicts emergency refuge from the eruption and Rotum 9 is emergency relief in the form of free bread on tables.

Summary

As for the letters “o” that occur so frequently at the beginnings of words, Cheshire variously interprets them as conjunctions and articles. I’m not going to argue with this because I think it’s possible the over-abundant leading-“o” glyphs could have a special function as markers or grammatical entities, but even with this flexibility, Cheshire’s grammar falls apart upon inspection. Even notes and labels usually exhibit certain patterns of consistency that are not readily apparent in the translation.

I’m also not going to argue with the choice of location for these volcanoes (if they are volcanoes), because I’ve considered the Naples area many times, have blogged about it, and it’s still on my list of favored locations.

But I have trouble accepting the translation in its current form because

  • there are a lot of nonsensical word combinations,
  • there’s almost no grammar,
  • the letter distribution is quite different from Romance languages (it would take a whole blog to discuss this aspect of the text, but take 4 as an example: it occurs almost exclusively at the beginnings of tokens, and Cheshire relates it to “d”; similarly, 9 is usually at the end of a token and sometimes at the beginning, but almost never in the middle, and he designates it as “a”),
  • the words still match the drawings if the drawings are interpreted differently (which means the relationship isn’t proven yet),
  • some of the transliterated “words” don’t show any relationship to Romance word-structures (and the author neglected to explain how specific non-Romance words were derived), and
  • the same words (e.g., “na”) are sometimes interpreted differently.

If Rotum7 turns out to be flows of water, rather than flows of lava, Cheshire’s arguments about time period and location are seriously weakened. Even if it turns out to be lava, the problems with the translation have to be addressed, because it seems more relevant to water than it does to lava.

Consider also that Cheshire’s word “naus” (EVA-daiin) is translated as nautical vessels, but the author doesn’t explain why this exceedingly common Voynich chunk, that is usually at the ends of tokens, would occur in almost every line, and sometimes more than once per line, throughout the manuscript.

Cheshire hasn’t given a satisfactory explanation of why a mid-15th-century scribe would use an undocumented proto-Italian script from c. 700 C.E. or earlier.

And let’s be honest, the translations are semantically peculiar. The human mind is designed to construct meaning from small clues, to fill in the gaps, so it’s easy to read meaning into almost any collection of semi-related words, but it’s very difficult to confirm anything that doesn’t quite hold together in normal ways.

J.K. Petersen

© Copyright 2019 J.K. Petersen, All Rights Reserved


Artifacts… Historic and Combinatorial

Bradley Hauer and Grzegorz Kondrak, of the Department of Computing Science at the University of Alberta, recently released a paper describing their algorithmic analysis of text in the Voynich Manuscript. Here is how the media presented it:

  • Independent (UK): Mysterious 15th century manuscript decoded by computer scientists using artificial intelligence
  • Smithsonian: Artificial Intelligence Takes a Crack at Decoding the Mysterious Voynich Manuscript
  • artnet News: AI May Have Just Decoded a Mystical 600-Year-Old Manuscript That Baffled Humans for Decades
  • ScienceAlert: AI May Have Finally Decoded The Bizarre, Mysterious Voynich Manuscript
  • ExtremeTech: AI May Have Unlocked the Secrets of the Mysterious Voynich Manuscript

Contrary to the headlines, Hauer and Kondrak make no claim in their paper of having used artificial intelligence (note that the source code is in Perl, an excellent language for parsing text, but not usually the first choice for AI routines, unless combined with other software). Nor do they claim they have decoded the VMS. In fact, they explicitly stated:

The results presented in this section could be interpreted either as tantalizing clues for Hebrew as the source language of the VMS, or simply as artifacts of the combinatorial power of anagramming and language models.

There you have it—right up front, “…artifacts of the combinatorial power of anagramming…”

If you aren’t sure what that means, I’ll explain it with some examples…

“Finally, we consider the possibility that the underlying script is an abjad, in which only consonants are explicitly represented.”

First, let’s imagine the VMS were a natural language encrypted without vowels (and without anagramming). In English, three letters like mnt could potentially represent the following words:

mint, minty, minute, mount, Monet, Monty, amount, minuet

… and that’s just in English. There are thousands of possibilities in other languages.

Imagine if arbitrary anagramming were permitted, as well. The number of interpretations of mnt becomes much greater:

mint, minty, minute, mount, Monet, Monty, amount, minuet, enmity, autumn, anytime, mutiny, ataman, inmate, amenity, atman, ament, amniote, manito, tamein, matin, meant, onetime, toneme, matinee, etamin, motion, etymon, animate, anatomy, emotion…
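Both kinds of list can be generated with a short filter. This sketch treats a, e, i, o, u, and y as vowels (which is how the lists above treat words like minty and Monty), strips them to get a consonant skeleton, and compares that skeleton with the cipher letters either in order (a plain abjad) or as an unordered set of letters (an abjad plus arbitrary anagramming). Note that a strict in-order match excludes enmity, whose consonants run n-m-t. The word list here is supplied by hand; a real test would run against a full dictionary, which is where the number of candidates balloons.

```python
VOWELS = set("aeiouy")  # y counted as a vowel, as in the word lists above

def skeleton(word):
    """Consonant skeleton of a word, e.g. 'minute' -> 'mnt'."""
    return "".join(c for c in word.lower() if c.isalpha() and c not in VOWELS)

def abjad_matches(cipher, words):
    """Words whose consonants spell `cipher` in the same order."""
    return [w for w in words if skeleton(w) == cipher]

def anagram_abjad_matches(cipher, words):
    """Words whose consonants are any rearrangement of `cipher`."""
    target = sorted(cipher)
    return [w for w in words if sorted(skeleton(w)) == target]

words = ["mint", "minute", "Monet", "enmity", "autumn", "motion", "anatomy", "mitten"]
print(abjad_matches("mnt", words))
# ['mint', 'minute', 'Monet']
print(anagram_abjad_matches("mnt", words))
# ['mint', 'minute', 'Monet', 'enmity', 'autumn', 'motion', 'anatomy']
```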

There is a certain subjective flexibility inherent in 1) anagramming, 2) choosing vowels that seem to work best, and 3) choosing the language that seems most similar to the resulting text.

The Terms of Engagement

In their paper, the researchers declare their focus specifically as “monoalphabetic substitution” ciphers.

How well does this apply to the Voynich Manuscript?

Monoalphabetic ciphers were common in the Middle Ages, and still are, so it is not unreasonable to develop algorithms to crack them.

Anagrammed texts are not unusual either, but one would hope that if the VMS text were anagrammed, it would be in some regular way; otherwise the number of possible interpretations (assuming meaningful text can be extracted) increases exponentially (that’s what the researchers mean by “combinatorial power”). If you have 20 possible interpretations for the first word, and another 20 for the second word, and so on… the number of ways in which the combined words can be decrypted goes into the stratosphere.
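The arithmetic behind this is multiplication: if each word has its own set of candidate readings, the number of ways to read the whole passage is the product of those set sizes. With 20 candidates per word (the illustrative figure above, not a measured one), ten words already allow more than ten trillion combinations:

```python
import math

def combined_readings(candidates_per_word):
    """Number of distinct full readings when each word is read independently."""
    return math.prod(candidates_per_word)

for n_words in (1, 2, 5, 10):
    total = combined_readings([20] * n_words)
    print(f"{n_words:>2} words: {total:,} possible readings")
# prints 20, 400, 3,200,000 and 10,240,000,000,000 respectively
```

The per-word counts need not be equal; passing a list such as [20, 9, 31] gives the product for a passage whose words have different numbers of candidates.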

The Basic Steps

The researchers state that their first step is to identify the encrypted language. To accomplish this, they are working with a data bank of text samples in natural languages to test and fine-tune the recognition and decryption software. They claim up to 97% accuracy for 380 [natural] languages, and 93% on a smaller pool of 50 arbitrarily anagrammed ciphers in five languages.

So far, so good. Working from this “proof of concept”, they decided to try the software on the Voynich Manuscript—an intriguing experiment.

“However, the biggest obstacle to deciphering the [Voynich] manuscript is the lack of knowledge of what language it represents.”

I would agree—this is a frequent stumbling block to deciphering encrypted text—software to expedite the process would probably be welcomed. Even when the underlying language is known, some codes can be hard to crack, but it should be kept in mind that the VMS may not represent a natural language (or any language).

  • If it is meaningful text, it might be multiple languages, heavily abbreviated, or a synthetic language. There are precedents. In the 12th century, Hildegard von Bingen invented a language that was part cipher, part rule-set, and part glossary lookup. In the 13th century, Roger Bacon invented numerous methods of encrypting text. In the 14th and 15th centuries, texts in Latin and other languages were so heavily abbreviated they resembled shorthand. By the 16th century, as the use of Latin faded and global exploration increased, scholars were inventing universal languages to bridge the communication gap.
  • There are also fantasy languages. Edward Talbot/Kelley ingratiated himself with John Dee by “channeling” angelic language conveyed by spirits in a looking glass. This combined effort produced a “language” now known as Enochian. They also pored over charts in the Book of Soyga, trying to make sense of text that had been algorithmically encoded in a stepwise fashion in page-long charts.

What these examples illustrate is that decipherable and not-so-decipherable texts in many different forms did exist in the Middle Ages.

The Process of Decryption

The best code-breakers are usually good at context-switching, pattern recognition, and lateral thinking… If it isn’t this, then maybe it’s this [insert a completely different form of attack].

Context-switching is not inherent in brute-force methods of coding. Even artificial intelligence programmers struggle to create algorithms that can “think outside the box”. If you have seen the movie “AlphaGo” about the development of Google’s game-playing software that was pitted against Lee Sedol, world-champion Go player, you’ll note near the end that even these programmers admit they used a significant amount of brute-force programming to deal with common patterns that occur in certain positions on the board (known as joseki).

Most software programming is about anticipating scenarios (and building in pre-scripted responses). It is not so easy to write code that analyzes and tries to process inscrutable data in entirely new ways, without human intervention. Many so-called “expert systems” have no AI programming at all. They are essentially very large keyed and prioritized databases. The only thing they “learn” is which lookups the user does most often and this is a simple algorithm that can sometimes be more of a hindrance than a help.

But to get back to Hauer and Kondrak’s attack on the Voynich Manuscript…

The researchers admit that a native Hebrew speaker declared the decrypted first sentence “not quite a coherent sentence” and that “a couple of spelling corrections” were made to the text before it was fed into Google Translate. Even after this double intervention, the resulting grammar is questionable, and I would argue that the Google translation of a couple of specific words is also questionable. Keep in mind that Google’s software is designed to try to make sense of imperfect text.

Too Little, Too Soon

It’s not the fault of the researchers that the press declared this a solution achieved through artificial intelligence; neither claim is made in their paper. Even so, when attempting to decipher coded information, one has to be very cautious about reading too much into small amounts of text. Sometimes what looks like a pattern falls apart when one examines the bigger picture.

Take, as an example, the system proposed by Stephen Bax in 2014, in which he announced he had decoded about a dozen words. When his substitution system is used to decrypt a full paragraph or even a full sentence on any page of the manuscript, the result is gibberish. What he had was a theory, not a “provisional” decoding. There’s no way to prove one has a solution if it doesn’t generalize to larger blocks of text. Bax is not alone in thinking he had solved the VMS (or parts of it)—many proposed solutions do work in a spotty fashion, but only because they ignore the vast amounts of text that don’t fall into line.
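One way to make the “does it generalize?” test concrete is to apply a proposed substitution to every token in a passage, not just to hand-picked words, and measure what fraction of the output is recognizable vocabulary. The sketch below assumes the VMS text has already been transliterated with one character per glyph (for example, an EVA-style transcription) and that you supply your own word list. The `decode` and `coverage` helpers, the toy mapping, and the toy lexicon are hypothetical illustrations, not Bax’s actual system.

```python
def decode(text: str, mapping: dict[str, str]) -> str:
    """Apply a one-glyph-to-one-letter substitution; unmapped glyphs become '?'."""
    # Whitespace is kept so that word boundaries survive decoding.
    return "".join(ch if ch.isspace() else mapping.get(ch, "?") for ch in text)


def coverage(decoded: str, lexicon: set[str]) -> float:
    """Fraction of decoded tokens found in the lexicon (0.0 for empty input)."""
    tokens = decoded.split()
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


# Toy example: a hypothetical three-glyph mapping and a tiny hypothetical word list.
mapping = {"a": "t", "b": "o", "c": "n"}
decoded = decode("ab ba abc cc", mapping)
print(decoded)                            # to ot ton nn
print(coverage(decoded, {"to", "ton"}))   # 0.5
```

Run over whole paragraphs rather than a dozen selected words, a low coverage score is exactly the spotty fit described above.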

A few words, or even an isolated sentence that seems to make sense here or there, can be found in the VMS in many languages. I’ve located hundreds of words and sometimes full phrases in Greek, Spanish, Portuguese, Latin, and other languages, but I do not have a solution or even a partial solution. The VMS includes thousands of word-tokens in different combinations, almost all of which are going to match something in some language, especially languages with similar statistical properties.

Summary

Hauer and Kondrak have some interesting technology. I can think of many practical uses for it, and some of their graphs provide additional perspective on the VMS. But before everyone jumps on the next bandwagon and declares the VMS solved, I suggest they read the research paper first.

J.K. Petersen

Copyright © 2018, All Rights Reserved

The Strong Solution

6 Feb. 2016

The Strange Story of Leonell Strong

Antiquarian Wilfrid Voynich rediscovered the VMS in a cache of old books in Italy but failed to uncover the contents of the text.


In 1945, Leonell Strong claimed to have solved the mysterious text of the Voynich Manuscript. He was not the first to attempt to decipher it after antiquarian Wilfrid Voynich acquired it and brought it to America as the Great War broke out in Europe.

In his lifetime, Wilfrid Voynich, a book dealer, corresponded with many people in an effort to decode the VMS and solidify its provenance. If it could be connected with important historical figures, the value would increase and Voynich, a businessman, would profit from his investment.

Voynich died in 1930, no wiser about the contents of the manuscript than when he began. After his death, his wife, Ethel Voynich, continued to try to unlock its secrets, to no avail. William Friedman, an eminent cryptologist, initiated a study group to decipher it in 1944 but, with the war looming large (and perhaps because of a lack of progress), the study group was disbanded in 1946.

You can read an extensive history and ongoing research at voynich.nu.

The manuscript was eventually sold to Hans P. Kraus, who also failed to decode it or to sell it at his asking price of $160,000. Kraus donated it to the Beinecke Library in 1969, where it remains to this day. Before this happened, however, Leonell Strong, cancer scientist and amateur cryptographer, came into the picture around the same time Friedman’s study group was trying to decode the manuscript.

The Strong Approach

Leonell Strong claimed to have decrypted the text based on analyzing photostats of two of the VMS folios, which he refers to as Folio 78 and Folio 93. There had already been articles about the manuscript published by John M. Manly and Hugh O’Neill in Speculum, in 1921 and 1944, so he was not starting from a blank slate. Based on its format and illustrations, it was already assumed by the 1940s that it might be an herbal and medical text with a particular emphasis on women’s health.

Strong was eager to publish the medical-related information he felt he had uncovered, but he didn’t explain his solution because he wanted to decode more of the pages and was earnestly trying to acquire more photostats.

Strong said he didn’t want to reveal his decryption method because of “present war conditions”. My guess is that he felt the information in the manuscript, if any of it provided unique insights into medieval remedies, would constitute a treasure trove of publishable articles, and that if he was the first to decipher it, he could benefit from writing up his discoveries. If he revealed his decryption scheme too soon, others might get to the data first.

Despite considerable efforts, which were apparently rebuffed, he never received any additional pages. It has been said that Strong died without revealing his methods, but there are notes on his thought process, and if you follow those notes you can puzzle out what he did, where he went wrong, and why we are still trying to decode the VMS.

Publications

Strong described some of his findings in an article in Science (June 1945), in which he summarized the background of the manuscript, including the assumption by O’Neill (1944) that the manuscript must post-date the journeys of Columbus because the VMS includes New World plants (a theme revived in January 2014 by Tucker and Talbot in HerbalGram).

Strong claimed that the VMS was based on “… a double system of arithmetical progressions of a multiple alphabet…” and that the VMS author was familiar with ciphers discussed by Trithemius, Porta, and Selenius as well as one of Leonardo da Vinci’s documents. These historic treatises date from the late 1400s to the 1600s, long after the VMS is thought to have been penned.
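Strong never spelled out what “a double system of arithmetical progressions of a multiple alphabet” meant. For readers unfamiliar with the general idea, the best-known single progression through multiple alphabets is the progressive cipher Trithemius described in his Polygraphia, in which each successive letter is enciphered with the next row of a tabula recta (shift 0, then 1, then 2, and so on). The sketch below shows only that simple case, as an illustration of the family of ciphers Strong was alluding to; it is not a reconstruction of his “double” system, whose details are unknown, and the function name is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase


def trithemius(text: str, decrypt: bool = False) -> str:
    """Progressive (Trithemius-style) cipher: the n-th letter is shifted by n places.

    Only A-Z letters advance the progression; other characters pass through unchanged.
    """
    out = []
    step = 0
    for ch in text.upper():
        if ch in ALPHABET:
            shift = -step if decrypt else step
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            step += 1
        else:
            out.append(ch)
    return "".join(out)


print(trithemius("HELLO"))                # HFNOS
print(trithemius("HFNOS", decrypt=True))  # HELLO
```

Because the shift depends on position, a repeated plaintext letter is enciphered differently each time it occurs, which flattens the letter frequencies that a simple substitution would preserve.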

Strong also claimed that certain of the “peculiar” glyphs in the VMS are mirror images of Italian letters, but he didn’t explain exactly which VMS letters he meant.

Given that Strong wasn’t very good at reproducing the VMS characters himself (the slants, connections, and pen sequence are mostly wrong), his analysis of what inspired the shapes is questionable—VMS shapes are found in many alphabets, including those around the Mediterranean and those in ancient documents recording dead languages.

Strong made further assumptions about what constitutes the VMS “alphabet”. In his chart, he excluded “j” and “z” and included both “u” and “v”. This works for some languages, but not for others. Clearly his assumptions were already shaping his view of how the information was encoded almost before he had begun, and his charts further indicate that he never looked beyond a substitution code, even if approached in a reverse numeric fashion.

Anthony Askham—the VMS Author

Many have criticized Strong’s decryption scheme based on his contention that the author of the VMS is Anthony Askham, an English academic active in the mid-1500s. I think the more important question is whether Strong’s decryption process was viable and accurate. Conjecture about who wrote it can come later, and the decryption itself shouldn’t be discounted just because the hypothesis about its authorship may be wrong.

I won’t go through Strong’s entire process here; it’s too long for one article (and there’s no point in detailing a method that doesn’t work). In brief, he created a series of frequency analyses of characters and compared them with similar analyses of a few European languages. After deciding which language most closely matched the VMS, he created charts trying to relate various Latin characters to VMS characters for that language, dating each attempt over a series of weeks.
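The general shape of that procedure can be sketched in a few lines. Because VMS glyphs have no known letter values, the only thing that can be compared directly is the shape of each text’s sorted frequency curve; a glyph-to-letter pairing then falls out of frequency rank. The sketch assumes a transliteration with one character per glyph and sample texts you supply yourself; the function names are hypothetical, and this is a simplification, not a reproduction of Strong’s charts. Note how much it takes for granted: the choice of comparison language, the one-glyph-one-letter assumption, and the idea that ranks transfer cleanly from one text to another.

```python
from collections import Counter


def letter_counts(text: str) -> Counter:
    """Count alphabetic characters (case-folded); spaces and punctuation are ignored."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def frequency_curve(text: str) -> list[float]:
    """Relative frequencies sorted from most to least common (symbol identities dropped)."""
    counts = letter_counts(text)
    total = sum(counts.values())
    return sorted((n / total for n in counts.values()), reverse=True) if total else []


def curve_distance(a: list[float], b: list[float]) -> float:
    """Sum of absolute differences between two curves, zero-padded to equal length."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_mapping(cipher_text: str, plain_sample: str) -> dict[str, str]:
    """Pair the i-th most common glyph with the i-th most common letter (ties broken alphabetically)."""
    def by_rank(text: str) -> list[str]:
        counts = letter_counts(text)
        return sorted(counts, key=lambda ch: (-counts[ch], ch))
    return dict(zip(by_rank(cipher_text), by_rank(plain_sample)))


print(curve_distance([0.5, 0.5], [1.0]))  # 1.0
print(rank_mapping("xxy", "eet"))         # {'x': 'e', 'y': 't'}
```

In practice you would compute `frequency_curve` for the transliterated VMS and for each candidate language sample, pick the language with the smallest `curve_distance`, and only then build a `rank_mapping`; every step inherits the assumptions made before it.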

Where Strong Becomes Weak

And now we get to the important part and the reason Strong’s method, already based on a series of possibly incorrect assumptions, doesn’t work. But first, what were the results of his decryption? Here’s a sample of the decrypted text which he describes as medieval English:

WIT SEEK TO EDIT NOT IDLE/IDEL? FOKLUORE FIT ES ME I MEATH TRUNNG IQUERI SELFLI O’ER IT NICLY RUTEN GLAVE QUIR ONGI SEM TE BELI’D

Strong was apparently told in no uncertain terms that this was not medieval English, and he later made some efforts to map the text to Gaelic, without success (or perhaps he simply gave up).

So why is the text above not medieval English?

To list a few of the more obvious examples:

  • Old English doesn’t have the word “seek”. In the sense of searching for something, Old English used áséc or sēċan or, for seeking out something, gitan or begeten. In Old Norse and Dutch it’s søk/soek and in German, suchen. In Middle English, sēċan became seken.
  • Meath isn’t a word, nor is trunng, although -rung was a common suffix in Old English (e.g., clatrung describes a clattering sound).
  • Iqueri isn’t a word in medieval English. It looks more like Latin and while Latin was often mixed in with Old English, it was not usually done in this way and doesn’t mean anything unless you break it into two words.
  • Selfli isn’t a word, although self- can be used as a prefix (as in selflicne, which can mean self-centered or self-satisfied). If the words around it made sense, you could argue that selfli was an abbreviation for selflicne, but the context doesn’t appear to support this interpretation.

Taken together, there are too many words that aren’t really words (they just look familiar, for reasons I’ll explain below), and the grammar doesn’t pan out either. Even if you evaluate it as “note form” writing, it doesn’t appear to have coherent meaning.

Let’s take another passage, quoted by Strong in his article submission, that seems more credible:

HSAWE-TRE APLE ETTEN VNLICH ARUMS CAN DRAVE WICKS AIR FROM SPLEEN: LIKE SISLE HE DRIS GAS AUT OVARI.

This seems as though it might be real medical information, about eating apples and using arum (which Strong interprets as alum without explaining why it might be alum rather than arum lily) and driving air from one’s spleen as well as driving gas from the ovary.

To understand why this isn’t any more credible than the previous quotation, you have to look at how Strong arrived at these words. Did he really decrypt the letters or did he look at many possible combinations of letters and simply guess, for each individual word, what it might be?

The Madness in the Method

How did Strong arrive at these tokens that look so much like real words?

Once he had a system worked out for mapping the VMS letters to Latin letters, he began evaluating each VMS word-token on its own against a list of “alphabets” he had developed for decipherment. In other words, he had several rows of letters (based on letter frequencies) that each VMS letter might represent. Note the column numbers on the far left. He was saying that A could be any of several VMS glyphs, B could be any of several glyphs, etc., on through the alphabet.

Even if you ignore all of his previous assumptions about language and which glyphs constitute the “alphabet”, and his assumptions about character frequency (based on already deciding on the underlying language), even if all those assumptions were correct, here’s where Strong over-reaches in his eagerness to find meaning in the VMS characters.

Strong created a set of index cards with the possible letter correspondences to each VMS glyph. You can see three of the word-tokens recorded in this example in terms of possible letters from the chart mentioned above.

The first has eight different possible interpretations of the six glyphs in the word token, the second has eight interpretations for five glyphs and the third he wasn’t so sure of (it may comprise less common glyphs) and thus he only proposed five for the five glyphs in the third example.


Under each one is the decrypted word. Strong has written ciphre, swais and lunar. How did he arrive at these? From what I can see, he took a letter from each column and combined them with the others until it became something that looked like a word.

He doesn’t appear to be following a mathematical model even though he described it as a mathematical cipher. In fact, examining all the available index cards, it looks like he inserted letters when he couldn’t create a word in a linear fashion. I have no proof of this, but based on the words noted on 13 index cards, it strongly appears as though his word formation process was subjective. There’s no sign of him uncovering a key, as would be needed for the Porta cipher, or of him necessarily having the alphabetic sequence correct, an important aid in deciphering double ciphers with this structure.
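To see why a missing key matters, here is a sketch of the Porta cipher in the 26-letter form usually given in modern references (sources differ in detail, including the direction of the shift in the tableau). Each key letter selects one of 13 reciprocal alphabets, A/B sharing the first, C/D the second, and so on, so the same routine both enciphers and deciphers. Without the key, a reader cannot tell which of the 13 alphabets governed any given letter. The function name and the example key are illustrative assumptions.

```python
import string

UPPER = string.ascii_uppercase


def porta(text: str, key: str) -> str:
    """Porta cipher (modern 26-letter form). Reciprocal: the same call enciphers and deciphers.

    Key letters A/B select alphabet 0, C/D alphabet 1, ..., Y/Z alphabet 12.
    Letters in the first half (A-M) always map into the second half (N-Z) and vice versa.
    """
    key_letters = [k for k in key.upper() if k in UPPER]
    if not key_letters:
        raise ValueError("key must contain at least one letter A-Z")
    out = []
    i = 0  # position in the key; advances only on letters
    for ch in text.upper():
        if ch not in UPPER:
            out.append(ch)
            continue
        shift = UPPER.index(key_letters[i % len(key_letters)]) // 2
        p = UPPER.index(ch)
        if p < 13:
            c = 13 + (p - shift) % 13
        else:
            c = (p - 13 + shift) % 13
        out.append(UPPER[c])
        i += 1
    return "".join(out)


print(porta("HELLO", "KEY"))                         # PPZTD
print(porta(porta("HELLO WORLD", "KEY"), "KEY"))    # HELLO WORLD
```

Recovering plaintext from a cipher of this kind means recovering the key (or at least its length and letters), and Strong’s surviving cards show no such step.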

If Strong could come up with a word by using a letter from each column, he did so. If he couldn’t get all of them to work together, he made something up to fill in the gaps. The words themselves surely came from his own vocabulary, since other word combinations are possible but he didn’t list them. For example, a token he interpreted as “childe” (which works for the first three columns but not the remaining two) could also be deciphered as POLLIS, DOGFAR, COWHAG, PURPLO, SOWGAS, LOGLAD, LOWGAS, FORLAG, OWLPAR, or several others, using only the letters listed and not adding anything that isn’t (and that’s only if you look for English-sounding tokens).

The next one, interpreted as YOV (YOU?), can just as easily be read as YOR, TOR, POT, GOT, GLO, PIT, TIT, GOO, POO, or POX using his system, so he’s not only subjectively creating the words, he’s subjectively choosing which, out of many possible words, might fit with the words that precede or follow it and then fitting those into his assumption that the text was about plants and medicine.
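The scale of the problem is easy to underestimate. If each glyph position on a card has eight candidate letters, a six-glyph token can be read in eight to the sixth power, or 262,144, ways, and a five-glyph token in 32,768 ways, before any letters are inserted. The sketch below enumerates every reading of a token from per-position candidate lists and keeps the ones that appear in a word list. The columns and the tiny lexicon in the example are hypothetical stand-ins, not Strong’s actual cards; with a full dictionary, a short token can yield several “words”, and choosing among them is the subjective step described above.

```python
from itertools import product
from math import prod


def search_space(columns: list[str]) -> int:
    """How many letter strings can be formed by taking one candidate per glyph position."""
    return prod(len(set(col)) for col in columns)


def readings(columns: list[str], lexicon: set[str]) -> list[str]:
    """All lexicon words spelled by exactly one candidate letter per position, in order."""
    candidates = {"".join(letters) for letters in product(*columns)}
    return sorted(candidates & lexicon)


# Hypothetical three-glyph token: each string lists the candidate letters for one glyph.
columns = ["cpt", "oa", "wtg"]
lexicon = {"cow", "cat", "pot", "tag", "pig", "cog"}
print(search_space(columns))           # 18
print(readings(columns, lexicon))      # ['cat', 'cog', 'cow', 'pot', 'tag']
print(search_space(["abcdefgh"] * 6))  # 262144
```

Generating every combination is fine for short tokens; for longer ones it would be cheaper to filter the lexicon by length and test each word position by position, but either way the number of plausible-looking answers grows with the size of the dictionary.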

It’s easy to assume from the drawings that the text is about medical folklore, and that might be the simplest explanation, but we don’t know for certain if the person who created the drawings also added the text. There are herbals from that period that contain only images (the text was never added), so it’s possible the text was added to the VMS by someone else and is sensitive political commentary or historical material rather than relating to plants. Maybe an unfinished herbal compendium was taken into enemy territory as a ruse (the way a botanist was included in one of the European spying expeditions to the Ottoman palace). Perhaps spy observations were added around the drawings.

Summary

Strong assumed English was the underlying language of the VMS based on creating frequency charts for only a few languages and on the assumption that each VMS glyph represented one character. From that very significant assumption, he tried to create English-sounding words by juggling his letter frequency charts and their derived possible alphabets.

Unfortunately, even with a subjective infusion of natural-sounding syllables, most of the decrypted text is nonsense and none of it fits any known version of medieval English from the 14th to 17th centuries.

Strong will be remembered for his contributions to oncology and the study of genetics in mice, but his status as a cryptographer will have to remain in the amateur category—a hobby, which means we still have a mystery to solve.

J.K. Petersen


© Copyright 2016 J.K. Petersen, All Rights Reserved