Tag Archives: cryptography

Did Cicco Simonetta Bomb at Code-Breaking?

First a Few Words…

Those who know me know that I actively avoid looking at previous research about the VMS and have probably only read about 1/50th of what is out there. I hate spoilers and movie trailers—I enjoy the journey and the element of surprise.

If a new puzzle or game comes out, something like a Rubik’s cube, then lock me in a room and I’m happy. If you give me a book on how to solve it, or even the smallest of hints, I’m not happy—I want to solve it myself.

If I have an hour to spend reading someone’s analysis of the VMS or looking at the VMS itself, I usually choose the VMS. I like primary sources. If I have to learn a new language or other skills to understand it, that’s fine. It’s hard to find the time, but the effort is worth it.

Then along comes the Voynich forum and a personal dilemma… I want to support the forum. It’s a good thing because not everyone has blog-space and it provides them a more neutral environment to publish their findings than someone else’s blog. But it’s difficult to actively support a forum without reading it and if I’m reading it, I should be contributing, as well—to give something back. So… the peaceful days in my little cave are over and I’m now part of the “Voynich community”.

It’s not a bad thing, times change and we have to adapt, and I’ve met people I like and respect, but I’m in this weird twilight zone—I’ve only read a small portion of the prior research, which means I have no idea what people are talking about on some of their blogs!

Which brings us to the topic of today’s blog…

Enticed by a blogosphere note on the Voynich forum, I visited Nick Pelling’s Cipher Mysteries site today, where he posted a summary of Philip Neal’s translation of Cicco Simonetta’s treatise on decipherment.

I’ve barely heard of Philip Neal and I know nothing about Cicco Simonetta, so I was happy to see a summary, but I had a what-the-heck? reaction as soon as I started reading it. Who was this Cicco Simonetta dude and where did he get this information? I couldn’t believe my eyes and had to look up the full translation to confirm my impression… and then was even more surprised. It wasn’t some cockamamie 20th-century misunderstanding of 15th-century code-breaking, this was written in the 15th century!

The only way I can think of to explain my reaction is to go through the major points. It’s dated 1474, Pavia, as a treatise on extracting ciphered writings.

Note that Simonetta appears to be describing only Italian or Latin as possible languages for the ciphered text, even though there were many ciphered documents in German, Spanish, and French in the general region of northern Italy. At least I hope he’s only talking about Italian when he says “vulgar language”, because the generalizations only make sense in that light.

Simonetta’s Suggestions for 15th-century Code-Breaking

Evaluate the Word Endings

First Simonetta suggests looking at endings to determine whether the code is in Latin or “the vulgar tongue” and counsels that five or fewer variations indicate the vulgar tongue.

Right away we know Simonetta must be assuming that there are no null characters, that the spaces are real (not contrived or arbitrary), and that this is a one-to-one substitution code; otherwise it’s impossible, without significant analysis (and a little bit of luck), to determine which parts of the code are word endings.
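Under those assumptions the test itself is mechanical. Here is a minimal Python sketch of Simonetta’s ending count (the tokenization and the five-or-fewer threshold only make sense if the spaces are real, there are no nulls, and the substitution is one-to-one):

```python
from collections import Counter

def ending_profile(ciphertext):
    """Tally the final symbol of each word-token.

    Simonetta's test: few distinct endings (five or fewer, he says)
    suggests the vernacular; many suggests Latin. Only meaningful
    under his assumptions about spaces, nulls, and substitution.
    """
    return Counter(token[-1] for token in ciphertext.split())

# Invented ciphertext, purely for illustration.
profile = ending_profile("xilqo vamu derta qo bavu xilqo")
print(sorted(profile))     # the distinct final symbols
print(len(profile) <= 5)   # Simonetta's vernacular threshold
```

Break any one of the assumptions and the tally measures nothing.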

Is it valid for Simonetta to make this assumption in the 15th century?

Sometimes.

Many codes were, in fact, one-to-one substitution codes, but it’s certainly not a given—it’s an extremely low level of encipherment. If there’s enough text, you can simply stare at it for a while and the word-structure starts to become clear (you begin to see where the vowels and consonants are) and then the general language group becomes easier to recognize and, if you can narrow it down to a language group, after a while words start popping out at you.

This is what happened when I recently read a long manuscript in a dead simple substitution code based on astrological symbols. After a few pages, it was clear that it was probably Latin, and then words like “frigida” and “elleborus niger” started popping out. It’s like playing a game where they show you three out of nine letters, but you get to see a whole paragraph, not just one word, and the brain puts the pieces together. After a couple of dozen pages, you can simply read it.

But not all codes are one-to-one substitution codes. In 15th-century Italy, one-to-many/many-to-one/with-null codes were common. In the 1400s, Tranchedino collected many such codes. Several symbols could stand for one letter, several letters could be expressed with one symbol, and several null characters were often included, all in a single cipher. In addition to the alphabetic rules, many names were ciphered from a glossary, rather than following the rules for the rest of the text. In other words, there’s no consistency in the way glyphs correspond to letters that can be used to analyze the text. And thus, there’s no way to evaluate word endings or any individual letter in the manner Simonetta suggests.
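To see why, here is a small Python sketch of a Tranchedino-style homophonic cipher. The table and null symbols are invented for illustration; the point is that the same plaintext enciphers differently every time, so counting endings tells an interceptor nothing:

```python
import random

# Invented homophonic table: several cipher symbols per letter.
HOMOPHONES = {
    "a": ["12", "47", "83"],
    "e": ["09", "31", "66"],
    "p": ["22", "58"],
    "r": ["17", "75"],
    "s": ["05", "40", "61"],
}
NULLS = ["97", "98", "99"]  # meaningless symbols sprinkled in

REVERSE = {sym: ltr for ltr, syms in HOMOPHONES.items() for sym in syms}

def encipher(plaintext, rng):
    out = []
    for ch in plaintext:
        out.append(rng.choice(HOMOPHONES[ch]))
        if rng.random() < 0.3:      # occasionally insert a null
            out.append(rng.choice(NULLS))
    return out

def decipher(symbols):
    # The intended recipient simply discards nulls and reverses the table.
    return "".join(REVERSE[s] for s in symbols if s not in NULLS)

rng = random.Random(0)
print(encipher("aspera", rng))
print(encipher("aspera", rng))  # same word, different ciphertext
```

The legitimate recipient, who holds the table, recovers the text trivially; an outsider tabulating “word endings” sees a different final symbol almost every time.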

Look for One-Character Words

Simonetta goes on to say that if there are many words represented by one cipher, then the code is in the vulgar tongue (Italian) and is rarely Latin, because in Latin “there be no words presented by one only letter or cipher saving four words…” Again, this presupposes that the spaces are real, but it is also deeply perplexing coming from someone with a “fine education” in classical languages, because it’s not true.

Simonetta’s generalization completely ignores the multitude of abbreviations that were regularly used in Latin. Sometimes whole sentences were written with one-character abbreviations. “Et” was frequently written with the character 7. D stood for domine or dominus, A for anno. I could go on for two paragraphs citing all the examples. There’s no basis for assuming ciphered text would be written out in full Latin when use of abbreviations was so ingrained.

And guess what… I almost snorted my drink when I noticed, in Simonetta’s own treatise, that he uses common one-character Latin abbreviations such as q for “qui” or “quo” and p for “per” or “pro”, thus contradicting himself in his own writings. In a cipher it’s easy to create a distinction between “per” or “pro” by the length or slant of part of the glyph (it’s difficult for decrypters to know which variations of the pen are part of the handwriting and which ones carry meaning, as Voynich researchers themselves have surely noticed).

Pay Attention to Letter Endings

After some details about “vulgar language” word patterns, Simonetta counsels the Latin decrypters to examine letters at the ends of words, pointing out that “the most part of Latin words conclude either in a vowel, or in s, or in m, or in t…”. Once again, this completely ignores the way Latin was commonly written. Word endings were often omitted entirely, sometimes with a line over the word or a swoop of the tail standing in for the missing letters. There are also many terminal ligatures. The letters “is” might be spelled out at the end of one word and then abbreviated with a simple stroke on another—the meaning is the same, only space (or habit) dictates which one is chosen. Simonetta’s writing uses this convention as well, so it’s odd that he would not consider this possibility.

Summary

I’d like to try to redeem Simonetta by saying that his advice might be useful for decoding simple substitution codes in Italian, but Italian, German, French, and Spanish scribes used many of the same abbreviation conventions as Latin, which means the same caveats apply.

Even simple substitution codes sometimes manipulate the position of the spaces. As I’ve mentioned near the bottom of a previous blog post, Pal. Germ. 597 (a manuscript that includes a number of paragraphs in code) has a page of plaintext broken into syllables. Even a simple adjustment to the spacing, one of the easiest ways to manipulate a substitution code, makes it difficult to determine word length or to find word endings as per Simonetta—other methods are more effective.
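A Python sketch of that spacing trick (the chunk sizes here are arbitrary; Pal. Germ. 597 breaks on syllables, but any regrouping has the same effect):

```python
import random

def regroup(text, rng, lo=2, hi=4):
    """Strip the real spaces and re-break the symbol stream into
    chunks of lo..hi symbols. Word lengths and word endings, the
    things Simonetta's tests depend on, disappear."""
    stream = text.replace(" ", "")
    chunks, i = [], 0
    while i < len(stream):
        n = rng.randint(lo, hi)
        chunks.append(stream[i:i + n])
        i += n
    return " ".join(chunks)

rng = random.Random(1)
print(regroup("frigida et humida", rng))
```

The letters are all still there, in order, yet an ending count now profiles arbitrary chunk boundaries rather than words.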

As food for thought, I’ll leave a typical example from Tranchedino’s collection and you can judge for yourself whether any of Simonetta’s advice is useful for decrypting 15th-century ciphers. You may also notice a few glyphs are similar to VMS glyphs but I think it’s probably because they are common symbols, not because they’re directly related:

J.K. Petersen

© Copyright 2016 J.K. Petersen, All Rights Reserved

The Strong Solution          6 Feb. 2016

The Strange Story of Leonell Strong

Antiquarian Wilfrid Voynich rediscovered the VMS in a cache of old books in Italy but failed to uncover the contents of the text.

In 1945, Leonell Strong claimed to have solved the mysterious text of the Voynich Manuscript. He was not the first to attempt to decipher it after antiquarian Wilfrid Voynich acquired it and brought it to America as the Great War broke out in Europe.

In his lifetime, Wilfrid Voynich, a book dealer, corresponded with many people in an effort to decode the VMS and solidify its provenance. If it could be connected with important historical figures, the value would increase and Voynich, a businessman, would profit from his investment.

Voynich died in 1930, no wiser about the contents of the manuscript than when he began. After his death, his wife, Ethel Voynich, continued to try to unlock its secrets, to no avail. William Friedman, an eminent cryptologist, initiated a study group to decipher it in 1944 but, with the war looming large (and perhaps because of lack of progress), the study group was disbanded in 1946.

You can read an extensive history and ongoing research at voynich.nu.

The manuscript was eventually sold to Hans P. Kraus, who also failed to decode it or sell it at his asking price of $160,000. Kraus eventually donated it to the Beinecke Library, in 1969, where it remains to this day. Before this happened, however, Leonell Strong, cancer scientist and amateur cryptographer, came into the picture around the same time Friedman’s study group was trying to decode the manuscript.

The Strong Approach

Leonell Strong claimed to have decrypted the text based on analyzing photostats of two of the VMS folios, which he refers to as Folio 78 and Folio 93. There had already been articles about the manuscript published by John M. Manly and Hugh O’Neill in Speculum, in 1921 and 1944, so he was not starting from a blank slate. Based on its format and illustrations, it was already assumed by the 1940s that it might be an herbal and medical text with a particular emphasis on women’s health.

Strong was eager to publish the medical-related information he felt he had uncovered, but he didn’t explain his solution because he wanted to decode more of the pages and was earnestly trying to acquire more photostats.

Strong claimed the reason he didn’t want to reveal his decryption method was because of “present war conditions”. My guess is that he felt the information in the manuscript, if any of it provided unique insights into medieval remedies, would constitute a treasure trove of publishable articles and if he was the first to decipher it, he could benefit from writing up his discoveries. If he revealed his decryption scheme too soon, others might get the data first.

Despite considerable efforts—that were apparently rebuffed—he never received any additional pages. It has been said that Strong died without revealing his methods, but there are notes on his thought process and, if you follow those notes, you can puzzle out what he did, where he went wrong, and why we are still trying to decode the VMS.

Publications

Strong described some of his findings in an article in Science (June 1945), in which he summarizes the background of the manuscript, including the assumption, by O’Neill (1944), that the manuscript must post-date the journeys of Columbus because the VMS includes New World plants (a theme revived in January 2014 by Tucker and Talbot in HerbalGram).

Strong claimed that the VMS was based on “… a double system of arithmetical progressions of a multiple alphabet…” and that the VMS author was familiar with ciphers discussed by Trithemius, Porta, and Selenius as well as one of Leonardo da Vinci’s documents. These historic treatises date from the late 1400s to the 1600s, long after the VMS is thought to have been penned.

Strong also claimed that certain of the “peculiar” glyphs in the VMS are mirror images of Italian letters but doesn’t explain exactly which VMS letters he means.

Given that Strong wasn’t very good at reproducing the VMS characters himself (the slants, connections, and pen sequence are mostly wrong), his analysis of what inspired the shapes is questionable—VMS shapes are found in many alphabets, including those around the Mediterranean and those in ancient documents recording dead languages.

Strong made further assumptions about what constitutes the VMS “alphabet”. In his chart, he excluded “j” and “z” and included both “u” and “v”. This works for some languages, but not for others. Clearly his assumptions were already influencing his choice of how the information was encoded almost before he had begun, and his charts further indicate that he never looked beyond a substitution code, even if approached in a reverse numeric fashion.

Anthony Askham—the VMS Author

Many have criticized Strong’s decryption scheme based on his contention that the author of the VMS is Anthony Askham, an English academic active in the mid-1500s. I think the more important question is whether Strong’s decryption process was viable and accurate. Conjecture about who wrote it can come later, and the decryption itself shouldn’t be discounted merely because the hypothesis about who wrote it may be wrong.

I won’t go through Strong’s entire process here (it’s too long for one article, and there’s no point in detailing a method that doesn’t work). Briefly: he created a series of frequency analyses of characters and mapped them to similar analyses of a few European languages. After assuming which one most closely matched the VMS, he created charts trying to relate various Latin characters to VMS characters for that language, dating each attempt over a series of weeks.
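The core of that first step, matching a frequency profile against candidate languages, can be sketched in a few lines of Python. The profiles below are illustrative numbers invented for the example, not real reference tables:

```python
from collections import Counter

# Illustrative (not authoritative) frequencies of a few common letters.
PROFILES = {
    "latin":   {"e": 0.12, "i": 0.11, "a": 0.09, "u": 0.08, "t": 0.08},
    "english": {"e": 0.13, "t": 0.09, "a": 0.08, "o": 0.08, "i": 0.07},
}

def letter_freqs(text):
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    return {c: counts[c] / len(letters) for c in counts}

def distance(freqs, profile):
    # Sum of absolute differences over the profile's letters;
    # smaller means the sample looks more like that language.
    return sum(abs(freqs.get(c, 0.0) - p) for c, p in profile.items())

sample = "in principio erat verbum et verbum erat apud deum"
freqs = letter_freqs(sample)
best = min(PROFILES, key=lambda lang: distance(freqs, PROFILES[lang]))
print(best)  # prints "latin" for this sample
```

The catch, of course, is that a close match proves little when the glyph-to-letter assumption behind the counts may itself be wrong.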

Where Strong Becomes Weak

And now we get to the important part and the reason Strong’s method, already based on a series of possibly incorrect assumptions, doesn’t work. But first, what were the results of his decryption? Here’s a sample of the decrypted text which he describes as medieval English:

WIT SEEK TO EDIT NOT IDLE/IDEL? FOKLUORE FIT ES ME I MEATH TRUNNG IQUERI SELFLI O’ER IT NICLY RUTEN GLAVE QUIR ONGI SEM TE BELI’D

Apparently, Strong was told in no uncertain terms that this was not medieval English and made some later efforts to map the text to Gaelic, seemingly without success (or maybe he just gave up).

So why is the text above not medieval English?

To list a few of the more obvious examples:

  • They don’t have the word “seek” in Old English. In the sense of searching for something, they say áséc or sēċan, or, if you’re seeking out something, you can say gitan or begeten. In Old Norse and Dutch it’s søk/soek, and in German, suchen. In Middle English, sēċan became seken.
  • Meath isn’t a word, nor is trunng, although -rung was a common suffix in Old English (e.g., clatrung describes a clattering sound).
  • Iqueri isn’t a word in medieval English. It looks more like Latin and while Latin was often mixed in with Old English, it was not usually done in this way and doesn’t mean anything unless you break it into two words.
  • Selfli isn’t a word, although self– can be used as a prefix (as in selflicne which can mean self-centered or self-satisfied). If the words around it made sense, you could argue that selfli was an abbreviation for selflicne, but the context doesn’t appear to support this interpretation.

Taken together, there are too many words that aren’t really words; they just look familiar (I’ll explain why below), and the grammar doesn’t pan out either. Even if you evaluate it as “note form” writing, it doesn’t appear to have coherent meaning.

Let’s take another passage, quoted by Strong in his article submission, that seems more credible:

HSAWE-TRE APLE ETTEN VNLICH ARUMS CAN DRAVE WICKS AIR FROM SPLEEN: LIKE SISLE HE DRIS GAS AUT OVARI.

This seems as though it might be real medical information, about eating apples and using arum (which Strong interprets as alum without explaining why it might be alum rather than arum lily) and driving air from one’s spleen as well as driving gas from the ovary.

To understand why this isn’t any more credible than the previous quotation, you have to look at how Strong arrived at these words. Did he really decrypt the letters or did he look at many possible combinations of letters and simply guess, for each individual word, what it might be?

The Madness in the Method

How did Strong arrive at these tokens that look so much like real words?

Once he had a system worked out for mapping the VMS letters to Latin letters, he began evaluating each VMS word-token on its own against a list of “alphabets” he had developed for decipherment. In other words, he had several rows of letters (based on letter frequencies) that each VMS letter might represent. Note the column numbers on the far left. He was saying that A could be any of several VMS glyphs, B could be any of several glyphs, etc., on through the alphabet.

Even if you ignore all of his previous assumptions about language and which glyphs constitute the “alphabet”, and his assumptions about character frequency (based on already deciding on the underlying language), even if all those assumptions were correct, here’s where Strong over-reaches in his eagerness to find meaning in the VMS characters.

Strong created a set of index cards with the possible letter correspondences to each VMS glyph. You can see three of the word-tokens recorded in this example in terms of possible letters from the chart mentioned above.

The first has eight different possible interpretations of the six glyphs in the word token, the second has eight interpretations for five glyphs, and the third he wasn’t so sure of (it may comprise less common glyphs), so he proposed only five for the five glyphs in the third example.

Under each one is the decrypted word. Strong has written ciphre, swais and lunar. How did he arrive at these? From what I can see, he took a letter from each column and combined them with the others until it became something that looked like a word.

He doesn’t appear to be following a mathematical model even though he described it as a mathematical cipher. In fact, examining all the available index cards, it looks like he inserted letters when he couldn’t create a word in a linear fashion. I have no proof of this, but based on the words noted on 13 index cards, it strongly appears as though his word formation process was subjective. There’s no sign of him uncovering a key, as would be needed for the Porta cipher, or of him necessarily having the alphabetic sequence correct, an important aid in deciphering double ciphers with this structure.

If Strong could come up with a word by using a letter from each column, he did so. If he couldn’t get all of them to work together, he made something up to fill in the gaps. The words themselves surely came from his own vocabulary, since other word combinations are possible but he didn’t list them. For example, a token he interpreted as “childe” (which works for the first three columns but not the remaining two) could also be deciphered as POLLIS, DOGFAR, COWHAG, PURPLO, SOWGAS, LOGLAD, LOWGAS, FORLAG, OWLPAR, or several others, using only the letters listed and not adding anything that isn’t (and that’s only if you look for English-sounding tokens).

The next one, interpreted as YOV (YOU?), can just as easily be read as YOR, TOR, POT, GOT, GLO, PIT, TIT, GOO, POO, or POX using his system, so he’s not only subjectively creating the words, he’s subjectively choosing which, out of many possible words, might fit with the words that precede or follow it and then fitting those into his assumption that the text was about plants and medicine.
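That ambiguity is easy to quantify. With invented candidate columns in Strong’s style (hypothetical lists for illustration, not his actual chart), the Cartesian product of even short per-glyph lists yields dozens of readings, several of them English-looking:

```python
from itertools import product

# Hypothetical candidate letters for each of three glyph positions.
columns = [
    ["y", "t", "p", "g"],
    ["o", "l", "i"],
    ["v", "r", "t", "o", "x"],
]

# Every way of taking one letter from each column.
candidates = ["".join(combo) for combo in product(*columns)]
print(len(candidates))  # 4 * 3 * 5 = 60 possible readings

english_looking = {"yov", "yor", "tor", "pot", "got", "glo",
                   "pit", "tit", "goo", "poo", "pox"}
print(sorted(w for w in candidates if w in english_looking))
```

Picking whichever of these sixty readings “fits” the surrounding words is interpretation, not decryption.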

It’s easy to assume from the drawings that the text is about medical folklore, and that might be the simplest explanation, but we don’t know for certain if the person who created the drawings also added the text. There are herbals from that period that contain only images (the text was never added), so it’s possible the text was added to the VMS by someone else and is sensitive political commentary or historical record, rather than relating to plants. Maybe an unfinished herbal compendium was taken into enemy territory as a ruse (the way a botanist was included in one of the European spying expeditions to the Ottoman palace). Perhaps spy observations were added around the drawings.

Summary

Strong assumed English was the underlying language of the VMS based on creating frequency charts for only a few languages and on the assumption that each VMS glyph represented one character. From that very significant assumption, he tried to create English-sounding words by juggling his letter frequency charts and their derived possible alphabets.

Unfortunately, even with a subjective infusion of natural-sounding syllables, most of the decrypted text is nonsense and none of it fits any known version of medieval English from the 14th to 17th centuries.

Strong will be remembered for his contributions to oncology and the study of genetics in mice, but his status as a cryptographer will have to remain in the amateur category—a hobby, which means we still have a mystery to solve.

J.K. Petersen

 

© Copyright 2016 J.K. Petersen, All Rights Reserved