Monthly Archives: November 2016

Letter Patterns, EVA-j

There’s a glyph in the Voynich Manuscript EVA font-set that is mapped to the “j” key because it resembles a j to contemporary eyes.

In the 15th century, however, the letter j barely existed. Many European languages used a soft “j” (similar to a “y” as in “you”) and it was written as an “i” preceding another vowel, as in IOANNES (Johannes) and IVLIVS (Julius).

The “j” wasn’t even part of the alphabet—it evolved gradually from an embellished capital “i” that was used for names.

To the medieval eye, the “j” shape was not a letter, it was a Latin abbreviation written as a ligature (two shapes combined together for comfortable writing—something I’ve mentioned in previous blogs about the Voynich glyphs). Here’s an example of -ris, from a 14th-century manuscript, decomposed into its parts.

The letter “r” on the left is combined with the shape on the right, a common Latin abbreviation for “-is”, to create the suffix “-ris”.

Depending on the shape of the first stroke, this can stand for “-ris”, “-tis”, or “-cis” and, in some contexts, it was also used for the suffix “-rum”, instead of the more common 4-shaped “-rum”.

Origins of VMS Glyph Shapes

The Voynich Manuscript borrows many conventions from Latin, so it’s reasonable to assume that the inspiration for the EVA-j glyph-shape was probably the Latin -ris. It’s also interesting to note that in Latin, -ris occurs more frequently than -cis, and this is also true in the VMS. Whether this has anything to do with the meaning of the glyph or whether it is a case of misdirection (mimicry of Latin shapes without intending the same meaning) is not known but it’s noteworthy that -ris can occur at the end of a word almost anywhere in a Latin sentence, whereas it tends to occur at or near the ends of lines in the Voynich manuscript. The shape is the same; the positional patterns are different.

It’s also noteworthy that almost any letter can occur before -ris/-tis/-cis in Latin, whereas in the Voynich Manuscript it is usually preceded by the EVA-a glyph, as in the following examples:

But EVA-j is not limited to following the a-glyph. It doesn’t happen often, but it can follow other shapes:


The aj combination is the most frequent, but many other glyphs can precede the EVA-j shape, some of which are unclear as to whether they are “o”, “a”, or something else.
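For readers who want to check this kind of claim against a transliteration of their own, here is a minimal Python sketch that tallies which glyph comes immediately before EVA-j. It assumes a plain list of word-tokens in which every character stands for exactly one glyph (a simplification, since some transliterations spell one glyph with two characters). The function name and the sample tokens are invented for illustration and are not readings from the manuscript.

```python
from collections import Counter

def glyphs_before(tokens, target="j"):
    """Count which character immediately precedes each occurrence of `target`."""
    counts = Counter()
    for token in tokens:
        for i, glyph in enumerate(token):
            if glyph == target:
                # "^" marks a target that starts a word-token (nothing precedes it)
                counts[token[i - 1] if i > 0 else "^"] += 1
    return counts

# Invented placeholder tokens, NOT transcribed from the manuscript
sample = ["daj", "okaj", "chedy", "doj", "qokaj", "dar"]
print(glyphs_before(sample).most_common())
```

Run against a full transliteration, a tally like this would show how heavily the “a” predecessor dominates and which other shapes turn up before the glyph.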

It’s difficult to tell which VMS glyphs are 1) combined shapes meant to be read as one glyph, or 2) combined shapes intended as multiple-glyph ligatures, but there’s some evidence that the Latin -is shape (the righthand side of the -ris) might be a separate glyph in the Voynich manuscript. There are instances where the -is loop is completely disconnected from the previous stroke and some where it is preceded by other glyphs besides the “r”, thus suggesting it may be able to stand alone:


In these examples, the -is glyph is separated from the previous glyph and is preceded by something other than the “r” shape, thus suggesting it may be a separate glyph and possibly used as a ligature.

In Latin, it’s uncommon for the -ris shape to appear anywhere other than the end of a word and even more unusual for two of them to occur in sequence unless they happen to be variations (e.g., -ris followed by -tis). Midword positions are infrequent in the VMS, as well, but they do occur:


In the VMS, “aj” is usually found at the ends of words, usually at the ends of lines, but it is sometimes written midword, as in these examples.

Many transcriptions of the VMS text do not recognize the distinction between the straight “aj” and the curved “aj” (which is part of the reason I created my own transcription), but it might be important to acknowledge the difference partly because they are separate suffixes in Latin, but also because they appear to be clearly distinguished from one another in adjacent examples in the same VMS word-tokens. For example, here we see the -ris shape both preceding and following the -cis shape:


In the first example, there are two -ris shapes and one that may be either -cis with a short stem or a different character entirely. The second and third examples are less ambiguous, however. In both, the -cis glyph precedes the -ris glyph and it appears that the distinction is deliberate, as was the custom in medieval Latin.


If we assume that the looped part of the aj glyph is the right-hand side of a ligature, and could potentially be combined with other glyphs, then we have to look for other instances of its use.

As I illustrated back in January (and mentioned in even earlier blogs), the gallows character on the right may be composed of two parts, as well. Even if it is, what the glyph means is anyone’s guess. This shape has different interpretations in different languages: it can be “Il” in French, “lis” in Latin, “Item” in German, and sometimes even a very abbreviated “peri” in Greek. It’s also possible that it’s a capitulum, modifier, or marker, and the similarity to the looped shape in “aj” is coincidental.

Note that the gallows glyph also has certain positional peculiarities that differ from “aj”. It’s frequently preceded by “o” rather than “a”, it’s not usually found at the ends of words or the ends of lines, and might be a counterpart to the gallows glyph with two loops.

One other detail worth noting is that some of the EVA-d characters have a straight rather than looping stem. Is it possible this shape is a short-stemmed -cis or “j” rather than a “d”? In some places the distinction between them is more dramatic than in this example, but are they different enough to be considered different glyphs?

Questions like this can’t be answered by shape alone. Position and frequency have to be considered, as well, to see if they behave differently. I’ve done this kind of analysis on some of the other morphologically similar glyphs, but I haven’t had time to evaluate the short-stemmed -cis to see if it’s different from EVA-d.
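A positional comparison of this kind can be sketched in a few lines of Python. The example below assumes a transliteration stored as one string per manuscript line, with word-tokens separated by spaces and one character per glyph. It also assumes the two shapes being compared have been given different characters, which most transcriptions do not do. The function name, category labels, and sample lines are hypothetical.

```python
from collections import Counter

def position_profile(lines, glyph):
    """Tally where `glyph` falls: line-final, word-final, or elsewhere."""
    profile = Counter()
    for line in lines:
        tokens = line.split()
        for t_idx, token in enumerate(tokens):
            for g_idx, char in enumerate(token):
                if char != glyph:
                    continue
                word_final = g_idx == len(token) - 1
                line_final = word_final and t_idx == len(tokens) - 1
                if line_final:
                    profile["line-final"] += 1
                elif word_final:
                    profile["word-final"] += 1
                else:
                    profile["elsewhere"] += 1
    return profile

# Invented placeholder lines, NOT transcribed from the manuscript
sample_lines = ["okaj chedy daj", "dajar qoky okaj"]
print(position_profile(sample_lines, "j"))
print(position_profile(sample_lines, "d"))
```

If two similar-looking shapes produce clearly different profiles over a large sample, that would be one piece of evidence for treating them as separate glyphs. Similar profiles would not prove they are the same.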

J.K. Petersen

© Copyright 2016 J.K. Petersen, All Rights Reserved



Entering the Entropy Zone

I’ve been trying to find a way to introduce the concept of entropy without loading it full of mathematical formulas. The word “entropy” is often invoked when comparing the quantity, frequency, and position of the VMS glyphs, which is easier to describe in numbers than in words. After some consideration, I decided that at least some aspects of text analysis could be described with charts and examples rather than with numbers.

Imagine an ice cube—frozen water. The molecules are linked in a tighter, more ordered structure. When heat is applied, the structure changes, becomes looser, and exhibits higher entropy.


This illustration is over-simplified but can still give an idea of how water molecules are more tightly ordered as ice and more loosely associated and disordered, as steam, thus illustrating states of lower and higher entropy. Similar relationships can be found in text. The association of the VMS glyphs to one another, and their relative quantity and frequency within this arrangement, can be studied and compared to ciphered texts and natural languages and expressed as numerical values.

If you’ve read my previous blogs, you’ve probably noticed I talk about the “structure” of the VMS text being different from natural languages. I gave a nutshell version of it in the blog about creating text that looks more like Voynich text where I described some of the ordering and relationships that are characteristic of the selected sample. I did not write out rules for the entire manuscript because that would take 20 blogs, but the concept can be applied to the text as a whole once it is understood that the glyphs tend to be ordered in a specific way.

So how does the idea of entropy apply to text? Maybe this, too, is easier to explain with a diagram.

  • On the left is an alphabet. By definition, an alphabet contains a specific character set, commonly consisting of consonants and vowels (although not every writing system records vowels), usually in a specific order decided by convention. In terms of text, an alphabet is relatively low entropy.
  • In the middle are words consisting of nouns, verbs, and a couple of adjectives. Even though they use the same characters as the alphabet on the left, the characters vary more in where they fall in relation to other letters, and may be used more than once. The letters exhibit higher entropy than the alphabet.
  • On the right is alphabet soup. The letters don’t have to follow any particular order, direction, or spatial relationship to other letters. Alphabet soup has high entropy compared to words; it’s somewhat chaotic (but that’s okay, it tastes good).
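For readers who do want one number to go with the picture, the two extremes of the diagram can be put through a small Python sketch. The measure used here is conditional character entropy (how uncertain the next letter is once the current one is known). That is one common way to quantify this kind of ordering, chosen for illustration and not necessarily the measure behind any particular published Voynich statistic. The function name and the test strings are assumptions.

```python
import math
import random
from collections import Counter

def conditional_entropy(text):
    """Estimate H(next char | current char), in bits, from adjacent pairs."""
    pairs = Counter(zip(text, text[1:]))   # counts of (current, next)
    firsts = Counter(text[:-1])            # counts of each char as "current"
    total = sum(pairs.values())
    entropy = 0.0
    for (current, _following), count in pairs.items():
        entropy -= (count / total) * math.log2(count / firsts[current])
    return entropy

alphabet = "abcdefghijklmnopqrstuvwxyz"
ordered = alphabet * 200                     # the alphabet, recited over and over
rng = random.Random(0)                       # fixed seed so runs are repeatable
soup = "".join(rng.choices(alphabet, k=len(ordered)))  # "alphabet soup"

print(round(conditional_entropy(ordered), 3))
print(round(conditional_entropy(soup), 3))
```

The ordered alphabet scores zero because every letter has exactly one possible successor. The random soup lands near the ceiling of log2(26), roughly 4.7 bits. Ordinary prose would be expected to fall somewhere in between. Short samples give unreliable estimates, which is why the strings here are several thousand characters long.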

Entropy and the VMS

In the Voynich world, there is an oft-quoted statistic that the text exhibits low entropy compared to natural languages. It has been said that only one or two languages come close (with Hawaiian being one of them).

This comes as no surprise if one looks closely at the Voynich text. I created my own transcription of the entire manuscript several years ago, so I had no choice but to examine and evaluate every letter and every space, and one can’t help noticing how certain combinations repeat and how certain letters recur in the same positions with surprising frequency. Line structure also follows patterns, with specific glyphs falling at the beginnings or ends of lines more often than one might expect.

How does the entropy of Voynich text compare to other 15th-century manuscripts? This is a broad and complex question, far beyond the scope of a blog whose purpose is to introduce the idea without all the math, but it probably wouldn’t hurt to show one example (note that entropy and repetition are related but not identical concepts—I’ll deal with repetition more specifically in a separate blog).

Comparing Two Snippets

Here’s an example from folio 81r I chose because the page layout reminds me of a song or poem and it’s not too hard to find 15th-century poetry for comparison. Poetry tends to be more repetitious and regimented than regular text, so I thought a medieval poem might resemble VMS text more than regular narrative text.

Excluding the fragments beginning with “o” on the right, and assuming the “9” and the “o” on the left are single characters, there are 23 word-tokens, and 20 repeated sequences of three characters (I was bleary-eyed from lack of sleep when I first wrote this, so I corrected this paragraph Nov. 10th).


Note that the repeating 3-glyph sequences are always in the same positions at the beginnings or ends of word-tokens. This is not a pattern we typically associate with natural languages except in specific forms of text such as prayers, poetry, or lists.
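A count like this can be reproduced mechanically. The Python sketch below lists every 3-character sequence that occurs more than once in a set of word-tokens and records whether each occurrence sits at the start, middle, or end of its token. It assumes one character per glyph, and the function name, position labels, and sample tokens are made up for illustration (the tokens are not the folio 81r text).

```python
from collections import defaultdict

def repeated_trigrams(tokens):
    """Map each 3-character sequence seen more than once to its positions."""
    positions = defaultdict(list)
    for token in tokens:
        last_start = len(token) - 3        # index where the final trigram begins
        for i in range(last_start + 1):    # empty for tokens under 3 characters
            if i == 0 and i == last_start:
                where = "whole"            # the token is exactly three characters
            elif i == 0:
                where = "start"
            elif i == last_start:
                where = "end"
            else:
                where = "middle"
            positions[token[i:i + 3]].append(where)
    return {seq: spots for seq, spots in positions.items() if len(spots) > 1}

# Invented placeholder tokens, NOT transcribed from the manuscript
sample = ["qokedy", "okedy", "chedy", "qokain"]
for seq, spots in repeated_trigrams(sample).items():
    print(seq, spots)
```

A sequence whose occurrences all carry the same label is positionally constrained in the sense described above. One that mixes labels, like “chi” in the Italian comparison that follows, is not.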

Compare this to a 22-word snippet from a 15th-century cosmology-themed rhyming poem in Italian that includes 6 repeated sequences:


In this example, there are also three 3-character sequences, but each one repeats only twice. Since this is a rhyming poem, two sequences are at the ends of words (and lines) but, unlike the VMS, the “chi” sequence appears in the middle of one word and at the beginning of another—it’s not positionally constrained.

Here’s another example, from one of the large-plant pages:


I colorized the sample to make it easier to see the patterns. Note that for the purposes of this example, I made the assumption that the “4o” sequence is intended to be together (this appears to be the case in most of the manuscript, but there are exceptions where the “4” appears without the “o”).


Even though the formatting and apparent subject matter of this plant page is quite different from the previous example, there are clearly many similarities, such as a high percentage of repeating sequences: the “4o” combination is almost always followed by a gallows character, the “c” and “r” shapes with tails are at the ends of words, the “9” is usually at the ends of words and frequently follows EVA-ch or EVA-sh, and the Latin “-ris, -cis” abbreviation (EVA-j) is always at the ends of lines (in other parts of the manuscript “j” appears elsewhere, but not as frequently as at the ends of lines). As I’ve mentioned on previous blogs, the structure is quite rigid.

Entropy is measured in a number of ways; it is not limited to repeating glyph sequences. Measures of word-length, character variability, and individual character combinations are all taken into consideration. Notice that the position of characters in relation to each other is more variable in the Italian example and the character set is larger. Most of the VMS text is expressed with about 17 to 20 of the more common glyph-shapes. The old Italic alphabet had only 17 characters, so it’s not an unworkable number, but it’s fewer than most alphabets of the time and significantly fewer if you consider the various diacritical marks and abbreviation symbols that were in regular use. It’s also significantly fewer if any of the VMS glyphs are markers, nulls, or modifiers.
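Several of the simpler measures mentioned above (size of the character set, average word length, and how evenly the characters are used) can be computed together. The Python sketch below is one way to do it, assuming space-separated word-tokens and one character per glyph. The function name and the dictionary keys are assumptions, and the per-character Shannon entropy it reports is only one of the measures researchers use.

```python
import math
from collections import Counter

def text_profile(text):
    """Return character-set size, mean word length, and per-character entropy."""
    tokens = text.split()
    if not tokens:
        raise ValueError("text contains no word-tokens")
    chars = [char for token in tokens for char in token]  # spaces excluded
    counts = Counter(chars)
    total = len(chars)
    # Shannon entropy of the single-character distribution, in bits
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return {
        "charset_size": len(counts),
        "mean_word_length": total / len(tokens),
        "char_entropy_bits": entropy,
    }

# Invented placeholder text, NOT transcribed from the manuscript
print(text_profile("qokedy okedy chedy qokain"))
```

Comparing these figures between a VMS transliteration and a contemporary text only means something when both samples are large and transcribed under comparable conventions.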


These snippets are only examples; they don’t mean anything by themselves. Genuine research requires hundreds or sometimes hundreds of thousands of samples and many different kinds of comparisons. For a draft tutorial on entropy as it applies to the Voynich manuscript, you can read Anton’s post on the Voynich forum. For mathematical studies of entropy, you can consult scientific journals and blogs, and books such as CryptoSchool by Joachim von zur Gathen. For a basic introduction, however, you can look through the VMS and see that the above patterns are common to the text as a whole: glyph-groups tend to repeat, and the same glyph-groups end up in the same positions much of the time, with variation in letter-position being very constrained, all of which tends to lower the entropy.

Does this argue against the VMS being natural language?


But that’s a subject for another blog.

J.K. Petersen

© Copyright 2016 J.K. Petersen, All Rights Reserved