Construction of the Voynich Manuscript Text

In previous blogs, I gave quick examples of the rule-dependent way in which Voynich Manuscript glyphs are combined (it’s far beyond the scope of a blog to define the entire rule-set, so I used a single page of text as an example and have been working on a long paper that describes the overall manuscript more fully). I also pointed out some of the more common atomic units, as I think of them.

Since that time I’ve been trying to think of a way to make these patterns and relationships easier to understand.

Hopefully this visualization method can illustrate why computational and linguistic attacks that assess individual glyphs may not yield fruitful results. The VMS combines glyphs in very particular ways. These affect which glyphs appear next to one another, how often, and in what order, and they also control word length in unusual ways.

  • Note that, as mentioned in the diagram below, a VMS double-c shape (adjacent EVA-e glyphs) appears to function as a single unit. This is much the same way the double-c shape in Carolingian script (right) represents the single letter “a”.
  • Certain glyphs, like EVA-d and EVA-s, appear to function as singles unless paired in very specific ways (with certain glyphs in certain positions, such as EVA-dy at the ends of words).
  • Note the prevalence of combinations like EVA-or, ar, ol, 4o, ly, che, sho, and ey. These pair-syllables, in various combinations with glyphs that function as singles, characterize the entire manuscript, including the text in labels and wheels.
  • Note also the difficulty of deciding whether 4o is intended as separate glyphs or as a pair. In some tokens it could be assigned either way, and a few other combinations share this characteristic. I have included examples of both possible interpretations of 4o-tokens in the following illustration, with the caveat that I am less sure of the 4o breakdowns than of most of the others.
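The 4o problem is easier to see with a short enumeration. The following is a minimal Python sketch, not the method used for the illustration. It lists every way a token can be split into single glyphs and known pairs. The pair list is a hypothetical subset of the combinations named above, and each character is treated as one glyph. The token 4ol is only an illustration.

```python
# Hypothetical pair inventory: a subset of the combinations named above.
# Each character is treated as one glyph (EVA digraphs such as ch are ignored here).
PAIRS = {"or", "ar", "ol", "4o", "ly", "ey"}


def segmentations(token: str) -> list[list[str]]:
    """Return every split of token into single glyphs and known pairs."""
    if not token:
        return [[]]
    results = []
    # Option 1: the first glyph stands alone.
    for rest in segmentations(token[1:]):
        results.append([token[0]] + rest)
    # Option 2: the first two glyphs form a known pair.
    if token[:2] in PAIRS:
        for rest in segmentations(token[2:]):
            results.append([token[:2]] + rest)
    return results


for split in segmentations("4ol"):
    print(split)
# ['4', 'o', 'l']
# ['4', 'ol']
# ['4o', 'l']
```

The last two splits are the competing readings. The o can belong to 4o or to ol, and the glyphs alone don't say which takes precedence.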

Despite the difficulty of distinguishing singles from pairs with complete accuracy, I think these short examples come close and may help illustrate how the VMS text differs from common natural language patterns and patterns evident in medieval ciphered texts, and especially why one-to-one substitution systems have so far been unsuccessful.


Also, give some thought as to how paired glyphs affect entropy and word length…

Paired glyphs greatly increase the number of letters or sounds a system could potentially represent. Suppose you had only o, a, r, and x, and arranged them in a grid as ordered pairs. Your four glyphs could then yield 16 pair-glyphs, and together with the four original glyphs they could represent 20 letters or sounds. The VMS doesn't use as many combinations as this. Glyph order is deliberately restricted, and it's not practical to put mirror pairs next to each other because they are hard to distinguish without extra spaces. Even so, the concept applies: the maximum possible entropy per written unit increases.
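The arithmetic in the o/a/r/x example can be checked with a few lines of Python. This sketch counts the ordered pairs and computes log2 of the inventory size. That figure is only the *maximum* entropy per unit, reached if every unit is equally likely. Real text, including the VMS, falls below this bound.

```python
import math
from itertools import product

glyphs = ["o", "a", "r", "x"]

# Every ordered pair, so "or" and "ro" count as different units.
pairs = ["".join(p) for p in product(glyphs, repeat=2)]
units = glyphs + pairs

print(len(pairs), len(units))  # 16 20

# Maximum entropy per unit (bits), reached only if every unit is equally likely.
print(math.log2(len(glyphs)))  # 2.0
print(round(math.log2(len(units)), 2))  # 4.32
```

Filtering out combinations the VMS appears to avoid would shrink the inventory below 20. Any inventory larger than the four singles still raises this upper bound.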

As for word length, VMS word-tokens are already short compared to words in natural languages. If some of the glyphs are paired, the effective word length (counted in units rather than glyphs) decreases further. If one is looking for letter or sound correspondences in a text with a large number of paired glyphs, the tokens more likely represent syllables, fragments, or abbreviations than full words.
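The next Python sketch shows how pairing shortens word-tokens by measuring each token's length both in glyphs and in units. It assumes a hypothetical pair list and a simple greedy left-to-right rule: take a pair whenever the next two glyphs form one. That rule is an assumption for illustration, not an established precedence rule for the VMS. The four EVA tokens are examples, not a sample drawn from the manuscript.

```python
# Hypothetical pair inventory; "qo" is the EVA spelling of the 4o combination.
PAIRS = {"qo", "or", "ar", "ol", "al", "dy", "ee", "ai"}


def greedy_units(token: str) -> list[str]:
    """Split left to right, taking a known pair whenever the next two glyphs form one."""
    units, i = [], 0
    while i < len(token):
        if token[i:i + 2] in PAIRS:
            units.append(token[i:i + 2])
            i += 2
        else:
            units.append(token[i])
            i += 1
    return units


tokens = ["daiin", "okar", "qokeedy", "otol"]  # illustrative EVA tokens
for t in tokens:
    print(t, len(t), len(greedy_units(t)), greedy_units(t))

mean_glyphs = sum(len(t) for t in tokens) / len(tokens)
mean_units = sum(len(greedy_units(t)) for t in tokens) / len(tokens)
print(mean_glyphs, mean_units)  # 5.0 3.5
```

On these four tokens the average length drops from 5 glyphs to 3.5 units. A different precedence rule would give different splits wherever two pairs overlap.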

So, enough discussion… here are two examples that I grabbed arbitrarily. They’re short, but hopefully long enough to get the ideas across.



Postscript (after getting some much-needed sleep): I hope it is apparent from my previous comments that these are examples, not a definitive breakdown. In the illustration, I have broken down the “4o” words and some of the “9” words (EVA-y) in both ways, with the 4 and 9 as singles and as pairs. I did this because there is evidence elsewhere in the manuscript that both interpretations are possible. Some pairs (the common ones) are much more consistent and discernible than 4 and 9 word-tokens, and I have a long list of stats for some of the more consistent pairs.

The distinction is important because pairs and singles may have different classes of meaning. For example, in Latin, the 9 character expands into prefixes like con- and com- and suffixes like -us and -um. In the VMS, 9 frequently functions as a single, except when paired with EVA-d and possibly EVA-e. This raises the question of whether pairs might represent letters and singles might represent abbreviations, as were commonly used in medieval scripts. Another possibility is that the two were intended to be differentiated in some other way, such as singles representing nulls, modifiers, or markers and pairs representing something else (letters, sounds, or concepts). Note that singles often appear at the beginnings of paragraphs and word-tokens, and sometimes form one-glyph word-tokens. The high preponderance of “o” glyphs (particularly in the first position) might also be evidence of a pairing process. Such a process could have been intended to make tokens come out a certain way, with a particular pattern or length.

All this assumes, of course, that the VMS text is meaningful, something that has not been proven. The pattern of pairs and singles could just as easily have been devised to make it easier to write meaningless text that looks like syllables and abbreviations. I still have a certain cautious optimism that there is meaning behind the text and will post another blog soon that explores some of the details of gallows characters that haven’t yet been discussed.

J.K. Petersen

© Copyright 2017 J.K. Petersen, All Rights Reserved

7 thoughts on “Construction of the Voynich Manuscript Text”

    1. J.K. Petersen Post author

      Nick, you’ll have to forgive me for not having read The Curse. I haven’t read any books on the Voynich Manuscript yet, but when I find time, yours is probably the one I will read. Most of the others appear to me to be wishful, unsupported theories rather than real research.

      Thanks for the link. I took a look at your interpretation and I see you faced the same problems I’ve been experiencing… You’ve marked EVA-sh as a single on the first line and as part of a pair on the last line. You’ve also marked the gallows sometimes as parts of pairs and sometimes as singles. This is a dilemma I’ve been trying to solve for a long time (which is why I didn’t post this years ago). I don’t know if you have a rationale to justify one choice over the other, but I don’t have a cohesive theory yet. I’m still trying to discern a precedence rule-set for seeming inconsistencies that works for the whole manuscript (or a big chunk of it).

      In most cases, it appears that we agree. I’m fairly certain that EVA-d, -y, and -s can function alone, and I notice you indicated the same thing. Your interpretation that EVA-sh may function as a single and EVA-ch as part of a pair might be correct. However, I cannot find a consistent pattern for EVA-sh, so I’ve left it on the table for now. My feeling is that EVA-r mostly functions as a single (except when prefaced by an explicit pair-glyph like “a” or “o”). For example, where you indicated “ch-or” on the last two lines, I think it might instead be “cho-r”. It’s hard to know for certain because “or” is one of the most common pairs, with about 400 instances where it stands alone and more than 2,000 where it falls within a word. The part I haven’t figured out yet is which combination (or IF one combination) takes precedence when two common patterns overlap.
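Two questions come up here: how often a pair stands alone versus inside a word, and where two common pairs compete for the same glyph. Both can be checked mechanically. This Python sketch assumes EVA input and treats ch and sh as single glyphs. It uses a hypothetical pair list containing only cho and or, and the sample tokens are illustrative. The counts therefore say nothing about the real figures of roughly 400 and 2,000 cited above.

```python
# EVA writes some single glyphs with two letters; treat these as one glyph.
EVA_DIGRAPHS = ("ch", "sh")

# Hypothetical pairs for this example: "cho" (ch + o) and "or".
PAIRS = {("ch", "o"), ("o", "r")}


def glyphs(token: str) -> list[str]:
    """Split an EVA string into glyphs, keeping ch and sh whole."""
    out, i = [], 0
    while i < len(token):
        if token[i:i + 2] in EVA_DIGRAPHS:
            out.append(token[i:i + 2])
            i += 2
        else:
            out.append(token[i])
            i += 1
    return out


def pair_counts(tokens: list[str], pair: tuple[str, str]) -> tuple[int, int]:
    """Count tokens that are exactly the pair vs. occurrences inside longer tokens."""
    standalone = embedded = 0
    for t in tokens:
        g = glyphs(t)
        if tuple(g) == pair:
            standalone += 1
        else:
            embedded += sum(1 for i in range(len(g) - 1) if (g[i], g[i + 1]) == pair)
    return standalone, embedded


def overlapping_pairs(token: str) -> list[tuple[int, int]]:
    """Return start positions of known pairs that share a glyph (e.g. cho-r vs ch-or)."""
    g = glyphs(token)
    starts = [i for i in range(len(g) - 1) if (g[i], g[i + 1]) in PAIRS]
    return [(a, b) for a, b in zip(starts, starts[1:]) if b - a == 1]


sample = ["or", "chor", "dor", "or", "okor"]  # illustrative tokens only
print(pair_counts(sample, ("o", "r")))  # (2, 3)
print(overlapping_pairs("chor"))  # [(0, 1)]
```

A result like [(0, 1)] flags the ch-or versus cho-r case exactly. Two pairs start at adjacent glyphs, so only a precedence rule can decide between them.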

      I notice you indicated “ain” as a pair-on-its-own (or single, depending on how one looks at it). I am still undecided about dain…

      I agree that the “d” can stand alone, but I think it’s quite possible that “ai” is a pair (not part of what follows) and what follows might be a single or pair depending on the length. I don’t have a way of indicating this yet on my chart (I sometimes put a dot on the n or a crescent underneath, if it has an extra “i” with n). My reasoning for the paired nature of “ai” is that “ai” appears throughout the VMS (more than 6,000 times) with many different prefixes and suffixes and appears to function as a pair in most instances. What follows “ai” in “dai–” is quite variable.

      I’ve noticed that the Takahashi transcription often ignores the variations at the end of “dain”. I transcribe these as ßaiv, ßaiw, ßaiuv, and ßaiuw, and they may be related to the variants ßav and ßaw. As a result, the Takahashi version isn’t good raw material for determining “dain” single/pair patterns. It also doesn’t acknowledge what I strongly believe is a double-c glyph intended to be read as a single glyph. When there are several in a row, as in cccc, the Takahashi transcription frequently leaves out one or two of the c shapes, as if he assumed the extra c-shapes were some kind of error. This kind of problem occurs when the person creating the transcript consciously or unconsciously applies natural-language patterns to VMS text.

      It doesn’t surprise me to find out you’ve documented the pairing; you’re obviously observant about textual patterns. What does surprise me is how many researchers seem oblivious to this rather important aspect of the text and don’t consider it when doing computational attacks. Some pairs are unclear, but others are unambiguous (like or, ar, ol, al, dy, aj, and a few others). IF the creator of the Voynich text intended them to resolve into one sense-unit, then analyses based on single-glyph frequency counts and letter patterns won’t tell us anything significant. Even if a transcription that identifies paired glyphs is imperfect, it may yield better information than one that ignores the paired nature of the text.
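A toy comparison shows why single-glyph frequency counts can mislead if some glyphs are really halves of pairs. This Python sketch uses four hand-segmented tokens, with dots marking unit boundaries. The segmentation is an illustrative assumption, not a reading of the manuscript. The sketch counts the same text two ways and computes the empirical entropy of each view.

```python
import math
from collections import Counter

# Illustrative tokens, hand-segmented; "." marks a boundary between units.
segmented = ["d.ai.i.n", "qo.k.ee.dy", "o.k.ar", "o.t.ol"]

# Single-glyph view: ignore the boundaries and count every letter.
glyph_counts = Counter(c for tok in segmented for c in tok.replace(".", ""))
# Unit view: count singles and pairs as the segmentation defines them.
unit_counts = Counter(u for tok in segmented for u in tok.split("."))


def shannon_entropy(counts: Counter) -> float:
    """Entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


print(glyph_counts["o"], unit_counts["o"])  # 4 2
print(round(shannon_entropy(glyph_counts), 3), round(shannon_entropy(unit_counts), 3))
```

The glyph o is counted four times in the first view but only twice as a unit on its own, because the other occurrences are absorbed into qo and ol. A frequency or entropy analysis built on the first view is therefore measuring a different inventory from the second.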

      1. Nick Pelling

        In the end, once you accept that EVA o functions artificially in qo, ol, and or, it’s very hard to sustain the illusion that it functions naturally in any other circumstances.

        And similarly for EVA a: if you accept that al, ar, and am behave artificially, it should be a very short step indeed to accept that a in daiin etc. is also artificial.

        Where then does the artificiality end? Once you start seeing the underlying [language] express itself at a token level of groups of glyphs, most of the apparent paradoxes disappear, I believe. 🙂

      2. nicolas georges

        It’s a Hebrew book… see the documents provided for reading it
        To Edwin Schroeder of Yale… juan José garcia de Burgos… nicholas Gibbs… etc., etc… read in the CLEAR!
        To get these documents, contact me at
        Nicolas g

        1. J.K. Petersen Post author

          Nicholas Gibbs does not have enough knowledge of medieval Latin to know what is in the VMS. In the “translation” he offered in the Times Literary Supplement, he incorrectly expanded what he perceived to be Latin abbreviations. For example, he did not know that the expansion of the Latin scribal “9” character depends on its position. That is one of the first and most basic conventions that paleographers and historians of medieval scribal practice learn.
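Here is a simplified Python sketch of position-dependent expansion. A word-initial 9 expands to con-, or to com- before b, m, or p, and a word-final 9 expands to -us. This is a toy built on stated assumptions, not a paleographic tool. Real manuscripts distinguish the baseline and superscript forms of the sign, other endings such as -um are ignored here, and scribal practice varied. The function name is hypothetical.

```python
def expand_nine(word: str) -> str:
    """Expand a '9' abbreviation sign according to its position in the word.

    Simplified assumption: word-initial 9 -> con-/com-, word-final 9 -> -us.
    Baseline vs. superscript forms and the -um reading are not modeled.
    """
    if word.startswith("9"):
        rest = word[1:]
        # Latin assimilates con- to com- before b, p, and m (e.g. compono).
        prefix = "com" if rest[:1] in ("b", "p", "m") else "con"
        word = prefix + rest
    if word.endswith("9"):
        word = word[:-1] + "us"
    return word


print(expand_nine("9tra"))    # contra
print(expand_nine("9pono"))   # compono
print(expand_nine("domin9"))  # dominus
```

The same sign yields different letters at the two ends of a word. An expansion that ignores position will therefore produce the wrong Latin.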

  1. nicolas georges

    The Voynich Codex is a book of Hebrew teachings of the TORAH
    A child could read it!
    Nicolas g

