Hidden Markov Models


Terry Ritter


A Ciphers By Ritter Page


It appears that we do not know very much about Hidden Markov Models (HMMs).


Contents


Subject: Hidden Markov Models on web site!
Date: Sun, 20 Aug 2000 21:47:56 GMT
From: jsavard@fNrOeSePnAeMt.edmonton.ab.ca (John Savard)
Message-ID: <39a050fc.14626826@news.ecn.ab.ca>
Newsgroups: sci.crypt
Lines: 17

Having heard from some posts that the concept of "Hidden Markov Models"
is relevant to cryptanalysis, I did some web searching, and gained enough
understanding to put a partial introduction on my web page at

http://home.ecn.ab.ca/~jsavard/co041003.htm

This introduction only goes up to the Viterbi algorithm; I did not
understand such matters as Baum-Welch estimation well enough to attempt
to describe them, but I did note the name to assist those seeking to do
further reading.

Also, since Kullback-Leibler distance (between probability distributions -
related to Friedman's chi test) was mentioned, I checked, and yes, that
was Solomon Kullback, among those who broke PURPLE.

John Savard
http://home.ecn.ab.ca/~jsavard/crypto.htm
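The Viterbi algorithm that Savard's introduction stops at can be illustrated with a short sketch. This is not code from his page; the state names, probabilities, and observation string below are invented purely for illustration (Python):

# Minimal Viterbi sketch: most likely hidden-state path for a discrete HMM.
# All state names, probabilities, and the observation string are invented.

states = ("A", "B")
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3},
           "B": {"A": 0.4, "B": 0.6}}
emit_p  = {"A": {"0": 0.9, "1": 0.1},
           "B": {"0": 0.2, "1": 0.8}}

def viterbi(obs):
    # V[t][s] = probability of the best state path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t - 1][r] * trans_p[r][s] * emit_p[s][obs[t]], r)
                             for r in states)
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

print(viterbi("0110"))   # -> ['A', 'B', 'B', 'A'] for this toy model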
Subject: Re: Hidden Markov Models on web site!
Date: Mon, 21 Aug 2000 14:22:20 GMT
From: "Douglas A. Gwyn" <gwyn@arl.army.mil>
Message-ID: <39A13B1C.3A367077@arl.army.mil>
References: <39a050fc.14626826@news.ecn.ab.ca>
Newsgroups: sci.crypt

John Savard wrote:
> ... I did not understand such matters as Baum-Welch estimation
> well enough to attempt to describe them, ...

Basically, all that matters (unless you have to implement it) is that it
runs forward and backward through the observed output using the actual
transitions to successively refine an initial estimate of the HMM
parameters, and the algorithm has been proved to be stable and
convergent. The result is a set of parameters for the HMM for which the
observed output is at least as likely as for any other set of parameters.

That's an example of Maximum Likelihood Estimation, which is widely used
in data analysis. Maximum-Likelihood methods generally fit the observed
data *too* tightly, but it's a reasonable criterion that is
mathematically tractable, and in practice it often produces good results.
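Gwyn's forward-and-backward description corresponds to the standard Baum-Welch re-estimation step. Below is a minimal sketch of one such pass for a discrete-output HMM with emissions attached to states; the matrices, the observation sequence, and the iteration count in the example are invented for illustration, not taken from any post (Python, using numpy):

# One Baum-Welch re-estimation pass for a discrete HMM (a sketch, not a
# production implementation).  A[i, j] = P(state j | state i),
# B[i, k] = P(symbol k | state i), pi[i] = P(initial state i).

import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One forward-backward re-estimation of (A, B, pi) from symbol indices obs."""
    obs = np.asarray(obs)
    N, T = A.shape[0], len(obs)

    # Forward pass: alpha[t, i] = P(obs[0..t], state i at time t)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = P(obs[t+1..T-1] | state i at time t)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # gamma[t, i] = P(state i at t | obs); xi[t, i, j] = P(i at t, j at t+1 | obs)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t, :, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()

    # Re-estimate the parameters from the expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi

# Example use (two hidden states, binary output, invented numbers):
A  = np.array([[0.6, 0.4], [0.5, 0.5]])
B  = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
obs = np.array([0, 1, 1, 0, 1, 0, 0, 1])
for _ in range(20):                       # iterate until the estimates settle
    A, B, pi = baum_welch_step(A, B, pi, obs)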
Subject: Re: Hidden Markov Models on web site!
Date: 22 Aug 2000 16:57:34 +0100
From: Gunnar Evermann <ge204@eng.cam.ac.uk>
Message-ID: <mqq8ztp2sr5.fsf@eng.cam.ac.uk>
References: <39A13B1C.3A367077@arl.army.mil>
Newsgroups: sci.crypt
Lines: 18

"Douglas A. Gwyn" <gwyn@arl.army.mil> writes:
> John Savard wrote:
> > ... I did not understand such matters as Baum-Welch estimation
> > well enough to attempt to describe them, ...
>
> Basically, all that matters (unless you have to implement it) is
> that it runs forward and backward through the observed output using
> the actual transitions to successively refine an initial estimate of
> the HMM parameters, and the algorithm has been proved to be stable
> and convergent. The result is a set of parameters for the HMM for
> which the observed output is at least as likely as for any other set
> of parameters.

Baum-Welch only finds a _local_ maximum, it is not guaranteed to find
the global maximum.

Gunnar
Subject: Re: Hidden Markov Models on web site!
Date: Tue, 22 Aug 2000 17:47:39 GMT
From: "Douglas A. Gwyn" <gwyn@arl.army.mil>
Message-ID: <39A2BCBB.3519D9EF@arl.army.mil>
References: <mqq8ztp2sr5.fsf@eng.cam.ac.uk>
Newsgroups: sci.crypt
Lines: 24

Gunnar Evermann wrote:
> Baum-Welch only finds a _local_ maximum,
> it is not guaranteed to find the global maximum.

Yes; as I said, it refines an initial estimate. If the initial estimate
is too far off, for certain kinds of system the convergence will indeed
be to a local maximum-likelihood peak (over the parameter space) that is
lower than the global maximum.

In practice in cryptologic applications, a good initial estimate is often
easy to devise, for example equiprobability. The end result produces
categories in some order that might be different from what would have
been obtained from a different initial estimate, but (usually) they are
the same *categories*. E.g. (nouns, adjectives, verbs) as opposed to
(verbs, nouns, adjectives), if the units are (thinly disguised) PT words,
or (vowels, consonants) if the units are (thinly disguised) PT letters.
(Actually, real linguistic categories are not quite that simple, but the
general idea is right. There are also uses to unravel internal structure
of algorithmic systems, but I don't know of any good examples I can
present.)
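The initialisation point made here, and Evermann's caution about local maxima, can be made concrete. The sketch below is an assumption-laden illustration, not anyone's published method: it starts each probability row near equiprobability (with a little noise so the re-estimation can separate the states), reruns the fit from several starting points, and keeps the highest-likelihood result. It reuses the baum_welch_step() routine from the sketch given after Gwyn's earlier message; all numbers are illustrative.

# Near-uniform initialisation plus random restarts (a sketch; depends on
# the baum_welch_step() routine shown earlier in this thread archive).

import numpy as np

def random_near_uniform(rows, cols, rng, jitter=0.05):
    m = np.full((rows, cols), 1.0 / cols) + rng.uniform(-jitter, jitter, (rows, cols))
    return m / m.sum(axis=1, keepdims=True)   # keep each row a probability vector

def log_likelihood(A, B, pi, obs):
    # Scaled forward algorithm; returns log P(obs | model).
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()                  # rescale to avoid underflow
    return ll

def fit_with_restarts(obs, n_states, n_symbols, restarts=5, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        A = random_near_uniform(n_states, n_states, rng)
        B = random_near_uniform(n_states, n_symbols, rng)
        pi = np.full(n_states, 1.0 / n_states)
        for _ in range(iters):
            A, B, pi = baum_welch_step(A, B, pi, obs)   # routine from earlier sketch
        ll = log_likelihood(A, B, pi, obs)
        if best is None or ll > best[0]:
            best = (ll, A, B, pi)
    return best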
Subject: Re: Hidden Markov Models on web site!
Date: 22 Aug 2000 21:01:58 +0100
From: Gunnar Evermann <ge204@eng.cam.ac.uk>
Message-ID: <mqqu2cd12vd.fsf@eng.cam.ac.uk>
References: <39A2BCBB.3519D9EF@arl.army.mil>
Newsgroups: sci.crypt
Lines: 25

"Douglas A. Gwyn" <gwyn@arl.army.mil> writes:
> Gunnar Evermann wrote:
> > Baum-Welch only finds a _local_ maximum, it is not guaranteed to
> > find the global maximum.
>
> Yes; as I said, it refines an initial estimate. If the initial
> estimate is too far off, for certain kinds of system the convergence
> will indeed be to a local maximum-likelihood peak (over the
> parameter space) that is lower than the global maximum.
>
> In practice in cryptologic applications, a good initial estimate is
> often easy to devise, for example equiprobability. The end result
> produces categories in some order that might be different from what
> would have been obtained from a different initial estimate, but
> (usually) they are the same *categories*.

Ok, fair enough. I was thinking of continuous HMMs as used in e.g.
speech recognition. There the initialisation is more difficult and
potentially more important than for the discrete case.

--
Gunnar Evermann
Speech, Vision & Robotics Group
Cambridge University Engineering Department
Subject: Re: Hidden Markov Models on web site!
Date: Tue, 22 Aug 2000 17:58:10 GMT
From: "Douglas A. Gwyn" <gwyn@arl.army.mil>
Message-ID: <39A2BF32.8FD94FD7@arl.army.mil>
References: <39A2AAE0.EDE1B100@t-online.de> <39A13B1C.3A367077@arl.army.mil>
Newsgroups: sci.crypt
Lines: 24

Mok-Kong Shen wrote:
> Can the HMM be used for predicting sequences?

Sure. Baum-Welch-Eagon MLE can be thought of as "training" the model
using that particular output string; then the model can be "run" to
generate a synthetic output sequence with the same parameters that were
fitted to the original actual data. If you preload the model state to
match a particular (generally non-training) observed string, then when
you run it it produces a MLE prediction for subsequent observations.

> Are there any literatures on its applications in crypto?

Well, there is considerable literature, but not much of it is available
to the general public. There was one Master's thesis or Doctoral
dissertation (I forget which) and as I recall a couple of articles in
Cryptologia. (I'd have to rummage through my files to find specifics.)
Also, I managed to get one of D'Imperio's NSATJ articles on HMM applied
to analysis of the "hands" in the Voynich manuscript declassified two
years ago; it is summarized on Jim Reed's Voynich page but so far as I
know has not been made available on-line in its entirety.
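"Running" a fitted model as Gwyn describes can be sketched in a few lines: sample a synthetic sequence from the fitted parameters, or condition on an observed prefix and read off the distribution of the next symbol. This sketch uses the common convention of emissions attached to states rather than to transitions, and its matrices are invented, not fitted to any real data:

# Generate from, or predict with, a fitted HMM (illustrative sketch).

import numpy as np

def generate(A, B, pi, length, rng):
    """Sample a synthetic observation sequence from the model."""
    out = []
    state = rng.choice(len(pi), p=pi)
    for _ in range(length):
        out.append(rng.choice(B.shape[1], p=B[state]))
        state = rng.choice(len(pi), p=A[state])
    return out

def predict_next(A, B, pi, prefix):
    """Distribution over the next symbol, given an observed prefix."""
    alpha = pi * B[:, prefix[0]]
    alpha /= alpha.sum()
    for t in range(1, len(prefix)):
        alpha = (alpha @ A) * B[:, prefix[t]]
        alpha /= alpha.sum()            # filtered state distribution
    next_state = alpha @ A              # push the state one step forward
    return next_state @ B               # distribution over output symbols

rng = np.random.default_rng(1)
A  = np.array([[0.7, 0.3], [0.2, 0.8]])
B  = np.array([[0.9, 0.1], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
print(generate(A, B, pi, 10, rng))
print(predict_next(A, B, pi, np.array([0, 0, 1, 0])))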
Subject: Re: Hidden Markov Models on web site!
Date: Wed, 23 Aug 2000 13:23:06 -0400
From: "Douglas A. Gwyn" <DAGwyn@null.net>
Message-ID: <39A4087A.DFF32218@null.net>
References: <39A2D7B8.58451824@t-online.de> <39A2BF32.8FD94FD7@arl.army.mil>
Newsgroups: sci.crypt
Lines: 68

Mok-Kong Shen wrote:
> From what I read in this thread I got the (maybe wrong)
> impression that the sequences dealt with are character
> strings. Would HMM be powerful enough to deal with bit
> strings (which are at a finer granularity) from some
> not too bad PRNGs (to predict future output)?

HMM is a general technique, not related to text characters except when
you choose to apply it to them. The model is simply a finite number of
(internal) states that have fixed state-transition probabilities
(ordinary Markov model) and the additional feature that the only thing
observable is output that depends probabilistically on the transition.
(Each piece of output for a single transition can be *called* a
"character" for convenience in discussion, but the only assumed property
is that it is uniquely identifiable. The output "character set" might be
the set {0,1}, for example.)

So the output contains clues about what is going on "inside" the model,
but the clues are indirect since the same output can result from many
possible transitions. There are further clues about the state sequence
contained in the *sequence* of outputs. The process of fitting the model
to an observed output sequence amounts to estimating what sequence of
states (thus transitions) would be the most likely way of producing that
output. Of course, that guess could be wrong, but given enough output
the trained model (specified by the parameters for the best fit) usually
tends to reflect what is actually going on inside the (hidden) system.
It's sort of like a fuzzy

How successful one is at fitting an HMM is largely a function of how
well the assumed number of states "resonates with" the internal system
structure. You could try various assumptions and see if any produce
good fits. Presumably there is an objective measure of goodness of fit
of an HMM to the observations, although it isn't used directly in the
usual fitting algorithm. Informally, if the transition and output
probability matrices are quite far from random, there is a good fit.
For example, a transition matrix

          A    B    C
     A   0.7  0.1  0.2
     B   0.2  0.5  0.3
     C   0.1  0.4  0.5

shows a fairly good categorization, in that the system tends to stay in
the same state, and there are decided biases in transitions out of the
state (e.g. C->A is 4 times less likely than C->B). Suppose the
character set is {0,1} and the output probabilities are:

   out 0:  A    B    C
     A    0.1  0.7  0.2
     B    0.3  0.1  0.6
     C    0.6  0.2  0.2

   P(out 1) = 1 - P(out 0)

That shows that most "1"s occur when the system remains in the same
state (at this level of modeling), which is a potentially valuable clue
about how the system works. Of course, the *actual* system structure
would involve a lot more than 3 states; if we had picked the exact
number *and* had enough data, all the matrix entries would be very close
to 0 or 1 since the system is deterministic at that level. By using
fewer states, we lose the details and end up with a probabilistic model,
but some trends can still be seen, and these often indicate some
large-scale aspects of how the system actually functions.
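The two matrices in the post above can be plugged straight into a forward (likelihood) computation, which is one objective measure of how well such a model fits an observed bit string. The sketch below keeps the post's convention of output probabilities attached to transitions; the uniform starting distribution and the example bit string are assumptions, not from the post:

# Forward (likelihood) computation for the 3-state, binary-output model
# in the post above, with outputs emitted on transitions.

import numpy as np

A = np.array([[0.7, 0.1, 0.2],     # transition probabilities, states A, B, C
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
E0 = np.array([[0.1, 0.7, 0.2],    # P(output 0 | transition i -> j)
               [0.3, 0.1, 0.6],
               [0.6, 0.2, 0.2]])
E = np.stack([E0, 1.0 - E0], axis=2)   # E[i, j, k] = P(output k | i -> j)
pi = np.full(3, 1.0 / 3.0)             # assumed uniform starting distribution

def likelihood(bits):
    """P(observed bit string | model), by the forward recursion."""
    alpha = pi.copy()
    total = 1.0
    for k in bits:
        # one transition per output bit: alpha[j] = sum_i alpha[i] A[i,j] E[i,j,k]
        alpha = alpha @ (A * E[:, :, k])
        total *= alpha.sum()
        alpha /= alpha.sum()           # rescale to avoid underflow
    return total

print(likelihood([1, 1, 0, 1, 1, 0, 0, 1]))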
Subject: Re: Hidden Markov Models on web site!
Date: 21 Aug 2000 15:45:01 -0700
From: jsavard@ecn.ab.ca ()
Message-ID: <39a1a2dd@ecn.ab.ca>
References: <39A15873.6D0A79DE@t-online.de> <39a050fc.14626826@news.ecn.ab.ca>
Newsgroups: sci.crypt
Lines: 11

Mok-Kong Shen (mok-kong.shen@t-online.de) wrote:
: John Savard wrote:
: > http://home.ecn.ab.ca/~jsavard/co041003.htm
: The page was not accessible. Could you please look
: after the matter?

Apologies: the URL should be

http://home.ecn.ab.ca/~jsavard/crypto/co041003.htm
Subject: Re: Hidden Markov Models on web site!
Date: Mon, 21 Aug 2000 23:44:45 GMT
From: Tim Tyler <tt@cryogen.com>
Message-ID: <Fzo1yL.DEE@bath.ac.uk>
References: <39A15873.6D0A79DE@t-online.de> <39a050fc.14626826@news.ecn.ab.ca>
Newsgroups: sci.crypt
Lines: 11

Mok-Kong Shen <mok-kong.shen@t-online.de> wrote:
: John Savard wrote:
:> http://home.ecn.ab.ca/~jsavard/co041003.htm
: The page was not accessible. [...]

Try instead: http://home.ecn.ab.ca/~jsavard/crypto/co041003.htm

--
__________  Lotus Artificial Life  http://alife.co.uk/   tt@cryogen.com
 |im |yler  The Mandala Centre     http://mandala.co.uk/  Namaste.
Subject: Re: "RSA vs. One-time-pad" or "the perfect enryption" Date: Sun, 18 Feb 2001 22:23:07 -0800 From: "John A. Malley" <102667.2235@compuserve.com> Message-ID: <3A90BBCB.E8977ACF@compuserve.com> References: <3A90A968.4195911B@null.net> <slrn99116d.5df.sjmeyer@localhost.localdomain> Newsgroups: sci.crypt Lines: 26 "Douglas A. Gwyn" wrote: > [snip] > > What you mean is a PT source with very low entropy. Such a > source makes the cryptanalysis difficult for nearly any system, > not just the kind you describe. > > Note that genetic protein sequences have nonnegligible entropy. > Indeed, HMMs have been used to characterize (model) them. There's been a number of references to Hidden Markov Models and cryptanalysis in various threads over the past few months. I just found (yesterday) a good (IMO) introduction to statistical language learning, hidden markov models and probabilistic grammars - "Statistical Language Learning" by Eugene Charniak, MIT Press, 1993 - ISBN 0-262-53141-0 - at a Borders bookstore. Are there other books on the subject you'd recommend - especially any covering cryptanalysis with hidden markov models? Thanks for any pointers, John A. Malley 102667.2235@compuserve.com
Subject: Re: "RSA vs. One-time-pad" or "the perfect enryption" Date: Tue, 20 Feb 2001 04:27:34 GMT From: "Douglas A. Gwyn" <DAGwyn@null.net> Message-ID: <3A91F272.4F598F3C@null.net> References: <3A90BBCB.E8977ACF@compuserve.com> Newsgroups: sci.crypt Lines: 18 "John A. Malley" wrote: > Are there other books on the subject you'd recommend - especially any > covering cryptanalysis with hidden markov models? There is now a vast literature on technology and applications of HMMs, but there are only a couple of articles I know of in the open literature specifically addressing application to cryptanalysis, and unfortunately I can't recommend them, because they don't show any real advantage over other well-known approaches. That's not to say that there *aren't* good applications for the technology, just that it is not evident from what has been published. A couple of years ago I ran across an interesting application of HMMs to the problem of "hands" in the Voynich manuscript, and managed to get it declassified (except for one example and a few names); there is a brief summary of its results in Jim Reed's Voynich bibliography at URL http://www.research.att.com/~reeds/voynich/bib.html Maybe some day I'll make a copy available on a Web site.

Terry Ritter, his current address, and his top page.

Last updated: 2001-06-27