Hidden Markov Models


Terry Ritter


A Ciphers By Ritter Page


It appears that we do not know very much about Hidden Markov Models (HMMs).


Contents


Subject: Hidden Markov Models on web site!
Date: Sun, 20 Aug 2000 21:47:56 GMT
From: jsavard@fNrOeSePnAeMt.edmonton.ab.ca (John Savard)
Message-ID: <39a050fc.14626826@news.ecn.ab.ca>
Newsgroups: sci.crypt
Lines: 17

Having heard from some posts that the concept of "Hidden Markov Models"
is relevant to cryptanalysis, I did some web searching, and gained enough
understanding to put a partial introduction on my web page at

http://home.ecn.ab.ca/~jsavard/co041003.htm

This introduction only goes up to the Viterbi algorithm; I did not
understand such matters as Baum-Welch estimation well enough to attempt
to describe them, but I did note the name to assist those seeking to do
further reading.

Also, since Kullback-Leibler distance (between probability distributions -
related to Friedman's chi test) was mentioned, I checked, and yes, that
was Solomon Kullback, among those who broke PURPLE.

John Savard
http://home.ecn.ab.ca/~jsavard/crypto.htm
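The Viterbi algorithm that Savard's introduction stops at can be illustrated with a short sketch. This is not code from his page; the state names, probabilities, and observation string below are invented purely for illustration (Python):

# Minimal Viterbi sketch: most likely hidden-state path for a discrete HMM.
# All state names, probabilities, and the observation string are invented.

states = ("A", "B")
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3},
           "B": {"A": 0.4, "B": 0.6}}
emit_p  = {"A": {"0": 0.9, "1": 0.1},
           "B": {"0": 0.2, "1": 0.8}}

def viterbi(obs):
    # V[t][s] = probability of the best state path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t - 1][r] * trans_p[r][s] * emit_p[s][obs[t]], r)
                             for r in states)
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

print(viterbi("0110"))   # -> ['A', 'B', 'B', 'A'] for this toy model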
Subject: Re: Hidden Markov Models on web site!
Date: Mon, 21 Aug 2000 14:22:20 GMT
From: "Douglas A. Gwyn" <gwyn@arl.army.mil>
Message-ID: <39A13B1C.3A367077@arl.army.mil>
References: <39a050fc.14626826@news.ecn.ab.ca>
Newsgroups: sci.crypt

John Savard wrote:
> ... I did not understand such matters as Baum-Welch estimation
> well enough to attempt to describe them, ...

Basically, all that matters (unless you have to implement it) is that it
runs forward and backward through the observed output using the actual
transitions to successively refine an initial estimate of the HMM
parameters, and the algorithm has been proved to be stable and
convergent. The result is a set of parameters for the HMM for which the
observed output is at least as likely as for any other set of parameters.

That's an example of Maximum Likelihood Estimation, which is widely used
in data analysis. Maximum-Likelihood methods generally fit the observed
data *too* tightly, but it's a reasonable criterion that is
mathematically tractable, and in practice it often produces good results.
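Gwyn's forward-and-backward description corresponds to the standard Baum-Welch re-estimation step. Below is a minimal sketch of one such pass for a discrete-output HMM with emissions attached to states; the matrices, the observation sequence, and the iteration count in the example are invented for illustration, not taken from any post (Python, using numpy):

# One Baum-Welch re-estimation pass for a discrete HMM (a sketch, not a
# production implementation).  A[i, j] = P(state j | state i),
# B[i, k] = P(symbol k | state i), pi[i] = P(initial state i).

import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One forward-backward re-estimation of (A, B, pi) from symbol indices obs."""
    obs = np.asarray(obs)
    N, T = A.shape[0], len(obs)

    # Forward pass: alpha[t, i] = P(obs[0..t], state i at time t)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = P(obs[t+1..T-1] | state i at time t)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # gamma[t, i] = P(state i at t | obs); xi[t, i, j] = P(i at t, j at t+1 | obs)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t, :, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()

    # Re-estimate the parameters from the expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi

# Example use (two hidden states, binary output, invented numbers):
A  = np.array([[0.6, 0.4], [0.5, 0.5]])
B  = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
obs = np.array([0, 1, 1, 0, 1, 0, 0, 1])
for _ in range(20):                       # iterate until the estimates settle
    A, B, pi = baum_welch_step(A, B, pi, obs)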
Subject: Re: Hidden Markov Models on web site!
Date: 22 Aug 2000 16:57:34 +0100
From: Gunnar Evermann <ge204@eng.cam.ac.uk>
Message-ID: <mqq8ztp2sr5.fsf@eng.cam.ac.uk>
References: <39A13B1C.3A367077@arl.army.mil>
Newsgroups: sci.crypt
Lines: 18

"Douglas A. Gwyn" <gwyn@arl.army.mil> writes:
> John Savard wrote:
> > ... I did not understand such matters as Baum-Welch estimation
> > well enough to attempt to describe them, ...
>
> Basically, all that matters (unless you have to implement it) is
> that it runs forward and backward through the observed output using
> the actual transitions to successively refine an initial estimate of
> the HMM parameters, and the algorithm has been proved to be stable
> and convergent. The result is a set of parameters for the HMM for
> which the observed output is at least as likely as for any other set
> of parameters.

Baum-Welch only finds a _local_ maximum, it is not guaranteed to find
the global maximum.

Gunnar
Subject: Re: Hidden Markov Models on web site!
Date: Tue, 22 Aug 2000 17:47:39 GMT
From: "Douglas A. Gwyn" <gwyn@arl.army.mil>
Message-ID: <39A2BCBB.3519D9EF@arl.army.mil>
References: <mqq8ztp2sr5.fsf@eng.cam.ac.uk>
Newsgroups: sci.crypt
Lines: 24

Gunnar Evermann wrote:
> Baum-Welch only finds a _local_ maximum,
> it is not guaranteed to find the global maximum.

Yes; as I said, it refines an initial estimate. If the initial estimate
is too far off, for certain kinds of system the convergence will indeed
be to a local maximum-likelihood peak (over the parameter space) that is
lower than the global maximum.

In practice in cryptologic applications, a good initial estimate is often
easy to devise, for example equiprobability. The end result produces
categories in some order that might be different from what would have
been obtained from a different initial estimate, but (usually) they are
the same *categories*. E.g. (nouns, adjectives, verbs) as opposed to
(verbs, nouns, adjectives), if the units are (thinly disguised) PT words,
or (vowels, consonants) if the units are (thinly disguised) PT letters.
(Actually, real linguistic categories are not quite that simple, but the
general idea is right. There are also uses to unravel internal structure
of algorithmic systems, but I don't know of any good examples I can
present.)
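The initialisation point made here, and Evermann's caution about local maxima, can be made concrete. The sketch below is an assumption-laden illustration, not anyone's published method: it starts each probability row near equiprobability (with a little noise so the re-estimation can separate the states), reruns the fit from several starting points, and keeps the highest-likelihood result. It reuses the baum_welch_step() routine from the sketch given after Gwyn's earlier message; all numbers are illustrative.

# Near-uniform initialisation plus random restarts (a sketch; depends on
# the baum_welch_step() routine shown earlier in this thread archive).

import numpy as np

def random_near_uniform(rows, cols, rng, jitter=0.05):
    m = np.full((rows, cols), 1.0 / cols) + rng.uniform(-jitter, jitter, (rows, cols))
    return m / m.sum(axis=1, keepdims=True)   # keep each row a probability vector

def log_likelihood(A, B, pi, obs):
    # Scaled forward algorithm; returns log P(obs | model).
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()                  # rescale to avoid underflow
    return ll

def fit_with_restarts(obs, n_states, n_symbols, restarts=5, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        A = random_near_uniform(n_states, n_states, rng)
        B = random_near_uniform(n_states, n_symbols, rng)
        pi = np.full(n_states, 1.0 / n_states)
        for _ in range(iters):
            A, B, pi = baum_welch_step(A, B, pi, obs)   # routine from earlier sketch
        ll = log_likelihood(A, B, pi, obs)
        if best is None or ll > best[0]:
            best = (ll, A, B, pi)
    return best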
Subject: Re: Hidden Markov Models on web site!
Date: 22 Aug 2000 21:01:58 +0100
From: Gunnar Evermann <ge204@eng.cam.ac.uk>
Message-ID: <mqqu2cd12vd.fsf@eng.cam.ac.uk>
References: <39A2BCBB.3519D9EF@arl.army.mil>
Newsgroups: sci.crypt
Lines: 25

"Douglas A. Gwyn" <gwyn@arl.army.mil> writes:
> Gunnar Evermann wrote:
> > Baum-Welch only finds a _local_ maximum, it is not guaranteed to
> > find the global maximum.
>
> Yes; as I said, it refines an initial estimate. If the initial
> estimate is too far off, for certain kinds of system the convergence
> will indeed be to a local maximum-likelihood peak (over the
> parameter space) that is lower than the global maximum.
>
> In practice in cryptologic applications, a good initial estimate is
> often easy to devise, for example equiprobability. The end result
> produces categories in some order that might be different from what
> would have been obtained from a different initial estimate, but
> (usually) they are the same *categories*.

Ok, fair enough. I was thinking of continuous HMMs as used in e.g.
speech recognition. There the initialisation is more difficult and
potentially more important than for the discrete case.

--
Gunnar Evermann
Speech, Vision & Robotics Group
Cambridge University Engineering Department
Subject: Re: Hidden Markov Models on web site!
Date: Tue, 22 Aug 2000 17:58:10 GMT
From: "Douglas A. Gwyn" <gwyn@arl.army.mil>
Message-ID: <39A2BF32.8FD94FD7@arl.army.mil>
References: <39A2AAE0.EDE1B100@t-online.de> <39A13B1C.3A367077@arl.army.mil>
Newsgroups: sci.crypt
Lines: 24

Mok-Kong Shen wrote:
> Can the HMM be used for predicting sequences?

Sure. Baum-Welch-Eagon MLE can be thought of as "training" the model
using that particular output string; then the model can be "run" to
generate a synthetic output sequence with the same parameters that were
fitted to the original actual data. If you preload the model state to
match a particular (generally non-training) observed string, then when
you run it it produces a MLE prediction for subsequent observations.

> Are there any literatures on its applications in crypto?

Well, there is considerable literature, but not much of it is available
to the general public. There was one Master's thesis or Doctoral
dissertation (I forget which) and as I recall a couple of articles in
Cryptologia. (I'd have to rummage through my files to find specifics.)
Also, I managed to get one of D'Imperio's NSATJ articles on HMM applied
to analysis of the "hands" in the Voynich manuscript declassified two
years ago; it is summarized on Jim Reed's Voynich page but so far as I
know has not been made available on-line in its entirety.
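"Running" a fitted model as Gwyn describes can be sketched in a few lines: sample a synthetic sequence from the fitted parameters, or condition on an observed prefix and read off the distribution of the next symbol. This sketch uses the common convention of emissions attached to states rather than to transitions, and its matrices are invented, not fitted to any real data:

# Generate from, or predict with, a fitted HMM (illustrative sketch).

import numpy as np

def generate(A, B, pi, length, rng):
    """Sample a synthetic observation sequence from the model."""
    out = []
    state = rng.choice(len(pi), p=pi)
    for _ in range(length):
        out.append(rng.choice(B.shape[1], p=B[state]))
        state = rng.choice(len(pi), p=A[state])
    return out

def predict_next(A, B, pi, prefix):
    """Distribution over the next symbol, given an observed prefix."""
    alpha = pi * B[:, prefix[0]]
    alpha /= alpha.sum()
    for t in range(1, len(prefix)):
        alpha = (alpha @ A) * B[:, prefix[t]]
        alpha /= alpha.sum()            # filtered state distribution
    next_state = alpha @ A              # push the state one step forward
    return next_state @ B               # distribution over output symbols

rng = np.random.default_rng(1)
A  = np.array([[0.7, 0.3], [0.2, 0.8]])
B  = np.array([[0.9, 0.1], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
print(generate(A, B, pi, 10, rng))
print(predict_next(A, B, pi, np.array([0, 0, 1, 0])))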
Subject: Re: Hidden Markov Models on web site!
Date: Wed, 23 Aug 2000 13:23:06 -0400
From: "Douglas A. Gwyn" <DAGwyn@null.net>
Message-ID: <39A4087A.DFF32218@null.net>
References: <39A2D7B8.58451824@t-online.de> <39A2BF32.8FD94FD7@arl.army.mil>
Newsgroups: sci.crypt
Lines: 68

Mok-Kong Shen wrote:
> From what I read in this thread I got the (maybe wrong)
> impression that the sequences dealt with are character
> strings. Would HMM be powerful enough to deal with bit
> strings (which are at a finer granularity) from some
> not too bad PRNGs (to predict future output)?

HMM is a general technique, not related to text characters except when
you choose to apply it to them. The model is simply a finite number of
(internal) states that have fixed state-transition probabilities
(ordinary Markov model) and the additional feature that the only thing
observable is output that depends probabilistically on the transition.
(Each piece of output for a single transition can be *called* a
"character" for convenience in discussion, but the only assumed property
is that it is uniquely identifiable. The output "character set" might be
the set {0,1}, for example.)

So the output contains clues about what is going on "inside" the model,
but the clues are indirect since the same output can result from many
possible transitions. There are further clues about the state sequence
contained in the *sequence* of outputs. The process of fitting the model
to an observed output sequence amounts to estimating what sequence of
states (thus transitions) would be the most likely way of producing that
output. Of course, that guess could be wrong, but given enough output
the trained model (specified by the parameters for the best fit) usually
tends to reflect what is actually going on inside the (hidden) system.
It's sort of like a fuzzy

How successful one is at fitting an HMM is largely a function of how
well the assumed number of states "resonates with" the internal system
structure. You could try various assumptions and see if any produce
good fits. Presumably there is an objective measure of goodness of fit
of an HMM to the observations, although it isn't used directly in the
usual fitting algorithm. Informally, if the transition and output
probability matrices are quite far from random, there is a good fit.
For example, a transition matrix

          A    B    C
     A   0.7  0.1  0.2
     B   0.2  0.5  0.3
     C   0.1  0.4  0.5

shows a fairly good categorization, in that the system tends to stay in
the same state, and there are decided biases in transitions out of the
state (e.g. C->A is 4 times less likely than C->B). Suppose the
character set is {0,1} and the output probabilities are:

   out 0:  A    B    C
     A    0.1  0.7  0.2
     B    0.3  0.1  0.6
     C    0.6  0.2  0.2

   P(out 1) = 1 - P(out 0)

That shows that most "1"s occur when the system remains in the same
state (at this level of modeling), which is a potentially valuable clue
about how the system works. Of course, the *actual* system structure
would involve a lot more than 3 states; if we had picked the exact
number *and* had enough data, all the matrix entries would be very close
to 0 or 1 since the system is deterministic at that level. By using
fewer states, we lose the details and end up with a probabilistic model,
but some trends can still be seen, and these often indicate some
large-scale aspects of how the system actually functions.
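The two matrices in the post above can be plugged straight into a forward (likelihood) computation, which is one objective measure of how well such a model fits an observed bit string. The sketch below keeps the post's convention of output probabilities attached to transitions; the uniform starting distribution and the example bit string are assumptions, not from the post:

# Forward (likelihood) computation for the 3-state, binary-output model
# in the post above, with outputs emitted on transitions.

import numpy as np

A = np.array([[0.7, 0.1, 0.2],     # transition probabilities, states A, B, C
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
E0 = np.array([[0.1, 0.7, 0.2],    # P(output 0 | transition i -> j)
               [0.3, 0.1, 0.6],
               [0.6, 0.2, 0.2]])
E = np.stack([E0, 1.0 - E0], axis=2)   # E[i, j, k] = P(output k | i -> j)
pi = np.full(3, 1.0 / 3.0)             # assumed uniform starting distribution

def likelihood(bits):
    """P(observed bit string | model), by the forward recursion."""
    alpha = pi.copy()
    total = 1.0
    for k in bits:
        # one transition per output bit: alpha[j] = sum_i alpha[i] A[i,j] E[i,j,k]
        alpha = alpha @ (A * E[:, :, k])
        total *= alpha.sum()
        alpha /= alpha.sum()           # rescale to avoid underflow
    return total

print(likelihood([1, 1, 0, 1, 1, 0, 0, 1]))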
Subject: Re: Hidden Markov Models on web site!
Date: 21 Aug 2000 15:45:01 -0700
From: jsavard@ecn.ab.ca ()
Message-ID: <39a1a2dd@ecn.ab.ca>
References: <39A15873.6D0A79DE@t-online.de> <39a050fc.14626826@news.ecn.ab.ca>
Newsgroups: sci.crypt
Lines: 11

Mok-Kong Shen (mok-kong.shen@t-online.de) wrote:
: John Savard wrote:
: > http://home.ecn.ab.ca/~jsavard/co041003.htm
: The page was not accessible. Could you please look
: after the matter?

Apologies: the URL should be

http://home.ecn.ab.ca/~jsavard/crypto/co041003.htm
Subject: Re: Hidden Markov Models on web site!
Date: Mon, 21 Aug 2000 23:44:45 GMT
From: Tim Tyler <tt@cryogen.com>
Message-ID: <Fzo1yL.DEE@bath.ac.uk>
References: <39A15873.6D0A79DE@t-online.de> <39a050fc.14626826@news.ecn.ab.ca>
Newsgroups: sci.crypt
Lines: 11

Mok-Kong Shen <mok-kong.shen@t-online.de> wrote:
: John Savard wrote:
:> http://home.ecn.ab.ca/~jsavard/co041003.htm
: The page was not accessible. [...]

Try instead: http://home.ecn.ab.ca/~jsavard/crypto/co041003.htm

--
__________  Lotus Artificial Life  http://alife.co.uk/   tt@cryogen.com
 |im |yler  The Mandala Centre     http://mandala.co.uk/  Namaste.
Subject: Re: "RSA vs. One-time-pad" or "the perfect enryption" Date: Sun, 18 Feb 2001 22:23:07 -0800 From: "John A. Malley" <102667.2235@compuserve.com> Message-ID: <3A90BBCB.E8977ACF@compuserve.com> References: <3A90A968.4195911B@null.net> <slrn99116d.5df.sjmeyer@localhost.localdomain> Newsgroups: sci.crypt Lines: 26 "Douglas A. Gwyn" wrote: > [snip] > > What you mean is a PT source with very low entropy. Such a > source makes the cryptanalysis difficult for nearly any system, > not just the kind you describe. > > Note that genetic protein sequences have nonnegligible entropy. > Indeed, HMMs have been used to characterize (model) them. There's been a number of references to Hidden Markov Models and cryptanalysis in various threads over the past few months. I just found (yesterday) a good (IMO) introduction to statistical language learning, hidden markov models and probabilistic grammars - "Statistical Language Learning" by Eugene Charniak, MIT Press, 1993 - ISBN 0-262-53141-0 - at a Borders bookstore. Are there other books on the subject you'd recommend - especially any covering cryptanalysis with hidden markov models? Thanks for any pointers, John A. Malley 102667.2235@compuserve.com
Subject: Re: "RSA vs. One-time-pad" or "the perfect enryption" Date: Tue, 20 Feb 2001 04:27:34 GMT From: "Douglas A. Gwyn" <DAGwyn@null.net> Message-ID: <3A91F272.4F598F3C@null.net> References: <3A90BBCB.E8977ACF@compuserve.com> Newsgroups: sci.crypt Lines: 18 "John A. Malley" wrote: > Are there other books on the subject you'd recommend - especially any > covering cryptanalysis with hidden markov models? There is now a vast literature on technology and applications of HMMs, but there are only a couple of articles I know of in the open literature specifically addressing application to cryptanalysis, and unfortunately I can't recommend them, because they don't show any real advantage over other well-known approaches. That's not to say that there *aren't* good applications for the technology, just that it is not evident from what has been published. A couple of years ago I ran across an interesting application of HMMs to the problem of "hands" in the Voynich manuscript, and managed to get it declassified (except for one example and a few names); there is a brief summary of its results in Jim Reed's Voynich bibliography at URL http://www.research.att.com/~reeds/voynich/bib.html Maybe some day I'll make a copy available on a Web site.

Terry Ritter, his current address, and his top page.

Last updated: 2001-06-27