The DAGGER API and Usage

An Easy-To-Use Cipher Engine

Terry Ritter

DAGGER is a set of ciphering routines which add serious information security to a larger overall program. DAGGER is available in "portable C" and 80x86 assembly language.

To use DAGGER, the programmer must first allocate a 13K structure called a DRT (Dagger Record Type). The programmer then calls the DAGGER SetUp routine which initializes the DRT from a key.

After SetUp, the programmer calls the DAGGER Encipher and Decipher routines which each use a DRT to cipher an arbitrary-size memory buffer "in place." That's it!

DAGGER can be used for LAN data, disk ciphering, or other applications. With multiple DRT's, DAGGER can easily interleave operations on different keys.

Note that DAGGER is not appropriate for every possible ciphering situation. In particular, DAGGER is vulnerable to a substantial defined-plaintext attack and should not be used in ways which support such attack. DAGGER is appropriate for local storage protection, and for end-to-end communications ciphering. DAGGER is less appropriate for network hops which carry messages from many users over a long time without changing keys.

Introduction
The API
Portable C
80x86 Assembly Language
Usage

Introduction to DAGGER

DAGGER is a fast, reasonably-secure software cipher. Because all software processing takes time, there is inevitably a tradeoff between strength and speed in a software cipher. In DAGGER we have an especially good tradeoff, because we can use our own patented Dynamic Substitution technology.

As with any good cipher, the easiest ways to expose DAGGER-protected information probably would not include trying to "break" the cipher. Instead, it is important to consider how plaintext documents are developed, kept, displayed and disposed, who has access to secure data storage, and whether or not workers can be bribed into revealing information. There are probably many other avenues to any protected information which would be far simpler than trying to "break" the cipher.

How is DAGGER Different?

DAGGER is easy to use! In particular, DAGGER is designed to not impose upon a surrounding system. Existing software systems generally do not have to be re-designed to use DAGGER. Thus, DAGGER holds out the novel promise of "drop in" data security.

DAGGER is extremely general in accepting virtually any sort of key, and any amount of data for ciphering. DAGGER ciphers the data "in place," and never needs even one additional byte of storage for the result. DAGGER allows data "frames" to be deciphered in random order, and yet still protects against "known plaintext" attacks.

The API

The DAGGER API consists of one data structure type called DRT and exactly three routines: SetUp, Encipher and Decipher. That's it!

The DRT Structure

The DRT (Dagger Record Type) structure takes about 13KB and holds the various tables and values used in operation. It is allocated by the programmer. In most cases, the programmer will not need to understand the DRT structure at all, and will just allocate storage for it. However, the structure is discussed in detail at the end of this section.

The SetUp Function

The SetUp call initializes a DRT structure from a User Key. The User Key may be any contiguous sequence of bytes; it could be an ASCII text phrase or a sequence of random values. The C prototype looks like this:

     void
     DaggerSetUp( DRT *DR,
                  BYTE key[], WORD keybyct, BYTE StrengthBytes );

Mainly, DaggerSetUp takes a pointer to a DRT structure, and a pointer to a contiguous key, plus a key length. Then it fills the DRT.

The StrengthBytes value limits the strength of the DAGGER cipher; a value of 5 would restrict the cipher to 40 bits of randomized key. When DAGGER is used internationally, using a StrengthBytes value higher than allowed by regulation could have serious repercussions.

The Encipher Function

The encipher call uses an initialized DRT to encipher a buffer. The buffer may be any contiguous sequence of bytes, and the buffer is ciphered in place: cipher bytes replace the original bytes. Here is the C prototype:

     void
     DaggerDrtEnc( BYTE DatBuf[], WORD byct, DRT *DR, BOOL Restart );

When TRUE, the Restart value causes the cipher to "start over"; this is the normal situation when data frames may be deciphered out of sequence or at random. By starting over each time, each frame is enciphered independently.

When the Restart value is FALSE, the cipher does not "start over," but simply continues from the previous enciphering. This could be used for enciphering whole files, when the data are known to have a fixed sequence. It could not be used for low-level LAN ciphering, because error-correction may cause a frame to be re-sent out of sequence.

The Decipher Function

The decipher call looks just like encipher, and uses an initialized DRT to decipher a buffer. The buffer may be any contiguous sequence of bytes, and the buffer is ciphered in place: cipher bytes replace the original bytes. Here is the C prototype:

     void
     DaggerDrtDec( BYTE DatBuf[], WORD byct, DRT *DR, BOOL Restart );

When TRUE, the Restart value causes the cipher to "start over"; this is the normal situation when data frames may be deciphered out of sequence or at random. Any data errors in the ciphertext which occur during transmission or storage will be confined to a single data frame.

When the Restart value is FALSE, the cipher does not "start over," but simply continues from the previous deciphering. Any data-transmission errors which occur will be compounded and expanded in subsequent data frames.

DRT Structure Details

Here are the DRT (Dagger Record Type) definitions:

     #define SUBBOXLAST   255
     #define DRTPRELAST  1023
     #define DRTRANDLAST 4095
     #define DRTSHUFLAST 1023

     typedef struct {
           BYTE  FwdOrg[ SUBBOXLAST+1 ], FwdUse[ SUBBOXLAST+1 ];
           BYTE  InvOrg[ SUBBOXLAST+1 ], InvUse[ SUBBOXLAST+1 ];
           WORD  PreAR[ PREARLAST+1 ];
           WORD  RandAR[ RANDARLAST+1 ];
           WORD  ShufAR[ SHUFARLAST+1 ];
           WORD  RNGencCur, RNGdecCur, RNGorg, Dummy;
     } DRT;

SUBBOXLAST defines the last element in a 256 element "SUBBOX" array. This is a "substitution box," an array dedicated to substitution transformations. The values in the array are some permutation of the index values 0..255.

DRTPRELAST defines the last element in the "PRE" area of the DRT structure.

DRTRANDLAST defines the last element in the "RAND" area of the DRT structure.

DRTSHUFLAST defines the last element in the "SHUF" area of the DRT structure.

FwdOrg is a subbox containing the initial Dynamic Substitution state. This state is created by shuffling a "standard" ordering (0,1,2,...,255) using the values from the SHUF area. The FwdOrg state is retained unchanged so that the cipher may later "restart" in exactly the same state. The FwdOrg area is originally used to hold the hashed and strength-limited key values during key expansion; initializing FwdOrg automatically destroys those values.

FwdUse is a subbox which holds the dynamically-changing cipher state. FwdUse is initialized by copying from FwdOrg on each restart.

InvOrg is a subbox containing the inverse of FwdOrg. It is also retained unchanged.

InvUse is a subbox which holds the dynamically-changing inverse cipher state. InvUse is initialized by copying from InvOrg on each restart.

PreAR is a 2KB (1k word) array which provides the necessary room to build up from deg 9 and deg 89 to the deg 1279 expansion polynomial.

RandAR is a 8KB (4k word) array is used to store the "random" values used by the cipher. All values in this area were originally created by the deg 1279 expansion polynomial and were subsequently non-linearized by the jitterize mechanism.

ShufAR is a 2KB (1k word) array which provides room for the jitterize mechanism to delete about 10% of the traversed data, and yet leave the RandAR area fully processed. In addition, the values at the start of ShufAR are used to shuffle the original forward substitution box which is the initial Dynamic Substitution state. The entire process of expansion, nonlinearization, and shuffling are necessary in order to generate the cipher state from a key.

RNGencCur is the current encipher index into RandAR, and is set from RNGorg on Restart. If set externally, RNGencCur must be masked by an AND with RANDARLAST to keep the value inside RandAR.

RNGdecCur is the current decipher index into RandAR, and is set from RNGorg on Restart. If set externally, RNGdecCur must be masked by an AND with RANDARLAST to keep the value inside RandAR. Having separate encipher and decipher indexes allows the same DRT structure to be used both for enciphering and deciphering, which is useful for data storage ciphering.

RNGorg is the starting value for the RandAR indexes. If set externally, RNGorg must be masked by an AND with RANDARLAST to keep the value inside RandAR. RNGencCur and RNGdecCur are copied from RNGorg on Restart. RNGorg could be used as a small but efficient message key, for message key values somehow made available by the rest of the system.

Dummy is used to round out the structure to an even length in four-byte elements.

Portable C

The portable C version is intended to function the same -- that is, encipher the same data in the same way, given the same key -- on any machine having a C compiler. These good intentions may be almost impossible to meet fully, but the portable version should do fairly well.

Integer Size

The usual problems with different size integers on different machines are localized by defining appropriate storage types. In this case, BYTE, WORD and LWD are intended to represent 8-bit, 16-bit, and 32-bit unsigned values respectively; these can be re-defined for particular machines and compilers as appropriate. In general, larger storage elements could be used, except that we expect to be able to index the User Key, data, and substitution tables as BYTE arrays and the first two of these are likely to be defined outside the cipher system. All computations are allowed to overflow. No signed values are needed or used.

Byte Ordering

Another common portability problem is byte-ordering, and this might seem to be a particular problem for DAGGER, because the first phase of the set-up algorithm fills the DRT structure with WORD values, and a subsequent phase uses BYTE values from the DRT to permute a translation table. However, the BYTE interpretation of this storage is made portable by reading only WORD values, and then explicitly selecting the high or low BYTE of those values using the least-significant-bit of a count value.

The second phase of setup creates the BYTE-array translation tables on top of the WORD-array storage used during key extension. This also is not a byte-ordering problem because the BYTE tables are not set up until after that area ceases to be used for WORD access, and they are not used for BYTE access until after they have initialized as arrays of BYTE values.

It might be possible to define the DRT structure as a union, but since the first phase of set up simply uses the structure as a contiguous WORD array, it is hard to imagine how a union could possibly be clearer than simply casting the structure pointer to a WORD pointer.

Storage within C Structures Not Necessarily Contiguous

C structure elements are not guaranteed to be contiguous, which theoretically might result in some implementations having unused storage between structure elements. This could be a problem because it would appear as extra storage when considered as an array of words. This would result in a shifted sequence in the RNG table, which would certainly cipher differently. Defining a union would not change the problem.

However, since the early and intermediate structures all have a power-of-two size larger than any reasonable register, it is unlikely that any C implementation would waste storage between them. The few lone WORD values which reside in the structure are carefully placed at the far end, where any extra storage should have no effect on the algorithm.

Optimization

Because the Portable C version is "portable," it is also non-optimal. The portable version can be modified to optimize a particular environment; while the resulting version may not be portable, that may not matter. The normal way to build efficient versions on new architectures is first to get the portable version working under a small driver. Then make small changes in (a copy of!) the portable program, each time measuring speed and checking that it will encipher to and decipher from the original version.

It is difficult to say what optimization would be most important, because that depends upon the normal usage of the system. At first glance one might think to run through the set-up process and especially reduce the inefficient portable access to byte values within words, but if set-up is infrequent, making SetUp efficient is almost irrelevant.

It will generally be worthwhile to improve the ciphering inner loops as much as possible. But these loops are already fairly tight C, so it is not clear how much more can be done. For example, the data-array access is already done by moving pointer, instead of individual indexings, and the Dynamic Substitution appears to require explicit indexings, and so cannot be improved the same way. Optimizing the HIBYTE and LOBYTE macros for a particular architecture should yield something, but these are only used once each in the inner loops. Other than using a smart compiler, it is not clear that much more can be done at C level (I would love to hear about any improvements which may be found). Assembly-language could help a lot, of course.

One possible speed optimization in C would be to introduce new local pointers to the DRT elements used in the inner loops. It may well be that some compilers will have trouble recognizing the indexing notation as a single instruction, and by pre-computing the start of each element, simple and fast indexing may be possible even on less-than-optimal compilers.

It would be possible to make separate ciphering routines for each statically-allocated structure, thus allowing direct memory access to the structure instead of indirect access through a pointer. This might (conceivably) save up to 30% in execution time, but would also be dangerously inflexible.

80x86 Assembly Language

The assembly-language version uses the same interface as the portable C version, but is considerably faster. Upon assembly, the resulting .OBJ file can be imported into C for use.

The 80x86 assembly-language version is 16-bit code, generally optimized for 80486 operation. Surprisingly, most 486 optimizations remain compatible with the 8088. The 486 optimizations simply tend to use different instruction sequences than one would have preferred to use on the earlier processors.

The same code is easily expanded to use 32-bit addresses and pointers, and should require little more than using expanded register names. The cipher algorithm proper does not need, and is not particularly helped by, 32-bit operations. There would be some advantage in being able to use 32-bit pointer offsets, thus possibly avoiding the need to save and change segments, and avoiding also the need for a segment-override for each data access. Speed might increase somewhat.

Machine Addressing

Intel base-plus-constant offset-plus-register offset addressing fits this application well. In general, we have a pointer to a DRT structure (base), then must select a particular array from the DRT (constant offset), and index into that array (register offset). This can all be done in one instruction, and, on the 486, that instruction takes just two cycles. But this code will function even on a lowly 8088.

It could be a little faster to pre-compute the starting addresses of the necessary tables and save each address in a local. This would support single-cycle 486 access (and would still be 8088 compatible), provided that the necessary addresses were already in the registers. But the pre-compute and each local access to get the addresses into registers would be overhead.

Extra Register Storage

Limited register storage is a perennial problem for assembly-language programming. Because the 486 has 32-bit registers and the algorithm needs only 16-bit words, we might seek to use the hidden higher word for temporary storage. The 486 has a BSWAP instruction which will rotate the current word into the higher word and vice versa.

Unfortunately, even the one cycle BSWAP instructions quickly become significant overhead in an inner loop, and it is confusing to use them this way. In the end, it was possible to avoid using BSWAP in return for a single memory decrement of a local per loop (for decipher only), at a presumed extra cost of one 486 cycle. This also retained 8088 compatibility.

High Level Interface

The classic style of assembly-language programming passes values in registers. While this seems both simple and obvious, it turns out to be not quite as simple as it looks, since the passed values may have to be stored on the stack anyway, to free up register resources for use in the target routine. Ideally, each called routine would protect (that is, stack) all used registers, but this would be almost as expensive as a full-blown HLL procedure-call protocol.

It is just barely possible to pass values in registers with the final DAGGER API (although some internal routines could have problems). But passing values in registers is awkward for calling assembly-language from a high-level-language, since there would have to be some intermediate routine to grab the values from the HLL call stack and set up the registers for the assembly-language. Instead of duplicating the interface in this way, it was decided to use the HLL interface at the assembly-language level.

Another issue is the particular HLL interface, the parameter-passage sequence and responsibility for removing parameters from the stack. Frankly, the Pascal conventions are both smaller and faster (if only just a little). However, the assembly-language was set up to use either Pascal or C protocols, and both were linked into a C driver to make sure this would work. It did.

Usage

The DAGGER API is easy to use: First, allocate storage for a DRT structure, either static or dynamic. Next, use SetUp to process a User Key and set up the DRT structure. Then, use Encipher or Decipher to cipher data in place, using a particular DRT. That's about it.

Multiple Key's Mean Multiple DRT's

Clearly, the DRT structure holds the "state" of the cipher; multiple DRT's can hold multiple such states. Obviously, the same code can cipher between multiple targets (with a different for each), simply by using different DRT's. Alternately, it would be easy to use two or more DRT structures and multiple-cipher the same data; this could produce the effect of a substantially stronger cipher. Another option would be to convert from one key to another by first deciphering under one DRT and then enciphering under another.

Encipher/Decipher in a Single DRT

Both Encipher and Decipher can share the same DRT, provided they use the same key (of course), and provided that they handle Restart properly. When both ciphering modes share one DRT, they also share a single FwdUse subbox; Restart must be asserted after changing between Encipher and Decipher (to re-init FwdUse). Thus, it would not be possible to share a DRT if enciphering and deciphering operations were intermixed and Restart not used. For most expected applications, Restart will be used on each frame, so there is no problem sharing a DRT. (The alternative is simply to use a separate DRT.)

In data storage applications, just a single key is commonly used, so using a single DRT is just fine. On the other hand, in LAN applications, it is better practice to use a different key in each direction than to try and share a DRT. We want to minimize the amount of data ciphered under any one key, and one way to help do this is to use a separate key each way.

Good Keys are Important

The only unique thing DAGGER really needs is a key. And, since DAGGER processes each key with multiple CRC polynomials, we do not care whether or not the key is ASCII text or random data.

Cryptographically speaking, any key must contain sufficient "uncertainty" to make a "brute force" attack unreasonable. In general, each key should contain 80 bits of "uniqueness" and "uncertainty." It is impossible to have a short secure key, and this is true for any cipher.

If the key is ASCII text, each one should be at least 25 or 30 characters long and preferably more; each should be a "phrase" instead of a "password." Each ASCII key should contain some numbers and/or special characters, and should have peculiar punctuation, spacing and/or capitalization, to make the phrase different from anything ever printed on paper.

If the key is "random" binary, it should be at least as long as the maximum StrengthBytes value ever to be used. It would help if the key were created by some process which could be considered "really" random. We want to avoid simply issuing keys as sequential values from some random number generator, unless that data is itself first enciphered. We might consider accumulating deviations in LAN usage in a significant amount of CRC state for use as key values.

When ciphering LAN data (or any form of data communications), using the same key for transmissions in both directions is unnecessarily risky. It is recommended that a different key (a different DRT) be used to cipher in each direction

Key Distribution

DAGGER is a "conventional" (or "one-key," or "symmetric," or "secret key") cipher; that is, enciphering and deciphering are both performed with exactly the same key. This is ideal for protecting most forms of stored data, since the original user does not have to give the key to anyone else.

A "secret key" cipher is also ideal for use on a LAN where there is a "key server" and each station has its own secret key to the server node. When two stations wish to communicate, they request a common key from the server, which sends a temporary key to each station, each enciphered in the correct station key.

However, when cryptography is used for data communications, key delivery can be awkward, because each and every key must be somehow delivered to the far end (and retained) in absolute secrecy. On the other hand, this requirement also assures that each end gets exactly the same key, something which simple "public key" systems rarely guarantee.

So-called "public key" (or "two key," or "asymmetric") technology can simplify remote key delivery. But, because public-key ciphering is extremely slow, it is only rarely used to encipher data; instead, public-key technology is normally just used to deliver keys for a conventional cipher. DAGGER could be the conventional cipher.

In-The-Field Automatic Strength Upgrade

As time passes and computers become faster, a "brute force" attack on a limited-size key will become practical for increasing numbers of organizations. Presumably the government will recognize the problem and, at some point, increase the size of keys authorized for international ciphering.

In LAN or WAN usage, it is always necessary to transport session keys for the data cipher. We can prepare for the future by normally having larger than necessary keys and also transporting the strength value. The strength value will limit the keys to the acceptable strength level. But when the day comes to upgrade the system, it will only be necessary to have the key server send a larger strength value to have all nodes automatically upgrade their cipher strength. This would be an in-the-field automatic strength upgrade without changing the fielded code at all.

We Can Avoid Restart (Given Sequential Ciphering Without Errors)

During ciphering, the Dynamic Substitution "state" or "permutation" changes; this is what makes this combiner hard to attack. Normally, Restart will be set TRUE, and the system will start over again for each and every data frame. But, if we have an installation which has the right conditions, we can avoid starting over every time. We can avoid doing a Restart on every data frame if:

we know that we will always decipher data frames in exactly the same sequence in which they were enciphered, and
we are relatively sure there will be no data errors. (A data error in one frame will be retained and will propagate through any remaining data frames until the next Restart.)

The first consideration eliminates storage systems which need random access to data.

The second consideration eliminates most low-level data communications, except those which will detect errors and re-synchronize (and can issue a Restart on that condition). Note that any data frames in transit could be lost unnecessarily in this situation. Normally, a low-level communications protocol will only issue "resend" requests for particular bad frames, and will not restart the communication stream after every single error. An error-correcting modem might do this, however.

On the other hand, the Restart parameter allows systems with small data buffers to decipher data which were enciphered in a single huge buffer -- provided the deciphering end can detect the the start of the larger enciphering buffer.

We Can Use Message Keys (If Data Expansion is Acceptable)

For applications which could use more security, and which can accept data expansion, it is possible to use message keys. Basically, message keys are just random values which are enciphered before the data, and which are necessary to decipher the data proper. If the message keys are really random, there can be no statistical or known-plaintext attack on their enciphering.

A short but sweet form of message key can be implemented in DAGGER by explicitly setting the RNGencCur and RNGdecCur values in the DRT structure. The message key would be a one-word random value at some known location in the data frame (probably the start, to accommodate variable-length data frames). The sending end would encipher this value using some base key, and then would use the (unenciphered) message key word (which should be masked by AND with DRTRANDLAST to keep the value inside RandAR) to explicitly set RNGencCur before enciphering the rest of the frame. The deciphering end would decipher the message-key word with the base key, use it to set the deciphering RNGdecCur value (again masking first by AND with DRTRANDLAST), then decipher the rest of the frame. This would provide a 12-bit message key, which is not great, but would be fairly fast, and would seem to be a very serious obstacle to any "chosen plaintext" attack.

A more effective but certainly more imposing form of message key can be implemented by sending a larger amount of random data under the base key. Ideally, we might send some unique permutation of the 256 byte values. By enciphering this message key data before the real data, we randomize the state of the Dynamic Substitution table before the real data are encountered. Note that this can be a substantial data expansion.

Yet another form of message key would simply append message key material to the base key, and then run SetUp on that resulting key. This will accommodate message keys of arbitrary size, but will require a fairly-lengthy initialization on each message key.

All forms of message key can be used simultaneously.

Terry Ritter, his current address, and his top page.

Last updated: 1996-05-31