Lightweight Cryptography Primitives
Bugs in the reference implementations

In early 2020 as I worked on this library I found some minor bugs in the second round reference implementations of some NIST algorithms. The authors were notified so that the problems could be corrected in the official reference implementations.

This page documents the problems that I found, and the fixes, for the benefit of other implementors. This library implements the fixed version of the algorithms.

# COMET

In the proc_pt and proc_ct code, the first step is to XOR a domain separation value of 0x20 into the rolling Z state:

That will XOR the value into the first byte of the Z state. All other places in the code where domain separation is done XOR's the value into the last byte of the Z state. The above code should instead be:

This bug affects all four versions of COMET.

There was also a problem with carry propagation in the SPECK-64-128 implementation which caused the block cipher to generate different outputs from the standard version of SPECK-64-128. In the encryption phase, the following code appears:

//ct = (ROR(ct, 8) + ct) ^ RK[i]
for(u8 j=0; j<WSZ; j++){
ct_temp[WSZ+j] = (ct[WSZ+((j+1)%WSZ)] + ct[j]);
ct_temp[WSZ+j] += carry;
//set next carry
carry = (ct_temp[WSZ+j] < ct[WSZ+((j+1)%WSZ)]) || (ct_temp[WSZ+j] < ct[j]);
ct_temp[WSZ+j] ^= RK[i*WSZ+j];
}

The line that sets the next carry gives the wrong result if the incoming carry value was 1 and the bytes being added were both 0xFF. The following is the fixed version of "set next carry":

if (carry)
carry = (ct_temp[WSZ+j] <= ct[WSZ+((j+1)%WSZ)]) || (ct_temp[WSZ+j] <= ct[j]);
else
carry = (ct_temp[WSZ+j] < ct[WSZ+((j+1)%WSZ)]) || (ct_temp[WSZ+j] < ct[j]);

# ESTATE

The specification says that if both the associated data and the plaintext are zero-length, then the algorithm should encrypt the nonce with tweak = 8 and then return immediately. The reference code was missing a return statement from the mac() function:

...
if(adlen == 0 && ptlen == 0)
{
// generate tag when both ad and pt are empty
twks = 0x08;
return; // <-- MISSING LINE
}
// generate tag when ad and/or pt are non-empty
twks = 0x01;
...

The effect is to completely ignore the conditional code and to start every MAC computation by encrypting the nonce with tweak = 1.

# HYENA

As part of the block operation, HYENA XOR's a 64-bit delta value into the upper half of the 128-bit feedback state:

for(i=8; i<15 ;i++)
{
feedback[i] ^= Delta[i-8];
}

That 15 should be a 16. There are two places in the reference code where this problem occurs, in Feedback_TXT_Enc() and Feedback_TXT_Dec().

# ORANGE

The reference implementation of the PHOTON-256 permutation in ORANGE has some byte order issues in store32() and load32(). The store32() function uses little-endian byte order 0 1 2 3, but the byte order of load32() is mixed up: 2 1 0 3 instead of 0 1 2 3.

This asymmetry means that the reference implementation does not implement PHOTON-256 correctly. Correcting the byte order makes the ORANGE implementation of the permutation give the same results as the PHOTON-Beetle implementation of the permutation.

In the specification of the underlying block cipher, the key schedule "MixAndRotateRows" step contains the following requirement:

The second row R1, third row R2, and fourth row R3 are left-rotated of 8, 15, 18 positions.

The reference implementation has a macro called "left_rotate" to perform this step but it actually performs a right-rotation by n bits instead of a left-rotation:

#define left_rotate(row,n) \
row = (row >> n) | (row << (32-n));

It is debatable as to whether this is a bug or simply a consequence of the bit ordering (left most cells of the row positioned in the least significant bits). From a security perspective, the direction of rotation probably doesn't matter so this issue isn't serious.

The test vectors in the specification were also clearly generated with a right-rotating implementation of the key schedule. This library implements the "bug" as-is.

# SPIX

Section 2.2.6 Finalization of the SPIX specification says that the final tag should be extracted from bytes 7, ..., 15, 24, ..., 31 of the finalized state. That 7 should be an 8 instead. The reference implementation is correct.