An information hiding technique based on ternary Hamming codes is presented below.
- Introduction
- Ternary Hamming codes
- Efficiency and distortion
- Example using Python
- Message encoding
- A complete Python implementation
- References
Introduction
In the article Binary Hamming Codes in Steganography
we have seen how to hide information with binary codes using operations
Bearing in mind that the operations we perform are of type
The theory behind ternary Hamming codes is the same as that of binary codes. The only difference is that instead of performing operations modulo
On the other hand, to hide information in a byte we will need to work with the value of the byte modulo
Ternary Hamming Codes
Let’s see an embedding example. We are going to use
Notice that instead of using
Suppose that after selecting a group of 13 bytes from the media in which we want to embed the message, and performing the modulo 3 operation, we obtain the following cover vector:
Recall that we also need a matrix that contains in its columns all possible combinations, except the vector of zeros. In this case, in addition, we will have to eliminate the linearly dependent vectors.
One option would be the following:
And finally, we also need the message that we want to hide. Let’s hide for example:
If we calculate the hidden message in our vector
Logically, it is not the one we want to hide. So we look for which column of M is responsible:
It is column 9 of matrix M. Therefore, to obtain the stego vectpr
It could be the case that, in matrix M, we do not find the column we are looking for, but we do find a linear combination. In this case we would add 1 (or subtract 2).
When the recipient of the message gets the stego vector from the media, they can extract the message by:
Efficiency and distortion
Instead of using
If we have used ternary codes for the insertion
In this case the process would be the same as before, changing the value of the module to
The same idea would work for other values of
In the following graph you can see a comparison of different n-ary codes. To calculate the payload we will use:
And to calculate the efficiency:
This allows us to draw a graph to see the efficiency with respect to the
payload for different values of
As can be seen in the graph, the higher
However, there is something that cannot be overlooked. The distortion introduced by binary codes is the same as that introduced by ternary codes, since in both cases we only perform
In the graph you can also see the theoretical limit for each type of code. As can be seen, these codes do not reach the theoretical capacity.
Example using Python
Let’s look at the Python code that allows us to perform these operations. First we need to prepare the array
M = np.array([
[1, 0, 0, 0, 1, 1, 1, 0, 2, 1, 2, 1, 1],
[0, 1, 0, 1, 0, 1, 1, 1, 0, 2, 1, 2, 1],
[0, 0, 1, 1, 1, 0, 1, 2, 1, 0, 1, 1, 2]
])
To embed the message
import numpy as np
def embed(M, c, m, n):
s = c.copy()
col_to_find = (M.dot(c)-m)%n
position = 0
for v in M.T:
if np.array_equal(v, col_to_find):
s[position] = (s[position] - 1)%n
break
elif np.array_equal((v*2)%n, col_to_find):
s[position] = (s[position] + 1)%n
break
position += 1
return s
To extract the embedded message, just perform the
def extract(M, s, n):
return M.dot(s)%n
Now let’s repeat the example using Python:
m = [2, 0, 2]
c = [0,1,0,0,2,1,2,2,2,0,1,0,2]
s = embed(M, c, m, 3)
new_m = extract(M, s, 3)
>>> new_m
array([2, 0, 2])
Message encoding
Everything seems to indicate that it is more appropriate to use a ternary code than a binary code, since it gives us a higher capacity for the same level of distortion. However, computers represent information in binary, so using a ternary code often requires extra work.
Specifically, we need to be able to go from binary to ternary, and vice versa. This functionality is provided by the function base_repr() of the Numpy library.
Let’s see an example. We will first convert from binary to a decimal number:
>>> binary_string = "1010101011111111101010101010010"
>>> num = int(binary_string, 2)
>>> num
1434441042
Next, we will represent this number in base 3:
>>> ternary_string = np.base_repr(num, base=3)
>>> ternary_string
'10200222011011012000'
To convert back to binary we will do it in a similar way:
>>> num = int(ternary_string, 3)
>>> binary_string = np.base_repr(num, base=2)
>>> binary_string
'1010101011111111101010101010010'
A complete Python implementation
A complete implementation, including encoding and decoding the message, before and after the embedding, is provided in GitHub.
Here is an example where we hide data in an image:
import imageio
cover = imageio.imread("image.png")
message = "Hello World".encode('utf8')
hc = HC3(3)
stego = cover.copy()
stego[:,:,0] = hc.embed(cover[:,:,0], message)
imageio.imsave("stego.png", stego)
stego = imageio.imread("stego.png")
extracted_message = hc.extract(stego[:,:,0])
print("Extracted message:", extracted_message.decode())
References
- Fridrich, J. (2009). Steganography in Digital Media: Principles, Algorithms, and Applications. Cambridge University Press.