Consistent Overhead Byte Stuffing
From Wikipedia, the free encyclopedia
Consistent Overhead Byte Stuffing (COBS) is an
algorithm for encoding data bytes that results in efficient, reliable, unambiguous
packet framing regardless of packet content, thus making it easy for receiving applications to recover from malformed packets.
Byte stuffing is a process that transforms a sequence of data bytes
that may contain 'illegal' or 'reserved' values into a potentially
longer sequence that contains no occurrences of those values. The extra
length of the transformed sequence is typically referred to as the
overhead of the algorithm. The COBS algorithm tightly bounds the worst
case overhead, limiting it to no more than one byte in 254. The
algorithm is computationally inexpensive and its average overhead is low
compared to other unambiguous framing algorithms.[1]
Packet framing and stuffing
When packet data is sent over any serial medium, a protocol
is needed by which to demarcate packet boundaries. This is done by
using a special bit-sequence or character value to indicate where the
boundaries between packets fall. Data stuffing is the process that
transforms the packet data before transmission to eliminate any
accidental occurrences of that special framing marker, so that when the
receiver detects the marker, it knows, without any ambiguity, that it
does indeed indicate a boundary between packets.
COBS takes an input consisting of bytes in the range [0,255] and
produces an output consisting of bytes only in the range [1,255]. Having
eliminated all zero bytes from the data, a zero byte can now be used
unambiguously to mark boundaries between packets. This allows the
receiver to synchronize reliably with the beginning of the next packet,
even after an error. It also allows new listeners, which might join a
broadcast stream at any time, to reliably detect the beginning of the
first complete packet in the received byte stream.
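The resynchronization idea above can be sketched in C. The helper names `skip_to_boundary` and `next_frame` are hypothetical, chosen only for this illustration, and the sketch assumes the received bytes are already in a buffer with 0x00 as the packet delimiter.

```c
#include <stddef.h>

/* Hypothetical helper: a listener that joins mid-stream discards bytes up to
 * and including the first delimiter. Returns the index just past it, or len
 * if no delimiter has been received yet. */
size_t skip_to_boundary(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    while (pos < len && buf[pos] != 0)
        pos++;
    return pos < len ? pos + 1 : len;
}

/* Hypothetical helper: locates the next complete encoded frame at or after
 * pos. On success, stores the frame's start and length (delimiter excluded)
 * and returns the index just past the delimiter. Returns 0 if no complete
 * frame is available yet. */
size_t next_frame(const unsigned char *buf, size_t len, size_t pos,
                  size_t *frame_start, size_t *frame_len)
{
    while (pos < len && buf[pos] == 0)   /* skip repeated delimiters */
        pos++;
    size_t start = pos;
    while (pos < len && buf[pos] != 0)   /* encoded bytes are never zero */
        pos++;
    if (pos == len)
        return 0;                        /* delimiter not seen yet */
    *frame_start = start;
    *frame_len = pos - start;
    return pos + 1;
}
```

Because the encoded data never contains a zero, neither helper needs to inspect packet contents to find a boundary.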
With COBS, all packets up to 254 bytes in length are encoded with an
overhead of exactly one byte. For packets over 254 bytes in length the
overhead is at most one byte for every 254 bytes of packet data. The
maximum overhead is therefore roughly 0.4% of the packet size, rounded
up to a whole number of bytes. COBS encoding has low overhead (on
average 0.23% of the packet size, rounded up to a whole number of bytes)
and furthermore, for packets of any given length, the amount of
overhead is virtually constant, regardless of the packet contents.
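For sizing an output buffer, the bound can be written as a small C function. The name `cobs_max_encoded_len` is hypothetical. The sketch assumes a simple encoder that always emits a final code byte, as the StuffData routine in the Implementation section does, so for inputs whose length is an exact multiple of 254 it allows one byte more than the one-in-254 figure.

```c
#include <stddef.h>

/* Hypothetical helper: conservative output size for n input bytes.
 * One code byte is always needed, plus one more for each full run of
 * 254 input bytes (assumes an encoder that always writes a final code
 * byte). The frame delimiter is not counted. */
size_t cobs_max_encoded_len(size_t n)
{
    return n + n / 254 + 1;
}
```

Under this assumption a 1000-byte packet needs at most 1000 + 3 + 1 = 1004 bytes before the delimiter is added.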
Zero Pair Elimination
An optimization that can reduce overhead for common payloads which
contain pairs of zero bytes is to reduce the maximum encodable sequence
length, freeing some codes to encode sequences terminated by pairs of
zeros. In this case, code bytes in the range [1,223] have the same
meaning as in the normal mode, the code 224 is used to encode a sequence of 223
bytes with no zero termination, and the remaining codes [225,255]
encode sequences of length [1,30] terminated by a pair of zero bytes.
This variation can achieve negative overhead (compression) for some
sequences; however, it complicates the encoding and decoding process.
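The three code ranges described above can be sketched as a classifier. The enum and function names are hypothetical, and the sketch only identifies which range a code byte falls in. It does not attempt a full decoder.

```c
/* Hypothetical names for the three code ranges described in the text. */
enum zpe_kind {
    ZPE_INVALID,          /* 0 never appears in encoded data */
    ZPE_ZERO_TERMINATED,  /* 1..223: same meaning as in normal COBS */
    ZPE_UNTERMINATED,     /* 224: 223 data bytes, no implied zero */
    ZPE_ZERO_PAIR         /* 225..255: sequence ending in two zeros */
};

enum zpe_kind zpe_classify(unsigned char code)
{
    if (code == 0)
        return ZPE_INVALID;
    if (code <= 223)
        return ZPE_ZERO_TERMINATED;
    if (code == 224)
        return ZPE_UNTERMINATED;
    return ZPE_ZERO_PAIR;
}
```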
Packet format
COBS encodes the input data as a series of variable length blocks.
Each block, which may contain from 1 to 255 bytes, begins with a single
byte that specifies the number of bytes in the block (including the
length byte).
When decoding, a zero byte is appended to the decoded output after
each block. As a special case, no zero is added after a block which
begins with 0xFF.
Example encodings:
|    | Plaintext           | Encoded with COBS                 |
| 1. | 0x00                | 0x01 0x01                         |
| 2. | 0x11 0x22 0x00 0x33 | 0x03 0x11 0x22 0x02 0x33          |
| 3. | 0x11 0x00 0x00 0x00 | 0x02 0x11 0x01 0x01 0x01          |
| 4. | 0x01 0x02 ... 0xFF  | 0xFF 0x01 0x02 ... 0xFE 0x02 0xFF |
There is one complication in this format: in example 2 above,
decoding by these rules leaves an extra 0x00 at the end of the output.
The only way to encode a block that does not end in zero is for it to
have 254 bytes of contents, but the last block may be shorter than that.
To solve this issue, the decoder removes a single trailing zero, if
present. If the real plaintext ends in a zero, the encoder adds an
additional zero after it, as in examples 1 and 3.
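The block rules and the trailing-zero convention above can be sketched as a bounds-checked decoder. The name `cobs_decode_frame`, its return convention, and the error handling are assumptions made for this illustration. It expects one encoded packet with the 0x00 delimiter already removed.

```c
#include <stddef.h>

/* Hypothetical helper: decodes one COBS packet of len bytes into dst
 * (which needs room for len bytes). Returns the plaintext length, or
 * (size_t)-1 if the input is malformed. */
size_t cobs_decode_frame(const unsigned char *src, size_t len,
                         unsigned char *dst)
{
    size_t in = 0, out = 0;

    while (in < len) {
        unsigned code = src[in++];
        if (code == 0 || in + (code - 1) > len)
            return (size_t)-1;           /* stray zero or truncated block */
        for (unsigned i = 1; i < code; i++) {
            if (src[in] == 0)
                return (size_t)-1;       /* zero inside block contents */
            dst[out++] = src[in++];
        }
        if (code < 0xFF)
            dst[out++] = 0;              /* implied zero after the block */
    }
    if (out > 0 && dst[out - 1] == 0)
        out--;                           /* drop the single trailing zero */
    return out;
}
```

Applied to the encoded bytes of example 2, this yields the four plaintext bytes 0x11 0x22 0x00 0x33.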
Implementation
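/*
 * StuffData byte stuffs "length" bytes of
 * data at the location pointed to by "ptr",
 * writing the output to the location pointed
 * to by "dst". The output can be up to
 * length + length/254 + 1 bytes long.
 */

/* Store the finished block's code byte, then reserve the next one. */
#define FinishBlock(X) (*code_ptr = (X), code_ptr = dst++, code = 0x01)

void StuffData(const unsigned char *ptr,
               unsigned long length, unsigned char *dst)
{
    const unsigned char *end = ptr + length;
    unsigned char *code_ptr = dst++;   /* slot for the first code byte */
    unsigned char code = 0x01;         /* block length, counting the code byte */

    while (ptr < end)
    {
        if (*ptr == 0)
            FinishBlock(code);         /* a zero ends the block; it is not copied */
        else
        {
            *dst++ = *ptr;
            code++;
            if (code == 0xFF)
                FinishBlock(code);     /* 254 data bytes: the block is full */
        }
        ptr++;
    }
    FinishBlock(code);                 /* close the final block */
}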
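/*
 * UnStuffData decodes "length" bytes of
 * data at the location pointed to by "ptr",
 * writing the output to the location pointed
 * to by "dst". For input produced by StuffData,
 * the output ends with one extra trailing zero,
 * which the caller must discard.
 */
void UnStuffData(const unsigned char *ptr,
                 unsigned long length, unsigned char *dst)
{
    const unsigned char *end = ptr + length;

    while (ptr < end)
    {
        int i, code = *ptr++;          /* block length, counting the code byte */

        /* Copy the block contents; stop early if the input is truncated. */
        for (i = 1; i < code && ptr < end; i++)
            *dst++ = *ptr++;
        if (code < 0xFF)
            *dst++ = 0;                /* implied zero after a non-full block */
    }
}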
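StuffData above does not report how many bytes it wrote, which a caller usually needs before sending a packet. The following is a sketch of a variant that returns the encoded length. The name `cobs_encode` is hypothetical, and the logic is intended to mirror StuffData rather than replace it.

```c
#include <stddef.h>

/* Hypothetical variant of StuffData that returns the encoded length.
 * dst needs room for len + len/254 + 1 bytes. */
size_t cobs_encode(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t code_idx = 0;     /* where the current block's code byte goes */
    size_t out = 1;          /* next free output position */
    unsigned char code = 1;  /* block length so far, counting the code byte */

    for (size_t i = 0; i < len; i++) {
        if (src[i] != 0) {
            dst[out++] = src[i];
            code++;
        }
        if (src[i] == 0 || code == 0xFF) {
            dst[code_idx] = code;   /* finish this block */
            code_idx = out++;       /* reserve the next code byte */
            code = 1;
        }
    }
    dst[code_idx] = code;           /* finish the final block */
    return out;
}
```

Encoding the plaintext of example 2 with this sketch produces the five bytes 0x03 0x11 0x22 0x02 0x33 from the table in the Packet format section.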
References
1. Cheshire, Stuart; Baker, Mary. "Consistent Overhead Byte Stuffing". ACM. Retrieved November 23, 2010.