Cyclic Redundancy Check (CRC) is an algorithm that is used to detect errors in digital communication and storage systems. CRC comes with varying bit sizes that determine how long the checksum it produced. The commonly used CRC are 8-bit, 16-bit, 32-bit and 64-bit.
Calculation
Convert to Binary
Given we have a piece of data, whether it is a string or a JPG file, it needs to be converted into its binary representation respectively.
"hello world" ---> 01101000 01100101 ...
Image.jpg ---> 01101100 01100100 ...
Select a Divisor
Both the sender and receiver will need to mutually agree on the divisor (a.k.a. generator polynomial). Refer to Best CRC Polynomials to select the best divisor in relation to the checksum size.
For example, if 101101
(6-bit) is selected as the divisor, this is how it looks like in the mathematical representation.
Another example for 10011000
(8-bit) is as follows.
For CRC 32-bit, the divisor degree is up to 31.
Perform Division
Division can be performed in both binary form and mathematical form. Here is a Youtube tutorial that illustrates the division steps.
From the division, we'll get the quotient and the remainder. The remainder is the checksum and will always be 1 bit lesser than the divisor.
Append Remainder to Data
The last step is to append the remainder (checksum) to the original data.
For example,
1001 (data) ++ 101 (checksum) --> 1001101 (final)
The receiver side will interpret the data correctly with its checksum and perform the validation thenceforth.
Usages
Frame Check Sequence
In transmission protocol such as TCP, a Frame Check Sequence is a piece of data appended to the header for error detection on the receiver's end. In this case, it is the CRC32 checksum as shown below.
File Integrity Verification
CRC checksum is also provided sometimes when we are downloading files from the internet so that we can verify the integrity of the file once the file has been successfully downloaded.
File Compression
File compression algorithms on the other hand use CRC checksum to verify the file integrity by comparing the CRC checksum after compression.
Limitations
Despite being an extremely useful checksum algorithm, CRC also comes with its inherent weaknesses that makes it unsuitable for some occasion.
- Security: CRC is a non-cryptographic hash function which makes it a bad candidate when security is the utmost priority. It is not resistant to intentional tampering. Read more about cryptographic hash function and non-cryptographic hash function on this amazing blog post by Anderson Dadario.
- Collision: Since the checksum is just the remainder calculated by the algorithm, the probability of collision exists albeit very low.
- Error Detection Capability: Good for detecting single-bit error but not multi-bit errors especially in large datasets due to the collision nature of the algorithm mentioned above.
Examples
C#
In C#, Crc32 can be found in System.IO.Hashing namespace that needs to be installed from nuget.org separately.
using System.Text;
using System.IO.Hashing;
const string originalString = "hello world";
var encodedString = Encoding.UTF8.GetBytes(originalString);
var checksum = Crc32.Hash(encodedString);
Console.WriteLine(BitConverter.ToInt32(checksum)); // 222957957
Python
Crc32 hashing utility can be found in the built-in zlib
compression library.
import zlib
data = b'hello world'
checksum = zlib.crc32(data)
print(checksum) # 222957957