87 lines
4.4 KiB
Markdown
87 lines
4.4 KiB
Markdown
# crc32
|
|
CRC32 hash with x64 optimizations
|
|
|
|
This package is a drop-in replacement for the standard library `hash/crc32` package, that features SSE 4.2 optimizations on x64 platforms, for a 10x speedup.
|
|
|
|
[![Build Status](https://travis-ci.org/klauspost/crc32.svg?branch=master)](https://travis-ci.org/klauspost/crc32)
|
|
|
|
# usage
|
|
|
|
Install using `go get github.com/klauspost/crc32`. This library is based on Go 1.5 code and requires Go 1.3 or newer.
|
|
|
|
Replace `import "hash/crc32"` with `import "github.com/klauspost/crc32"` and you are good to go.
|
|
|
|
# changes
|
|
* Oct 20, 2016: Changes have been merged to upstream Go. Package updated to match.
|
|
* Dec 4, 2015: Uses the "slice-by-8" trick more extensively, which gives a 1.5 to 2.5x speedup if assembler is unavailable.
|
|
|
|
|
|
# performance
|
|
|
|
For *Go 1.7* performance is equivalent to the standard library. So if you use this package for Go 1.7 you can switch back.
|
|
|
|
|
|
For IEEE tables (the most common), there is approximately a factor 10 speedup with "CLMUL" (Carryless multiplication) instruction:
|
|
```
|
|
benchmark old ns/op new ns/op delta
|
|
BenchmarkCrc32KB 99955 10258 -89.74%
|
|
|
|
benchmark old MB/s new MB/s speedup
|
|
BenchmarkCrc32KB 327.83 3194.20 9.74x
|
|
```
|
|
|
|
For other tables and "CLMUL" capable machines the performance is the same as the standard library.
|
|
|
|
Here are some detailed benchmarks, comparing to go 1.5 standard library with and without assembler enabled.
|
|
|
|
```
|
|
Std: Standard Go 1.5 library
|
|
Crc: Indicates IEEE type CRC.
|
|
40B: Size of each slice encoded.
|
|
NoAsm: Assembler was disabled (ie. not an AMD64 or SSE 4.2+ capable machine).
|
|
Castagnoli: Castagnoli CRC type.
|
|
|
|
BenchmarkStdCrc40B-4 10000000 158 ns/op 252.88 MB/s
|
|
BenchmarkCrc40BNoAsm-4 20000000 105 ns/op 377.38 MB/s (slice8)
|
|
BenchmarkCrc40B-4 20000000 105 ns/op 378.77 MB/s (slice8)
|
|
|
|
BenchmarkStdCrc1KB-4 500000 3604 ns/op 284.10 MB/s
|
|
BenchmarkCrc1KBNoAsm-4 1000000 1463 ns/op 699.79 MB/s (slice8)
|
|
BenchmarkCrc1KB-4 3000000 396 ns/op 2583.69 MB/s (asm)
|
|
|
|
BenchmarkStdCrc8KB-4 200000 11417 ns/op 717.48 MB/s (slice8)
|
|
BenchmarkCrc8KBNoAsm-4 200000 11317 ns/op 723.85 MB/s (slice8)
|
|
BenchmarkCrc8KB-4 500000 2919 ns/op 2805.73 MB/s (asm)
|
|
|
|
BenchmarkStdCrc32KB-4 30000 45749 ns/op 716.24 MB/s (slice8)
|
|
BenchmarkCrc32KBNoAsm-4 30000 45109 ns/op 726.42 MB/s (slice8)
|
|
BenchmarkCrc32KB-4 100000 11497 ns/op 2850.09 MB/s (asm)
|
|
|
|
BenchmarkStdNoAsmCastagnol40B-4 10000000 161 ns/op 246.94 MB/s
|
|
BenchmarkStdCastagnoli40B-4 50000000 28.4 ns/op 1410.69 MB/s (asm)
|
|
BenchmarkCastagnoli40BNoAsm-4 20000000 100 ns/op 398.01 MB/s (slice8)
|
|
BenchmarkCastagnoli40B-4 50000000 28.2 ns/op 1419.54 MB/s (asm)
|
|
|
|
BenchmarkStdNoAsmCastagnoli1KB-4 500000 3622 ns/op 282.67 MB/s
|
|
BenchmarkStdCastagnoli1KB-4 10000000 144 ns/op 7099.78 MB/s (asm)
|
|
BenchmarkCastagnoli1KBNoAsm-4 1000000 1475 ns/op 694.14 MB/s (slice8)
|
|
BenchmarkCastagnoli1KB-4 10000000 146 ns/op 6993.35 MB/s (asm)
|
|
|
|
BenchmarkStdNoAsmCastagnoli8KB-4 50000 28781 ns/op 284.63 MB/s
|
|
BenchmarkStdCastagnoli8KB-4 1000000 1029 ns/op 7957.89 MB/s (asm)
|
|
BenchmarkCastagnoli8KBNoAsm-4 200000 11410 ns/op 717.94 MB/s (slice8)
|
|
BenchmarkCastagnoli8KB-4 1000000 1000 ns/op 8188.71 MB/s (asm)
|
|
|
|
BenchmarkStdNoAsmCastagnoli32KB-4 10000 115426 ns/op 283.89 MB/s
|
|
BenchmarkStdCastagnoli32KB-4 300000 4065 ns/op 8059.13 MB/s (asm)
|
|
BenchmarkCastagnoli32KBNoAsm-4 30000 45171 ns/op 725.41 MB/s (slice8)
|
|
BenchmarkCastagnoli32KB-4 500000 4077 ns/op 8035.89 MB/s (asm)
|
|
```
|
|
|
|
The IEEE assembler optimizations has been submitted and will be part of the Go 1.6 standard library.
|
|
|
|
However, the improved use of slice-by-8 has not, but will probably be submitted for Go 1.7.
|
|
|
|
# license
|
|
|
|
Standard Go license. Changes are Copyright (c) 2015 Klaus Post under same conditions.
|