8.4 KiB
roaring
This is a go version of the Roaring bitmap data structure.
Roaring bitmaps are used by several major systems such as Apache Lucene and derivative systems such as Solr and Elasticsearch, Apache Druid (Incubating), LinkedIn Pinot, Netflix Atlas, Apache Spark, OpenSearchServer, Cloud Torrent, Whoosh, Pilosa, Microsoft Visual Studio Team Services (VSTS), and eBay's Apache Kylin.
Roaring bitmaps are found to work well in many important applications:
Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods (Wang et al., SIGMOD 2017)
The roaring
Go library is used by
- Cloud Torrent: a self-hosted remote torrent client
- runv: an Hypervisor-based runtime for the Open Containers Initiative
- InfluxDB
- Pilosa
- Bleve
This library is used in production in several systems, it is part of the Awesome Go collection.
There are also Java and C/C++ versions. The Java, C, C++ and Go version are binary compatible: e.g, you can save bitmaps from a Java program and load them back in Go, and vice versa. We have a format specification.
This code is licensed under Apache License, Version 2.0 (ASL2.0).
Copyright 2016-... by the authors.
References
- Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O'Hara, François Saint-Jacques, Gregory Ssi-Yan-Kai, Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), 2018 arXiv:1709.07821
- Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin, Better bitmap performance with Roaring bitmaps, Software: Practice and Experience 46 (5), 2016. http://arxiv.org/abs/1402.6407 This paper used data from http://lemire.me/data/realroaring2014.html
- Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), 2016. http://arxiv.org/abs/1603.06549
Dependencies
Dependencies are fetched automatically by giving the -t
flag to go get
.
they include
- github.com/willf/bitset
- github.com/mschoch/smat
- github.com/glycerine/go-unsnap-stream
- github.com/philhofer/fwd
- github.com/jtolds/gls
Note that the smat library requires Go 1.6 or better.
Installation
- go get -t github.com/RoaringBitmap/roaring
Example
Here is a simplified but complete example:
package main
import (
"fmt"
"github.com/RoaringBitmap/roaring"
"bytes"
)
func main() {
// example inspired by https://github.com/fzandona/goroar
fmt.Println("==roaring==")
rb1 := roaring.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
fmt.Println(rb1.String())
rb2 := roaring.BitmapOf(3, 4, 1000)
fmt.Println(rb2.String())
rb3 := roaring.New()
fmt.Println(rb3.String())
fmt.Println("Cardinality: ", rb1.GetCardinality())
fmt.Println("Contains 3? ", rb1.Contains(3))
rb1.And(rb2)
rb3.Add(1)
rb3.Add(5)
rb3.Or(rb1)
// computes union of the three bitmaps in parallel using 4 workers
roaring.ParOr(4, rb1, rb2, rb3)
// computes intersection of the three bitmaps in parallel using 4 workers
roaring.ParAnd(4, rb1, rb2, rb3)
// prints 1, 3, 4, 5, 1000
i := rb3.Iterator()
for i.HasNext() {
fmt.Println(i.Next())
}
fmt.Println()
// next we include an example of serialization
buf := new(bytes.Buffer)
rb1.WriteTo(buf) // we omit error handling
newrb:= roaring.New()
newrb.ReadFrom(buf)
if rb1.Equals(newrb) {
fmt.Println("I wrote the content to a byte stream and read it back.")
}
// you can iterate over bitmaps using ReverseIterator(), Iterator, ManyIterator()
}
If you wish to use serialization and handle errors, you might want to consider the following sample of code:
rb := BitmapOf(1, 2, 3, 4, 5, 100, 1000)
buf := new(bytes.Buffer)
size,err:=rb.WriteTo(buf)
if err != nil {
t.Errorf("Failed writing")
}
newrb:= New()
size,err=newrb.ReadFrom(buf)
if err != nil {
t.Errorf("Failed reading")
}
if ! rb.Equals(newrb) {
t.Errorf("Cannot retrieve serialized version")
}
Given N integers in [0,x), then the serialized size in bytes of a Roaring bitmap should never exceed this bound:
8 + 9 * ((long)x+65535)/65536 + 2 * N
That is, given a fixed overhead for the universe size (x), Roaring
bitmaps never use more than 2 bytes per integer. You can call
BoundSerializedSizeInBytes
for a more precise estimate.
Documentation
Current documentation is available at http://godoc.org/github.com/RoaringBitmap/roaring
Goroutine safety
In general, it should not generally be considered safe to access
the same bitmaps using different goroutines--they are left
unsynchronized for performance. Should you want to access
a Bitmap from more than one goroutine, you should
provide synchronization. Typically this is done by using channels to pass
the *Bitmap around (in Go style; so there is only ever one owner),
or by using sync.Mutex
to serialize operations on Bitmaps.
Coverage
We test our software. For a report on our test coverage, see
https://coveralls.io/github/RoaringBitmap/roaring?branch=master
Benchmark
Type
go test -bench Benchmark -run -
To run benchmarks on Real Roaring Datasets run the following:
go get github.com/RoaringBitmap/real-roaring-datasets
BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -
Iterative use
You can use roaring with gore:
- go get -u github.com/motemen/gore
- Make sure that
$GOPATH/bin
is in your$PATH
. - go get github.com/RoaringBitmap/roaring
$ gore
gore version 0.2.6 :help for help
gore> :import github.com/RoaringBitmap/roaring
gore> x:=roaring.New()
gore> x.Add(1)
gore> x.String()
"{1}"
Fuzzy testing
You can help us test further the library with fuzzy testing:
go get github.com/dvyukov/go-fuzz/go-fuzz
go get github.com/dvyukov/go-fuzz/go-fuzz-build
go test -tags=gofuzz -run=TestGenerateSmatCorpus
go-fuzz-build github.com/RoaringBitmap/roaring
go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200
Let it run, and if the # of crashers is > 0, check out the reports in the workdir where you should be able to find the panic goroutine stack traces.
Alternative in Go
There is a Go version wrapping the C/C++ implementation https://github.com/RoaringBitmap/gocroaring
For an alternative implementation in Go, see https://github.com/fzandona/goroar The two versions were written independently.