A linear search can be slow. Each element must tested each time. When element count multiplies, this becomes prohibitive. An index is needed.
With suffixarray
, an index is built. We use byte slices (which can be based on strings). We call New to create an index, and Lookup to use that index to find positions.
This example uses suffixarray
with strings. It first converts the string
literal to a byte slice. Then it calls New on the byte slice.
new()
method is called with a byte slice argument. Pass -1 to get all indexes found, and any other number to get a specific count.int
slice. These are the positions of the indexes found.int
slice, so we can use a for-range
loop to iterate over its results.package main import ( "fmt" "index/suffixarray" ) func main() { // A byte string. value := []byte("cat dog cat cat bird") // Create a new suffixarray index. index := suffixarray.New(value) // Find all instances of this byte slice in the source slice. cats := index.Lookup([]byte("cat"), -1) fmt.Println(cats) // Display all the substrings starting at each index found. for i := range cats { fmt.Println(string(value[cats[i]:])) } // Find just one index. catsOne := index.Lookup([]byte("cat"), 1) fmt.Println(catsOne) // Find another index. birds := index.Lookup([]byte("bird"), -1) fmt.Println(birds) }[12 8 0] cat bird cat cat bird cat dog cat cat bird [12] [16]
Does suffixarray
perform faster than a simple for
-loop search? Here I test suffixarray
versus nested-for loops. Both versions find 3 results.
suffixarray
and the Lookup method.for
-loops to search for the bytes in each position of the string
.suffixarray
version is faster. But please note that the suffixarray.New
method is not timed.package main import ( "fmt" "index/suffixarray" "time" ) func main() { // The data we are searching. value := []byte("cat dog cat cat bird") // The data we search for. test := []byte("cat") // Create index with suffixarray. index := suffixarray.New(value) t0 := time.Now() // Version 1: use suffixarray method. for i := 0; i < 1000000; i++ { cats := index.Lookup(test, -1) if len(cats) != 3 { return } } t1 := time.Now() // Version 2: build slice of indexes with for-loops. for i := 0; i < 1000000; i++ { results := []int{} for start := 0; start < len(value); start++ { equal := true for c := 0; c < len(test); c++ { if start + c >= len(value) { break } if test[c] != value[start + c] { equal = false break } } if equal { results = append(results, start) } } if len(results) != 3 { return } } t2 := time.Now() // Benchmark results. fmt.Println(t1.Sub(t0)) fmt.Println(t2.Sub(t1)) }214.4069ms, Uses suffixarray, Lookup 290.3657ms, For-loops
With suffixarray
we achieve an algorithmic optimization of search. We can retrieve multiple matching indexes at once. This module implements optimizations on its own.