suffixarray ExamplesUse the index/suffixarray module to search a byte slice for a substring in a fast way.
This page was last reviewed on Sep 14, 2023.
Suffixarray. A linear search can be slow. Each element must tested each time. When element count multiplies, this becomes prohibitive. An index is needed.
With suffixarray, an index is built. We use byte slices (which can be based on strings). We call New to create an index, and Lookup to use that index to find positions.
An example. This example uses suffixarray with strings. It first converts the string literal to a byte slice. Then it calls New on the byte slice.
Start The new() method is called with a byte slice argument. Pass -1 to get all indexes found, and any other number to get a specific count.
Result The Lookup method returns an int slice. These are the positions of the indexes found.
Info Lookup returns an int slice, so we can use a for-range loop to iterate over its results.
package main import ( "fmt" "index/suffixarray" ) func main() { // A byte string. value := []byte("cat dog cat cat bird") // Create a new suffixarray index. index := suffixarray.New(value) // Find all instances of this byte slice in the source slice. cats := index.Lookup([]byte("cat"), -1) fmt.Println(cats) // Display all the substrings starting at each index found. for i := range cats { fmt.Println(string(value[cats[i]:])) } // Find just one index. catsOne := index.Lookup([]byte("cat"), 1) fmt.Println(catsOne) // Find another index. birds := index.Lookup([]byte("bird"), -1) fmt.Println(birds) }
[12 8 0] cat bird cat cat bird cat dog cat cat bird [12] [16]
Benchmark, Lookup versus loops. Does suffixarray perform faster than a simple for-loop search? Here I test suffixarray versus nested-for loops. Both versions find 3 results.
Version 1 This finds all instances of the bytes "cat" in the data. It uses suffixarray and the Lookup method.
Version 2 This uses nested for-loops to search for the bytes in each position of the string.
Result The suffixarray version is faster. But please note that the suffixarray.New method is not timed.
package main import ( "fmt" "index/suffixarray" "time" ) func main() { // The data we are searching. value := []byte("cat dog cat cat bird") // The data we search for. test := []byte("cat") // Create index with suffixarray. index := suffixarray.New(value) t0 := time.Now() // Version 1: use suffixarray method. for i := 0; i < 1000000; i++ { cats := index.Lookup(test, -1) if len(cats) != 3 { return } } t1 := time.Now() // Version 2: build slice of indexes with for-loops. for i := 0; i < 1000000; i++ { results := []int{} for start := 0; start < len(value); start++ { equal := true for c := 0; c < len(test); c++ { if start + c >= len(value) { break } if test[c] != value[start + c] { equal = false break } } if equal { results = append(results, start) } } if len(results) != 3 { return } } t2 := time.Now() // Benchmark results. fmt.Println(t1.Sub(t0)) fmt.Println(t2.Sub(t1)) }
214.4069ms, Uses suffixarray, Lookup 290.3657ms, For-loops
Summary. With suffixarray we achieve an algorithmic optimization of search. We can retrieve multiple matching indexes at once. This module implements optimizations on its own.
Dot Net Perls is a collection of tested code examples. Pages are continually updated to stay current, with code correctness a top priority.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on Sep 14, 2023 (edit).
© 2007-2023 Sam Allen.