Split
When processing text in Rust, we often need to separate the values contained in a string. The parts are separated by delimiter chars or strings.
When we invoke the split function, we get an iterator. This can be used in a for-loop, or it can be collected into a Vec of strings.
To begin, we use split() in the simplest way possible. We first declare a string literal (test) that has a delimiter char. Here it is a semicolon.
Step 1: We call split() with a char argument. We do not convert or collect the iterator.
Step 2: We use the iterator in a for-in loop. We print each value, getting the results of split as we proceed.

    fn main() {
        let test = "cat;bird";
        // Step 1: get iterator from splitting on character.
        let values = test.split(';');
        // Step 2: print all results from iterator.
        for v in values {
            println!("SPLIT: {}", v);
        }
    }

    SPLIT: cat
    SPLIT: bird
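The introduction mentions string delimiters as well as chars. The following is a minimal sketch of that case. The helper name split_parts and the sample strings are made up for illustration. It also shows that two adjacent delimiters yield an empty string between them.

```rust
// Hypothetical helper: split on a string delimiter and collect the parts.
fn split_parts<'a>(text: &'a str, delimiter: &str) -> Vec<&'a str> {
    text.split(delimiter).collect()
}

fn main() {
    // A multi-character string delimiter (comma followed by space).
    println!("{:?}", split_parts("cat, bird, frog", ", "));
    // Two adjacent delimiters produce an empty string between them.
    println!("{:?}", split_parts("cat;;bird", ";"));
}
```

If empty parts are not wanted, they can be skipped in the loop or filtered out before further processing.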
Suppose we wish to have more complex logic that tests for a delimiter. We can use a closure, or a function, to test chars.
The whitespace_test function returns true if its char argument is a space or a newline. We pass it to split() as the separator test.

    fn whitespace_test(c: char) -> bool {
        c == ' ' || c == '\n'
    }

    fn main() {
        let test = "cat dog\nbird";
        // Call split, using function to test separators.
        let values = test.split(whitespace_test);
        // Print results.
        for v in values {
            println!("SPLIT: {}", v);
        }
    }

    SPLIT: cat
    SPLIT: dog
    SPLIT: bird
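The text mentions closures, but the example uses a named function. As a sketch of the closure form, the same test can be written inline. The helper name split_with_closure is hypothetical and exists only to keep the example testable.

```rust
// Hypothetical helper: same separator logic as whitespace_test, as a closure.
fn split_with_closure(text: &str) -> Vec<&str> {
    text.split(|c: char| c == ' ' || c == '\n').collect()
}

fn main() {
    for v in split_with_closure("cat dog\nbird") {
        println!("SPLIT: {}", v);
    }
}
```

A closure is convenient when the test is short and used in one place. A named function is easier to reuse.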
Split_whitespace
There is another function in the Rust standard library that always splits on any whitespace. This is split_whitespace.
Here we have a string that has 5 parts separated by various whitespace characters. We split it apart.

    fn main() {
        let terms = "bird frog tree\n?\t!";
        // Split on whitespace.
        for term in terms.split_whitespace() {
            println!("{}", term);
        }
    }

    bird
    frog
    tree
    ?
    !
Split ascii whitespace
If we have a string that is known to have ASCII delimiters, we can use split_ascii_whitespace. This is a good solution when we are sure the string contains only ASCII whitespace.

    fn main() {
        let terms = "bird frog tree\n?\t!";
        // Has the same results as split_whitespace.
        for term in terms.split_ascii_whitespace() {
            println!("{}", term);
        }
    }

    bird
    frog
    tree
    ?
    !
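The two whitespace functions differ only when the string contains non-ASCII whitespace. A small sketch of that difference follows. The helper names and the sample string are assumptions made for illustration. U+3000 (ideographic space) is used here because it is Unicode whitespace but not ASCII.

```rust
// Hypothetical helpers: count the parts produced by each function.
fn count_unicode(text: &str) -> usize {
    text.split_whitespace().count()
}

fn count_ascii(text: &str) -> usize {
    text.split_ascii_whitespace().count()
}

fn main() {
    // One non-ASCII whitespace char (U+3000) and one ordinary space.
    let terms = "bird\u{3000}frog tree";
    // Splits on both separators: bird, frog, tree.
    println!("split_whitespace: {}", count_unicode(terms));
    // Splits only on the ASCII space: "bird\u{3000}frog", tree.
    println!("split_ascii_whitespace: {}", count_ascii(terms));
}
```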
Split and parse
It is possible to split apart a string and parse each number in the string. This pattern is often used when parsing text files that contain numbers.
We split the string on spaces, and then parse each resulting string in the iterator with the parse() function. Note that unwrap() will panic if a part is not a valid number.

    fn main() {
        let test = "123 456";
        let values = test.split(' ');
        for v in values {
            // Parse each part.
            let parsed: u32 = v.parse().unwrap();
            // Add 1 to show that we have a u32 value.
            println!("SPLIT PARSE: {} {}", parsed, parsed + 1);
        }
    }

    SPLIT PARSE: 123 124
    SPLIT PARSE: 456 457
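When the input may contain parts that are not numbers, a non-panicking variant can be useful. This is one possible sketch, not the only approach. The helper name parse_all is hypothetical. It collects into a Result, so parsing stops at the first invalid part.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse every space-separated part as u32.
// Returns the first parse error instead of panicking.
fn parse_all(text: &str) -> Result<Vec<u32>, ParseIntError> {
    text.split(' ').map(|part| part.parse::<u32>()).collect()
}

fn main() {
    match parse_all("123 456") {
        Ok(numbers) => println!("PARSED: {:?}", numbers),
        Err(error) => println!("ERROR: {}", error),
    }
    // "x" is not a number, so this takes the error branch.
    if let Err(error) = parse_all("123 x") {
        println!("ERROR: {}", error);
    }
}
```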
Suppose we want to get a Vec from the split function. The easiest way to do this is to call collect() with the turbofish syntax (::<>) to specify the desired type.

    fn main() {
        let source = String::from("a,b,c");
        // Use collect to get a vector from split.
        let letters = source.split(',').collect::<Vec<&str>>();
        println!("{:#?}", letters);
    }

    [
        "a",
        "b",
        "c",
    ]
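Two variations on collect() are sketched here. A type annotation on the variable can take the place of the turbofish, and the parts can be converted to owned String values when they must outlive the source string. The helper name to_owned_parts is made up for this example.

```rust
// Hypothetical helper: collect the parts as owned Strings.
fn to_owned_parts(source: &str) -> Vec<String> {
    source.split(',').map(String::from).collect()
}

fn main() {
    let source = String::from("a,b,c");
    // A type annotation on the variable replaces the turbofish.
    let letters: Vec<&str> = source.split(',').collect();
    println!("{:?}", letters);
    // These Strings do not borrow from source.
    let owned = to_owned_parts(&source);
    println!("{:?}", owned);
}
```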
Suppose we have a file of key-value pairs, with a key and value separated by an equals sign on each line. With Rust we can parse this file into a HashMap of string keys.
Step 1: We open the file with File::open() and then loop over the lines() in the file. Then we split() each line.
Step 2: When a line has exactly 2 parts, we insert the key and value into the HashMap.
Step 3: We get a value from the HashMap, which was populated by the file we just read in. The file text is shown after the program's output.

    use std::collections::HashMap;
    use std::fs::File;
    use std::io::{BufRead, BufReader};

    fn main() {
        // Open file of key-value pairs.
        let file = File::open("/Users/sam/example.txt").unwrap();
        let reader = BufReader::new(file);
        let mut hash: HashMap<String, String> = HashMap::new();
        // Read and parse file.
        for line in reader.lines() {
            let line_inner = line.unwrap();
            let values: Vec<&str> = line_inner.split('=').collect();
            if values.len() == 2 {
                hash.insert(values[0].to_string(), values[1].to_string());
            }
        }
        // Get value from file.
        let cat = hash.get("cat");
        if let Some(value) = cat {
            println!("VALUE FOUND: {}", value);
        }
    }

    VALUE FOUND: orange

    bird=blue
    cat=orange
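The same parsing logic can be tried without a file on disk. This sketch assumes the text is already in memory. The helper name parse_pairs is hypothetical. It uses str::lines() in place of the BufReader and otherwise follows the example above.

```rust
use std::collections::HashMap;

// Hypothetical helper: parse "key=value" lines from a string.
fn parse_pairs(text: &str) -> HashMap<String, String> {
    let mut hash = HashMap::new();
    for line in text.lines() {
        let values: Vec<&str> = line.split('=').collect();
        // Skip lines that do not split into exactly 2 parts.
        if values.len() == 2 {
            hash.insert(values[0].to_string(), values[1].to_string());
        }
    }
    hash
}

fn main() {
    let hash = parse_pairs("bird=blue\ncat=orange");
    if let Some(value) = hash.get("cat") {
        println!("VALUE FOUND: {}", value);
    }
}
```

Keeping the parsing in a function that takes a &str makes it straightforward to test separately from the file access.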
Split once
Suppose we have a string that has one separator, and we want to split apart the two sides of the string. This can be done with a split_once call.
We do not need collect() method calls or loops. We just assign a tuple pair to the result of split_once. The function returns an Option, so here we call unwrap() on it.
Code that calls split() only to get 2 parts can often be replaced with split_once(), so it is a good function to know.

    fn main() {
        let value = "left:right";
        // Get left and right from value.
        let (left, right) = value.split_once(":").unwrap();
        println!("left = {} right = {}", left, right);
    }

    left = left right = right
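Because split_once returns an Option, the unwrap() call panics when the separator is missing. The following sketch handles that case, and also shows that only the first separator is used when there are several. The helper name right_side and the sample values are made up for illustration.

```rust
// Hypothetical helper: return the part after the first colon, if any.
fn right_side(value: &str) -> Option<&str> {
    value.split_once(':').map(|(_left, right)| right)
}

fn main() {
    for value in ["left:right", "no separator", "a:b:c"] {
        match right_side(value) {
            // For "a:b:c" the right side is everything after the first colon.
            Some(right) => println!("right = {}", right),
            None => println!("no separator in {:?}", value),
        }
    }
}
```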
Often the split function is called with a following collect call. This is sometimes needlessly inefficient, and we can avoid the collect.
Version 1: This version of the code calls collect() after split(), and then uses the resulting Vec in a for-loop.
Version 2: Here we use the iterator returned by split() directly (in a for-loop).
Result: In the run shown, the version without collect() finished in 19 ms, versus 62 ms with collect().

    use std::time::Instant;

    fn main() {
        let source = String::from("a,b,c");
        let t0 = Instant::now();
        // Version 1: call collect after split.
        for _ in 0..1000000 {
            let letters = source.split(',').collect::<Vec<&str>>();
            for _ in &letters {
            }
        }
        println!("{}", t0.elapsed().as_millis());
        // Version 2: avoid collect after split.
        let t1 = Instant::now();
        for _ in 0..1000000 {
            let letters = source.split(',');
            for _ in letters {
            }
        }
        println!("{}", t1.elapsed().as_millis());
    }

    62 ms    split, collect
    19 ms    split
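Avoiding collect() also applies when the goal is a single value rather than a loop. As a rough sketch, iterator methods can be called on the result of split() directly. The helper names here are hypothetical and the sample string is made up.

```rust
// Hypothetical helper: count the parts without building a Vec.
fn count_parts(source: &str) -> usize {
    source.split(',').count()
}

// Hypothetical helper: find the longest part without building a Vec.
fn longest_part(source: &str) -> Option<&str> {
    source.split(',').max_by_key(|part| part.len())
}

fn main() {
    let source = "a,bbb,cc";
    println!("parts: {}", count_parts(source));
    if let Some(part) = longest_part(source) {
        println!("longest: {}", part);
    }
}
```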
The split function is often used with Vec and string arrays. We can pass functions (or closures) to split() for more complex behavior, and an iterator is returned.