
`Split`

When processing text in Rust, we often need to separate a string into its parts. The parts are separated by delimiter chars or strings.

When we invoke the split function, we get an iterator. This can be used in a for-loop, or it can be collected into a Vec of strings.

First example

To begin, we use split() in the simplest way possible. We first declare a string literal (test) that has a delimiter char—here it is a semicolon.

Step 1 We invoke split, passing the semicolon as an argument—it is a char argument. We do not convert or collect the iterator.

Step 2 We loop over the resulting iterator with the for-in loop. We print each value, getting the results of split as we proceed.

fn main() {
    let test = "cat;bird";
    // Step 1: get iterator from splitting on character.
    let values = test.split(';');

    // Step 2: print all results from iterator.
    for v in values {
        println!("SPLIT: {}", v)
    }
}
SPLIT: cat
SPLIT: bird
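
The introduction mentions that a delimiter can also be a string. As a minimal sketch, the hypothetical helper split_on_arrow (a name made up for this illustration) splits on the two-character delimiter "->" and collects the parts so they are easy to inspect.

```rust
// Hypothetical helper for illustration: split on a string delimiter.
fn split_on_arrow(text: &str) -> Vec<&str> {
    // A &str argument is matched as a whole, not char by char.
    text.split("->").collect()
}

fn main() {
    for part in split_on_arrow("cat->bird->frog") {
        println!("SPLIT: {}", part);
    }
}
```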

Delimiter function

Suppose we wish to have more complex logic that tests for a delimiter. We can use a closure, or a function, to test chars.

Here We have 2 chars we want to split on: the space and newline chars. The whitespace_test function returns true if its argument is either one.

fn whitespace_test(c: char) -> bool {
    c == ' ' || c == '\n'
}

fn main() {
    let test = "cat dog\nbird";
    // Call split, using function to test separators.
    let values = test.split(whitespace_test);

    // Print results.
    for v in values {
        println!("SPLIT: {}", v)
    }
}
SPLIT: cat
SPLIT: dog
SPLIT: bird
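
The same test can be written inline. This sketch shows two alternatives to a named function: a closure, and a slice of chars that matches any one of the listed chars. The helper names split_with_closure and split_with_slice are made up for this illustration.

```rust
// Hypothetical helper: the delimiter test is written as a closure.
fn split_with_closure(text: &str) -> Vec<&str> {
    text.split(|c: char| c == ' ' || c == '\n').collect()
}

// Hypothetical helper: a slice of chars splits on any char it contains.
fn split_with_slice(text: &str) -> Vec<&str> {
    text.split(&[' ', '\n'][..]).collect()
}

fn main() {
    let test = "cat dog\nbird";
    println!("{:?}", split_with_closure(test));
    println!("{:?}", split_with_slice(test));
}
```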

`split_whitespace`

There is another function in the Rust standard library that always splits on any whitespace. This is split_whitespace.

Here We have a "terms" string that has 5 parts separated by various whitespace characters. We split it apart.

fn main() {
    let terms = "bird frog tree\n?\t!";
    // Split on whitespace.
    for term in terms.split_whitespace() {
        println!("{}", term);
    }
}
bird
frog
tree
?
!
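
One practical difference from split(' ') shows up with repeated or surrounding spaces. In this sketch (the helper names by_space and by_whitespace are made up for illustration), split(' ') yields empty strings between adjacent spaces, while split_whitespace skips them.

```rust
// Hypothetical helper: split on a single space char.
fn by_space(text: &str) -> Vec<&str> {
    text.split(' ').collect()
}

// Hypothetical helper: split on runs of whitespace.
fn by_whitespace(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn main() {
    // Two spaces between the words.
    let terms = "bird  frog";
    println!("{:?}", by_space(terms));
    println!("{:?}", by_whitespace(terms));
}
```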

`Split` ascii whitespace

If we know a string uses only ASCII whitespace as delimiters, we can use split_ascii_whitespace. It splits only on ASCII whitespace (space, tab, newline, form feed and carriage return), while split_whitespace also handles Unicode whitespace.

fn main() {
    let terms = "bird frog tree\n?\t!";
    // Has the same results as split_whitespace.
    for term in terms.split_ascii_whitespace() {
        println!("{}", term);
    }
}
bird
frog
tree
?
!
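
The two functions differ only when non-ASCII whitespace is present. As a small sketch, the string below contains an ideographic space (U+3000), chosen here just for illustration; the helper names by_unicode and by_ascii are also made up.

```rust
// Hypothetical helper: split on any Unicode whitespace.
fn by_unicode(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

// Hypothetical helper: split on ASCII whitespace only.
fn by_ascii(text: &str) -> Vec<&str> {
    text.split_ascii_whitespace().collect()
}

fn main() {
    // U+3000 separates bird and frog; an ASCII space separates frog and tree.
    let terms = "bird\u{3000}frog tree";
    println!("{:?}", by_unicode(terms));
    println!("{:?}", by_ascii(terms));
}
```

With this input, split_ascii_whitespace keeps "bird" and "frog" joined in one part, because U+3000 is not an ASCII character.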

`Split` and parse

It is possible to split apart a string and parse each number in the string. This pattern is common when parsing text files that contain numbers.

Info We split the string on spaces, and then parse each resulting string in the iterator with the parse() function.

fn main() {
    let test = "123 456";
    let values = test.split(' ');
    for v in values {
        // Parse each part.
        let parsed: u32 = v.parse().unwrap();
        // Add 1 to show that we have a u32 value.
        println!("SPLIT PARSE: {} {}", parsed, parsed + 1)
    }
}
SPLIT PARSE: 123 124
SPLIT PARSE: 456 457
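
The unwrap() call panics if a part is not a valid number. This sketch shows two hedged alternatives, with helper names (parse_all, parse_valid) invented for illustration: one stops at the first error and returns it, the other silently skips parts that do not parse.

```rust
use std::num::ParseIntError;

// Hypothetical helper: fail on the first part that is not a u32.
fn parse_all(text: &str) -> Result<Vec<u32>, ParseIntError> {
    // Collecting into a Result stops at the first Err value.
    text.split(' ').map(|part| part.parse::<u32>()).collect()
}

// Hypothetical helper: keep only the parts that parse as u32.
fn parse_valid(text: &str) -> Vec<u32> {
    text.split(' ')
        .filter_map(|part| part.parse::<u32>().ok())
        .collect()
}

fn main() {
    println!("{:?}", parse_all("123 456"));
    println!("{:?}", parse_all("123 abc").is_err());
    println!("{:?}", parse_valid("123 abc 456"));
}
```

Which version fits depends on whether bad input should be reported or ignored.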

Collect

Suppose we want to get a Vec from the split function. The easiest way to do this is to call collect() with the turbofish syntax to specify the desired type.

fn main() {
    let source = String::from("a,b,c");
    // Use collect to get a vector from split.
    let letters = source.split(',').collect::<Vec<&str>>();
    println!("{:#?}", letters);
}
[
    "a",
    "b",
    "c",
]
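
The target type can also be given with a type annotation on the variable. And since the collected &str values borrow from the source string, we can map each part to a String when owned values are needed. This is a sketch with made-up helper names (borrowed_parts, owned_parts).

```rust
// Hypothetical helper: annotate the variable instead of the collect call.
fn borrowed_parts(source: &str) -> Vec<&str> {
    let parts: Vec<&str> = source.split(',').collect();
    parts
}

// Hypothetical helper: copy each part into an owned String.
fn owned_parts(source: &str) -> Vec<String> {
    source.split(',').map(String::from).collect()
}

fn main() {
    let source = String::from("a,b,c");
    println!("{:?}", borrowed_parts(&source));
    println!("{:?}", owned_parts(&source));
}
```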

Read file, split

Suppose we have a file of key-value pairs, with a key and value separated by an equal sign on each line. With Rust we can parse this file into a HashMap of string keys.

Start We open the file with File::open() and then loop over the lines() in the file. Then we split() each line.

Detail We collect the result of split, and then place the left side as the key, and the right side as the value in the HashMap.

Finally We get a key from the HashMap, which was populated by the file we just read in. The file text is shown in the example.

use std::io::{BufRead, BufReader};
use std::fs::File;
use std::collections::HashMap;

fn main() {
    // Open file of key-value pairs.
    let file = File::open("/Users/sam/example.txt").unwrap();
    let reader = BufReader::new(file);
    let mut hash: HashMap<String, String> = HashMap::new();

    // Read and parse file.
    for line in reader.lines() {
        let line_inner = line.unwrap();
        let values: Vec<&str> = line_inner.split('=').collect();
        if values.len() == 2 {
            hash.insert(values[0].to_string(), values[1].to_string());
        }
    }

    // Get value from file.
    let cat = hash.get("cat");
    if let Some(value) = &cat {
        println!("VALUE FOUND: {}", value);
    }
}
VALUE FOUND: orange

bird=blue
cat=orange
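
To try the parsing logic without a file on disk, the loop can be moved into a function that accepts any buffered reader. This is only a sketch: the function name parse_pairs is made up, the in-memory Cursor stands in for the file, and stopping at the first unreadable line is an assumption of this example.

```rust
use std::collections::HashMap;
use std::io::{BufRead, Cursor};

// Hypothetical helper: same split logic as above, for any BufRead source.
fn parse_pairs<R: BufRead>(reader: R) -> HashMap<String, String> {
    let mut hash = HashMap::new();
    for line in reader.lines() {
        // Assumption for this sketch: stop at the first read error.
        let line_inner = match line {
            Ok(text) => text,
            Err(_) => break,
        };
        let values: Vec<&str> = line_inner.split('=').collect();
        // Lines without exactly one equal sign are skipped.
        if values.len() == 2 {
            hash.insert(values[0].to_string(), values[1].to_string());
        }
    }
    hash
}

fn main() {
    // In-memory text with the same contents as the example file.
    let text = "bird=blue\ncat=orange\n";
    let hash = parse_pairs(Cursor::new(text));
    if let Some(value) = hash.get("cat") {
        println!("VALUE FOUND: {}", value);
    }
}
```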

`Split` once

Suppose we have a string and it has one separator, and we want to split apart the two sides of the string. This can be done with a split_once call.

And We can avoid complicated collect() method calls or loops. The split_once function returns an Option containing a tuple pair, so we unwrap it and destructure the two sides.

Tip Many uses of split() can be replaced with split_once(), so it is a good function to know.

fn main() {
    let value = "left:right";
    // Get left and right from value.
    let (left, right) = value.split_once(":").unwrap();
    println!("left = {} right = {}", left, right);
}
left = left right = right
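
Because split_once returns None when the separator is missing, unwrap() would panic on such input. A minimal sketch that handles both cases with match follows; the describe function is a made-up name for illustration. It also shows that only the first separator is used.

```rust
// Hypothetical helper: report both sides, or note a missing separator.
fn describe(value: &str) -> String {
    match value.split_once(':') {
        Some((left, right)) => format!("left = {} right = {}", left, right),
        None => String::from("no separator"),
    }
}

fn main() {
    println!("{}", describe("left:right"));
    println!("{}", describe("plain"));
    // Only the first colon splits; the rest stays in the right side.
    println!("{}", describe("a:b:c"));
}
```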

Collect benchmark

Often the split function is called with a following collect call. This is sometimes needlessly inefficient—we can avoid the collect.

Version 1 This version of the code calls split, and then calls collect to get a vector.

Version 2 Here we call split, but then directly use the result of the split call in a for-loop.

Result In the run shown, avoiding collect was about 3 times faster (19 ms versus 62 ms), since collect allocates a new Vec on every iteration. When possible, use the result of split directly, as in a for-loop.

use std::time::*;

fn main() {
    let source = String::from("a,b,c");
    let t0 = Instant::now();

    // Version 1: call collect after split.
    for _ in 0..1000000 {
        let letters = source.split(',').collect::<Vec<&str>>();
        for _ in &letters {
        }
    }
    println!("{}", t0.elapsed().as_millis());

    // Version 2: avoid collect after split.
    let t1 = Instant::now();
    for _ in 0..1000000 {
        let letters = source.split(',');
        for _ in letters {
        }
    }
    println!("{}", t1.elapsed().as_millis());
}
62 ms    split, collect
19 ms    split
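
The same idea applies outside of loops. As a small sketch (both function names are made up for illustration), counting the parts does not need a Vec at all, because the iterator returned by split can be consumed directly.

```rust
// Hypothetical helper: builds a Vec only to read its length.
fn count_with_collect(source: &str) -> usize {
    let parts: Vec<&str> = source.split(',').collect();
    parts.len()
}

// Hypothetical helper: counts parts straight from the iterator.
fn count_without_collect(source: &str) -> usize {
    source.split(',').count()
}

fn main() {
    let source = "a,b,c";
    println!("{}", count_with_collect(source));
    println!("{}", count_without_collect(source));
}
```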

The split function is often used with Vec and string arrays. We can pass functions (or closures) to split() for more complex behavior, and an iterator is returned.
