`Split`

Strings in Python often store many pieces of data. In a comma-separated format, these parts are divided with commas. A space is another common delimiter.

Method details

With split we extract string parts. Often files must be read and their lines split; this is done with readlines() and split.

First example

Here we handle a string that contains fields separated by commas. We call split() with a single comma string argument.

Info The split method with a string argument separates strings based on the specified delimiter.

Result We loop over the resulting list with a for-loop and print each value to the console.

# Input string.
s = "lowercase a,uppercase A,lowercase z"

# Separate on comma.
values = s.split(",")

# Loop and print each string.
for value in values:
    print(value)

lowercase a
uppercase A
lowercase z
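
Two details of the string argument are worth seeing in a small sketch: the separator can be longer than one character, and adjacent separators produce empty strings. The sample strings below are made up for illustration.

```python
# Hypothetical sample data, not taken from the example above.
row = "a, b, c"
gaps = "a,,c"

# The separator may be more than one character long.
fields = row.split(", ")
print(fields)

# Adjacent separators yield an empty string between them.
sparse = gaps.split(",")
print(sparse)
```

Empty strings in the result are often filtered out afterwards, depending on whether a missing field is meaningful in the data.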

No arguments

Split() can be called with no argument. In this case, split uses whitespace as the delimiter: any run of spaces, tabs or newlines counts as one separator, and leading and trailing whitespace is ignored.

# Input string.
# ... Irregular number of spaces between words.
s = "One two   three"

# Call split with no arguments.
words = s.split()

# Display results.
for word in words:
    print(word)

One
two
three
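
The difference between split() and split(" ") shows up once the input has irregular whitespace. This is a minimal sketch with invented sample strings.

```python
# Hypothetical sample with a tab, a newline and repeated spaces.
text = "  One\ttwo  \n three "

# No argument: each run of whitespace is one separator, edges are ignored.
by_whitespace = text.split()
print(by_whitespace)

# Explicit space: every single space is a separator, so an empty
# string appears between the two adjacent spaces.
by_space = "One  two".split(" ")
print(by_space)
```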

CSV file

This kind of file contains lines of text. It has values separated by commas. These files can be parsed with the split method.

Detail We combine the open, readlines and strip methods. The path passed to open must be changed to point to a file that exists on your system.

Info This CSV parser splits each line of text at the commas. It loops and displays the original data and the extracted values.

manhattan,the bronx
brooklyn,queens
staten island

# Open this file. The raw string keeps the backslash in the path literal.
with open(r"C:\perls.txt", "r") as f:

    # Loop over each line in the file.
    for line in f.readlines():

        # Strip the line to remove whitespace.
        line = line.strip()

        # Display the line.
        print(line)

        # Split the line.
        parts = line.split(",")

        # Display each part of the line, indented.
        for part in parts:
            print("   ", part)

manhattan,the bronx
    manhattan
    the bronx
brooklyn,queens
    brooklyn
    queens
staten island
    staten island
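
The same parsing can be tried without a file on disk. The sketch below assumes an in-memory io.StringIO object as a stand-in for the opened file, and collects the split lines in a list instead of printing them.

```python
import io

# Stand-in for the file on disk, so the sketch runs anywhere.
f = io.StringIO("manhattan,the bronx\nbrooklyn,queens\nstaten island\n")

# Iterating a file object yields one line at a time,
# so calling readlines() first is optional.
rows = [line.strip().split(",") for line in f]
print(rows)
```

Each inner list holds the values from one line, which is convenient when the data is needed later rather than displayed immediately.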

Rsplit

Usually rsplit() is the same as split. The only difference occurs when the second argument (maxsplit) is specified. This limits the number of times a string is separated, and rsplit counts those splits from the right.

So When we specify 3, we split off only three times from the right. This is the maximum number of splits that occur.

Tip The first element in the result list contains all the remaining, non-separated string values. This is unprocessed data.

# Data.
s = "orange;yellow;blue;tan;red;green"

# Separate on semicolon: split from the right, only split three.
colors = s.rsplit(";", 3)

# Loop and print.
for color in colors:
    print(color)

orange;yellow;blue
tan
red
green
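
To see where the two methods diverge, this sketch runs split and rsplit on the same string with the same limit. The file name in the second part is a hypothetical value chosen for illustration.

```python
s = "orange;yellow;blue;tan;red;green"

# With a limit of 3, split leaves the unprocessed remainder at the end...
from_left = s.split(";", 3)
print(from_left)

# ...while rsplit leaves it at the start.
from_right = s.rsplit(";", 3)
print(from_right)

# A common use: separate only the last extension of a (hypothetical) file name.
name_parts = "archive.tar.gz".rsplit(".", 1)
print(name_parts)
```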

Splitlines

Lines of text can be separated with Windows or UNIX newline sequences. This makes splitting on lines complex. The splitlines() method helps here.

And We split the 3-line string literal into three separate strings with splitlines, then print each one between square brackets.

# Data.
s = """This string
has many
lines."""

# Split on line breaks.
lines = s.splitlines()

# Loop and display each line.
for line in lines:
    print("[" + line + "]")

[This string]
[has many]
[lines.]
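
The benefit over split("\n") appears when Windows and UNIX line endings are mixed. The sketch below uses an invented string; keepends is an optional argument of splitlines that keeps the line endings in the results.

```python
# Hypothetical text mixing Windows (\r\n) and UNIX (\n) line endings.
mixed = "first\r\nsecond\nthird"

# splitlines recognizes both sequences and drops them.
lines = mixed.splitlines()
print(lines)

# With keepends=True the line endings are kept in each result.
with_endings = mixed.splitlines(keepends=True)
print(with_endings)

# Splitting on "\n" alone leaves a stray "\r" on the first line.
by_newline = mixed.split("\n")
print(by_newline)
```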

Partition

This method is similar to split(). It separates a string only on the first (leftmost) delimiter. It then returns a tuple containing its result data.

Info The tuple has 3 parts: the left part, the delimiter itself, and the remaining string data. If the delimiter is not found, the tuple holds the whole string followed by two empty strings.

Also The rpartition method acts from the right of the string, rather than the left. There is no separate "lpartition" method, because partition already acts from the left.

# Input data.
s = "123 Oak Street, New York"

# Partition on first space.
t = s.partition(" ")

# Print tuple contents.
print(t)

# Print first element.
print("First element:", t[0])

('123', ' ', 'Oak Street, New York')
First element: 123
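
A brief sketch of rpartition on the same address string, plus what both methods return when the separator does not occur. The semicolon is simply a character chosen because it is absent from the input.

```python
s = "123 Oak Street, New York"

# rpartition separates on the last (rightmost) space.
last = s.rpartition(" ")
print(last)

# Separator not found: partition puts the whole string first...
missing = s.partition(";")
print(missing)

# ...and rpartition puts the whole string last.
missing_r = s.rpartition(";")
print(missing_r)
```

Checking whether the middle element is empty is a simple way to tell if the separator was present.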

Partition loop

The result tuple of partition() makes it easy to use in a loop. We can continually partition a string, shortening the source data as we go along.

Here In this example, we continue to consume each word in a source string. We read in each word at a time.

Detail We use the while-loop to continue as long as further data exists in the input string.

# The input string.
s = "Dot Net Perls website"

# Continue while the string has data.
while len(s) > 0:

    # Partition on first space.
    t = s.partition(" ")

    # Display the partitioned part.
    print(t[0])
    print("    ", t)

    # Set string variable to non-partitioned part.
    s = t[2]

Dot
     ('Dot', ' ', 'Net Perls website')
Net
     ('Net', ' ', 'Perls website')
Perls
     ('Perls', ' ', 'website')
website
     ('website', '', '')

Handle numbers

A string contains numbers separated by a character. We can split the string, then convert each result to an integer with int.

Here We sum the integers in a string. The float built-in handles numbers with decimal places.

numbers = "100,200,50"

# Split apart the numbers.
values = numbers.split(",")

# Loop over strings and convert them to integers.
# ... Then sum them.
total = 0
for value in values:
    total += int(value)

print(total)

350
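
The loop above can also be written more compactly. This sketch uses the sum built-in with a generator expression, and shows float on a second, made-up string that contains decimal places.

```python
numbers = "100,200,50"

# Convert each part to an integer and sum in one expression.
total = sum(int(value) for value in numbers.split(","))
print(total)

# Hypothetical input with decimal places: use float instead of int.
decimals = "1.5,2.25,10"
total_float = sum(float(value) for value in decimals.split(","))
print(total_float)
```

Both int and float raise ValueError on text that is not a valid number, so untrusted input may need a try block around the conversion.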

Benchmark, arguments

Should we specify an argument to split() if we do not need one? Often an argument is not needed, and adding one can change behavior: split(" ") returns empty strings for repeated spaces, while split() does not.

Version 1 This version of the code uses split() with no arguments, which splits the string apart on spaces.

Version 2 This code uses the space argument. For this input, which contains only single spaces, it has the same logical effect as version 1.

Result In this run, split with no arguments was faster than split with a space argument: 0.653 s versus 0.746 s for one million calls.

import time

# Input data.
s = "This is a split performance test"

print(s.split())
print(s.split(" "))

# Time 1.
print(time.time())

# Version 1: default version.
i = 0
while i < 1000000:
    words = s.split()
    i += 1

# Time 2.
print(time.time())

# Version 2: explicit space version.
i = 0
while i < 1000000:
    words = s.split(" ")
    i += 1

# Time 3.
print(time.time())

['This', 'is', 'a', 'split', 'performance', 'test']
['This', 'is', 'a', 'split', 'performance', 'test']
1361813180.908
1361813181.561   split()    = 0.6530 s
1361813182.307   split(" ") = 0.7460 s
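
The same comparison can be sketched with the standard timeit module, which handles the loop and the timing. The iteration count here is an arbitrary choice, and the measured times will vary by machine and Python version, so no results are shown.

```python
import timeit

# Input data.
s = "This is a split performance test"

# Arbitrary number of repetitions for this sketch.
runs = 100_000

# Time each version; timeit returns the total elapsed seconds.
t_default = timeit.timeit(lambda: s.split(), number=runs)
t_space = timeit.timeit(lambda: s.split(" "), number=runs)

print(f"split()    = {t_default:.4f} s")
print(f'split(" ") = {t_space:.4f} s')
```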

CSV module

We do not need to use split() to manually parse CSV files. The csv module is available. It offers reader and writer objects, and dialects describe how a file is formatted. The Sniffer class in the module can detect the dialect of a file.
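
One reason to prefer the csv module is quoted fields that contain commas. The sketch below uses csv.reader on an in-memory io.StringIO object; the sample data is invented for illustration.

```python
import csv
import io

# Hypothetical CSV text with a quoted field that contains a comma.
data = io.StringIO('name,borough\n"Smith, Jane",queens\n')

# csv.reader understands the quotes and keeps the field whole.
rows = list(csv.reader(data))
print(rows)

# A plain split cuts the quoted field in two.
naive = '"Smith, Jane",queens'.split(",")
print(naive)
```

When reading a real file, the csv documentation recommends opening it with newline="" so that line endings inside quoted fields are handled correctly.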

Strip

Often strings must be processed in some way before splitting them. To remove leading and trailing whitespace, use the strip method. The lstrip and rstrip methods remove whitespace from only the left or only the right side.
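
A minimal sketch of combining strip with split, assuming a made-up input line with stray spaces around the commas and a trailing newline.

```python
# Hypothetical input line with uneven spacing.
line = "  apple , banana ,cherry \n"

# Strip the line, split on commas, then strip each part.
parts = [part.strip() for part in line.strip().split(",")]
print(parts)

# lstrip and rstrip each remove whitespace from one side only.
left_only = "  x  ".lstrip()
right_only = "  x  ".rstrip()
```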

Summary

Split() helps with processing many text files and input data, as from websites or databases. We benchmarked split, and we explored related methods like partition.