Bytes

For low-level tasks in Python, we must directly access bytes. A byte can store 0 through 255. We use the bytes and bytearray built-ins.

In many ways, bytes objects are similar to strings. Many similar methods are available. We should know that bytearray is mutable, and "bytes" is not.

Bytearray example

To begin, we use a list and a bytearray together. We can use the bytearray built-in, and pass it a list instance to create a bytearray.

Step 1 We create a list of 6 integers. Each number in the list is between 0 and 255 (inclusive).

Step 2 Here we create a bytearray from the list—we use the bytearray built-in function.

Step 3 We modify the first 2 elements in the bytearray—this cannot be done with a bytes object. Then we loop over the result.

# Step 1: create list of integers.
# ... All are between 0 and 255 inclusive.
elements = [0, 200, 50, 25, 10, 255]

# Step 2: create bytearray from list.
values = bytearray(elements)

# Step 3: modify elements, and loop over the bytearray.
values[0] = 5
values[1] = 0
for value in values:
    print("VALUE:", value)
VALUE: 5
VALUE: 0
VALUE: 50
VALUE: 25
VALUE: 10
VALUE: 255

Bytes example

We now consider bytes—this is similar to bytearray, but the elements cannot be changed. The bytes object is an immutable array of bytes.

Tip Bytearray, bytes and memoryview act upon the buffer protocol. They all share similar syntax.

elements = [5, 10, 100]

# Create immutable bytes object.
data = bytes(elements)

# Loop over bytes.
for item in data:
    print("BYTE:", item)
BYTE: 5
BYTE: 10
BYTE: 100

Error

Here we see an error when using bytes. We try to modify the first element of a bytes object. Python complains—the "object does not support item assignment."

data = bytes([10, 20, 30, 40])

# We can read values from a bytes object.
print(data[0])

# We cannot assign elements.
data[0] = 1
10
Traceback (most recent call last):
  File "/Users/sam/Documents/test.py", line 9, in <module>
    data[0] = 1
TypeError: 'bytes' object does not support item assignment

Len

We can get the length of a bytearray or bytes object with the len built-in. Here we use bytearray in the same way as a string or a list.

# Create bytearray from some data.
values = bytearray([6, 7, 60, 70, 0])

# It has 5 elements.
# ... The len is 5.
print("Element count:", len(values))
Element count: 5

Literals

Bytes and bytearray objects can be created with a special string literal syntax. We prefix the literals with a "b." This prefix is required.

Tip Methods on bytes and bytearray objects require bytes arguments (such as "b" literals), even for string-like methods like replace(). A plain string argument causes a TypeError.

# Create bytes object from byte literal.
data = bytes(b"abc")
for value in data:
    print(value)

print()

# Create bytearray from byte literal.
arr = bytearray(b"abc")
for value in arr:
    print(value)
97
98
99

97
98
99

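Slice, bytearray

We can slice bytearrays. And because bytearray is mutable, we can use slices to change its contents. Here we assign an integer list to a slice. The list need not be the same length as the slice: 2 elements are replaced by 3, so the bytearray grows by 1.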

values = [5, 10, 15, 20]
arr = bytearray(values)

# Replace the first 2 elements with a 3-element list.
arr[0:2] = [100, 0, 0]

# The array is now modified.
for v in arr: print(v)
100
0
0
15
20
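
Slice assignment can also shrink a bytearray, not only grow it. The following is a minimal sketch of both cases; the variable names are illustrative and not part of the example above.

```python
arr = bytearray([5, 10, 15, 20])
before = len(arr)

# Replace a 2-element slice with 3 elements: the bytearray grows by one.
arr[0:2] = [100, 0, 0]
after_grow = len(arr)

# Replace a 3-element slice with an empty bytes literal: those elements are removed.
arr[0:3] = b""
remaining = list(arr)

print(before, after_grow, remaining)
```

The same slice syntax on a bytes object only reads data. Assigning to a slice of bytes raises a TypeError.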

Slice, bytes

A bytes object too supports slice syntax, but it is read-only. Here we get a slice of bytes (the first 2 elements) and loop over it.

Tip We can loop over a slice directly in the for-loop condition. The variable is not needed.

data = bytes(b"abc")

# Get a slice from the bytes object.
first_part = data[0:2]

# Display values from slice.
for element in first_part: print(element)
97
98

Count

Count loops through the bytes and counts instances matching our specified pattern. The argument must be a byte object, like a "b" string literal or a number between 0 and 255.

# Create a bytes object and a bytearray.
data = bytes(b"abc abc")
arr = bytearray(b"abc abc")

# The count method (from the buffer interface) works on both.
print(data.count(b"c"))
print(arr.count(b"c"))
2
2
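
The text above mentions that count also accepts a number between 0 and 255. A short sketch of that form, together with a multi-byte pattern and the optional start index (the variable names are illustrative):

```python
data = b"abc abc"

# Count by integer value: 99 is the ASCII code for "c".
count_int = data.count(99)

# Count a multi-byte pattern.
count_seq = data.count(b"abc")

# An optional start index restricts the search to data[4:].
count_from_4 = data.count(b"a", 4)

print(count_int, count_seq, count_from_4)
```

Passing an integer outside the range 0 to 255 raises a ValueError rather than returning 0.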

Find

This method returns the leftmost index of a matching sequence. Optionally we can specify a start index and an end index (as the second and third arguments).

data = bytes(b"python")

# This sequence is found.
index1 = data.find(b"on")
print(index1)

# This sequence is not present.
index2 = data.find(b"java")
print(index2)
4
-1
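
The optional start and end arguments are not shown in the example above. Here is a small sketch of how they might be used to find a second match, or to limit the search to a prefix. The sample data and names are made up for illustration.

```python
data = b"python python"

# Leftmost match in the whole object.
first = data.find(b"on")

# Start searching just past the first match to locate the next one.
second = data.find(b"on", first + 1)

# Search only data[0:5], which is b"pytho": no match, so -1 is returned.
limited = data.find(b"on", 0, 5)

print(first, second, limited)
```

The returned index is always relative to the whole object, not to the start argument.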

In operator

This tests for existence. We use "in" to see if an element exists within a bytes object. This is clearer than checking the result of find() when we only need to know whether a byte is present.

data = bytes([100, 20, 10, 200, 200])

# Test bytes object with "in" operator.
if 200 in data:
    print(True)

if 0 not in data:
    print(False)
True
False

Combine 2 bytearrays

As with lists and other sequences, we can combine 2 bytearrays (or bytes) with a plus. We can also combine 2 different types, such as a bytes object and a bytearray. The result has the type of the left operand.

left = bytearray(b"hello ")
right = bytearray(b"world")

# Combine 2 bytearray objects with plus.
both = left + right
print(both)
bytearray(b'hello world')
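
To illustrate mixing the two types, this sketch adds a bytes object and a bytearray in both orders and prints the resulting type names. The variable names are illustrative.

```python
left = b"hello "
right = bytearray(b"world")

# The result takes the type of the left operand.
bytes_first = left + right
bytearray_first = right + left

print(type(bytes_first).__name__, bytes_first)
print(type(bytearray_first).__name__, bytearray_first)
```

The contents compare equal to a bytes literal either way, so the difference matters mainly when the result must later be modified.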

Convert list

A list of bytes (numbers between 0 and 255 inclusive) can be converted into a bytearray with the constructor. To convert back into a list, use the list built-in constructor.

Tip Lists display in a more friendly way with the print function. So we might use this code to display bytearrays and bytes.

initial = [100, 255, 255, 0]
print(initial)

# Convert the list to a byte array.
b = bytearray(initial)
print(b)

# Convert back to a list.
result = list(b)
print(result)
[100, 255, 255, 0]
bytearray(b'd\xff\xff\x00')
[100, 255, 255, 0]

Convert string

A bytearray can be created from a string. The encoding (like "ascii") is specified as the second argument in the bytearray constructor.

Detail To convert from a bytearray back into a string, the decode method is needed.

# Create a bytearray from a string with ASCII encoding.
arr = bytearray("abc", "ascii")
print(arr)

# Convert bytearray back into a string.
result = arr.decode("ascii")
print(result)
bytearray(b'abc')
abc
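
The choice of encoding matters once a string contains non-ASCII characters. This sketch uses an assumed sample string and "utf-8" instead of "ascii" to show that one character can become more than one byte, and that the "ascii" codec rejects such text.

```python
text = "caf\u00e9"  # "café": the last character is outside ASCII

# UTF-8 encodes the accented character as 2 bytes.
arr = bytearray(text, "utf-8")
char_count = len(text)
byte_count = len(arr)

# Decoding with the same encoding restores the original string.
round_trip = arr.decode("utf-8")

# The "ascii" codec cannot represent the accented character.
try:
    bytearray(text, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(char_count, byte_count, round_trip == text, ascii_ok)
```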

Append, del, insert

A bytearray supports many of the same operations as a list. We can append values. We can delete a value or a range of values with del. And we can insert a value.

# Create bytearray and append integers as bytes.
values = bytearray()
values.append(0)
values.append(1)
values.append(2)
print(values)

# Delete the first element.
del values[0:1]
print(values)

# Insert at index 1 the value 3.
values.insert(1, 3)
print(values)
bytearray(b'\x00\x01\x02')
bytearray(b'\x01\x02')
bytearray(b'\x01\x03\x02')

ValueError

Numbers stored in a bytearray or bytes object must be between 0 and 255 inclusive. If we try to create an object with an out-of-range number, or insert one into a bytearray, we will receive a ValueError.

# This does not work.
values = bytes([3000, 4000, 5000])
print("Not reached")
Traceback (most recent call last):
  File "/Users/sam/Documents/test.py", line 4, in <module>
    values = bytes([3000, 4000, 5000])
ValueError: byte must be in range(0, 256)
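
When the input numbers come from an untrusted source, the ValueError can be caught instead of ending the program. The helper below, to_bytes_checked, is a hypothetical name used only for this sketch.

```python
def to_bytes_checked(numbers):
    """Hypothetical helper: return a bytes object, or None if any value is out of range."""
    try:
        return bytes(numbers)
    except ValueError:
        # Raised for values below 0 or above 255.
        return None

ok = to_bytes_checked([0, 128, 255])
too_large = to_bytes_checked([3000, 4000, 5000])
negative = to_bytes_checked([-1])

print(ok, too_large, negative)
```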

Replace

Bytes and bytearray objects support string-like methods. We can use replace() as on a string. The arguments must be bytes objects; here we use "b" literals.

value = b"aaabbb"

# Use bytes replace method.
result = value.replace(b"bbb", b"ccc")
print(result)
b'aaaccc'

Compare

A "b" literal is a bytes object. We can compare a bytearray or a bytes object with this kind of constant. To compare bytes objects, we use 2 equals signs.

Note Two equals signs compare the individual byte contents, not the identity of the objects.

# Create a bytes object with a literal, without calling the bytes built-in.
value1 = b"desktop"
print(value1)

# Use the bytes built-in.
value2 = bytes(b"desktop")
print(value2)

# Compare 2 bytes objects.
if value1 == value2:
    print(True)
b'desktop'
b'desktop'
True

Start, end

We can handle bytes objects much like strings. Common methods like startswith and endswith are included. These check the beginning and end parts.

Info The argument to startswith and endswith must be a bytes object. We can use the handy "b" prefix.

value = b"users"

# Compare bytes with startswith and endswith.
if value.startswith(b"use"):
    print(True)

if value.endswith(b"s"):
    print(True)
True
True

Split, join

The split and join methods are implemented on bytes objects. Here we handle a simple CSV string in bytes. We separate values based on a comma char.

# A bytes object with comma-separated values.
data = b"cat,dog,fish,bird,true"

# Split on comma-byte.
elements = data.split(b",")

# Print length and list contents.
print(len(elements))
print(elements)

# Combine bytes objects into a single bytes object.
result = b",".join(elements)
print(result)
5
[b'cat', b'dog', b'fish', b'bird', b'true']
b'cat,dog,fish,bird,true'

Memoryview

This is an abstraction that provides buffer interface methods. We can create a memoryview from a bytes object, a bytearray or another type like an array.

Tip With memoryview we can separate our code that uses the buffer interface from the actual data type. It is an abstraction.

view = memoryview(b"abc")

# Print the first element.
print(view[0])

# Print the element count.
print(len(view))

# Convert to a list.
print(view.tolist())
97
3
[97, 98, 99]
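
A memoryview over a bytearray can also be used to modify the underlying data without copying it. This is a minimal sketch with made-up data and names.

```python
buffer = bytearray(b"abcdef")
view = memoryview(buffer)

# Slicing a memoryview creates another view, not a copy of the bytes.
part = view[2:4]

# Writing through the view changes the original bytearray.
part[0] = ord("X")

print(buffer)
print(bytes(part))
```

A memoryview created from a bytes object is read-only, so the same assignment would raise a TypeError there.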

Benchmark, bytearray

Suppose we want to append 255 values (0 through 254) to a list. Bytearray is more complex to handle, and it does not support large numeric values. But it may help performance.

Version 1 This version of the code appends integers to a list collection in a nested loop.

Version 2 In this code, we append integers to a bytearray in a nested loop. The same values are used as in version 1.

Result Bytearray is faster in this test. So here we both reduce memory use and reduce the time required with bytearray over list.

import time

print(time.time())

# Version 1: append to list.
for i in range(0, 1000000):
    x = list()
    for v in range(0, 255):
        x.append(v)

print(time.time())

# Version 2: append to bytearray.
for i in range(0, 1000000):
    x = bytearray()
    for v in range(0, 255):
        x.append(v)

print(time.time())
1411859925.29213
1411859927.673053    list append:      2.38 s
1411859929.463818    bytearray append: 1.79 s  [faster]
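
As an alternative way to take the same measurement, the standard timeit module can time each version separately. This is only a sketch: the function names and the repetition count of 1000 are assumptions chosen to keep the run short, and the timings will differ from the figures above.

```python
import timeit

def fill_list():
    # Version 1: append integers to a list.
    x = []
    for v in range(255):
        x.append(v)
    return x

def fill_bytearray():
    # Version 2: append the same integers to a bytearray.
    x = bytearray()
    for v in range(255):
        x.append(v)
    return x

# Each call below runs the function 1000 times and returns total seconds.
list_seconds = timeit.timeit(fill_list, number=1000)
bytearray_seconds = timeit.timeit(fill_bytearray, number=1000)

print("list append:     ", list_seconds)
print("bytearray append:", bytearray_seconds)
```

Results depend on the Python version and the machine, so it is worth running the comparison on the target system before relying on it.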

Read bytes from file

A file can be read into a bytes object. We must specify the "b" mode—to read a file as bytes, we use the argument "rb."
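
Here is a minimal sketch of reading a file in "rb" mode. It first writes a small temporary file so that the example is self-contained. The temporary file and the variable names are assumptions for illustration only.

```python
import os
import tempfile

# Write a few bytes to a temporary file so there is something to read.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes([5, 10, 100]))
    path = f.name

# Open with "rb" so that read() returns a bytes object, not a string.
with open(path, "rb") as f:
    data = f.read()

# Clean up the temporary file.
os.remove(path)

print(type(data).__name__, list(data))
```

To modify the contents after reading, we can pass the result to the bytearray built-in.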

Notes, performance

Lists and strings can become inefficient quickly. Where we represent data in bytes, numbers from 0 to 255, buffer types (like bytes and bytearray) are ideal.
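
One rough way to see the memory difference is sys.getsizeof. This sketch assumes CPython. Note that getsizeof reports only the container itself, so the list figure does not include the integer objects it refers to.

```python
import sys

numbers = list(range(256))

# Size of each container holding the same 256 values.
list_size = sys.getsizeof(numbers)
bytearray_size = sys.getsizeof(bytearray(numbers))
bytes_size = sys.getsizeof(bytes(numbers))

print("list:     ", list_size)
print("bytearray:", bytearray_size)
print("bytes:    ", bytes_size)
```

The exact numbers vary by Python version and platform, but the list stores a pointer per element while the buffer types store one byte per element.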

Summary

Bytes and bytearrays are an efficient, byte-based form of strings. They have many of the same methods as strings, but can also be used as lists.