For low-level tasks in Python, we must directly access bytes. A byte can store a value from 0 through 255. We use the bytes and bytearray built-ins.

In many ways, bytes objects are similar to strings, and many of the same methods are available. The key difference between the two types: bytearray is mutable, and bytes is not.
To begin, we use a list and a bytearray together. We can pass a list instance to the bytearray built-in to create a bytearray.

Step 1: We create a list of integers, all between 0 and 255 inclusive.
Step 2: We create a bytearray from the list with the bytearray built-in function.
Step 3: We modify the first 2 elements of the bytearray; this cannot be done with a bytes object. Then we loop over the result.

    # Step 1: create list of integers.
    # ... All are between 0 and 255 inclusive.
    elements = [0, 200, 50, 25, 10, 255]

    # Step 2: create bytearray from list.
    values = bytearray(elements)

    # Step 3: modify elements, and loop over the bytearray.
    values[0] = 5
    values[1] = 0
    for value in values:
        print("VALUE:", value)

Output:

    VALUE: 5
    VALUE: 0
    VALUE: 50
    VALUE: 25
    VALUE: 10
    VALUE: 255
We now consider bytes. This is similar to bytearray, but the elements cannot be changed. The bytes object is an immutable array of bytes.

    elements = [5, 10, 100]

    # Create immutable bytes object.
    data = bytes(elements)

    # Loop over bytes.
    for item in data:
        print("BYTE:", item)

Output:

    BYTE: 5
    BYTE: 10
    BYTE: 100
Here we see an error when using bytes. We try to modify the first element of a bytes object. Python complains that the "object does not support item assignment."

    data = bytes([10, 20, 30, 40])

    # We can read values from a bytes object.
    print(data[0])

    # We cannot assign elements.
    data[0] = 1

Output:

    10
    Traceback (most recent call last):
      File "/Users/sam/Documents/test.py", line 9, in <module>
        data[0] = 1
    TypeError: 'bytes' object does not support item assignment
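When a bytes object needs a small change, one common approach is to copy it into a bytearray, modify the copy, and convert it back. The following is a minimal sketch; the variable names editable and updated are illustrative and not part of the original example.

```python
# Start with an immutable bytes object.
data = bytes([10, 20, 30, 40])

# Copy the contents into a mutable bytearray and change the first element.
editable = bytearray(data)
editable[0] = 1

# Convert back to bytes. The original object is left unchanged.
updated = bytes(editable)

print(list(data))
print(list(updated))
```

The copy costs time and memory proportional to the data size, so for data that changes often it may be simpler to keep a bytearray from the start.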
We can get the length of a bytearray or bytes object with the len built-in. Here we use bytearray in the same way as a string or a list.

    # Create bytearray from some data.
    values = bytearray([6, 7, 60, 70, 0])

    # It has 5 elements.
    # ... The len is 5.
    print("Element count:", len(values))

Output:

    Element count: 5
Bytes and bytearray objects can be created with a special string literal syntax. We prefix the literal with a "b". This prefix is required.

Buffer protocol methods require byte-prefixed string literals, even for arguments to methods like replace().

    # Create bytes object from byte literal.
    data = bytes(b"abc")
    for value in data:
        print(value)

    print()

    # Create bytearray from byte literal.
    arr = bytearray(b"abc")
    for value in arr:
        print(value)

Output:

    97
    98
    99

    97
    98
    99
We can slice bytearrays. And because bytearray is mutable, we can use slices to change its contents. Here we assign a list of integers to a slice. The list has 3 elements and the slice covers 2, so the bytearray grows by 1 element.

    values = [5, 10, 15, 20]
    arr = bytearray(values)

    # Replace the first 2 elements with a 3-element list.
    arr[0:2] = [100, 0, 0]

    # The array is now modified.
    for v in arr:
        print(v)

Output:

    100
    0
    0
    15
    20
A bytes object supports slice syntax too, but it is read-only. Here we get a slice of bytes (the first 2 elements) and loop over it. The slice could also be placed directly in the for-loop; the variable is not needed.

    data = bytes(b"abc")

    # Get a slice from the bytes object.
    first_part = data[0:2]

    # Display values from slice.
    for element in first_part:
        print(element)

Output:

    97
    98
Count

Count loops through the bytes and counts instances matching our specified pattern. The argument must be a bytes-like object, such as a "b" string literal, or a number between 0 and 255.

    # Create a bytes object and a bytearray.
    data = bytes(b"abc abc")
    arr = bytearray(b"abc abc")

    # The count method (from the buffer interface) works on both.
    print(data.count(b"c"))
    print(arr.count(b"c"))

Output:

    2
    2
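The text notes that count also accepts a number between 0 and 255. A short sketch of that form follows; the variable names are illustrative, and the integer form assumes Python 3.3 or later.

```python
data = b"abc abc"

# 99 is the byte value of the letter "c".
by_int = data.count(99)

# The same count using a "b" literal.
by_literal = data.count(b"c")

# A pattern of several bytes is counted as a unit.
pairs = data.count(b"ab")

print(by_int, by_literal, pairs)
```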
Find
This method returns the leftmost index of a matching sequence, or -1 if the sequence is not found. Optionally we can specify a start index and an end index (as the second and third arguments).

    data = bytes(b"python")

    # This sequence is found.
    index1 = data.find(b"on")
    print(index1)

    # This sequence is not present.
    index2 = data.find(b"java")
    print(index2)

Output:

    4
    -1
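The optional start and end arguments mentioned above can be sketched as follows. The sample data and variable names are made up for illustration.

```python
data = b"python python"

# Without extra arguments, find returns the leftmost match.
first = data.find(b"on")

# Passing a start index skips past the first match.
second = data.find(b"on", first + 1)

# With an end index, the match must fit entirely before that index.
limited = data.find(b"on", 0, 5)

print(first, second, limited)
```

Here the third call returns -1 because the match at index 4 would need the byte at index 5, which the end argument excludes.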
The in operator tests for existence. We use "in" to see if an element exists within a bytes object. This is a clear way to check whether a byte exists in our object.

    data = bytes([100, 20, 10, 200, 200])

    # Test bytes object with "in" operator.
    if 200 in data:
        print(True)
    if 0 not in data:
        print(False)

Output:

    True
    False
As with lists and other sequences, we can combine 2 bytearrays (or bytes) with a plus. In my tests, it did not matter if we combined 2 different types: a bytes object and a bytearray can be added together, and the result has the type of the left operand.

    left = bytearray(b"hello ")
    right = bytearray(b"world")

    # Combine 2 bytearray objects with plus.
    both = left + right
    print(both)

Output:

    bytearray(b'hello world')
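To illustrate combining 2 different types, here is a small sketch. The variable names are illustrative; the point is only to show which type comes out of each addition.

```python
left = b"hello "
right = bytearray(b"world")

# bytes + bytearray gives a bytes object.
a = left + right

# bytearray + bytes gives a bytearray.
b = right + left

print(type(a).__name__, a)
print(type(b).__name__, b)
```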
A list of bytes (numbers from 0 through 255) can be converted into a bytearray with the constructor. To convert back into a list, we use the list built-in constructor.

    initial = [100, 255, 255, 0]
    print(initial)

    # Convert the list to a byte array.
    b = bytearray(initial)
    print(b)

    # Convert back to a list.
    result = list(b)
    print(result)

Output:

    [100, 255, 255, 0]
    bytearray(b'd\xff\xff\x00')
    [100, 255, 255, 0]
A bytearray can be created from a string. The encoding (like "ascii") is specified as the second argument in the bytearray constructor. To convert a bytearray back into a string, the decode method is needed.

    # Create a bytearray from a string with ASCII encoding.
    arr = bytearray("abc", "ascii")
    print(arr)

    # Convert bytearray back into a string.
    result = arr.decode("ascii")
    print(result)

Output:

    bytearray(b'abc')
    abc
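The choice of encoding starts to matter once a string contains characters outside ASCII. The sketch below uses a sample word chosen for illustration and shows that "ascii" rejects it while "utf-8" encodes it as more bytes than characters.

```python
# "caf\u00e9" ends with an accented e, which is not an ASCII character.
text = "caf\u00e9"

# The ASCII codec cannot encode the last character.
try:
    bytearray(text, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 encodes the accented character as 2 bytes.
encoded = bytearray(text, "utf-8")
decoded = encoded.decode("utf-8")

print(ascii_ok, len(text), len(encoded), decoded == text)
```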
Append, del, insert

A bytearray supports many of the same operations as a list. We can append values. We can delete a value or a range of values with del. And we can insert a value.

    # Create bytearray and append integers as bytes.
    values = bytearray()
    values.append(0)
    values.append(1)
    values.append(2)
    print(values)

    # Delete the first element.
    del values[0:1]
    print(values)

    # Insert at index 1 the value 3.
    values.insert(1, 3)
    print(values)

Output:

    bytearray(b'\x00\x01\x02')
    bytearray(b'\x01\x02')
    bytearray(b'\x01\x03\x02')
ValueError
Numbers inserted into a bytearray or bytes object must be between 0 and 255 inclusive. If we try to insert an out-of-range number, we will receive a ValueError.

    # This does not work.
    values = bytes([3000, 4000, 5000])
    print("Not reached")

Output:

    Traceback (most recent call last):
      File "/Users/sam/Documents/test.py", line 4, in <module>
        values = bytes([3000, 4000, 5000])
    ValueError: byte must be in range(0, 256)
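If the input numbers come from an untrusted source, the ValueError can be caught instead of ending the program. This is a minimal sketch; to_bytes_checked is a hypothetical helper name, not a built-in.

```python
def to_bytes_checked(numbers):
    # Hypothetical helper: returns None when any number is out of range.
    try:
        return bytes(numbers)
    except ValueError:
        return None

good = to_bytes_checked([0, 128, 255])
bad = to_bytes_checked([3000, 4000, 5000])
print(good, bad)

# The same range check applies when appending to a bytearray.
arr = bytearray()
try:
    arr.append(256)
    appended = True
except ValueError:
    appended = False
print(appended, len(arr))
```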
Replace
The buffer protocol supports string-like methods. We can use replace() as on a string. The arguments must be bytes objects; here we use "b" literals.

    value = b"aaabbb"

    # Use bytes replace method.
    result = value.replace(b"bbb", b"ccc")
    print(result)

Output:

    b'aaaccc'
A "b" literal is a bytes object. We can compare a bytearray or a bytes object with this kind of constant. To compare bytes objects, we use 2 equals signs. This compares the byte contents, not the identity of the objects.

    # Create a bytes object without calling the bytes built-in.
    value1 = b"desktop"
    print(value1)

    # Use the bytes built-in.
    value2 = bytes(b"desktop")
    print(value2)

    # Compare 2 bytes objects.
    if value1 == value2:
        print(True)

Output:

    b'desktop'
    b'desktop'
    True
We can handle bytes objects much like strings. Common methods like startswith and endswith are included. These check the beginning and end parts. The argument to startswith and endswith must be a bytes object. We can use the handy "b" prefix.

    value = b"users"

    # Compare bytes with startswith and endswith.
    if value.startswith(b"use"):
        print(True)
    if value.endswith(b"s"):
        print(True)

Output:

    True
    True
Split, join

The split and join methods are implemented on bytes objects. Here we handle a simple CSV string in bytes. We separate values based on a comma char.

    # A bytes object with comma-separated values.
    data = b"cat,dog,fish,bird,true"

    # Split on comma-byte.
    elements = data.split(b",")

    # Print length and list contents.
    print(len(elements))
    print(elements)

    # Combine bytes objects into a single bytes object.
    result = b",".join(elements)
    print(result)

Output:

    5
    [b'cat', b'dog', b'fish', b'bird', b'true']
    b'cat,dog,fish,bird,true'
A memoryview is an abstraction that provides buffer interface methods. We can create a memoryview from a bytes object, a bytearray or another type like an array. The memoryview separates the buffer interface from the actual data type. In Python 3.3 and later, indexing a memoryview of bytes returns an integer.

    view = memoryview(b"abc")

    # Print the first element.
    print(view[0])

    # Print the element count.
    print(len(view))

    # Convert to a list.
    print(view.tolist())

Output:

    97
    3
    [97, 98, 99]
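One practical use of a memoryview is to work on part of a bytearray without copying it. The sketch below assumes Python 3.3 or later; the variable names are illustrative.

```python
arr = bytearray(b"abcdef")
view = memoryview(arr)

# Slicing a memoryview does not copy the underlying bytes.
part = view[2:4]

# Writing through the view changes the original bytearray.
# 120 is the byte value of the letter "x".
part[0] = 120

print(arr)
print(part.tobytes())
```

A memoryview over a bytes object is read-only, so the assignment above works only because the underlying object is a bytearray.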
Suppose we want to append 255 values to a list. A bytearray is more complex to handle, and it does not support large numeric values. But it may help performance.

Version 1: This version appends to a list in a nested loop.
Version 2: This version appends to a bytearray in a nested loop. The same values are used as in version 1.
Result: In this test the bytearray version was faster, so for byte-sized values we may prefer bytearray over list.

    import time

    print(time.time())

    # Version 1: append to list.
    for i in range(0, 1000000):
        x = list()
        for v in range(0, 255):
            x.append(v)

    print(time.time())

    # Version 2: append to bytearray.
    for i in range(0, 1000000):
        x = bytearray()
        for v in range(0, 255):
            x.append(v)

    print(time.time())

Output:

    1411859925.29213
    1411859927.673053    list append:      2.38 s
    1411859929.463818    bytearray append: 1.79 s [faster]
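The same comparison can be written with the standard timeit module, which handles the timing loop for us. This is a sketch, not the original benchmark: the function names are made up, the repeat count is much smaller, and the results will vary by machine and Python version.

```python
import timeit

def fill_list():
    # Append 255 small integers to a new list.
    x = []
    for v in range(255):
        x.append(v)
    return x

def fill_bytearray():
    # Append the same 255 values to a new bytearray.
    x = bytearray()
    for v in range(255):
        x.append(v)
    return x

# Run each function 10000 times and report the total seconds.
list_time = timeit.timeit(fill_list, number=10000)
bytearray_time = timeit.timeit(fill_bytearray, number=10000)

print("list append:     ", round(list_time, 3), "s")
print("bytearray append:", round(bytearray_time, 3), "s")
```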
A file can be read into a bytes object. We must specify the "b" mode: to read a file as bytes, we pass "rb" as the mode argument to open().
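A minimal sketch of reading a file as bytes follows. So that it can run on its own, it first writes a small sample file to a temporary directory; the file name sample.bin and its contents are made up for illustration.

```python
import os
import tempfile

# Write a small sample file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(bytes([0, 200, 50, 255]))

# Open in "rb" mode: read() then returns a bytes object.
with open(path, "rb") as f:
    data = f.read()

print(type(data).__name__, list(data))
```

If the contents need to be modified afterward, the result can be passed to bytearray().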
Lists and strings can become inefficient quickly. Where we represent data in bytes, numbers from 0 to 255, buffer types (like bytes and bytearray) are ideal.

Bytes and bytearrays are an efficient, byte-based form of strings. They have many of the same methods as strings, but can also be indexed and looped over like lists of integers, and a bytearray can be modified like a list.