Python programs often open, write to, and append to files. Many useful functions for this are built into the language: batteries are included.
With the built-in open() function, we access files. Methods like readlines() handle their data. With Python, often a loop is not even needed to read a whole file.
We begin with text files. Even if we just want to display all the lines from a file, newlines must be handled. We read all lines from a file with readlines(). The path is a raw string literal: we prefix the string with "r" to avoid errors with backslashes. The end argument modifies the behavior of print(). When we use end="", print() does not add its own newline, so the newline already at the end of each line is not doubled on the console.

    # Open a file on the disk.
    f = open(r"C:\perls.txt", "r")

    # Print all its lines.
    for line in f.readlines():
        # Modify the end argument.
        print(line, end="")

Output:

    Line 1
    Line 2
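To make the newline handling concrete, here is a minimal sketch. It assumes a temporary file stands in for the article's path, and it shows that readlines() returns a list of strings that each keep their trailing newline.

```python
import os
import tempfile

# Create a small sample file (a stand-in for the article's path).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("Line 1\nLine 2\n")

# readlines() returns a list; each item keeps its trailing newline.
with open(path, "r") as f:
    lines = f.readlines()

print(lines)

# With end="", print() adds nothing, so lines are not double-spaced.
for line in lines:
    print(line, end="")

os.remove(path)
```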
We do not need readlines() to access all the lines in a file. We do not even need read() or readline(). We can loop over the file object directly.
    # Call open() to access the file.
    f = open(r"C:\programs\info.txt", "r")

    for line in f:
        # Empty lines contain a newline character.
        if line == "\n":
            print("::EMPTY LINE::")
            continue

        # Strip the line.
        line = line.strip()
        print(line)

Input file:

    Pets:

    1. Dog
    2. Cat
    3. Bird

Output:

    Pets:
    ::EMPTY LINE::
    1. Dog
    2. Cat
    3. Bird
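Because the file object yields one line at a time, a loop like the one above does not need to hold the whole file in memory. A minimal sketch follows, assuming a temporary file in place of info.txt. The helper name count_nonempty_lines is hypothetical and only for illustration.

```python
import os
import tempfile


def count_nonempty_lines(path):
    """Count lines that contain something other than whitespace."""
    count = 0
    with open(path, "r") as f:
        # Iterating over the file object yields one line at a time.
        for line in f:
            if line.strip():
                count += 1
    return count


# Build a small sample file (contents mirror the example input).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Pets:\n\n1. Dog\n2. Cat\n3. Bird\n")
    sample_path = tmp.name

nonempty = count_nonempty_lines(sample_path)
print(nonempty)
os.remove(sample_path)
```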
The with statement cleans up resources. It simplifies the task of freeing system resources and is often used with file handling, where open() is a common call. It also improves readability. The with statement is similar to a try-finally statement: the file is closed when the block exits.

    name = r"C:\perls.txt"

    # Open the file in a with statement.
    with open(name) as f:
        print(f.readline(), end="")

    # Repeat.
    with open(name) as f:
        print(f.readline(), end="")

Output:

    First line
    First line
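The cleanup that the with statement performs can be compared with a hand-written try-finally. This is a rough sketch, assuming a temporary file rather than the article's path. It shows that the file object reports closed in both cases.

```python
import os
import tempfile

# Create a sample file to read (a stand-in for the article's path).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("First line\nSecond line\n")

# Version with a with statement: the file is closed when the block exits.
with open(path) as f:
    first_with = f.readline()
closed_after_with = f.closed

# Roughly equivalent version written with try-finally.
g = open(path)
try:
    first_try = g.readline()
finally:
    g.close()
closed_after_try = g.closed

print(first_with, end="")
print(closed_after_with, closed_after_try)
os.remove(path)
```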
The second argument to open() is a string containing "mode" flag characters. The "w" flag specifies write-only mode: the file is not opened for appending or reading.
    # Create new empty file.
    # ... If the file exists, it will be cleared of content.
    f = open("C:\\programs\\test.file", "w")
This program writes lines to a file. It first creates an empty file for writing. It specifies the "w" mode to create an empty file. Then it writes two lines.
    # Create an empty file for writing.
    with open("C:\\programs\\test.file", "w") as f:
        # Write two lines to the file.
        f.write("cat\n")
        f.write("bird\n")

Contents of the file afterward:

    cat
    bird
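The difference between clearing a file with "w" and appending with the "a" mode can be sketched as follows. The example assumes a temporary directory instead of C:\programs, and the extra line "fish" is made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.file")

    # "w" creates the file, or clears it if it already exists.
    with open(path, "w") as f:
        f.write("cat\n")

    # Opening with "w" again discards the earlier content.
    with open(path, "w") as f:
        f.write("bird\n")

    # "a" keeps the existing content and writes at the end.
    with open(path, "a") as f:
        f.write("fish\n")

    with open(path, "r") as f:
        contents = f.read()

print(contents, end="")
```

Only the lines written after the last "w" open survive, so "cat" is gone while "bird" and "fish" remain.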
Count character frequencies
This program opens a file and counts each character using a frequency dictionary. It combines open(), readlines(), and the dictionary's get() method.
    # Open a file.
    f = open(r"C:\programs\file.txt", "r")

    # Stores character counts.
    chars = {}

    # Loop over file and increment a key for each char.
    for line in f.readlines():
        for c in line.strip():
            # Get existing value for this char or a default of zero.
            # ... Add one and store that.
            chars[c] = chars.get(c, 0) + 1

    # Print character counts.
    for item in chars.items():
        print(item)

Output:

    ('a', 12)
    (' ', 5)
    ('C', 2)
    ('b', 15)
    ('c', 2)
    ('y', 5)
    ('x', 2)
    ('Z', 1)

Contents of file.txt:

    aaaa bbbbb aaaa bbbbb aaaa bbbbb CCcc xx y y y y y Z
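As an alternative to the get() idiom, the standard library's collections.Counter can do the same bookkeeping. This is a sketch of that swapped-in technique, not the program above. It assumes a small temporary file with made-up contents.

```python
from collections import Counter
import os
import tempfile

# Create a small sample file (made-up contents for illustration).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("aab\nb c\n")

chars = Counter()
with open(path, "r") as f:
    for line in f:
        # update() adds one to the count of each character in the string.
        chars.update(line.strip())

# Print characters from most to least frequent.
for char, count in chars.most_common():
    print(repr(char), count)

os.remove(path)
```

A Counter returns zero for characters it has never seen, so no default value is needed when reading counts.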
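IOError
This program causes an IOError to occur. The file "nope.txt" is most likely not present on the computer. The open() function raises an IOError with the "No such file or directory" message. On Python 3.3 and later the exception is reported as FileNotFoundError, a subclass of OSError, and IOError is kept as an alias of OSError.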
    # An invalid path.
    name = "/nope.txt"

    # Attempt to open the file.
    with open(name) as f:
        print(f.readline())

Output:

    Traceback (most recent call last):
      File "...", line 7, in <module>
        with open(name) as f:
    IOError: [Errno 2] No such file or directory: '/nope.txt'
Exists
We can prevent the IOError by first testing the path with the os.path.exists() function. This returns True if the path exists, and False otherwise. Here, the function returns False, so open() is never reached.
    import os

    # A file that does not exist.
    name = "/nope.txt"

    # See if the path exists.
    if os.path.exists(name):
        # Open the file.
        with open(name) as f:
            print(f.readline())
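The check can be tried on one path that exists and one that does not. This is a minimal sketch, assuming a temporary directory and made-up file names. Note that a file could still be removed between the check and the open() call, so the test reduces the chance of an error rather than ruling it out.

```python
import os
import tempfile

results = {}
with tempfile.TemporaryDirectory() as folder:
    present = os.path.join(folder, "present.txt")
    absent = os.path.join(folder, "nope.txt")

    # Only one of the two files is actually created.
    with open(present, "w") as f:
        f.write("First line\n")

    for name in (present, absent):
        key = os.path.basename(name)
        # See if the path exists before opening it.
        if os.path.exists(name):
            with open(name) as f:
                results[key] = f.readline()
        else:
            results[key] = None

print(results)
```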
This example uses a try-except construct to capture errors. When the open() function raises an error, control flow enters the except-block.
    try:
        # Does not exist.
        name = "/nope.txt"
        # Attempt to open it.
        with open(name) as f:
            print(f.readline())
    except IOError:
        # Handle the error.
        print("An error occurred")

Output:

    An error occurred
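On Python 3 the except-block can name a more specific exception. The sketch below assumes Python 3.3 or later, where a missing file raises FileNotFoundError. The helper read_first_line and its default parameter are hypothetical names used only for illustration.

```python
import os
import tempfile


def read_first_line(path, default=""):
    """Return the first line of the file, or default if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline()
    except FileNotFoundError:
        # The path does not exist.
        return default
    except OSError as error:
        # Other I/O failures, such as a permissions problem.
        print("Could not read", path, "-", error)
        return default


with tempfile.TemporaryDirectory() as folder:
    # Nothing has been created at this path, so open() will fail.
    missing = os.path.join(folder, "nope.txt")
    result = read_first_line(missing, default="(no file)")

print(result)
```

Because FileNotFoundError is a subclass of OSError, it must be listed before the more general OSError clause to be handled separately.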
There is significant overhead in accessing a file for a read. Here we benchmark file usage on a file with about 1000 lines.
Version 1 uses the readlines() method and then loops over each line, calling len on each line. Version 2 calls read() on the file, and then accesses the len of the entire file at once. In this test it was faster to use the read() method. Using readlines was slower.

    import time

    print(time.time())

    # Version 1: use readlines.
    i = 0
    while i < 10000:
        with open("C:\\programs\\test.file", "r") as f:
            count = 0
            for line in f.readlines():
                count += len(line)
        i += 1

    print(time.time())

    # Version 2: use read.
    i = 0
    while i < 10000:
        with open("C:\\programs\\test.file", "r") as f:
            count = 0
            data = f.read()
            count = len(data)
        i += 1

    print(time.time())

Results:

    1406148416.003978
    1406148423.383404 readlines = 7.38 s
    1406148425.989555 read      = 2.61 s

Contents of test.file:

    This is an interesting file.
    This is an interesting file.
    ...
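The same comparison can be repeated with the standard timeit module. This is a sketch under assumptions: it generates its own 1000-line temporary file, the iteration count of 200 is arbitrary, and the timings will differ from machine to machine.

```python
import os
import tempfile
import timeit

# Build a 1000-line sample file like the one described above.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    for _ in range(1000):
        out.write("This is an interesting file.\n")


def count_with_readlines():
    # Version 1: sum the length of each line.
    with open(path, "r") as f:
        return sum(len(line) for line in f.readlines())


def count_with_read():
    # Version 2: take the length of the whole file at once.
    with open(path, "r") as f:
        return len(f.read())


# Both versions should count the same number of characters.
readlines_total = count_with_readlines()
read_total = count_with_read()

readlines_seconds = timeit.timeit(count_with_readlines, number=200)
read_seconds = timeit.timeit(count_with_read, number=200)
print("readlines:", readlines_seconds)
print("read:     ", read_seconds)
os.remove(path)
```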
A Python program can read binary data from a file. We must add a "b" at the end of the mode argument. We call read() to read the entire file into a bytes object.
    # Read file in binary form.
    # ... Specify "b" for binary read and write.
    f = open(r"C:\stage-perls-cf\file-python", "rb")

    # Read the entire file.
    data = f.read()

    # Print length of result bytes object.
    # ... Print first three bytes (which are gzip).
    print(len(data))
    print(data[0])
    print(data[1])
    print(data[2])

Output:

    42078
    31
    139
    8
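The three byte values printed above are the start of a gzip header. A small sketch follows, assuming we create our own gzip file with the standard gzip module rather than using the article's file. It reads the same bytes back in binary mode.

```python
import gzip
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".gz")
os.close(fd)

# Write a small gzip file to inspect (the payload is arbitrary).
with gzip.open(path, "wb") as out:
    out.write(b"hello")

# Read only the first three bytes in binary mode.
with open(path, "rb") as f:
    header = f.read(3)

# Indexing or iterating a bytes object yields integers.
print(list(header))
os.remove(path)
```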
Python has many helpful built-in modules. One such module is textwrap, which lets us wrap text with a single function call.
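As a brief illustration, textwrap.fill() returns a single string with newlines inserted so that no line exceeds the given width. The sample text and the width of 30 are arbitrary choices for this sketch.

```python
import textwrap

text = ("Python programs often open, write to, and append to files. "
        "Batteries are included.")

# Wrap the text so that no line is longer than 30 characters.
wrapped = textwrap.fill(text, width=30)
print(wrapped)

lines = wrapped.splitlines()
```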
File handling is an important yet error-prone aspect of program development. It is essential because it gives us data persistence.