from urllib.request import urlopen

# Print first four lines of this site.
i = 0
for line in urlopen("http://www.example.com/"):
    line = line.decode()
    print(i, line, end="")
    # See if past limit.
    if i == 3:
        break
    i += 1

0 <!doctype html>
...
3 <title>Example Domain</title>
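The loop above requires a live network connection. As a sketch of the same first-N-lines logic, we can factor it into a helper that accepts any iterable of byte lines; here io.BytesIO stands in for the network response, and the name first_lines is our own, not part of urllib.

```python
from io import BytesIO

def first_lines(source, limit=4):
    # Collect up to limit decoded lines from an iterable of bytes lines.
    result = []
    for i, line in enumerate(source):
        result.append(line.decode())
        if i == limit - 1:
            break
    return result

# Simulate a response body with BytesIO (no network needed).
page = BytesIO(b"<!doctype html>\n<html>\n<head>\n<title>Example Domain</title>\n")
print(first_lines(page))
```

Because urlopen's response object is also an iterable of byte lines, the same helper works on a real response.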
Parse. Web locations usually begin with http or https—these are called URLs or URIs. In Python we use the urllib.parse module to access the urlparse function.
Here We parse a URL. Then we access some fields from the parsed URL object result.
Next The scheme is the "http" part, with no punctuation. The netloc is the domain, with no leading or trailing punctuation.
Finally The path is the location on the domain. We use the root page here, so the path is a forward-slash "/".
from urllib.parse import urlparse

# Parse this url.
result = urlparse("http://www.example.com/")

# Get some values from the ParseResult.
scheme = result.scheme
loc = result.netloc
path = result.path

# Print our values.
print("scheme:", scheme)
print("loc:", loc)
print("path:", path)
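As a usage note (our own example, not from the article), urllib.parse also provides urlunparse, which reassembles a URL from the six components of a ParseResult—so parsing and then unparsing returns the original string.

```python
from urllib.parse import urlparse, urlunparse

result = urlparse("http://www.example.com/")
# A ParseResult is a 6-tuple; urlunparse rebuilds the URL from it.
rebuilt = urlunparse(result)
print(rebuilt)  # http://www.example.com/
```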
Summary. Python can fetch external files or web pages. But a program's complexity increases when external resources are necessary, and sometimes those resources cause errors.
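One way to guard against such errors (a sketch of our own, not from the article) is to catch urllib.error.URLError around the urlopen call; the helper name try_fetch is hypothetical.

```python
from urllib.request import urlopen
from urllib.error import URLError

def try_fetch(url):
    # Return the response object, or None if the fetch fails.
    try:
        return urlopen(url)
    except (URLError, ValueError):
        return None

# An unrecognized scheme fails without touching the network.
print(try_fetch("unknown://example"))  # None
```

Catching URLError covers both network failures and unrecognized URL types; HTTPError (for status codes like 404) is a subclass of URLError, so it is caught as well.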
Dot Net Perls is a collection of tested code examples. Pages are continually updated to stay current, with code correctness a top priority.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.