HTML Title Method
Updated Nov 8, 2023
Dot Net Perls
Title, HTML. It is often possible to extract the title of an HTML document with a regular expression in Python. A pattern specifies the surrounding tags, and a group captures the text.
Some issues. A regular expression does not always work on HTML, due to comments and other issues, but the re.match method is worth a try. The program stays small and no complex parser is needed.
Example. This Python program introduces the gettitle() method, which receives an HTML string and returns the title. We begin by specifying the HTML as a string.
Part 1 We specify the HTML. The title within the HTML string is "Example." (including the period) and this is our desired result.
Part 2 We call re.match. Because re.match only matches at the start of the string, the pattern begins with ".*" to allow leading chars before the title tag, and it ends with ".*$" to consume the trailing chars after the closing tag.
Part 3 We check our returned Match against None, and then return the first group (which was captured by the parentheses).
import re

def gettitle(html):
    # Part 2: use re.match to match the entire html string, and extract data within the title.
    m = re.match(r"^.*<title>\s*(.+?)\s*</title>.*$", html)
    # Part 3: return the first group if match was successful.
    if m:
        return m.group(1)
    return ""

# Part 1: specify html string and get its title.
html = r"<html><title>Example.</title><body><p>...</p></body></html>"
print("TITLE:", gettitle(html))
TITLE: Example.
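Multi-line HTML. In the pattern above, "." does not match newline characters by default, so an HTML string that spans several lines would likely return an empty title. The following is a minimal sketch of one possible variant, not part of the original program: it assumes re.search with the re.DOTALL and re.IGNORECASE flags, and the names TITLE_PATTERN and gettitle_multiline are hypothetical.

```python
import re

# Hypothetical variant of gettitle(); the names and flags here are assumptions.
# re.DOTALL lets "." match newlines; re.IGNORECASE accepts <TITLE> as well as <title>.
# "\b[^>]*" allows attributes inside the opening tag.
TITLE_PATTERN = re.compile(r"<title\b[^>]*>\s*(.+?)\s*</title>", re.IGNORECASE | re.DOTALL)

def gettitle_multiline(html):
    # re.search scans the whole string, so no leading or trailing ".*" is needed.
    m = TITLE_PATTERN.search(html)
    if m:
        return m.group(1)
    return ""

html = """<html>
<head>
<TITLE>
  Example.
</TITLE>
</head>
<body><p>...</p></body>
</html>"""
print("TITLE:", gettitle_multiline(html))
```

With this input the sketch should print "TITLE: Example." because the surrounding whitespace is consumed by "\s*" outside the group.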
In HTML, titles often contain important information about pages. And with Python we are often tasked with processing data files (which might involve their titles).
Summary. HTML can often be invalid, or may contain commented-out HTML. This is difficult for re.match to deal with—but often it has enough power to capture text.
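Commented-out titles. To illustrate the comment problem mentioned above, here is a rough sketch that removes HTML comments before looking for the title. It is only one possible workaround under simple assumptions (well-formed "<!-- ... -->" comments), and the names COMMENT_PATTERN, TITLE_PATTERN and gettitle_skip_comments are hypothetical, not part of the original program.

```python
import re

# Hypothetical helper names; this is a sketch, not a full HTML parser.
COMMENT_PATTERN = re.compile(r"<!--.*?-->", re.DOTALL)
TITLE_PATTERN = re.compile(r"<title>\s*(.+?)\s*</title>", re.DOTALL)

def gettitle_skip_comments(html):
    # Remove comments first so a commented-out title cannot be matched.
    stripped = COMMENT_PATTERN.sub("", html)
    m = TITLE_PATTERN.search(stripped)
    return m.group(1) if m else ""

html = "<html><!-- <title>Old title</title> --><title>Example.</title></html>"

# Without stripping, a search finds the commented-out title first.
naive = TITLE_PATTERN.search(html).group(1)
print("NAIVE:", naive)
print("TITLE:", gettitle_skip_comments(html))
```

Here the naive search returns "Old title", while the comment-stripping version returns "Example." as intended. Unterminated comments and other invalid markup are not handled by this sketch.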
© 2007-2025 Sam Allen