In Python programs we often want to collect unique strings or ints. A set is ideal for this: it is like a dictionary that has keys but no associated values. A frozenset is an immutable set. Because it is hashable, it can be used as a dictionary key.
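Here is a quick sketch of the frozenset point, assuming Python 3. The `pair_counts` dictionary and its numbers are made-up illustration data. A frozenset works as a key regardless of element order, but a regular set used as a key raises a TypeError.

```python
# A frozenset is immutable and hashable, so it can be a dictionary key.
# The pair counts below are made-up illustration data.
pair_counts = {
    frozenset({"cat", "dog"}): 3,
    frozenset({"cat", "bird"}): 1,
}

# Element order does not matter: {"dog", "cat"} equals {"cat", "dog"}.
print(pair_counts[frozenset({"dog", "cat"})])  # 3

# A regular set is mutable and unhashable, so using it as a key fails.
try:
    bad = {{"cat", "dog"}: 3}
except TypeError as error:
    print("TypeError:", error)

# A frozenset has no add() method because it cannot change after creation.
pair = frozenset({"cat", "dog"})
print(hasattr(pair, "add"))  # False
```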
The set() function is built in, in keeping with Python's "batteries included" philosophy. With it we can remove duplicate elements from a collection. We can also create a set directly with curly brackets.
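The following minimal sketch shows both ways to create a set, assuming Python 3. The variable names are illustrative. It also covers a common pitfall: empty curly brackets create a dictionary, not a set.

```python
# Empty curly brackets create a dict, not a set; use set() for an empty set.
empty = set()
not_a_set = {}
print(type(empty).__name__)      # set
print(type(not_a_set).__name__)  # dict

# Curly brackets with elements create a set literal; duplicates collapse.
colors = {"red", "green", "red"}
print(len(colors))  # 2

# set() removes duplicates from any iterable, such as a list.
words = ["a", "b", "a", "c", "b"]
unique = set(words)
print(sorted(unique))  # ['a', 'b', 'c']
```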
This program initializes a set with curly brackets. Unlike a dictionary literal, a set literal contains no values: we specify only the keys. We then use len(), in and not-in to test the set. It has just 3 elements, despite our specifying 5 strings, because the repeated "arrow" string is stored only once. The in-keyword returns True or False depending on whether the item exists, and not-in returns the opposite.

# Create a set.
items = {"arrow", "spear", "arrow", "arrow", "rock"}

# Print set.
print(items)
print(len(items))

# Use in-keyword.
if "rock" in items:
    print("Rock exists")

# Use not-in keywords.
if "cloak" not in items:
    print("Cloak not found")

Output:

{'spear', 'arrow', 'rock'}
3
Rock exists
Cloak not found

Sets are unordered, so the order in which the elements print may differ between runs.
Elements can be added to a set with the add() method. Here we create an empty set with the set() built-in, then add 3 elements to it.

# An empty set.
items = set()

# Add three strings.
items.add("cat")
items.add("dog")
items.add("gerbil")
print(items)

Output:

{'gerbil', 'dog', 'cat'}
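This sketch, assuming Python 3, shows two related behaviors. Adding an element that is already present has no effect. The update() method adds all elements of one or more iterables at once. Note that add() treats a tuple as a single element.

```python
items = set()
items.add("cat")
items.add("cat")  # Already present, so the set does not change.
print(len(items))  # 1

# update() adds every element of each iterable passed to it.
items.update(["dog", "gerbil"], ("cat", "hamster"))
print(sorted(items))  # ['cat', 'dog', 'gerbil', 'hamster']

# add() stores its argument as one element, so the tuple is added whole.
items.add(("fish", "bird"))
print(("fish", "bird") in items)  # True
print("fish" in items)            # False
```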
We can create a set from a list (or any other iterable collection). In this example we pass set() a list of six elements, one of which (20) appears twice. The duplicate is ignored, so the resulting set has five elements, printed in arbitrary order.

# Create a set from this list.
# ... Duplicates are ignored.
numbers = set([10, 20, 20, 30, 40, 50])
print(numbers)

Output:

{40, 10, 20, 50, 30}
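Any iterable can feed set(), not only lists. This sketch assumes Python 3, and its example values are arbitrary. It builds sets from a string and from a set comprehension, then sorts the result when a predictable order is needed.

```python
# A string is iterable, so set() collects its distinct characters.
letters = set("mississippi")
print(sorted(letters))  # ['i', 'm', 'p', 's']

# A set comprehension builds a set directly, discarding duplicates as it goes.
remainders = {n % 3 for n in range(10)}
print(remainders == {0, 1, 2})  # True

# Sets have no order, so sort when a predictable order matters.
numbers = set([10, 20, 20, 30, 40, 50])
print(sorted(numbers))  # [10, 20, 30, 40, 50]
```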
In set theory, we determine relations between sets of elements. The Python set type computes these relations with built-in methods such as issubset(), issuperset() and intersection().

numbers1 = {1, 3, 5, 7}
numbers2 = {1, 3}

# Is subset.
if numbers2.issubset(numbers1):
    print("Is a subset")

# Is superset.
if numbers1.issuperset(numbers2):
    print("Is a superset")

# Intersection of the two sets.
print(numbers1.intersection(numbers2))

Output:

Is a subset
Is a superset
{1, 3}
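The same relations can also be written with operators. This short sketch, assuming Python 3, compares them to the method calls. It also includes isdisjoint(), another built-in relation test.

```python
numbers1 = {1, 3, 5, 7}
numbers2 = {1, 3}

# Comparison operators mirror issubset() and issuperset().
print(numbers2 <= numbers1)  # True (subset)
print(numbers1 >= numbers2)  # True (superset)
print(numbers2 < numbers1)   # True (proper subset: a subset and not equal)

# The & operator computes the intersection.
print((numbers1 & numbers2) == {1, 3})  # True

# isdisjoint() is True when the sets share no elements.
print(numbers1.isdisjoint({2, 4}))  # True
```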
Another available set operation is union(). It combines two sets: any element in either set is retained in the set that union() returns, and duplicates are eliminated.

# Two sets.
set1 = {1, 2, 3}
set2 = {6, 5, 4, 3}

# Union the sets.
set3 = set1.union(set2)
print(set3)

Output:

{1, 2, 3, 4, 5, 6}
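When both operands are sets, the | operator is a shorter spelling of union(). The union() method itself also accepts several iterables at once. Here is a brief sketch, assuming Python 3:

```python
set1 = {1, 2, 3}
set2 = {6, 5, 4, 3}

# | gives the same result as union() when both operands are sets.
print((set1 | set2) == set1.union(set2))  # True

# union() accepts several iterables, including lists and tuples.
combined = set1.union(set2, [7, 8], (8, 9))
print(sorted(combined))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# |= updates set1 in place instead of returning a new set.
set1 |= {10}
print(10 in set1)  # True
```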
A set can be subtracted from another set, either with the minus operator or with the difference() method. Both give the same result, so choose whichever syntax is clearer. Some developers prefer difference() to make the operation explicit. In a - b, the string "connecticut", which both sets contain, is removed, and the other strings from a remain. In b - a, "connecticut" is also removed, and the other two strings from b remain.

a = {"new york", "connecticut", "new jersey"}
b = {"connecticut", "pennsylvania", "maine"}

# Subtract.
c = a - b
print(c)

# Difference.
c = a.difference(b)
print(c)

# Subtract in opposite order.
c = b - a
print(c)

# Difference in opposite order.
c = b.difference(a)
print(c)

Output:

{'new jersey', 'new york'}
{'new jersey', 'new york'}
{'pennsylvania', 'maine'}
{'pennsylvania', 'maine'}
We pass discard() the value of an element we want to remove. If the element does not exist, discard() raises no error and simply does nothing. The remove() method, however, raises a KeyError when the element is missing. Before calling remove(), you may therefore need to test for the element with the in-operator. Discard does not need this step.

animals = {"cat", "dog", "parrot", "walrus"}
print(animals)
# Discard nonexistent element, nothing happens.
animals.discard("zebra")
print(animals)

# Discard element that exists.
animals.discard("cat")
print(animals)

# Remove element that exists.
animals.remove("parrot")
print(animals)

# Remove causes an error if the element is not found.
animals.remove("buffalo")

Output:

{'walrus', 'dog', 'parrot', 'cat'}
{'walrus', 'dog', 'parrot', 'cat'}
{'walrus', 'dog', 'parrot'}
{'walrus', 'dog'}
Traceback (most recent call last):
  File "...", line 16, in <module>
    animals.remove("buffalo")
KeyError: 'buffalo'
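The following sketch shows both safe ways to call remove(), assuming Python 3. One tests with in first, and the other catches the KeyError. Calling discard() needs neither.

```python
animals = {"cat", "dog", "parrot", "walrus"}

# Option 1: check membership before calling remove().
if "buffalo" in animals:
    animals.remove("buffalo")

# Option 2: attempt remove() and handle the KeyError.
try:
    animals.remove("buffalo")
except KeyError:
    print("buffalo was not in the set")

# discard() needs neither step; it ignores missing elements.
animals.discard("buffalo")
print(sorted(animals))  # ['cat', 'dog', 'parrot', 'walrus']
```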
A dictionary contains only unique keys. Passing a dictionary to the set() built-in iterates over those keys, so we get them as a set.

# This dictionary contains key-value pairs.
dictionary = {"cat": 1, "dog": 2, "bird": 3}
print(dictionary)

# This set contains just the dictionary's keys.
keys = set(dictionary)
print(keys)

Output:

{'cat': 1, 'dog': 2, 'bird': 3}
{'cat', 'bird', 'dog'}
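A dictionary's keys() view already supports set operators such as & and -, so converting to a set first is not always necessary. The results of these operators are ordinary sets. Here is a related sketch, assuming Python 3:

```python
dictionary = {"cat": 1, "dog": 2, "bird": 3}

# set(dictionary) and set(dictionary.keys()) produce the same set.
print(set(dictionary) == set(dictionary.keys()))  # True

# A keys view works with & and -, and each result is a regular set.
common = dictionary.keys() & {"dog", "fish"}
print(common)  # {'dog'}

others = dictionary.keys() - {"cat"}
print(sorted(others))  # ['bird', 'dog']
```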
Functions such as map() can be used to transform collections. The result of map() is not a set. It is a "map object", an iterator that we can enumerate in a for-loop.

values = {10, 20, 30}

# Multiply all values in the set by 100.
result = map(lambda x: x * 100, values)

# Display our results.
for value in result:
    print(value)

Output:

1000
2000
3000
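If we want a set back, we can wrap map() in set() or use a set comprehension instead. This sketch assumes Python 3, where map() returns a one-shot iterator. The last lines show that a map object is exhausted after one pass.

```python
values = {10, 20, 30}

# Wrapping map() in set() collects the transformed values into a new set.
result = set(map(lambda x: x * 100, values))
print(result == {1000, 2000, 3000})  # True

# A set comprehension produces the same set.
result2 = {x * 100 for x in values}
print(result2 == result)  # True

# A map object can be consumed only once.
mapped = map(lambda x: x * 100, values)
print(len(list(mapped)))  # 3
print(len(list(mapped)))  # 0
```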
How does a set compare in performance to a dictionary? Both are hash tables, so their membership tests should perform similarly. I timed 10 million in-keyword tests against each collection, in two separate runs. The set was slightly faster both times (about 1.96 s versus 2.03 to 2.04 s for the dictionary), but the difference is small.

import time

set1 = {"a", "b", "c", "z"}
dict1 = {"a": 1, "b": 2, "c": 3, "z": 4}

print(time.time())

# Version 1: use set.
i = 0
while i < 10000000:
    a = "z" in set1
    i += 1

print(time.time())

# Version 2: use dictionary.
i = 0
while i < 10000000:
    a = "z" in dict1
    i += 1

print(time.time())

Output (two runs):

1346615677.741
1346615679.7      (Set = 1.959 s)
1346615681.732    (Dictionary = 2.032 s)

1346615958.731
1346615960.692    (Set = 1.961 s)
1346615962.736    (Dictionary = 2.044 s)
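The loop-based timing above measures loop overhead along with the membership test. The sketch below runs the same comparison with the standard timeit module. It assumes Python 3.5 or later for the globals parameter. The iteration count is an arbitrary choice, and the printed numbers depend on the machine, so none are shown here.

```python
import timeit

set1 = {"a", "b", "c", "z"}
dict1 = {"a": 1, "b": 2, "c": 3, "z": 4}

# timeit runs each statement `number` times and returns total seconds as a float.
# globals=globals() lets the timed statements see set1 and dict1.
iterations = 1_000_000
set_time = timeit.timeit('"z" in set1', globals=globals(), number=iterations)
dict_time = timeit.timeit('"z" in dict1', globals=globals(), number=iterations)

print(f"Set:        {set_time:.3f} s")
print(f"Dictionary: {dict_time:.3f} s")
```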
Sometimes the existence of a key is our main concern, and the key has no meaningful value. Here a set avoids the confusion of storing unused values in a dictionary.
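Tracking items already seen is a common case where only existence matters. In this sketch, `unique_in_order` is a hypothetical helper, not a built-in. It removes duplicates from a list while keeping the first occurrence of each item in order. The set records what has been seen, and no values are needed.

```python
def unique_in_order(items):
    """Hypothetical helper: drop duplicates, keeping first occurrences in order."""
    seen = set()  # Only membership matters, so a set fits better than a dict.
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order(["arrow", "spear", "arrow", "rock", "spear"]))
# ['arrow', 'spear', 'rock']
```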