Python difflib compare two strings

The difflib module, as the name suggests, can be used to find differences or “diff” between sequences, such as strings, lists of lines read from files, or other sequences whose elements are hashable Python objects. It can also be used to compute a ratio that shows how similar two sequences are. The usage of the difflib module and its functions can be best understood through examples. Some of them are listed below.

About Hashable Python Objects

In Python, an object is hashable if it has a hash value that never changes during its lifetime and it can be compared with other objects for equality. Most of Python's immutable built-in types, such as integers, floats, strings, and tuples that contain only hashable items, are hashable. Mutable containers such as lists, dictionaries, and sets are not. Hashable objects implement a “__hash__” method, which Python uses to compute the hash value. Have a look at the code sample below:

number = 6
print(type(number))
print(number.__hash__())

word = "something"
print(type(word))
print(word.__hash__())

dictionary = {"a": 1, "b": 2}
print(type(dictionary))
# dict sets __hash__ to None, so this call raises a TypeError
print(dictionary.__hash__())

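Running the above code sample prints <class 'int'> and 6 for the integer, then <class 'str'> and a large integer for the string. String hashes are randomized for each interpreter session, so that number changes from run to run. Finally, it prints <class 'dict'> and stops with TypeError: 'NoneType' object is not callable.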

The code sample includes three Python types: an integer type object, a string type object, and a dictionary type object. Calling the “__hash__” method returns a value for the integer and the string, while the call on the dictionary fails: the dict type sets its “__hash__” attribute to None, so trying to call it raises a TypeError. Hence integers and strings are hashable in Python, while dictionaries are not.
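A more idiomatic way to test hashability is the built-in hash() function, which calls “__hash__” for you and raises a TypeError for unhashable objects. The sketch below wraps that in a small helper named is_hashable (a hypothetical name, not part of the standard library) and compares it with an isinstance() check against collections.abc.Hashable.

```python
from collections.abc import Hashable


def is_hashable(obj):
    """Hypothetical helper: return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


samples = [6, "something", (1, 2), [1, 2], {"a": 1, "b": 2}, (1, [2])]
for obj in samples:
    # isinstance() only checks that the type defines __hash__;
    # is_hashable() actually tries to compute the hash
    print(repr(obj), isinstance(obj, Hashable), is_hashable(obj))
```

The last sample shows why the two checks can disagree: the tuple type defines “__hash__”, so isinstance() reports (1, [2]) as Hashable, but hashing it still fails because the tuple contains a list.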

Comparing Two Hashable Python Objects

You can compare two sequences, such as two strings or two lists of lines, using the “Differ” class available in the difflib module. Have a look at the code sample below.

from difflib import Differ

line1 = "abcd"
line2 = "cdef"
d = Differ()
difference = list(d.compare(line1, line2))
print (difference)

The first statement imports the Differ class from the difflib module. Next, two string type variables are defined with some values. A new instance of the Differ class is then created as “d”. Using this instance, the “compare” method is then called to find the difference between “line1” and “line2” strings. These strings are supplied as arguments to the compare method. After running the above code sample, you should get the following output:

['- a', '- b', '  c', '  d', '+ e', '+ f']

Because “compare” receives plain strings here, it treats each character as a separate item. Characters prefixed with a minus sign are present only in “line1”, characters prefixed with a plus sign are present only in “line2”, and characters prefixed with two spaces are common to both strings. For better readability, you can use the newline character and the “join” method to print the output line by line:

from difflib import Differ

line1 = "abcd"
line2 = "cdef"
d = Differ()
difference = list(d.compare(line1, line2))
difference = '\n'.join(difference)
print (difference)

After running the above code sample, you should get the following output:

- a
- b
  c
  d
+ e
+ f
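Differ is designed to compare sequences of lines, such as the lists returned by readlines() or str.splitlines(keepends=True). When two lines are similar but not identical, it also emits hint lines starting with “?” that mark where the characters differ. A minimal sketch, using two hypothetical lists of lines:

```python
from difflib import Differ

# Hypothetical old and new versions of a short text, as lists of lines
old = ["apple\n", "The quick brown fox\n"]
new = ["apple\n", "The quick brown cat\n"]

result = list(Differ().compare(old, new))

# Each item already ends with "\n", so join with an empty string
print("".join(result), end="")
```

Here “apple” is printed with two leading spaces, the changed line appears once with “-” and once with “+”, and each is followed by a “?” line whose “^” characters point at the replaced word.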

Instead of the Differ class, you can also use the “HtmlDiff” class to produce colored output in HTML format.

from difflib import HtmlDiff

line1 = "abcd"
line2 = "cdef"
d = HtmlDiff()
difference = d.make_file(line1, line2)
print (difference)

The code sample is the same as above, except that the Differ class instance has been replaced by an instance of the HtmlDiff class, and instead of the compare method, you now call the “make_file” method. After running the above code sample, you will get HTML output in the terminal. You can save this output to a file with the shell's “>” redirection operator, or you can use the code sample below to write it to a “diff.html” file from Python itself.

from difflib import HtmlDiff

line1 = "abcd"
line2 = "cdef"
d = HtmlDiff()
difference = d.make_file(line1, line2)
with open("diff.html", "w") as f:
    f.write(difference)

The “with open” statement in “w” mode creates the “diff.html” file (or overwrites it if it already exists) and saves the entire contents of the “difference” variable to it. When you open diff.html in a browser, you should see a side-by-side table of the two inputs with the changes highlighted.
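Passing strings to make_file compares them character by character. For real documents you will usually pass lists of lines instead. The sketch below assumes two hypothetical texts and file names; it uses the fromdesc and todesc parameters to label the two columns, and context=True with numlines=1 to show only the changed region plus one line of surrounding context.

```python
from difflib import HtmlDiff

# Hypothetical file contents, split into lists of lines
old_lines = "apple\nbanana\ncherry\n".splitlines()
new_lines = "apple\nblueberry\ncherry\n".splitlines()

html = HtmlDiff().make_file(
    old_lines,
    new_lines,
    fromdesc="old.txt",  # heading for the left column
    todesc="new.txt",    # heading for the right column
    context=True,        # show only changed regions...
    numlines=1,          # ...with one line of context around each
)

with open("diff.html", "w", encoding="utf-8") as f:
    f.write(html)
```

If you only need the table to embed in an existing page, HtmlDiff also has a make_table method that accepts the same arguments.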

Getting Differences Between Contents of Two Files

If you want to produce diff data from the contents of two files using the Differ.compare() method, you can read each file with a “with open” statement and the “readlines” method. The example below reads the contents of “file1.txt” and “file2.txt” this way; the “with open” statements make sure each file is closed after it has been read.

from difflib import Differ

with open("file1.txt") as f:
    file1_lines = f.readlines()
with open("file2.txt") as f:
    file2_lines = f.readlines()
d = Differ()
difference = list(d.compare(file1_lines, file2_lines))
# readlines() keeps each line's "\n", so join with "" to avoid blank lines
difference = ''.join(difference)
print(difference)

The code is pretty straightforward and nearly the same as the example shown above. Assuming that “file1.txt” contains “a”, “b”, “c”, and “d” characters each on a new line and “file2.txt” contains “c”, “d”, “e”, and “f” characters each on a new line, the code sample above will produce the following output:

- a
- b
  c
  d
+ e
+ f

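The output is almost the same as before: the “-” sign marks lines present only in the first file, and the “+” sign marks lines present only in the second file. Lines prefixed with two spaces are common to both files. Differ can also emit lines starting with “?”, which highlight character-level differences between similar lines and do not appear in either file.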
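If you prefer the compact format produced by “diff -u”, difflib also provides the unified_diff function. In the sketch below, the two lists stand in for what readlines() would return for the hypothetical “file1.txt” and “file2.txt” described above; fromfile and tofile only set the header labels.

```python
from difflib import unified_diff

# Stand-ins for readlines() output of the two example files
file1_lines = ["a\n", "b\n", "c\n", "d\n"]
file2_lines = ["c\n", "d\n", "e\n", "f\n"]

diff = list(unified_diff(file1_lines, file2_lines,
                         fromfile="file1.txt", tofile="file2.txt"))
print("".join(diff), end="")
```

The output starts with “--- file1.txt” and “+++ file2.txt” headers, followed by a hunk header “@@ -1,4 +1,4 @@” and the lines prefixed with “-”, “+”, or a single space.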

Finding Similarity Ratio

You can use the “SequenceMatcher” class from the difflib module to find the similarity ratio between two sequences. The ratio is a float between 0 and 1, where 1 means the sequences are identical and 0 means they have nothing in common. Have a look at the code sample below:

from difflib import SequenceMatcher
line1 = "abcd"
line2 = "cdef"
sm = SequenceMatcher(a=line1, b=line2)
print (sm.ratio())

A SequenceMatcher instance has been created with objects to be compared supplied as “a” and “b” arguments. The “ratio” method is then called upon the instance to get the similarity ratio. After running the above code sample, you should get the following output:

0.5
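The difflib documentation defines ratio() as 2.0*M/T, where T is the total number of elements in both sequences and M is the number of matched elements. For “abcd” and “cdef”, the only match is “cd”, so M = 2, T = 8, and the ratio is 2*2/8 = 0.5. The sketch below recomputes this by hand from get_matching_blocks() and also shows quick_ratio() and real_quick_ratio(), which are cheaper upper bounds on ratio().

```python
from difflib import SequenceMatcher

line1 = "abcd"
line2 = "cdef"
sm = SequenceMatcher(a=line1, b=line2)

# Matching blocks are (a, b, size) triples; the last one is a dummy of size 0
blocks = sm.get_matching_blocks()
matches = sum(block.size for block in blocks)  # M: matched characters
total = len(line1) + len(line2)                # T: elements in both strings
manual_ratio = 2.0 * matches / total

print(blocks)
print(manual_ratio, sm.ratio())

# Upper bounds that are faster to compute than ratio()
print(sm.real_quick_ratio(), sm.quick_ratio())
```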

Conclusion

The difflib module in Python can be used in a variety of ways to compare sequences, such as strings, lists of lines read from files, or other sequences of hashable elements. The ratio method of SequenceMatcher is also useful if you just want a similarity score between 0 and 1 (multiply it by 100 to get a percentage).

How to compare 2 strings in Python?

You can compare strings in Python using the equality operators (== and !=) and the ordering operators (<, >, <=, >=), which compare strings character by character by Unicode code point. No special method call is needed to compare two strings.
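A short sketch of these operators on a few example strings. Note that the ordering is by code point, so uppercase letters sort before lowercase ones; for case-insensitive equality, the str.casefold() method is more thorough than lower().

```python
a = "apple"
b = "banana"

print(a == "apple")        # True: same characters
print(a != b)              # True: the strings differ
print(a < b)               # True: "a" comes before "b"
print("Zebra" < "apple")   # True: "Z" has a lower code point than "a"

# Case-insensitive comparison: casefold() maps "ß" to "ss"
print("Straße".casefold() == "STRASSE".casefold())  # True
```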

How do I compare two strings in the same list in Python?

You can combine Python's sort() method with the == operator to compare two lists. Sorting puts the elements of both lists in the same order, so if the two lists contain the same elements, equal elements end up at the same index positions and == returns True. Note that sort() sorts a list in place; use sorted() if you need to keep the original order.
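A minimal sketch with hypothetical lists of strings. It uses sorted() so the originals are left untouched, and shows collections.Counter as an alternative that compares element counts without sorting. Sets are shown last because they ignore duplicates and can therefore report equality when the counts differ.

```python
from collections import Counter

list1 = ["pear", "apple", "fig"]
list2 = ["fig", "pear", "apple"]

print(list1 == list2)                    # False: order differs
print(sorted(list1) == sorted(list2))    # True: same elements once sorted
print(Counter(list1) == Counter(list2))  # True: same count of each string

# Sets ignore duplicates, so they can hide differences in counts
x = ["a", "a", "b"]
y = ["a", "b", "b"]
print(set(x) == set(y))        # True
print(sorted(x) == sorted(y))  # False
```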

How to compare two strings in Python and return non matches?

Common approaches include:
Using the membership operator: check whether each element of one list is present in the other list.
Using sets.
Using sorting.
Returning non-matching elements with a for loop.
Taking the difference between two lists.
Using a lambda function to return all unmatched elements.
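Two of these approaches can be sketched as follows, using hypothetical example data. The membership test with a list comprehension keeps the original order, while the set symmetric difference (^) returns every item that appears in exactly one of the inputs, without order or duplicates. The same set operation works on the characters of two strings.

```python
list1 = ["apple", "banana", "cherry"]
list2 = ["banana", "cherry", "date"]

# Membership operator: keep items that are missing from the other list
only_in_list1 = [item for item in list1 if item not in list2]
only_in_list2 = [item for item in list2 if item not in list1]

# Set symmetric difference: items present in exactly one list
non_matches = set(list1) ^ set(list2)

# The same idea applied to the characters of two strings
char_non_matches = set("abcd") ^ set("cdef")

print(only_in_list1)             # ['apple']
print(only_in_list2)             # ['date']
print(sorted(non_matches))       # ['apple', 'date']
print(sorted(char_non_matches))  # ['a', 'b', 'e', 'f']
```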

How would you confirm that 2 strings have the same identity?

The is operator returns True if two names refer to the same object in memory. This is what we're referring to when we talk about identity. Don't confuse is with ==, which only tests equality.
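A small sketch of the difference, using hypothetical variables. Two equal strings may or may not be the same object, because CPython interns some strings and not others, so the result of is on equal strings built in different ways is implementation-dependent. That is why == should be used to compare string values, and is only to check identity, which is equivalent to comparing id() values.

```python
a = "something"
b = a                            # b refers to the same object as a
c = "".join(["some", "thing"])   # builds an equal string at runtime

print(a == b, a is b)  # True True: same object
print(a == c)          # True: same characters
print(a is c)          # implementation-dependent; may be False

# `is` compares identities, which is the same as comparing id() values
print((a is c) == (id(a) == id(c)))  # True
```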