How do you remove punctuation from a data set in Python?

Recipe Objective

Content contains many things: text, characters, special characters, and more. Sometimes we need only the text, for ease of access, and don't want any special characters or punctuation in it. So, we are going to see how to remove punctuation from a text so that only the text remains.

Table of Contents

  • Recipe Objective
    • Step 1- Taking a simple string or text and printing it
    • Step 2 - Storing all punctuations in a Variable
    • Step 3 - Removing punctuations from the text
    • Step 4 - Removing punctuations by using re, importing re
    • Step 5 - Taking another text and printing it
    • Step 6 - Removing punctuations using re, printing updated one

Step 1- Taking a simple string or text and printing it

simple_text = "It, is better for waking up early in morning !!, than working late nights ;"
print("Printing the Simple Text for our Understanding :", simple_text)

Printing the Simple Text for our Understanding : It, is better for waking up early in morning !!, than working late nights ;

From the above, we can see that the simple text contains punctuation that we need to remove. So let's see how to do it.

Step 2 - Storing all punctuations in a Variable

All_punct = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

Here we create a variable named All_punct, which consists of all the punctuation marks that we want to remove.

Step 3 - Removing punctuations from the text

for elements in simple_text:
    if elements in All_punct:
        simple_text = simple_text.replace(elements, "")
print("Now let us see the text after removing the punctuations :", simple_text)

Now let us see the text after removing the punctuations : It is better for waking up early in morning than working late nights

Here we can see that the punctuation marks that were present in the text have been removed by the for loop, and only the text remains, without any special character or punctuation. The spaces that surrounded the removed marks are left in place.

Step 4 - Removing punctuations by using re, importing re

import re

This is simpler than the method we used above for removing punctuation: we just need to import re, Python's built-in regular expression (regex) module.

Step 5 - Taking another text and printing it

second_text = "why can't i live freely ?? , It's just the : way i want it, no more interference required !! by any other side ;"
print("Printing the original text with punctuations :", second_text)

Printing the original text with punctuations : why can't i live freely ?? , It's just the : way i want it, no more interference required !! by any other side ;

Step 6 - Removing punctuations using re, printing updated one

remove = re.sub(r'[^\w\s]', '', second_text)
print("updated text with no punctuations :", remove)

updated text with no punctuations : why cant i live freely Its just the way i want it no more interference required by any other side

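So here we get an idea of how regex works for removing punctuation from a text: the pattern [^\w\s] matches every character that is neither a word character nor whitespace, and re.sub() replaces each match with an empty string.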
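Removing punctuation with re.sub() leaves the surrounding spaces behind, so the result can contain runs of two or more spaces. Here is a minimal sketch of one way to tidy that up. The helper name strip_punctuation is hypothetical and not part of the recipe above.

```python
import re

def strip_punctuation(text):
    """Remove punctuation, then collapse the whitespace runs left behind."""
    # Drop every character that is neither a word character nor whitespace.
    no_punct = re.sub(r'[^\w\s]', '', text)
    # Replace each run of whitespace with one space and trim both ends.
    return re.sub(r'\s+', ' ', no_punct).strip()

second_text = "why can't i live freely ?? , It's just the : way i want it, no more interference required !! by any other side ;"
print(strip_punctuation(second_text))
```

The second re.sub() call is optional; skip it if the original spacing matters for your data.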

Sometimes, we may wish to break a sentence into a list of words.

In such cases, we may first want to clean up the string and remove all the punctuation marks. Here is an example of how it is done.

Source Code

# define punctuation
punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

my_str = "Hello!!!, he said ---and went."

# To take input from the user
# my_str = input("Enter a string: ")

# remove punctuation from the string
no_punct = ""
for char in my_str:
   if char not in punctuations:
       no_punct = no_punct + char

# display the unpunctuated string
print(no_punct)

Output

Hello he said and went

In this program, we first define a string of punctuations. Then, we iterate over the provided string using a for loop.

In each iteration, we check if the character is a punctuation mark or not using the membership test. We have an empty string to which we add (concatenate) the character if it is not punctuation. Finally, we display the cleaned up string.
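The hand-written punctuation string is easy to get wrong, and Python's standard library already ships one as string.punctuation. The sketch below compares the two and then repeats the clean-up with the library constant. The sample sentence is extended with "2+2=4" purely for illustration.

```python
import string

# The hand-written list used in the program above.
punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

# ASCII punctuation characters that the hand-written list does not cover.
missing = sorted(set(string.punctuation) - set(punctuations))
print(missing)

my_str = "Hello!!!, he said ---and went. 2+2=4"
# Build the result in one pass instead of repeated concatenation.
no_punct = "".join(char for char in my_str if char not in string.punctuation)
print(no_punct)
```

With the hand-written list, the "+" and "=" in the sample would survive; with string.punctuation they are removed.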

While doing some Python projects, we need to remove punctuation marks to make our text look cleaner. So, keeping this in mind, Python Pool brings you an in-depth article on removing punctuation marks from a string, list, and file in Python.

The whole article will be divided into three parts. In the first part, we will look at the elimination of punctuation from a string. After that, we will move on to the List, and subsequently, we will see how to remove Punctuation from a file in Python. Accordingly, without wasting any time, let’s directly jump into the tutorial.

  • What is a Punctuation Mark?
  • Removing Punctuation Marks from a String in Python
  • Ways to Remove Punctuation Marks from a String in Python
  • Using a for Loop and Punctuation String
  • Using the Regex to Remove Punctuation from String in Python
  • By using the translate() method to Remove Punctuation From a String in Python
  • Using the join() Method to Remove Punctuation from String in Python
  • By Using Generator Expression
  • Removing Punctuation From a List in Python
  • How to Remove Punctuation From a File in Python
  • Application
  • Conclusion

What is a Punctuation Mark?

According to Google, a punctuation mark is any one of the marks (such as a period, comma, or question mark) used to divide a piece of writing into sentences, clauses, etc. Broadly speaking, there are 14 punctuation marks in English grammar: the period (full stop), question mark, exclamation point/mark, comma, semicolon, colon, dash, hyphen, parentheses, brackets, braces, apostrophe, quotation marks, and ellipsis. In this article, we will see how to remove these punctuation marks from our text using Python.

Removing Punctuation Marks from a String in Python

Moving to the first part of our article, we will discuss the ways to remove punctuation from a string in Python. While researching this topic, I found five of them. I will try my best to explain each one through examples and step-by-step walkthroughs, so that you get a clear-cut idea of how it works.

5 ways to Remove Punctuation from a string in Python:

  1. Using Loops and Punctuation marks string
  2. Using the Regex
  3. By using the translate() method
  4. Using the join() method 
  5. By using Generator Expression

Let’s start our journey with the above five ways to remove punctuation from a String in Python.

Using a for Loop and Punctuation String

This program removes all punctuation from a string by checking each character with a for loop. Sometimes we want to split a sentence into a list of words; in those situations, we might first want to clean up the string and eliminate all punctuation marks. Here is an illustration of how it is done.

Let’s see the working through an example:

punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

inp_str = input("Enter a string: ")

no_punc = ""
for char in inp_str:
   if char not in punctuations:
       no_punc = no_punc + char

print("Punctuation Free String: ",no_punc)

Output:

Enter a string: Hi I am Karan from @python.pool
Punctuation Free String:  Hi I am Karan from pythonpool

Explanation

The above method is a simple, brute-force way to remove punctuation from a string in Python. We check each character against a string that contains the punctuation marks, and we build a new string from the characters that are not in it.

In this program, we first define a string named ‘punctuations’ that consists of all the punctuation marks. After that, we take input from the user and store it in ‘inp_str’. Then we iterate over that string using a for loop.
In every iteration, we use a membership test to check whether the character is a punctuation mark. We start with an empty string and add (concatenate) the character to it if it is not punctuation. Finally, we display the cleaned-up string.

Using the Regex to Remove Punctuation from String in Python

Python gives us the re module to work with regular expressions. If you don’t know what a regular expression is, let me tell you: a regular expression is a sequence of characters that specifies a search pattern. Normally, these patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory.

Note: We need to import the re module to work with regular expressions.

The re module comes with the sub() function, which substitutes every match of a pattern, and we will use that function to remove punctuation from a string in Python.

Syntax of re.sub

re.sub(pattern, replacement, original_string)
  • pattern: The punctuation marks(pattern) we want to replace.
  • replacement: Pattern replacement string (mostly empty string).
  • original_string: The original string from which we need to remove punctuations(pattern).

Let’s see the working through an example:

Example to Remove Punctuation from a String in Python Using Regex

import re

my_string = "Python P$#@!*oo()&l,. is ##th$e$ Bes.t pl*ace to L@earn P)(*y&tho.n"

op_string = re.sub(r'[^\w\s]','',my_string)

print('String with Punctuation: ', my_string)
print('String without Punctuation: ', op_string)

Output:

String with Punctuation:  Python P$#@!*oo()&l,. is ##th$e$ Bes.t pl*ace to L@earn P)(*y&tho.n
String without Punctuation:  Python Pool is the Best place to Learn Python

Explanation

In the above example, we import the re module because re.sub() lives there. We store the input string, which contains punctuation, in the variable my_string, and then use re.sub() to remove all the punctuation. You might be wondering what r'[^\w\s]' means. Inside square brackets, a leading ^ negates the set, \w matches letters, digits, and the underscore, and \s matches whitespace. The pattern therefore matches every character that is not a word character or whitespace, which covers the punctuation marks, and re.sub() replaces each match with an empty string. Note that the underscore counts as a word character, so this pattern leaves it in place.

I prefer regular expressions, as they are easy to maintain and easier to understand when someone else is reading your code.
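Because \w covers letters, digits, and the underscore, the pattern above keeps underscores and numbers. The short sketch below shows that behaviour and one possible variation that also drops underscores. The sample text and variable names are made up for illustration.

```python
import re

text = "snake_case, item #42: done!"

# [^\w\s] matches anything that is not a letter, digit, underscore,
# or whitespace, so the underscore and the digits survive.
keeps_underscore = re.sub(r'[^\w\s]', '', text)

# Adding "|_" to the pattern removes underscores as well.
drops_underscore = re.sub(r'[^\w\s]|_', '', text)

print(keeps_underscore)
print(drops_underscore)
```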

By using the translate() method to Remove Punctuation From a String in Python

The string translate() method is usually the fastest way to remove punctuation from a string in Python. translate() is a built-in method of str objects, so it needs no import; we import the string module only to get string.punctuation, a ready-made string of the ASCII punctuation characters.

If you don’t know what translate() does, let me explain. The translate() method returns a string in which certain characters are replaced, or deleted, according to a dictionary or mapping table.

Let’s see the working through an example:

Example To Remove Punctuation From A String In Python Using Translate Function

import string

my_string = "H*!i I a&m K@ar$an F)(&rom Python P$#@!*oo()&l,"

op_string = my_string.translate(str.maketrans('', '', string.punctuation))

print('String with Punctuation: ', my_string)
print('String without Punctuation: ', op_string)

Output:

String with Punctuation:  H*!i I a&m K@ar$an F)(&rom Python P$#@!*oo()&l,
String without Punctuation:  Hi I am Karan From Python Pool

Explanation

In the above example, we first import the string module, which provides string.punctuation. After that, we initialize our string, which contains a lot of punctuation marks. In the next step, we remove all the punctuation with the translate() method, which returns a copy of the string with the characters in the translation table substituted or deleted.

To make this work, we pass string.punctuation as the third argument of str.maketrans(), which marks every character in it for deletion, and hand the resulting table to translate(). string.punctuation is a constant in the “string” module that holds all the ASCII punctuation marks.
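When many strings need cleaning, the translation table can be built once and reused. The sketch below also shows a variation that maps punctuation to spaces instead of deleting it. The names DELETE_PUNCT and PUNCT_TO_SPACE are hypothetical, chosen only for this example.

```python
import string

# Build the table once; the third argument of str.maketrans()
# lists the characters to delete.
DELETE_PUNCT = str.maketrans('', '', string.punctuation)

# Alternative table that maps each punctuation character to a space,
# which keeps words such as "well-known" from being glued together.
PUNCT_TO_SPACE = str.maketrans(string.punctuation,
                               ' ' * len(string.punctuation))

sample = "A well-known trick: reuse the table!"
deleted = sample.translate(DELETE_PUNCT)
spaced = sample.translate(PUNCT_TO_SPACE)
print(deleted)
print(spaced.split())
```

Which table fits better depends on the data: deleting is right for apostrophes in contractions, while replacing with a space is often safer for hyphens and slashes.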

Using the join() Method to Remove Punctuation from String in Python

We can also use the join() method to remove punctuation from a string. If you don’t know about the join() method, let me briefly explain it. The join() method gives a flexible way to build strings out of iterable objects. It joins each element of an iterable (for example, a list, string, or tuple) with a string separator (the string on which join() is called) and returns the concatenated string.

The syntax of the join() method is:

string.join(iterable)

The join() method takes an iterable as the parameter.
Let’s see through an example how we can remove punctuation from a string in python using the join() method.

import string

st = "This , is a sam^ple string f#^r@om P#@ytho&n P#o#o*~l"

exclude = set(string.punctuation)
st = ''.join(ch for ch in st if ch not in exclude)
print(st)

Output:

This  is a sample string from Python Pool

Explanation:

In the given example, we start by importing the string module, which provides several ready-made sets of characters. We need the punctuation characters, so we build a set from string.punctuation. Next, we use join() with a generator expression to combine, in one line, all the characters that are not punctuation marks.

join() can build a string from any iterable of strings in a single line. Here, the separator is the empty string, so the remaining characters are joined back together unchanged.

By Using Generator Expression

The last, but not least, method to remove punctuation from a string in Python is to use a generator. Generators are a simple way of creating iterators: a generator function returns an iterator object that we can step through one value at a time.

def remove_punc_generator(string):
    punc = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    for ele in string:  
        if ele in punc:  
            string = string.replace(ele, "") 
    yield string


sample = "This is, a list! For% #Pythonpool"

sample = remove_punc_generator(sample)

print(next(sample))

Output:

This is a list For Pythonpool

Explanation:

There are multiple ways of creating a generator. Two of them are a function with a yield statement and a generator expression written in parentheses. In the given example, we’ve used yield to create a generator object for our string.

First, we create a function that accepts a string, removes the punctuation, and then yields the cleaned string in its final statement. Because of the yield statement, calling the function returns a generator object rather than the string itself. In the last statement of our code, we call next(sample) to get the single item that the generator produces.
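A generator pays off most when there are several pieces of text to clean, because each one is processed only when it is requested. The following is a minimal sketch under that assumption. The function name iter_clean and the sample lines are hypothetical, and it uses string.punctuation with translate() rather than the hand-written list above.

```python
import string

def iter_clean(lines):
    """Yield each line with ASCII punctuation removed, one line at a time."""
    table = str.maketrans('', '', string.punctuation)
    for line in lines:
        yield line.translate(table)

samples = ["This is, a list!", "For% #Pythonpool"]
cleaner = iter_clean(samples)   # nothing has been processed yet

first = next(cleaner)           # cleans only the first line
rest = list(cleaner)            # cleans whatever is left
print(first)
print(rest)
```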

Removing Punctuation From a List in Python

We have talked about a lot of methods to remove punctuation from a string in Python. But the string is not the only data type in Python; we have lists too. The list is one of the most popular built-in data types, so we should also talk about how to remove punctuation from lists in Python.

If you don’t know what a list is, let me briefly explain: the list is one of the most flexible data types available in Python. A list is written as comma-separated values (items) between square brackets. An important thing about a list is that its items need not be of the same type.

Without wasting any time, let’s jump directly to the example:

Example to Remove Punctuation From a List in Python

lis = ["Th@!is", "i#s" , "*&a", "list!", "For%", "#Pyt#$hon.?^pool"]

def remove_punc(string):
    punc = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    for ele in string:  
        if ele in punc:  
            string = string.replace(ele, "") 
    return string

lis = [remove_punc(i) for i in lis]
print(lis) # cleaned list

Output:

['This', 'is', 'a', 'list', 'For', 'Pythonpool']

Explanation:

Lists are one of the most used data types in Python, and there are multiple ways of iterating through a list. In the above example, we use a list comprehension to loop through all the elements of the list.

First, we create a function that accepts a string as a parameter and removes all of its punctuation by replacing each punctuation mark with an empty string. Then we create a sample list consisting of multiple strings and use a list comprehension to apply remove_punc() to each element. Finally, print() displays the cleaned list.
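If a list item consists only of punctuation, cleaning it leaves an empty string behind. The sketch below, which swaps in translate() for the replace() loop, cleans the list and then filters those leftovers out. The sample list is made up for illustration.

```python
import string

table = str.maketrans('', '', string.punctuation)
words = ["Th!is", "i#s", "*&a", "list!", "?!", "For%"]

# Clean every item with the shared translation table.
cleaned = [w.translate(table) for w in words]

# Keep only the items that still contain text.
non_empty = [w for w in cleaned if w]

print(cleaned)
print(non_empty)
```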

How to Remove Punctuation From a File in Python

While doing some projects and mathematical tasks, it becomes necessary to have a clean text file, with no punctuation marks in it, so that we can perform our calculations on it easily.

Original Text File with Punctuation

filename = input("Enter filename: ")


def remove_punc(string):
    punc = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    for ele in string:  
        if ele in punc:  
            string = string.replace(ele, "") 
    return string


try:
    with open(filename,'r',encoding="utf-8") as f:
        data = f.read()
    with open(filename,"w+",encoding="utf-8") as f:
        f.write(remove_punc(data))
    print("Removed punctuations from the file", filename)
except FileNotFoundError:
    print("File not found")

Output:

Clean text file after removing punctuation using Python

Explanation:

Reading and writing files is an integral part of Python code, and every coder must know how to do it. Here we use the open() function to read and write the file.

First, we ask the user to enter a filename. Next, we define a function that removes all punctuation characters from a string. Then we read the file using open(), call remove_punc() on its contents, and write the result back to the same file with a second open() call. The try-except block catches FileNotFoundError so that we can tell the user when the filename is invalid. Note that this overwrites the original file.
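Overwriting the input file makes mistakes hard to undo, and reading the whole file at once can be costly for large files. One alternative, sketched here under the assumption that a separate output file is acceptable, processes the text line by line. The function name clean_file and the file names are hypothetical, and a temporary directory stands in for real paths.

```python
import string
import tempfile
from pathlib import Path

def clean_file(src, dst):
    """Copy src to dst line by line, removing ASCII punctuation."""
    table = str.maketrans('', '', string.punctuation)
    with open(src, 'r', encoding='utf-8') as fin, \
         open(dst, 'w', encoding='utf-8') as fout:
        for line in fin:
            fout.write(line.translate(table))

# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_text("Hello, world!\nIs this clean?\n", encoding="utf-8")
    clean_file(src, dst)
    result = dst.read_text(encoding="utf-8")

print(result)
```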

Application

This has applications in data preprocessing in the data science domain, and also in day-to-day programming.

Conclusion

To summarize, in this post, you have learned various methods to remove punctuation marks from a string, list, and file in Python.

Happy Pythoning!

How do you remove punctuation from a data set?

To remove punctuation from a DataFrame column, define a function and apply it to the column:

import string

# Define the function to remove the punctuation
def remove_punctuations(text):
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    return text

# Apply to the DF series
df['new_column'] = df['column'].apply(remove_punctuations)

How do I remove punctuation from a csv file in Python?

Read the CSV file and apply any of the string methods above to the text you want to clean. The translate() method is usually the fastest of them: it is a built-in method of str objects, and the string module supplies string.punctuation, the set of characters to delete. Avoid stripping punctuation from the raw file contents, because that would also remove the commas that separate the fields.
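For a CSV file specifically, one approach is to parse the file with the standard csv module first and clean each field afterwards, so the commas that separate the columns are left alone. This is a minimal sketch: the in-memory data and its column names are made up, and io.StringIO stands in for a real file opened with open().

```python
import csv
import io
import string

table = str.maketrans('', '', string.punctuation)

# In-memory stand-in for a CSV file; the columns are hypothetical.
raw = io.StringIO('id,comment\n1,"Hello, world!"\n2,"Wait... what?"\n')

cleaned_rows = []
for row in csv.reader(raw):
    # Clean each field after parsing, so delimiter commas are untouched.
    cleaned_rows.append([field.translate(table) for field in row])

print(cleaned_rows)
```

The cleaned rows can then be written back out with csv.writer if a new file is needed.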

How do I get rid of punctuation in pandas?

Three common ways to remove punctuation from a pandas column are str.replace(), re.sub(), and str.translate().

How do I remove a comma from a dataset in Python?

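The re.sub() function can remove commas from a Python string: re.sub(',', '', text) returns a copy of text with every comma deleted. The plain string method text.replace(',', '') does the same without a regular expression.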