Remove Non-Alphanumeric Characters in Python


Non-alphanumeric characters are characters that are neither letters nor digits. They include symbols, punctuation, whitespace, and other special characters. For example, !, @, #, $, %, ^, &, *, (, ), -, _, +, =, [, ], {, }, |, \, :, ;, ", ', <, >, ?, / are all non-alphanumeric characters.

Sometimes, we need to remove these characters from strings to clean or manipulate text data.

For example, if we want to count the number of words in a string, we need to remove punctuation marks and other non-alphanumeric characters first.
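As a rough sketch of that word-count use case, assuming that words are separated by whitespace and that punctuation should simply be dropped (the helper name `count_words` is hypothetical):

```python
import re

def count_words(text: str) -> int:
    # Drop everything except ASCII letters, digits, and whitespace,
    # then split on runs of whitespace.
    cleaned = re.sub(r"[^A-Za-z0-9\s]+", "", text)
    return len(cleaned.split())

print(count_words("Hello, world! It's 2024 -- isn't it?"))  # prints 6
```

Because the apostrophe is removed too, "It's" becomes "Its" and is still counted as one word.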

Let's look at various ways to remove non-alphanumeric characters from a string in Python. Each example below keeps spaces so that words stay separated.


1. Using Regular Expressions

Python's re module provides regular expression operations. Its re.sub() function replaces every match of a pattern with a replacement string, so replacing each run of non-alphanumeric characters with an empty string removes them.

re.sub() takes three arguments: the pattern to match, the replacement string (here an empty string), and the string to search. The pattern [^A-Za-z0-9 ]+ matches one or more characters that are not ASCII letters, digits, or a space, so spaces are kept.

import re

string = "Hello! @a#b$c%d^e&123"

# remove everything except ASCII letters, digits and spaces
string = re.sub(r'[^A-Za-z0-9 ]+', '', string)

print(string)

Output

Hello abcde123
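If the same pattern is applied to many strings, it can be compiled once with re.compile(). The sketch below also shows two variants that are assumptions about what you might want: removing spaces as well, and a Unicode-aware pattern. In Python 3, \W matches any character that is not a word character, and word characters include the underscore, so [\W_]+ is used to remove underscores too.

```python
import re

# Compile once, reuse many times (ASCII letters, digits and spaces are kept).
KEEP_SPACES = re.compile(r"[^A-Za-z0-9 ]+")
# Same idea, but spaces are removed as well.
NO_SPACES = re.compile(r"[^A-Za-z0-9]+")
# Unicode-aware: \W means "not a word character"; adding _ removes underscores too.
UNICODE_ALNUM = re.compile(r"[\W_]+")

text = "Héllo! @a#b$c_d^e&123"
print(KEEP_SPACES.sub("", text))    # Hllo abcde123  (the accented é is removed)
print(NO_SPACES.sub("", text))      # Hlloabcde123
print(UNICODE_ALNUM.sub("", text))  # Hélloabcde123  (é is kept)
```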

2. Using isalnum() method

isalnum() is a string method that returns True if all characters in the string are alphanumeric (letters or digits) and the string is not empty; otherwise it returns False.

Called on a single character, it tells us whether that character is alphanumeric, so we can loop over the string and add only the alphanumeric characters (and spaces) to a new string.

string = "Hello! @a#b$c%d^e&123"

# remove non-alphanumeric characters
new_string = ""

for char in string:
    if char.isalnum() or char == " ":
        new_string += char

print(new_string)

Output

Hello abcde123
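Building a string with += inside a loop works, but the more common idiom is to pass a generator expression to str.join(). Also worth knowing: isalnum() is Unicode-aware, so accented letters are kept, unlike with the ASCII-only regular expression in Method 1. A minimal sketch (the function name keep_alnum and its keep_spaces parameter are just illustrative):

```python
def keep_alnum(text: str, keep_spaces: bool = True) -> str:
    # Keep a character if it is a letter/digit, or a space when keep_spaces is True.
    return "".join(
        ch for ch in text
        if ch.isalnum() or (keep_spaces and ch == " ")
    )

print(keep_alnum("Hello! @a#b$c%d^e&123"))              # Hello abcde123
print(keep_alnum("Café 5, naïve!"))                     # Café 5 naïve
print(keep_alnum("Café 5, naïve!", keep_spaces=False))  # Café5naïve
```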

3. Using ASCII values

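Each ASCII character has a numeric code, which Python's built-in ord() function returns. For example, ord("a") is 97, ord("b") is 98, ord("c") is 99, and so on. (For non-ASCII characters, ord() returns the Unicode code point, which is above 127.)

Here are the ASCII ranges for letters and digits:

a-z: 97 to 122
A-Z: 65 to 90
0-9: 48 to 57

To remove non-alphanumeric characters, check whether the ASCII value of each character falls in one of these ranges. If it does not, don't add the character to the new string.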

string = "Hello! @a#b$c%d^e&123"

# remove non-alphanumeric characters
new_string = ""

for char in string:
    ascii_val = ord(char)
    # keep a-z, A-Z, 0-9, or a space
    if 97 <= ascii_val <= 122 or 65 <= ascii_val <= 90 or 48 <= ascii_val <= 57 or char == " ":
        new_string += char

print(new_string)

Output

Hello abcde123
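The same ASCII-only check can be written more readably as membership in a set built from the string module constants ascii_letters and digits. The sketch below keeps the same behaviour as the ord() check above; the names ALLOWED and keep_ascii_alnum are illustrative, and the input variable is called text so it does not shadow the string module.

```python
import string

# ASCII letters (a-z, A-Z), digits (0-9), plus a space.
ALLOWED = frozenset(string.ascii_letters + string.digits + " ")

def keep_ascii_alnum(text: str) -> str:
    # Characters outside ALLOWED (punctuation, accented letters, ...) are dropped.
    return "".join(ch for ch in text if ch in ALLOWED)

print(keep_ascii_alnum("Hello! @a#b$c%d^e&123"))  # Hello abcde123
print(keep_ascii_alnum("Café 5"))                 # Caf 5
```

On Python 3.7 and later, ch.isascii() and ch.isalnum() is an equivalent per-character test.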

4. Using isalpha() and isdigit() methods

The isalpha() method returns True if all characters in the string are letters (lowercase or uppercase), and the isdigit() method returns True if all characters in the string are digits (such as 0-9). Both return False for an empty string.

We can combine them in the same loop as above to remove non-alphanumeric characters.

string = "Hello! @a#b$c%d^e&123"

# remove non-alphanumeric characters
new_string = ""

for char in string:
    if char.isalpha() or char.isdigit() or char == " ":
        new_string += char

print(new_string)

Output

Hello abcde123
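isalpha() or isdigit() is close to isalnum() but not identical: isalnum() also accepts characters for which isnumeric() is true, such as the fraction ½, which is neither alphabetic nor a digit. The sketch below uses filter() to compare the two checks on a made-up sample string (spaces are not kept here, to keep the comparison short):

```python
def keep_alpha_digit(text: str) -> str:
    # Method 4 predicate: letters or digits only.
    return "".join(filter(lambda ch: ch.isalpha() or ch.isdigit(), text))

def keep_isalnum(text: str) -> str:
    # Method 2 predicate: isalnum() also accepts other numeric characters.
    return "".join(filter(str.isalnum, text))

sample = "a½b2!"
print(keep_alpha_digit(sample))  # ab2
print(keep_isalnum(sample))      # a½b2
```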

5. Using maketrans() and translate() methods

The str.maketrans() method builds a translation table that maps characters to replacement characters. Characters passed in its optional third argument are mapped to None, which means they will be deleted. The table is then passed to translate().

The translate() method returns a copy of the string in which each character is mapped through the translation table; characters mapped to None are removed.

Here is how you can use these methods to remove non-alphanumeric characters.

string = "Hello! @a#b$c%d^e&123"

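# delete every character listed in the third argument of maketrans()
# (only these characters are removed; anything not listed is kept)
translation_table = str.maketrans("", "", "!@#$%^&*()_-+={[}]|\\:;\"'<>?/.,~`")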
new_string = string.translate(translation_table)

print(new_string)

Output

Hello abcde123
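Typing the characters to delete by hand is error-prone, since any character missing from the list is silently kept. The string module provides string.punctuation, which contains all 32 ASCII punctuation characters, and str.maketrans() can be called on the str type directly. A sketch, naming the input variable text so it does not shadow the string module (DELETE_PUNCT is an illustrative name):

```python
import string

# Map every ASCII punctuation character to None, i.e. delete it.
DELETE_PUNCT = str.maketrans("", "", string.punctuation)

text = "Hello! @a#b$c%d^e&123"
print(text.translate(DELETE_PUNCT))        # Hello abcde123

# Caveat: only listed characters are deleted; non-ASCII symbols pass through.
print("50€ ok?".translate(DELETE_PUNCT))   # 50€ ok
```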

Conclusion

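Now removing non-alphanumeric characters from a string should not be a problem for you. You can choose any of the above methods for this task. Keep in mind that they differ on non-ASCII text: the regular expression and ASCII-value methods remove accented and other non-ASCII letters, the translate() method removes only the characters you list, and isalnum(), isalpha() and isdigit() keep Unicode letters and digits such as é.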

Here is the speed comparison of all the above methods:

(Figure: Performance of all the above methods)

In this benchmark, Method 2 (using isalnum()) was the fastest way to remove non-alphanumeric characters from a string.
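Timings like these depend on the Python version, the machine, and the input, so it is worth measuring on your own data. A rough timeit sketch follows; the function names, the input (the sample string repeated 1,000 times), and the repetition count are assumptions for illustration, not the setup behind the chart.

```python
import re
import timeit
from functools import partial

TEXT = "Hello! @a#b$c%d^e&123 " * 1000
PATTERN = re.compile(r"[^A-Za-z0-9 ]+")
TABLE = str.maketrans("", "", "!@#$%^&*()_-+={[}]|\\:;\"'<>?/.,~`")

def with_regex(s):
    return PATTERN.sub("", s)

def with_isalnum(s):
    return "".join(ch for ch in s if ch.isalnum() or ch == " ")

def with_ord(s):
    out = []
    for ch in s:
        code = ord(ch)
        if 97 <= code <= 122 or 65 <= code <= 90 or 48 <= code <= 57 or ch == " ":
            out.append(ch)
    return "".join(out)

def with_isalpha_isdigit(s):
    return "".join(ch for ch in s if ch.isalpha() or ch.isdigit() or ch == " ")

def with_translate(s):
    return s.translate(TABLE)

FUNCS = [with_regex, with_isalnum, with_ord, with_isalpha_isdigit, with_translate]

if __name__ == "__main__":
    # All methods should agree on this ASCII-only input before timing them.
    assert len({f(TEXT) for f in FUNCS}) == 1
    for f in FUNCS:
        seconds = timeit.timeit(partial(f, TEXT), number=50)
        print(f"{f.__name__:22} {seconds:.4f} s")
```

Expect the ranking to shift with input length and content, so small differences are not very meaningful.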