10 Essential Python Code Snippets for Text Processing

Text processing is a fundamental skill for programmers, especially those working in Natural Language Processing (NLP). Below are 10 Python code snippets for common text processing tasks, each with a brief explanation and example usage. These snippets are beginner-friendly yet powerful for real-world applications.

1. Count Words in Text

This code counts the number of words in a given text.

def count_words(text):
    words = text.split()
    return len(words)

# Example usage
text = "Hello, this is a sample text for processing in Python"
word_count = count_words(text)
print(f"Word count: {word_count}")

Explanation: Splits the text into words using split() and returns their count with len().
Use Case: Useful for analyzing documents or articles.
Output: Word count: 10
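Note that split() counts "Hello," as a word, punctuation included. If you want to count word characters only, a regex variant works (a sketch; the name count_words_re is our own):

```python
import re

def count_words_re(text):
    # \w+ matches runs of letters, digits, and underscores, so "Hello,"
    # counts the same as "Hello" and stray punctuation is ignored
    return len(re.findall(r'\w+', text))

print(count_words_re("Hello, this is a sample text for processing in Python"))  # 10
```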

2. Reverse Text

This code reverses the characters in a text string.

def reverse_text(text):
    return text[::-1]

# Example usage
text = "Hello World"
reversed_text = reverse_text(text)
print(f"Reversed text: {reversed_text}")

Explanation: Uses Python’s slicing [::-1] to reverse the string efficiently.
Use Case: Helpful for palindrome checks or text transformation.
Output: Reversed text: dlroW olleH
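The palindrome check mentioned in the use case follows directly from this slice. A minimal sketch (is_palindrome is our own helper name) that ignores case and spaces:

```python
def is_palindrome(text):
    # Normalize: lowercase and drop spaces so phrases like
    # "Never odd or even" qualify
    cleaned = text.lower().replace(" ", "")
    return cleaned == cleaned[::-1]

print(is_palindrome("Never odd or even"))  # True
print(is_palindrome("Hello World"))        # False
```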

3. Search for a Word in Text

This code checks if a specific word exists in the text.

def search_word(text, word):
    words = text.split()
    if word in words:
        return f"The word '{word}' is found in the text"
    else:
        return f"The word '{word}' is not found in the text"

# Example usage
text = "This is a sample text for processing"
word_to_find = "sample"
result = search_word(text, word_to_find)
print(result)

Explanation: Splits text into words and uses in to check for the word’s presence.
Use Case: Ideal for keyword search or text analysis.
Output: The word 'sample' is found in the text
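Because the membership check is case-sensitive, searching for "Sample" would miss "sample". A hedged variant (search_word_ci is our own name) that lowercases both sides first:

```python
def search_word_ci(text, word):
    # Lowercase both the text's words and the query so the match
    # ignores capitalization
    words = text.lower().split()
    return word.lower() in words

print(search_word_ci("This is a Sample text", "sample"))  # True
```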

4. Count Word Frequency

This code counts how often each word appears in the text.

from collections import Counter

def word_frequency(text):
    words = text.split()
    word_counts = Counter(words)
    return dict(word_counts)

# Example usage
text = "This is a text text for processing processing"
frequencies = word_frequency(text)
for word, count in frequencies.items():
    print(f"Word '{word}': {count} times")

Explanation: Uses Counter to create a dictionary of word frequencies.
Use Case: Useful for finding common words in a document.
Output:

Word 'This': 1 times
Word 'is': 1 times
Word 'a': 1 times
Word 'text': 2 times
Word 'for': 1 times
Word 'processing': 2 times

5. Remove Stop Words

This code removes common words (stop words) from the text.

def remove_stop_words(text, stop_words):
    words = text.split()
    filtered_words = [word for word in words if word not in stop_words]
    return ' '.join(filtered_words)

# Example usage
text = "This is a sample text for processing in Python"
stop_words = ['This', 'is', 'in', 'for']
filtered_text = remove_stop_words(text, stop_words)
print(f"Text after removing stop words: {filtered_text}")

Explanation: Filters out specified stop words and joins the remaining words.
Use Case: Enhances NLP tasks by focusing on meaningful words.
Output: Text after removing stop words: a sample text processing Python
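The comparison above is case-sensitive, which is why the example stop list has to spell 'This' with a capital letter. A common refinement (a sketch; remove_stop_words_ci is our own name) compares in lowercase:

```python
def remove_stop_words_ci(text, stop_words):
    # Lowercase both sides of the comparison so "This" and "this"
    # are both filtered; a set makes the lookup O(1)
    stops = {w.lower() for w in stop_words}
    words = text.split()
    return ' '.join(w for w in words if w.lower() not in stops)

print(remove_stop_words_ci("This is a sample text", ["this", "is", "a"]))
# sample text
```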

6. Extract Unique Words

This code extracts unique words from the text.

def unique_words(text):
    words = text.split()
    return list(set(words))

# Example usage
text = "This is a text text for processing processing"
unique = unique_words(text)
print(f"Unique words: {unique}")

Explanation: Uses set() to remove duplicates and returns a list of unique words. Note that set() does not preserve insertion order, so the order of the result may vary.
Use Case: Useful for vocabulary analysis.
Output (order may vary): Unique words: ['This', 'is', 'a', 'text', 'for', 'processing']
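If you need the unique words in their original order, dict.fromkeys() is a standard trick, since dictionaries preserve insertion order in Python 3.7+ (unique_words_ordered is our own name):

```python
def unique_words_ordered(text):
    # dict keys are unique and keep insertion order, so first
    # occurrences come back in the order they appeared
    return list(dict.fromkeys(text.split()))

print(unique_words_ordered("This is a text text for processing processing"))
# ['This', 'is', 'a', 'text', 'for', 'processing']
```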

7. Convert Text to Lowercase

This code converts all characters in the text to lowercase.

def to_lowercase(text):
    return text.lower()

# Example usage
text = "Hello WORLD! This is PYTHON"
lowercase_text = to_lowercase(text)
print(f"Lowercase text: {lowercase_text}")

Explanation: Uses lower() to standardize text case.
Use Case: Ensures consistency in text processing (e.g., for case-insensitive searches).
Output: Lowercase text: hello world! this is python

8. Remove Punctuation

This code removes punctuation from the text.

import string

def remove_punctuation(text):
    return text.translate(str.maketrans('', '', string.punctuation))

# Example usage
text = "Hello, World! This is a sample text."
clean_text = remove_punctuation(text)
print(f"Text without punctuation: {clean_text}")

Explanation: Uses string.punctuation and translate() to remove all punctuation marks.
Use Case: Cleans text for NLP tasks or tokenization.
Output: Text without punctuation: Hello World This is a sample text

9. Count Sentences

This code counts the number of sentences in the text.

def count_sentences(text):
    sentences = text.split('.')
    sentences = [s.strip() for s in sentences if s.strip()]
    return len(sentences)

# Example usage
text = "This is a sample text. It has multiple sentences. Let's count them!"
sentence_count = count_sentences(text)
print(f"Sentence count: {sentence_count}")

Explanation: Splits text by periods and counts non-empty pieces after stripping whitespace. Note that only periods are treated as boundaries; the final sentence here is counted only because text remains after the last period.
Use Case: Useful for text analysis, like summarizing document structure.
Output: Sentence count: 3
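To treat '!' and '?' as sentence boundaries too, a regex split is a small step up (a sketch; count_sentences_re is our own name):

```python
import re

def count_sentences_re(text):
    # Split on one or more of '.', '!' or '?', then count the
    # non-empty pieces
    sentences = re.split(r'[.!?]+', text)
    return len([s for s in sentences if s.strip()])

print(count_sentences_re("Really? Yes! This is a test."))  # 3
```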

10. Replace Words

This code replaces a specific word with another in the text.

def replace_word(text, old_word, new_word):
    return text.replace(old_word, new_word)

# Example usage
text = "This is a sample text for processing"
new_text = replace_word(text, "sample", "example")
print(f"Text after replacement: {new_text}")

Explanation: Uses replace() to swap one word for another. Note that replace() substitutes every occurrence and also matches substrings, so replacing "sample" would also change "samples".
Use Case: Useful for text editing or preprocessing (e.g., synonym replacement).
Output: Text after replacement: This is a example text for processing
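To replace only whole words, a regex with \b word boundaries avoids the substring problem (a sketch; replace_whole_word is our own name):

```python
import re

def replace_whole_word(text, old_word, new_word):
    # \b word boundaries prevent matches inside longer words;
    # re.escape guards against regex metacharacters in the word
    return re.sub(rf'\b{re.escape(old_word)}\b', new_word, text)

print(replace_whole_word("sample and samples", "sample", "example"))
# example and samples
```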

Tips for Advanced Text Processing

  • Preprocessing: Combine these functions (e.g., lowercase + remove punctuation) for robust NLP pipelines.
  • Libraries: Use nltk or spacy for advanced tasks like tokenization or part-of-speech tagging.
  • Scaling: For large texts, optimize with libraries or consider parallel processing.
  • Customization: Adapt these snippets for specific needs, like handling different languages or file inputs.
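The preprocessing tip above can be sketched as a small pipeline that chains the earlier snippets, lowercasing, stripping punctuation, and dropping stop words before counting (the function name preprocess is our own):

```python
import string
from collections import Counter

def preprocess(text, stop_words):
    # Lowercase, strip punctuation, then drop stop words
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    return [w for w in text.split() if w not in stop_words]

words = preprocess("Hello, World! Hello Python.", {"hello"})
print(Counter(words))  # Counter({'world': 1, 'python': 1})
```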

These snippets provide a solid foundation for text processing and can be extended for more complex NLP tasks. Experiment with them to build powerful text analysis tools!