Simple Sentiment Analysis Tool

Prerequisites

Install the following packages from the command prompt or your IDE's built-in terminal:

Tweepy (for accessing the Twitter API)
pip install tweepy
NLTK (for natural language processing)
pip install nltk
Python-Dotenv (for managing environment variables)
pip install python-dotenv
Matplotlib (for data visualization)
pip install matplotlib
Download the emotions.txt file and store it in the project directory. Each line maps a word to an emotion, for example:
'victimized': 'cheated',
'accused': 'cheated',
'acquitted': 'singled out',
'adorable': 'loved',
'adored': 'loved',
'affected': 'attracted',
'afflicted': 'sad',
'aghast': 'fearful',
'agog': 'attracted',
'agonized': 'sad',
'alarmed': 'fearful',
'amused': 'happy',
'angry': 'angry',
'anguished': 'sad',
'animated': 'happy',
...

Step 1: Setting Up the Environment

Code Snippet
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Set up API keys
consumer_key = os.getenv("CONSUMER_KEY")
consumer_secret = os.getenv("CONSUMER_SECRET")
access_token = os.getenv("ACCESS_TOKEN")
access_token_secret = os.getenv("ACCESS_TOKEN_SECRET")
bearer_token = os.getenv("BEARER_TOKEN")

Explanation

  • dotenv:

    • The dotenv library helps load environment variables from a .env file.
    • These variables are stored securely outside the source code to protect sensitive credentials, such as API keys.
  • os.getenv():

    • This function retrieves the value of an environment variable by name.
    • Example: os.getenv("CONSUMER_KEY") fetches the CONSUMER_KEY from the .env file.
  • Purpose:

    • Keep credentials secure and prevent them from being hardcoded in the script.
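
For reference, a minimal .env file might look like this (the variable names match the os.getenv() calls above; the values are placeholders, so never commit real credentials):

# .env (keep this file out of version control)
CONSUMER_KEY=your-consumer-key
CONSUMER_SECRET=your-consumer-secret
ACCESS_TOKEN=your-access-token
ACCESS_TOKEN_SECRET=your-access-token-secret
BEARER_TOKEN=your-bearer-token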

Step 2: Fetching Tweets To Be Analysed By Our Tool

Code Snippet
import tweepy

# Initialize the client
client = tweepy.Client(
    bearer_token=bearer_token,
    consumer_key=consumer_key,
    consumer_secret=consumer_secret,
    access_token=access_token,
    access_token_secret=access_token_secret,
    wait_on_rate_limit=True
)

# Define search query
search_query = "about birds lang:en"

try:
    # Search for tweets
    tweets = client.search_recent_tweets(
        query=search_query,
        max_results=100,
        tweet_fields=['created_at', 'author_id']
    )
    if tweets.data:
        # Save tweets to file
        with open("tweets.txt", "w", encoding='utf-8') as file:
            for tweet in tweets.data:
                file.write(tweet.text + '\n')
        # Print success message
        print(f"Successfully saved {len(tweets.data)} tweets to tweets.txt")
    else:
        # Print message if no tweets were found
        print("No tweets found.")
except tweepy.TweepyException as e:
    # Print error message if a Twitter API exception occurs
    print(f"Twitter API Error: {str(e)}")

Explanation

  1. Initialize the Twitter Client:

    • The tweepy.Client object is initialized with your API keys to authenticate access to the Twitter API.
    • The parameter wait_on_rate_limit=True ensures that the script handles rate limits by pausing automatically.
  2. Search for Tweets:

    • search_recent_tweets fetches recent tweets matching the search query ("about birds lang:en"; the lang:en operator restricts results to English).
    • max_results=100 specifies the number of tweets to fetch (maximum allowed is 100 per request).
    • tweet_fields requests additional metadata, such as created_at and author_id.
  3. Save Tweets to a File:

    • The tweets’ text is saved in tweets.txt, enabling further processing.
  4. Error Handling:

    • If an API error occurs, the exception (TweepyException) is caught, and an error message is printed.
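
If you want to refine what comes back, the recent-search endpoint supports further query operators. As a small optional variation (not part of the original script), you can filter out retweets and replies:

# Optional variation: exclude retweets and replies from the results
search_query = "about birds lang:en -is:retweet -is:reply"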

Step 3: Preprocessing Text

Code Snippet
import string
from collections import Counter
import matplotlib.pyplot as plt

# Read tweets from file
text = open("tweets.txt", encoding="utf-8").read()

# Convert to lowercase
lower_case = text.lower()

# Remove punctuation
clean_text = lower_case.translate(
    str.maketrans('', '', string.punctuation)
)

# Tokenize words
tokenized_words = clean_text.split()

# Define stop words
stop_words = [
    "i", "me", "my", "myself", 
    "we", "our", "ours", "ourselves", 
    "you", "your", "yours", "yourself", 
    "yourselves", "he", "him", "his", 
    "himself", "she", "her", "hers", 
    "herself", "it", "its", "itself", 
    "they", "them", "their", "theirs", 
    "themselves", "what", "which", "who", 
    "whom", "this", "that", "these", 
    "those", "am", "is", "are", 
    "was", "were", "be", "been", 
    "being", "have", "has", "had", 
    "having", "do", "does", "did", 
    "doing", "a", "an", "the", 
    "and", "but", "if", "or", 
    "because", "as", "until", "while", 
    "of", "at", "by", "for", 
    "with", "about", "against", "between", 
    "into", "through", "during", "before", 
    "after", "above", "below", "to", 
    "from", "up", "down", "in", 
    "out", "on", "off", "over", 
    "under", "again", "further", "then", 
    "once", "here", "there", "when", 
    "where", "why", "how", "all", 
    "any", "both", "each", "few", 
    "more", "most", "other", "some", 
    "such", "no", "nor", "not", 
    "only", "own", "same", "so", 
    "than", "too", "very", "s", 
    "t", "can", "will", "just", 
    "don", "should", "now"
]

# Remove stop words
final_words = [word for word in tokenized_words if word not in stop_words]

Explanation

  1. Text Input:

    • Reads the saved tweets from tweets.txt.
  2. Convert to Lowercase:

    • Converts all characters to lowercase using .lower().
    • This standardizes the text for processing.
  3. Remove Punctuation:

    • str.maketrans('', '', string.punctuation) creates a translation table to remove punctuation from the text.
  4. Tokenization:

    • Splits the cleaned text into individual words using .split().
  5. Stop Words:

    • Stop words are common words (like “and,” “the,” “is”) that do not add value to sentiment analysis.
    • This step removes these words from tokenized_words.
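
To see the whole pipeline at a glance, here is a quick illustration on a single made-up tweet (the sample text is invented; stop_words is the list defined above):

sample = "The birds are singing, and I love it!"
sample_clean = sample.lower().translate(str.maketrans('', '', string.punctuation))
sample_tokens = sample_clean.split()
# ['the', 'birds', 'are', 'singing', 'and', 'i', 'love', 'it']
sample_final = [word for word in sample_tokens if word not in stop_words]
# ['birds', 'singing', 'love']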

Step 4: Mapping Emotions

Code Snippet
from collections import Counter

# Map words to emotions
emotion_list = []
with open("emotions.txt", "r") as file:
    for line in file:
        # Strip whitespace, then remove quotes and commas, leaving word: emotion
        clear_line = line.strip().replace("'", "").replace(",", "")
        word, emotion = clear_line.split(":")
        if word.strip() in final_words:
            emotion_list.append(emotion.strip())

# Count emotions
emotion_count = Counter(emotion_list)
print(emotion_count)

Explanation

  • Open emotions.txt:

    • Reads the emotions.txt file containing word-to-emotion mappings (e.g., 'adorable': 'loved').
  • Cleaning the Lines:

    • Each line is stripped of surrounding whitespace (strip()), and its quotes and commas are removed so that only a plain word: emotion pair remains.
  • Word Matching:

    • If a word from final_words exists in the mapping file, its associated emotion is appended to emotion_list.
  • Emotion Count:

    • The Counter object counts the frequency of each emotion.
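
To get a feel for what Counter returns, here is a tiny standalone example with made-up emotions:

from collections import Counter

sample_emotions = ['happy', 'sad', 'happy', 'fearful', 'happy']
print(Counter(sample_emotions))
# Counter({'happy': 3, 'sad': 1, 'fearful': 1})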

Step 5: Sentiment Analysis using NLTK

Code Snippet
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon (only needed on the first run)
nltk.download('vader_lexicon')

# Perform sentiment analysis
sia = SentimentIntensityAnalyzer()
sentiment = sia.polarity_scores(clean_text)
print('Sentiment Analysis:', sentiment)

Explanation

  • SentimentIntensityAnalyzer:

    • Part of NLTK’s VADER tool, designed for sentiment analysis of text data.
  • Polarity Scores:

    • The polarity_scores method returns:
      • neg: Negative sentiment proportion.
      • neu: Neutral sentiment proportion.
      • pos: Positive sentiment proportion.
      • compound: Overall sentiment score (range: -1 to 1).
Example Output:
Sentiment Analysis: {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
The exact values depend on the tweets fetched; the neg, neu, and pos proportions sum to 1 (up to rounding), and compound summarizes the overall sentiment.
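
You can also try the analyzer on a single sentence of your own (the sentence below is made up; its exact scores come from the VADER lexicon, so they are not reproduced here):

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
# Prints a dict with 'neg', 'neu', 'pos', and 'compound' keys
print(sia.polarity_scores("I love watching birds!"))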

Step 6: Visualizing Results

Code Snippet
# Plot emotions
fig, ax1 = plt.subplots()
ax1.bar(emotion_count.keys(), emotion_count.values())
fig.autofmt_xdate()
plt.savefig("emotions.png")
plt.show()

Explanation

  • Create Bar Chart:

    • ax1.bar() draws a bar chart where:
      • X-axis: Emotions.
      • Y-axis: Emotion counts.
  • Format the X-axis:

    • fig.autofmt_xdate() rotates and right-aligns the x-axis labels for better readability (it is intended for date labels but works for any tick labels).
  • Save and Display the Plot:

    • Saves the graph as emotions.png and displays it using plt.show().
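
As an optional polish step (not in the original script), you can label the chart before saving, reusing the fig and ax1 objects from the snippet above:

# Optional: call these before plt.savefig("emotions.png")
ax1.set_xlabel("Emotion")
ax1.set_ylabel("Count")
ax1.set_title("Emotions Detected in Fetched Tweets")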

Example Result of a Sentiment Analysis Performed on a Speech by Mark Zuckerberg

Live Demo - Getting Tweets using X Developer API & Sentiment Analysis using NLTK

 
