Prerequisites
Install the following packages from the command prompt or your IDE's terminal:
Tweepy (for accessing the Twitter API)
pip install tweepy
NLTK (for natural language processing)
pip install nltk
Python-Dotenv (for managing environment variables)
pip install python-dotenv
Matplotlib (for data visualization)
pip install matplotlib
Download and store the emotions.txt file in the project directory. Each line maps a word to an emotion, for example:
'victimized': 'cheated',
'accused': 'cheated',
'acquitted': 'singled out',
'adorable': 'loved',
'adored': 'loved',
'affected': 'attracted',
'afflicted': 'sad',
'aghast': 'fearful',
'agog': 'attracted',
'agonized': 'sad',
'alarmed': 'fearful',
'amused': 'happy',
'angry': 'angry',
'anguished': 'sad',
'animated': 'happy',
...
Step 1: Setting Up the Environment
Code Snippet
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Set up API keys
consumer_key = os.getenv("CONSUMER_KEY")
consumer_secret = os.getenv("CONSUMER_SECRET")
access_token = os.getenv("ACCESS_TOKEN")
access_token_secret = os.getenv("ACCESS_TOKEN_SECRET")
bearer_token = os.getenv("BEARER_TOKEN")
Explanation
dotenv:
- The dotenv library loads environment variables from a .env file.
- These variables are stored outside the source code to protect sensitive credentials, such as API keys.
os.getenv():
- This function retrieves the value of an environment variable by name.
- Example: os.getenv("CONSUMER_KEY") fetches the CONSUMER_KEY value from the .env file.
Purpose:
- Keep credentials secure and prevent them from being hardcoded in the script.
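For reference, a .env file for this project could look like the following sketch (placeholder values shown; substitute the credentials from your X developer dashboard):

CONSUMER_KEY=your_consumer_key
CONSUMER_SECRET=your_consumer_secret
ACCESS_TOKEN=your_access_token
ACCESS_TOKEN_SECRET=your_access_token_secret
BEARER_TOKEN=your_bearer_token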
Step 2: Fetching Tweets To Be Analysed By Our Tool
Code Snippet
import tweepy
# Initialize the client
client = tweepy.Client(
    bearer_token=bearer_token,
    consumer_key=consumer_key,
    consumer_secret=consumer_secret,
    access_token=access_token,
    access_token_secret=access_token_secret,
    wait_on_rate_limit=True
)

# Define search query
search_query = "about birds lang:en"

try:
    # Search for tweets
    tweets = client.search_recent_tweets(
        query=search_query,
        max_results=100,
        tweet_fields=['created_at', 'author_id']
    )

    if tweets.data:
        # Save tweets to file
        with open("tweets.txt", "w", encoding='utf-8') as file:
            for tweet in tweets.data:
                file.write(tweet.text + '\n')
        # Print success message
        print(f"Successfully saved {len(tweets.data)} tweets to tweets.txt")
    else:
        # Print error message if no tweets found
        print("No tweets found.")
except tweepy.TweepyException as e:
    # Print error message if Twitter API exception occurs
    print(f"Twitter API Error: {str(e)}")
Explanation
Initialize the Twitter Client:
- The tweepy.Client object is initialized with your API keys to authenticate access to the Twitter API.
- The parameter wait_on_rate_limit=True ensures that the script handles rate limits by pausing automatically.
Search for Tweets:
- search_recent_tweets fetches tweets matching the search query ("about birds lang:en", where lang:en restricts results to English).
- max_results=100 specifies the number of tweets to fetch (the maximum allowed is 100 per request).
- tweet_fields requests additional metadata, such as created_at and author_id.
Save Tweets to a File:
- The tweets' text is saved to tweets.txt, enabling further processing.
Error Handling:
- If an API error occurs, the exception (TweepyException) is caught and an error message is printed.
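If you need more than the 100 tweets a single request allows, Tweepy's Paginator can issue successive requests for you. A minimal sketch, assuming the client and search_query defined above (the 500-tweet cap is an arbitrary illustrative choice):

# Fetch up to 500 tweets across multiple requests
paginator = tweepy.Paginator(
    client.search_recent_tweets,
    query=search_query,
    max_results=100
)
with open("tweets.txt", "w", encoding="utf-8") as file:
    for tweet in paginator.flatten(limit=500):
        file.write(tweet.text + '\n')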
Step 3: Preprocessing Text
Code Snippet
import string
from collections import Counter
import matplotlib.pyplot as plt

# Read tweets from file
with open("tweets.txt", encoding="utf-8") as file:
    text = file.read()
# Convert to lowercase
lower_case = text.lower()

# Remove punctuation
clean_text = lower_case.translate(
    str.maketrans('', '', string.punctuation)
)

# Tokenize words
tokenized_words = clean_text.split()

# Define stop words
stop_words = [
"i", "me", "my", "myself",
"we", "our", "ours", "ourselves",
"you", "your", "yours", "yourself",
"yourselves", "he", "him", "his",
"himself", "she", "her", "hers",
"herself", "it", "its", "itself",
"they", "them", "their", "theirs",
"themselves", "what", "which", "who",
"whom", "this", "that", "these",
"those", "am", "is", "are",
"was", "were", "be", "been",
"being", "have", "has", "had",
"having", "do", "does", "did",
"doing", "a", "an", "the",
"and", "but", "if", "or",
"because", "as", "until", "while",
"of", "at", "by", "for",
"with", "about", "against", "between",
"into", "through", "during", "before",
"after", "above", "below", "to",
"from", "up", "down", "in",
"out", "on", "off", "over",
"under", "again", "further", "then",
"once", "here", "there", "when",
"where", "why", "how", "all",
"any", "both", "each", "few",
"more", "most", "other", "some",
"such", "no", "nor", "not",
"only", "own", "same", "so",
"than", "too", "very", "s",
"t", "can", "will", "just",
"don", "should", "now"
]
# Remove stop words
final_words = [word for word in tokenized_words if word not in stop_words]
Explanation
Text Input:
- Reads the saved tweets from tweets.txt.
Convert to Lowercase:
- Converts all characters to lowercase using .lower(), which standardizes the text for processing.
Remove Punctuation:
- str.maketrans('', '', string.punctuation) creates a translation table that removes all punctuation from the text.
Tokenization:
- Splits the cleaned text into individual words using .split().
Stop Words:
- Stop words are common words (like "and," "the," "is") that add no value to sentiment analysis.
- This step removes them from tokenized_words, leaving final_words. A short demonstration follows.
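To see what the pipeline produces, here is the same sequence of steps applied to one made-up sentence, using the stop_words list defined above:

# Illustrative run of the preprocessing steps on a sample sentence
sample = "The birds are singing, and we are amused!"
cleaned = sample.lower().translate(str.maketrans('', '', string.punctuation))
words = [word for word in cleaned.split() if word not in stop_words]
print(words)  # ['birds', 'singing', 'amused']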
Step 4: Mapping Emotions
Code Snippet
from collections import Counter

# Map words to emotions
emotion_list = []
with open("emotions.txt", "r") as file:
    for line in file:
        # Strip whitespace, quotes, and commas, e.g.
        # "'amused': 'happy'," -> "amused: happy"
        clear_line = line.strip().replace("'", "").replace(",", "")
        word, emotion = clear_line.split(":")
        if word.strip() in final_words:
            emotion_list.append(emotion.strip())

# Count emotions
emotion_count = Counter(emotion_list)
print(emotion_count)
Explanation
Open emotions.txt:
- Reads the emotions.txt file containing word-to-emotion mappings (e.g., 'amused': 'happy').
Cleaning the Lines:
- Each line is stripped of surrounding whitespace (strip()), and the quotes and commas are removed so the line can be split cleanly on the colon.
Word Matching:
- If a word from final_words exists in the mapping file, its associated emotion is appended to emotion_list.
Emotion Count:
- The Counter object counts the frequency of each emotion.
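With mappings like the sample shown in the prerequisites, the printed Counter might look like this (the counts are purely illustrative and depend on the fetched tweets):

Counter({'happy': 4, 'sad': 2, 'loved': 1})

Calling emotion_count.most_common(3) would then return the three most frequent emotions.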
Step 5: Sentiment Analysis using NLTK
Code Snippet
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon (required on first run)
nltk.download('vader_lexicon')

# Perform sentiment analysis
sia = SentimentIntensityAnalyzer()
sentiment = sia.polarity_scores(clean_text)
print('Sentiment Analysis:', sentiment)
Explanation
SentimentIntensityAnalyzer:
- Part of NLTK's VADER tool, designed for sentiment analysis of text data.
Polarity Scores:
- The polarity_scores method returns:
- neg: Negative sentiment proportion.
- neu: Neutral sentiment proportion.
- pos: Positive sentiment proportion.
- compound: Overall sentiment score (range: -1 to 1).
Example Output (the exact values depend on the fetched tweets; these are illustrative):
Sentiment Analysis: {'neg': 0.1, 'neu': 0.7, 'pos': 0.2, 'compound': 0.45}
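A common convention for turning the compound score into a single label uses thresholds of +/-0.05. A minimal sketch, assuming the sentiment dictionary computed above:

# Classify the overall sentiment from the compound score
def overall_sentiment(compound):
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(overall_sentiment(sentiment['compound']))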
Step 6: Visualizing Results
Code Snippet
# Plot emotions
fig, ax1 = plt.subplots()
ax1.bar(emotion_count.keys(), emotion_count.values())
fig.autofmt_xdate()
plt.savefig("emotions.png")
plt.show()
Explanation
Create Bar Chart:
- ax1.bar() creates a bar graph where:
- X-axis: Emotions.
- Y-axis: Emotion counts.
Format the X-axis:
- fig.autofmt_xdate() rotates the labels for better readability.
Save and Display the Plot:
- Saves the graph as emotions.png and displays it using plt.show().
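If you want labelled axes on the chart, matplotlib's Axes methods can be added before the plot is saved; the label strings here are only suggestions:

# Optional: add axis labels and a title before plt.savefig(...)
ax1.set_xlabel("Emotion")
ax1.set_ylabel("Count")
ax1.set_title("Emotions detected in fetched tweets")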
Example Result of a Sentiment Analysis Performed on a Speech by Mark Zuckerberg
Live Demo - Getting Tweets using X Developer API & Sentiment Analysis using NLTK