Category: Artificial Intelligence

  • Day 5 of My Learning Journey: Building a Multilingual Sentiment Analysis Model

    On Day 5 of my learning journey, I dove into the fascinating world of Natural Language Processing (NLP) by building a multilingual sentiment analysis model using Python. This project was an exciting step toward understanding how machine learning can interpret human emotions from text data, even across different languages. Below, I share the key components of this project, the challenges I faced, and the lessons I learned.

    Project Overview

    The goal was to create a system that analyzes movie reviews and predicts whether they express a positive or negative sentiment. What made this project particularly exciting was its ability to handle reviews in multiple languages, such as English, Spanish, French, German, Japanese, and Russian, by incorporating language detection and translation.

    The project was structured into five key steps:

    1. Data Preparation: Loading and cleaning the IMDB dataset.
    2. Model Training: Training a logistic regression model on the processed data.
    3. Multilingual Testing: Adding language detection and translation to handle non-English reviews.
    4. Model Evaluation: Assessing the model’s performance using accuracy and classification metrics.
    5. Interactive Application: Building a simple interface for users to input reviews and get sentiment predictions.

    Step-by-Step Breakdown

    1. Data Preparation

    I started by loading the IMDB dataset, a collection of movie reviews labeled as positive or negative. Using pandas, I read the CSV file and performed initial checks to ensure the dataset contained the expected columns (review and sentiment). To handle potential inconsistencies in column names, I implemented logic to dynamically identify relevant columns.

    The text data was cleaned by:

    • Converting reviews to lowercase.
    • Removing punctuation using regular expressions (re).
    • Transforming the text into numerical features using CountVectorizer from scikit-learn, which creates a bag-of-words representation.

    The processed data (X for features, y for labels) and the vectorizer were saved using pickle for later use.
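
    For reference, here is a condensed sketch of this preparation step; the CSV file name, the saved-artifact names, and the column-matching heuristic are illustrative assumptions based on the description above:

    import re
    import pickle
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    
    # Load the IMDB dataset (file name assumed)
    df = pd.read_csv('IMDB Dataset.csv')
    
    # Dynamically identify the review and sentiment columns by name
    review_col = next(c for c in df.columns if 'review' in c.lower())
    label_col = next(c for c in df.columns if 'sentiment' in c.lower())
    
    def clean_text(text):
        text = text.lower()                   # lowercase
        return re.sub(r'[^\w\s]', '', text)   # strip punctuation
    
    df[review_col] = df[review_col].apply(clean_text)
    
    # Bag-of-words features
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(df[review_col])
    y = df[label_col]
    
    # Persist the processed data and the fitted vectorizer for later steps
    with open('processed_data.pkl', 'wb') as f:
        pickle.dump((X, y), f)
    with open('vectorizer.pkl', 'wb') as f:
        pickle.dump(vectorizer, f)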

    2. Model Training

    For the classification task, I chose Logistic Regression for its simplicity and effectiveness in binary classification. The dataset was split into 80% training and 20% testing sets using train_test_split. After fitting the model, I saved both the trained model and the held-out test data for later evaluation.
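
    A minimal sketch of this step, assuming the pickled artifacts from the data preparation sketch above (artifact names are illustrative):

    import pickle
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    
    with open('processed_data.pkl', 'rb') as f:
        X, y = pickle.load(f)
    
    # 80/20 train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    
    model = LogisticRegression(max_iter=1000)  # extra iterations help convergence on sparse text features
    model.fit(X_train, y_train)
    
    # Save the model and the held-out test split for the evaluation step
    with open('trained_model.pkl', 'wb') as f:
        pickle.dump(model, f)
    with open('test_data.pkl', 'wb') as f:
        pickle.dump((X_test, y_test), f)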

    3. Multilingual Sentiment Analysis

    To make the model multilingual, I integrated langdetect for language detection and deep_translator for translating non-English reviews into English. This allowed the model to process reviews in languages like Spanish, French, German, Japanese, and Russian. The workflow, with the helper functions sketched just after this list, was:

    • Detect the language of the input review.
    • If non-English, translate it to English using Google Translate.
    • Clean the text and transform it into numerical features using the saved vectorizer.
    • Predict sentiment using the trained model.
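
    The detect_language and translate_to_english helpers live in a small utils module; a plausible sketch of them, built on the two libraries named above, looks like this:

    from langdetect import detect
    from deep_translator import GoogleTranslator
    
    def detect_language(text):
        # langdetect can fail on very short or ambiguous input,
        # so fall back to 'unknown' rather than crashing
        try:
            return detect(text)
        except Exception:
            return 'unknown'
    
    def translate_to_english(text):
        # GoogleTranslator auto-detects the source language
        return GoogleTranslator(source='auto', target='en').translate(text)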

    4. Model Evaluation

    To evaluate the model’s performance, I used the test set to calculate:

    • Accuracy: The proportion of correct predictions.
    • Classification Report: Precision, recall, and F1-score for both positive and negative classes.
    • Confusion Matrix: To visualize true positives, true negatives, false positives, and false negatives.

    The model’s performance provided insights into its strengths and areas for improvement, such as handling imbalanced data or improving translation accuracy.
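
    A short evaluation sketch along these lines, assuming the model and test split were pickled during training (file names are illustrative):

    import pickle
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
    
    with open('trained_model.pkl', 'rb') as f:
        model = pickle.load(f)
    with open('test_data.pkl', 'rb') as f:
        X_test, y_test = pickle.load(f)
    
    y_pred = model.predict(X_test)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))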

    5. Interactive Application

    Finally, I created an interactive script that allows users to input movie reviews and receive sentiment predictions in real-time. The script uses the saved model and vectorizer to process user input, detect the language, and predict sentiment. I also tested the system with sample reviews in multiple languages to demonstrate its multilingual capabilities.

    Challenges and Lessons Learned

    • Data Cleaning: Ensuring consistent text preprocessing was critical. For example, removing punctuation and handling special characters improved the model’s performance.
    • Multilingual Processing: Language detection occasionally failed for short or ambiguous texts, leading to a fallback to English. This highlighted the importance of robust language detection libraries.
    • Model Limitations: The bag-of-words approach with CountVectorizer is simple but ignores word order and context. Exploring word embeddings or contextual models such as BERT could enhance performance.
    • Scalability: Saving and loading large datasets and models using pickle was efficient, but I learned about potential issues with pickle compatibility across Python versions.

    Key Takeaways

    • NLP Fundamentals: I gained hands-on experience with text preprocessing, feature extraction, and classification.
    • Multilingual NLP: Integrating language detection and translation opened up possibilities for global applications.
    • Evaluation Metrics: Understanding accuracy, precision, recall, and confusion matrices deepened my knowledge of model evaluation.
    • Practical Application: Building an interactive script showed me how to bridge the gap between a trained model and a user-facing application.

    Next Steps

    Moving forward, I plan to:

    • Experiment with richer text representations, from TF-IDF weighting to transformer models like BERT.
    • Improve language detection accuracy for short texts.
    • Deploy the model as a web application using frameworks like Flask or FastAPI to make it accessible to a broader audience (a minimal sketch follows this list).
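
    On the deployment point, here is a minimal Flask sketch of the idea; the /predict route and response shape are illustrative, and language detection and text cleaning are omitted for brevity:

    import pickle
    from flask import Flask, request, jsonify
    
    app = Flask(__name__)
    
    # Load the artifacts saved during training
    with open('trained_model.pkl', 'rb') as f:
        model = pickle.load(f)
    with open('vectorizer.pkl', 'rb') as f:
        vectorizer = pickle.load(f)
    
    @app.route('/predict', methods=['POST'])
    def predict():
        review = request.get_json().get('review', '')
        vector = vectorizer.transform([review])
        # str() keeps the response JSON-serializable regardless of label dtype
        return jsonify({'sentiment': str(model.predict(vector)[0])})
    
    if __name__ == '__main__':
        app.run(debug=True)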

    Code Highlight

    Below is a snippet of the interactive script for sentiment prediction:

    import pickle
    import re
    from utils import detect_language, translate_to_english
    
    # Load the trained model and vectorizer
    with open('trained_model.pkl', 'rb') as f:
        model = pickle.load(f)
    with open('vectorizer.pkl', 'rb') as f:
        vectorizer = pickle.load(f)
    
    def clean_text(text):
        text = text.lower()
        text = re.sub(r'[^\w\s]', '', text)
        return text
    
    def predict_sentiment(review, vectorizer, model):
        detected_lang = detect_language(review)
        if detected_lang != 'en' and detected_lang != 'unknown':
            review = translate_to_english(review)
        review = clean_text(review)
        review_vector = vectorizer.transform([review])
        return model.predict(review_vector)[0]
    
    # Interactive loop
    print("Sentiment Classifier: Enter a movie review to predict its sentiment.")
    while True:
        user_review = input("Enter your review (or type 'exit' to quit): ")
        if user_review.lower() == 'exit':
            break
        sentiment = predict_sentiment(user_review, vectorizer, model)
        print(f"Predicted Sentiment: {sentiment}\n")
  • Day 4 of Our Learning Journey: Building an AI-Powered Website Design Generator

    Welcome to Day 4 of our coding adventure! Today, we tackled an exciting project: an AI-Powered Website Design Generator that turns natural language prompts into custom HTML and CSS code. This tool makes web design accessible to everyone, allowing users to describe their vision—like “a modern portfolio with a dark theme and bold buttons”—and instantly get professional-grade code. As beginners, we’re thrilled to share our progress, the requirements behind this project, and a link to the code on GitHub.

    The Mission: Web Design for All

    Our goal was to create a tool that empowers anyone, from entrepreneurs to hobbyists, to generate website designs without coding. By combining AI with a user-friendly interface, we’re making web design fast, intuitive, and inclusive. Day 4 was the perfect opportunity to stretch our skills and build something impactful.

    The Tech Stack

    We used a beginner-friendly stack to bring this project to life:

    • Backend: Flask (Python) for the API, integrated with Google’s Gemini 1.5 Flash AI model to generate HTML and CSS.
    • Frontend: React for a dynamic, dark-themed interface that displays live previews and generated code.
    • AI: Google Gemini to process prompts and output structured JSON with HTML and CSS.
    • Deployment: Backend on a custom server, frontend hosted on Vercel for seamless access.

    Requirements

    To build and run the Website Design Generator, here’s what we needed:

    Backend Dependencies (requirements.txt)

    These Python packages power the Flask backend and AI integration:

    • Flask==3.0.3
    • blinker==1.9.0
    • click==8.2.1
    • colorama==0.4.6
    • itsdangerous==2.2.0
    • jinja2==3.1.6
    • markupsafe==3.0.2
    • werkzeug==3.1.3
    • idna==3.10
    • python-dotenv==1.1.1
    • requests==2.32.4
    • urllib3==2.5.0
    • charset_normalizer==3.4.2
    • certifi==2025.6.15
    • annotated-types==0.7.0
    • cachetools==5.5.2
    • google-ai-generativelanguage==0.6.15
    • google-api-core==2.25.1
    • google-api-python-client==2.175.0
    • google-auth==2.40.3
    • google-auth-httplib2==0.2.0
    • google-generativeai==0.8.5
    • googleapis-common-protos==1.70.0
    • grpcio==1.73.1
    • grpcio-status==1.71.2
    • httplib2==0.22.0
    • proto-plus==1.26.1
    • protobuf==5.29.5
    • pyasn1==0.6.1
    • pyasn1-modules==0.4.2
    • pydantic==2.11.7
    • pydantic-core==2.33.2
    • pyparsing==3.2.3
    • rsa==4.9.1
    • tqdm==4.67.1
    • typing-extensions==4.14.1
    • typing-inspection==0.4.1
    • uritemplate==4.2.0
    • flask-cors==6.0.1

    Frontend Dependencies

    The React frontend relies on:

    • React (v18.x)
    • TypeScript for type safety
    • Vercel for deployment
    • Basic HTML/CSS for styling (inline styles in the component)

    Additional Requirements

    • Gemini API Key: A Google API key for accessing the Gemini 1.5 Flash model, stored in a .env file (the sketch after this list shows one way to load it).
    • Node.js: For running the React frontend locally.
    • Python 3.8+: For the Flask backend.
    • Internet Access: For API calls to Gemini and frontend-backend communication.
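
    Putting these backend pieces together, a minimal Flask route calling Gemini could look like the sketch below; the /generate endpoint, the GEMINI_API_KEY variable name, and the prompt wording are illustrative assumptions rather than our exact code:

    import os
    import google.generativeai as genai
    from dotenv import load_dotenv
    from flask import Flask, request, jsonify
    from flask_cors import CORS
    
    load_dotenv()  # read the API key from the .env file
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    
    app = Flask(__name__)
    CORS(app)  # let the Vercel-hosted frontend call this API
    
    @app.route("/generate", methods=["POST"])
    def generate():
        prompt = request.get_json().get("prompt", "")
        # Ask Gemini for structured JSON with the HTML and CSS
        response = model.generate_content(
            "Return only JSON with keys 'html' and 'css' for this design: " + prompt
        )
        return jsonify({"raw": response.text})
    
    if __name__ == "__main__":
        app.run(debug=True)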

    What We Learned on Day 4

    This project was a whirlwind of new skills:

    • Backend Development: Setting up Flask routes, handling JSON, and integrating with an AI model taught us about APIs and server logic.
    • Frontend Development: Building a React interface with state management and live previews showed us how to create dynamic UIs.
    • AI Integration: Crafting prompts for Gemini and parsing its output helped us understand AI’s potential and quirks.
    • Deployment: Hosting on Vercel and configuring CORS gave us hands-on experience with production environments.
    • Problem-Solving: Handling errors, like inconsistent AI responses, pushed us to write robust code.

    Check Out the Code!

    We’ve shared the full project on GitHub for you to explore, run, or contribute to; the repository links are below. Try it out, experiment with prompts, and let us know what you think!

    Frontend Code: https://github.com/manojtsx/Website-Component-Design-Generator-Frontend

    Backend Code: https://github.com/manojtsx/Website-Component-Design-Generator-Backend

    What’s Next?

    On Day 5, we plan to enhance the generator with features like:

    • Customizable design tweaks via sliders or additional prompts.
    • Support for JavaScript to add interactivity.
    • A component library for reusable elements like navbars or footers.

    Let’s Connect!

    Day 4 has been a game-changer, showing us how AI can transform web development. If you’re learning to code, passionate about AI, or curious about web design, let’s connect! Share your thoughts in the comments, try our tool, or reach out to collaborate. Here’s to more learning and building!

    #WebDevelopment #AI #CodingJourney #Day4 #React #Flask

  • Day 3 of Learning OCR: Building a Modular Python OCR System

    On Day 3 of my journey into Optical Character Recognition (OCR), I took a significant step forward by organizing a Python-based OCR project into a modular, scalable structure. Using powerful libraries like OpenCV and Tesseract, I built a system capable of extracting text from images with improved preprocessing techniques. Below, I’ll share the project structure, the complete code for each file, and the key lessons I learned along the way.

    Why Modularize?

    As my OCR project grew, I realized the importance of keeping code organized and reusable. By splitting the functionality into separate files—each handling a specific task like image loading, preprocessing, or text extraction—I made the codebase easier to maintain, debug, and extend. This approach mirrors real-world software engineering practices, making it a valuable lesson for building production-ready applications.

    The Project Structure

    I designed a clean folder structure to keep everything tidy:

    universal_ocr/
    ├── images/               # Folder for input images
    ├── main.py               # Entry point of the application
    ├── ocr/
    │   ├── __init__.py       # Makes ocr a Python package
    │   ├── loader.py         # Handles image loading
    │   ├── processor.py      # Manages image preprocessing
    │   └── reader.py         # Performs text extraction
    

    Below is the complete code for each file, along with explanations of what I learned while building them.

    1. ocr/loader.py

    This module handles loading images from a specified folder, filtering for common image formats like PNG, JPG, and more.

    import os
    
    def load_images_from_folder(folder):
        images = []
        for filename in os.listdir(folder):
            if filename.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.tiff')):
                images.append(os.path.join(folder, filename))
        return images
    

    Key Learning: Using os.listdir() and os.path.join() makes file handling platform-independent. The case-insensitive check with filename.lower() ensures robustness across different image formats.

    2. ocr/processor.py

    This module preprocesses images to improve OCR accuracy. It includes steps like converting to grayscale, resizing, applying Gaussian blur, sharpening, adaptive thresholding, and skew correction.

    import cv2
    import numpy as np
    
    def preprocess_image(image):
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
        scale_percent = 150
        width = int(gray.shape[1] * scale_percent / 100)
        height = int(gray.shape[0] * scale_percent / 100)
        gray = cv2.resize(gray, (width, height), interpolation=cv2.INTER_LINEAR)
    
        blur = cv2.GaussianBlur(gray, (5,5), 0)
    
        kernel_sharpen = np.array([[0,-1,0], [-1,5,-1], [0,-1,0]])
        sharpened = cv2.filter2D(blur, -1, kernel_sharpen)
    
        thresh = cv2.adaptiveThreshold(
            sharpened, 255, 
            cv2.ADAPTIVE_THRESH_GAUSSIAN_C, 
            cv2.THRESH_BINARY, 31, 10)
    
        # cast to float32: cv2.minAreaRect rejects the int64 arrays np.where returns on some platforms
        coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
        angle = cv2.minAreaRect(coords)[-1]
        if angle < -45:
            angle = -(90 + angle)
        else:
            angle = -angle
    
        (h, w) = thresh.shape[:2]
        center = (w // 2, h // 2)
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(thresh, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
    
        return rotated
    

    Key Learning: Preprocessing is the backbone of effective OCR. Each step—grayscale conversion, resizing, blurring, sharpening, thresholding, and skew correction—addresses specific challenges like noise, low resolution, or text rotation. Tuning parameters like the thresholding block size (31) and constant (10) was critical for handling diverse image qualities.

    3. ocr/reader.py

    This module uses Tesseract to extract text from preprocessed images, leveraging the preprocessing function from processor.py.

    import cv2
    import pytesseract
    from .processor import preprocess_image
    
    # Optional: specify path if not in PATH
    # pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
    
    def extract_text_from_image(image_path):
        image = cv2.imread(image_path)
        preprocessed = preprocess_image(image)
        text = pytesseract.image_to_string(preprocessed)
        return text
    

    Key Learning: Tesseract’s performance heavily depends on image quality, making preprocessing essential. I also learned that specifying the Tesseract executable path is necessary in some environments, like Windows, if it’s not in the system PATH.

    4. main.py

    The main script ties everything together, loading images and extracting text while incorporating basic error handling.

    from ocr.loader import load_images_from_folder
    from ocr.reader import extract_text_from_image
    
    def main():
        image_folder = 'images'
        image_paths = load_images_from_folder(image_folder)
        
        for path in image_paths:
            print(f"\nExtracting from: {path}")
            try:
                text = extract_text_from_image(path)
                print("Text:\n", text.strip())
            except Exception as e:
                print("Failed to process image:", e)
    
    if __name__ == "__main__":
        main()
    

    Key Learning: A clean entry point simplifies execution and testing. Using try-except blocks ensures the program doesn’t crash on problematic images, and the if __name__ == "__main__": construct allows the script to be imported as a module without running the main logic.

    Running the Project

    To run the project, I placed images in the images/ folder and executed:

    python main.py
    

    The script processes each image, applies preprocessing, and prints the extracted text. This setup is simple yet flexible, allowing for future enhancements like logging or image previews.
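
    For completeness, the Python-side dependencies can be installed with pip; the Tesseract engine itself is installed separately (e.g., via the Windows installer or a system package manager):

    pip install opencv-python pytesseract numpy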

    Challenges and Takeaways

    • Challenge: Finding the right preprocessing parameters was tricky. For example, adjusting the adaptive thresholding parameters (31 and 10) required experimentation to handle different image qualities effectively.
    • Takeaway: Modular design not only improves code readability but also simplifies debugging and testing. By isolating preprocessing, I could refine it independently without affecting other components.
    • Next Steps: I plan to add logging to track errors and successes, implement image previews for visual debugging, and explore advanced preprocessing techniques for handling noisy or multilingual documents.

    Why This Matters

    This project is more than a learning exercise—it’s a step toward building real-world applications like document digitization, automated form processing, or assistive technologies for visually impaired users. Mastering OCR equips me with skills that have practical impact across industries, from finance to healthcare.

    Final Thoughts

    Day 3 taught me the power of modular design and the critical role of preprocessing in OCR. I’m excited to continue this journey, building on this foundation to tackle more complex challenges like multilingual text extraction or optimizing for low-quality images. If you’re working on OCR or computer vision projects, I’d love to hear your experiences and tips in the comments!

    #Python #OCR #ComputerVision #MachineLearning #Day3

  • Day 2 of Building a Spam Detector with Python: A Hands-On Machine Learning Journey

    On a late Sunday evening, July 6, 2025, I dove into a new machine learning adventure: creating a spam detector using Python. Inspired by my recent sentiment classifier project, this guide walks you through classifying emails or messages as “spam” or “ham” (not spam) in a practical, beginner-friendly way. Whether you’re a data enthusiast or a professional sharpening your skills, this five-step process will help you build and deploy your own ML model. Let’s get started!


    Step 1: Understanding Model Training Basics

    Machine learning is all about teaching models to spot patterns. For this spam detector, the goal is to analyze text and classify it as spam or ham. The workflow includes:

    • Data Preparation: Gathering and cleaning text data.
    • Choosing a Model: Selecting an algorithm like Naive Bayes.
    • Training: Feeding data to learn spam patterns.
    • Evaluation: Measuring the model’s accuracy.
    • Deployment: Applying it to classify new messages.

    What I Did:

    • Installed Python libraries (scikit-learn, pandas, numpy) using: pip install scikit-learn pandas numpy
    • Set up a clear plan to build a spam detector, leveraging text classification techniques.

    This step laid the groundwork for a streamlined ML project.


    Step 2: Preparing the Data

    Data powers any ML model. I used a small sample dataset of messages for simplicity, but real-world applications benefit from larger datasets like the SMS Spam Collection.

    What I Did:

    • Created a dataset with five messages and their labels (spam/ham).
    • Cleaned text by converting to lowercase and removing punctuation.
    • Transformed text into numerical features using CountVectorizer.

    Here’s the code:

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    import re
    
    # Sample dataset
    data = {
        'message': [
            'Win a free iPhone now! Click here.',
            'Meeting at 10 AM tomorrow, see you there.',
            'Get rich quick with this offer!',
            'Hi, just checking in about the project.',
            'Claim your prize today, urgent!'
        ],
        'label': ['spam', 'ham', 'spam', 'ham', 'spam']
    }
    
    # Create a DataFrame
    df = pd.DataFrame(data)
    
    # Clean text data
    def clean_text(text):
        text = text.lower()
        text = re.sub(r'[^\w\s]', '', text)
        return text
    
    df['message'] = df['message'].apply(clean_text)
    
    # Convert text to numerical features
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(df['message'])
    y = df['label']
    
    print("Feature matrix shape:", X.shape)
    print("Labels:", y)
    

    This code converted raw text into a numerical format, creating a feature matrix (X) and labels (y) for the model.


    Step 3: Choosing and Training the Model

    I chose Multinomial Naive Bayes, a go-to algorithm for text classification due to its strength with word frequency data.

    What I Did:

    • Split data into 80% training and 20% testing sets.
    • Trained the Naive Bayes model on the training data.

    Here’s the code:

    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Initialize and train the model
    model = MultinomialNB()
    model.fit(X_train, y_train)
    
    print("Model trained successfully!")
    print("Training data shape:", X_train.shape)
    print("Testing data shape:", X_test.shape)
    

    The model learned to differentiate spam from ham based on word patterns.


    Step 4: Evaluating the Model

    To verify performance, I tested the model on the unseen test set and calculated key metrics.

    What I Did:

    • Made predictions on the test set.
    • Computed accuracy and generated a classification report for precision, recall, and F1-score.

    Here’s the code:

    from sklearn.metrics import accuracy_score, classification_report
    
    # Make predictions
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    
    # Print results
    print("Test Accuracy:", accuracy)
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))
    

    Note: The small dataset (five messages) limited the test set to one sample, reducing metric reliability. For robust results, use a larger dataset like the SMS Spam Collection.
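
    Swapping in that dataset only changes the loading step; here is a sketch assuming the UCI SMS Spam Collection file sits locally (it ships as a headerless, tab-separated file):

    import pandas as pd
    
    # Each line is "<label>\t<message>"; the file name assumes a local download
    df = pd.read_csv('SMSSpamCollection', sep='\t', names=['label', 'message'])
    print(df.shape)                      # roughly 5,500 messages
    print(df['label'].value_counts())    # ham vs. spam counts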


    Step 5: Deploying the Model in a Project

    I created an interactive application to classify new messages, showcasing real-world applicability.

    What I Did:

    • Saved the model and vectorizer for reuse.
    • Built a function to predict labels and an interactive script for user inputs.

    Here’s the code:

    import joblib
    
    # Save the model and vectorizer
    joblib.dump(model, 'spam_detector_model.pkl')
    joblib.dump(vectorizer, 'vectorizer.pkl')
    
    # Function to predict spam
    def predict_spam(message, vectorizer, model):
        message = clean_text(message)
        message_vector = vectorizer.transform([message])
        prediction = model.predict(message_vector)
        return prediction[0]
    
    # Interactive script
    print("Spam Detector: Enter a message to check if it's spam or ham.")
    while True:
        user_message = input("Enter your message (or type 'exit' to quit): ")
        if user_message.lower() == 'exit':
            break
        label = predict_spam(user_message, vectorizer, model)
        print(f"Predicted Label: {label}\n")
    
    # Example usage
    sample_messages = [
        "Win a free trip today! Click now.",
        "Let’s schedule a call for tomorrow."
    ]
    for message in sample_messages:
        label = predict_spam(message, vectorizer, model)
        print(f"Message: {message}\nPredicted Label: {label}\n")
    

    This script allows users to input messages and receive instant spam/ham predictions, demonstrating practical deployment.


    What’s Next?

    This spam detector is just the beginning! Here are some ideas to take it further:

    • Scale Up: Use a larger dataset for improved accuracy.
    • Deploy as a Web App: Create a user-friendly interface with Flask or Streamlit.
    • Enhance Features: Add keyword filtering or email header analysis.

    Building this project fueled my excitement for applying ML to real-world challenges. I’d love to see your ML projects or hear your thoughts—connect with me on LinkedIn and let’s keep the conversation going! 🚀

    #MachineLearning #Python #DataScience #AI #SpamDetection

  • Day 1 of Building a Sentiment Classifier with Python: A Beginner-Friendly Machine Learning Project

    Machine learning (ML) is revolutionizing data analysis and predictive modeling. In this article, I’ll guide you through creating a simple sentiment classifier to predict whether a movie review is positive or negative. This beginner-friendly project uses Python and scikit-learn, and by the end, you’ll have a functional model ready for real-world applications. Let’s explore the five steps I took to build this project from scratch!


    Step 1: Understanding Model Training Basics

    Model training involves teaching an ML model to identify patterns in data. For this project, we aim to classify movie reviews as “positive” or “negative” based on their text. The process includes:

    • Data Preparation: Collecting and cleaning data.
    • Choosing a Model: Selecting an algorithm (e.g., Logistic Regression).
    • Training: Feeding data to the model to learn.
    • Evaluation: Testing the model’s performance.
    • Deployment: Integrating the model into a project.

    What I Did:

    • Installed Python (3.8+) and required libraries: scikit-learn, pandas, and numpy.
    • Ran this command to set up my environment: pip install scikit-learn pandas numpy
    • Chose to build a sentiment classifier for movie reviews, a classic ML task.

    This step established the foundation, ensuring I had the tools and a clear objective.


    Step 2: Preparing the Data

    Data is the backbone of any ML model. For simplicity, I used a small sample dataset of movie reviews, but you can scale up with datasets like IMDb from Kaggle.

    What I Did:

    • Created a sample dataset with five reviews and their sentiments.
    • Cleaned the text by converting it to lowercase and removing punctuation.
    • Converted text into numerical features using a bag-of-words model with CountVectorizer.

    Here’s the code:

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    import re
    
    # Sample dataset
    data = {
        'review': [
            'This movie was amazing and I loved it!',
            'Terrible film, really boring.',
            'Great acting and wonderful story.',
            'Awful, I hated this movie.',
            'Fantastic experience, highly recommend!'
        ],
        'sentiment': ['positive', 'negative', 'positive', 'negative', 'positive']
    }
    
    # Create a DataFrame
    df = pd.DataFrame(data)
    
    # Clean text data
    def clean_text(text):
        text = text.lower()  # Convert to lowercase
        text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
        return text
    
    df['review'] = df['review'].apply(clean_text)
    
    # Convert text to numerical features
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(df['review'])  # Features
    y = df['sentiment']  # Labels
    
    print("Feature matrix shape:", X.shape)
    print("Labels:", y)
    

    This code transformed raw text into a numerical format, creating a feature matrix (X) and labels (y) for the model.


    Step 3: Choosing and Training the Model

    I selected Logistic Regression, a robust algorithm for binary classification (positive vs. negative).

    What I Did:

    • Split the data into training (80%) and testing (20%) sets for later evaluation.
    • Trained the Logistic Regression model on the training data.

    Here’s the code:

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Initialize and train the model
    model = LogisticRegression()
    model.fit(X_train, y_train)
    
    print("Model trained successfully!")
    print("Training data shape:", X_train.shape)
    print("Testing data shape:", X_test.shape)
    

    The model learned patterns from the training data, preparing it for predictions.


    Step 4: Evaluating the Model

    To assess performance, I tested the model on the unseen test set and calculated metrics like accuracy.

    What I Did:

    • Made predictions on the test set.
    • Computed accuracy and generated a classification report for detailed metrics.

    Here’s the code:

    from sklearn.metrics import accuracy_score, classification_report
    
    # Make predictions
    y_pred = model.predict(X_test)
    
    # Calculate accuracy
    accuracy = accuracy_score(y_test, y_pred)
    
    # Print results
    print("Test Accuracy:", accuracy)
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))
    

    Note: With only five reviews, the test set had one sample, making metrics less reliable. For robust results, use a larger dataset like IMDb with thousands of reviews.
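
    Scaling up changes only the loading step; here is a sketch assuming Kaggle's "IMDB Dataset of 50K Movie Reviews" CSV, which uses review and sentiment columns (the file name is an assumption):

    import pandas as pd
    
    # Single CSV with 'review' and 'sentiment' columns
    df = pd.read_csv('IMDB Dataset.csv')
    print(df.shape)                        # ~50,000 rows
    print(df['sentiment'].value_counts())  # balanced positive/negative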


    Step 5: Deploying the Model in a Project

    I built a simple application to predict sentiments for new reviews, showcasing real-world usability.

    What I Did:

    • Saved the model and vectorizer for reuse.
    • Created a function to preprocess and predict sentiment for new reviews.
    • Developed an interactive script for user inputs.

    Here’s the code:

    import joblib
    
    # Save the model and vectorizer
    joblib.dump(model, 'sentiment_model.pkl')
    joblib.dump(vectorizer, 'vectorizer.pkl')
    
    # Function to predict sentiment
    def predict_sentiment(review, vectorizer, model):
        review = clean_text(review)
        review_vector = vectorizer.transform([review])
        prediction = model.predict(review_vector)
        return prediction[0]
    
    # Interactive script
    print("Sentiment Classifier: Enter a movie review to predict its sentiment.")
    while True:
        user_review = input("Enter your review (or type 'exit' to quit): ")
        if user_review.lower() == 'exit':
            break
        sentiment = predict_sentiment(user_review, vectorizer, model)
        print(f"Predicted Sentiment: {sentiment}\n")
    
    # Example usage
    sample_reviews = [
        "This movie was fantastic and thrilling!",
        "I didn’t enjoy the plot, it was confusing."
    ]
    for review in sample_reviews:
        sentiment = predict_sentiment(review, vectorizer, model)
        print(f"Review: {review}\nPredicted Sentiment: {sentiment}\n")
    

    This script enables users to input reviews and receive instant sentiment predictions, demonstrating practical deployment.


    What’s Next?

    This project is a great starting point! Here are ways to enhance it:

    • Improve the Model: Experiment with algorithms like Naive Bayes or use larger datasets.
    • Deploy as a Web App: Use Flask or Streamlit for a user-friendly interface.
    • Explore New Domains: Apply the workflow to predict stock trends, customer feedback, or spam.

    Building this sentiment classifier was an exciting way to dive into the ML workflow. Whether you’re new to ML or advancing your skills, I hope this inspires your next project!

    Try It Yourself: Grab the code, install the libraries, and experiment with your dataset. Share your projects or questions on LinkedIn—let’s connect and learn together!

    #MachineLearning #Python #DataScience #SentimentAnalysis #ScikitLearn