
Customer Product Review Sentiment Analysis Using Python (Machine Learning Project)

 This article delves into the examination of user reviews and ratings on Flipkart. These reviews serve as a valuable resource for informing others about their experiences and, importantly, provide insights into product quality and brand reputation. Through this analysis, we aim to offer users valuable information about products and suggest ways to improve product quality.

Here we will apply machine learning to the data to predict whether a product review is positive or negative.

Before going into the analysis and code, you can download the data from this link.

Since we are using Python, we will rely on libraries and modules such as pandas, Seaborn, Matplotlib, scikit-learn, and NLTK. For coding I am using Jupyter Notebook, which is well suited to analysis and machine learning tasks.

So first we will import the required modules and libraries in Jupyter.


Now, if you have downloaded the dataset from the given link, load it with pandas and look at the first rows using the head() method.


The dataset has only two columns, review and rating. Since the rating column holds the labels, let's check which ratings occur using the pandas unique() method.
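The loading and inspection steps can be sketched roughly as follows. This is only a minimal sketch: the column names `review` and `rating` and the sample rows are assumptions for illustration, and an in-memory CSV stands in for the downloaded file.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV. The column names "review" and "rating"
# and these rows are assumptions, not taken from the real dataset.
csv_text = io.StringIO(
    "review,rating\n"
    "Great sound quality,5\n"
    "Stopped working after a week,1\n"
    "Decent for the price,3\n"
    "Loved it,5\n"
)

# With the real file, pass its path to pd.read_csv instead of csv_text.
data = pd.read_csv(csv_text)

# First rows of the table
print(data.head())

# Distinct rating values present in the data
unique_ratings = sorted(data["rating"].unique())
print(unique_ratings)
```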


So we can see that each customer has rated the product out of 5. Our goal is a model that predicts whether a review is positive or negative, so we will separate positive and negative reviews by adding another column named label.
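One possible way to build the label column is shown below. The cut-off is an assumption: here ratings of 4 and above count as positive (1) and everything else as negative (0), so adjust `POSITIVE_THRESHOLD` to match your own requirement. The column names and sample rows are also illustrative.

```python
import pandas as pd

# Small illustrative frame; column names and rows are assumptions.
data = pd.DataFrame({
    "review": ["Great sound quality", "Stopped working",
               "Decent for the price", "Loved it"],
    "rating": [5, 1, 3, 4],
})

# Assumed cut-off: ratings at or above this value are treated as positive.
POSITIVE_THRESHOLD = 4

# 1 = positive review, 0 = negative review
data["label"] = (data["rating"] >= POSITIVE_THRESHOLD).astype(int)
print(data)
```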


After adding labels to the data, we have to preprocess the text. In this step we remove stopwords from the reviews and apply cleaning, stemming, and so on, to get clean input for our model. So here is the function which does the preprocessing step.
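A simplified preprocessing function might look like this. It is a sketch under stated assumptions: the name `preprocess_text` is hypothetical, the stopword set is a tiny hand-written stand-in for the full NLTK list, and stemming is left out to keep the example free of extra dependencies.

```python
import re

# Tiny illustrative stopword set (an assumption). In practice
# nltk.corpus.stopwords.words("english") would supply the full list.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and",
             "to", "of", "for", "was", "i", "very"}


def preprocess_text(text):
    """Lowercase, replace punctuation and digits with spaces, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


cleaned = preprocess_text("This phone is VERY good, and the battery lasts 2 days!")
print(cleaned)  # phone good battery lasts days
```

To add stemming, each token could be passed through a stemmer such as NLTK's `PorterStemmer` before joining.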


Now let's preprocess the review column using the function defined above. Here we use the tqdm library to show the progress of preprocessing. To learn more about tqdm, you can visit its documentation.


So we can clearly see that positive (1) ratings outnumber negative (0) ratings.
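A quick way to check the class balance is to count the labels. The values below are made up for illustration only.

```python
import pandas as pd

# Made-up labels for illustration: 1 = positive, 0 = negative
labels = pd.Series([1, 1, 0, 1, 0, 1], name="label")

# Number of reviews per class
counts = labels.value_counts()
print(counts)
```

The same counts could also be drawn as a bar chart, for example with Seaborn's `countplot`.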

  
Now we have to convert the text to vectors using TF-IDF (Term Frequency-Inverse Document Frequency) and split the data into train and test sets.


About TF-IDF

TF-IDF (Term Frequency-Inverse Document Frequency) is a technique used in natural language processing and text analysis to represent the importance of words in a document relative to a collection of documents (corpus). It's typically used for document-level analysis, not individual sentences. However, you can certainly apply TF-IDF to a sentence within the context of a larger document or corpus.

Here's a general process to apply TF-IDF to a sentence within a larger text:

1. Preprocessing: Tokenize the text into words or terms. You may need to perform additional preprocessing steps like lowercasing, removing punctuation, and stemming or lemmatization.

2. TF-IDF Calculation:

   - Term Frequency (TF): Calculate the term frequency of each term in the sentence. Term frequency is the number of times a term (word) appears in the sentence.

   - Document Frequency (DF): Calculate the document frequency of each term within the larger document or corpus. Document frequency is the number of documents in the corpus that contain the term.

   - Inverse Document Frequency (IDF): Compute the IDF, which is the logarithm of the total number of documents divided by the document frequency. It measures how unique a term is across the corpus.

   - TF-IDF Score: Multiply the TF by the IDF for each term in the sentence. The result is the TF-IDF score for each term in the sentence.
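The steps above can be worked through by hand in a few lines of plain Python. This is only a sketch on a made-up three-document corpus, the function name `tf_idf` is hypothetical, and it assumes the sentence is itself one of the documents, so every term has a document frequency of at least 1.

```python
import math

# Made-up corpus for illustration; each string is one document.
corpus = [
    "good phone good battery",
    "bad battery",
    "good camera",
]
docs = [doc.split() for doc in corpus]


def tf_idf(sentence_tokens, docs):
    """Score each term of a sentence as raw count * log(N / df)."""
    n_docs = len(docs)
    scores = {}
    for term in set(sentence_tokens):
        tf = sentence_tokens.count(term)            # occurrences in the sentence
        df = sum(1 for d in docs if term in d)      # documents containing the term
        idf = math.log(n_docs / df)                 # rarer terms get a larger IDF
        scores[term] = tf * idf
    return scores


scores = tf_idf(docs[0], docs)
for term, score in sorted(scores.items()):
    print(term, round(score, 3))
```

Note that scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from this hand calculation.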


So here we created a vectorizer with a maximum of 500 features, so that we can also predict from short reviews.
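The vectorization and split might look roughly like this with scikit-learn. The toy reviews, the 25% test size, and the `random_state` are assumptions for illustration; only `max_features=500` comes from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy, already-cleaned reviews with labels (made up for illustration)
reviews = [
    "good phone great battery", "bad phone poor battery",
    "great camera good value", "poor camera bad value",
    "good sound great build", "bad sound poor build",
    "great screen good price", "poor screen bad price",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Keep at most the 500 most frequent terms as features
vectorizer = TfidfVectorizer(max_features=500)
X = vectorizer.fit_transform(reviews)

# Assumed split: 25% held out for testing, stratified by label
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=42
)
print(X.shape, X_train.shape, X_test.shape)
```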




Now we have to train our model. Here I am using a Random Forest classifier with a maximum depth of 30; if you want, you can use a decision tree instead. With Random Forest I got an accuracy of around 90-95 percent.
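A minimal training sketch with scikit-learn is given below. Only `max_depth=30` comes from the text above; the toy reviews, the split, and the `random_state` values are assumptions, and the accuracy on such a tiny made-up sample says nothing about the figure reported for the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy, already-cleaned reviews with labels (made up for illustration)
reviews = [
    "good phone great battery", "bad phone poor battery",
    "great camera good value", "poor camera bad value",
    "good sound great build", "bad sound poor build",
    "great screen good price", "poor screen bad price",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X = TfidfVectorizer(max_features=500).fit_transform(reviews)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=42
)

# Random Forest with trees limited to a depth of 30
model = RandomForestClassifier(max_depth=30, random_state=42)
model.fit(X_train, y_train)

# Fraction of held-out reviews classified correctly
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
```

To try a decision tree instead, `DecisionTreeClassifier` from `sklearn.tree` can be swapped in with the same `fit` and `predict` calls.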


Now let's test the model on a new random review:

Now that our model is ready, we will check that it works on random data. Here is how I tested the model.
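A test on a new review could be sketched like this. The helper name `predict_sentiment` and the toy training data are hypothetical. The important point is that the new review goes through the same fitted vectorizer (using `transform`, not `fit_transform`) before it reaches the model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy training data (made up for illustration)
train_reviews = [
    "good phone great battery", "bad phone poor battery",
    "great camera good value", "poor camera bad value",
    "good sound great build", "bad sound poor build",
]
train_labels = [1, 0, 1, 0, 1, 0]

vectorizer = TfidfVectorizer(max_features=500)
model = RandomForestClassifier(max_depth=30, random_state=42)
model.fit(vectorizer.fit_transform(train_reviews), train_labels)


def predict_sentiment(review):
    """Hypothetical helper: classify one review string with the fitted model."""
    # Reuse the vocabulary learned during training
    features = vectorizer.transform([review])
    return "positive" if model.predict(features)[0] == 1 else "negative"


result = predict_sentiment("great phone good battery")
print(result)
```

In a real run the new review should also pass through the same preprocessing function as the training reviews before it is vectorized.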




So now you can see that the model works on random customer input and predicts whether a review of a product is positive or not.

Please share this and help me to grow.












