
Customer product review sentiment analysis using python (Machine Learning Project)

This article examines user reviews and ratings on Flipkart. Reviews tell other shoppers about a buyer's experience and, importantly, give insight into product quality and brand reputation. Through this analysis, we aim to give users useful information about products and suggest ways to improve product quality.

Here we apply machine learning to the data to predict whether a product review is positive or negative.

Before going into the analysis and code, you can download the data from this link.

Since we are using Python, we will rely on libraries such as Pandas, Seaborn, Matplotlib, scikit-learn, and NLTK. For coding, I am using Jupyter Notebook, which is well suited to analysis and machine learning tasks.

First, we import the required modules and libraries in Jupyter.
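The original import cell isn't reproduced in the post. A plausible set of imports for the steps that follow, assuming the libraries listed above (the exact modules the author imported may differ), looks like this:

```python
# Standard library: regular expressions for text cleaning
import re

# Data handling
import pandas as pd

# Visualisation
import matplotlib.pyplot as plt
import seaborn as sns

# Text preprocessing (NLTK's Porter stemmer needs no extra data downloads)
from nltk.stem import PorterStemmer

# Feature extraction, splitting, modelling and evaluation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Progress bars for pandas .apply() calls
from tqdm import tqdm
```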


Once you have downloaded the dataset from the link above, load it with pandas and look at the first rows with the head() method.
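A minimal sketch of this step is below. The real file name isn't given in the post, so `flipkart_reviews.csv` is a hypothetical placeholder, and the column names `review` and `rating` are assumed from the description that follows. To keep the example runnable on its own, it reads a few made-up rows from an in-memory string instead of the downloaded file.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the downloaded CSV. With the real
# dataset you would pass its path instead, e.g.
# pd.read_csv("flipkart_reviews.csv")  (hypothetical file name)
csv_text = """review,rating
Great phone for the price,5
Battery drains very fast,2
Good product,4
Worst purchase ever,1
"""

data = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default
print(data.head())
```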


The dataset has only two columns, review and rating. Since rating is the label column, let's check its distinct values with the pandas unique() method.
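For illustration, here is how `unique()` behaves on a small hand-made frame (the rows are invented; on the real data you would call it on the loaded DataFrame, assuming the column is named `rating`):

```python
import pandas as pd

# Small illustrative frame; the real one comes from the downloaded dataset
data = pd.DataFrame({
    "review": ["Great phone", "Too slow", "Okay-ish", "Love it", "Broke in a week"],
    "rating": [5, 2, 3, 5, 1],
})

# unique() returns the distinct values in order of first appearance
print(data["rating"].unique())  # [5 2 3 1]
```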


We can see that customers rated products on a scale of 1 to 5. Because our model only needs to predict whether a review is positive or negative, we split the reviews into positive and negative ones and add a new column named label.
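The post doesn't show which ratings count as positive, so the threshold below is an assumption: ratings of 4 or 5 become `1` (positive) and everything else becomes `0` (negative). Adjust `POSITIVE_THRESHOLD` (a hypothetical name) to match your own definition.

```python
import pandas as pd

data = pd.DataFrame({
    "review": ["Great phone", "Too slow", "Okay-ish", "Love it", "Broke in a week"],
    "rating": [5, 2, 3, 4, 1],
})

# Assumption: ratings >= 4 are positive (1), all others negative (0)
POSITIVE_THRESHOLD = 4
data["label"] = (data["rating"] >= POSITIVE_THRESHOLD).astype(int)

print(data)
```

A vectorised comparison like this avoids looping over rows and works the same way on a DataFrame of any size.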


After adding labels, we preprocess the reviews: we clean the text, remove stopwords, apply stemming, and so on, so that the model is trained on clean input. We wrap these steps in a single preprocessing function.
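The author's preprocessing function isn't reproduced here. A minimal sketch of the same idea follows. It lowercases the text, keeps only letters, drops stopwords, and stems each word with NLTK's `PorterStemmer`. As an assumption to avoid a data download, it uses scikit-learn's built-in `ENGLISH_STOP_WORDS` list rather than NLTK's stopword corpus; the function name `preprocess_text` is hypothetical.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

stemmer = PorterStemmer()

def preprocess_text(text):
    """Lowercase, strip non-letters, drop stopwords and stem each token."""
    # Replace anything that is not a lowercase letter or whitespace with a space
    text = re.sub(r"[^a-z\s]", " ", str(text).lower())
    tokens = text.split()
    # Remove common English stopwords and reduce the remaining words to stems
    kept = [stemmer.stem(tok) for tok in tokens if tok not in ENGLISH_STOP_WORDS]
    return " ".join(kept)

print(preprocess_text("The products are great!"))  # product great
```

One thing to watch for in sentiment work: both scikit-learn's list and NLTK's English stopword list include negations such as "not", so "not good" collapses to "good". You may want to remove negations from the stopword set before filtering.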


Now let's apply this function to the review column. We use the tqdm library to display a progress bar while the reviews are preprocessed; see its documentation to learn more about it.
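tqdm integrates with pandas through `tqdm.pandas()`, which adds a `progress_apply()` method that behaves like `apply()` but shows a progress bar. In this sketch, `clean` is a hypothetical stand-in for the full preprocessing function so the example runs on its own.

```python
import re

import pandas as pd
from tqdm import tqdm

# Registers .progress_apply() on pandas Series and DataFrames
tqdm.pandas()

def clean(text):
    # Hypothetical stand-in for the preprocessing function:
    # lowercase and keep only letters and spaces
    return re.sub(r"[^a-z\s]", "", str(text).lower()).strip()

data = pd.DataFrame({"review": ["Great Phone!!", "Battery died :(", "Good value."]})

# Same as .apply(), but draws a progress bar while it runs
data["review"] = data["review"].progress_apply(clean)
print(data["review"].tolist())
```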


So we can clearly see that there are more positive (1) reviews than negative (0) ones.
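The chart behind this observation isn't shown. One way to check the class balance is `value_counts()`, sketched here on an invented `label` column:

```python
import pandas as pd

# Invented labels for illustration (1 = positive, 0 = negative)
data = pd.DataFrame({"label": [1, 1, 0, 1, 1, 0, 1]})

# Number of reviews in each class
counts = data["label"].value_counts()
print(counts)

# Share of each class, useful for spotting imbalance
print(data["label"].value_counts(normalize=True).round(2))
```

For a bar chart, `sns.countplot(x="label", data=data)` from Seaborn plots the same counts. When one class dominates, keep in mind that accuracy alone can look high even if the model mostly predicts the majority class.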

  
Next, we convert the text to vectors using TF-IDF (Term Frequency-Inverse Document Frequency) and split the data into training and test sets.


About TF-IDF

TF-IDF (Term Frequency-Inverse Document Frequency) is a technique used in natural language processing and text analysis to represent the importance of words in a document relative to a collection of documents (corpus). It's typically used for document-level analysis, not individual sentences. However, you can certainly apply TF-IDF to a sentence within the context of a larger document or corpus. In this project, each review is treated as one document, and the full set of reviews is the corpus.

Here's a general process to apply TF-IDF to a sentence within a larger text:

1. Preprocessing: Tokenize the text into words or terms. You may need to perform additional preprocessing steps like lowercasing, removing punctuation, and stemming or lemmatization.

2. TF-IDF Calculation:

   - Term Frequency (TF): Calculate the term frequency of each term in the sentence. Term frequency is the number of times a term (word) appears in the sentence.

   - Document Frequency (DF): Calculate the document frequency of each term within the larger document or corpus. Document frequency is the number of documents in the corpus that contain the term.

   - Inverse Document Frequency (IDF): Compute the IDF, which is the logarithm of the total number of documents divided by the document frequency. It measures how unique a term is across the corpus.

   - TF-IDF Score: Multiply the TF by the IDF for each term in the sentence. The result is the TF-IDF score for each term in the sentence.
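The steps above can be sketched directly in Python. This hand-rolled version follows the plain definitions given here (TF = raw count, IDF = log(N / DF)) and is only meant to make the arithmetic concrete; the function name `tf_idf` and the tiny corpus are illustrative.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: tf-idf} dict per document, using
    tf = raw count and idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return scores

corpus = ["good phone", "bad phone", "good battery good"]
for doc, s in zip(corpus, tf_idf(corpus)):
    print(doc, "->", {t: round(v, 3) for t, v in s.items()})
```

Here "bad" appears in only one of three reviews, so its IDF is log(3) ≈ 1.099, while "good" and "phone" each appear in two, giving log(1.5) ≈ 0.405. Note that scikit-learn's `TfidfVectorizer` uses a smoothed variant by default (idf = ln((1 + N) / (1 + DF)) + 1) and L2-normalises each row, so its numbers will differ from this textbook version.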


Here we limited the TF-IDF vectors to a maximum of 500 features so that the model can also make predictions on short reviews.
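A minimal sketch of the vectorisation and split with scikit-learn follows. `max_features=500` comes from the text; the 25% test size, `random_state`, and stratification are assumptions, since the post doesn't state the split the author used. The reviews are invented and already preprocessed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Invented, already-preprocessed reviews and their labels
reviews = ["great phone", "bad batteri", "good valu", "worst product",
           "love camera", "slow deliveri", "excel qualiti", "poor build"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Keep at most the 500 most frequent terms as features
vectorizer = TfidfVectorizer(max_features=500)
X = vectorizer.fit_transform(reviews)  # sparse matrix: n_reviews x n_terms

# Assumed split: 25% held out for testing, stratified so both classes appear
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42, stratify=labels
)
print(X.shape, X_train.shape, X_test.shape)
```

Fitting the vectorizer on all reviews before splitting is the simplest approach; a stricter alternative is to split the raw text first and fit the vectorizer on the training part only, so the test reviews don't influence the IDF weights.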




Now we train the model. I am using a Random Forest classifier with a maximum tree depth of 30; you can also use a decision tree if you prefer. With Random Forest, I got an accuracy of around 90-95 percent.
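A sketch of the training step with scikit-learn's `RandomForestClassifier`, using `max_depth=30` as described. The data, split, and `random_state` values are illustrative assumptions, and the accuracy printed on this toy data says nothing about the 90-95 percent reported on the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Invented, already-preprocessed reviews and labels
reviews = ["great phone", "bad batteri", "good valu", "worst product",
           "love camera", "slow deliveri", "excel qualiti", "poor build"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X = TfidfVectorizer(max_features=500).fit_transform(reviews)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42, stratify=labels
)

# Random Forest whose trees are limited to a depth of 30
model = RandomForestClassifier(max_depth=30, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
acc = accuracy_score(y_test, pred)
print("Accuracy:", acc)
```

To try a single decision tree instead, swap in `DecisionTreeClassifier(max_depth=30)` from `sklearn.tree`; the `fit` and `predict` calls stay the same.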


NOW LET'S TEST THE MODEL ON A NEW RANDOM REVIEW:

Now that the model is trained, let's check that it works on new, unseen text. Here is how I tested the model.
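The author's test code isn't shown. The key point when scoring a new review is to apply the same preprocessing and the *already fitted* vectorizer (`transform`, not `fit_transform`). This sketch swaps in scikit-learn's `make_pipeline` to bundle the vectorizer and classifier so that happens automatically; `clean` is a hypothetical stand-in for the preprocessing function, and the training data is invented.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

def clean(text):
    # Hypothetical minimal preprocessing; reuse your training-time function
    return re.sub(r"[^a-z\s]", " ", text.lower())

train_reviews = ["great phone", "bad battery", "good value", "worst product",
                 "love the camera", "very slow delivery", "excellent quality", "poor build"]
train_labels = [1, 0, 1, 0, 1, 0, 1, 0]

# The pipeline fits TF-IDF and the classifier together; at prediction time
# new text is only transformed with the vocabulary learned during training
model = make_pipeline(TfidfVectorizer(max_features=500),
                      RandomForestClassifier(max_depth=30, random_state=42))
model.fit([clean(r) for r in train_reviews], train_labels)

new_review = "The camera quality is excellent!"
prediction = model.predict([clean(new_review)])[0]
print("positive" if prediction == 1 else "negative")
```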




As you can see, the model handles new customer reviews and predicts whether a review is positive about a product or not.

Please share this and help me to grow.












