Sentiment Analysis on Amazon's Customer Reviews

Sentiment Analysis, also referred to as opinion mining, is an approach to natural language processing (NLP) that identifies the emotional tone behind a body of text. This is a popular way for organizations to determine and categorize opinions about a product, service or idea. It involves the use of data mining, machine learning and artificial intelligence to mine text for sentiment and subjective information.

Types of Sentiment Analysis

Fine-grained sentiment analysis provides a more precise level of polarity by breaking it down into further categories, usually very positive to very negative. This can be considered the opinion equivalent of ratings on a 5-star scale.
Emotion detection identifies specific emotions rather than positivity and negativity. Examples could include happiness, frustration, shock, anger and sadness.
Intent-based analysis recognizes actions behind a text in addition to opinion. For example, an online comment expressing frustration about changing a battery could prompt customer service to reach out to resolve that specific issue.
Aspect-based analysis identifies the specific component being mentioned positively or negatively. For example, a customer might leave a review saying that a product's battery life was too short. The system will then report that the negative sentiment is not about the product as a whole, but about the battery life.

Application of Sentiment Analysis

Sentiment analysis tools can be used by organizations for a variety of applications, including:

Identifying brand awareness, reputation and popularity at a specific moment or over time.
Tracking consumer reception of new products or features.
Evaluating the success of marketing campaigns.
Pinpointing the target audience or demographics.
Collecting customer feedback from social media, websites or online forms.
Conducting market research.
Categorizing customer service requests.

Challenges with Sentiment Analysis

Challenges associated with sentiment analysis typically revolve around inaccuracies in training models. Objectivity, or comments with a neutral sentiment, tends to pose a problem for systems, and such comments are often misidentified.

Sentiment can be challenging to identify when systems cannot understand the context or tone. Answers to poll or survey questions such as "nothing" or "everything" are hard to categorize when the context is not given, as they could be labeled as positive or negative depending on the question. Similarly, irony and sarcasm often cannot be explicitly trained and lead to falsely labeled sentiments.

Computer programs also have trouble when encountering emojis and irrelevant information. Special attention needs to be given to training models with emojis and neutral data so as to not improperly flag texts.

Finally, people can be contradictory in their statements. Most reviews will have both positive and negative comments, which is somewhat manageable by analyzing sentences one at a time. However, the more informal the medium, the more likely people are to combine different opinions in the same sentences and the more difficult it will be for a computer to parse.

Steps of Sentiment Analysis

The steps of sentiment analysis are intricate, comprising many different machine learning and NLP tasks and subtasks.

Step 1: Data Collection

This is one of the most important steps in the sentiment analysis process. Everything from here on will be dependent on the quality of the data that has been gathered and how it has been annotated or labelled.

API Data - Data can be collected through live APIs for social media. A news API can help us glean information from all kinds of news publishers, while a Facebook API can allow us to take all the publicly available data we need from its platform. We can also use open source repositories like Kaggle, Amazon Reviews, or Yelp.

Manual - If we already have data from a CRM tool, we can manually upload it to the sentiment analysis API as a .csv file.

Step 2: Data Processing

The processing of the data will depend on the kind of information it contains: text, image, video or audio. The Repustate IQ sentiment analysis steps also handle video content analysis with the same ease as text analysis. Below are the sub-tasks.

Audio Transcription - The audio from the video data is transcribed through speech to text software to ensure that any video or audio file in the data is not overlooked.

Caption Overlay - If there are any captions appearing in the video, they are extracted by Repustate IQ and analyzed for any appearing entities, aspects or topics that we have identified as important.

Image Overlay - Similarly, the platform recognizes and captures any images in the video or text data through OCR (Optical Character Recognition).

Logo Recognition - Repustate IQ's intelligent data scanner recognizes any logos that appear in the video background. This includes logos that appear on the presenter's clothes or on an item such as a pen or a mug on the desk, as well as logos on background posters. This way, not even the smallest detail goes unnoticed when the platform conducts sentiment analysis of the brand.

Text Extraction - All the text is similarly recognized and extracted in the sentiment analysis process. This includes emojis and hashtags as well, which are a vital part of social media sentiment analysis. Unlike other sentiment analysis platforms, Repustate IQ ensures that emojis are never left out of the data processing because that could lead to false positives or negatives.

Step 3: Data Analysis

There are many subtasks that need to be done for this stage of the sentiment analysis process.

Training the Model - A dedicated sentiment analysis dataset needs to be pre-processed and manually labelled before it can be used to train the model. The model is trained on this labelled data by comparing the data it classifies correctly with the data it classifies incorrectly. This will help improve the custom model that is created for a brand.

Multilingual Data - When the sentiment analysis steps include multilingual data processing, Repustate IQ has the dataset for each language individually annotated and trained. The platform does not rely on translation at all, since information can get distorted or nuances lost due to the vast differences between languages such as Spanish and Korean. This is the reason why Repustate gives the highest accuracy score compared to other platforms.

Custom Tags - In this part of the process, custom tags for aspects and themes, such as brand mentions and product names, are created for the data. Once the model has been trained, it will automatically segregate text based on these custom-created tags.

Topic Classification - The topic classifier attaches a theme to a text. For example, this text "The dresses were awesome, and I found some really good scarves as well." will be tagged as the topic "clothes".

Sentiment Analysis - In this stage, the platform isolates each aspect and theme and analyzes it for sentiment. Sentiment scores are given in the range of -1 to +1, and a neutral statement is scored as zero. Assigning polarity is important because, even though the platform assigns different scores to different aspects such as convenience, speed, cleanliness, functionality, drinks and ambience, it is the aggregate score that indicates the sentiment of the audience towards the brand. So, if 3 of the 7 aspects receive a poor rating and 4 of them receive good ones, the sentiment analyzer will give the average of those scores as the overall sentiment for the brand.
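The aggregation described above can be sketched in a few lines. This is only an illustration: the scores below are made-up values, the "service" aspect is a hypothetical seventh aspect, and the unweighted mean and the neutral band are assumptions, not a description of how any particular platform computes its score.

```python
from statistics import mean

# Hypothetical per-aspect scores in [-1, +1]; 0 means neutral.
# Three aspects are rated poorly and four are rated well.
aspect_scores = {
    "convenience": 0.6,
    "speed": 0.4,
    "cleanliness": -0.5,
    "functionality": 0.7,
    "drinks": -0.3,
    "ambience": 0.5,
    "service": -0.2,  # hypothetical aspect added to reach seven
}

def overall_sentiment(scores):
    """Return the unweighted mean of the per-aspect scores."""
    return mean(list(scores.values()))

def polarity(score, neutral_band=0.05):
    """Map a score to a label; the neutral band is an assumed threshold."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"

overall = overall_sentiment(aspect_scores)
print(round(overall, 3), polarity(overall))
```

Because the result is an average, strongly positive aspects can mask weak ones, which is why the per-aspect scores remain worth inspecting.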

Step 4: Data Visualization

Once all the steps in the sentiment analysis process have been covered, the insights are turned into actionable reports in the form of graphs and charts. These reports can then be shared within teams as well. These visual reports are important because it is through them that we see granular, aspect-based results. For example, when we get an average score for our brand, we can filter the results in the sentiment analysis dashboard to see which aspects scored high and which scored low. This gives us an idea of which areas need more attention than others. Thus, at this stage, we get actionable insights that we can use to decide the right course of action for our growth plans.

F1 Score

In statistical analysis of binary classification, the F1-score is a measure of a test's accuracy. The F1-score combines the precision and recall of a classifier into a single metric by taking their harmonic mean. It is primarily used to compare the performance of two classifiers.

F1-score Formula

F1 = 2 * (precision * recall) / (precision + recall)
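As a small worked example, the harmonic mean of precision and recall can be computed directly from true and predicted labels. The labels below are invented for illustration, and the function name is an assumption rather than anything from the project's code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here precision is 2/3 and recall is 1/2, so F1 is 4/7, which sits closer to the lower of the two values than a plain average would.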

N Gram

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. N-grams are typically collected from a text or speech corpus.

Since single words (1-grams) are sometimes insufficient to understand the significance of certain words in a text, it is natural to consider blocks of words, i.e. n-grams.

The simplest version of the n-gram model, for n>1, is the bigram model, which looks at pairs of consecutive words.

For example, the sentence "The quick brown fox jumps over the lazy dog" would yield bigrams such as "The quick", "quick brown", "brown fox", ..., "lazy dog".
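The bigram example can be reproduced with a short sketch. It assumes simple whitespace tokenization, and the helper name ngrams is chosen here for illustration.

```python
def ngrams(text, n):
    """Return all contiguous word n-grams of the text as strings."""
    tokens = text.split()  # naive whitespace tokenization (an assumption)
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "The quick brown fox jumps over the lazy dog"
bigrams = ngrams(sentence, 2)
print(bigrams)
```

A sentence of 9 words yields 8 bigrams, 7 trigrams, and so on; asking for n larger than the number of words returns an empty list.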

Bag of Words Approach

The bag-of-words model is a simplifying representation used in natural language processing and information retrieval. In this model, a text is represented as the bag of its words, disregarding grammar and even word order but keeping multiplicity. The bag-of-words model has also been used for computer vision.

The bag-of-words model is commonly used in methods of document classification where the frequency of each word is used as a feature for training a classifier.
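To make the representation concrete, here is a minimal sketch that turns documents into word-count vectors using only the standard library. The two example sentences and the simple regex tokenizer are assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocab] for c in counts]
    return vocab, vectors

docs = ["The battery is great, great value", "The battery died"]
vocab, vectors = bag_of_words(docs)
print(vocab)
for vector in vectors:
    print(vector)
```

Word order is lost, but multiplicity is kept: "great" appears twice in the first document and so gets a count of 2. Each vector can then serve as the feature input to a classifier.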

We obtained a 95% F1 score with the bag-of-words approach.

TF IDF Approach

TF-IDF stands for "Term Frequency - Inverse Document Frequency". This is a technique to quantify words in a set of documents. We generally compute a score for each word to signify its importance in the document and corpus. This method is a widely used technique in Information Retrieval and Text Mining.

Term Frequency (TF)

This measures the frequency of a word in a document. It depends heavily on the length of the document and the generality of the word. For example, a very common word such as "was" can appear many times in a document. If we take two documents of 100 words and 10,000 words respectively, the common word "was" will very likely appear more often in the 10,000-word document, but we cannot say that the longer document is more important than the shorter one. For this reason, the count is normalized by the total number of words in the document.

TF Formula

tf(t, d) = (count of t in d) / (number of words in d)
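The effect of document length can be seen in a small sketch. It assumes TF is the count of the term divided by the number of words in the document; the two synthetic documents are constructed purely for illustration.

```python
def term_frequency(term, tokens):
    """Count of the term divided by the number of words in the document."""
    return tokens.count(term) / len(tokens) if tokens else 0.0

# Synthetic documents of 100 and 10,000 words.
short_doc = ["was"] * 5 + ["other"] * 95
long_doc = ["was"] * 200 + ["other"] * 9800

tf_short = term_frequency("was", short_doc)
tf_long = term_frequency("was", long_doc)
print(short_doc.count("was"), tf_short)
print(long_doc.count("was"), tf_long)
```

The raw count of "was" is far higher in the long document (200 versus 5), yet after normalization its term frequency is lower (0.02 versus 0.05), so length alone no longer inflates the score.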

Document Frequency (DF)

This measures how widely a term is spread across the whole corpus. It is very similar to Term Frequency; the only difference is that TF counts the occurrences of a term t in a document d, whereas Document Frequency counts the documents in the set of N documents that contain t. In other words, DF is the number of documents in which the word is present.

df(t) = number of documents, out of N, in which t occurs

Inverse Document Frequency (IDF)

IDF is the inverse of the document frequency and measures the informativeness of term t. IDF is very low for the most frequently occurring words, such as stop words.

IDF Formula

idf(t) = log(N / df(t))

Term Frequency - Inverse Document Frequency (TF-IDF)

By multiplying TF and IDF, we get the TF-IDF score. This is the basic version of TF-IDF.

TF-IDF Formula

tf-idf(t, d) = tf(t, d) * idf(t)
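Putting the pieces together, a basic TF-IDF score can be sketched as follows. The sketch assumes the unsmoothed variant idf(t) = log(N / df(t)) with the natural logarithm; library implementations often add smoothing terms, so their numbers will differ. The three-document corpus is invented for illustration.

```python
import math

def tf(term, tokens):
    """Term frequency: count of the term over the document length."""
    return tokens.count(term) / len(tokens) if tokens else 0.0

def df(term, corpus):
    """Document frequency: number of documents containing the term."""
    return sum(1 for tokens in corpus if term in tokens)

def idf(term, corpus):
    """Unsmoothed inverse document frequency (an assumed variant)."""
    n_containing = df(term, corpus)
    if n_containing == 0:
        return 0.0  # term never occurs; treat it as uninformative
    return math.log(len(corpus) / n_containing)

def tf_idf(term, tokens, corpus):
    return tf(term, tokens) * idf(term, corpus)

corpus = [
    "the battery was great".split(),
    "the screen was dim".split(),
    "the battery died fast".split(),
]

for word in ["the", "battery", "great"]:
    print(word, round(tf_idf(word, corpus[0], corpus), 4))
```

In the first document, "the" occurs in every document and scores 0, "battery" occurs in two of three documents and gets a small score, and "great" occurs only here and scores highest.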

We obtained a 94% F1 score with the TF-IDF approach.

Source Code

Sentiment Analysis on Amazon's Customer Reviews

Finalization

We cleaned up and improved an Amazon reviews dataset and built some classification models on these improvements to predict sentiments.

We saw that both the bag-of-words and TF-IDF approaches gave interpretable features.

By increasing the range of n-grams we used from 1-grams up to 4-grams, we were able to get our logistic regression model's accuracy up to 95%.

Although many types of pre-processing can be applied to textual data, not all of them have to be applied in every case.

Every NLP classification task is different, but the process to be followed is similar to what we did in this case: wrangle the data -> create features from text -> train ML models.
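The three stages can be illustrated end to end with a toy, standard-library-only sketch. This is not the project's code: the four reviews are invented, the logistic regression is a bare-bones stochastic gradient descent written out by hand, and the function names and hyperparameters (epochs, lr, max_n) are assumptions. A real project would more likely rely on an established machine learning library and a much larger dataset.

```python
import math
import re
from collections import Counter, defaultdict

def wrangle(text):
    """Stage 1: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def featurize(tokens, max_n=4):
    """Stage 2: count every n-gram from 1-grams up to max_n-grams."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=50, lr=0.5):
    """Stage 3: fit logistic regression with stochastic gradient descent."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for feats, label in examples:
            z = bias + sum(weights[f] * v for f, v in feats.items())
            err = label - sigmoid(z)  # gradient of the log-likelihood
            for f, v in feats.items():
                weights[f] += lr * err * v
            bias += lr * err
    return weights, bias

def predict(text, weights, bias):
    """Return 1 (positive) or 0 (negative) for a new piece of text."""
    feats = featurize(wrangle(text))
    z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if z > 0 else 0

# Invented toy reviews: 1 = positive, 0 = negative.
reviews = [
    ("great product, works perfectly", 1),
    ("love it, excellent quality", 1),
    ("terrible, broke after a day", 0),
    ("waste of money, very disappointed", 0),
]
examples = [(featurize(wrangle(text)), label) for text, label in reviews]
weights, bias = train(examples)

print(predict("excellent product", weights, bias))
print(predict("terrible waste of money", weights, bias))
```

Because the learned weights attach to individual n-grams, they can be inspected directly, which is the sense in which these features are interpretable.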

Conclusion

Sentiment analysis or opinion mining is a field of study that analyzes people's sentiments, attitudes, or emotions towards certain entities. This project tackles a fundamental problem of sentiment analysis, sentiment polarity categorization. Online product reviews from Amazon were selected as the data for this study. A sentiment polarity categorization process has been proposed along with detailed descriptions of each step. Experiments for both sentence-level categorization and review-level categorization have been performed.

UEMK Projects Batch CS 2019 - 2023