International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 162 - Issue 12
Published: Mar 2017
Authors: Subarno Pal, Soumadip Ghosh
Subarno Pal, Soumadip Ghosh. Sentiment Analysis using Averaged Histogram. International Journal of Computer Applications. 162, 12 (Mar 2017), 22-26. DOI=10.5120/ijca2017913421
@article{ 10.5120/ijca2017913421, author = { Subarno Pal and Soumadip Ghosh }, title = { Sentiment Analysis using Averaged Histogram }, journal = { International Journal of Computer Applications }, year = { 2017 }, volume = { 162 }, number = { 12 }, pages = { 22-26 }, doi = { 10.5120/ijca2017913421 }, publisher = { Foundation of Computer Science (FCS), NY, USA } }
%0 Journal Article %D 2017 %A Subarno Pal %A Soumadip Ghosh %T Sentiment Analysis using Averaged Histogram %J International Journal of Computer Applications %V 162 %N 12 %P 22-26 %R 10.5120/ijca2017913421 %I Foundation of Computer Science (FCS), NY, USA
Sentiment analysis, or opinion mining, is the process of identifying and categorizing the sentiment expressed in a given text. The need for automatic sentiment retrieval from text is high, as the volume of reviews available on the Internet is enormous. Reviews on e-commerce websites, social networks, and movie review websites appear in large numbers every day, and reviews of popular products help in determining public opinion towards those products. An averaged histogram model is proposed here that treats text classification as a continuous-variable problem. After data cleaning and feature extraction from the reviews, an averaged histogram is constructed for every class, containing a generalized feature representation of that class. The histogram of each test element is then matched against the averaged histogram of every class using the k-Nearest Neighbor and Naïve Bayesian classifiers. On 3000 reviews, the Naïve Bayesian classifier showed a steady classification accuracy of 79-80% at very little computational cost, and with a larger training dataset the k-Nearest Neighbor classifier can reach an accuracy of up to 85%. The work proposed here is language independent: it neither includes any dictionary nor depends on the meaning of any word.
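A minimal sketch of the averaged-histogram idea described in the abstract is given below. The exact feature extraction and distance measure used by the authors are not specified in this abstract, so the normalized term-frequency histogram, the Euclidean nearest-class rule, and the toy data are assumptions made purely for illustration.

```python
# Sketch of classification with per-class averaged histograms (assumptions noted above).
from collections import Counter
import numpy as np

def histogram(tokens, vocab):
    """Normalized term-frequency histogram over a fixed vocabulary (assumed feature set)."""
    counts = Counter(tokens)
    vec = np.array([counts[w] for w in vocab], dtype=float)
    total = vec.sum()
    return vec / total if total > 0 else vec

def averaged_histograms(train_docs, train_labels, vocab):
    """Build one averaged histogram per sentiment class (e.g., positive / negative)."""
    classes = sorted(set(train_labels))
    return {
        c: np.mean(
            [histogram(d, vocab) for d, lbl in zip(train_docs, train_labels) if lbl == c],
            axis=0,
        )
        for c in classes
    }

def classify_nearest(test_doc, class_hists, vocab):
    """Assign the class whose averaged histogram is closest (Euclidean distance assumed)."""
    h = histogram(test_doc, vocab)
    return min(class_hists, key=lambda c: np.linalg.norm(h - class_hists[c]))

# Hypothetical toy usage: two tokenized reviews per class after data cleaning.
train = [["good", "great", "fun"], ["bad", "boring", "bad"]]
labels = ["pos", "neg"]
vocab = sorted({w for doc in train for w in doc})
hists = averaged_histograms(train, labels, vocab)
print(classify_nearest(["great", "fun", "movie"], hists, vocab))  # words outside vocab are ignored
```

Note that comparing each test histogram only against one averaged histogram per class, rather than against every training example, is what keeps the computational cost low, which is consistent with the abstract's claim about the Naïve Bayesian variant being cheap to compute.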