Research Article

Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network

by  Himanshu Suyal, R B Patel
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Issue 18
Published: May 2014
Authors: Himanshu Suyal, R B Patel
DOI: 10.5120/16463-6194

Himanshu Suyal, R B Patel. Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network. International Journal of Computer Applications. 94, 18 (May 2014), 42-46. DOI=10.5120/16463-6194

@article{ 10.5120/16463-6194,
  author    = { Himanshu Suyal, R B Patel },
  title     = { Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network },
  journal   = { International Journal of Computer Applications },
  year      = { 2014 },
  volume    = { 94 },
  number    = { 18 },
  pages     = { 42-46 },
  doi       = { 10.5120/16463-6194 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2014
%A Himanshu Suyal
%A R B Patel
%T Improved Information Filtering and Feature Dimensionality Reduction using Semantic based Feature Dataset for Text Classification: In Context to Social Network
%J International Journal of Computer Applications
%V 94
%N 18
%P 42-46
%R 10.5120/16463-6194
%I Foundation of Computer Science (FCS), NY, USA

Abstract

On micro-blogging services such as Twitter, users are often flooded with information and raw data that they cannot sort into the right categories. Automatic text classification offers a solution to this problem. Social networking websites often limit their users to short text messages of at most 140 characters. Classifying this continuous stream of raw data is therefore a tedious task, because short text messages lack semantic information and carry a high risk of being misclassified. This paper develops a methodology that prepares a semantic database and then uses it to extract the features needed for classification. The prepared database is used for binary feature extraction from a database of user tweets, and the paper presents this semantic-database-based extraction process. The paper focuses on extracting nine features and then reducing them to seven using logical operations. Reducing the features not only lowers the complexity of the written code but also saves the database memory needed to store the extracted features for the master training database. The extracted features are easier to use and less complex to generate than features produced by other available approaches such as Bag-of-Words.
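
The pipeline described in the abstract (semantic database, binary feature extraction, reduction from nine features to seven with logical operations) can be illustrated with a minimal Python sketch. The abstract does not list the nine features or say which ones are combined, so every category name, word list, and merge rule below is a hypothetical placeholder chosen only for illustration, and logical OR is assumed as the reducing operation.

```python
import re

# Hypothetical semantic database: each feature name maps to a small word set.
# These categories and words are placeholders, not the paper's actual features.
SEMANTIC_DB = {
    "news": {"breaking", "report", "headline"},
    "opinion": {"think", "believe", "imo"},
    "event": {"concert", "match", "tonight"},
    "deal": {"sale", "discount", "offer"},
    "question": {"why", "how", "anyone"},
    "slang": {"lol", "omg", "gonna"},
    "shortening": {"u", "r", "thx"},
    "time": {"today", "tomorrow", "now"},
    "place": {"at", "near", "downtown"},
}

# Assumed reduction rule: merge two pairs of features with logical OR (9 -> 7).
MERGES = {"informal": ("slang", "shortening"), "context": ("time", "place")}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_binary_features(text):
    """Return 1 for each feature whose word set overlaps the tweet, else 0."""
    tokens = set(tokenize(text))
    return {name: int(bool(tokens & words)) for name, words in SEMANTIC_DB.items()}


def reduce_features(features):
    """Replace each merged pair with a single OR-combined feature."""
    merged_sources = {src for pair in MERGES.values() for src in pair}
    reduced = {k: v for k, v in features.items() if k not in merged_sources}
    for new_name, (a, b) in MERGES.items():
        reduced[new_name] = features[a] | features[b]
    return reduced


tweet = "OMG huge discount at the downtown store today"
nine = extract_binary_features(tweet)
seven = reduce_features(nine)
print(len(nine), len(seven))
print(seven)
```

Under these assumptions each tweet is stored as a fixed-length vector of seven bits, whereas a Bag-of-Words vector grows with the vocabulary size. This is one way to read the abstract's claim about reduced storage and generation complexity.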

Index Terms
Computer Science
Information Sciences
Keywords

Text classification, short text, Twitter, semantic, Bag-of-Words
