Research Article

A Statistical Approach of Keyword Extraction for Efficient Retrieval

by  Shruti Luthra, Dinkar Arora, Kanika Mittal, Anusha Chhabra
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 168 - Issue 7
Published: Jun 2017
DOI: 10.5120/ijca2017914443

Shruti Luthra, Dinkar Arora, Kanika Mittal, Anusha Chhabra. A Statistical Approach of Keyword Extraction for Efficient Retrieval. International Journal of Computer Applications. 168, 7 (Jun 2017), 31-36. DOI=10.5120/ijca2017914443

                        @article{ 10.5120/ijca2017914443,
                        author  = { Shruti Luthra,Dinkar Arora,Kanika Mittal,Anusha Chhabra },
                        title   = { A Statistical Approach of Keyword Extraction for Efficient Retrieval },
                        journal = { International Journal of Computer Applications },
                        year    = { 2017 },
                        volume  = { 168 },
                        number  = { 7 },
                        pages   = { 31-36 },
                        doi     = { 10.5120/ijca2017914443 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2017
                        %A Shruti Luthra
                        %A Dinkar Arora
                        %A Kanika Mittal
                        %A Anusha Chhabra
                        %T A Statistical Approach of Keyword Extraction for Efficient Retrieval
                        %J International Journal of Computer Applications
                        %V 168
                        %N 7
                        %P 31-36
                        %R 10.5120/ijca2017914443
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

Many keyword extraction techniques have been proposed to better match documents with a user's query, but most of them rely on tf-idf to weight query terms over the entire document. This can give poor results: a term with a low term frequency in the overall document but a high frequency in one part of it may be ignored by the traditional tf-idf method. This paper improves keyword extraction with a hybrid technique in which the entire document is split into multiple domains using a master keyword, and the frequency of every unique word is computed in each domain. Words with high frequency are selected as candidate keywords, and the final selection is made on the basis of a graph constructed between the keywords using WordNet. Experiments conducted on various documents show that the proposed approach outperforms other keyword extraction methodologies and enhances document retrieval.
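
The domain-splitting and per-domain frequency steps described in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the master keyword divides the document, so the sketch assumes that a new domain starts at every sentence mentioning the master keyword. The function names, the tiny stop-word list, and the `top_n` parameter are all hypothetical, and the WordNet graph used for the final selection is left out.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to",
              "are", "by", "for", "on", "it", "with"}


def split_into_domains(text, master_keyword):
    """Split text into domains.

    Assumed rule (not taken from the paper): a new domain begins at
    every sentence that mentions the master keyword.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    domains, current = [], []
    for sentence in sentences:
        # Close the running domain when the master keyword reappears.
        if master_keyword.lower() in sentence.lower() and current:
            domains.append(current)
            current = []
        current.append(sentence)
    if current:
        domains.append(current)
    return [" ".join(d) for d in domains]


def candidate_keywords(text, master_keyword, top_n=3):
    """Return the top_n most frequent non-stop words of each domain."""
    result = []
    for domain in split_into_domains(text, master_keyword):
        words = [w for w in re.findall(r"[a-z]+", domain.lower())
                 if w not in STOP_WORDS and w != master_keyword.lower()]
        result.append([w for w, _ in Counter(words).most_common(top_n)])
    return result
```

Counting within each domain lets a term that is frequent in only one part of the document surface as a candidate, which is the case the abstract says whole-document tf-idf can miss.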

Index Terms
Computer Science
Information Sciences
Keywords

Information Retrieval, Domain Splitting, Natural Language Processing, Inverse Document Frequency, WordNet
