A Statistical Approach of Keyword Extraction for Efficient Retrieval

Shruti Luthra; Dinkar Arora; Kanika Mittal; Anusha Chhabra

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 168 - Issue 7

Published: Jun 2017

10.5120/ijca2017914443

Shruti Luthra, Dinkar Arora, Kanika Mittal, Anusha Chhabra. A Statistical Approach of Keyword Extraction for Efficient Retrieval. International Journal of Computer Applications. 168, 7 (Jun 2017), 31-36. DOI=10.5120/ijca2017914443

@article{ 10.5120/ijca2017914443,
author  = { Shruti Luthra and Dinkar Arora and Kanika Mittal and Anusha Chhabra },
title   = { A Statistical Approach of Keyword Extraction for Efficient Retrieval },
journal = { International Journal of Computer Applications },
year    = { 2017 },
volume  = { 168 },
number  = { 7 },
pages   = { 31-36 },
doi     = { 10.5120/ijca2017914443 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2017
%A Shruti Luthra
%A Dinkar Arora
%A Kanika Mittal
%A Anusha Chhabra
%T A Statistical Approach of Keyword Extraction for Efficient Retrieval
%J International Journal of Computer Applications
%V 168
%N 7
%P 31-36
%R 10.5120/ijca2017914443
%I Foundation of Computer Science (FCS), NY, USA

Abstract

A large number of keyword extraction techniques have been proposed to better match documents with a user's query. Most of them rely on tf-idf to weight query terms over the entire document, which can give poor results: a term with a low frequency in the document as a whole but a high frequency in one part of it may be ignored by the traditional tf-idf method. This paper improves keyword extraction with a hybrid technique in which the entire document is split into multiple domains using a master keyword, and the frequency of every unique word is computed in each domain. Words with high frequency are selected as candidate keywords, and the final selection is made on the basis of a graph constructed between the keywords using WordNet. Experiments conducted on various documents show that the proposed approach outperforms other keyword extraction methodologies by enhancing document retrieval.
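
The domain-splitting and per-domain frequency steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the master keyword divides the document, how "high frequency" is thresholded, or which stop words are removed, so the splitting rule (a new domain starts at each occurrence of the master keyword), the `min_count` parameter, the `STOPWORDS` set, and all function names are assumptions made for illustration only. The final WordNet graph step is omitted.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "by", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_into_domains(tokens, master_keyword):
    """Assumed rule: each occurrence of the master keyword opens a new domain."""
    domains = [[]]
    for tok in tokens:
        if tok == master_keyword and domains[-1]:
            domains.append([])
        domains[-1].append(tok)
    return domains


def candidate_keywords(text, master_keyword, min_count=2):
    """Return words that occur at least min_count times within a single domain."""
    candidates = set()
    for domain in split_into_domains(tokenize(text), master_keyword):
        counts = Counter(t for t in domain if t != master_keyword)
        candidates.update(w for w, c in counts.items() if c >= min_count)
    return sorted(candidates)


if __name__ == "__main__":
    text = (
        "Search engines index pages. "
        "Search uses stemming and stemming helps. "
        "Search ranks pages by links."
    )
    # "stemming" and "pages" both occur twice overall, but only "stemming"
    # is concentrated in one domain, so only it becomes a candidate.
    print(candidate_keywords(text, "search"))
```

In this toy input, a whole-document count cannot separate "stemming" from "pages", whereas the per-domain count selects only the locally concentrated term, which is the situation the abstract argues plain tf-idf handles poorly.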

Index Terms

Computer Science

Information Sciences

Keywords

Information Retrieval, Domain Splitting, Natural Language Processing, Inverse Document Frequency, WordNet