Research Article

Keyword and Keyphrase Extraction Techniques: A Literature Review

by Sifatullah Siddiqi, Aditi Sharan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 109 - Issue 2
Published: January 2015
DOI: 10.5120/19161-0607

Sifatullah Siddiqi, Aditi Sharan. Keyword and Keyphrase Extraction Techniques: A Literature Review. International Journal of Computer Applications. 109, 2 (January 2015), 18-23. DOI=10.5120/19161-0607

@article{10.5120/19161-0607,
  author    = {Sifatullah Siddiqi and Aditi Sharan},
  title     = {Keyword and Keyphrase Extraction Techniques: A Literature Review},
  journal   = {International Journal of Computer Applications},
  year      = {2015},
  volume    = {109},
  number    = {2},
  pages     = {18--23},
  doi       = {10.5120/19161-0607},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2015
%A Sifatullah Siddiqi
%A Aditi Sharan
%T Keyword and Keyphrase Extraction Techniques: A Literature Review
%J International Journal of Computer Applications
%V 109
%N 2
%P 18-23
%R 10.5120/19161-0607
%I Foundation of Computer Science (FCS), NY, USA

Abstract

In this paper we present a survey of the techniques available in text mining for keyword and keyphrase extraction. Keywords and keyphrases help readers analyze large amounts of textual material quickly and search the internet efficiently, and they are useful for many other purposes as well. They are sets of representative words of a document that give interested readers a high-level specification of its content. They are widely used in Computer Science, especially in Information Retrieval and Natural Language Processing, and can be applied to index generation, query refinement, text summarization, author assistance, etc. We also discuss some important feature selection metrics that researchers generally employ to rank candidate keywords and keyphrases according to their importance.
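As a rough illustration of what ranking candidate keywords with a weighting measure can look like, the following minimal Python sketch scores the words of one document by TF-IDF against a small corpus. The abstract does not name a specific metric, so the choice of TF-IDF is an assumption made here for illustration only, and the function names (tokenize, rank_keywords) and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_keywords(documents, doc_index, top_k=5):
    """Rank the words of one document by TF-IDF against a small corpus.

    TF-IDF is an assumed example metric, not one prescribed by the paper.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each word.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    target = tokenized[doc_index]
    term_freq = Counter(target)

    scores = {}
    for word, count in term_freq.items():
        tf = count / len(target)                   # relative frequency in the document
        idf = math.log(n_docs / doc_freq[word])    # rarity across the corpus
        scores[word] = tf * idf

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical toy corpus; the first document is the one being indexed.
corpus = [
    "keyword extraction ranks candidate keywords in a document",
    "the weather in the city was sunny",
    "a document about the city council",
]
for word, score in rank_keywords(corpus, 0, top_k=3):
    print(f"{word}: {score:.3f}")
```

Words that occur in every document receive a score of zero under this weighting, which is why common function words tend to fall to the bottom of the ranking. A practical system would typically add stop-word removal, stemming, and multi-word candidate phrases on top of such a score.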

Index Terms
Computer Science
Information Sciences
Keywords

Keyword extraction, keyphrase extraction, survey, feature selection, weighting measures
