Research Article

Automated Multiple Related Documents Summarization via Jaccard's Coefficient

by Huda Yasin, Mohsin Mohammad Yasin, Farah Mohammad Yasin
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 13 - Issue 3
Published: January 2011
Authors: Huda Yasin, Mohsin Mohammad Yasin, Farah Mohammad Yasin
10.5120/1762-2415

Huda Yasin, Mohsin Mohammad Yasin, Farah Mohammad Yasin. Automated Multiple Related Documents Summarization via Jaccard's Coefficient. International Journal of Computer Applications. 13, 3 (January 2011), 12-15. DOI=10.5120/1762-2415

                        @article{ 10.5120/1762-2415,
                        author  = { Huda Yasin,Mohsin Mohammad Yasin,Farah Mohammad Yasin },
                        title   = { Automated Multiple Related Documents Summarization via Jaccard's Coefficient },
                        journal = { International Journal of Computer Applications },
                        year    = { 2011 },
                        volume  = { 13 },
                        number  = { 3 },
                        pages   = { 12-15 },
                        doi     = { 10.5120/1762-2415 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2011
                        %A Huda Yasin
                        %A Mohsin Mohammad Yasin
                        %A Farah Mohammad Yasin
                        %T Automated Multiple Related Documents Summarization via Jaccard's Coefficient
                        %J International Journal of Computer Applications
                        %V 13
                        %N 3
                        %P 12-15
                        %R 10.5120/1762-2415
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

In today's era of rapid technological advancement, distributing and gathering information is essential. Readers benefit from a condensed version of numerous lengthy text documents. In this paper, we present the approach used in our automated text summarization system, MDSS (Multiple Documents Summarization System). The approach is new and is based on statistical rather than semantic factors. Compared with single-document summarization, the issues of compression, speed, redundancy, and passage selection are more critical in multi-document summarization. For sentence comparison, Jaccard's coefficient is used to improve the quality of the summary. Our algorithms resemble dynamic time warping. Our experimental results indicate that using Jaccard's coefficient is useful and effective for improving the quality of multi-document summarization. MDSS is implemented in Java (JDK 1.6).
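To illustrate the sentence comparison mentioned above, the following is a minimal sketch of Jaccard's coefficient applied to two sentences: the number of distinct words they share divided by the number of distinct words in either. It is not the MDSS implementation. The class name `JaccardSketch`, the method names, and the tokenization (lowercasing and splitting on non-alphanumeric characters, with no stop-word removal) are assumptions made for this example, as is the convention of returning 0 when both sentences are empty.

```java
import java.util.HashSet;
import java.util.Set;

public class JaccardSketch {

    // Assumed tokenization: lowercase the sentence and split on runs of
    // characters that are neither letters nor digits.
    public static Set<String> tokens(String sentence) {
        Set<String> result = new HashSet<String>();
        for (String t : sentence.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard's coefficient: |A intersect B| / |A union B| over the word sets.
    public static double jaccard(String a, String b) {
        Set<String> wordsA = tokens(a);
        Set<String> wordsB = tokens(b);
        if (wordsA.isEmpty() && wordsB.isEmpty()) {
            return 0.0; // assumed convention for two empty sentences
        }
        Set<String> intersection = new HashSet<String>(wordsA);
        intersection.retainAll(wordsB);
        Set<String> union = new HashSet<String>(wordsA);
        union.addAll(wordsB);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String s1 = "The system summarizes multiple related documents";
        String s2 = "The system summarizes related news documents";
        // The sentences share 5 of 7 distinct words, so the score is 5/7.
        System.out.println(jaccard(s1, s2));
    }
}
```

A score near 1 suggests two sentences carry largely the same words, which is one way a multi-document summarizer could detect redundant sentences across related documents; the threshold for treating sentences as redundant is not given in the abstract.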

Index Terms
Computer Science
Information Sciences
Keywords

Multi-document summarization, Jaccard’s coefficient, sentence comparison, text mining
