Automated Multiple Related Documents Summarization via Jaccard's Coefficient

Huda Yasin; Mohsin Mohammad Yasin; Farah Mohammad Yasin

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 13 - Issue 3

Published: January 2011

10.5120/1762-2415

Huda Yasin, Mohsin Mohammad Yasin, Farah Mohammad Yasin. Automated Multiple Related Documents Summarization via Jaccard's Coefficient. International Journal of Computer Applications. 13, 3 (January 2011), 12-15. DOI=10.5120/1762-2415

                        @article{ 10.5120/1762-2415,
                        author  = { Huda Yasin and Mohsin Mohammad Yasin and Farah Mohammad Yasin },
                        title   = { Automated Multiple Related Documents Summarization via Jaccard's Coefficient },
                        journal = { International Journal of Computer Applications },
                        year    = { 2011 },
                        volume  = { 13 },
                        number  = { 3 },
                        pages   = { 12-15 },
                        doi     = { 10.5120/1762-2415 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }

                        %0 Journal Article
                        %D 2011
                        %A Huda Yasin
                        %A Mohsin Mohammad Yasin
                        %A Farah Mohammad Yasin
                        %T Automated Multiple Related Documents Summarization via Jaccard's Coefficient
                        %J International Journal of Computer Applications
                        %V 13
                        %N 3
                        %P 12-15
                        %R 10.5120/1762-2415
                        %I Foundation of Computer Science (FCS), NY, USA

Abstract

In an era of rapid technological advancement, sharing and gathering information is essential. Readers are drawn to condensed versions of numerous lengthy text documents. In this paper, we present the approach used in our automated text summarization system, MDSS (Multiple Documents Summarization System). We describe a new approach based on statistical (rather than semantic) factors. In contrast to single-document summarization, the issues of compression, speed, redundancy, and passage selection are more critical in multi-document summarization. For sentence comparison, Jaccard's coefficient is used to improve the quality of the summary. Our algorithms bear a resemblance to dynamic time warping. Our experimental results indicate that Jaccard's coefficient is a useful and effective way to improve the quality of multi-document summarization. MDSS is implemented in Java (JDK 1.6).
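
As an illustration of the sentence comparison mentioned in the abstract, the following is a minimal sketch of Jaccard's coefficient, J(A, B) = |A ∩ B| / |A ∪ B|, applied to the word sets of two sentences. It is not the MDSS implementation, which the abstract does not describe in detail. The class name `JaccardSketch`, the lowercase alphanumeric tokenization, the lack of stop-word removal, and the convention of returning 0 for two empty sentences are all assumptions made for this example.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class; the name and tokenization rules are assumptions.
final class JaccardSketch {

    // Split a sentence into a set of lowercase alphanumeric word tokens.
    static Set<String> tokens(String sentence) {
        Set<String> words = new HashSet<String>();
        for (String w : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Jaccard's coefficient: size of the intersection divided by size of the union.
    static double jaccard(String s1, String s2) {
        Set<String> a = tokens(s1);
        Set<String> b = tokens(s2);
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // assumed convention for two empty sentences
        }
        Set<String> intersection = new HashSet<String>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<String>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // The two sentences share 3 words (the, cat, on) out of 7 distinct words.
        System.out.println(jaccard("The cat sat on the mat", "The cat lay on the rug"));
    }
}
```

A score near 1 suggests two sentences carry largely the same words, so a summarizer could treat one of them as redundant. How MDSS sets such a threshold is not stated in the abstract.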

Index Terms

Computer Science

Information Sciences

Keywords

Multi-document summarization, Jaccard's coefficient, sentence comparison, text mining