Analyzing Different Web Crawling Methods

Bhavin M. Jasani; C. K. Kumbharana

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 107 - Issue 5

Published: December 2014

DOI: 10.5120/18747-0000

Bhavin M. Jasani, C. K. Kumbharana. Analyzing Different Web Crawling Methods. International Journal of Computer Applications. 107, 5 (December 2014), 23-26. DOI=10.5120/18747-0000

@article{10.5120/18747-0000,
  author    = {Bhavin M. Jasani and C. K. Kumbharana},
  title     = {Analyzing Different Web Crawling Methods},
  journal   = {International Journal of Computer Applications},
  year      = {2014},
  volume    = {107},
  number    = {5},
  pages     = {23-26},
  doi       = {10.5120/18747-0000},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2014
%A Bhavin M. Jasani
%A C. K. Kumbharana
%T Analyzing Different Web Crawling Methods
%J International Journal of Computer Applications
%V 107
%N 5
%P 23-26
%R 10.5120/18747-0000
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The number of internet users is increasing day by day at an enormous rate. Maintaining resource discovery on the World Wide Web (WWW) is therefore a crucial task today. Many algorithms and architectures have been introduced to make WWW resource discovery effective.

References

C. C. Aggarwal, F. Al-Garawi, and P. S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In WWW10, Hong Kong, May 2001.
B. Amento, L. Terveen, and W. Hill. Does "authority" mean quality? Predicting expert quality ratings of web documents. In Proc. 23rd Annual Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2000.
A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, and S. Raghavan. Searching the Web. ACM Transactions on Internet Technology, 1(1), 2001.
K. Bharat and M. R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998.
Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 1998.
S. Chakrabarti. Mining the Web. Morgan Kaufmann, 2003.
S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Automatic resource list compilation by analyzing hyperlink structure and associated text. In Proceedings of the 7th International World Wide Web Conference, 1998.
S. Chakrabarti, K. Punera, and M. Subramanyam. Accelerated focused crawling through online relevance feedback. In WWW2002, Hawaii, May 2002.
S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: A new approach to topic-specific Web resource discovery. Computer Networks, 1999.
J. Cho, H. Garcia-Molina, and L. Page. Efficient crawling through URL ordering. Computer Networks, 1998.
B. D. Davison. Topical locality in the web. In Proc. 23rd Annual Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2000.
P. M. E. De Bra and R. D. J. Post. Information retrieval in the World Wide Web: Making client-based searching feasible. In Proc. 1st International World Wide Web Conference, 1994.
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In Proc. 26th International Conference on Very Large Databases (VLDB 2000).
D. Eichmann. Ethical Web agents. In Second International World-Wide Web Conference, 1994.
M. Hersovici, M. Jacovi, Y. S. Maarek, D. Pelleg, M. Shtalhaim, and S. Ur. The shark-search algorithm. An application: Tailored Web site mapping. In WWW7, 1998.
J. Johnson, T. Tsioutsiouliklis, and C. L. Giles. Evolving strategies for focused web crawling. In Proc. 12th Intl. Conf. on Machine Learning (ICML-2003), Washington DC, 2003.
J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 1999.
V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings, 1994.
H. Lieberman, F. Christopher, and L. Weitzman. Exploring the Web with Reconnaissance Agents. Communications of the ACM, August 2001.
A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore. Automating the construction of internet portals with machine learning. Information Retrieval, 2000.
F. Menczer and R. K. Belew. Adaptive retrieval agents: Internalizing local context and scaling up to the Web. Machine Learning, 2000.
F. Menczer, G. Pant, M. Ruiz, and P. Srinivasan. Evaluating topic-driven Web crawlers. In Proc. 24th Annual Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2001.
F. Menczer, G. Pant, and P. Srinivasan. Topical web crawlers: Evaluating adaptive algorithms. To appear in ACM Trans. on Internet Technologies, 2003. http://dollar.biz.uiowa.edu/~fil/Papers/TOIT.pdf
G. Pant. Deriving Link-context from HTML Tag Tree. In 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2003.
M. Porter. An algorithm for suffix stripping. Program, 1980.
G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
Steven S. Skiena. The Algorithm Design Manual.
Ben Coppin. Artificial Intelligence Illuminated.
[Berners-Lee 1992]: Berners-Lee, T., Cailliau, R., Groff, J. F. and Pollermann, B. World-Wide Web: the information universe. Electronic Networking: Research, Applications and Policy.
[Bush 1945]: Bush, V. As We May Think. Atlantic Monthly, 1945.
[Coombs 1990]: Coombs, J. H. Hypertext, Full Text, and Automatic Linking. In SIGIR, (Brussels, 1990).
[DeRose 1999]: DeRose, S. J. and van Dam, A. Document structure and markup in the FRESS hypertext system. Markup Languages: Theory & Practice.
[Frisse 1988]: Frisse, M. E. Searching for information in a hypertext medical handbook. Communications of the ACM.
[Nelson 1981]: Nelson, T. Literary Machines. Mindful Press, Sausalito, 1981.
[Nelson 1988]: Nelson, T. H. Unifying tomorrow's hypermedia. In Online Information. 12th International Online Information Meeting Learned Info, Oxford, UK, 1988.
[van Dam 1969]: van Dam, A., Carmody, S., Gross, T., Nelson, T., and Rice, D. A Hypertext Editing System for the 360. In Conference in Computer Graphics, (1969), University of Illinois.
[van Dam 1988]: van Dam, A. Hypertext '87 Keynote Address. Communications of the ACM.
Gautam Pant, Padmini Srinivasan, and Filippo Menczer. Crawling the Web. Department of Management Sciences, School of Library and Information Science, The University of Iowa, Iowa City IA 52242, USA.
Brian Pinkerton. WebCrawler: Finding What People Want.
Carlos Castillo. Effective Web Crawling.
www.wikipedia.com

Index Terms

Computer Science

Information Sciences


Keywords

WWW, Web Crawling, Web Tree, Web Spamming, Crawling Algorithms, Querying