Research Article

Analyzing Different Web Crawling Methods

by Bhavin M. Jasani, C. K. Kumbharana
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 107 - Issue 5
Published: December 2014
Authors: Bhavin M. Jasani, C. K. Kumbharana
10.5120/18747-0000

Bhavin M. Jasani, C. K. Kumbharana. Analyzing Different Web Crawling Methods. International Journal of Computer Applications. 107, 5 (December 2014), 23-26. DOI=10.5120/18747-0000

@article{10.5120/18747-0000,
  author    = {Bhavin M. Jasani and C. K. Kumbharana},
  title     = {Analyzing Different Web Crawling Methods},
  journal   = {International Journal of Computer Applications},
  year      = {2014},
  volume    = {107},
  number    = {5},
  pages     = {23--26},
  doi       = {10.5120/18747-0000},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2014
%A Bhavin M. Jasani
%A C. K. Kumbharana
%T Analyzing Different Web Crawling Methods
%J International Journal of Computer Applications
%V 107
%N 5
%P 23-26
%R 10.5120/18747-0000
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The number of internet users is increasing day by day at an enormous rate. Maintaining resource discovery on the World Wide Web (WWW) is therefore a crucial task today. Many algorithms and architectures have been introduced to make WWW resource discovery effective.

Index Terms
Computer Science
Information Sciences
Keywords

WWW, Web Crawling, Web Tree, Web Spamming, Crawling Algorithms, Querying
