International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 11 - Issue 7
Published: December 2010
Authors: Shekhar Mishra, Anurag Jain, Dr. A.K. Sachan
Shekhar Mishra, Anurag Jain, Dr. A.K. Sachan. Article:Smart Approach to Reduce the Web Crawling Traffic of Existing System using HTML based Update File at Web Server. International Journal of Computer Applications. 11, 7 (December 2010), 34-38. DOI=10.5120/1593-2140
@article{ 10.5120/1593-2140, author = { Shekhar Mishra,Anurag Jain,Dr. A.K. Sachan }, title = { Article:Smart Approach to Reduce the Web Crawling Traffic of Existing System using HTML based Update File at Web Server }, journal = { International Journal of Computer Applications }, year = { 2010 }, volume = { 11 }, number = { 7 }, pages = { 34-38 }, doi = { 10.5120/1593-2140 }, publisher = { Foundation of Computer Science (FCS), NY, USA } }
%0 Journal Article %D 2010 %A Shekhar Mishra %A Anurag Jain %A Dr. A.K. Sachan %T Article:Smart Approach to Reduce the Web Crawling Traffic of Existing System using HTML based Update File at Web Server %J International Journal of Computer Applications %V 11 %N 7 %P 34-38 %R 10.5120/1593-2140 %I Foundation of Computer Science (FCS), NY, USA
A web crawler downloads information from the web. Because web pages change without notice, a crawler must frequently revisit websites to check for updates. An estimated 40% of current internet traffic is attributed to web crawling. In this paper we propose an UPDATE file, maintained at the web server, that lists the URLs of the pages of a website that have been updated. The file format is based on HTML. A crawler then visits only the UPDATE file and need not revisit the full website to learn what has changed. The scheme can be implemented on today’s systems with small modifications to the web application and the web crawler. We test the proposed method in a simulator, using a website of 13 pages for the experiment. The experimental results show that the scheme is very promising.
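The crawler side of this idea can be illustrated with a minimal Python sketch. The abstract does not specify the layout of the UPDATE file, so the sketch assumes, purely for illustration, that updated pages are listed as ordinary `<a href="...">` links in an HTML list. The names `UpdateFileParser`, `urls_to_recrawl`, `SAMPLE_UPDATE_FILE`, and the host `example.com` are hypothetical and do not come from the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class UpdateFileParser(HTMLParser):
    """Collect link targets from an HTML update file.

    Assumption: each updated page appears as an <a href="..."> link.
    The paper's actual file layout is not given in the abstract.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.updated_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the site's base URL.
                self.updated_urls.append(urljoin(self.base_url, value))


def urls_to_recrawl(update_file_html, base_url):
    """Return the updated URLs in file order, without duplicates."""
    parser = UpdateFileParser(base_url)
    parser.feed(update_file_html)
    parser.close()
    return list(dict.fromkeys(parser.updated_urls))


# Hypothetical update file content, invented for this example.
SAMPLE_UPDATE_FILE = """
<html><body>
<ul>
  <li><a href="/index.html">index.html</a></li>
  <li><a href="/news/today.html">today.html</a></li>
  <li><a href="/index.html">index.html</a></li>
</ul>
</body></html>
"""

if __name__ == "__main__":
    for url in urls_to_recrawl(SAMPLE_UPDATE_FILE, "http://example.com/"):
        print(url)
```

Under these assumptions the crawler fetches one small file per site and revisits only the pages it lists, rather than re-downloading every page to detect changes.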