Research Article

Data Extraction and Annotation for Web Databases using Multiple Annotators Approach - A Review

by Yogesh W. Wanjari, Dipali B. Gaikwad, Vivek D. Mohod, Sachin N. Deshmukh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 88 - Number 18
Year of Publication: 2014
Authors: Yogesh W. Wanjari, Dipali B. Gaikwad, Vivek D. Mohod, Sachin N. Deshmukh
DOI: 10.5120/15454-3994

Yogesh W. Wanjari, Dipali B. Gaikwad, Vivek D. Mohod, Sachin N. Deshmukh. Data Extraction and Annotation for Web Databases using Multiple Annotators Approach - A Review. International Journal of Computer Applications. 88, 18 (February 2014), 23-28. DOI=10.5120/15454-3994

@article{ 10.5120/15454-3994,
author = { Yogesh W. Wanjari and Dipali B. Gaikwad and Vivek D. Mohod and Sachin N. Deshmukh },
title = { Data Extraction and Annotation for Web Databases using Multiple Annotators Approach - A Review },
journal = { International Journal of Computer Applications },
issue_date = { February 2014 },
volume = { 88 },
number = { 18 },
month = { February },
year = { 2014 },
issn = { 0975-8887 },
pages = { 23-28 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume88/number18/15454-3994/ },
doi = { 10.5120/15454-3994 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:07:59.047008+05:30
%A Yogesh W. Wanjari
%A Dipali B. Gaikwad
%A Vivek D. Mohod
%A Sachin N. Deshmukh
%T Data Extraction and Annotation for Web Databases using Multiple Annotators Approach - A Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 88
%N 18
%P 23-28
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The Web holds a huge amount of information, much of it stored in Web databases that users reach by submitting search queries. In response, a Web database dynamically returns multiple search result records to the browser, embedded in Deep Web pages delivered as HTML. Extracting the data from these pages by hand is time-consuming and requires human effort. Traditional search engines such as Google and Yahoo do not index the hidden Web pages generated from Web databases. The Deep Web refers to these hidden databases behind Web sites. Many existing techniques address the problem of efficiently extracting structured data from the Deep Web, but information extraction and annotation remain key challenges in Web mining. Retrieval should be automatic, and the results should be arranged systematically for further processing. Various methodologies, such as wrapper induction, have been introduced. Extracted information is labeled according to the concept it represents, and different types of annotators are used depending on the data to be annotated. This paper surveys automatic annotation approaches based on different features of text nodes and data units.

Index Terms

Computer Science
Information Sciences

Keywords

Data Extraction, Data Annotation, Annotators, Text Nodes, Data Units, Wrapper