An experimental approach to detect similar web pages based on 3-levels of similarity clues

Woosung Jung, Eunjoo Lee, Chisu Wu

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Web applications are hard to maintain because they change rapidly and rely on a proliferating variety of techniques. Several approaches, such as clustering or refactoring web applications, have been suggested to improve their maintainability, and a similarity measure is one of the principal criteria in these approaches. Existing studies on web similarity have focused on semantic or contextual similarity, while most existing clone detection techniques target general applications rather than web applications. This paper proposes WSIM, a measure of similarity in web applications based on the degree to which similarity clues are used and on two linking directions. The similarity clues include page relations, source and target entities, and parameters. WSIM is classified into three levels and two directions; six kinds of WSIM are defined, each with its own purpose. Finally, several experiments were conducted on simulated data and real open-source projects to validate the proposed WSIM.

Original language: English
Pages (from-to): 1787-1822
Number of pages: 36
Journal: Journal of Information Science and Engineering
Volume: 27
Issue number: 6
State: Published - Nov 2011

Keywords

  • Clues
  • Maintainability
  • Page clone
  • Similarity
  • Web application
