Generating high dimensional data and query sets

Sang Wook Kim, Seok Ho Yoon, Sang Cheol Lee, Junghoon Lee, Miyoung Shin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Previous research on multidimensional indexes has typically used synthetic data sets distributed uniformly or normally over multidimensional space for performance evaluation. Such data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets to resolve this problem. We first identify the requirements that data and query sets must meet for fair performance evaluation of multidimensional indexes, and then propose HDDQ-Gen (High-Dimensional Data and Query Generator), which satisfies these requirements. HDDQ-Gen has the following features: (1) clustered distribution, (2) various object distributions within each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distribution depending on data distribution. Using these features, users can control the distribution characteristics of data and query sets to suit their target applications.
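To make features (1), (2), and (5) concrete, the following is a minimal sketch, not HDDQ-Gen itself. The function names, parameters (`spread`, `within`, `jitter`), and the choice of the unit hypercube as the data space are all assumptions made for illustration; the abstract does not specify the generator's interface or algorithms. Correlation among dimensions (feature 4) and non-uniform placement of cluster centers (feature 3) are left out for brevity.

```python
import random


def _clamp(value):
    """Keep a coordinate inside the assumed [0, 1] data space."""
    return min(1.0, max(0.0, value))


def generate_clustered_points(n_points, n_clusters, dim,
                              spread=0.05, within="normal", seed=0):
    """Hypothetical generator: points grouped around random cluster centers.

    within="normal"  -> Gaussian scatter around the center in each dimension
    within="uniform" -> uniform scatter in a box of half-width `spread`
    """
    rng = random.Random(seed)
    # Assumption: cluster centers are placed uniformly at random.
    centers = [[rng.random() for _ in range(dim)] for _ in range(n_clusters)]
    points = []
    for _ in range(n_points):
        center = rng.choice(centers)
        if within == "normal":
            point = [rng.gauss(c, spread) for c in center]
        else:
            point = [c + rng.uniform(-spread, spread) for c in center]
        points.append([_clamp(v) for v in point])
    return centers, points


def generate_queries(points, n_queries, jitter=0.01, seed=1):
    """Hypothetical query generator: queries follow the data distribution.

    Each query point is a randomly chosen data point plus small Gaussian
    noise, so dense regions of the data receive more queries.
    """
    rng = random.Random(seed)
    queries = []
    for _ in range(n_queries):
        base = rng.choice(points)
        queries.append([_clamp(v + rng.gauss(0.0, jitter)) for v in base])
    return queries


if __name__ == "__main__":
    centers, data = generate_clustered_points(1000, n_clusters=5, dim=16)
    queries = generate_queries(data, n_queries=20)
    print(len(centers), len(data), len(queries))
```

Drawing queries from perturbed data points is one simple way to tie the query distribution to the data distribution; a uniformly distributed query set would instead fall mostly in empty regions of a clustered high-dimensional space.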

Original language: English
Title of host publication: SOFSEM 2007
Subtitle of host publication: Theory and Practice of Computer Science - 33rd Conference on Current Trends in Theory and Practice of Computer Science, Proceedings
Publisher: Springer Verlag
Pages: 357-366
Number of pages: 10
ISBN (Print): 9783540695066
State: Published - 2007
Event: 33rd Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2007 - Harrachov, Czech Republic
Duration: 20 Jan 2007 to 26 Jan 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4362 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 33rd Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2007
Country/Territory: Czech Republic
City: Harrachov
Period: 20/01/07 to 26/01/07
