ORIGINAL ARTICLES
NIDIA RODRÍGUEZ MAZAHUA, LISBETH RODRÍGUEZ MAZAHUA, ASDRÚBAL LÓPEZ CHAU, GINER ALOR HERNÁNDEZ
Revista Perspectiva Empresarial, Vol. 7, No. 2-1, julio-diciembre de 2020, 31-43
ISSN 2389-8186, E-ISSN 2389-8194
Introduction
A Data Warehouse (DW) is a subject-oriented,
integrated, time-variant and non-volatile
collection of data in support of management’s
decision-making process. Data warehousing provides
architectures and tools for business executives to
systematically organize, understand and use the
data to make strategic decisions. Data warehousing
systems are valuable tools in today’s fast-changing
and competitive world. In recent years, many
companies have spent millions of dollars building
company-wide data warehouses. Many people feel
that with increasing competition across industries,
data warehousing is the latest indispensable
marketing tool and a way to retain customers
by learning more about their needs (Han,
Kamber and Pei, 2012).
Fragmentation, in turn, is a distributed
database design technique that consists of dividing
each database relation into smaller fragments and
treating each fragment as a separate object in the
database. There are three alternatives:
horizontal, vertical and hybrid fragmentation (Ozsu
and Valduriez, 2020).
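The three alternatives can be illustrated with a minimal Python sketch. The relation, its column names and the predicate on `region` are hypothetical and chosen only for illustration; they are not taken from this work. Horizontal fragmentation splits the tuples with a predicate, vertical fragmentation projects groups of attributes while repeating the key, and hybrid fragmentation applies one after the other.

```python
# Hypothetical relation: each dict is one tuple; "id" is the primary key.
relation = [
    {"id": 1, "region": "ASIA", "revenue": 120},
    {"id": 2, "region": "EUROPE", "revenue": 80},
    {"id": 3, "region": "ASIA", "revenue": 200},
]


def horizontal(rows, predicate):
    """Split tuples into those that satisfy the predicate and the rest."""
    matching = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matching, rest


def vertical(rows, key, attribute_groups):
    """Project each attribute group, repeating the key so that the
    fragments can be rejoined later."""
    return [
        [{a: r[a] for a in (key, *group)} for r in rows]
        for group in attribute_groups
    ]


# Horizontal: one fragment per value of the (assumed) predicate.
asia, others = horizontal(relation, lambda r: r["region"] == "ASIA")

# Vertical: one fragment per attribute group, each carrying the key.
by_columns = vertical(relation, "id", [("region",), ("revenue",)])

# Hybrid: fragment horizontally first, then vertically inside one fragment.
hybrid = vertical(asia, "id", [("region",), ("revenue",)])
```

Repeating the key in every vertical fragment is what allows the original relation to be reconstructed with a join, while the horizontal fragments are reconstructed with a union.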
One of the main problems faced by DW designers
is fragmentation. Several studies have proposed data
mining-based horizontal fragmentation methods,
which focus on optimizing query response time.
However, to the best of our knowledge, there is
no horizontal fragmentation technique that
uses a decision tree to carry out fragmentation.
Decision trees offer several advantages: their
construction does not require any domain knowledge
or parameter setting, they can handle
multidimensional data, their learning and
classification steps are simple and fast, and they
have good accuracy (Han, Kamber and Pei, 2012).
Given these advantages and the importance of
obtaining pure partitions (subsets of tuples) in a
data set using measures such as Information Gain,
Gain Ratio and the Gini Index, the aim of this work is
to use decision trees in DW fragmentation. This
paper presents an analysis of different decision
tree algorithms in order to select the best one for
implementing the fragmentation method. The analysis
was performed under version 3.9.4 of Weka,
considering four evaluation metrics (Precision, ROC
Area, Recall and F-measure) for different selected
data sets, using the Star Schema Benchmark (SSB).
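To make the notion of partition purity concrete, the following Python sketch computes the three measures named above for a candidate attribute. It follows the standard textbook definitions; the function names, the toy tuples and the `label` field are assumptions introduced for illustration, and the Gini computation uses a multiway split for brevity, whereas CART-style trees evaluate binary splits. It is not the procedure Weka applies internally.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Entropy of the value distribution, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of the value distribution."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def partition_labels(rows, attr):
    """Group the class labels of the tuples by the value of `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row["label"])
    return list(groups.values())


def info_gain(rows, attr):
    """Entropy before the split minus the weighted entropy after it."""
    labels = [r["label"] for r in rows]
    parts = partition_labels(rows, attr)
    expected = sum(len(p) / len(rows) * entropy(p) for p in parts)
    return entropy(labels) - expected


def gain_ratio(rows, attr):
    """Information gain normalized by the split information of `attr`."""
    split_info = entropy([r[attr] for r in rows])
    return info_gain(rows, attr) / split_info if split_info > 0 else 0.0


def gini_after_split(rows, attr):
    """Weighted Gini impurity of the partitions induced by `attr`."""
    parts = partition_labels(rows, attr)
    return sum(len(p) / len(rows) * gini(p) for p in parts)


# Hypothetical tuples: `region` separates the two classes perfectly.
rows = [
    {"region": "ASIA", "label": "yes"},
    {"region": "ASIA", "label": "yes"},
    {"region": "EUROPE", "label": "no"},
    {"region": "EUROPE", "label": "no"},
]
gain = info_gain(rows, "region")
ratio = gain_ratio(rows, "region")
impurity = gini_after_split(rows, "region")
```

An attribute that yields pure partitions, as `region` does in this toy data, maximizes the information gain and drives the weighted Gini impurity to zero.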
This paper is organized as follows: (i) the
introduction; (ii) a review of related works on
DW horizontal fragmentation; (iii) the methodology
used in this work for the analysis of decision tree
algorithms, together with a description of each
algorithm; and (iv) the preliminary results and
the future work.
Related Works
Costa et al. (2016) presented the concepts of Cloud
SDW (Spatial DW) and spatial OLAP (On-Line
Analytical Processing) as a Service. Those
concepts were later used to describe two different
hierarchy-based data partitioning techniques
for an SDW hosted in the cloud: Spatial-based
partitioning and Conventional-based partitioning.
and Ouzzif (2017), consisted of an incremental
horizontal fragmentation technique for the DW
through a web service. The goal was to automate
the implementation of incremental fragmentation
in order to optimize a new query load.
Barkhordari and Niamanesh (2018) proposed
a method called Chabok, which uses a two-phase
Map-Reduce to solve DW problems with
big data. Chabok horizontally fragments the fact
table. If there are homogeneous nodes, the same
number of records is allocated to each Fact-Mapper
node. As part of their ongoing work on
workload-driven partitioning, Boissier and Kurzynski (2018)
implemented an approach called aggressive data
skipping and extended it to handle both analytical
and transactional access patterns. The authors
evaluated their approach with the workload
and data of a production system of a Global 2000
company.
Likewise, Barr, Boukhalfa and Bouibede (2018)
used linear programming to solve the NP-hard
problem of determining a horizontal fragmentation
scheme in a relational DW. Also, Nam, Kim and