ORIGINAL ARTICLES
ISSN 2389-8186
E-ISSN 2389-8194
Vol. 7, No. 2-1
July-December 2020
doi: https://doi.org/10.16967/23898186.667
pp. 31-43
rpe.ceipa.edu.co
* The authors are very grateful to Tecnológico Nacional de México for supporting this work. This research was also sponsored by CONACYT.
** Master in Administrative Engineering. Tecnológico Nacional de México, Veracruz, México. E-mail: dci.nrodriguez@ito-depi.edu.mx. ORCID: 0000-0002-8227-9476. Google Scholar: https://scholar.google.es/citations?hl=es&user=27GDu_gAAAAJ.
*** PhD in Computer Science. Tecnológico Nacional de México, Veracruz, México. E-mail: lrodriguezm@ito-depi.edu.mx. ORCID: 0000-0002-9861-3993. Google Scholar: https://scholar.google.es/citations?user=2hZw4HAAAAAJ&hl=es.
**** PhD in Computer Science. Centro Universitario UAEM Zumpango, Zumpango, México. E-mail: alchau@uaemex.mx. ORCID: 0000-0001-5254-0939. Google Scholar: https://scholar.google.es/citations?user=FqdMAaQAAAAJ&hl=es&oi=sra.
***** PhD in Science, specializing in Electrical Engineering. Tecnológico Nacional de México, Veracruz, México. E-mail: galorh@orizaba.tecnm.mx. ORCID: 0000-0003-3296-0981. Google Scholar: https://scholar.google.es/citations?hl=es&user=8Z-gf4KwAAAAJ.
Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation*
NIDIA RODRÍGUEZ MAZAHUA**
LISBETH RODRÍGUEZ MAZAHUA***
ASDRÚBAL LÓPEZ CHAU****
GINER ALOR HERNÁNDEZ*****
How to cite this article:
Rodríguez, N. et al. (2020). Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation. Revista Perspectiva Empresarial, 7(2-1), 31-43.
Received: August 20, 2020
Accepted: December 7, 2020
ABSTRACT
One of the main problems faced by Data Warehouse designers is fragmentation. Several studies have proposed data mining-based horizontal fragmentation methods. However, there is no horizontal fragmentation technique that uses a decision tree. This paper presents the analysis of different decision tree algorithms to select the best one to implement the fragmentation method. The analysis was performed with version 3.9.4 of Weka, considering four evaluation metrics (Precision, ROC Area, Recall and F-measure) for different data sets built using the Star Schema Benchmark. The results showed that the two best algorithms were J48 and Random Forest in most cases; nevertheless, J48 was selected because it is more efficient in building the model.
KEY WORDS
Data analysis, computer systems, databases, artificial intelligence, decision making.
33
ARTÍCULOS
NIDIA RODRÍGUEZ MAZAHUA, LISBETH RODRÍGUEZ MAZAHUA, ASDRÚBAL LÓPEZ CHAU, GINER ALOR HERNÁNDEZ
Revista Perspectiva Empresarial, Vol. 7, No. 2-1, julio-diciembre de 2020, 31-43
ISSN 2389-8186, E-ISSN 2389-8194
Introduction
A Data Warehouse (DW) is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decision-making process. Data warehousing provides architectures and tools for business executives to systematically organize, understand and use their data to make strategic decisions. Data warehousing systems are valuable tools in today's fast-changing and competitive world. In recent years, many companies have spent millions of dollars building enterprise-wide data warehouses. Many people feel that, with increasing competition across industries, data warehousing is the latest indispensable marketing strategy and a way to retain customers by learning more about their needs (Han, Kamber and Pei, 2012).
On the other hand, fragmentation is a distributed database design technique that consists of dividing each database relation into smaller fragments and treating each fragment as a separate object in the database. There are three alternatives: horizontal fragmentation, which divides a relation into subsets of its tuples; vertical fragmentation, which divides it into subsets of its attributes; and hybrid fragmentation, which combines both (Ozsu and Valduriez, 2020).
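To make the distinction concrete, the following minimal Python sketch fragments a small, hypothetical SSB-like relation horizontally (by predicates on rows) and vertically (by groups of columns). The table, column names and predicates are illustrative assumptions; hybrid fragmentation would simply apply one scheme to the output of the other.

```python
# Minimal sketch of horizontal vs. vertical fragmentation on an in-memory relation.
# The relation, its columns and the predicates are hypothetical (SSB-like) examples.

def horizontal_fragments(rows, predicates):
    """Fragment i receives the rows that satisfy predicates[i] (first match wins);
    rows matching no predicate go to a last 'remainder' fragment, so every row
    belongs to exactly one fragment."""
    fragments = [[] for _ in range(len(predicates) + 1)]
    for row in rows:
        for i, predicate in enumerate(predicates):
            if predicate(row):
                fragments[i].append(row)
                break
        else:  # no predicate matched
            fragments[-1].append(row)
    return fragments


def vertical_fragments(rows, key, column_groups):
    """Project every row onto each column group, keeping the key in every fragment
    so that the original relation can be rebuilt with joins on that key."""
    return [[{key: row[key], **{col: row[col] for col in group}} for row in rows]
            for group in column_groups]


lineorder = [
    {"orderkey": 1, "region": "ASIA", "revenue": 100},
    {"orderkey": 2, "region": "EUROPE", "revenue": 250},
    {"orderkey": 3, "region": "AMERICA", "revenue": 75},
]
h = horizontal_fragments(lineorder, [lambda r: r["region"] == "ASIA",
                                     lambda r: r["region"] == "EUROPE"])
v = vertical_fragments(lineorder, "orderkey", [["region"], ["revenue"]])
print([len(f) for f in h])  # [1, 1, 1]
print(v[1][0])              # {'orderkey': 1, 'revenue': 100}
```

The "first match wins" rule and the remainder fragment are design choices of this sketch that keep the horizontal fragments disjoint and complete.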
One of the main problems faced by DW designers
is fragmentation. Several studies have proposed data
mining-based horizontal fragmentation methods,
which focus on optimizing query response time

However, to the best of our knowledge, there is no horizontal fragmentation technique that uses a decision tree to carry out the fragmentation.
      
Decision tree classifiers are popular because their construction does not require any domain knowledge or parameter setting, they can handle multidimensional data, the learning and classification steps of decision tree induction are simple and fast, and they have good accuracy (Han, Kamber and Pei, 2012). Given these advantages, and the importance of decision trees for obtaining pure partitions (subsets of tuples) of a data set using measures such as Information Gain, Gain Ratio and the Gini Index, the aim of this work is to use decision trees for DW fragmentation. This paper presents an analysis of different decision tree algorithms, performed with version 3.9.4 of Weka, to select the best one for implementing the fragmentation method. The analysis considers four evaluation metrics (Precision, ROC Area, Recall and F-measure) for different data sets built using the Star Schema Benchmark (SSB).
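The purity measures mentioned above can be sketched as follows, using the definitions given in Han, Kamber and Pei (2012). The toy data are hypothetical: each instance stands for a query, its class is the fragment it should be routed to, and the attributes record whether the query uses a given predicate.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Info(D) = -sum_i p_i log2 p_i over the class distribution of D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def partitions(values, labels):
    """Group class labels by the value the attribute takes (one partition per value)."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def info_gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D), as used by ID3."""
    n = len(labels)
    info_a = sum(len(p) / n * entropy(p) for p in partitions(values, labels))
    return entropy(labels) - info_a


def gain_ratio(values, labels):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D), the C4.5 criterion."""
    n = len(labels)
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions(values, labels))
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


fragment = ["F1", "F1", "F2", "F2"]        # class label of four hypothetical queries
uses_region = ["yes", "yes", "no", "no"]   # separates the classes perfectly
uses_year = ["yes", "yes", "yes", "yes"]   # carries no information
print(entropy(fragment), gini(fragment))                                    # 1.0 0.5
print(info_gain(uses_region, fragment), gain_ratio(uses_region, fragment))  # 1.0 1.0
print(info_gain(uses_year, fragment), gain_ratio(uses_year, fragment))      # 0.0 0.0
```

An attribute that yields pure partitions gets the maximum gain, while an attribute with a single value gets none.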
This paper is organized as follows: (i) the introduction; (ii) a review of related works on DW horizontal fragmentation; (iii) the methodology used in this work for the analysis of decision tree algorithms, including a description of each algorithm; (iv) the preliminary results of the analysis; and (v) the conclusions and future work.
Related Works
The concepts of Cloud SDW (Spatial DW) and spatial OLAP (Online Analytical Processing) as a Service were presented in Costa et al. (2016). These concepts were then used to describe two different hierarchy-based data partitioning techniques for SDWs hosted in the cloud: Spatial-based partitioning and Conventional-based partitioning. The approach of Ettaoufik and Ouzzif (2017) consisted of an incremental horizontal fragmentation technique for the DW through a web service, with the goal of automating the implementation of incremental fragmentation in order to optimize a new query load.
Barkhordari and Niamanesh (2018) proposed a method called Chabok, which uses two-phase Map-Reduce to solve DW problems with big data. Chabok fragments the fact table horizontally; if the nodes are homogeneous, the same number of records is allocated to each Fact-Mapper node. As part of their ongoing work on workload-driven partitioning, Boissier and Kurzynski (2018) implemented an approach called aggressive data skipping and extended it to handle both analytical and transactional access patterns. The authors evaluated their approach with the workload and data of a production system of a Global 2000 company.
Likewise, Barr, Boukhalfa and Bouibede (2018) used linear programming to solve the NP-hard problem of determining a horizontal fragmentation scheme in relational DWs. Also, Nam, Kim and
Han (2018) proposed a graph-based database partitioning method called GPT that improves query performance with less data redundancy. Letrache, El Beggar and Ramdani (2018) proposed a dynamic fragmentation strategy for OLAP cubes based on association rule mining.
On the other hand, Kechar and Nait-Bahloul (2019) presented a horizontal data partitioning approach tailored to a large DW accessed through a high number of queries. The idea was to fragment only the large fact table horizontally, based on partitioning predicates selected from the set of selection predicates used by analytic queries. Meanwhile, Ramdane et al. (2019) stated that horizontal partitioning techniques have been used for many purposes in big data processing, such as load balancing, skipping unnecessary data loads, and guiding the physical design of a DW. They therefore proposed a new data placement strategy for the Apache Hadoop environment, called Smart Data Warehouse Placement (SDWP), which allows performing the star join operation in only one Spark stage. The problem of partitioning and load balancing in a cluster of homogeneous nodes was investigated; experiments using the TPC-DS benchmark showed that the proposed method improves OLAP query performance in terms of execution time.
Likewise, in Ramdane et al. (2019), the authors combined a data-driven and a workload-driven model to create a new scheme for distributed big data warehouses over Hadoop, called SkipSJoin. First, SkipSJoin builds horizontal fragments (buckets) of the fact and dimension tables of the DW using a hash-partitioning method and distributes these buckets evenly over the nodes of the cluster. Then, it allows skipping the scanning of some unnecessary data blocks by hash-partitioning some DW tables with

using the TPC-DS benchmark they showed that the
proposal outperforms some approaches in terms
of query execution time.
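Hash-partitioning of this kind can be illustrated with a small sketch. It is not the SkipSJoin implementation: it only assumes that a fact table and a dimension table are partitioned on the same join key into the same number of buckets, and that buckets are spread round-robin over the nodes, which is what lets matching rows be joined locally. The table contents are hypothetical.

```python
import zlib
from collections import Counter


def bucket_of(key, n_buckets):
    """Deterministic bucket number for a join key (zlib.crc32 is stable across runs,
    unlike Python's salted hash() for strings)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_buckets


def hash_partition(rows, key, n_buckets):
    """Horizontal fragments (buckets) of a table, hash-partitioned on `key`."""
    buckets = [[] for _ in range(n_buckets)]
    for row in rows:
        buckets[bucket_of(row[key], n_buckets)].append(row)
    return buckets


def place_buckets(n_buckets, n_nodes):
    """Round-robin placement: bucket b goes to node b % n_nodes."""
    return {b: b % n_nodes for b in range(n_buckets)}


# Hypothetical SSB-like tables partitioned on the same key and bucket count.
customer = [{"custkey": k, "nation": f"N{k % 3}"} for k in range(10)]
lineorder = [{"orderkey": i, "custkey": i % 10, "revenue": 10 * i} for i in range(40)]
customer_buckets = hash_partition(customer, "custkey", 8)
lineorder_buckets = hash_partition(lineorder, "custkey", 8)
placement = place_buckets(8, 4)

# Every fact row finds its customer in the bucket with the same number, so the
# join can run bucket by bucket on the node that holds that bucket.
for b, facts in enumerate(lineorder_buckets):
    keys_here = {c["custkey"] for c in customer_buckets[b]}
    assert all(f["custkey"] in keys_here for f in facts)
print(sorted(Counter(placement.values()).items()))  # [(0, 2), (1, 2), (2, 2), (3, 2)]
```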
Finally, Hilprecht, Carsten and Uwe (2019) noted that commercial data analytics products such as Microsoft Azure SQL Data Warehouse or Amazon Redshift provide ready-to-use scale-out database solutions for OLAP-style workloads in the cloud. Whereas the provisioning of a database cluster is in general fully automated by cloud providers, customers still have to make important design decisions that were traditionally made by the database administrator, such as selecting the partitioning schemes. Therefore, the authors proposed a learned partitioning advisor for analytical OLAP-style workloads based on Deep Reinforcement Learning (DRL). The key idea was that a DRL agent learns its decisions from experience by monitoring the rewards for different workloads and partitioning schemes. The evaluation showed that the advisor not only finds partitionings that outperform existing approaches for automated partitioning design but can also easily adjust to different deployments.
Table 1 compares the horizontal fragmentation methods discussed above according to their classification and the benchmark or data used for their validation.
Table 1. Comparative table of works on horizontal fragmentation
Work | Classification | Validation
Costa et al. (2016) | Hierarchy-based | Spatial Data Warehouse Benchmark (Spadawan)
Ettaoufik and Ouzzif (2017) | Cost-based | Benchmark APB-1
Barkhordari and Niamanesh (2018) | Map-Reduce-based | Benchmark TPC-DS
Boissier and Kurzynski (2018) | Cost-based | Benchmarks TPC-C and TPC-CH (CH-benCHmark); data and workload of an SAP ERP system of a Global 2000 company
Barr, Boukhalfa and Bouibede (2018) | Metaheuristic-based | Benchmark APB-1
Nam, Kim and Han (2018) | Cost-based; Graph-based | Benchmark TPC-DS, The Internet Movie DataBase (IMDB) and BioWarehouse
Letrache, El Beggar and Ramdani (2018) | Data mining-based | Benchmark TPC-DS
Kechar and Nait-Bahloul (2019) | Cost-based; Predicates-based | SSB
Ramdane et al. (2019) | Hash-partitioning-based | TPC-DS benchmark using Scala on a cluster of homogeneous nodes, a Hadoop-YARN platform, a Spark engine, and Hive
Ramdane et al. (2019) | Hash-partitioning-based | TPC-DS benchmark
Hilprecht, Carsten and Uwe (2019) | Deep Reinforcement Learning-based | Different database schemata and workloads of varying complexity
Source: authors' own elaboration.
Methodology
This section describes the process followed to analyze the decision tree algorithms and then describes each of the algorithms available in the version of Weka used.
Collection and Preparation of Data
To study the decision tree algorithms and select the best one to fragment the DW, we used SSB and PostgreSQL. We constructed eight



the algorithm proposed by Rodríguez et al. (2014) to build the data sets. The resulting data set for 24 queries and two fragments is shown in Figure 1.
Figure 1. Data set with 24 queries and 2 fragments. Source: authors' own elaboration.
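Weka reads data sets in its ARFF text format. The sketch below shows one way a query/fragment data set could be written to ARFF from Python. Since the exact attributes of Figure 1 are not reproduced here, the attribute names (one binary attribute per predicate, plus a fragment class) and the relation name are hypothetical; the class attribute is placed last, which Weka usually takes as the default class.

```python
def to_arff(relation, attributes, rows):
    """Serialize a nominal data set in Weka's ARFF format.

    attributes: list of (name, allowed_values); the class attribute goes last.
    Assumes names and values contain no spaces, commas, braces or quotes
    (otherwise ARFF requires quoting)."""
    lines = [f"@RELATION {relation}", ""]
    for name, values in attributes:
        lines.append(f"@ATTRIBUTE {name} {{{','.join(values)}}}")
    lines += ["", "@DATA"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError(f"row {row!r} does not match the attribute list")
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical attributes: whether each query uses a predicate, plus its fragment.
attributes = [
    ("p_region_asia", ["0", "1"]),
    ("p_year_1997", ["0", "1"]),
    ("fragment", ["F1", "F2"]),
]
rows = [("1", "0", "F1"), ("1", "1", "F1"), ("0", "1", "F2")]
print(to_arff("queries_24_fragments_2", attributes, rows))
```

The resulting text can be saved with a `.arff` extension and opened in the Weka Explorer.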
Application of Decision Tree Algorithms
The seven decision tree algorithms offered by version 3.9.4 of Weka were applied to the eight data sets. The algorithms are described below.
Hoeffding Tree: It is an incremental, anytime decision tree induction algorithm capable of learning from massive data streams, assuming that the distribution generating the examples does not change over time. Hoeffding trees exploit the fact that a small sample is often enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations needed to estimate some statistics within a prescribed precision (Hulten, Spencer and Domingos, 2001).
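For reference, the Hoeffding bound states that, after n independent observations of a variable with range R, the true mean is at least the observed mean minus ε with probability 1 - δ, where ε = sqrt(R² ln(1/δ) / (2n)). The sketch below computes ε and the resulting split test described by Hulten, Spencer and Domingos (2001). The δ value is an arbitrary example, and real implementations also include a tie-breaking threshold that is omitted here.

```python
from math import log, log2, sqrt


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 ln(1/delta) / (2n)): with probability 1 - delta, the true mean
    of a variable with range R is at least the mean of n observations minus epsilon."""
    return sqrt(value_range ** 2 * log(1 / delta) / (2 * n))


def should_split(gain_best, gain_second, n_classes, delta, n):
    """Split once the observed gain advantage of the best attribute exceeds epsilon.
    For information gain the range is R = log2(number of classes)."""
    return gain_best - gain_second > hoeffding_bound(log2(n_classes), delta, n)


for n in (200, 2000):
    print(n, round(hoeffding_bound(1.0, 1e-7, n), 3))   # 200 0.201, then 2000 0.063
```

As n grows, ε shrinks, so smaller differences between the two best attributes become enough to commit to a split.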


LMT (Logistic Model Tree): It is a classification tree with logistic regression functions at the leaves. The algorithm can deal with binary and multi-class target variables, numeric and nominal attributes, and missing values (Landwehr, Hall and Frank, 2005).
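In an LMT, each leaf holds one linear function F_j(x) per class, and class probabilities are obtained as p(j | x) = exp(F_j(x)) / Σ_k exp(F_k(x)) (Landwehr, Hall and Frank, 2005). The sketch below only shows prediction with an already-built tree; the tree, thresholds and coefficients are made up, and fitting the leaf models (done with LogitBoost in LMT) is not shown.

```python
from math import exp


def leaf_probabilities(models, x):
    """models[j] = (intercept, coefficients) of the linear function F_j for class j;
    returns p(j | x) = exp(F_j(x)) / sum_k exp(F_k(x))."""
    scores = [b + sum(w * xi for w, xi in zip(coefs, x)) for b, coefs in models]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lmt_predict(node, x):
    """Route x down the (hypothetical) tree of numeric splits, then apply the leaf model."""
    while "models" not in node:
        node = node["left"] if x[node["attr"]] <= node["threshold"] else node["right"]
    return leaf_probabilities(node["models"], x)


# Hypothetical two-class tree with one split on attribute 0 and made-up leaf models.
tree = {
    "attr": 0, "threshold": 0.5,
    "left": {"models": [(0.0, [0.0, 0.0]), (0.0, [0.0, 0.0])]},
    "right": {"models": [(2.0, [0.0, 0.0]), (0.0, [0.0, 0.0])]},
}
print(lmt_predict(tree, [0.2, 1.0]))                          # [0.5, 0.5]
print([round(p, 3) for p in lmt_predict(tree, [0.9, 1.0])])   # [0.881, 0.119]
```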
J48: J48 is Weka's implementation of the C4.5 decision tree, one of the most widely used approaches in real-world applications. In C4.5, the learned tree can be expressed as sets of if-then rules to improve human readability. The decision tree is simple to understand and interpret; besides, it can handle both numeric and nominal data and performs well with large data sets in a short time. In C4.5 training, the decision tree is built in a top-down recursive way (Saeh et al., 2016).
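Reading a tree as if-then rules amounts to collecting one rule per root-to-leaf path. The sketch below does this for a hypothetical tree whose class is the target fragment, which also illustrates how a tree could later be turned into fragment predicates. Unlike the C4.5 rule generator, it does not simplify or prune the rules.

```python
def tree_to_rules(node, conditions=()):
    """Collect one (conditions, class) pair per root-to-leaf path of a nominal tree.
    Leaves look like {"label": c}; internal nodes like {"attr": a, "branches": {value: child}}."""
    if "label" in node:
        return [(list(conditions), node["label"])]
    rules = []
    for value, child in node["branches"].items():
        rules += tree_to_rules(child, conditions + (f"{node['attr']} = {value}",))
    return rules


def format_rule(conditions, label):
    return f"IF {' AND '.join(conditions) or 'TRUE'} THEN fragment = {label}"


# Hypothetical tree over SSB-style predicates; the class is the target fragment.
tree = {"attr": "region", "branches": {
    "ASIA": {"label": "F1"},
    "EUROPE": {"attr": "year", "branches": {"1997": {"label": "F2"},
                                             "1998": {"label": "F1"}}},
}}
for conditions, label in tree_to_rules(tree):
    print(format_rule(conditions, label))
# IF region = ASIA THEN fragment = F1
# IF region = EUROPE AND year = 1997 THEN fragment = F2
# IF region = EUROPE AND year = 1998 THEN fragment = F1
```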
Decision Stump: It is a one-level decision tree, which classifies instances by sorting them based on feature values. In a decision stump, each node represents a feature of the instance to be classified, and each branch represents a value that the node can assume. Instances are classified starting at the root node and sorting them based on their feature values (Kotsiantis, Tsekouras and Pintelas, 2005; Shi et al., 2018).
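A decision stump over nominal attributes can be sketched in a few lines. This version uses one branch per attribute value, as in the description above, and chooses the attribute with the fewest training errors. Weka's DecisionStump may split differently (for instance, into binary branches), so this is an illustration of the idea rather than a replica; the data are hypothetical.

```python
from collections import Counter


def fit_stump(rows, labels):
    """One-level tree over nominal attributes: for every attribute, each value predicts
    its majority class; keep the attribute whose rule makes the fewest training errors."""
    best = None
    for a in range(len(rows[0])):
        counts = {}
        for row, label in zip(rows, labels):
            counts.setdefault(row[a], Counter())[label] += 1
        mapping = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[mapping[v]] for v, c in counts.items())
        if best is None or errors < best[1]:
            best = (a, errors, mapping)
    attr, _, mapping = best
    default = Counter(labels).most_common(1)[0][0]   # used for unseen values
    return attr, lambda row: mapping.get(row[attr], default)


rows = [("ASIA", "1997"), ("ASIA", "1998"), ("EUROPE", "1997"), ("EUROPE", "1998")]
labels = ["F1", "F1", "F2", "F2"]
attr, predict = fit_stump(rows, labels)
print(attr, predict(("EUROPE", "1999")))   # 0 F2
```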
Random Forest: This algorithm uses bootstrap methods to create an ensemble of trees, one for each bootstrap sample. Additionally, the set of variables eligible for splitting is varied randomly in order to decorrelate the trees. Once the forest of trees is created, the trees vote to determine the predicted value for the input data (Dean, 2014).
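The three random ingredients described above (bootstrap samples, random candidate variables and voting) are sketched below with Python's standard library; the tree-growing step itself is omitted, and the seed and sizes are arbitrary. The fraction of distinct tuples in a bootstrap sample tends to 1 - 1/e ≈ 0.632, which is where the factor Ñ = 0.632|D| in the complexity analysis of the Results section comes from.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n tuple indices with replacement (one bootstrap sample per tree)."""
    return [rng.randrange(n) for _ in range(n)]


def candidate_features(n_features, k, rng):
    """Random subset of k features eligible for a split (drawn anew at every node)."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Combine the trees' predicted classes for one input by plurality vote."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(42)
n = 10_000
unique_fraction = len(set(bootstrap_indices(n, rng))) / n
print(round(unique_fraction, 3))            # close to 1 - 1/e, i.e. about 0.632
print(candidate_features(10, 3, rng))       # three distinct feature indices
print(majority_vote(["F1", "F2", "F1"]))    # F1
```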
Random Tree: It constructs a tree that considers
a given number of random features at each node
(Witten, Frank and Hall, 2011).
REPTree: It builds a decision or regression tree using information gain (for classification) or variance reduction (for regression) and prunes it using reduced-error pruning. Optimized for speed, it sorts values for numeric attributes only once. It deals with missing values by splitting instances into pieces, as C4.5 does. The user can set the minimum proportion of training set variance for a split and the number of folds used for pruning (Witten, Frank and Hall, 2011).
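Reduced-error pruning can be sketched on a small nominal tree as follows. The tree structure, the convention that every node stores its majority training class in "label", and the pruning data are assumptions; REPTree itself also handles numeric attributes and holds out one of several folds for pruning, which this sketch does not.

```python
def classify(node, row):
    """Follow branches by attribute value; unseen values stop at the current node."""
    while "branches" in node:
        child = node["branches"].get(row[node["attr"]])
        if child is None:
            break
        node = child
    return node["label"]


def reduced_error_prune(node, rows, labels):
    """Bottom-up reduced-error pruning against a held-out pruning set: a subtree is
    replaced by a leaf (its majority training class, stored in "label") whenever that
    does not increase the number of pruning-set errors."""
    if "branches" not in node:
        return node
    for value, child in list(node["branches"].items()):
        subset = [(r, y) for r, y in zip(rows, labels) if r[node["attr"]] == value]
        node["branches"][value] = reduced_error_prune(
            child, [r for r, _ in subset], [y for _, y in subset])
    subtree_errors = sum(classify(node, r) != y for r, y in zip(rows, labels))
    leaf_errors = sum(node["label"] != y for y in labels)
    return {"label": node["label"]} if leaf_errors <= subtree_errors else node


tree = {"label": "F1", "attr": 0, "branches": {
    "ASIA": {"label": "F1"},
    "EUROPE": {"label": "F2", "attr": 1, "branches": {"1997": {"label": "F2"},
                                                      "1998": {"label": "F1"}}},
}}
prune_rows = [("EUROPE", "1998"), ("EUROPE", "1997"), ("ASIA", "1997")]
prune_labels = ["F2", "F2", "F1"]
pruned = reduced_error_prune(tree, prune_rows, prune_labels)
print(pruned["branches"]["EUROPE"])   # {'label': 'F2'}
```

Ties are resolved in favor of the smaller tree, so the EUROPE subtree collapses into a leaf while the root split, which does reduce pruning-set errors, is kept.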
Results
After analyzing the different decision tree algorithms, the following results were obtained for the ROC Area, Precision, Recall and F-measure metrics. Figures 2 to 5 show that, considering the Recall, Precision, ROC Area and F-Measure metrics, respectively, for the 24-query data sets, the J48 algorithm was better for three, four

overcome by Random Forest.
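The Precision, Recall and F-measure figures can be computed from actual and predicted fragments as sketched below. We assume that the reported values are averages of the per-class values weighted by class frequency (as in Weka's "Weighted Avg." summary); with these weights, the averaged Recall equals accuracy. The labels are hypothetical, and None marks an undefined value, such as the precision of a class that is never predicted.

```python
from collections import Counter


def per_class_metrics(actual, predicted):
    """Precision, recall and F-measure of every class, treating it as the positive
    class; None marks a value that is undefined (e.g. a class never predicted)."""
    pairs = list(zip(actual, predicted))
    metrics = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(a == c and p == c for a, p in pairs)
        fp = sum(a != c and p == c for a, p in pairs)
        fn = sum(a == c and p != c for a, p in pairs)
        precision = tp / (tp + fp) if tp + fp else None
        recall = tp / (tp + fn) if tp + fn else None
        if precision is None or recall is None:
            f_measure = None
        elif precision + recall == 0:
            f_measure = 0.0
        else:
            f_measure = 2 * precision * recall / (precision + recall)
        metrics[c] = (precision, recall, f_measure)
    return metrics


def weighted_metrics(actual, predicted):
    """Average each metric over the actual classes, weighted by class frequency."""
    support = Counter(actual)
    metrics = per_class_metrics(actual, predicted)
    result = []
    for i in range(3):
        values = [(metrics[c][i], support[c]) for c in support]
        if any(v is None for v, _ in values):
            result.append(None)
        else:
            result.append(sum(v * w for v, w in values) / len(actual))
    return tuple(result)


actual = ["F1", "F1", "F1", "F2", "F2"]
predicted = ["F1", "F1", "F2", "F2", "F2"]
print([round(m, 3) for m in weighted_metrics(actual, predicted)])  # [0.867, 0.8, 0.8]
```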
Figure 2. Results of the Recall metric for the 24-query data sets. Source: authors' own elaboration.
Figure 3. Results of the Precision metric for the 24-query data sets. Source: authors' own elaboration.
Figure 4. Results of the ROC Area metric for the 24-query data sets. Source: authors' own elaboration.
Figure 5. Results of the F-Measure metric for the 24-query data sets. Source: authors' own elaboration.
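The ROC Area of one class equals the probability that a randomly chosen instance of that class receives a higher predicted probability than a randomly chosen instance of another class. The sketch below computes it this way and, for multi-class data sets, averages the one-vs-rest values with class-frequency weights, which we assume mirrors how a single ROC Area per data set is reported. The probabilities are made up.

```python
def roc_area(scores, is_positive):
    """Area under the ROC curve for one class, computed as the probability that a random
    positive instance receives a higher score than a random negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def weighted_roc_area(probabilities, actual, classes):
    """One-vs-rest ROC Area per class, averaged with class-frequency weights; classes that
    are absent (or the only class present) are skipped because their AUC is undefined."""
    total, weight = 0.0, 0
    for j, c in enumerate(classes):
        is_positive = [a == c for a in actual]
        count = sum(is_positive)
        if 0 < count < len(actual):
            total += roc_area([row[j] for row in probabilities], is_positive) * count
            weight += count
    return total / weight if weight else None


# Hypothetical predicted probabilities for classes F1 and F2 (columns in that order).
probabilities = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
actual = ["F1", "F2", "F1", "F2"]
print(weighted_roc_area(probabilities, actual, ["F1", "F2"]))   # 0.75
```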
With regard to the 50-query data sets, the results of applying the decision tree algorithms presented in Table 2 showed that, for two fragments, the best algorithm was REPTree, which obtained the highest Precision and F-Measure and tied with LMT and Random Forest in Recall, although Random Forest achieved a slightly higher ROC Area (0.998 vs. 0.994). Table 3 shows that, for three fragments, the best algorithm was Random Forest, which outperformed the others in all four metrics. Table 4 presents the results for four fragments: J48 was the best in three of the four metrics (Precision, Recall and F-Measure), while Random Forest obtained the highest ROC Area. Finally, Table 5 shows that, for five fragments, the best decision tree algorithm was Random Forest.
Table 2. Results of decision tree algorithms with 50 queries for two fragments
Algorithm | Precision | Recall | ROC Area | F-Measure
DecisionStump | 0.875 | 0.843 | 0.682 | 0.832
HoeffdingTree | 0.875 | 0.843 | 0.885 | 0.832
J48 | 0.857 | 0.843 | 0.924 | 0.836
LMT | 0.963 | 0.961 | 0.910 | 0.960
RandomForest | 0.963 | 0.961 | 0.998 | 0.960
RandomTree | 0.864 | 0.863 | 0.929 | 0.860
REPTree | 0.964 | 0.961 | 0.994 | 0.961
Source: authors' own elaboration.
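The per-metric comparison in Table 2 can be checked mechanically. The snippet below copies the Table 2 values and lists, for each metric, the algorithm(s) with the highest score; the dashes of Tables 3 to 5 could be entered as None, which the function skips.

```python
METRICS = ("Precision", "Recall", "ROC Area", "F-Measure")

# Values copied from Table 2 (50 queries, two fragments).
TABLE_2 = {
    "DecisionStump": (0.875, 0.843, 0.682, 0.832),
    "HoeffdingTree": (0.875, 0.843, 0.885, 0.832),
    "J48": (0.857, 0.843, 0.924, 0.836),
    "LMT": (0.963, 0.961, 0.910, 0.960),
    "RandomForest": (0.963, 0.961, 0.998, 0.960),
    "RandomTree": (0.864, 0.863, 0.929, 0.860),
    "REPTree": (0.964, 0.961, 0.994, 0.961),
}


def best_per_metric(table):
    """For each metric, the sorted names of the algorithms reaching the top score
    (None entries, i.e. the dashes in the tables, are ignored)."""
    result = {}
    for i, metric in enumerate(METRICS):
        top = max(scores[i] for scores in table.values() if scores[i] is not None)
        result[metric] = sorted(name for name, scores in table.items() if scores[i] == top)
    return result


for metric, names in best_per_metric(TABLE_2).items():
    print(metric, names)
# Precision ['REPTree']
# Recall ['LMT', 'REPTree', 'RandomForest']
# ROC Area ['RandomForest']
# F-Measure ['REPTree']
```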
Table 3. Results of decision tree algorithms with 50 queries for three fragments
Algorithm | Precision | Recall | ROC Area | F-Measure
DecisionStump | 0.561 | 0.686 | 0.691 | 0.617
HoeffdingTree | - | 0.745 | 0.830 | -
J48 | 0.681 | 0.725 | 0.679 | 0.693
LMT | 0.722 | 0.725 | 0.907 | 0.723
RandomForest | 0.770 | 0.784 | 0.934 | 0.767
RandomTree | 0.654 | 0.647 | 0.782 | 0.627
REPTree | 0.459 | 0.608 | 0.510 | 0.521
Source: authors' own elaboration.
Table 4. Results of decision tree algorithms with 50 queries for four fragments
Algorithm | Precision | Recall | ROC Area | F-Measure
DecisionStump | - | 0.431 | 0.610 | -
HoeffdingTree | 0.500 | 0.353 | 0.645 | 0.353
J48 | 0.709 | 0.706 | 0.825 | 0.707
LMT | 0.572 | 0.588 | 0.830 | 0.579
RandomForest | 0.690 | 0.686 | 0.886 | 0.678
RandomTree | 0.501 | 0.490 | 0.715 | 0.487
REPTree | 0.548 | 0.490 | 0.701 | 0.487
Source: authors' own elaboration.
Once the analysis of the decision tree algorithms for 24 and 50 queries was concluded, it was determined that the two best algorithms were Random Forest and J48, and J48 was selected because it is more efficient in building the model. The computational complexity of growing a J48 tree with a training set D is $O(n \times |D| \times \log|D|)$, where n is the number of attributes describing the tuples in D and |D| is the number of training tuples in D (Han, Kamber and Pei, 2012). In contrast, the time complexity of building a forest of M randomized trees is $\Theta(M K \tilde{N} \log^2 \tilde{N})$, where K is the number of variables randomly drawn at each node and $\tilde{N} = 0.632|D|$ (Louppe, 2015). Figure 6 shows a decision tree created by J48 for the 50-query data set with four fragments.
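To give a rough sense of the gap between the two expressions, the snippet below evaluates them for illustrative values: 50 training tuples, 10 attributes, M = 100 trees and K = 4 (all assumptions, not the settings used in the experiments). Because asymptotic bounds omit constant factors, the numbers indicate growth only, not actual running times; the base-2 logarithm is an arbitrary choice that only changes constants.

```python
from math import log2


def j48_cost(n_attributes, n_tuples):
    """n * |D| * log|D| (Han, Kamber and Pei, 2012), without constant factors."""
    return n_attributes * n_tuples * log2(n_tuples)


def forest_cost(n_trees, k, n_tuples):
    """M * K * N~ * log^2 N~ with N~ = 0.632 |D| (Louppe, 2015), without constant factors."""
    n_tilde = 0.632 * n_tuples
    return n_trees * k * n_tilde * log2(n_tilde) ** 2


# Illustrative values only (assumptions): 50 training tuples, 10 attributes,
# a forest of 100 trees drawing K = 4 candidate variables per node.
d, n, m, k = 50, 10, 100, 4
print(round(j48_cost(n, d)), round(forest_cost(m, k, d)))
print(round(forest_cost(m, k, d) / j48_cost(n, d)))   # ratio of the two growth terms
```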
Table 5. Results of decision tree algorithms with 50 queries for five fragments
Algorithm | Precision | Recall | ROC Area | F-Measure
DecisionStump | - | 0.294 | 0.591 | -
HoeffdingTree | 0.464 | 0.314 | 0.654 | 0.304
J48 | 0.671 | 0.647 | 0.834 | 0.642
LMT | 0.657 | 0.667 | 0.895 | 0.661
RandomForest | 0.749 | 0.745 | 0.927 | 0.743
RandomTree | 0.610 | 0.569 | 0.779 | 0.566
REPTree | 0.613 | 0.569 | 0.770 | 0.582
Source: authors' own elaboration.
Figure 6. Decision tree created by J48. Source: authors' own elaboration.
Conclusions
DW are applied in several areas and allow
  
allows optimizing response times and execution
costs for OLAP queries. In this work, we propose to take advantage of the potential of decision trees in the process of horizontal fragmentation of the DW. For that reason, this article described the analysis of different decision tree algorithms carried out to determine the best of them for implementation in a horizontal fragmentation method for data warehouses. As a result of the analysis, J48 and Random Forest were the best algorithms for decision tree induction, and J48 was selected for the implementation of the method because its time complexity is lower than that of Random Forest. Future work is the design of the fragmentation method, which will consist of determining the most frequent OLAP queries, analyzing the predicates used by those queries and, based on this, building the decision tree from which the horizontal fragments will be generated. The method will be implemented in a Tourist Data Warehouse which is being

regulate tourist activity in Mexico.
References
Barkhordari, M. and Niamanesh, M. (2018). Chabok:
A Map-Reduce based method to solve data
warehouse problems. Journal of Big Data, 5(40),
1-25.
Barr, M., Boukhalfa, K. and Bouibede, K. (2018). Bi-
Objective Optimization Method for Horizontal
Fragmentation Problem in Relational Data
Warehouses as a Linear Programming Problem.
, 32(9-10), 907-923.
Boissier, M. and Kurzynski, D. (2018). Workload-Driven Horizontal Partitioning and Pruning for Large HTAP Systems. In IEEE 34th International Conference on Data Engineering Workshops (ICDEW), Paris, France.
Costa, M.R. et al. (2016). Spatial data warehouses and
spatial OLAP come towards the cloud: Design
and performance. Distributed and Parallel
Databases, 34(3), 425-461.
Dean, J. (2014). Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners. New Jersey, USA: John Wiley & Sons.

Incremental and Automatic Data Warehouses
Fragmentation.   
,
8(6), 1-10.
Han, J., Kamber, M. and Pei, J. (2012). Data Mining: Concepts and Techniques. Burlington, USA: Morgan Kaufmann Publishers.
Hilprecht, B., Carsten, B. and Uwe, R. (2019). Learning a Partitioning Advisor with Deep Reinforcement Learning. Retrieved from https://arxiv.org/pdf/1904.01279.pdf.
Hulten, G., Spencer, L. and Domingos, P. (2001).
Mining time-changing data streams. In
Proceedings of the Seventh ACM SIGKDD
International Conference on Knowledge
Discovery and Data Mining.
Kechar, M. and Nait-Bahloul, S. (2019). Bringing Together Physical Design and Fast Querying of Large Data Warehouses: A New Data Partitioning Strategy. In BDIoT’19: Proceedings of the 4th International Conference on Big Data and Internet of Things, Rabat, Morocco.
Kotsiantis, S., Tsekouras, G. and Pintelas, P. (2005).
Local Bagging of Decision Stumps. In Ali, M.
and Esposito, F. (Eds.), 
 (pp. 377-391). Berlin,
Germany: Springer.
Landwehr, N., Hall, M. and Frank, E. (2005). Logistic Model Trees. Machine Learning, 59(1-2), 161-205.
Letrache, K., El Beggar, O. and Ramdani, M. (2019).
OLAP cube partitioning based on association
rules method. , 49(2), 420-
434.
Louppe, G. (2015). Understanding Random Forests: From Theory to Practice. Liège, Belgium: University of Liège.
Nam, Y.-M., Kim, M.-S. and Han, D. (2018). A Graph-Based Database Partitioning Method for Parallel OLAP Query Processing. In IEEE 34th International Conference on Data Engineering (ICDE), Paris, France.
Ozsu, M.T. and Valduriez, P. (2020). Principles of Distributed Database Systems. Geneva, Switzerland: Springer Nature Switzerland AG.
Ramdane, Y. et al. (2019). SDWP: A New Data
Placement Strategy for Distributed Big Data
Warehouses in Hadoop. In Ordonez, C. et
al. (Eds.), 
 (pp. 189-205). Berlin, Germany:
Springer.
Ramdane, Y. et al. (2019). SkipSJoin: A New Physical
Design for Distributed Big Data Warehouses
in Hadoop. In Laender, A.H.F. et al. (Eds.),
 (pp. 255-263). Berlin,
Germany: Springer.
Rodríguez, L. et al. (2014). Horizontal Partitioning
of Multimedia Databases Using Hierarchical
Agglomerative Clustering. In Gelbukh, A.
et al. (Eds.),  
 (pp. 296-309). Cham,
Switzerland: Springer.

     
power grid with presence of PV power plants
using C-4.5. 
, 56, 283-290.
Shi, L. et al. (2018). Signal prediction based on
boosting and decision stump. 
    
, 16(2), 117-122.
Witten, I.H., Frank, E. and Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. New York, USA: Elsevier.