Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation

Nidia Rodríguez Mazahua; Lisbeth Rodríguez Mazahua; Asdrúbal López Chau; Giner Alor Hernández

doi:10.16967/23898186.667

How to Cite

Rodríguez Mazahua, N., Rodríguez Mazahua, L., López Chau, A., & Alor Hernández, G. (2020). Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation. Revista Perspectiva Empresarial, 7(2 Supl.1), 31–43. https://doi.org/10.16967/23898186.667

Published: Dec 1, 2020

Issue

Vol. 7 No. 2 Supl.1 (2020): “1st International Workshop on Enterprise Decision-Making Applying Artificial Intelligence Techniques (WEDMAIT 2020)”

Section

ARTICLES

Nidia Rodríguez Mazahua, Tecnológico Nacional de México

https://orcid.org/0000-0002-8227-9476

Lisbeth Rodríguez Mazahua, Tecnológico Nacional de México

https://orcid.org/0000-0002-9861-3993

Asdrúbal López Chau, Centro Universitario UAEM

https://orcid.org/0000-0001-5254-0939

Giner Alor Hernández, Tecnológico Nacional de México

https://orcid.org/0000-0003-3296-0981

Abstract

One of the main problems faced by Data Warehouse designers is fragmentation.
Several studies have proposed horizontal fragmentation methods based on data mining.
However, no horizontal fragmentation technique that uses a decision tree exists yet. This paper analyzes several decision tree algorithms in order to select the best one for implementing such a fragmentation method. The analysis was performed with Weka 3.9.4 using four evaluation metrics (Precision, ROC Area, Recall, and F-measure) on several data sets built from the Star Schema Benchmark. In most cases, the two best algorithms were J48 and Random Forest; J48 was selected because it builds the model more efficiently.
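As a hedged illustration of the evaluation metrics named in the abstract, the Python sketch below computes Precision, Recall, and F-measure from predicted class labels, and ROC Area as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one (ties count one half). This is not Weka's implementation; the function names and the toy labels and scores are hypothetical. Weka reports each metric per class and as a weighted average; `weighted_f_measure` imitates that average for F-measure, assuming each class is weighted by its share of the true labels.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> Tuple[float, float, float]:
    """Precision, Recall and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of Precision and Recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_f_measure(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """F-measure averaged over classes, weighted by each class's share of y_true (assumed weighting)."""
    n = len(y_true)
    total = 0.0
    for c in set(y_true):
        support = sum(1 for t in y_true if t == c)
        total += precision_recall_f1(y_true, y_pred, positive=c)[2] * support / n
    return total


def roc_area(y_true: Sequence[Hashable], scores: Sequence[float], positive: Hashable = 1) -> float:
    """Area under the ROC curve by pairwise comparison (Mann-Whitney statistic)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC Area needs at least one positive and one negative instance")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true classes, predicted classes, predicted P(class = 1).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]

print(precision_recall_f1(y_true, y_pred))  # each value is 2/3, up to rounding
print(weighted_f_measure(y_true, y_pred))   # 2/3, up to rounding
print(roc_area(y_true, scores))             # 8/9, up to rounding
```

The pairwise loop in `roc_area` is quadratic in the number of instances, which is acceptable for a toy example; a rank-based formulation gives the same value in O(n log n) for larger test sets.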
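The abstract does not describe how a selected decision tree becomes a fragmentation scheme. One plausible reading, sketched below purely as a hypothetical illustration and not as the authors' method, is that each root-to-leaf path of a tree learned by an algorithm such as J48 yields a conjunction of predicates, and each leaf defines one horizontal fragment. The tree, its thresholds, and the fragment labels F1 to F3 are invented; `d_year` and `lo_discount` are Star Schema Benchmark column names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    fragment: str  # hypothetical fragment label


@dataclass
class Split:
    attr: str
    threshold: float
    left: "Tree"   # rows with attr <= threshold
    right: "Tree"  # rows with attr > threshold


Tree = Union[Leaf, Split]


def fragment_predicates(tree: Tree, path: Tuple[str, ...] = ()) -> List[Tuple[str, str]]:
    """Return (fragment, predicate) pairs, one per root-to-leaf path."""
    if isinstance(tree, Leaf):
        return [(tree.fragment, " AND ".join(path) or "TRUE")]
    return fragment_predicates(
        tree.left, path + (f"{tree.attr} <= {tree.threshold}",)
    ) + fragment_predicates(tree.right, path + (f"{tree.attr} > {tree.threshold}",))


def assign_fragment(tree: Tree, row: Dict[str, float]) -> str:
    """Route one fact-table row down the tree to the fragment it belongs to."""
    while isinstance(tree, Split):
        tree = tree.left if row[tree.attr] <= tree.threshold else tree.right
    return tree.fragment


# Hypothetical tree: split first on year, then on discount.
example_tree = Split("d_year", 1995, Leaf("F1"),
                     Split("lo_discount", 3, Leaf("F2"), Leaf("F3")))

for name, predicate in fragment_predicates(example_tree):
    print(name, predicate)
print(assign_fragment(example_tree, {"d_year": 1997, "lo_discount": 5}))  # F3
```

Because every internal node splits its input into two mutually exclusive branches that together cover it, the leaf predicates are disjoint and jointly cover the whole domain, so each row lands in exactly one fragment; this matches the completeness and disjointness conditions usually required of a horizontal fragmentation.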

Author Biographies

Nidia Rodríguez Mazahua, Tecnológico Nacional de México

Master's degree in Administrative Engineering

Lisbeth Rodríguez Mazahua, Tecnológico Nacional de México

PhD in Computer Science

Asdrúbal López Chau, Centro Universitario UAEM

PhD in Computer Science

Giner Alor Hernández, Tecnológico Nacional de México

PhD in Science, specializing in Electrical Engineering

