Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation

Keywords: Data analysis, computer systems, databases, artificial intelligence, decision making

Abstract

One of the main problems faced by Data Warehouse designers is fragmentation. Several studies have proposed horizontal fragmentation methods based on data mining. However, no horizontal fragmentation technique that uses a decision tree exists. This paper presents an analysis of different decision tree algorithms in order to select the best one for implementing such a fragmentation method. The analysis was performed in Weka 3.9.4, considering four evaluation metrics (Precision, ROC Area, Recall and F-measure) on several data sets selected using the Star Schema Benchmark. The results showed that J48 and Random Forest were the two best algorithms in most cases; nevertheless, J48 was selected because it is more efficient in building the model.
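As a rough illustration of the four evaluation metrics named in the abstract, the following Python sketch computes Precision, Recall, F-measure and ROC Area for a single positive class. It is not the Weka implementation used in the paper (Weka can also report class-weighted averages); the function names and the toy labels and scores are hypothetical and chosen only for this example.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_area(
    y_true: Sequence[int], scores: Sequence[float], positive: int = 1
) -> float:
    """ROC area as the probability that a randomly chosen positive instance
    receives a higher score than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and classifier scores for class 1.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_area(y_true, scores)
print(precision, recall, f1, auc)
```

Precision, Recall and F-measure depend on the chosen decision threshold, whereas ROC Area is computed from the ranking of scores alone, which is one reason to report all four when comparing classifiers.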

Author biographies

Nidia Rodríguez Mazahua, Tecnológico Nacional de México

Master's degree in Administrative Engineering

Lisbeth Rodríguez Mazahua, Tecnológico Nacional de México

PhD in Computer Science

Asdrúbal López Chau, Centro Universitario UAEM

PhD in Computer Science

Giner Alor Hernández, Tecnológico Nacional de México

PhD in Science, specializing in Electrical Engineering

Published
2020-12-01
How to cite
Rodríguez Mazahua, N., Rodríguez Mazahua, L., López Chau, A., & Hernández, G. A. (2020). Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation. Revista Perspectiva Empresarial, 7(2 Supl.1), 31-43. https://doi.org/10.16967/23898186.667