A first-approximation mathematical model for the relationship between normality test statistics and the p-value

José Mendoza Rodríguez; Andrés Salazar Andrade

doi:10.37135/ns.01.17.08

Authors

José Mendoza Rodríguez, Unidad Educativa Joaquín Lalama, https://orcid.org/0000-0002-0591-2768
Andrés Salazar Andrade, Unidad Educativa Balandra-Cruz del Sur, https://orcid.org/0009-0003-1062-495X

Keywords:

Test statistics, Mathematical model, Normality, Regression, p-value

Abstract

Normality tests are considered the starting point of parametric statistics; however, meaningful decisions require an understanding of how test statistics and p-values behave. The objective of this study was to design a mathematical model capable of identifying patterns or trends in the relationship between the test statistic and the p-value. The most common univariate normality tests were applied to three variables (precipitation, crude oil, and temperature) through a subsample analysis adapted to each test. Each subsample yielded a test statistic and a p-value, and these pairs were fitted, as a first approximation, with classical regression models: linear, second- and third-degree polynomial, exponential, and logarithmic. The models were validated with the coefficient of determination (R²). The highest R² obtained for each test was: for the Kolmogorov–Smirnov (KS) test, crude oil (R² = 0.9784); for Anderson–Darling (AD), precipitation (R² = 1); for Jarque–Bera (JB), temperature (R² = 0.8902); for Cramér–von Mises (CvM), temperature (R² = 1); for M.KS, temperature (R² = 0.936); for Pearson chi-square (P χ²), precipitation (R² = 0.3798); for Shapiro–Francia (SF), temperature (R² = 0.6019); and for Shapiro–Wilk (SW), crude oil (R² = 0.3712). This kind of preliminary analysis helps researchers make better-informed decisions when assessing the assumption of univariate normality in a dataset.
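The abstract outlines the pipeline only briefly: run a normality test on subsamples, collect (statistic, p-value) pairs, fit five classical regression forms of p on the statistic, and compare the forms by R². The following Python (NumPy/SciPy) sketch illustrates that idea under stated assumptions; it is not the authors' implementation. Assumptions: one random subsample is drawn per size without replacement (the paper's subsampling scheme is not described here), only Shapiro–Wilk and Jarque–Bera from `scipy.stats` are used, the exponential and logarithmic forms are fitted by linearization, R² is computed on the original p-value scale, and the input series is synthetic. The helper names `subsample_stats` and `fit_models` are hypothetical.

```python
import numpy as np
from scipy import stats


def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def subsample_stats(x, test, sizes, rng):
    """Apply a normality test to one random subsample per size.

    Assumption: subsamples are drawn without replacement, one per size;
    the study's own subsampling scheme may differ.
    """
    s_vals, p_vals = [], []
    for n in sizes:
        sub = rng.choice(x, size=n, replace=False)
        res = test(sub)  # SciPy test results expose .statistic and .pvalue
        s_vals.append(res.statistic)
        p_vals.append(res.pvalue)
    return np.array(s_vals), np.array(p_vals)


def fit_models(s, p):
    """Fit five classical forms p = f(s) and return the R^2 of each."""
    r2 = {}
    # Linear, quadratic and cubic polynomials in the test statistic s.
    for name, deg in (("linear", 1), ("quadratic", 2), ("cubic", 3)):
        coef = np.polyfit(s, p, deg)
        r2[name] = r_squared(p, np.polyval(coef, s))
    # Exponential p = a * exp(b * s): regress ln(p) on s (only where p > 0).
    m = p > 0
    b, ln_a = np.polyfit(s[m], np.log(p[m]), 1)
    r2["exponential"] = r_squared(p[m], np.exp(ln_a + b * s[m]))
    # Logarithmic p = a + b * ln(s): regress p on ln(s) (only where s > 0).
    m = s > 0
    b, a = np.polyfit(np.log(s[m]), p[m], 1)
    r2["logarithmic"] = r_squared(p[m], a + b * np.log(s[m]))
    return r2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic right-skewed series standing in for a variable such as
    # precipitation; it is NOT the study's data.
    x = rng.gamma(shape=2.0, scale=10.0, size=2000)
    sizes = np.arange(10, 310, 10)  # hypothetical subsample sizes
    tests = {"SW": stats.shapiro, "JB": stats.jarque_bera}
    for label, test in tests.items():
        s, p = subsample_stats(x, test, sizes, rng)
        best = max(fit_models(s, p).items(), key=lambda kv: kv[1])
        print(label, "best form:", best[0], "R^2 =", round(float(best[1]), 4))
```

Fitting the exponential form by linearization minimizes squared error in ln p rather than in p, so it gives heavy weight to the many near-zero p-values; a nonlinear least-squares fit (for example with `scipy.optimize.curve_fit`) is an alternative when the forms are to be compared strictly on the p scale.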
