IJCA Vol 4 i1 2025 webmag - Flipbook - Page 13
2025 | Volume 4, Issue 1
Minitab. The results of this test are presented in the
following figure (Figure 1):
Figure 2. Analysis of Variance and Confidence Intervals for
the Means Across Generative AI Tools.
Figure 1. Equality of Variances Test and Confidence
Intervals for Standard Deviation Across Generative AI Tools.
The chart illustrates the confidence intervals for
the standard deviation of responses from each
tool. It is evident that the intervals for Meta AI,
ChatGPT 4.0 Free, and ChatGPT o1 overlap, while the
interval for L-Squad is narrower. This indicates that
L-Squad exhibits lower variability in its responses,
suggesting greater consistency across different
levels of comprehension demands. Because variance
measures the dispersion of the data, a low p-value
(such as 0.031) in the equality of variances test
suggests that at least one of the tools differs
significantly from the others in the variability of
its performance.
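As an illustration of how an equality of variances test of this kind can be computed, the following Python sketch implements Levene's statistic with median centering (the Brown-Forsythe variant). This choice is an assumption made for illustration: the article does not state which method Minitab applied. The function name `levene_statistic` and the scores below are hypothetical placeholders, not the study's data.

```python
from statistics import mean, median


def levene_statistic(groups):
    """Return (W, df1, df2) for Levene's test with median centering.

    `groups` is a list of samples, one list of numbers per tool.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group's median.
    deviations = []
    for g in groups:
        center = median(g)
        deviations.append([abs(x - center) for x in g])

    group_means = [mean(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total

    # One-way ANOVA sums of squares, computed on the deviations.
    between = sum(
        len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means)
    )
    within = sum(
        (z - m) ** 2 for d, m in zip(deviations, group_means) for z in d
    )

    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical placeholder scores (NOT the study's data), one list per tool.
scores = {
    "tool_a": [0.60, 0.75, 0.90, 0.55],
    "tool_b": [0.65, 0.80, 0.95, 0.50],
    "tool_c": [0.70, 0.85, 0.60, 0.95],
    "tool_d": [0.78, 0.80, 0.82, 0.79],
}

w, df1, df2 = levene_statistic(list(scores.values()))
print(f"W = {w:.3f}, df = ({df1}, {df2})")
```

The p-value is the upper tail of the F distribution with (df1, df2) degrees of freedom. Where SciPy is available, `scipy.stats.levene` with `center='median'` computes the same statistic together with its p-value.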
Welch ANOVA Test
Additionally, a Welch ANOVA (an analysis of variance
that compares the means of more than two groups
without assuming equal variances) was conducted to
evaluate whether the differences observed in the
average concordance rates among the tools are
statistically significant. The results of this test
are summarized in Figure 2.
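For readers who want to see the arithmetic behind a Welch ANOVA, the sketch below computes Welch's F statistic and its degrees of freedom in plain Python. It is a minimal illustration under stated assumptions, not a reproduction of the Minitab analysis: the function name `welch_anova` and the scores are hypothetical placeholders, not the study's data.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    `groups` is a list of samples; each needs at least two values
    and a nonzero sample variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]

    # Each group is weighted by n / s^2, so noisier groups count less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    # Weighted between-group variation.
    between = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)

    # Correction term that accounts for unequal variances and sample sizes.
    a = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )

    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * a)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * a)
    return f_stat, df1, df2


# Hypothetical placeholder scores (NOT the study's data), one list per tool.
scores = {
    "tool_a": [0.60, 0.75, 0.90, 0.55],
    "tool_b": [0.65, 0.80, 0.95, 0.50],
    "tool_c": [0.70, 0.85, 0.60, 0.95],
    "tool_d": [0.78, 0.80, 0.82, 0.79],
}

f_stat, df1, df2 = welch_anova(list(scores.values()))
print(f"F = {f_stat:.3f}, df1 = {df1}, df2 = {df2:.2f}")
```

The p-value is the upper tail of the F distribution with (df1, df2) degrees of freedom; with SciPy installed it can be obtained as `scipy.stats.f.sf(f_stat, df1, df2)`. With only two groups, the statistic reduces to the square of Welch's t statistic.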
The p-value of 0.533 indicates that there are no
statistically significant differences between the
means of the evaluated tools regarding their overall
performance. This suggests that the variations