> The problem, as has become clear in the last week through multiple blog posts discussing the phenomenon, is that Dunning-Kruger isn’t real.
Can you link to the blog posts? Now intrigued.
> People who know the least about a topic are not the most overconfident.
But this is exactly what the linked Gignac and Zajenkowski (2020) paper shows! It shows that the people who are most confident about a topic *are* the most overconfident, and makes a weak argument that overconfidence does not vary with knowledge (the argument is weak because it treats absence of a statistically significant effect as evidence for the null).
i.e. it matters a lot whether you take Dunning-Kruger to be a statement about first moments or about second moments.
> It turns out that the effect can be reliably generated with a model that actually doesn’t assume that people with less skill are more overconfident.
This is a more compelling argument about first moments, but still doesn't close the logical gap. Now we have two models that produce data like those in the post: one where the Dunning-Kruger effect exists, and one where it does not. The data is compatible with both of these models. Unless we find one of the models substantially better than the other we have no grounds to use one model exclusively for inference!
Personally I do find the measurement error model more appealing than OLS, but there is a lot more argument needed to arrive at a total dismissal of Dunning-Kruger here!
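To spell out what I mean by "two models", here is one way the two candidate data-generating processes could be written down (my own notation, nothing taken from the post or the paper):

```latex
% Model A ("Dunning-Kruger is real"): overconfidence shrinks with skill
\mathrm{SAIQ}_i = \mathrm{IQ}_i + \underbrace{(a - b\,\mathrm{IQ}_i)}_{\text{overconfidence},\; b > 0} + \varepsilon_i

% Model B (measurement error + better-than-average): no skill-dependent bias
\mathrm{SAIQ}_i = T_i + c + \varepsilon_i, \qquad \mathrm{IQ}_i = T_i + u_i, \qquad c \ge 0
```

Here T_i is latent skill, c is a constant "better than average" bump, and the noise terms are independent of everything else. Both of these can produce the familiar quartile figure, which is the sense in which the data alone don't adjudicate between them.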
Thanks for reading! I'll respond to your points in order:
(1) The blog post I first wrote on it is linked above. Here's the link again: https://www.psychologytoday.com/us/blog/how-do-you-know/202012/dunning-kruger-isnt-real
Another one I've been pointed to recently is here: https://dlm-econometrics.blogspot.com/2020/12/nonclassical-classics.html
(2) I am not sure that I understand your argument here. In the Gignac & Zajenkowski abstract they write: "the association between objectively measured intelligence and self-assessed intelligence was found to be essentially entirely linear, again, contrary to the Dunning-Kruger hypothesis." That is, when you don't do the quantile scoring from the original DK paper, then you don't get an overconfidence effect. The raw data they used show that there's basically just a moderate linear correlation between self-perceived IQ and tested IQ (check out their Figure 3C).
They also show (in their Figure 1 and accompanying text) that the classic DK figure can be generated by the "better than average" effect. They then say that one theoretical argument for DK is that people who don't do well on a task also don't know how to assess themselves accurately. Translating that into more specific mathematical terms, they argue that the theoretical account of DK implies you should see larger errors in self-assessment among the lower scorers, as well as a nonlinear relationship between self-reported and tested scores. Neither is seen in the data. So overall, they do a pretty good job demonstrating that the DK effect is not due to low performers lacking the knowledge to evaluate themselves accurately. But I'm happy to hear more specifics on what you didn't find convincing.
(3) I don't think we have a compelling DK model right now (at least not in the sources I cited). When you plot the raw data, it doesn't look like there is a DK effect. When you think through the theoretical arguments put forward for DK to examine their other implications, those implications aren't found in the data. So from what I've read, I actually don't think the theoretical DK argument can explain the observed DK effect, while the measurement error + better-than-average model can. Again, happy to look at other sources or walk through other arguments, but my impression is that this does actually "kill" DK. Other modeling might revive it, but right now the field-wide default should be that DK graphs are caused by measurement error + better-than-average effects.
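To make the mechanism in (2) and (3) concrete, here is a rough simulation sketch (toy numbers of my own, not the data from the paper): both scores are noisy reads on the same underlying ability, everyone gets the same constant "better than average" bump in their self-placement, and the classic quartile figure still comes out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

def to_percentile(x):
    """Convert raw scores to percentile ranks (0-100)."""
    return 100 * (np.argsort(np.argsort(x)) + 0.5) / len(x)

skill = rng.normal(0, 1, n)                                # latent ability
test_pct = to_percentile(skill + rng.normal(0, 0.7, n))    # objective test percentile (ability + noise)

# Perceived percentile: a noisy read of one's own standing, plus the same
# constant "better than average" bump for everyone -- nobody is modeled as
# more overconfident because they are less skilled.
perceived_pct = np.clip(to_percentile(skill) + 10 + rng.normal(0, 25, n), 0, 100)

quartile = np.digitize(test_pct, [25, 50, 75])   # 0..3 = bottom..top test quartile
for q in range(4):
    m = quartile == q
    print(f"Q{q + 1}: actual {test_pct[m].mean():5.1f}, perceived {perceived_pct[m].mean():5.1f}")

# The bottom quartile "overestimates" a lot and the gap shrinks toward the top,
# purely from noise + regression to the mean + a uniform optimism bump.
```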
I think the key bit is a regression between self-perceived IQ and tested IQ. The interesting question is not whether the errors in that regression are heteroscedastic (which they test for and find no significant evidence of), but rather whether the errors differ systematically as a function of tested IQ.
In particular, I want to know whether overconfidence_i = SAIQ_i - IQ_i varies with IQ_i. And the paper does not present these residuals; it presents a heteroscedasticity test of abs(overconfidence_i). In theory we could look for a pattern in these residuals with a LOESS regression overconfidence_i ~ IQ_i, but that is not what we actually get in the paper either, which instead presents a LOESS regression SAIQ_i ~ IQ_i.
If you do a regression SAIQ_i ~ IQ_i rather than overconfidence_i ~ IQ_i, you need to test if the fit from the SAIQ_i ~ IQ_i regression equals the identity, not whether the regression itself is linear. For example, if you found a (significant) model
overconfidence_i = -0.3 * IQ_i + noise
this would still be evidence for Dunning-Kruger.
Sorry, that should be phrased in terms of the SAIQ_i ~ IQ_i regression. If you found a (significant) model
SAIQ_i = 0.7 * IQ_i + noise
that would still be evidence for Dunning-Kruger.
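In code, the check I have in mind looks something like this (on simulated data with a made-up 0.7 slope, since I don't have theirs):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
n = 1_000

# Simulated stand-in for the data: self-assessed IQ tracks tested IQ with a
# slope below 1, plus noise.
iq = rng.normal(100, 15, n)
saiq = 100 + 0.7 * (iq - 100) + rng.normal(0, 12, n)
overconf = saiq - iq                         # overconfidence_i = SAIQ_i - IQ_i

# 1) Does overconfidence vary systematically with tested IQ?
smoothed = lowess(overconf, iq, frac=0.5)    # columns: sorted IQ, fitted overconfidence
print("fitted overconfidence at lowest / highest IQ:",
      smoothed[0, 1], smoothed[-1, 1])

# 2) In the SAIQ ~ IQ regression, test slope == 1 (the identity), not linearity.
fit = sm.OLS(saiq, sm.add_constant(iq)).fit()
slope, slope_se = fit.params[1], fit.bse[1]
print(f"slope = {slope:.2f}, t-stat for H0: slope = 1 -> {(slope - 1) / slope_se:.1f}")
```

Under the 0.7-slope model, the LOESS of the residuals drifts from positive to negative across the IQ range and the slope-equals-one test rejects, which is the pattern I'd still call evidence for Dunning-Kruger even though the SAIQ_i ~ IQ_i fit is perfectly linear.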
Hmmm. I see what you're saying. If subjective IQ is always judged at 70% of tested IQ, then you would see a bias in the raw scores, where people who are lower in objective IQ will actually have more overall error.
Of course, my interpretation of that equation is that judging subjective IQ to always be 70% of tested IQ doesn't necessarily look like bias, because a slope below one is roughly what we'd expect in any correlation analysis where we think one variable is indexing another. That's something I'll need to think through.
Also, when I redo my simulations with zero bias, and both subjective and objective measures are just noisy measures of an underlying true IQ, then I get r ~ .5, which is not far off from the correlation implied by that model. If you also assume that the measurement error is a bit lower for tested IQ than for subjective IQ, then you get almost exactly the equation proposed above: SAIQ = 0.7 * IQ + noise.
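For concreteness, the simulation I have in mind is roughly the following sketch (the specific noise levels are my own choices, picked only to illustrate the point):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Zero bias: both scores are just noisy measures of the same underlying true IQ,
# with somewhat less measurement error in the objective test than in the self-assessment.
true_iq = rng.normal(100, 15, n)
tested = true_iq + rng.normal(0, 10, n)    # objective test
saiq = true_iq + rng.normal(0, 20, n)      # self-assessed IQ

r = np.corrcoef(tested, saiq)[0, 1]
slope, intercept = np.polyfit(tested, saiq, 1)
print(f"r = {r:.2f}, regression: SAIQ = {intercept:.0f} + {slope:.2f} * IQ")
# With these noise levels the analytic values are r = 225 / sqrt(325 * 625) ~ 0.50
# and slope = 225 / 325 ~ 0.69, i.e. roughly SAIQ = 0.7 * IQ + noise after centering.
```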
I'm going to give this a bit more thought, and may post about it again.
I blogged about this after doing some more math to calculate out the actual SAIQ ~ IQ regression coefficients.
https://www.alexpghayes.com/blog/dunning-kruger/
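The gist, for the model where SAIQ and IQ are both noisy measures of a latent true score T (this is my sketch of the standard attenuation algebra; see the post for the full treatment):

```latex
% SAIQ_i = T_i + e_{1i}, \quad IQ_i = T_i + e_{2i}, with T, e_1, e_2 mutually independent
\beta_{\mathrm{SAIQ}\sim\mathrm{IQ}}
  = \frac{\operatorname{Cov}(\mathrm{SAIQ}, \mathrm{IQ})}{\operatorname{Var}(\mathrm{IQ})}
  = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_{e_2}^2},
\qquad
r = \frac{\sigma_T^2}{\sqrt{(\sigma_T^2 + \sigma_{e_1}^2)(\sigma_T^2 + \sigma_{e_2}^2)}}
```

So the SAIQ-on-IQ slope falls below one whenever the test has any measurement error at all, with no skill-dependent overconfidence anywhere in the model; with sigma_T = 15, sigma_e2 = 10, and sigma_e1 = 20 you get a slope of about 0.69 and r of about 0.50, matching the simulation numbers above.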