explain?
Survivorship bias is the idea that there might be an unknown filter screening the data before you even get to see it. In the case of the plane, that’s referring to a story from WW2, where planes returning from combat were surveyed to record where they had been shot. Famously, the recommendation was to thicken the armor in the places where the planes weren’t hit, because the “unknown filter” here is that if a plane was shot down, you would never get to record where the bullets hit it. Hence, the most important areas of the plane are actually the places that weren’t shot on the surviving planes.
In the case of the graph, it was compiled by looking through a lot of papers and recording how significant each result is, essentially a measure of how “interesting” the data is. Here, the unknown filter is that if a result isn’t interesting, it doesn’t get published. As a result, there’s a gap right in the middle of the graph, which is where the data is least interesting. In recent times, there’s been a philosophical argument that even uninteresting data should be published, so that at the very least it would prevent multiple people wasting time attempting the same thing, each unaware that it’s already been done before. Hence why people compiled the graph in the first place.
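If it helps, here’s a rough simulation of that filter at work. Everything in it is made up (the hypothetical publication cutoff of |z| ≥ 1.96, the mix of effect sizes); it’s just a sketch of how keeping only “significant” results carves a gap out of the middle of the distribution:

```python
# Minimal sketch of the "unknown filter", assuming a hypothetical publication
# rule of |z| >= 1.96 (the usual p < 0.05 threshold). Numbers are illustrative,
# not taken from the actual compiled graph.
import numpy as np

rng = np.random.default_rng(0)

# Simulate z-values from many studies: mostly null effects, some real ones.
true_effects = rng.choice([0.0, 2.5], size=10_000, p=[0.7, 0.3])
observed_z = rng.normal(loc=true_effects, scale=1.0)

# The filter: only "significant" results get written up and published.
published_z = observed_z[np.abs(observed_z) >= 1.96]

# Before the filter, plenty of studies sit near z = 0; after it, none do.
print(f"fraction of all studies with |z| < 1.96: {np.mean(np.abs(observed_z) < 1.96):.2f}")
print(f"fraction of published with |z| < 1.96:   {np.mean(np.abs(published_z) < 1.96):.2f}")  # 0.00 -> the gap
```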
I can see why journals would not want to publish boring papers in the days of paper magazines and limited space, but why would they not be published digitally nowadays? Are they limited by the number of people able to review them?
No reason, I suppose. In my opinion it seems to just be a holdover from the previous systems of publishing. The prestige of a journal is ranked by how often its papers get cited (in other words, how influential the papers in the journal are). Publishing insignificant/uninteresting data would lower a journal’s average citation count, which would make it seem less prestigious than other journals. Hence journals are incentivized to only publish interesting data. It’s a shitty system that everyone knows is shitty, but nobody has a good solution for how to fix it.
Z-values are a measure of how many standard deviations something is from the mean. About 95% of values fall between -2 and +2. Most “interesting” cases are about outliers, something that’s very uncommon. If it’s common, you don’t tend to write a paper about it. Nobody cares if someone had a slightly above-average tumor, but if they have a 50 kg tumor, that’s publishable.
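For concreteness, here’s a tiny example of computing z-values with invented numbers (the tumor masses are made up just to echo the example above):

```python
# Quick illustration of z-values: how many standard deviations from the mean.
import numpy as np

rng = np.random.default_rng(1)
masses = rng.normal(loc=0.5, scale=0.2, size=100_000)  # hypothetical tumor masses (kg)

mean, std = masses.mean(), masses.std()
z = (masses - mean) / std  # z-value for each measurement

print(f"within 2 standard deviations: {np.mean(np.abs(z) < 2):.1%}")  # ~95%, unremarkable
print(f"z for a 50 kg tumor: {(50 - mean) / std:.0f}")  # a huge outlier -> publishable
```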
The survivorship bias plane shows a World War 2 chart of where the bullet holes were on returning planes. The conclusion famously isn’t to armor the parts that were often hit, but to armor the parts that weren’t hit, because no planes hit there returned to be recorded.
Oh, I thought it was about p-hacking