I'll take your word on that one - as you know way more about this stuff than I do. But I still don't know how it wouldn't raise the averages and why that doesn't matter - which is probably why I'm in sports marketing and not an accountant. I do maths bad.
I'm not saying it doesn't matter. I'm saying neither of you are incorrect. I took a solid hour out of work to write this up, so please digest
Their 4 is like a 1, you're right. If they're working on a scale from 4 to 10 because they can't award the band a 1-3, it's a somewhat arbitrary scaling. Somewhat. View it like a vector...
You(Real): <1, 2, 3, 4, 5, 6, 7, 8, 9, 10>
Them: <4, ..., 10>
What's the difference?
[((Their Max - Their Min)*(x - Real Min) / (Real Max - Real Min)) + Their Min]*<You>
[((10 - 4)*(x - 1) / (10 - 1)) + 4]*<You>
[6(x-1)/9 + 4]*<You>
[0.666667(x-1) + 4]*<You>
or...
Them: <4, 4.666, 5.333, 6, 6.666, 7.333, 8, 8.666, 9.333, 10>
The difference between the two of you? A scaled and shifted range of (2/3)*(x-1) + 4.
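If you'd rather see that mapping in code, here's a minimal sketch. The function name and the idea of printing the whole grid are mine, but the formula is exactly the one above:

```python
def rescale(x, real_min=1, real_max=10, their_min=4, their_max=10):
    """Map a score x from the full 1-10 scale onto the compressed 4-10 scale."""
    return (their_max - their_min) * (x - real_min) / (real_max - real_min) + their_min

# Rescale every whole-number score on the real scale.
print([round(rescale(x), 3) for x in range(1, 11)])
# → [4.0, 4.667, 5.333, 6.0, 6.667, 7.333, 8.0, 8.667, 9.333, 10.0]
```

Which is the same vector written out above, just computed rather than typed.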
What does it mean? Well, the scaling has a minor effect. They've reduced their granularity. So assuming they have the same distribution as you do (like, they award as many 10s as you do, they award as many 4s as you award 1s, and everything in between), their step size is 0.666 per grade, whereas yours is 1. So, they're going to exhibit less deviation than you do. Think of their distribution as being "scrunched" up.

If someone else said "well geez, I can't give U2 anything less than an 8," the maximum their data could deviate would be 1... 9 +/- 1. Does that matter when pooling you guys? No, not really. Not as long as you're consistent along your entire data set and so are they. Their contribution to the deviation of each data set might not be as much as it should be if I normalized their results before pooling (which I can do). But the relative deviation from one song to the next will remain the same.
And that's the important point here, if that remains the same, the order of the rankings doesn't change. It just means that the actual estimation for song value is a mild percentage off of what it should be.
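Here's a quick check of that claim with some made-up song scores (these aren't anyone's actual ratings). An affine rescale shrinks the spread by the slope (2/3) but leaves the ranking order untouched:

```python
import statistics

real = [3, 9, 5, 7, 10, 4]                 # hypothetical 1-10 scores for six songs
scaled = [(2 / 3) * (x - 1) + 4 for x in real]

def ranks(scores):
    # Song indices ordered from highest score to lowest.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print(ranks(real) == ranks(scaled))        # → True: the order is identical

# The spread shrinks by exactly the slope of the rescaling line.
print(round(statistics.stdev(scaled) / statistics.stdev(real), 3))  # → 0.667
```

So the numbers move, but the ordering (and the *relative* deviation) doesn't.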
And that's actually what brings up the second half of where I think neither of you are incorrect. The relative standing from one song to the next is fixed, regardless of what scale anyone is using. Boots placed where it did because, relatively speaking, that's where people put it (on average). It likely *did* do better than expected, to Luzita's point. 36 places above last place by the original ranking's merit, *with* a considerable deviation, indicating people probably disagreed. If you look at the histogram, you see two peaks. One around 4 and one around 6. Something like that is bimodal. The average is 5, but nobody felt it was worth a 5. People either thought it was worth more than that or less than that.
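A toy illustration of that bimodal point, with invented scores (not the actual survey data): two clusters around 4 and 6 average out to a 5 that nobody actually gave.

```python
from statistics import mean

# Hypothetical ratings: one cluster near 4, another near 6.
scores = [4, 4, 4, 3, 4, 6, 6, 6, 7, 6]

print(mean(scores))       # → 5: the average lands between the peaks
print(scores.count(5))    # → 0: yet not a single person scored it a 5
```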
NOW, to the point about shifting. Someone who chooses to rescale their scoring options *does* shift the scoring up. They did so by that whole (2/3)*(x-1) + 4. They have actually *skewed* the distribution somewhat. Why isn't that a problem? Well, it would be a problem if we told them they had to be normally distributed, because your distributions wouldn't be compatible. If you said x amount have to be a 5, y amount have to be a 10 or 1, I would have to rescale their numbers before adding your distributions together. Otherwise, the overall distribution would start to pull right. Why do we accept that right now though?
It's a result! If someone was given a loose definition to say 1 is the absolute worst music and 10 is the very best and they can't help but say everything is a 7/8/9/10 because they love U2 so much, there's an inherent bias of the participant. That should be expected... our sample is a sample filled with diehard U2 fans. Perhaps the problem statement should never have said "best of U2's music" and "worst of U2's music," as that seems to suggest there should be a normal distribution (some should be best, some should be worst, some should be average). But that's also why I explicitly said you're free to interpret it as you wish.
So why isn't this shifting important? I think the thing is, you're using your definition of 1-10 and applying it to the global definition, which is sort of undefined. A 90 isn't an A, 80 isn't a B, etc. etc., just higher = better, lower = worse. Bottom line, I could rescale the results such that average = 5 on a scale of 1-10, no problem. I could rescale it from 0-10, too. Or from 1-100. I could've left it as total points received, instead of averaging it. The number it has is mathematically arbitrary. The only thing that matters is the relative number a song has versus another song, and that you're all consistent with your own approaches.
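One last sketch of that "the scale is arbitrary" point. With some made-up per-song averages (again, not the real results), re-expressing them on 0-10 or 1-100 via a min-max rescale leaves the ordering completely alone:

```python
def minmax(scores, lo, hi):
    # Linearly re-express scores so the lowest maps to lo and the highest to hi.
    a, b = min(scores), max(scores)
    return [(hi - lo) * (s - a) / (b - a) + lo for s in scores]

avgs = [5.2, 7.9, 4.4, 6.1, 8.8]   # hypothetical per-song averages
order = sorted(range(len(avgs)), key=lambda i: avgs[i])

for lo, hi in [(0, 10), (1, 100)]:
    rescaled = minmax(avgs, lo, hi)
    print(sorted(range(len(avgs)), key=lambda i: rescaled[i]) == order)  # → True
```

Different numbers on the page, same ranking every time.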