Thread:Eppidiah/@comment-5116140-20121201171007

Hi Eppi, jumping over here since that thread was getting quite long.

All your points are valid and well thought out.

1. The fusion data analysis does indeed require adjusting for the high-value reporting bias you mention. This problem is actually somewhat manageable, since the "most coveted dino" status usually only exists for a few weeks after introduction. Just like pollsters at election time, we can actually measure the bias (take a two-week snapshot from the archives) and apply filtering. Much of that kind of bias gets smeared out naturally over time. Also, even though some results get reported more than others, we only want relative probabilities, so some of that naturally cancels out. Suppose one cell gets twice as many reports due to "being exciting".

Well, 2n/2N = n/N [problem cancels out!]
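
To make the cancellation concrete, here is a minimal Python sketch. It assumes n is the number of reports of one outcome in a cell and N is the total number of reports for that cell; the outcome names and counts are made up for illustration, not real fusion data.

```python
def relative_frequencies(counts):
    """Return each outcome's share of the total reports in one cell."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


# Hypothetical report counts for a single cell (illustrative numbers only).
baseline = {"common": 30, "rare": 10}

# The same cell if every outcome in it were reported twice as often.
inflated = {outcome: 2 * n for outcome, n in baseline.items()}

print(relative_frequencies(baseline))
print(relative_frequencies(inflated))
```

Both calls give the same shares, because the factor of 2 appears in the numerator and the denominator. Note that this only holds when the extra attention scales every outcome in the cell equally; if only one outcome is over-reported, the cancellation is partial at best.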

2. You are also correct that the very best data is in fact the pure multifusion data (since such data has no bias of any kind), but I am hoping to crack the code using only the reported data, even if a significant part of the reported data is unacknowledged targeted fusion.

This way, every wikicon is free to contribute. The alternative seems to be to propose a rule that says: if you are using targeted fusion, don't report at all! But that also has its drawbacks!

Not only could it really dampen enthusiasm (not good), it also necessarily forces the same acknowledgment that the technique exists.

Will definitely think some more about this.  