Things can be anything: ideas, comments, images, videos, discussions, questions, answers, books, friends, relationships, movies, pizzas, Game of Thrones characters, or anything else imaginable. If there is a way to solicit data about the “quality” of the thing in question, then that thing can be ranked. For the purpose of this post, I will use the word “things” to mean anything that can be ranked.
Ranking things based on up/down votes is very important for a lot of businesses –
Reddit ranks comments, Yelp ranks local businesses, Facebook ranks stories (among many other things), Quora ranks answers, Stack Overflow ranks questions and answers, YouTube ranks videos, Amazon ranks everything, eBay ranks everything, IMDb ranks movies, Goodreads ranks books, and so on.
Ranking based on up/down votes or star ratings is kind of difficult to perfect –
There are a few challenges to selecting the best set of “things”:
- The comparison itself is fuzzy. For example, which is the better thing: ‘A’ with 10 up votes and 5 down votes, ‘B’ with 12 up votes and 6 down votes, ‘C’ with 100 up votes and 60 down votes, or ‘D’ with 1000 up votes and 700 down votes? The volume is different in each case. Ratios don’t work well because they ignore volume. Averages don’t work because outliers skew them. Higher-order polynomial expressions (e.g. up^2/total) might work a little better but are unpredictable when the volume of votes differs between things.
- Some things can be repeated or be very similar to one another.
- Not every thing gets the same number of votes.
- Some things are submitted earlier than others and therefore have a longer evaluation span.
- Not all things get equal attention as a result of page layout or other user interface choices.
- There is no way to determine which portion of your content producers (most of the time, your customers) the next big thing might come from. It might come from anyone and not just your star performers.
- Different things are evaluated differently. Movies may have 10-star ratings, a book may have only a text review as a rating, comments may have only an up vs. down rating, etc.
There may be even more challenges in rating that we are not aware of right now.
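To make the first challenge concrete, here is a small sketch that scores the four example things A, B, C and D with a plain ratio and with up^2/total. The function names are my own and are only for illustration.

```python
# Vote counts for the four example things: (up votes, down votes).
votes = {"A": (10, 5), "B": (12, 6), "C": (100, 60), "D": (1000, 700)}


def ratio(up, down):
    """Fraction of votes that are up votes; ignores volume entirely."""
    return up / (up + down)


def up_squared_over_total(up, down):
    """The up^2/total expression; grows with volume."""
    return up ** 2 / (up + down)


for name, (up, down) in votes.items():
    print(name, round(ratio(up, down), 3), round(up_squared_over_total(up, down), 1))
```

The ratio cannot tell A and B apart and ranks D last despite its volume, while up^2/total ranks purely by volume and puts D far ahead of everything else.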
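Wilson’s algorithm (described in the previous post) is one way to rate things. It takes the up and down votes received by a thing as its input and outputs a score. Writing u for the up votes, d for the down votes, n = u + d and p = u/n, the score in its standard form (the lower bound of the Wilson score interval) is –

$$\text{score} = \frac{p + \dfrac{C}{2n} - \sqrt{C\left(\dfrac{p(1-p)}{n} + \dfrac{C}{4n^2}\right)}}{1 + \dfrac{C}{n}}$$

where C is the damping coefficient, which is usually determined by the selected confidence level: C = z^2, where z is the standard normal quantile for that level. When 95% confidence is selected, z = 1.96 and C works out to 3.8416.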
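As a minimal sketch, the score can be computed as below. This assumes the standard lower bound of the Wilson score interval with C = z^2; the function name `wilson_score` and the choice to return 0 when there are no votes are my own assumptions.

```python
import math


def wilson_score(up, down, c=3.8416):
    """Lower bound of the Wilson score interval, with c = z**2."""
    n = up + down
    if n == 0:
        return 0.0  # assumption: a thing with no votes gets the lowest score
    p = up / n
    centre = p + c / (2 * n)
    margin = math.sqrt(c * (p * (1 - p) / n + c / (4 * n * n)))
    return (centre - margin) / (1 + c / n)


for votes in [(3, 0), (20, 15), (5, 7), (20, 50)]:
    print(votes, round(wilson_score(*votes), 4))
```

When there are no down votes the expression collapses to n / (n + C), which is why C behaves like a damping term: a thing needs many more than C unanimous up votes before its score gets close to 1.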
Selecting a C that fits all cases of observable data is difficult because there will always be cases that don’t “feel” correct. For example, [u,d]=[3,0] intuitively feels like it should rank lower than [20,15], yet with C=3.8416 it scores higher. To illustrate this, look at the attached image, which shows 4 scenarios with varying C values.
- C=3.8416 – For this C, an idea with [5,7] scores almost the same as one with [20,50], which doesn’t make “intuitive” sense. As the remaining cases show, other values of C produce similar oddities.
- C=20 – Here [10,0] is almost the same as [20,10], and [7,5] is almost the same as [20,50]. This doesn’t make intuitive sense.
- C=40 – Here [30,0] is almost the same as [50,15], and [15,7] is almost the same as [30,50]. This doesn’t make intuitive sense.
- C=80 – Here [12,0] is almost the same as [20,15], and [30,0] is almost the same as [50,30]. This doesn’t make intuitive sense.
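The pairs listed above can be checked numerically with a short sketch like the one below. It assumes the score is the standard Wilson lower bound with C = z^2; `wilson_score`, `SCENARIOS` and `compare` are names I made up for illustration. It only prints the two scores of each pair side by side, so you can judge for yourself how close “almost the same” is.

```python
import math


def wilson_score(up, down, c):
    """Lower bound of the Wilson score interval, with c = z**2."""
    n = up + down
    if n == 0:
        return 0.0  # assumption: no votes means the lowest score
    p = up / n
    margin = math.sqrt(c * (p * (1 - p) / n + c / (4 * n * n)))
    return (p + c / (2 * n) - margin) / (1 + c / n)


# Pairs of [up, down] votes that the scenarios above describe as "almost the same".
SCENARIOS = {
    3.8416: [((5, 7), (20, 50))],
    20: [((10, 0), (20, 10)), ((7, 5), (20, 50))],
    40: [((30, 0), (50, 15)), ((15, 7), (30, 50))],
    80: [((12, 0), (20, 15)), ((30, 0), (50, 30))],
}


def compare(scenarios):
    """Return one row (C, first pair, its score, second pair, its score) per pair."""
    rows = []
    for c, pairs in scenarios.items():
        for a, b in pairs:
            rows.append((c, a, wilson_score(*a, c), b, wilson_score(*b, c)))
    return rows


for c, a, score_a, b, score_b in compare(SCENARIOS):
    print(f"C={c:<7} {a} -> {score_a:.3f}   {b} -> {score_b:.3f}")
```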
I haven’t tried higher values, but I suspect the same will be true for them. One reason is that we “feel” that some rankings are right and others are not. For example, [3,0] should be ranked lower than [20,15]. Maybe we feel [8,2] should be ranked equally with [20,15]. If we can somehow conclusively say that [u1,d1] should be equivalent to [u2,d2], or [u3,d3], …, [un,dn], then we may be able to find a value for C. C can be computed from Wilson’s equation by setting f1(u1, d1, C) = f2(u2, d2, C), where f1 and f2 are the Wilson score functions for the two things. Solving such equations for C gives a value of C that we “feel” is correct. The hard part is coming up with u1, d1, u2 and d2 that feel correct, given that no C can ever fit all possible scenarios.
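As a rough sketch of that idea, the equation f(u1, d1, C) = f(u2, d2, C) can be solved numerically by bisection on the difference between the two scores. This assumes the standard Wilson lower bound with C = z^2; `solve_c`, the search range and the tolerance are all hypothetical choices of mine, and the method only finds a crossing if the two scores actually change order somewhere inside the range.

```python
import math


def wilson_score(up, down, c):
    """Lower bound of the Wilson score interval, with c = z**2."""
    n = up + down
    if n == 0:
        return 0.0  # assumption: no votes means the lowest score
    p = up / n
    margin = math.sqrt(c * (p * (1 - p) / n + c / (4 * n * n)))
    return (p + c / (2 * n) - margin) / (1 + c / n)


def solve_c(a, b, lo=1e-6, hi=1000.0, tol=1e-9):
    """Find a C in [lo, hi] where vote pairs a and b get the same score."""

    def gap(c):
        return wilson_score(*a, c) - wilson_score(*b, c)

    g_lo, g_hi = gap(lo), gap(hi)
    if g_lo * g_hi > 0:
        raise ValueError("no sign change in [lo, hi]; the scores may never cross")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = gap(mid)
        if g_lo * g_mid <= 0:
            hi = mid  # the crossing lies in the lower half
        else:
            lo, g_lo = mid, g_mid  # the crossing lies in the upper half
    return (lo + hi) / 2


# The "[8,2] should be ranked equally with [20,15]" feeling from the text.
c = solve_c((8, 2), (20, 15))
print(c, wilson_score(8, 2, c), wilson_score(20, 15, c))
```

A C found this way satisfies only the one pair it was solved for; a second pair that “feels” equal will generally give a different C, which is the difficulty described above.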