November 20th, 2007
While on the Netflix website, we couldn't help but notice that so many of the movies had 4 ("really liked it") or 5 ("loved it") stars. While we were not expecting anything as tidy as a textbook normal distribution, we have to wonder: why such consistently positive reviews? Are movies on Netflix *that* good? Are Netflix users wearing rose-colored 3D glasses? Are users ashamed to admit they rented a bad movie? We think the answer is all about semantics.
Rating Scales: A Primer
If you have ever answered a questionnaire, there is a high likelihood you have encountered a Likert scale. Named after Rensis Likert, this psychometric response scale measures a respondent's subjective level of agreement with a statement. Scales can consist of 4, 5, 7, 9 or any number of points. Another way to think of it is as a graphical scale. After a question, draw a horizontal line. Now, place anchors on each end: terrible and great. Now, let a respondent rate the question "How was dinner?" by drawing a dot anywhere on the line that best represents their opinion. Finally, break the line into 5 equal sections. Whichever section the dot falls in is the corresponding point on a 5-point Likert scale.
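To make the dot-on-a-line idea concrete, here is a minimal sketch in Python. The function name `likert_point` and the choice to measure the dot as a fraction from 0.0 (the "terrible" end) to 1.0 (the "great" end) are our own assumptions for illustration, not part of any standard.

```python
def likert_point(position: float, points: int = 5) -> int:
    """Map a dot on a line to a point on a Likert-style scale.

    position: where the dot sits, as a fraction of the line's length
              (0.0 = the "terrible" anchor, 1.0 = the "great" anchor).
    points:   how many equal sections the line is broken into.
    """
    if points < 2:
        raise ValueError("a scale needs at least 2 points")
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    # Each section is 1/points wide. A dot exactly on the right end
    # would otherwise spill into a nonexistent extra section, so cap it.
    section = min(int(position * points), points - 1)
    return section + 1  # scale values start at 1, not 0


# A dot a little right of center, and one near the "great" anchor.
print(likert_point(0.5))
print(likert_point(0.85))
```

A dot in the middle of the line lands on the central point, and only dots in the outer fifths of the line reach the anchors.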
What is wrong with the Likert scale?
The problem is that Likert scales are subject to several types of distortion.
The Netflix Rating Scale
So what are the specific problems with the Netflix rating scale?
For one, anchoring. While a rating scale like this works as a semantic differential, "hate/love" are strong words indeed. Why must it be so black and white? We would recommend something more along the lines of "Really Liked" and "Really Disliked."
Then there is labeling. The 2 and 4 points are labeled "Didn't Like It" and "Really Liked It". Do these sound like evenly weighted labels? They could have called the 2-point label "Really Didn't Like It" to correspond to the 4-point, or, better yet, changed "Really Liked It" simply to "Liked It". However, that label was reserved for the neutral point. That brings us to the third and final problem:
The central point is labeled "I liked it." Typically, odd scales have a neutrally labeled central point. Even scales have no central point and thus force a choice. However, Netflix's central point indicates a positive preference. In essence, the scale is rigged for a positive response. We recommend they strike the 3-point altogether, or add another negative point for a forced-choice 6-point scale.
What Netflix does right
While some statisticians prefer more points on their scales, Netflix plays it smart and limits its scale to five. Limiting the number of points increases the reliability of the measurement. That is, if you were to ask someone the same question again, they are more likely to give the same value on a five-point scale than on one with more points. Imagine trying to give the same value on a 100-point scale several weeks later.
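The reliability argument can be illustrated with a small simulation. This is only a sketch under assumptions we made up for the purpose: each respondent has a fixed underlying opinion between 0 and 1, and every time they answer, their answer wobbles by a little random noise (the `noise` parameter) before being rounded to the scale. The numbers are not Netflix data.

```python
import random


def to_point(value: float, points: int) -> int:
    """Round an opinion in [0, 1] to a 1..points scale value."""
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * points), points - 1) + 1


def retest_agreement(points: int, trials: int = 10_000,
                     noise: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated respondents who give the same answer twice.

    noise is the assumed standard deviation of the wobble in a
    respondent's answer between the first and second asking.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    same = 0
    for _ in range(trials):
        opinion = rng.random()  # the respondent's underlying opinion
        first = to_point(opinion + rng.gauss(0, noise), points)
        second = to_point(opinion + rng.gauss(0, noise), points)
        if first == second:
            same += 1
    return same / trials


for n in (2, 5, 10, 100):
    print(f"{n:>3}-point scale: {retest_agreement(n):.0%} gave the same answer twice")
```

Under these assumptions, the share of repeated answers drops as points are added, because the same small wobble crosses more section boundaries on a finer scale.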
Also, limiting the number of points gives each value more weight. The difference between an 87 and an 88 on a 100-point scale is slight, while moving from 4 to 5 on a five-point scale is a jump of 20% of the scale. However, too few points can also create a loaded question. A 2-point scale (a.k.a. the yes/no format) forces a response that may not best represent the user's opinion.