Rating Differences T-Tests

The tables below show which differences between review tools are statistically significant for these news reports. An equal sign (=) indicates that the difference between groups is not statistically significant (based on a T-test). A less-than sign (<) indicates that the average rating for the group in the left column is significantly lower than the rating for the group in the top row; conversely, a greater-than sign (>) indicates that the average rating for the group in the left column is significantly higher than the rating for the group in the top row.
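
To make the notation concrete, here is a minimal sketch of how one cell of such a table could be derived. It is an illustration under stated assumptions, not the procedure used for this study: the source says only that a T-test was used, so the choice of Welch's unequal-variance version, the 0.05 significance threshold, the function names, and the sample ratings are all hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3
    return max(0.0, 1.0 - 2.0 * central_mass)


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom (assumed variant of the test)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def comparison_symbol(row_group, col_group, alpha=0.05):
    """Return '=', '<' or '>' for the left-column group versus the top-row group."""
    t, df = welch_t(row_group, col_group)
    if two_sided_p(t, df) >= alpha:  # alpha = 0.05 is an assumption
        return "="
    return "<" if t < 0 else ">"


# Hypothetical ratings on a 1-5 scale, not data from the study.
tool_a = [2, 2, 3, 2, 3, 2, 3, 2]
tool_b = [4, 5, 4, 5, 4, 4, 5, 4]
tool_c = [3, 4, 2, 5, 3, 4]
tool_d = [3, 4, 3, 5, 3, 4]

print(comparison_symbol(tool_a, tool_b))  # <
print(comparison_symbol(tool_b, tool_a))  # >
print(comparison_symbol(tool_c, tool_d))  # =
```

The sketch does not handle the degenerate case in which both groups have zero variance, and a statistics library would normally supply the p-value rather than a hand-written integration.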

Table 2. Rating Differences T-Test – Hi-Q News Report

In this table, the detailed and short review tools each elicit ratings that are statistically indistinguishable from those of the full review tool, while differing from each other. This is possible because, although a rating of 3.30 appears much higher than one of 3.03, the difference could be the product of chance. Most notably, we find that the short review tool generated higher ratings than the detailed review tool, and that mini-review ratings were significantly higher than the ratings for any other tool.

Mini-review tool ratings are the least conservative when assessing high-quality news stories. One interpretation of this result is that when people have only one question to answer, they tend to give a more generous rating. On the other hand, asking more detailed questions about quality leads to somewhat lower ratings. Perhaps detailed questions cause reviewers to think more carefully about their rating and give the story a more accurate score. Alternatively, it may be that more moderate scores for the additional questions are diluting exceptionally high or low overall ratings.

Table 3. Rating Differences T-Test – Lo-Q News Report

When looking at how people rated low-quality news stories, we find other interesting patterns. Even though the average rating for the mini-review tool (2.86) appears higher than that of the full review tool (2.56), the difference is not statistically significant. This could be a result of the relatively low number of reviewers using the tools or of large variations in ratings among reviewers within the two groups. Again, we find that the detailed review tool elicits a rating comparable to the full review tool. Finally, the short review tool evoked significantly higher ratings than either the full or detailed tools, and was quite similar to the mini-review.
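
The effect of group size and rating spread on significance can be illustrated with a short calculation. The sketch below uses the two averages reported above (2.86 and 2.56), but the standard deviations and group sizes are hypothetical, since the report does not state them, and the function name is invented for illustration. It computes the t statistic for a difference in means from summary figures; as a rough rule of thumb, an absolute value below about 2 is not significant at the conventional 0.05 level.

```python
import math


def t_from_summary(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """t statistic for a difference in means, computed from summary figures."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Means come from the text; standard deviations and group sizes are assumed.
few_reviewers = t_from_summary(2.86, 2.56, 1.0, 1.0, 20, 20)
many_reviewers = t_from_summary(2.86, 2.56, 1.0, 1.0, 200, 200)
wide_spread = t_from_summary(2.86, 2.56, 2.0, 2.0, 200, 200)

print(f"{few_reviewers:.2f}")   # 0.95: same gap, too few reviewers
print(f"{many_reviewers:.2f}")  # 3.00: same gap, ten times the reviewers
print(f"{wide_spread:.2f}")     # 1.50: many reviewers, but twice the spread
```

Under these assumed figures, the same 0.30 gap in average rating crosses the rough threshold only when the groups are large and the ratings within each group are tightly clustered.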