Survey Report - Summary


Overview
NewsTrust and Michigan State University conducted an online survey in December 2005. The purpose of this survey was to learn how people rate news stories, and to develop reliable online review tools for NewsTrust's news rating service.

Be sure to check the independent research paper on this study, co-authored by Cliff Lampe of Michigan State University and Kelly Garrett of University of California at Irvine. Also check our test results from previous studies. For more information about this survey,

Here are our key findings from this survey:

  • most respondents accurately rated the quality of news stories using NewsTrust's review tools.
  • review forms with multiple ratings offer greater accuracy and satisfaction than single-rating forms.
  • longer review tools tend to generate more critical ratings than shorter review tools.
  • respondents with limited experience rated as reliably as more experienced reviewers.
  • more respondents participated in the shorter reviews than in the longer ones.
  • respondents were more satisfied with longer review tools than shorter ones.
  • most respondents said they were interested in the NewsTrust service.

Survey Design
The purpose of this survey was to answer these research questions:

  • What is the optimal review process for rating news stories?
  • How can we streamline our review process to reduce the burden on the reviewers, without sacrificing review quality and accuracy?
For testing purposes, we created four different versions of our news review tool:
  • Full Review Tool (13 detailed and generic questions)
  • Detailed Review Tool (8 detailed questions)
  • Short Review Tool (6 generic questions)
  • Mini Review Tool (1 generic question)
The review tools included different sets of rating questions:

NewsTrust Review Tools: Rating Questions (scale of 1-5)

Quality Type        Rating Question
Information         How much new information did you get from this story?
Evidence            How well does it support its points with factual evidence?
Transparency        How well does this story identify its sources?
Diversity           How well does the story seek out diverse sources?
Credibility         How credible are this story's sources?
Fairness            How fair is this story?
Balance             How well does this story represent all important viewpoints?
Facts vs. Opinions  How well does this story seek out facts, rather than opinions?
Accuracy            How accurate is this story?
Clarity             How clear is this story?
Originality         How original is this story?
Context             How well does this story help you see the "big picture?"
Overall Quality     How do you rate the overall quality of this story?

(The original table also has Full Review, Detailed Review, Short Review and Mini Review columns marking which questions each tool includes; those marks are not reproduced here.)

Respondents were asked to answer each of these questions with a rating on a scale of 1 to 5 (1 = low quality; 5 = high quality).

We tested each of our four review tools with two types of news: a news report and a blog post. For each story type, we selected an original story (the "high-quality" version), then carefully edited it to produce a second, degraded version ("low-quality"). Each review tool was then tested with both versions of each story.

We originally planned to select "high-quality stories" that we thought would score between 3 and 4 (on a scale of 1 to 5), and to degrade them into "low-quality stories" that would score about a point lower, between 2 and 3. The actual stories we ended up presenting to survey respondents were rated by our editors a bit lower than planned (e.g., original news report = 2.7 average; degraded news report = 2.2 average), but with a consistent 0.5 rating difference between "high" and "low" quality. The above links show the stories as they were presented to reviewers (without publication names or bylines, so the stories would be judged on their own merits). Links to all 16 survey forms are provided here.

Survey Methodology
An email invitation to participate in our online survey was sent December 15th, 2005 to about 12,000 respondents in NewsTrust's mailing list. This online survey lasted one week.

A total of 1,011 people responded to this invitation and completed the survey (8% response rate). These respondents were originally recruited through previous surveys conducted from March to May 2005, when they had expressed interest in participating in future surveys about NewsTrust. Most of these respondents were political activists and/or avid news readers invited by civic groups MoveOn.org and MediaChannel.org (as a result, most respondents identify with a liberal political viewpoint).

Participants were assigned one of our four stories to review, using one of four review tools, selected at random. We did not tell participants about our story quality assumptions. The email invitation for respondents assigned to a Full Review informed them that the survey would take about 20 minutes to complete; other respondents were told the survey would take 15 minutes for the Detailed and Short Reviews and 10 minutes for the Mini Review. Transcripts of our survey invitations and review forms are provided here.
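
As an illustration of the design described above, the 16 survey forms and the random assignment can be sketched in Python. This is a minimal sketch, not the procedure actually used: the names FORMS and assign_forms are hypothetical, and it assumes each participant is assigned independently and uniformly at random, which the report does not specify.

```python
import itertools
import random

TOOLS = ["Full", "Detailed", "Short", "Mini"]
STORY_TYPES = ["news report", "blog post"]
VERSIONS = ["high-quality", "low-quality"]

# 4 tools x 2 story types x 2 versions = 16 survey forms.
FORMS = list(itertools.product(TOOLS, STORY_TYPES, VERSIONS))


def assign_forms(participant_ids, seed=0):
    """Give each participant one survey form chosen uniformly at random.

    Assumption: independent uniform draws; the report only says
    assignments were "selected at random".
    """
    rng = random.Random(seed)
    return {pid: rng.choice(FORMS) for pid in participant_ids}


# Hypothetical participant ids 1..10, for illustration only.
assignments = assign_forms(range(1, 11))
print(len(FORMS), "forms;", len(assignments), "participants assigned")
```

With independent draws the group sizes will vary somewhat; a design that needs equal-sized groups would shuffle the participant list and deal it out across the 16 forms instead.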


Survey Findings
Here are our key findings from this survey.

Quality Ratings
Our tests indicate that the proposed NewsTrust review tools allow citizen reviewers to accurately discriminate between high and low-quality content, even with relatively few reviewers. As shown in the first graph below, participants using NewsTrust review tools gave higher ratings to higher-quality stories (green bars) than lower-quality stories (yellow bars). The second graph shows that each of the four review tools was effective for determining news quality. This supports our hypothesis that citizen reviewers using our tools can effectively differentiate between good and bad journalism.

[Graph: Quality Ratings - Average]
[Graph: Quality Ratings by Review Tool]

A more detailed view of ratings distribution by review tool is shown in our distribution histograms for the high-quality news report.

The table below shows the numerical rating differences between the high and low-quality stories. It also shows that the rating differences associated with each tool were statistically significant. This suggests that these differences are unlikely to be due to chance alone.

Quality Ratings         Hi-Q Rating   Lo-Q Rating   Rating       Statistical Significance   Total
by News Type            (1-5)         (1-5)         Difference   (T-Test)                   Respondents

News Report
Full Review (13Qs)      3.0           2.6           0.5          p < 0.05                   81
Detailed Review (8Qs)   3.0           2.6           0.4          p < 0.05                   97
Short Review (6Qs)      3.3           3.0           0.3          p < 0.05                   126
Mini Review (1Q)        3.7           2.9           0.8          p < 0.001                  114
News Report Average     3.3           2.8           0.5          p < 0.001                  418 (total)

Blog Post
Full Review (13Qs)      2.7           2.4           0.3          p < 0.05                   133
Detailed Review (8Qs)   2.5           2.0           0.4          p < 0.01                   135
Short Review (6Qs)      3.1           2.6           0.5          p < 0.01                   146
Mini Review (1Q)        3.0           2.7           0.3          p < 0.05                   179
Blog Post Average       2.8           2.4           0.4          p < 0.001                  593 (total)

Note: The Statistical Significance of Difference (p-value) is based on a T-test and describes how likely it is that a difference at least as large as the one observed would arise by chance alone. For example, if the difference in scores has a p-value of less than 5% (p < 0.05), then a difference this large would occur by chance fewer than 5 times in 100 if the two story versions were in fact rated the same.
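
To make the note above concrete, here is a minimal Python sketch of a two-sample t statistic. The report does not say which T-test variant was used, so Welch's unequal-variance form is an assumption, the function name welch_t is hypothetical, and the ratings below are invented for illustration; they are not survey data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variance of each group divided by its group size.
    va = statistics.variance(sample_a) / len(sample_a)
    vb = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical 1-5 ratings, invented for illustration only.
hi_q = [4, 3, 4, 5, 3, 4, 3, 4]
lo_q = [3, 2, 3, 3, 2, 4, 2, 3]

t_stat, dof = welch_t(hi_q, lo_q)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The p-value is then read from the t distribution with the computed degrees of freedom, usually with a statistics package rather than by hand. The larger the absolute value of t, the smaller the p-value.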

Overall, these results indicate that respondents using the shorter review tools tend to rate stories higher than respondents using the longer review tools. In other words, longer review tools tend to invite more critical ratings than shorter ones. This could be because the longer review tools focus the reviewer's attention on more evaluation criteria than they typically consider in casual reading. Alternatively, it may be that the longer tools artificially depress ratings by encouraging reviewers to seek out flaws in the story.

More details about rating differences can be found in these T-Tests for the high-quality news report.

Ratings by Experience
Here are key differences in ratings between experienced and inexperienced reviewers:
[Graph: Ratings by Experience]
These results suggest that quality ratings from inexperienced reviewers are about as effective as ratings from experienced reviewers (e.g.: experienced ratings of hi-Q stories are only 0.2 higher than inexperienced ratings). The most notable difference is that ratings from experienced reviewers show a wider gap between high-quality and low-quality stories. This could suggest that experienced reviewers may be slightly more likely to have a strong opinion about story quality.

For the purpose of this study, experienced reviewers were selected based on these criteria: topic knowledge, journalistic experience, previous survey participation, news and Internet usage. Respondents with an average score of 3.5 or higher across those criteria (on a 1-5 scale) are considered experienced; all others are not. On average, 16 experienced reviewers and 47 inexperienced reviewers responded for each survey group (63 total per group).
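
The classification rule described above could be sketched as follows. This is a hedged illustration only: the criterion key names are hypothetical, treating the list as four equally weighted criteria is an assumption, and so is the reading that an average of exactly 3.5 counts as experienced.

```python
from statistics import mean

# Criteria named in the report; the key names themselves are hypothetical.
CRITERIA = (
    "topic_knowledge",
    "journalistic_experience",
    "previous_survey_participation",
    "news_and_internet_usage",
)
THRESHOLD = 3.5  # average on a 1-5 scale


def is_experienced(scores):
    """Return True if the unweighted average across all criteria reaches the threshold."""
    return mean(scores[name] for name in CRITERIA) >= THRESHOLD


# Two made-up respondents, for illustration only.
reviewer_a = {
    "topic_knowledge": 4,
    "journalistic_experience": 3,
    "previous_survey_participation": 4,
    "news_and_internet_usage": 4,
}
reviewer_b = {
    "topic_knowledge": 3,
    "journalistic_experience": 1,
    "previous_survey_participation": 2,
    "news_and_internet_usage": 4,
}

print(is_experienced(reviewer_a), is_experienced(reviewer_b))
```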

Participation by Review Tool
Here are response rates of participants who completed each news report survey, compared to the total number of email invitations sent for that group (772 on average):
[Graph: Participation by Review Tool]
On average, more respondents completed the shorter reviews than the longer ones. This could be because email invitations and survey pages gave different survey lengths (20 minutes for full reviews, 15 minutes for detailed or short, 10 minutes for mini).

It is also worth noting that many more participants completed surveys for the blog posting (59% of total responses) than for the news report (41% of total), even though the same number of invitations was sent out for both groups. That significant difference could be explained by the fact that the news report took almost three times as long to read as the blog posting (836 words for the news report vs. 308 words for the blog post). This suggests that story length may have a substantial impact on review participation (in this case, a 42% increase in participation for the shorter story).
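
The percentages in the paragraph above can be reproduced from the respondent totals reported in the ratings table (418 for the news report, 593 for the blog post) and the stated word counts. The short Python sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Completed surveys per story type, from the ratings table.
completed = {"news report": 418, "blog post": 593}
# Story lengths in words, as stated in the report.
word_count = {"news report": 836, "blog post": 308}

total = sum(completed.values())
shares = {story: n / total for story, n in completed.items()}

# Relative increase in completed surveys for the shorter story.
increase = completed["blog post"] / completed["news report"] - 1
# How many times longer the news report is than the blog post.
length_ratio = word_count["news report"] / word_count["blog post"]

for story, share in shares.items():
    print(f"{story}: {share:.0%} of {total} responses")
print(f"participation increase for the shorter story: {increase:.0%}")
print(f"news report length vs. blog post: {length_ratio:.1f}x")
```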

Overall, the number of respondents who started but did not complete their survey is quite low (about 10% of total surveys completed). The average response rate of 8.3% across all review tools is on the high side for a consumer survey.

Satisfaction by Review Tool
Here's how respondents perceived the effectiveness of our review tools, in response to this question: "How well did this review tool help you evaluate the quality of the story?"
[Graph: Satisfaction by Review Tool]
On average, respondents seemed more satisfied with full and detailed review tools than shorter ones. This could be because respondents prefer more in-depth questions, rather than generic questions. The comments section below offers more insights on this topic. Also note that the perceived effectiveness of the tools did not vary much between high-quality and low-quality stories.

Service Interest
Here is the quantitative feedback we collected about NewsTrust and our review tools:

Service Interest
How interested are you in this service?              Total   %
Very interested                                      235     23%
Interested                                           467     46%
Somewhat interested                                  157     16%
Not very interested                                  78      8%
Not interested at all                                29      3%
Average Interest: 3.8

Participation Interest
Would you like to participate in this project?       Total   %
I would like to check the pilot site.                687     68%
I would like to rate the news on the pilot site.     513     51%
Notify me when the public site launches.             426     42%
I would like to volunteer to develop this service.   84      8%
I would like to make a donation.                     9       1%
Please take me off your mailing list.                52      5%

Review Length
What about the length of this review?                Total   %
A bit short                                          148     15%
Just right                                           683     68%
A bit long                                           84      8%
Too long                                             13      1%
Not sure                                             50      5%

Tool Usage
How often would you use this tool?                   Total   %
Once a day or more                                   114     11%
Once a week                                          349     35%
Once a month                                         118     12%
Once a quarter                                       41      4%
Never                                                85      8%
Not sure                                             299     30%

Overall, most respondents said they were interested in the NewsTrust service. Seven out of ten wanted to check the pilot site. Nearly half of the respondents thought they would use this tool at least once a week. Two thirds felt the length of the review was just right.
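
The "Average Interest: 3.8" figure is consistent with a weighted mean of the Service Interest responses. The Python sketch below shows one way to arrive at it, assuming the answers are scored from 5 (very interested) down to 1 (not interested at all); the report does not state the scoring, so that mapping is an assumption.

```python
# (answer, assumed score, number of respondents) from the Service Interest table.
responses = [
    ("Very interested", 5, 235),
    ("Interested", 4, 467),
    ("Somewhat interested", 3, 157),
    ("Not very interested", 2, 78),
    ("Not interested at all", 1, 29),
]

# Respondents who answered this question.
answered = sum(count for _, _, count in responses)
# Mean score weighted by the number of respondents giving each answer.
average = sum(score * count for _, score, count in responses) / answered

print(f"{answered} answers, average interest {average:.1f}")
```

Note that the percentages printed in the table match shares of all 1,011 respondents (for example, 235 / 1,011 is about 23%), while this average is taken over the 966 who answered the question.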

Comments
About half of survey respondents wrote comments about the NewsTrust review tools. These comments are summarized here, with sample comments in these 4 categories:

  • Positive Comments
  • Negative Comments
  • Editorial Comments
  • Technical Comments
These comments were very helpful in developing the NewsTrust pilot site, and many requested improvements were included in the latest version of NewsTrust's review tools. We're very grateful for all the thoughtful recommendations from the NewsTrust community.

Demographics
Here are key demographics for our 1,011 survey respondents:

Gender                    Total   %
Female                    455     46%
Male                      535     54%

Age                       Total   %
17 or under               2       0%
18-24                     20      2%
25-34                     113     11%
35-49                     239     24%
50-64                     440     44%
65 or over                179     18%

Journalistic Experience   Total   %
More than 20 years        24      2%
10-20 years               29      3%
5-9 years                 36      4%
1-4 years                 80      8%
Less than 1 year          77      8%
None                      739     75%

Education                 Total   %
High school graduate      18      2%
Some college              163     16%
College graduate          289     29%
Post-graduate school      532     53%

Income                    Total   %
Less than $25k            83      11%
$25-49k                   191     24%
$50-74k                   195     25%
$75-99k                   126     16%
$100k or more             185     24%

Politics                  Total   %
Very conservative         3       0%
Conservative              21      2%
Moderate                  144     15%
Liberal                   392     41%
Very liberal              388     41%

Overall, respondents tend to be well educated and mature, with average incomes:
  • 81% are college graduates
  • 62% are 50 years old or older
  • 60% have a household income of under $75k
  • 75% have no journalistic experience
  • 82% have a liberal viewpoint
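
Several of the summary figures above can be checked against the demographic counts. The Python sketch below does this for age, income and politics. It assumes each percentage is taken over the respondents who answered that question (the category totals are below 1,011), which the report does not state explicitly; the helper name share is hypothetical.

```python
age = {"17 or under": 2, "18-24": 20, "25-34": 113,
       "35-49": 239, "50-64": 440, "65 or over": 179}
income = {"Less than $25k": 83, "$25-49k": 191, "$50-74k": 195,
          "$75-99k": 126, "$100k or more": 185}
politics = {"Very conservative": 3, "Conservative": 21, "Moderate": 144,
            "Liberal": 392, "Very liberal": 388}


def share(counts, keys):
    """Percentage of those who answered the question that fall in `keys`."""
    return 100 * sum(counts[k] for k in keys) / sum(counts.values())


aged_50_plus = share(age, ["50-64", "65 or over"])
under_75k = share(income, ["Less than $25k", "$25-49k", "$50-74k"])
liberal = share(politics, ["Liberal", "Very liberal"])

print(f"50 or older: {aged_50_plus:.0f}%")
print(f"income under $75k: {under_75k:.0f}%")
print(f"liberal viewpoint: {liberal:.0f}%")
```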

Conclusions
Here are our overall findings from this survey, based on the results above:

  • most respondents accurately rated the quality of news stories using NewsTrust's review tools.
  • review forms with multiple ratings offer greater accuracy and satisfaction than single-rating forms.
  • respondents with limited experience rated as reliably as more experienced reviewers.
  • more respondents participated in the shorter reviews than in the longer ones.
  • respondents were more satisfied with longer review tools than shorter ones.
  • most respondents said they were interested in the NewsTrust service.
The above results indicate that most respondents using NewsTrust's review tools differentiated effectively between good and bad journalism. From that standpoint, each of the four review tools was effective for determining news quality.

Statistically speaking, there aren't many differences in the accuracy of the different tools. But the differences that do exist suggest that the full and short review tools are a bit more accurate than the mini and detailed tools, when compared to our editors' expectations. On that basis, the short and full review tools show the least rating difference between experts and citizens, while the mini and detailed tools show the highest difference (accuracy statistics illustrating this finding will be added to the next version of this report). We also note that on average, people seem more satisfied with the longer review tools, while the shorter tools seem to perform better from a participation standpoint.

Overall, reviewers with limited experience appeared to rate the quality of news stories as effectively as experienced reviewers. At the same time, experienced reviewers gave slightly higher scores to the good stories and lower scores to the bad stories. Our current interpretation is that the effectiveness of each review tool may vary with the reviewer's experience. For example, reviewers with limited experience may perform better with fewer, generic questions (as provided in the short tool). Conversely, experienced reviewers may give better feedback with more in-depth questions (as provided in the full tool). Therefore, offering different review tools based on experience may increase the overall effectiveness of our review process. This leads us to conclude that offering the short review tool to new reviewers and the full tool to experienced reviewers may be the most effective way to boost their rating performance.

In conclusion, the current NewsTrust review process appears generally effective for rating news stories. Overall effectiveness could be improved by starting with short review forms for new reviewers, and over time introducing them to the full review form; full review forms appear most accurate, particularly for experienced reviewers. It appears that this multi-level approach could reduce the burden on reviewers without sacrificing review quality and accuracy.


Next Steps
NewsTrust is implementing many of the recommendations from this survey on its private pilot site, for testing in spring 2006. We plan to conduct future research based on the data from that pilot site. To participate in that pilot site,

To complement this summary, our full report can be downloaded here. Other reports are expected from our research partners. We look forward to discussing our findings with other researchers and interested parties. We are also happy to share our survey data with other researchers, upon request.

For more information about this project, please contact NewsTrust's Executive Director, .

Credits

Authors:
The following individuals conducted this research and the analysis of its results:

Advisors:
The following individuals contributed to the design of the NewsTrust review tools:

Other contributors to this project include Evan Derkacz and Ming Liu.

Original publication date: 03/03/06. Updated on 04/08/06.