NLTimes – our final starting point for a lifelong journey.

The first step may be the hardest, but remember it takes many more steps to make a journey.

In the past two months, we have often talked about the role of the audience in modern journalism. It shouldn’t therefore come as a surprise that during our final fact checking project, we came to realize that it is up to the audience to decide if they want to use this medium or not. Our advice is to use it as a starting point, but to be aware that you need to look beyond the information presented to you there in order to get the full picture.

The Medium: NL Times 

Founded in 2013, NL Times is an English-language news medium based in the Netherlands. It focuses on Dutch news and writes short but snappy reports about different aspects of the country, including politics, business, sports, health, and weird news. With three international students in our group, it was the logical medium to choose because it is a good starting point for foreigners who want to know what’s going on in the country they currently live in.

The Articles:

  1. No sick days for over half of Dutch employees by Ingrid Grinstad, published on 2014-11-24.
  2. Dangerous levels of B6 found in multivitamins by Janene van Jaarsveldt, published on 2014-11-24.
  3. Psychological problems cost Dutch business 20 billion euros by Janene van Jaarsveldt, published on 2014-12-02.
  4. Over 19 pct. Drop in Auto sales by Ingrid Grinstad, published on 2014-12-03.
  5. New police reports filed with digid by Janene van Jaarsveldt, published on 2014-12-01.

“No sick days for over half of Dutch employees”

The fact that the article contained so many statistical claims immediately raised questions for Wernard about their validity. After cross-checking the sources, he found that two out of nine claims were unconfirmed. The editor was willing to change one of them but judged the other a semantic issue. Regarding the choice of title and the way NLTimes reporters write stories from large data reports, he said the following:

“The choice of the title is because in our opinion it is the most interesting piece of information that we could accurately write in a fairly short headline that fits the design of our website.

We choose what to write based on what we find most interesting, what we consider to be most relevant, and sometimes with a consideration for details that may play out long-term.”

Dangerous levels of B6 found in multivitamins

Gabriele found that the article provides fairly accurate information about vitamin B6. Most facts presented in the article are valid, although some could have been stated more precisely. Nonetheless, it is important to stress that small inaccuracies count. The opening sentence, which is quite strong and carries a negative connotation, can be considered false and misleading because it may send the wrong message about the vitamin industry. The headline and the text match well, but they raise the question of whether those dangerous levels were found in the Netherlands, in Europe, or somewhere else. Moreover, some key terms are vague: the journalist could have indicated which kinds of vitamin pills were examined and which of them contained excessive levels of B6. Had the journalist provided another source and point of view, the article would have had a more solid and trustworthy foundation. Interestingly enough, the journalist doesn’t mention that this article is a direct translation of the press release provided by the Consumentenbond. By not linking to the actual results of the research, the article gives the impression that the journalist did not find it important to disclose which multivitamins are “dangerous”. All in all, this article raises a deeper question about the trustworthiness of the source it uses. Even if the results mentioned in the article match the results of the research, that does not prove they are correct. Investigating this would go beyond the fact-checking process, as additional research would be needed. The press release also raises the question of what motivated the Consumentenbond to conduct this research and publish it. Researching the reliability of the Consumentenbond could be a great idea for a future investigation of the same topic.

Psychological problems cost Dutch business 20 billion euros

April found that the article retrieved only partial information from the original OECD report, which resulted in a biased story. The visualization and biased wording also contributed to the prejudged statement.

Not mentioning sources puts the journalists’ credibility at risk, especially when they involve a politician in their report. Providing a scenic photo of the event might be the most common way of dealing with the political aspect. Providing the link and referring to the source would have been an effective way to avoid this risk.

In short, this article gave basic but inaccurate information. Framing is used to represent the attitude of the journalist, which leaves April with the following question:

“How to demonstrate a standpoint that represents the comparative information in line with the original sources when a journalist writes a news story based on a specific study?”

Over 19 pct. Drop in Auto sales

Than found this article to be basically a summary of different reports from different organizations. The information was collected online and the statements were quoted from elsewhere. Two similar articles from other media were posted one day before.

The articles have a lot in common, especially the data. However, the author of the article being fact-checked misused an important number. There is a reasonable suspicion that she did not read the original report from AUMACON, but copied the wording from the second of those articles and misunderstood the meaning of the data. The author discusses some possible causes of the phenomenon and claims that her statements were quoted from trustworthy sources. However, the original sources of the quotes cannot be found, and the journalist’s statements are not sufficiently supported. Overall, the author did no original research, and the article does not offer a valid level of analysis of the issue. The validity of the statements was not proven, and the article contains many mistakes, for example in its use of the data. In summary, there is reason to suspect that this article misleads readers, both through the large, attention-grabbing number that was wrongly used at the beginning and through the unconfirmed claims the author made.

New police reports filed with digid

This short article left Wineke with more questions than sentences. What was the background of these bold claims about past and present problems and future improvements? With no links or sources given, she had to look for other articles on the same topic and found several government publications that answered her questions. One was a report about current problems published last summer by the Inspection of Safety and Justice, along with a promise by the minister to make improvements; the other was an official statement by the national police department. She suspected the latter had been used as the source, and that much information was lost in translation. An email from the editor confirmed and explained this:

In our opinion, we source the information to the police department. Perhaps this could have been clearer, but we are satisfied with the work. Further, some articles are long, some are short, and that is a decision made at an editorial level based on many factors.

It would be wise to also keep in mind what other news stories are going on that day when considering another story’s word count. (…) you can safely say that December 1, 2014, was an extremely busy news day. We marshalled our resources and placed more emphasis on the stories mentioned above.


Our analysis has led to some interesting findings, both positive and negative.

On the negative side:

  1. Source reliability was questionable or unclear at times.
    • Although it is an online medium, no direct links were given to any sources, which we found odd.
  2. Unvalidated information was presented as valid.
    • Articles contained statements or information that were incorrect or not accurate enough.
  3. Grammar and spelling errors were found.
    • Apart from making the site look bad, these could mislead readers.

On the positive side:

  1. NLTimes has been quick to accept corrections when errors were pointed out.
    • We have noted before how important it is for media to acknowledge their mistakes these days.
  2. NLTimes delivers news in a clear and fast manner.
    • Given that it aims to be a place where foreigners can quickly check the Dutch news, this is important.
    • However, this is likely the reason why some articles were too short or biased to convey the full story.

Our advice to readers of NLTimes is therefore that it is a good site to go to if you want to know what topics are currently discussed in the Dutch news media. However, in order to get the full story it is best to engage in conversation with the people around you.


The truth, the whole truth, and the occasional fiction

“What event in the news did you read last week that turned out to be incorrect?”

In an ideal world, answering this question would require significant thought, research, and fact checking. In reality, I asked this question over dinner and my sister-in-law provided me with an immediate answer: “I’d read Cesar Millan, the dog whisperer, had died but that wasn’t true.”

I’m not sure which of the following points scares me most:

  1. A casual news reader with no interest in journalism can immediately provide a recent example.
  2. My first reaction being “Oh, that’s just one of those silly Internet death hoaxes.”
  3. Even articles about the dangers of misleading data journalism can themselves be misleading.

The answer is that it probably depends on which cap I’m wearing, so I’ll address them all below.

Articles about the dangers of misleading journalism can themselves be misleading.

In its article ‘The Power of Data Journalism’, the Harvard Political Review writes the following:

“Many prominent media outlets such as the New York Times unintentionally misreport data predictions when they report to the general public. For example, this article falsely asserts that Nate Silver has ‘already decided the election.’”

The CNN opinion piece it refers to, however, actually poses a question:

“On Election Day in 1980, when news outlets reported early that it looked like Reagan was going to beat Carter, voter turnout in California dropped 2%. Now we’re reporting the results weeks, even months, before voters show up at the polls. Why get excited about voting? Nate Silver has already decided the election, right?”

As a student of journalistic data analysis, this point concerns me the most. However, it shouldn’t surprise anyone who has followed this course, since right from the start we were told that “the principles of verification are timeless and can be applied to any situation” (Steve Buttry). Evidently, this is also true for articles about data journalism itself.

Internet Death Hoaxes are commonplace

Exactly when did it become normal to consider fake death reports commonplace? They have been around for decades: in 1969, rumors started to surface about the death of Paul McCartney. But in the last few years they have become so common that they’re now an Internet meme. This troubles me as a human being. Death should not be treated as a joke. During the last year, my mom suffered from cancer, and every time a death came up in the news it reminded us of our impending loss. Journalists, in my opinion, should therefore be careful before publishing this type of news, not only for the sake of the fans but also for those who are struggling with loss themselves.

Readers are used to reading incorrect news all the time

It is said that infrequent events tend to be remembered better than everyday events. Unfortunately, both my sister-in-law and guest lecturer Carel van Wyk presented us with recent examples. This suggests that incorrect news appears far too often. Carel van Wyk gave us four areas in which news can be unreliable:

  1. Reliability of information
  2. Reliability of wording
  3. Reliability of sourcing
  4. Reliability of visualizations

And it wasn’t so hard to find examples of all of them.

Reliability of information

The Harvard Political Review article is a good example of this. By presenting a question in an opinion piece as an assertion, it misrepresented not only CNN’s point, but their own as well. It also didn’t help that they used CNN as an example straight after name-dropping the New York Times.

Reliability of wording

“The Best Holiday Shopping Partner: A Capuchin Monkey”

When I read this headline, I couldn’t resist clicking on the link, but I was disappointed. Instead of being about an organization that rents out monkeys for the holidays, the article reported on recent research showing that capuchin monkeys care less about the price of things than humans do. An interesting article for sure, but in my opinion it didn’t live up to the expectations its headline raised.

Reliability of Sources

One of the news sites that broke the false news of Cesar Millan’s death was Distrita, which calls itself an independent “new and fresh magazine portal for electronics, travel, media and lifestyle.” In their apology, they offered the following explanation:

“Its a trend to post news about people die on social networks and my source Noticiasunam made me think its true.”

Noticiasunam is a satire site, and very open about this. The incident showed that the people behind Distrita were so eager to publish this news that they did not think to check their source. This is unlikely to happen again in the future, but for now it is a good example of how sources can prove to be very unreliable, particularly when they set out to present false information.

Reliability of Visualizations

For this example, look no further than the one given in my second blog, which mentions how a data journalism organization created a very misleading map by not verifying its data.

Verification and reliability go hand in hand

All four examples given here could have been avoided simply by properly verifying the information that was given or presented. Journalists aren’t perfect and media struggle with deadlines, but in the end… what matters more to the public? A reliable medium that allows you to check the news without wondering whether it is all correct? Or one that posts all the exciting rumors straight away and gives you lots of gossip, but not enough facts? The divide used to be clear, but with the onset of the Internet, journalists need to become more aware that if they want to belong in the former category, they should take the time to verify everything. Or be honest when they can’t.

The human factor – why facts are a work in progress

Fact checking is the bread and butter of journalism. The first principle of professional ethics in journalism is people’s right to true information. This leads right into the second principle: the journalist’s dedication to objective reality. In short, journalists are expected by everyone, including themselves, to verify every statement they publish.

In practice, this doesn’t always happen, which is why we’re used to editorial comments and corrections in newspapers and on internet news sites. There are various reasons why journalists don’t always check the facts. The following three justifications were given in a recent Dutch study:

  1. Explicit accordance
    1. We followed the rules, so it’s not our responsibility if our report turns out to be false after all.
  2. Practical accordance
    1. Not enough time, resources, or money was available to check all the facts.
  3. Exceptional divergence
    1. There was no reason to check the facts because this clearly wasn’t real news.

Speaking as a programmer, these justifications sound all too familiar. If you ever wonder why computer programs you paid good money for have bugs in them, just take a look at these justifications above. They are as true for programmers as they are for journalists. In fact, they are probably true for most professions since these three justifications can be summed up as follows: Humans aren’t perfect.

Unrealistic expectations 

One of life’s ironies is that imperfect people expect others to be perfect. In fact, they often expect themselves to be perfect… provided the world around them cooperates. And when it doesn’t, well… clearly this isn’t their fault. Personally, I don’t object to that point of view, as it leads to one very important thing: humans strive for the unattainable. Objectively, the two principles above may be unattainable, but in practice most journalists do strive to adhere to them and are held to account by both the public and editors if they fail to do so.

But what about facts?

If people have unrealistic expectations about journalists, what about facts? Do we hold unrealistic expectations about them as well? In my opinion, yes.

One definition of a scientific fact reads as follows:
“any observation that has been repeatedly confirmed and accepted as true; any scientific observation that has not been refuted”

This works well in science, but obviously not so much in the real world where some events only happen once and cannot be observed from afar. Journalists have found ways to deal with these limitations, such as the rule of thumb that a story isn’t published unless it has been confirmed by two independent sources.

However, data journalism deals with data and thus allows its professionals to take a more scientific approach to facts, right? No matter how many times and how many people run the same numbers, the result should always remain the same. Unfortunately, the numbers aren’t the starting point but the end result of a human process.

Example: Figures from the Dutch Ministry of Education

Each year, the Dutch Ministry of Education presents the number of youths who leave school without a starting qualification. The numbers for each previous school year are to be handed in by the regional and local authorities before November 1. Meanwhile, the national ‘Service of Education Execution’ (otherwise known as DUO) also hands in the number of student registrations that are sent to them by each school. Based on these numbers, the ministry officials calculate the national, regional, and local numbers that are used by the government to determine policy.

On January 16, the Dutch government published the following statement on its website (translated from Dutch):

“The number of premature school leavers has dropped significantly during the last school year to just 27,950 youths. … On the one hand, this decrease is due to the combined efforts of schools, local governments, and other partners. On the other hand this decrease is due to the better measurements which clarify which youths truly leave school prematurely.”

The government admits that a new statistical measurement allowed these numbers to drop. However, what they don’t admit is an observation that was made a few weeks ago by someone at a meeting between DUO and the programmers who develop the software used by local governments:

“The problem with the government is that when they ask a question, they expect an immediate answer. They don’t understand that the data first needs to be gathered.” 

In other words… the numbers presented here exist because people were told that they needed to register their work. However, people from different areas used different software programs. That is why the government and the different software companies got together to try to determine what exactly needed to be registered. But this was of course only one difference. The main difference lay in the work processes used by the various local authorities. A big city in the west of the country, for example, automatically excluded youths who had found a job. A big, underpopulated region in the east automatically sent mail every six months to check whether employed youths without a diploma wanted to go back to school after all. Different circumstances require not only different actions, but different terms as well. And that led to different interpretations of the same questions.
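Speaking as a programmer again, the effect is easy to show in a few lines. What follows is only a minimal sketch: the four records and both counting rules are invented for illustration and are not taken from DUO, the ministry, or any local authority. It simply shows that the same registrations produce different totals depending on whether employed youths are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Youth:
    """Hypothetical registration record; the field names are assumptions."""
    has_diploma: bool
    in_school: bool
    employed: bool


def count_excluding_employed(youths):
    """Rule A (hypothetical): youths who found a job are not counted."""
    return sum(
        1 for y in youths
        if not y.has_diploma and not y.in_school and not y.employed
    )


def count_including_employed(youths):
    """Rule B (hypothetical): employed youths without a diploma still count."""
    return sum(1 for y in youths if not y.has_diploma and not y.in_school)


# Invented sample data, identical for both rules.
records = [
    Youth(has_diploma=False, in_school=False, employed=True),
    Youth(has_diploma=False, in_school=False, employed=False),
    Youth(has_diploma=False, in_school=True, employed=False),
    Youth(has_diploma=True, in_school=False, employed=True),
]

count_rule_a = count_excluding_employed(records)
count_rule_b = count_including_employed(records)
print(count_rule_a, count_rule_b)
```

Neither rule is wrong in itself. The trouble only starts when totals produced under different rules are added up as if they measured the same thing.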

Facts are a work in progress

If a good fact checker is a good reporter, how is he or she to deal with this issue? I’m afraid I’m just going to have to repeat myself from earlier columns: keep the bigger picture in mind and delve deeper into the subject. Do not accept the facts, but ask yourself how they came about and be open about these questions. Both to yourself and the audience.

Biased but Balanced – honesty in the media

Is true honesty the price of democratic representation?

This claim was made in 2012 by the Columbia Journalism Review in an article which discussed the necessity for presidential candidates to “adjust their positions over their careers for political reasons.”

During the 2012 presidential elections for the USA, a controversy arose when incumbent president Obama appeared to come out in favour of gay marriage. Which is to say, the president himself was “struggling with the issue” while his vice president came out in favour. Previously, Obama had always opposed the issue. However, back in 2004 he had in fact indicated that he was in favour when he was a candidate for the state senate in Illinois.

The Columbia Journalism Review criticizes the media for framing Obama as someone who ‘evolves’ while referring to his opposing candidate Romney as ‘flip-flopping’. Romney had changed his position numerous times during his years in government and later as a presidential candidate in 2008 and 2012. Clearly, both men changed their tunes when the electorate changed in order to win over voters. The conclusion of the CJR was that politicians simply could not be honest all the time and this was the price to pay in a democracy.

Now maybe it’s just me, but I certainly hope that’s not the only reason why politicians ‘evolve’. When I vote for someone to lead my country, I expect them to be able to change their mind when they become aware of new evidence or developments. The ability to adapt is what has kept our species alive. It would be a truly frightening world if we expected our leaders to have their opinions set in stone and to refuse to consider alternatives.

Unfortunately, this attitude is reflected in Dutch media as well. For years I’ve read comments in the right-wing media about the Dutch Labour Party (PvdA) as the “Dutch Immigrant Party” due to its relaxed stance on immigrant integration, to the point that, according to the right-wing media, Labour seemed to forsake its own social-democratic heritage. However, lately the party has become more strict in this regard, and two weeks ago this led to two members of parliament leaving the party. Imagine my surprise when those same right-wing media published articles criticizing Labour’s party leaders for either betraying their own voters (Jalta) or for being hypocrites by changing their stance (Elsevier by means of Blendle). Shouldn’t they have applauded Labour for standing up for its own principles?

Of course, my surprise wore off when I thought of these incidents in the context of framing and media bias.


According to Entman, framing is “selecting and highlighting some facets of events or issues and making connections among them so as to promote a particular interpretation, evaluation, and/or solution” (Entman, 2004, p. 5). This is often done to highlight the interests of elites (Entman, 2004, p. 5).

Jalta and Elsevier are right-wing media operations which cater to a right-wing audience. Even as they attempt to objectively describe developments that they have historically been arguing for, they will frame them in such a manner that their readers will not consider voting for or supporting Labour in the next election.


Everyone is biased. Full disclosure: I’ve spent a year tutoring immigrant children in the ‘Schilderswijk’ in the Hague. Those children gave up their free time in order to gain a better grasp of the Dutch language and school system. It was very rewarding for all involved since their grades and thus chances of getting into good schools improved. However, as a result I am slightly biased both against immigrants who are not willing to make that effort and against native Dutch speakers who complain about immigrants but don’t give up their free time to help them integrate. I thus support Labour’s attempts to further integration. There is therefore a very good chance that in judging the articles mentioned above, I suffer from the hostile media effect.

“The hostile media effect states that supporters of both sides will consider an objective story which details the struggle as biased against them.” – H. van der Kaa (Journalistic Data Analysis course, 2014)

Are media biased?

If you ask the question this broadly, according to Dave D’Alessio in his book ‘Media Bias in Presidential Election Coverage, 1948 – 2008’, the answer is bound to be “Yes”. Journalists, editors, publishers, and audiences all have their own biases. However, those actually cancel each other out… both within and across media.

Journalists tend to be more progressive, while publishers tend to be more conservative. However, both tend to be careful to suppress their preferences in order to live up to the ideal of journalistic objectivity. In the end, however, newspapers and other media corporations will only survive if they manage to sell their product to the audience, and thus it is the audience which has the final say.

Is the audience biased?

Most certainly so… but because there are so many different audiences, there are so many different media. Which is why D’Alessio’s meta-analysis of 60 years of presidential coverage showed that the media are well balanced.

So where does this leave us?

“It is as important to know how people reason incorrectly as it is to know how they reason correctly. Once we understand that, it is possible that we can learn to teach people to reason correctly and accurately.” – Dave D’Alessio

Accept that you are a biased person, accept that your audience is biased… but also accept that both you and your audience crave news that is both educational and informative and thus as objective as possible. Do not just read the articles, comments and books you happen to agree with. Read the ones that you firmly disagree with. Try to come up with arguments both against and in favor of your own views. Engage with your audience… both your supporters and detractors. You will evolve, your audience will evolve, and so will your political representatives. It is a fact of life, and your work will be all the better for it.


De ontsluiering van de Partij van de Arbeid – Syp Wynia (2014)

Lodewijk Asscher en de gerecyclede onschuld van de Partij van de Arbeid – Joshua Livestro (2014)

Media Bias in Presidential Election Coverage, 1948 – 2008 – Dave D’Alessio (2012)

Obama ‘evolves,’ Romney ‘flip-flops’ – Brendan Nyhan (2012)

N.B. Full disclosure: Dave D’Alessio is a friend of mine, and I’ve really wanted to use his book in a blog. :)

Modeling the real world ain’t easy

“Illustrations and graphics should be as smart as the words in the newspaper.” (Edward Tufte)

When I tell people my thesis is about the evolution of morality they usually blink. Some ask for clarification, others joke, and most are eager to engage in a philosophical discussion. But absolutely no one assumes I’m basing my experiments on real people.

Data journalists aren’t so fortunate however. They report current events, and as such any piece of visualization they use has to represent the truth. Virtually no one realizes that a representation of the world is just that… a representation. It is just as much a model of the world as the one I use to test an obscure theory. But the context of journalism changes the audience’s perception, and thus their expectations. A good data journalist is aware of these expectations and keeps them in mind when choosing which visuals to accompany her story.

In his lecture on the main principles of data visualization, Alberto Cairo gives two definitions for visualizations:
1. A graphical representation of evidence.
2. A tool for analysis, communication, and understanding.

Both stress the fact that visualizations are tools, to be used to explain the data better than mere words can. Visualization is a wonderful tool for sure, one with the potential to reach out across language barriers and show audiences a pattern they otherwise might never have been aware of. However, like any tool, it should be treated with care and respect.

“Charts, graphs, maps, and diagrams don’t lie. People who design graphics do.” (Alberto Cairo)

It is very easy to tell a completely different story using the exact same data but two different graphs. Below is an example from… Which graph shows the greatest increase?


The answer is, of course, neither… but because the right one’s y-axis starts at 48%, the impression is given that the data to the right is far more volatile.

When you look at these graphs side by side in the context of this blog, the difference is easy to pick up. But as I have mentioned in previous blogs… usually the audience doesn’t have the time or energy to study these things in detail. And those that do have the time tend to notice these things and publish them. It is therefore the journalist’s responsibility to ensure that graphs and other forms of visualization are always correct. Not only to protect one’s audience from drawing the wrong conclusion, but also to protect one’s reputation as a journalist. Fortunately, there are many tips out there on how to avoid these mishaps.
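To put a number on that impression, here is a small sketch. Only the 48% axis start comes from the example above; the two data values (50% and 52%) are assumptions I made up for illustration, as are the function names. It compares how much taller the second bar is drawn than the first when the axis starts at zero and when it starts at 48%.

```python
def drawn_height(value, axis_start):
    """Height of a bar as drawn: only the part above the axis start is visible."""
    return value - axis_start


def visual_ratio(low, high, axis_start):
    """How many times taller the higher bar appears than the lower one."""
    return drawn_height(high, axis_start) / drawn_height(low, axis_start)


# Hypothetical data points, in percent.
low, high = 50.0, 52.0

ratio_full_axis = visual_ratio(low, high, axis_start=0.0)
ratio_truncated_axis = visual_ratio(low, high, axis_start=48.0)

print(f"axis from 0%:  second bar drawn {ratio_full_axis:.2f}x as tall")
print(f"axis from 48%: second bar drawn {ratio_truncated_axis:.2f}x as tall")
```

With these invented values the same two-point difference is drawn as a barely visible bump on the full axis and as a doubling on the truncated one, even though the data never changed.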

“Three rules to keep in mind (when choosing your graphics form)” (Alberto Cairo)

  1. Think about the audience and the publication.
  2. Think of the questions your graphic should help answer.
  3. Can you understand the graphic without reading every single number?

Think about the audience and the publication.

If you are a data journalist, chances are that your audience is highly interested in data processing, visualizations and the subject matter itself. As such, they will notice when you present a misleading visualization and they won’t be shy to tell others about it. Do not insult your audience. 

Think of the questions your graphic should help answer.

What will your audience do with this information? Will they share it with others? Will they try to see if it matches their own data on the subject? Or will they simply glance at it and either accept or reject it, based on their existing world views? And is that what you would want them to do with it? Cairo’s advice is to make a list of these questions and use that as a guideline to decide which graph to use. The Graphic Cheat Sheet is a great help for that.

Can you understand the graphic without reading every single number?

The right graph in the example above clearly fails this last test, as it can only be read correctly if the reader looks at the numbers. Of course, exposition is at times necessary, but it’s best to put that part in words, either in the main text or as an explanation near the graph itself. An example came up during this week’s class presentations, when mapping a warzone turned out to be very difficult due to the constantly changing situation. It was therefore suggested that readers be told that the map was made at a given time on a given date, and would be frequently updated. Personally, I agree with that idea… better to acknowledge your limits than to pretend and be caught.

Do not insult your audience, they know your world ain’t real.

They say the first step in recovery is accepting you have a problem. Data journalists need to accept that visualizations have a big problem: they are open to abuse. It is therefore important that, whenever you attempt to visualize your data, you ask Cairo’s three questions and use the cheat sheet to choose the appropriate format. And when you come across any limits to your chosen form… acknowledge them openly on your site, so that your readers may become aware of these limitations. Who knows, one of them might come up with a solution.

The Path to Hell is Paved with Poor Assumptions

Validating data may cost time, but refraining from it will cost more.

How wonderful is the life of a data journalist. There is so much data publicly available that whenever you are in need of a story, all you need to do is go to an interesting data repository and start questioning it and lo and behold… you have a story. Now all you need to do is write it down in clear, readable prose and maybe throw in some exciting visuals and you’ve got something truly exciting that will surely get people talking.

Take this story, for example: “Mapping Kidnappings in Nigeria”. It was published shortly after Boko Haram kidnapped over 300 schoolgirls, the event that started the international “Bring back our girls” campaign. It features a nifty interactive map that shows how the number of kidnappings rose rapidly in the past decade. Naturally it created a lot of buzz on social media…


As I am sure you will agree, this is not exactly the kind of publicity any journalism organization would want, and the publisher in question actually prides itself on being dedicated to journalism. Its foundational manifesto specifically states “… one of our roles will be to critique incautious uses of statistics when they arise elsewhere in news coverage.” Commendable as that goal is, it does give the impression that they would think twice before publishing a story that refers to ‘media reports’ as ‘discrete events’.

I admit I do not know the precise number of articles devoted to the murders of presidents Lincoln and Kennedy, but when I typed the phrase “president of the united states murdered” into Google, it yielded “about 41,700,000 results”. If we were to take all of those as discrete events, we’d all sincerely believe that being the president of the United States is the deadliest job in the world.

Yes, this veers into the ridiculous, but this is a ridiculous mistake to make, especially for an experienced data journalism organization.

In all fairness to the publisher, however, they did live up to one of the hallmarks of good journalism, namely that of transparency. Rather than pull the article, they owned up to their mistake. The article is still available, but now it starts with an admission of guilt, followed by an apology and a long explanation of the many errors made.


So how come this mistake was made in the first place? Unfortunately, the editors did not explain that. However, the comments that I have read seem to agree on two things:

  1. “Poor proxy variables”.
  2. “Data has no meaning without context.”

Proxy Variables

The blog ‘Adventures in Statistics’ defines a proxy variable as “… an easily measurable variable that is used in place of a variable that cannot be measured or is difficult to measure.” In this instance, the journalist relied on news reports about kidnappings since she could not rely on official police reports. Unfortunately for her, and her editors, she forgot that, paraphrasing Erin Simpson, “All trend analyzing using <a news article database> has to take into account the exponential increase in news stories which generate the data.”

In other words… if you’re using proxy variables, think carefully whether they really are applicable. In fact, it is probably best to check with both a statistician and an expert in the field you are covering. This will take time, but validating the data is a data journalist’s first responsibility.

Data without context is meaningless.

The original criticism by Erin Simpson actually provided some good questions. Had they been asked, answered and used in the original analysis, they would have yielded quite an interesting, and validated, story.

  1. “Total number of stories coded for Nigeria over time (what is the shape of that curve)?”
  2. “What are the total number of events generated for Nigeria over time? (What is the shape of that curve?)”
  3. “How does the number of kidnappings compare to the number of coded events? Same shape? Key differences?”
  4. “How many overall events are coded with a specific geolocation? How many get coded to a centroid? (And where is the centroid?)”
  5. “How many kidnapping events are coded with a specific geolocation? Does that change over time?”
  6. “How does this information track with other open source reporting? HRW, UN, WB local NGO crime reporting? Can we corroborate trends?”
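The first three questions boil down to a normalization check: compare the kidnapping reports with everything else that was coded for the same period. Below is a minimal sketch of that check in Python. The numbers are invented purely for illustration and do not come from any real news database, and the function names are hypothetical.

```python
# Illustrative, made-up numbers: NOT real data from any news database.
years = [2010, 2011, 2012, 2013]
total_events = [1_000, 2_000, 4_000, 8_000]   # all coded events per year
kidnapping_events = [10, 20, 40, 80]          # coded kidnapping events per year


def share_per_year(subset_counts, total_counts):
    """Return subset/total for each year; years with no events give NaN."""
    if len(subset_counts) != len(total_counts):
        raise ValueError("both series must cover the same years")
    return [s / t if t else float("nan")
            for s, t in zip(subset_counts, total_counts)]


def growth_factor(series):
    """Ratio of the last value to the first value of a series."""
    return series[-1] / series[0]


shares = share_per_year(kidnapping_events, total_events)

for year, raw, share in zip(years, kidnapping_events, shares):
    print(f"{year}: {raw:>3} kidnapping reports, {share:.1%} of all coded events")

print("growth in raw counts:", growth_factor(kidnapping_events))
print("growth in share:     ", growth_factor(shares))
```

In this invented example the raw number of kidnapping reports grows eightfold, while its share of all coded events does not move at all. A map based on the raw counts would show a dramatic rise that says more about the growth of the database than about kidnappings.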

So why didn’t they take the time to ask these questions? Data visualization expert Alberto Cairo offers several suggestions to help data journalism organizations avoid making these costly mistakes. In my humble opinion, they all apply to this case.

“Data and explanatory journalism cannot be done on the cheap.”

Traditionally, data journalists worked in large news organizations with an excellent network and many resources. Organizations like the one in this example lack those and would thus struggle to find the required expertise in time before publication.

“Data and explanatory journalism cannot be produced in a rush.”

This was likely the most crucial element in this example. In an environment which needs stories to be produced daily, journalists may well not have the time to stop, think, and verify that the way they have questioned their data set is actually valid.

“Part of your audience knows more than you do.”

Of course, that has always been the case since journalists are not expected to be lawyers, engineers, or physicians. However, the combination of journalistic transparency and public data means readers can verify your conclusions and, if they find fault with them, let you (and the world) know. It is an additional risk that data journalism organizations need to incorporate into their work processes and business models.

“Data journalists cannot survive in a cocoon.”

As professor Paul Bradshaw mentioned in his lecture on “Setting Up ‘Data Newswires'”, the accuracy of the data needs to be checked by asking the following four questions:

  1. Who collected the data?
  2. When did they collect it?
  3. How did they collect it?
  4. Find another source of the same sort of data for comparison.

In other words, the data journalist needs to either know the domain herself or work with someone who does. This could actually work out well for the organization in combination with the previous suggestion. By reaching out to devoted audience members whom you have reason to suspect are experts, one can both increase audience satisfaction and verify the validity of one’s data.

What about time?

In data journalism, you cannot afford a hidden trade-off between time and validity. Once you have gathered a dataset for a story, you always need to take the time to validate your data and make sure you ask the right questions. Not doing so can have disastrous consequences that will make people question the value of your organization.

One potential solution for this problem is to make your audience part of the process. Turn the story into a series that is updated regularly. Start with an introduction which explains what the data set is and what question you want to see answered. Then invite audience members to collaborate with you. If they don’t have the knowledge themselves, they might know someone who does.

Of course, this does require another process of verification: whether these experts are indeed who they claim to be. However, over time you will build up a network of reliable experts you can count on. Better to have them assist before publication than criticize your story after it.

If data journalism cannot afford a hidden trade-off between time and validity, then it is best to be open about it and get people to keep coming back to you.


Bradshaw, Paul (2014). “Setting up ‘Data Newswires'”.

Cairo, Alberto (July 9, 2014). “Data journalism needs to up its own standards”.

Chalabi, Mona (May 13, 2014). “Mapping kidnappings in Nigeria”.

Frost, Jim (September 22, 2011). “Proxy Variables: The Good Twin of Confounding Variables”.

Simpson, Erin (May 13, 2014). “If a data point has no context, does it have any meaning?”

The Functional Art (May 13, 2014). “When plotting data, ask yourself: Compared to what, to whom, to where, and to when?”

When Worlds Collide in Journalism

The need for fact checking in journalism only grows as we become more aware of the world around us.

We live in the middle world, at least according to Richard Dawkins. Our brains have evolved to make sense of our direct environment and cannot fathom the rules of the cosmos or the microscopic. That’s what makes big data so fascinating, because it reveals patterns that challenge our intuition. We may be able to explain parts of it, but the entire picture? That is far too complex for us to understand, let alone explain.

That was the message I took home with me last Tuesday after attending the SAP big data college tour in Eindhoven. I’m fairly sure that wasn’t their main point – the benefits of working for SAP were mentioned a couple of times – but it nevertheless made the biggest impression. And it got me thinking as to how this applies to journalism.

We live in the middle world, but ours is not the same as that of our ancestors. In her critique of Miller’s article Kindness, Fidelity and other Sexually Selected Virtues, Catherine Driscoll states that she finds it difficult to believe that a sense of ethics could evolve as a sexual signal, because we all know how suitors may lie in order to present a false picture of themselves. Miller replied that our distant ancestors lived in small, isolated tribes, which made hiding your true self quite challenging. The critic was thus judging the past with her own mindset, and was completely unaware of doing so.

Our own parents grew up in a world recovering from a world war, a world without home computers, but also a world in which news played an important role in everybody’s lives. Journalism became a force to be reckoned with in the 20th century and reliable news agencies and journalists were greatly respected. Nevertheless, if you visit your parents this weekend and ask for newspaper clippings from their childhood, you’ll likely be disappointed by the ‘mundane’ reporting. Journalists, and parents, are a product of their world, and what they considered important and appropriate news may not be what journalists and young adults in our world believe it to be.

This does not make our parents small-minded, nor does it make us broad-minded, much as we’d like to think so. Generally speaking, we have become more aware of the world beyond our borders and how international developments may influence our own lives and vice versa. But we grew up in a world where globalisation has become the norm.

Note how I keep using the term ‘world’ here, not ‘time’. Different though they might be, our world is much more similar to our parents’ than to that of a poor farmer in Niger who risked his life trying to get to Europe last year. His story is told in the German periodical Der Spiegel (the article is in English), and I highly recommend you check it out. It is a beautifully written description of a world completely different from our own, but which nevertheless interacts with ours in ways that we are mostly unaware of.

According to the Columbia Journalism Review, Der Spiegel is “home to what is most likely the world’s largest fact checking operation.” Back in 2010, it had 80 full-time positions for fact-checkers, most of whom were consulted during or even before the writing of an article. And when you see the type of articles that are presented here, you can understand why. The story blends vivid witness accounts with dry facts in a way that both moves and educates the reader. Without the facts, a sceptic would write it off as a sob story, and without the witness accounts, particularly the last line, it wouldn’t have the same punch.

In Chapter 2 of the Verification Handbook, Verification Fundamentals: Rules to Live By, Steve Buttry states that journalists need to ask two questions when verifying stories:

  • How do they know that?
  • How else do they know that?

These days, people will often point to sources on the Internet to answer these questions, which is why a good journalist tries to locate the original source, as detailed by Claire Wardle in her chapter Verifying User-Generated Content, also in the Verification Handbook. According to her, there are four elements to check and confirm for any piece of content:

  • Provenance: Is this the original piece of content?
  • Source: Who uploaded the content?
  • Date: When was the content created?
  • Location: Where was the content created?

Personally, I believe another element needs to be examined as well:

  • Why do people check out this content?

Journalists are paid to take the time to examine the four elements, but ordinary people will often quote, link, or share a site because of the trust they (or their peers) put in it. And that trust likely stems from their world view, even if they themselves are unaware of this. Journalists, however, do need to be aware of it, and should try to explain it to their own audience.

The world has become more complex, for journalists as well as their audience. Before, journalists ‘just’ needed to be experts in fact checking, interviews, and investigations. Now they are not only faced with much more data than in earlier decades, but they also need to be aware of the meta-data and share this with their audience. An audience that is frequently unwilling to accept that the big world they now live in is in fact made up of a network of small, interconnected worlds.



  • Buttry, Steve (2014). “Verification Fundamentals: Rules to Live By”. Verification Handbook.
  • Driscoll, Catherine (2007). “Why Moral Virtues are Probably not Sexual Adaptations”. Moral Psychology volume 1 The evolution of Morality: Adaptations and Innateness.
  • Hage, Willem van (November 4 2014). Big Data College Tour at Eindhoven.
  • Miller, Geoffrey (2007). “Kindness, Fidelity and other Sexually Selected Virtues”. Moral Psychology volume 1 The evolution of Morality: Adaptations and Innateness.
  • Goos, Hauke; Riedmann, Bernhard (October 21, 2014). “Death in the Sahara: An Ill-Fated Attempt to Reach Fortress Europe”. Der Spiegel.
  • Silverman, Craig (April 9, 2010). “Inside the World’s Largest Fact Checking Operation”. Columbia Journalism Review.
  • Wardle, Claire (2014). “Verifying User-Generated Content”. Verification Handbook.