Data Journalism Pages on Facebook – Is there a community?

SMART Data Sprint: What is the data journalism debate on social media?
23-27 January 2017, Universidade Nova de Lisboa | NOVA FCSH | iNOVA Media Lab

Project title: Data Journalism Pages on Facebook – Is there a community?

Team members: Kalina Drenska | Vanessa Amaral Moreira | Bruno Horta | Mara Magalhães

Facilitator: Janna Joceli Omena

»Introduction


Data Journalism is a recently coined term and, in both practical work and applied research, it is still in a pilot phase. We propose to analyse the networks that emerge from data journalism related pages by relying on the page like network method. We aim to detect whether these pages interact with each other, which kind of interaction they build (e.g. whether pages mutually like each other) and whether they forge a debate about the subject they support, namely Data Journalism. To do so, we analyzed data journalism related pages in six languages (English, Portuguese, Spanish, Italian, German and Russian).

»Research Questions

1. Is there a community of Data Journalism on Facebook?
→ pages that mutually like each other
→ looking both for international and local interconnections
2. Do the pages interact with each other?
3. Are these pages sharing one another’s content?
4. Are these pages promoting significant debate for the field of data journalism?
→ “talking about”


We analyzed data journalism related pages in six languages (English, Portuguese, Spanish, Italian, German and Russian) through Network Analysis. Netvizz was the tool used for data extraction, and Gephi was used to visualise and analyse our dataset. As inclusion criteria, we considered Facebook Pages that explicitly state in their title or description that they are related to the topic of data journalism. In addition, we included Facebook Pages that, even if this is not explicitly stated in the title or description, have a long record of sharing and promoting content about data journalism (events, studies, articles, videos etc. on the topic). Conversely, we excluded pages that deal with big data issues or only sporadically publish content concerning data journalism.

In Gephi, we explored the page like network at depth 1, applying the following metrics and measures: clustering, degree, ego-network, talking about count and post activity.
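As an illustration of the degree measure and of the "pages that mutually like each other" criterion, the sketch below treats a page like network as a list of directed edges (page, liked page). The page names and edges are hypothetical placeholders, not our dataset, and the code is a minimal stand-in for what Gephi reports, not the procedure we ran.

```python
from collections import defaultdict

# Hypothetical directed edges: (source page, page it likes).
likes = [
    ("Page A", "Page B"),
    ("Page B", "Page A"),
    ("Page A", "Page C"),
    ("Page D", "Page C"),
]


def degree_table(edges):
    """Return {page: (in_degree, out_degree)} for a directed edge list."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    pages = set(in_deg) | set(out_deg)
    return {page: (in_deg[page], out_deg[page]) for page in pages}


def mutual_likes(edges):
    """Return the unordered pairs of pages that like each other."""
    edge_set = set(edges)
    return {
        frozenset((source, target))
        for source, target in edge_set
        if source != target and (target, source) in edge_set
    }


degrees = degree_table(likes)
pairs = mutual_likes(likes)
```

In this toy network only "Page A" and "Page B" form a mutual pair, while "Page C" is liked twice but likes no one back, which is the kind of one-directional pattern the in-degree and out-degree columns make visible.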

First, we searched for relevant Facebook Pages using the Search Module of Netvizz. This module “provides an interface to Facebook’s search functions for pages, groups, places and events” (Netvizz, n.d.), and the script may return up to 1000 results. Our search query was based on one keyword (data journalism) in each of the six chosen languages:

English: data journalism
Portuguese: jornalismo de dados
Italian: giornalismo di dati
Spanish: periodismo de datos
Russian: Журналистика данных
German: Datenjournalismus

The initial search on Netvizz returned a total of 68 pages. The number of pages per language was as follows:

English (48)
Portuguese (3)
Italian (1)
Spanish (11)
Russian (3)
German (2)

We then checked each page’s description and timeline and realized that many of them did not meet our inclusion criteria. After this screening, we ended up with the following numbers of pages:

English (25)
Portuguese (3)
Italian (0)
Spanish (4)
Russian (2)
German (0)
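
The two lists above can be compared with a few lines of arithmetic. The sketch below only restates the counts reported in this section; the variable names are illustrative.

```python
# Pages returned by the initial Netvizz search, per language (from the first list).
found = {"English": 48, "Portuguese": 3, "Italian": 1,
         "Spanish": 11, "Russian": 3, "German": 2}

# Pages kept after checking descriptions and timelines (from the second list).
kept = {"English": 25, "Portuguese": 3, "Italian": 0,
        "Spanish": 4, "Russian": 2, "German": 0}

total_found = sum(found.values())
total_kept = sum(kept.values())

# Share of pages retained overall and per language.
share_kept = total_kept / total_found
retention = {language: kept[language] / found[language] for language in found}

for language in found:
    print(f"{language}: {kept[language]}/{found[language]} kept "
          f"({retention[language]:.0%})")
print(f"Total: {total_kept}/{total_found} kept ({share_kept:.0%})")
```

The totals confirm that the screening kept 34 of the 68 pages found.
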
The database was explored through network analysis: all the relevant pages (the list above) were brought together in Gephi. At first, we ran the algorithms for Modularity Class (which finds clusters) and Average Degree (which summarises how many connections nodes have). 62 communities were found: the network had three large, disconnected clusters, along with some small clusters and single nodes spread around. We immediately identified many outliers, for instance New York Times, BBC News and La Nación, and decided to exclude them from our database.
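
To make the effect of excluding outliers concrete, the sketch below removes a page from a toy network and counts the disconnected groups before and after. It uses weakly connected components, a simpler notion than the modularity classes computed by Gephi, so it is an analogy rather than a reimplementation. All page names, including "News Outlet X", are hypothetical.

```python
from collections import defaultdict


def remove_pages(edges, excluded):
    """Drop every edge that touches an excluded page."""
    excluded = set(excluded)
    return [(s, t) for s, t in edges if s not in excluded and t not in excluded]


def weak_components(nodes, edges):
    """Group nodes into components, ignoring the direction of likes."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # depth-first traversal from the unseen start node
            node = stack.pop()
            component.add(node)
            for neighbour in neighbours[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# Hypothetical network in which a large news page links otherwise separate pages.
nodes = ["Page A", "Page B", "Page C", "Page D", "News Outlet X"]
edges = [("Page A", "Page B"), ("Page B", "News Outlet X"),
         ("Page C", "News Outlet X")]

before = weak_components(nodes, edges)

excluded = {"News Outlet X"}
kept_nodes = [node for node in nodes if node not in excluded]
after = weak_components(kept_nodes, remove_pages(edges, excluded))

print(len(before), "components before,", len(after), "after exclusion")
```

In this toy case the excluded page was the only bridge between "Page C" and the rest, so removing it leaves more, smaller groups. This is one reason the visualization has to be re-run after each round of exclusions.
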

The process of running visualization algorithms, inspecting network details and excluding outliers was repeated several times in order to satisfy our inclusion and exclusion criteria. Our final network was composed of 41 clusters, which were analysed according to the following attributes: “talking about count” and “post activity”.


Observing the clusters separately gave us an overview of what the data journalism community looks like. In the network below, we analyzed Modularity Class and Average Degree, noticing that “Data Journalism Crew” (an Italian page created in 2012) is the most connected node in the network. Although the connections made by “Data Journalism Crew” show the page’s interest in keeping in tune with other pages related to data journalism, the latter do not like “Data Journalism Crew” in return.

However, when the nodes were ranked by the attribute “talking about count”, the largest node was the page “Pro Publica” (see graph below), a non-profit organization founded in 2008 that produces (and promotes) investigative journalism in the public interest. Given the popularity of “Pro Publica”, it was to be expected that the page would generate a large and specialized debate on Facebook; nonetheless, the remaining data journalism pages do not seem to play a significant role in generating debate. We may state that these pages do not actually evoke (or forge) debate about data journalism.

Finally, we observed the Data Journalism network under the attribute “post activity” (posts per hour, based on the last 50 posts; Netvizz, n.d.), and we found that the most active page in the network is “Jenny Milkowski Fox 32 Chicago”, the page of an American journalist who works for Fox (see graph below). The page is not related to Data Journalism but to news. This page was the only outlier allowed to remain in the network.
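
The posts-per-hour measure can be approximated as in the sketch below. The exact formula used by Netvizz is not stated in this report, so the choice here (number of posts in the window divided by the hours between the oldest and newest of them) is an assumption, and the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def posts_per_hour(timestamps, window=50):
    """Approximate posting rate over the most recent `window` posts.

    Assumed definition: posts in the window divided by the hours spanned
    by them. Returns None when the rate is undefined.
    """
    recent = sorted(timestamps, reverse=True)[:window]
    if len(recent) < 2:
        return None
    span_hours = (recent[0] - recent[-1]).total_seconds() / 3600
    if span_hours == 0:
        return None
    return len(recent) / span_hours


# Hypothetical page that posted five times, once every two hours.
start = datetime(2017, 1, 23, 8, 0)
timestamps = [start + timedelta(hours=2 * i) for i in range(5)]

rate = posts_per_hour(timestamps)
print(rate)
```

Under this definition a page that posts very frequently gets a high value even when it has few likes or connections, which is how a page outside the topic can rank as the most active node.
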

We also found several non-connected nodes that were generating neither debate nor activity on the Facebook platform.


After our investigation, and taking into consideration our research strategy, we observed that there is no active community of Data Journalism related pages on Facebook. The great majority of data journalism pages are neither mutually connected nor generating relevant debate within the network. We concluded that the debate about data journalism on the Facebook platform is very incipient and has yet to find its own shape and practices.


During the research process we realised that it is fundamental to take a closer look at the dataset in order to understand the data better and, then, be able to identify what is relevant and what should be discarded in the analytical process. Such awareness may lead to a more accurate result at the end of the research process. It is hard work, but a very important step to be considered before further exploration.


Borgatti, S. P. et al. (2009). Network Analysis in the Social Sciences. Science, 323(5916), 892-895. doi:10.1126/science.1165821

Gillespie, T. (2010). The politics of “platforms”. New Media & Society, 12(3), 347-364. doi:10.1177/1461444809342738

Netvizz (n.d.). Page Like Network Module. Application available at

Rogers, R. (2009). The End of the Virtual: Digital Methods, and at

Van Dijck, J. (2013). The culture of connectivity: A critical history of social media. New York, NY: Oxford University Press.