SMART Data Sprint: What is the data journalism debate on social media?
23-27 January 2017, Universidade Nova de Lisboa | NOVA FCSH | iNOVA Media Lab
Project title: The Data Journalism Debate on Facebook Groups –
Applying Critical Analytics and Engagement Metrics
Team members: Janna Joceli Omena | Sofia Correia | Débora Bedeschi
Facebook Groups organize communication and the exchange of knowledge (or information) in a specific format (public, closed or secret). These channels are addressed to particular groups with particular interests: from cat lovers to “the cultural economy of fandom” (Fiske, 1992), from old schoolmates to football supporters, from political debate and civic engagement to science fiction, and so on. Among the several reasons that motivate the creation of a group on Facebook, what sustains such structures is ‘common interests’. Following this logic, and assuming that Facebook Groups generate specialised debate, we propose to investigate the data journalism debate (professional or academic) on the Facebook platform, taking into consideration the groups’ mechanisms and the drawbacks and limitations of the Facebook API (e.g. until early 2015 developers were allowed to access private group data for research purposes, which is no longer an option; see Omena, 2016).
The analytical approach relies on Rogers’ proposal of critical analytics and engagement metrics (Digital Methods Initiative, 2016, February 19; Rogers, 2016, January 13) to identify the data journalism debate on Facebook Groups. In addition, we follow the work of Rosa and Omena (2015) and their exploratory study of Facebook Groups supported by digital methods. With this study, we also expect to raise awareness of the study of Facebook Groups from a medium-research perspective (keeping in mind that Facebook Groups studies are still part of exploratory and experimental research).
The research questions were grouped according to our research interests: actions performed within the medium (1 and 2), distribution of phenomena (3), and analytical perspective (4).
- What is the main debate around Data Journalism on Facebook Groups and how has it changed over the years? (Are early concerns still up to date?)
- Who are the most credible voices?
- How does the debate about data journalism expand from Facebook Groups to the Facebook platform and to the Web?
- What findings can be achieved by applying critical analytics and engagement metrics in Facebook studies?
» Research Strategy:
The first step was to query the Facebook API through the Search Module of Netvizz in order to list groups related to Data Journalism. Keywords in Portuguese and English composed our query design: “data journalism”, “jornalismo de dados”, “journalism and data”, and “jornalismo e dados”. We found 76 groups (50 closed and 26 open), and from these we selected three groups for in-depth content analysis, selecting the most engaged posts per month to identify key actors and topics of discussion over the years. The selection criteria were the group’s creation date and activity (e.g. a group created in 2011 but inactive in 2016 or 2017 would be discarded).
In the second step we analysed the chosen groups: Jornalismo de Dados (Brazil), Jornalismo e Dados (Portugal), and Data-Driven Journalism (USA). After that, we explored how the debate about data journalism expanded across the Facebook platform and the Web. To do so, we relied on a list of the monthly most engaged URLs (post type: link).
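The activity criterion used in the first step can be illustrated with a minimal Python sketch. It assumes that "active" means at least one post in 2016 or 2017; the record layout (`name`, `created`, `post_years`) and the sample groups are hypothetical and are not taken from the Netvizz output.

```python
def select_active_groups(groups, active_years=(2016, 2017)):
    """Keep groups with at least one post in any of the given years.

    The 'post_years' field is a hypothetical summary of each group's activity.
    """
    return [g for g in groups if any(y in active_years for y in g["post_years"])]


# Hypothetical candidate groups, for illustration only.
candidate_groups = [
    {"name": "Group A", "created": 2011, "post_years": [2011, 2012]},
    {"name": "Group B", "created": 2013, "post_years": [2013, 2014, 2015, 2016]},
    {"name": "Group C", "created": 2015, "post_years": [2015, 2016, 2017]},
]

selected = select_active_groups(candidate_groups)
print([g["name"] for g in selected])
```

In this sketch "Group A" mirrors the example given in the text: created in 2011 but with no activity in 2016 or 2017, so it is discarded.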
» Data Collection and Data Visualization
The sample was collected in March 2016 and January 2017. We used Netvizz (search and group data modules) to extract data from Facebook Groups, and Excel, IssueCrawler and Gephi to visualize and analyse our dataset. The period covered varies by group: Jornalismo e Dados covers 2015 to 2016, Data-Driven Journalism covers 2013 to 2015, and the Jornalismo de Dados dataset runs from 2013 to 2016.
» Analytical Approach
In order to detect the dominant voices and the main topics of discussion (concerns) over the years, we first generated an overview of group members’ reactions according to their engagement (and comments) across different types of posts. Thereafter, we conducted content analysis by listing the monthly top 10 posts (according to users’ engagement). In parallel, we created a second list containing the URLs that triggered high levels of engagement among group members.
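The two lists described above (monthly top posts and their URLs) could be built along the following lines. This is only a sketch in Python: it assumes each post carries a single precomputed engagement count, and the field names (`published`, `type`, `engagement`, `url`) and sample posts are hypothetical rather than the actual Netvizz column names.

```python
from collections import defaultdict


def monthly_top_posts(posts, n=10):
    """Group posts by year-month and keep the n most engaged posts per month."""
    by_month = defaultdict(list)
    for post in posts:
        month = post["published"][:7]  # "YYYY-MM" prefix of an ISO date string
        by_month[month].append(post)
    return {
        month: sorted(items, key=lambda p: p["engagement"], reverse=True)[:n]
        for month, items in sorted(by_month.items())
    }


def top_link_urls(top_by_month):
    """Second list: URLs of the link-type posts among the monthly top posts."""
    return {
        month: [p["url"] for p in items if p["type"] == "link"]
        for month, items in top_by_month.items()
    }


# Hypothetical posts, for illustration only.
sample_posts = [
    {"id": "a", "published": "2015-03-02", "type": "link", "engagement": 12,
     "url": "https://example.org/article"},
    {"id": "b", "published": "2015-03-15", "type": "status", "engagement": 30},
    {"id": "c", "published": "2015-03-20", "type": "link", "engagement": 7,
     "url": "https://example.org/tool"},
    {"id": "d", "published": "2015-04-01", "type": "photo", "engagement": 5},
]

top = monthly_top_posts(sample_posts, n=2)
urls = top_link_urls(top)
```

With `n=10` the first function corresponds to the monthly top 10 used in the content analysis; `n=2` is used here only to keep the sample small.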
We performed content analysis and link-based analysis on three representative data journalism groups on Facebook:
i) Data-Driven Journalism: The Basics – USA (3,735 members) » The group was created on 18 July 2013 by Raquel Barrera to serve as a channel of communication between students and course tutors. The online course ‘Data-Driven Journalism: The Basics’ was the main reason for the group’s creation. This open online course is an initiative of the Knight Center for Journalism in the Americas, University of Texas.
ii) Jornalismo de Dados – Brazil (539 members) » The group was created on 21 October 2013 by Diego Rabatone Oliveira. It also originated from the Knight Center for Journalism in the Americas, based on the online course ‘Introduction to Data Journalism’, which had José Roberto Toledo as main lecturer.
iii) Jornalismo e Dados – Portugal (175 members) » A closed Portuguese group created on 13 June 2015 by Ana Pinto Martinho. The goal of the group is to share information about journalism and data, and to be a useful platform for its users.
» General overview of group member reactions
Through the general overview of group members’ reactions we first identified that most interactions originated from link-type posts. We then noticed that group activity tends to increase after the creation date, e.g. in Jornalismo e Dados and Jornalismo de Dados (see bar charts below). Data-Driven Journalism contradicts this pattern: the group attracted a high level of participation in 2013, but activity dropped significantly in 2014.
Another aspect to consider is that group members mainly participate by liking posts or sharing content rather than by commenting. The number of comments per year is generally low, and comments are mostly related to link-type posts (see bar charts). The exceptions are 2013 in Data-Driven Journalism, with more than 400 comments, and 2016 in Jornalismo e Dados, with a total of 113 comments (see bar charts).
This overview of group member reactions drew our attention to the issues related to the shared URLs, and it also led us to ask how these URLs expand the debate about data journalism from Facebook Groups to the Facebook platform and to the Web.
» Most Credible Voice and Group Debate over years
The most credible voices varied over the years, but we noticed a common pattern in the first two years of the groups’ existence: group admins or course instructors feature as the dominant voices within each group (see chart below). From then on, journalists and researchers also joined the debate, positioning themselves as credible voices. These actors dictated (and conducted) the topics of discussion concerning Data Journalism.
Regarding topics of discussion over the years, we can see the evolution of the data journalism debate in two blocks. The first takes a more theoretical and discursive format: group members are interested in articles, in what tools can do, or in practical examples of data journalism (see image below). The second brings a more applied perspective: group members not only interact with but also produce data journalism outputs and pose technical questions concerning the use of tools or software, expanding the debate to a more specialised degree. For instance, in Jornalismo de Dados, the most engaged post in 2015 relates to a technical question (how to open a 500-page PDF file in an Excel sheet?), and in 2016 we have the first Brazilian Conference on Data Journalism.
Jornalismo e Dados offers another example of a more specialised debate: in 2016, the group was concerned with topics such as algorithms, robot journalism, APIs, Artificial Intelligence, Open Data and Curator Editor (see image below).
» Mapping the data journalism debate through co-link based analysis
Based on the monthly top 10 most engaged posts, we made a list of the most popular URLs. Note that we created one list for each group and then merged the three lists into one. Our aim is to understand how the data journalism debate expanded from Facebook Groups to the Facebook platform. To do so, we categorized the URLs (e.g. course, article, tools) and checked the number of times they were shared on Facebook (chart below).
In 2013, Natália Mazotte (group admin of Jornalismo de Dados) shared a Planet Money webpage in order to exemplify a data journalism technique: scrollytelling. This link was shared on Facebook more than 40,000 times, which does not mean that all shares are linked to data journalism. We also noticed a significant increase in the debate about data journalism courses: in 2015 the number of shares (11,229) was more than double that of 2014 (4,556). Nevertheless, 2016 was not a representative year in terms of sharing data journalism issues; there was a reduction in the sharing of articles and in the number of URLs shared concerning tools.
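The aggregation behind this comparison (share counts summed per year and per URL category) can be sketched as follows. The record layout and all figures in the sample are hypothetical and serve only to show the computation; they are not the study's data.

```python
from collections import Counter


def shares_by_year_and_category(records):
    """Sum Facebook share counts for each (year, category) pair."""
    totals = Counter()
    for rec in records:
        totals[(rec["year"], rec["category"])] += rec["shares"]
    return totals


def year_over_year_ratio(totals, category, year, previous_year):
    """Ratio of a category's total shares in one year to a previous year."""
    return totals[(year, category)] / totals[(previous_year, category)]


# Hypothetical categorized URLs with made-up share counts.
records = [
    {"url": "https://example.org/course-a", "category": "course", "year": 2014, "shares": 40},
    {"url": "https://example.org/course-b", "category": "course", "year": 2015, "shares": 70},
    {"url": "https://example.org/course-c", "category": "course", "year": 2015, "shares": 30},
    {"url": "https://example.org/tool-a", "category": "tools", "year": 2015, "shares": 15},
]

totals = shares_by_year_and_category(records)
ratio = year_over_year_ratio(totals, "course", 2015, 2014)
```

Applied to the two course totals reported above, the same ratio is about 2.46, which is consistent with "more than double".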
The next step is to move the perspective from the Facebook platform to the Web. In this case we used the list of most engaged links generated by Jornalismo de Dados and applied an exploratory co-link analysis using IssueCrawler. The aim was to track down data journalism related actors across the Web (see co-link map below). First we mapped relevant non-profit organizations and initiatives linked to Journalism, Data Journalism, and Data, such as Open Knowledge International (Okfn), International Center for Journalists (ICFJ), The Data Journalism Handbook, Data World Bank, and School of Data. Then, we checked the pages that received the most links from the crawled population: Okfn, European Journalism Center (EJC), World Bank, and Data World Bank (World Bank Open Data). We can say these are influential pages that play a key role in what constitutes data journalism.
After exporting the file to Gephi, we found six clusters generated from the most engaged URLs of Jornalismo de Dados (see graph below): opendefinition.org, infogr.am, worldbank.org, data.worldbank.org, datadrivenjournalism.net, and ICFJ. The co-link network of each cluster can show associated connections concerning the subject of study or page authority within the network. For instance, Open Definition (pink cluster) is a free source for open data and content, and most of its inlinks came from datadryad.org, a digital repository that “makes the data underlying scientific publications discoverable, freely reusable, and citable” (Dryad, n.d.), and from dados.gov.br and br.okfn.org, Brazilian open data websites. Nevertheless, due to time constraints, a detailed analysis is left for further research.
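The idea of ranking pages by the links they receive can be shown with a small Python sketch. It is not IssueCrawler's implementation: it simply counts distinct linking hosts per target and keeps targets linked by at least two of them, a threshold assumed here for illustration. The three links to opendefinition.org are the ones named in the text; the remaining edge uses placeholder hosts.

```python
from collections import defaultdict


def inlink_counts(edges):
    """Count the distinct source hosts linking to each target host."""
    sources = defaultdict(set)
    for source, target in edges:
        if source != target:  # ignore self-links
            sources[target].add(source)
    return {target: len(srcs) for target, srcs in sources.items()}


def colinked(edges, min_sources=2):
    """Keep targets that receive links from at least min_sources distinct hosts."""
    return {t: n for t, n in inlink_counts(edges).items() if n >= min_sources}


edges = [
    ("datadryad.org", "opendefinition.org"),
    ("dados.gov.br", "opendefinition.org"),
    ("br.okfn.org", "opendefinition.org"),
    ("site-a.example", "site-b.example"),  # placeholder hosts, hypothetical
]

counts = inlink_counts(edges)
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
kept = colinked(edges)
```

Sorting the counts in descending order gives a simple authority ranking comparable in spirit to "pages that received the most links from the crawled population".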
Over the years the debate around data journalism evolved from a general interest in the subject (mainly demonstrated by the sharing of articles or existing tools) to the learning process and actual intervention in the data journalism field (by taking practical courses or learning to code and use software). In the same way, the most credible voices in Facebook Groups have shifted from a small group of experts (only group admins, mainly journalists, or course tutors) to a broader active group including not only group admins but also journalists, researchers, data analysts, and students. This is the current data journalism debate on Facebook Groups: sharing links about online courses or tools and software, or sharing personal outputs related to data journalism.
Two considerations arise concerning how the data journalism debate expands from Facebook Groups to the Facebook platform and the Web. First, 2015 was an important year for the data journalism debate, with a significant increase in the sharing of articles, courses and grant projects related to data journalism (more than 25,000 URL shares). Second, we were able to map data journalism non-profit organizations and initiatives on the Web, detecting key and influential actors that are not only interested in the data journalism debate but are constituent parts of data journalism.
These conclusions and findings summarize a share of what can be achieved by applying critical analytics (Rogers, 2016) to social media platforms, in particular Facebook Group Studies.
We can affirm that the data journalism debate on Facebook Groups is much more an act of sharing knowledge than an act of making comments. Many would assume that Facebook Group Studies should rely on comments because the concept of groups suggests discussion or dialogue. However, social media platforms introduce new forms of communication, which are not always represented by textual content. In this exploratory work we found that URLs speak louder than comments or textual grammars; links are the language of data journalism groups.
We should also consider the limitations of the present work, for instance our dataset sample: the content analysis was based on three groups, and the link-based analysis emerged from content generated by these groups. On the one hand, we have a framed perspective; on the other hand, we also present a medium research perspective that may shed light on Data Journalism and Facebook studies advanced by digital methods.
Note on Rosa and Omena (2015): the authors analysed two groups (one that gathers cinema lovers, and the other music lovers), basing the analytical process on three main steps: i) an overview of the group and user analysis; ii) content analysis according to users’ engagement (considering the type of post); iii) network analysis to visualise the interaction between members and publications.
Note on IssueCrawler: IssueCrawler is a hyperlink analysis tool that crawls “a seed list of websites, locate hyperlinks either between them or between them and beyond them, and map the interlinkings, showing uni-directional, bi-directional as well as the absence of linking between websites” (Rogers, 2017, p.12).
Digital Methods Initiative (2016, February 19). Otherwise Engaged: Critical Analytics and the Meanings of Engagement (video file). Lecture given at DMI Winter School, retrieved from https://www.youtube.com/watch?v=sNwl-qGrK7M&list=PLKzQwIKtJvv_cfzQWIaVCb1H60xaeDjkL
Dryad (n.d.). The Organization: Overview, available at http://datadryad.org/pages/organization accessed 26 January 2017.
Fiske, J. (1992). The Cultural Economy of Fandom, in: The Adoring Audience: Fan Culture and Popular Media, New York: Routledge, pp. 30-49
Omena, J.J. (2016, November 30). Potential scenarios for API Research, The Social Platforms, available at https://thesocialplatforms.wordpress.com/2016/11/30/potential-scenarios-for-api-research/ accessed 17 February 2017.
Rogers, R. (2016, January 13). Otherwise Engaged: Critical Analytics and the Meanings of Engagement (slides). Retrieved from http://www.slideshare.net/digitalmethods/richard-rogers-otherwise-engaged-critical-analytics-and-the-new-meanings-of-engagement-online
Rogers, R. (2017). “Digital Methods for Cross-platform Analysis: Studying Co-linked, Inter-liked and Cross-hashtagged Content,” in Jean Burgess, Alice Marwick and Thomas Poell (eds.) Sage Handbook of Social Media. London: Sage, forthcoming.
Rosa, J. and Omena, J.J. (2015). Nós na rede: Conexão e Participação em Dois Grupos do Facebook a partir dos Digital Methods. SopCom Congress, Coimbra-PT, 12-14 November 2015