SMART Data Sprint: What is the data journalism debate on social media?
23-27 January 2017, Universidade Nova de Lisboa | NOVA FCSH | iNOVA Media Lab
Project title: #PanamaPapers: 48 minutes on Twitter
Team members: Fabíola Mousinho | Ilo Aguiar
Facilitator: Janna Joceli Omena
The Panama Papers are 11.5 million leaked confidential documents that detail financial information about wealthy individuals and more than 214,488 offshore entities. The documents came from the records of Mossack Fonseca, a Panama-based law firm, and were leaked by a confidential source through encrypted channels to journalists. The documents cite many heads of state (e.g., of Argentina, Iceland, Saudi Arabia, Ukraine and the United Arab Emirates) and offshore companies that were being used for illegal purposes, including fraud, kleptocracy, tax evasion, and evading international sanctions.
The discussion about the Panama Papers was very intense, especially on Twitter. With approximately 319 million monthly active users (Frommer & Wagner, 2017), Twitter has become an important social media platform for spreading news and information (Bruns & Stieglitz, 2012). The hashtag “Panama Papers” not only created a conversational network; users also employed it to denounce and protest against corruption. The first publication about the leak appeared on April 3, 2016 in Süddeutsche Zeitung, a German newspaper, and a tweet by Edward Snowden, a former Central Intelligence Agency employee, had an enormous impact as well.
Against this background, our goal is to detect the main topics of discussion and identify the main actors around the hashtag “Panama Papers”. Data were collected through the Twitter Search API on April 5, 6 and 8, 2016. A total of 9,083 tweets were analyzed, which represents 48 minutes of debate about the Panama Papers.
- Considering that the sample was collected when the Panama Papers first emerged, can we identify whether the debate was related to the documents themselves or whether users were subverting the hashtag's original meaning?
- Regarding tweets’ mentions, which associated debate raises the most concern among users?
- Can small samples of data lead the analysis to significant insights?
» Research Strategy
In order to identify the debate around the hashtag “Panama Papers”, the research strategy was divided into two steps:
- i) Tweets analysis according to Mentions (top 10) and Favorites (top 10)
- ii) Analysis of retweets:
- First step: all retweets were organized in a list according to engagement
- Second step: the retweet list was divided into three groups (high, medium and low) according to engagement level
- Third step: a small sample of each group was analysed
- retweets with high level of engagement (from 21 RTs)
- retweets with medium level of engagement (from 11 to 20 RTs)
- retweets with low level of engagement (from 1 to 10 RTs)
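The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field name `retweets`, the tweet ids, and the sample counts are hypothetical and are not taken from our dataset; only the three thresholds (1 to 10, 11 to 20, 21 or more RTs) come from the strategy itself.

```python
def engagement_level(retweet_count: int) -> str:
    """Map a retweet count to the engagement level used in this study."""
    if retweet_count >= 21:
        return "high"
    if retweet_count >= 11:
        return "medium"
    if retweet_count >= 1:
        return "low"
    return "none"  # tweets never retweeted fall outside the three groups


def group_by_engagement(tweets):
    """Order tweets by retweet count (descending), then split them into groups."""
    groups = {"high": [], "medium": [], "low": []}
    ordered = sorted(tweets, key=lambda t: t["retweets"], reverse=True)
    for tweet in ordered:
        level = engagement_level(tweet["retweets"])
        if level in groups:
            groups[level].append(tweet)
    return groups


# Hypothetical sample: ids and counts are made up for illustration.
sample = [
    {"id": "t1", "retweets": 3},
    {"id": "t2", "retweets": 103},
    {"id": "t3", "retweets": 15},
    {"id": "t4", "retweets": 0},
]
groups = group_by_engagement(sample)
print({level: [t["id"] for t in items] for level, items in groups.items()})
```

Because the list is sorted before it is split, each group keeps its tweets in descending order of engagement, which makes it straightforward to take the top entries of each group.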
Favoriting a tweet, which is similar to the “like” button on Facebook, may have different meanings for Twitter users. Gorrell & Bontcheva (2016) identified five categories of favorites usage (i.e., like, bookmark, thanks, conversational, and self-promotion). However, since it is not possible to ensure that the tweets were favorited for one of these purposes, this study only highlights the 10 most favorited ones.
Twitter mentions usually have low value as an impact measure, since tweets are easily manipulated (e.g., by bots) (Haustein et al., 2016). Even so, scholars use Twitter mentions to examine communities and the conversation within and between them (Chorley & Mottershead, 2016; Nelhans & Lorentzen, 2016). In this paper, Twitter mentions were used to identify the most mentioned accounts and associated actors (@mention).
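As an illustration of how the most mentioned accounts can be ranked, the sketch below counts @mentions in a list of tweet texts. It is not the procedure of the tool we used; the regular expression, the function name `top_mentions`, and the example texts are assumptions made for this example. Handles are lowercased because Twitter usernames are case-insensitive.

```python
import re
from collections import Counter

# An "@" not preceded by a word character (to skip e-mail addresses),
# followed by up to 15 letters, digits or underscores.
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")


def top_mentions(texts, n=10):
    """Return the n most mentioned handles as (handle, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(handle.lower() for handle in MENTION_RE.findall(text))
    return counts.most_common(n)


# Hypothetical tweet texts, for illustration only.
example_texts = [
    ".@JeremyCorbyn calls for investigation #PanamaPapers",
    "@David_Cameron must explain #PanamaPapers",
    "Still waiting, @david_cameron #PanamaPapers",
    "Write to me at someone@example.com #PanamaPapers",
]
print(top_mentions(example_texts, n=2))
```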
Retweets (RTs) occur when other tweets are reposted or quoted. Normally, RTs denote popularity: the more retweets a tweet receives, the more popular it is (Litou et al., 2016; Hong et al., 2011). The 10 most retweeted tweets from each group (high, medium and low level of engagement) were highlighted in the Findings.
» Data Collection
We used Twitonomy for data extraction; this tool makes calls to the Twitter Search API. Over three days we collected a total of 9,083 tweets, which represent 48 minutes of activity. The sample was collected on:
April 5 » 11:29:51 – 11:36:23
April 6 » 08:29:51 – 08:47:29
April 8 » 09:57:04 – 10:20:51
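The 48-minute figure can be checked by summing the three collection windows listed above. The short sketch below does this arithmetic; the timestamps are the ones reported, while the date format and the function name `total_duration` are choices made for this example.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Start and end of each collection window, as reported above.
WINDOWS = [
    ("2016-04-05 11:29:51", "2016-04-05 11:36:23"),
    ("2016-04-06 08:29:51", "2016-04-06 08:47:29"),
    ("2016-04-08 09:57:04", "2016-04-08 10:20:51"),
]


def total_duration(windows):
    """Sum the lengths of all (start, end) windows."""
    total = timedelta()
    for start, end in windows:
        total += datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return total


total = total_duration(WINDOWS)
print(total)  # 0:47:57, i.e. roughly 48 minutes
```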
We assumed that the most favorited tweets indicate the dominant voice, which was represented by people and institutions related to the media, such as newspapers, journalists and a humoristic blog.
The main topics of discussion were directly related to the Panama Papers scandal; most of the tweets shared links to newspaper articles or, in one specific case, a GIF related to the original sense of the hashtag (see image below).
Some of the most favorited tweets mentioned people who were directly involved in the scandal or suspected of involvement, for instance: Pedro Almodóvar, a famous film director and screenwriter, who was accused of having an offshore company in the British Virgin Islands; Barbara D’Urso, an Italian actress also linked to an offshore company; and Sigmundur David Gunnlaugsson, Iceland's prime minister, who resigned after his name came up in the Panama Papers scandal (which revealed that he had been using an offshore company to hide investments worth millions of pounds).
An interesting feature of the most favorited tweets is their language: in the top 10 we identified five different languages, namely French, Italian, Spanish, German and English (see table below). This variety of languages indicates not only a global debate, but also the positioning of key and influential actors, such as the French site and newspaper Le Monde Diplomatique (@lemondefr); Der Humor-Austicker (@SatireFrosch), a German humoristic blog; and the Argentinian journalist Marcelo Bonelli (@BonelliOK).
5,630 of the 9,083 tweets collected were retweeted, which represents 62% of the total. Of these, 99% (5,572 RTs) were classified as retweets with low level of engagement (from 1 to 10 RTs); 0.8% (44 RTs) were classified as retweets with medium level of engagement (from 11 to 20 RTs); and 0.2% (14 RTs) were classified as retweets with high level of engagement (from 21 RTs).
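The shares reported above follow directly from the counts. The sketch below reproduces the arithmetic; the counts are the ones reported, and the helper name `share` is introduced only for this example.

```python
TOTAL_TWEETS = 9083
RETWEETED = 5630
GROUPS = {"low": 5572, "medium": 44, "high": 14}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# The three groups should add up to the number of retweeted tweets.
assert sum(GROUPS.values()) == RETWEETED

print(f"retweeted: {share(RETWEETED, TOTAL_TWEETS):.0f}% of all tweets")
for level, count in GROUPS.items():
    print(f"{level}: {share(count, RETWEETED):.1f}% of retweeted tweets")
```

Note that the 62% is computed over all 9,083 tweets, whereas the group percentages are computed over the 5,630 retweeted tweets.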
In the retweets with low level of engagement, tweets originated from ordinary users and the most recurrent topics were, in order, David Cameron (then Prime Minister of the United Kingdom), Nigeria, Iceland, Argentina’s President Mauricio Macri and Russian President Vladimir Putin. In the retweets with medium level of engagement, the tweets are mostly informative and come from professional users (e.g., journalists) and entities (e.g., media organizations, NGOs). Finally, in the retweets with high level of engagement the volume is so small that no pattern emerges. The following table shows the most retweeted tweets.
| Twitter account | Tweet | Number of Retweets |
|---|---|---|
| @AymericOff | Et planquer du pognon dans des sociétés off shore au Panama. #PanamaPapers Voilà le vrai visage de ce parti. | 103 |
| @BBCBreaking | .@JeremyCorbyn calls for investigation into tax of all Britons linked to #PanamaPapers http://bbc.in/1TyCuXT | 95 |
| @Organic_Dawn | A gathering at Downing Street tomorrow noon until Cameron resigns #resignCameron #London #DavidCameron #PanamaPapers | 68 |
| @Snowden | The next 24 hours could change #Britain. https://t.co/x5e1YOJenx | 63 |
| @kerviel_j | Je souhaite une bien belle semaine @SocieteGenerale ainsi qu’à ses conseils. #panamapapers #PanamaLeaks | 42 |
| @Snowden | #UK Twitter right now: “Let’s hope Cameron resigns.” With respect, hope is not a strategy. #PanamaPapers https://twitter.com/b_judah/status/718039469736202241 | 42 |
| @lemondefr | #PanamaPapers Le marchand d’art cachait le Modigliani spolié par les nazis derrière une société offshore | 34 |
| @OwenJones84 | It’s a national embarrassment if we don’t protest over the #PanamaPapers. Protest this Saturday: close the loopholes https://t.co/GZSvtqhK21 | 31 |
| @jaraparilla | Compare: 1. Encryption? You must be hiding something. 2. Tax Haven Account? Could be totally legit. #PanamaPapers | 29 |
| @wikileaks | #PanamaPapers Putin attack was produced by OCCRP which targets Russia & former USSR and was funded by USAID & Soros. | 27 |
Retweets with high level of engagement
Regarding the top 10 mentions, we found that tweets were addressed to the profiles of media outlets or politicians. Across the 9,083 tweets, the most mentioned account was @David_Cameron, the former Prime Minister of the United Kingdom. He was one of the names revealed by the investigation, and the tweets directed at him had an accusatory tone. Other profiles were also accused, such as @mauriciomacri, an Argentinian politician; @SocieteGenerale, one of the most influential banks in Europe; and @HillaryClinton, an American politician. These examples show how Twitter was used as a channel for denunciation.
We can affirm that the use of the hashtag “Panama Papers” was very political and that the conversation network around this tag was based on sharing news or tweets by public figures. Turning to the research questions about favorited tweets and mentions, we noticed that the original meaning of the hashtag was maintained. In the top 10 favorites and top 10 mentions, the associated debate was political and concerned the documents or news about the subject. However, this does not mean that all the tweets can be framed as serious; we also detected humorous and ironic tweets, which gave the hashtag a broader sense (see an example below).
Tweet mentions reveal politicians (either accused or under suspicion) and the main media channels, which were mainly sharing links to documents or news about the Panama Papers.
Although most tweets and retweets using the hashtag Panama Papers kept the original sense, there are cases of subversion, such as: “Jobs! Jobs! Jobs! Hiring immediately! Compensation commensurate with ability to influence Subscribe #panamapapers for new job openings”; and “Download Bible App and Customize your Reading Option Here: [link] #PanamaPapers”. These two examples depict a common practice on Twitter: using trending hashtags to advertise something that is not tied to the original subject. This practice is considered spam by users and scholars: “spammers do not drive the trending topics in Twitter, but instead opportunistically target topics with desirable qualities” (Stafford & Yu, 2013).
62% of all tweets collected are retweets, which indicates that only a little over a third of the content using the hashtag Panama Papers is original. 99% of retweeted tweets received fewer than 11 RTs and came mostly from personal users sharing news and bad-mouthing politicians cited in articles on the subject. The 1% of tweets that received 11 or more RTs came mostly from news agencies and public figures, such as journalists and the former Central Intelligence Agency employee Edward Snowden, which points to media actors as the most popular actors during the period of our data collection.
Since the data sample was very modest (a few minutes on each of three days, April 5, 6 and 8, totaling 48 minutes and 9,083 tweets), it is not advisable to extrapolate conclusions. The hashtag Panama Papers was among Twitter's Trending Topics throughout the data collection, and the main actors and topics discussed during those days were consistently covered in this paper. Thus, it is fair to say that this is a snapshot of the debate on the Panama Papers from 5-8 April.
Retweets with high level of engagement are usually few, but such a small number may be explained by the lack of virality of the subject (Goel et al., 2015).
Bruns, A., Stieglitz, S. (2012). Quantitative approaches to comparing communication patterns on Twitter. Journal of Technology in Human Services, 30(3–4), 160–185.
Chorley, M. J., Mottershead, G. (2016). Are You Talking to Me? Journalism Practice, 10(7), 856-867. DOI: 10.1080/17512786.2016.1166978
Frommer, D., Wagner, K. (2017). Twitter only grew by two million users during Trump mania Facebook grew by 72 million. Available: http://www.recode.net/2017/2/9/14558890/trump-twitter-user-growth. Accessed 21 February 2017.
Goel, A., Munagala, K., Sharma, A., Zhang, H. (2015). A Note on Modeling Retweet Cascades on Twitter. In: Gleich D., Komjáthy J., Litvak N. (eds) Algorithms and Models for the Web Graph. Lecture Notes in Computer Science, vol 9479. Springer, Cham.
Gorrell, G., Bontcheva, K. (2016). Classifying Twitter favorites: Like, bookmark, or Thanks?. J Assn Inf Sci Tec, 67: 17–25. DOI:10.1002/asi.23352
Haustein, S., Bowman, T. D., Holmberg, K., Tsou, A., Sugimoto, C. R. & Larivière, V. (2016). Tweets as impact indicators: examining the implications of automated “bot” accounts on Twitter. Journal of the Association for Information Science and Technology, 67(1), 232-238.
Hong, L., Dan, O., Davison, B. D. (2011). Predicting popular messages in Twitter. In Proceedings of the 20th international conference companion on World wide web (WWW ’11). ACM, New York, NY, USA, 57-58. DOI: http://dx.doi.org/10.1145/1963192.1963222
Litou I., Kalogeraki V., Gunopulos D. (2016). On Topic Aware Recommendation to Increase Popularity in Microblogging Services (Short Paper). In: Debruyne C. et al. (eds) On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol 10033. Springer, Cham.
Nelhans, G., Lorentzen, D. G. (2016). Twitter conversation patterns related to research papers. Information Research: An International Electronic Journal, v21 n2 Jun 2016.
Stafford, G., Yu, L. L. (2013). An Evaluation of the Effect of Spam on Twitter Trending Topics. In International Conference on Social Computing (SocialCom), pp. 373-378, 8-14 Sept. 2013.