The popularization of social networks has changed the way today’s society communicates and the way individuals follow and report trends in different areas of interest. Many of these networks contain a huge amount of data and links that can be leveraged to analyze which thematic areas attract the most interest and which accounts act as references in spreading that information.
With this idea in mind, we developed a study of the Twitter social network to get a snapshot of how the topic of aging is treated among users interested in it. Social networks can be considered structures made up of nodes, representing individuals or organizations, and the relationship links between them. The analysis therefore aims to detect the most popular trends or topics that attract users’ interest in the field of aging and, in a second part, to analyze the sentiment polarity that users express when dealing with these issues. A further goal is to detect the influential users, or networks of influential users, that concentrate the flow of information within the social network.
To carry out this analysis, Twitter was chosen as the data source because it is widely used and its dissemination mechanics are simple, so exploring it is accessible and yields a sample large enough to be statistically relevant. The analysis focuses on the Spanish-language content of the social network and on users with an interest in the areas of “aging, longevity and the elderly”.

Using the Twitter API through the advertools library for Python, some 10,000 tweets dated December 2019 were downloaded.

Note that Twitter’s own API limits the download of tweets to the 7 days before the query, so the analysis focuses on December 2019, when the bulk download was done.

From this raw dataset, we clean the data and keep only the fields most useful for the network graph analysis.

Next we add a synthetic indicator that flags the records that are retweets (a Twitter entry in which a user’s comment, or tweet, is re-sent by another user), and we treat each retweet as a mention of the original author.
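As an illustration, the retweet indicator could be sketched as follows. This is a minimal sketch that assumes retweets are recognized by the classic "RT @user:" prefix in the tweet text; the original code may well rely on a field of the API response instead. The sample records and the names `retweeted_user` and `is_retweet` are hypothetical.

```python
import re

# Classic retweets start with "RT @user:"; the group captures the original author.
RETWEET_PATTERN = re.compile(r"^RT @(\w+):")


def retweeted_user(tweet_text):
    """Return the screen name being retweeted, or None for an original tweet."""
    match = RETWEET_PATTERN.match(tweet_text)
    return match.group(1) if match else None


# Hypothetical sample records (not taken from the downloaded dataset).
tweets = [
    {"user": "ana", "text": "RT @carlos: El envejecimiento activo importa"},
    {"user": "luis", "text": "Nuevo informe sobre longevidad"},
]

for tweet in tweets:
    original_author = retweeted_user(tweet["text"])
    # Synthetic indicator: True when the record is a retweet.
    tweet["is_retweet"] = original_author is not None
    # The retweeted author is treated as a user mentioned by this record.
    tweet["retweeted_user"] = original_author
```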

In addition, a user can mention several other users within the same tweet text, so we extract each of those mentions into a separate record in order to capture all possible relationships for the network analysis.
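Splitting one tweet into one record per mention can be sketched like this. It is only an approximation under stated assumptions: mentions are found with a simple `@name` regular expression (which would also pick up things like e-mail domains), and the function name `mention_edges` and the sample text are hypothetical.

```python
import re

# A mention is an "@" followed by letters, digits or underscores.
MENTION_PATTERN = re.compile(r"@(\w+)")


def mention_edges(author, tweet_text):
    """Return one (author, mentioned_user) pair per distinct mention in the text."""
    mentioned = MENTION_PATTERN.findall(tweet_text)
    # dict.fromkeys drops repeated names while keeping their first-seen order.
    unique_names = dict.fromkeys(mentioned)
    # Self-mentions are skipped so that they do not create loops in the graph.
    return [(author, name) for name in unique_names if name.lower() != author.lower()]


# Hypothetical example: one tweet mentioning two users, one of them twice.
edges = mention_edges("ana", "Gracias @carlos y @maria por el informe, @carlos")
```

Each pair becomes one row of the edge list, so a tweet with several mentions contributes several relationships to the network.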

The graph analysis aims to identify major or emerging communities in which users are most strongly tied to each other, to see how the different communities interact, what they have in common and how they differ, and to determine which ones are most influential and which topics they are potentially most interested in. Our analysis focuses on identifying the key influencers within the communities detected.

To configure the network, we select the variables that define the nodes and the relationships between node pairs.

We clean the variable holding the mentioned user by removing the leading “@”, which makes the labels cleaner when analyzing the graph.

We build the graph that will be analyzed in Gephi.
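One simple way to hand the relationships over to Gephi is an edge-list CSV with `Source` and `Target` columns, which Gephi’s spreadsheet import recognizes. The sketch below is an assumption about the hand-off format, not necessarily how the original project exported the graph; the edges and the helper name `write_edge_list` are hypothetical.

```python
import csv
import io

# Hypothetical (user, mentioned_user) pairs; in practice these come from the tweets.
edges = [("ana", "carlos"), ("ana", "maria"), ("luis", "carlos")]


def write_edge_list(edge_pairs, handle):
    """Write a header row plus one 'Source,Target' row per edge."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edge_pairs)


# An in-memory buffer keeps the example self-contained; a file opened with
# open("edges.csv", "w", newline="") would work the same way.
buffer = io.StringIO()
write_edge_list(edges, buffer)
edge_csv = buffer.getvalue()
```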

We take a first look at the parameters that characterize the network. It has 6986 nodes (Twitter users) and 7823 links, or mentions of other users.

The graph turns out not to be connected. Users who publish without mentioning anyone, and whom no one mentions, form islands that do not belong to any community, so it is not possible to reach every user starting from any other user, which is what a connected graph would require.
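Whether a graph is connected can be checked by grouping its nodes into components and counting them. The following is a minimal sketch that ignores the direction of the mentions (weak connectivity) and uses a breadth-first search; the node names and the function `connected_components` are hypothetical and written here only for illustration.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into components, ignoring the direction of each edge."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


# Hypothetical toy network: "luis" and "sofia" mention no one and are never mentioned.
nodes = ["ana", "carlos", "maria", "luis", "sofia"]
edges = [("ana", "carlos"), ("maria", "carlos")]

components = connected_components(nodes, edges)
is_connected = len(components) == 1
```

In this toy network the two isolated users form their own single-node components, so the graph is not connected.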

The density of the graph is low, so it is a sparse graph: the number of edges joining the nodes is well below the maximum number that would exist if every user were linked to every other user. This suggests that the topic attracts very diverse users, in many cases disconnected from the rest, so it is reasonable to expect several communities or networks of users that are largely disconnected from each other, without large concentrations around particular topics of interest.

Another interesting metric is the maximum degree. The node with the largest number of edges in this network, counting both mentions received and mentions made, has 1395 edges, which is 14% of the edges it could potentially have if the entire network were connected. So although the network is not highly connected, that node or user is a focus of interest, and the subject matter it discusses is potentially of high value. The mean degree is 2.2, which means that on average each user is connected to about two other users.
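The density and mean degree can be reproduced from the node and link counts reported for this network. This is a worked sketch under two assumptions that are not stated in the analysis itself: the graph is treated as directed without self-loops, and the mean degree counts each link at both of its endpoints.

```python
nodes = 6986   # Twitter users in the network
links = 7823   # mentions between users

# Maximum number of directed links if every user mentioned every other user.
max_directed_links = nodes * (nodes - 1)

# Density: share of the possible links that actually exist (roughly 0.00016).
density = links / max_directed_links

# Mean degree: every link adds one to the degree of both of its endpoints (about 2.24).
mean_degree = 2 * links / nodes
```

A density this far below 1 is what makes the graph sparse, and the mean degree rounds to the 2.2 quoted above.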

We now try to identify the most influential users in the network for this subject matter.

We can look at the top 5 users who receive the most mentions from other users within the tweets, which suggests that they have credibility among the community of users interested in these topics.
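Ranking users by mentions received amounts to counting how often each user appears as the target of a link. A minimal sketch with hypothetical edges:

```python
from collections import Counter

# Hypothetical (user, mentioned_user) pairs standing in for the real edge list.
edges = [
    ("ana", "carlos"),
    ("luis", "carlos"),
    ("maria", "carlos"),
    ("ana", "maria"),
    ("luis", "sofia"),
    ("carlos", "sofia"),
]

# Count how many times each user is the target of a mention (in-degree).
mentions_received = Counter(target for _, target in edges)

# The five most mentioned users, most mentioned first.
top_mentioned = mentions_received.most_common(5)
```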

We could also highlight the top 5 most relevant users in the network, i.e. the nodes with the greatest ability to influence the rest of the network.

With this first data, let’s take a closer look at the network using the graph obtained from its analysis in Gephi.
Once the network is configured, and after applying the OpenOrd and ForceAtlas2 force-directed layout algorithms, we get the following configuration of the analyzed social network.

If we filter by the modularity class of the nodes, we get 5 main communities of users, or subnetworks, whose members interact more with each other. Labeling these communities and filtering them by their influence on the network, we get:

We will focus on the most relevant communities within the network and detect which users are the influencers within the thematic field analyzed.

Red community

Undoubtedly the most clearly defined community, and the one with the largest number of followers, is the one shown in red. Its nerve center is the user @sninobecerra, the Twitter profile of the economist Santiago Niño Becerra, professor of ‘Economic Structure’ at IQS, Ramon Llull University in Barcelona, who has 184,000 followers. If we process the text to remove words that carry no information, using the “stopwords” corpus available in the Python nltk library, and plot the result as a word cloud to improve visualization, we find that the topic of most concern in this high-influence network is the economic side of population aging, specifically aspects related to low birth rates, labor productivity and the effect on national GDP.

Red Community word cloud
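The word counts behind such a cloud can be sketched as below. To keep the example self-contained it swaps the nltk stopword corpus for a tiny handwritten Spanish stopword set, which is an assumption made only for illustration; the sample texts and the function name `word_frequencies` are hypothetical as well.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; the analysis itself uses the nltk corpus.
STOPWORDS = {"de", "la", "el", "y", "en", "que", "los", "las", "del", "un", "una"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count the words of all texts, ignoring case and dropping stopwords."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters, including Spanish accented characters.
        words = re.findall(r"[a-záéíóúüñ]+", text.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts


# Hypothetical tweet texts from one community.
texts = [
    "La baja natalidad y el envejecimiento de la población",
    "El envejecimiento y la productividad",
]

frequencies = word_frequencies(texts)
```

The resulting counts are the kind of input a word cloud needs: the more often a word appears, the larger it is drawn.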
Green Community

If we focus on the Green Community, which we could consider the second most relevant hub, we see that its flow of information is also led by a single node: the user @elbotiquinmx, with 36,000 followers, which corresponds to a Mexican publication dedicated to health and wellness prevention issues. If we look again at the most repeated words across all the tweets of the green community, the topics center on the debate about when aging begins and on healthy habits to fight it.

The purple community appears very isolated, with few edges or links joining it to the rest of the network, which suggests that its theme may not be directly related to the field at hand. Analyzing its most influential node, we find that the conversation concerns the recent protests against Evo Morales and his aged appearance, so we dismiss that community for the purposes of this analysis.

Finally, we review the orange and blue communities, which are less influential in the network as a whole but stand out from the rest of the subnetworks. In the Orange community, two influential nodes stand out: one comments on aspects related to exercise and is connected, through bridge nodes, to another that focuses on nutrition and how it influences aging.

The Blue Community is more dispersed, but has more links to the other nodes, so the topics discussed are probably broader and not focused on a single theme. Several influential nodes can be seen here, among them @GrandesAmigos_, an NGO dedicated to preventing loneliness in the elderly, and @FPilares, a foundation that supports dignity throughout life.

Orange Community
Blue Community

Project website on GitHub: GitHub
