Centrality measures as a signature of roles in Rousseau’s Les Confessions

Here is a blog-formatted version of the paper that won the Best Paper Award at the first Texas Digital Humanities Conference (TxDHC), held in Houston this spring (2014). The work was done jointly with Prof. Frédéric Kaplan and Cyril Bornet. The text of Les Confessions is available in [French] or [English].

In this work, we investigate how a selection of centrality measures can be used to differentiate the roles of characters in Jean-Jacques Rousseau’s autobiography Les Confessions. We define methods to automatically build a network of characters based on their co-occurrences. In the resulting network, each character is a node connected to the nodes representing other characters. We rank the characters with three centrality measures and find that the ordering differs from one measure to another. We highlight how characters with high betweenness centrality tend to play positive roles in the narrative, as they act as important mediators and facilitators of Rousseau’s social life. Conversely, we show that characters with high eigenvector centrality form a cluster of interchangeable figures, acting in practice like a “meta-character”: a crowd that conspires against Rousseau. Although we cannot yet generalise these findings to other works, we argue that these preliminary results motivate further research based on well-chosen centrality measures in digital literary studies.

The data and the scripts are available here.

Introduction

In digital literary studies, applying network analysis to literature usually consists of studying the influence of given novels or authors on other works over time (Jockers, 2013; So and Long, 2013). Network analysis is also sometimes used within a single novel to understand, for instance, how the relationships between characters evolve (Agarwal, 2010; Moretti, 2011; Mac Carron and Kenna, 2012). In this post, we use networks to model the structure of relations between the characters of the autobiographical novel Les Confessions. This approach lets us analyse the proximity and influence of characters on one another, and identify the roles they play throughout the narrative and its various story arcs. We build the network from an index of characters compiled by scholars (Rousseau, 2012). Using the index lets us bypass the text-mining problem of linking words in the text to the corresponding named entities (Elsner, 2012; Elson, 2012) and address a different research objective: understanding how network analysis measures can characterise narrative roles in a novel.

This post begins with a preliminary investigation of centrality measures for the study of character networks. Centrality is a family of indices defined on networks that measure forms of importance based on properties of the network structure (Koschützki et al., 2005). This concept comes from social network analysis (Bavelas, 1948), which applies graph theory to relational data in social groups (Wasserman and Faust, 1994). Various centrality measures exist. Degree centrality is based on the number of connections of each node. Betweenness centrality measures the role of a node in terms of global connectivity in a network (Freeman, 1978). Eigenvector centrality computes the centrality of a node on the basis of the centrality of its neighbouring nodes (Bonacich, 1987).

The core contribution of this post is to discuss how these measures relate to narrative roles. We show how, in Les Confessions, characters with high betweenness centrality differ from characters with high eigenvector centrality: the former play a positive role as intermediaries in Rousseau’s early life, while the latter are perceived as a cluster of negative characters conspiring against him later in his life.

In the following sections, we describe in detail the method we use to construct the co-occurrence network from the index and discuss the robustness of our approach. We first discuss the general properties of the network, then focus more specifically on ranking characters by centrality. We conclude with the literary interpretation of these findings and the motivation for applying these measures beyond the particular case of Rousseau’s Les Confessions.

From an index of characters to a network

An index of characters consists of entries with at least two fields: the name of a character and all the pages on which that name occurs. The index is expected to group all existing forms of a character’s name (for example, M. Dupont and Jean Dupont if they are the same character). Les Confessions is composed of twelve chapters, written in two periods (one to six, then seven to twelve), and spans fifty years of Rousseau’s life. In the edition we used, the index contains 583 entities and 2088 occurrences; 774 pages contain text, of which 102 contain no name.

Our method is based on co-occurrences of characters on the same page and on consecutive pages. The fact that two names appear on the same page does not necessarily imply that the characters are linked in any way, but recurring co-occurrences suggest a narrative bond between them. This is why we combine a rather flexible co-occurrence strategy (which also counts co-occurrences spanning consecutive pages) with a threshold that separates recurring associations from random ones.

In other words, we define (1) a system that takes co-occurrences on consecutive pages into account when creating links: the domain is not the set of pages but a set of overlapping couples of consecutive pages, and we weight each link by counting co-occurrences. (2) We then require two characters to appear together at least two times, closely or not, for a link to be inferred in the network. This minimum-intensity condition requires a threshold to decide whether a link is created. We define these two steps more formally below.

(1) The set of occurrences of any given character is defined on couples of pages instead of single pages. Let A be a character. If A appears on page i, we consider that A occurs on both couples of pages {i−1, i} and {i, i+1}. We build such a set of occurrences for each character. The intersection of two such sets gives the co-occurrences between the two characters, and its cardinality estimates the intensity of the link between them. Thus, a co-occurrence of two characters on the same page increases the intensity of their link by two, while a co-occurrence on two consecutive pages increases it by one. Overlapping couples of pages let us treat proximity as smooth and distributed. For comparison, applying this method to the index generates 4919 edges instead of 2415, for 583 nodes.
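Step (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scripts released with the paper: it assumes the index is available as a dictionary mapping each character (here hypothetical labels A to D) to the pages on which the name occurs, and it encodes each couple of pages {i−1, i} as the tuple (i − 1, i). The page numbers are invented.

```python
from itertools import combinations


def page_pairs(pages):
    """Overlapping couples of pages occupied by a character.

    A character on page i is counted on both {i-1, i} and {i, i+1},
    encoded here as the tuples (i - 1, i) and (i, i + 1).
    """
    pairs = set()
    for i in pages:
        pairs.add((i - 1, i))
        pairs.add((i, i + 1))
    return pairs


def link_intensities(index):
    """Intensity of each link = size of the intersection of two characters'
    sets of page couples. Pairs with zero intensity are omitted."""
    occurrences = {name: page_pairs(pages) for name, pages in index.items()}
    weights = {}
    for a, b in combinations(sorted(occurrences), 2):
        shared = len(occurrences[a] & occurrences[b])
        if shared > 0:
            weights[(a, b)] = shared
    return weights


# Hypothetical toy index: character label -> pages on which the name occurs.
toy_index = {
    "A": [10, 11, 40],
    "B": [40, 41],
    "C": [11, 60],
    "D": [12],
}
print(link_intensities(toy_index))
```

Here A and B share page 40 and A and C share page 11, so each of those links gets intensity 2, whereas D (page 12) only sits next to A and C (page 11), giving intensity 1.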

(2) We select a threshold value of 3 to distinguish relevant associations from noisy ones. Here are examples of the three possible cases with intensity equal to three: (a) the two characters co-occur once on the same page and once on a separate couple of consecutive pages; (b) they both appear on the same two consecutive pages; (c) they co-occur three times on consecutive pages, never on the same page. Among the links with intensity equal to three in our corpus, we recorded (c1) 75 all successive, (c2) 109 mixed, and (c3) 6 all disjoint. We give examples of each case in our thesis (to be linked here at the end of 2014, when it becomes publicly available).
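The threshold of step (2) and the three cases above can be checked with the same encoding of page couples. In this sketch the page numbers are invented for illustration, `intensity` and `keep_links` are hypothetical helper names (not part of the published scripts), and a link is assumed to be kept when its intensity is at least 3.

```python
def page_pairs(pages):
    """Overlapping couples (i - 1, i) and (i, i + 1) for every page i."""
    return {(i - 1, i) for i in pages} | {(i, i + 1) for i in pages}


def intensity(pages_a, pages_b):
    """Link intensity: number of page couples shared by two characters."""
    return len(page_pairs(pages_a) & page_pairs(pages_b))


THRESHOLD = 3  # minimum intensity for a link to be kept


def keep_links(weights, threshold=THRESHOLD):
    """Drop links whose intensity is below the threshold."""
    return {link: w for link, w in weights.items() if w >= threshold}


# Invented page lists for two characters, one entry per case.
cases = {
    "a": intensity([20, 50], [20, 51]),          # same page once, adjacent pages once
    "b": intensity([30, 31], [30, 31]),          # both on the same two consecutive pages
    "c": intensity([10, 20, 30], [11, 21, 31]),  # three adjacent-page co-occurrences
    "same page only": intensity([20], [20]),     # a single same-page co-occurrence
}
print(cases)
print(keep_links(cases))
```

A single co-occurrence on the same page yields intensity 2 and is therefore filtered out, while each of the three cases reaches exactly 3 and survives.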

The network. Nodes in red are characters appearing in parts one and two.

Although we are working on his autobiography, this network is not “Rousseau’s social network”: it is a computational model derived from Les Confessions (Moretti, 2011). In the following sections, we use it for literary analysis, independently of any historical reality.

General characteristics of the network

The network is undirected. It has 226 vertices and 613 edges. These 226 vertices represent about 39% of the 583 characters in the index: since our model is concerned only with relational data and the main narrative, characters whose occurrences are too sparse to form links above the threshold are not retained by our method. To deal with the whole network, we need measures that can handle weighted links. At this stage, the network is disconnected: the giant component contains 216 of the 226 nodes. We focus on these 216 nodes and the 608 edges of the giant component. This is a critical step, as it allows us to use measures that behave better on connected graphs, such as betweenness centrality. From now on, we refer to the giant component as the character network of Les Confessions.
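Extracting the giant component only requires a breadth-first search over the links that survive the threshold. The sketch below assumes the network is given as a list of nodes and a list of undirected edges, a hypothetical representation chosen for illustration, and uses an invented toy network.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def giant_component(nodes, edges):
    """Keep the largest component and the edges lying inside it."""
    giant = max(connected_components(nodes, edges), key=len)
    kept_edges = [(a, b) for a, b in edges if a in giant and b in giant]
    return giant, kept_edges


# Hypothetical toy network: a triangle, a separate pair, and an isolated node.
toy_nodes = ["A", "B", "C", "D", "E", "F"]
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
giant, kept = giant_component(toy_nodes, toy_edges)
print(sorted(giant), kept)
```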

Centrality measures

Centrality is a concept from social network analysis. Most of the time, it is expressed by a mathematical index, or a family of indices, measuring “structural advantage, importance or dominance” (Hennig et al., 2012). Among the numerous choices, classic ones are degree, betweenness, closeness and eigenvector indices. Here we do not use closeness.

Degree centrality is a measure originating in graph theory (Berge, 1958). It is the number of edges incident to a given node, which is also the number of its neighbours. The more connected a node is, the larger its direct neighbourhood and the higher its degree centrality. Degree measures popularity: a high degree centrality implies that the character appears on pages on which many different other characters occur in total. It is not influenced by the structure of the network beyond a node’s immediate neighbours.
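As a minimal sketch, assuming the network is stored as a list of undirected edges and treated as unweighted (link intensities are ignored here), degree centrality is simply a count of incident edges. The optional division by n − 1, the largest possible degree, is a common normalisation, not necessarily the one used for the rankings below; the toy network is invented.

```python
def degree_centrality(edges, normalise=False):
    """Count the edges incident to each node of an undirected simple graph.

    If normalise is True, divide by n - 1 (the maximum possible degree),
    where n is the number of nodes that appear in at least one edge.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    if normalise and len(degree) > 1:
        n = len(degree)
        return {node: d / (n - 1) for node, d in degree.items()}
    return degree


# Hypothetical toy network: W is linked to three characters, A and B to each other.
toy_edges = [("W", "A"), ("W", "B"), ("W", "C"), ("A", "B")]
ranking = sorted(degree_centrality(toy_edges).items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)  # [('W', 3), ('A', 2), ('B', 2), ('C', 1)]
```

Sorting by decreasing score, with ties broken alphabetically, is one simple way to produce a top-ten list like the ones shown in the next section.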

Betweenness centrality measures the control an actor has over the information flowing in the network. Mathematically, the betweenness of a node is the sum, over all pairs of other nodes, of the fraction of shortest paths between those two nodes that pass through the node under study. If the node is in a very dense cluster, most of the flow is distributed inside the cluster, and the node gets a low betweenness centrality, since few shortest paths need to go through it. If a node is situated between two such clusters, it has a high betweenness centrality. In a narrative, this corresponds, for example, to a character who appears in otherwise disjoint episodes, not necessarily consecutive, in which the other characters do not recur. In such a case, the author uses that character to accompany the narrative as it turns in another direction.
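Betweenness can be computed with Brandes’ algorithm, which counts shortest paths from every source with a breadth-first search and then accumulates dependencies backwards. This sketch treats the network as unweighted; using link intensities as weights would require a weighted shortest-path variant (for instance with Dijkstra’s algorithm) and is not shown. The toy network, two triangles joined through a hypothetical bridge character M, is invented to mirror the “node between two clusters” situation described above.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    C_B(v) = sum over pairs {s, t} with s != v != t of sigma_st(v) / sigma_st,
    where sigma_st counts shortest s-t paths and sigma_st(v) those through v.
    Returns unnormalised scores.
    """
    bc = dict.fromkeys(adjacency, 0.0)
    for s in adjacency:
        stack, preds = [], {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adjacency, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS: count shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adjacency, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical toy network: triangles {A, B, C} and {D, E, F} bridged by M.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "M"), ("M", "D"),
         ("D", "E"), ("E", "F"), ("F", "D")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

scores = betweenness_centrality(adjacency)
print(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
```

M has only two neighbours, no more than A or B, yet it obtains the highest betweenness (9.0) because every path between the two triangles goes through it; this is the kind of gap between degree and betweenness rankings discussed below for the Comte de Montaigu.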

Eigenvector centrality is computed from the adjacency matrix of the network by solving an eigenvector equation. The eigenvector centrality of a node is proportional to the sum of its neighbours’ own centrality values: a node connected to highly central nodes receives a higher score. Nodes further away also influence the measure indirectly, through chains of neighbours, and their influence weakens as their distance to the node grows. In practice, the eigenvector centrality index of the nodes is given by the entries of the eigenvector associated with the largest eigenvalue of the adjacency matrix.
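Eigenvector centrality solves x_v = (1/λ) Σ_u a_vu x_u, where a_vu are the entries of the adjacency matrix and λ is its largest eigenvalue. A common way to approximate the solution is power iteration. This sketch iterates on A + I rather than A (same leading eigenvector, but no oscillation on bipartite graphs), treats links as unweighted, and uses an invented toy network in which four hypothetical characters P, Q, R and S form a clique.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    """Approximate eigenvector centrality by power iteration.

    Iterating with (A + I) instead of A keeps the same leading eigenvector
    but avoids the oscillation that plain power iteration shows on
    bipartite graphs. Scores are normalised to unit Euclidean length.
    """
    nodes = list(adjacency)
    x = dict.fromkeys(nodes, 1.0)
    for _ in range(max_iter):
        # (A + I) x: each node keeps its own score and adds its neighbours' scores.
        new = {v: x[v] + sum(x[u] for u in adjacency[v]) for v in nodes}
        norm = math.sqrt(sum(s * s for s in new.values()))
        new = {v: s / norm for v, s in new.items()}
        converged = sum(abs(new[v] - x[v]) for v in nodes) < tol
        x = new
        if converged:
            break
    return x


# Hypothetical toy network: P, Q, R, S form a clique; S also leads to T, then U.
edges = [("P", "Q"), ("P", "R"), ("P", "S"), ("Q", "R"), ("Q", "S"), ("R", "S"),
         ("S", "T"), ("T", "U")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

scores = eigenvector_centrality(adjacency)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

The clique members P, Q and R end up with practically identical scores, since the measure cannot tell them apart: a small-scale echo of the interchangeable group of characters discussed in this post.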

Ranking based on centrality

In this section, we present the results of computing the three centrality indices described above. In each case, we show the ten most central characters.

The ten characters with highest degree centrality.

Degree centrality shows that Mme de Warens is the most connected character, followed by Thérèse Levasseur. They are the two main female characters in Rousseau’s life: Mme de Warens is mostly mentioned from the second to the seventh book, while Thérèse Levasseur appears in that seventh book and stays with him until the end, in book twelve. This explains why they both co-occur with many other characters. Characters from Parisian society, such as Denis Diderot, Mme d’Épinay and Mme Dupin, are also well connected.

The ten characters with highest betweenness centrality.

Betweenness centrality shows the overwhelming role Mme de Warens played as an intermediary for Rousseau, hosting him after he left Geneva and then regularly sending him to various places in France or Italy. Her role as a hub is visible in the network: the node representing her radiates in many directions, linked to various nodes and clusters.

The Comte de Montaigu, ambassador of France in Venice, appears in the betweenness ranking but not in the degree ranking, because of the intermediary role he plays between French and Venetian society; Rousseau worked for him in Venice as his secretary.

Eigenvector centrality is the only one of the three measures for which Mme de Warens is not ranked first.

The ten characters with highest eigenvector centrality.

Characters from the second part of Les Confessions occupy the top of this ranking: thanks to their proximity, they reinforce one another’s centrality. We discuss the important difference between this index and the previous ones in the following section.

Discussion

We have seen from the different rankings that at least two groups of measures give rather different orderings. Betweenness centrality shows the importance of a character like Mme de Warens: she acts as an intermediary between many characters in the novel, and she is the person who introduces Rousseau to many different societies. In the first part of Rousseau’s life (which corresponds to the first six chapters), she is clearly a positive character, the common thread that weaves the pieces of Rousseau’s network together. She literally has a central role, in the sense that without her most of Rousseau’s encounters would not have happened. She is one of the main driving forces of the story, acting as a kind of orchestrator.

On the contrary, if we consider the eigenvector centrality ranking, other characters play the more central roles. They (Épinay, Grimm, Diderot, Holbach, d’Alembert, de Boufflers) form a group of interconnected persons based in Paris. In the second part of Rousseau’s life, these persons tend to play a negative collective role: Rousseau suspects them of leading a conspiracy against him. Whether these threats are real or merely the result of Rousseau’s paranoia is beyond the scope of our discussion. We can nevertheless observe that this group of highly interconnected characters in fact plays the role of a kind of “meta-character”, a cluster of persons in which the cluster counts more than any individual. This is precisely what a high eigenvector centrality captures: characters recurrently associated with one another are, in practice, interchangeable from a narrative point of view. They act like a crowd. Despite their high eigenvector centrality, none of them is really “central”, in the sense that the cluster would still play its narrative role if any one of them were removed. This lack of individuality goes hand in hand with fear and suspicion, as Rousseau perceives most members of this group as faceless enemies.

In relation to these two opposite narrative roles, the case of Denis Diderot is interesting, as he is one of the rare characters who score high in both the betweenness and eigenvector rankings. This can be explained by the temporal unfolding of Rousseau’s autobiography. Diderot is first perceived as a connector, like Mme de Warens, someone who opens doors for Rousseau. However, he is later associated with the Parisian crowd, when Rousseau decides to leave this society and starts to see his enemies as a group of interchangeable figures.

In summary, this first study shows, in this particular case, that different centrality measures can be associated with different narrative roles. We cannot yet argue for the generality of this finding beyond the case studied here, but we believe that these encouraging results should motivate further study in this direction. As the method we described relies only on an index and not on specific text-mining techniques, it is simple to apply to any book that comes with such an index.

In addition, these preliminary results, obtained globally on a whole novel, should encourage studies that investigate in more detail how such measures evolve within a novel. In many cases, characters evolve as the novel unfolds, and it is likely that the centrality measures considered in this post could capture changes in narrative roles when they occur.

Finally, it is worth noting that in a work like this one, developments in mathematics and graph theory do not only offer new tools for studying literature; conversely, the particular case of digital literary study we have considered provides concrete, illustrative examples of the important differences between these centrality measures.

Further work

Soon I will post on this blog some of the work I’ve done since this paper (April 2014). In particular, I’ve dug into the workflow of transforming an index into a character network, from identifying, scanning and cleaning the index to the final result.