Entropy of a crowd network

A crowd can be represented as a complex network of people: nodes represent people, and the nodes are connected by relationships, or edges. These edges carry different kinds of semantics. For example, a relationship can be that of “friends” or “follows” etc. Many different kinds of networks can be imagined using different semantic relationships. We already have Facebook, a “friend” network; Twitter, a “follow” network; and LinkedIn, a “professional” network. Now more specialized networks are popping up: Doximity is a network for doctors, FamilyLeaf is a network for families, AngelList is a network for the startup world, and many more. Spigit is a network for innovation in the same vein.

We have seen that networks are useful when they are active and engaged. There are multiple ways of measuring “engagement”. Metrics range from the number of sign-ins to the number of page views, the amount of content submitted, and many more. I propose that the number of messages sent between the nodes is a pretty good way to measure engagement in a crowd network.

An experiment

Consider a crowd of people, represented by a network G=(V,E) where V=\{v_1, v_2,\ldots,v_n\} is the set of vertices and E=\{e_1,e_2,\ldots,e_m\} is the set of edges. A mapping \alpha(e_k)=(v_i,v_j) gives the endpoints of an edge. The graph is represented by an adjacency matrix A where A_{i,j}=a_{ij}=1 if a link exists, and 0 otherwise.
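As a concrete sketch of this representation, the small adjacency matrix below encodes a hypothetical four-person crowd (the people and edges are made up for illustration):

```python
import numpy as np

# Hypothetical 4-person crowd: an edge means a relationship exists.
# A[i, j] = a_ij = 1 if a link exists between person i and person j, else 0.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

n = 4
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # relationships here are symmetric, like "friends"

print(A[0, 1])  # 1: persons 0 and 1 are linked
print(A[0, 3])  # 0: persons 0 and 3 are not
```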

Consider two random variables \theta and \omega which represent the incoming and outgoing probability distributions. Entropy is the \emph{uncertainty or unpredictability} in a random variable; it is the information content of the variable. Entropy is at its maximum when it is most difficult to predict what value the random variable is going to take. Assuming that the incoming and outgoing probabilities between nodes are completely independent of each other, the graph structure that results in the highest entropy signifies the most optimally communicative nodes. In other words, graphs which have high entropy in their incoming and outgoing probability distributions are likely to be more engaged. The incoming and outgoing probabilities are represented by two n-by-n matrices X and Y, where X_{i,j}=x_{ij} is the probability of “i” \emph{sending} a message to “j” and Y_{i,j}=y_{ij} is the probability of “i” \emph{getting} a message from “j”.

We first construct an Erdős–Rényi (ER) graph in which a link between i and j exists with probability p, where 0<p<1. The entropies of \theta and \omega are computed as follows,

H_\theta = \sum_{i=1}^n \sum_{j=1}^n a_{ij}\, x_{ij} \log\left(\frac{1}{x_{ij}}\right)

H_\omega = \sum_{i=1}^n \sum_{j=1}^n a_{ij}\, y_{ij} \log\left(\frac{1}{y_{ij}}\right)
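A minimal sketch of these two sums, assuming an illustrative ER adjacency matrix and row-normalized random matrices standing in for X and Y (the specific sampling scheme is an assumption for demonstration, not part of the experiment above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Illustrative ER adjacency matrix: each link exists with probability p = 0.5.
A = (rng.random((n, n)) < 0.5).astype(int)
np.fill_diagonal(A, 0)

# Illustrative send/receive probability matrices X and Y.
# Each row is normalized so that x_ij behaves like a probability of
# node i sending a message to node j (and y_ij of receiving one).
X = rng.random((n, n))
X /= X.sum(axis=1, keepdims=True)
Y = rng.random((n, n))
Y /= Y.sum(axis=1, keepdims=True)

def entropy(A, P):
    """H = sum_ij a_ij * p_ij * log(1 / p_ij), skipping p_ij = 0 terms."""
    mask = (A == 1) & (P > 0)
    p = P[mask]
    return float(np.sum(p * np.log(1.0 / p)))

H_theta = entropy(A, X)  # entropy of the outgoing distribution
H_omega = entropy(A, Y)  # entropy of the incoming distribution
```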

The entropies are combined into a single weighted quantity, say the “crowd value”, as follows,

V(\beta) = \beta H_\theta + (1-\beta)H_\omega
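The weighted combination is a one-liner; the function below (a hypothetical helper, not named in the original) makes the role of \beta explicit:

```python
def crowd_value(h_theta, h_omega, beta=0.5):
    """V(beta) = beta * H_theta + (1 - beta) * H_omega.

    beta = 1 weighs only the outgoing entropy, beta = 0 only the
    incoming entropy; beta = 0.5 weighs them equally.
    """
    return beta * h_theta + (1.0 - beta) * h_omega

print(crowd_value(2.0, 4.0, beta=0.5))  # 3.0
```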

Figure shows the behavior of structural parameters plotted against the increasing cumulative entropy of the network. For all the networks the number of nodes is n=100 and the weighing parameter is held constant at \beta=0.5. The link density probability (p) is varied from 0.5 to 0.9. For all values of p, the parameters display more or less linear behavior. (a) The increase in entropy is brought about by an increase in the number of active links (m) in the network; (b) as the number of links increases, so does the clustering coefficient along with the network entropy; (c) the average shortest path length decreases along with the increasing network entropy; (d) the number of disconnected components very quickly decreases to 1, forming a single giant component with increasing network entropy. In all cases, after an initial sudden increase or decrease, the values quickly settle to their theoretical limits.

To maximize the entropy of the graph we iteratively change the structure of the graph and compute the entropy of the resulting structure. If the entropy is higher than in the previous stage, the graph is accepted as a “better” version. At each step we also compute some structural properties of the graph, namely

1. number of links

2. clustering coefficient

3. average shortest path length

4. number of components in the graph
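The accept-if-better loop above can be sketched as follows. This is a simplified stand-in for the experiment: the structural change is a single random link toggle, the entropy uses only the sending matrix X (the \theta side), and only the number of links and number of components are tracked, with the clustering coefficient and shortest path lengths left out for brevity:

```python
import math
import random

def sample_probs(n, rng):
    """Row-normalized random matrix used as illustrative x_ij values."""
    P = [[rng.random() for _ in range(n)] for _ in range(n)]
    for row in P:
        s = sum(row)
        for j in range(n):
            row[j] /= s
    return P

def entropy(A, P):
    """H = sum over active links of p_ij * log(1 / p_ij)."""
    n = len(A)
    return sum(P[i][j] * math.log(1.0 / P[i][j])
               for i in range(n) for j in range(n)
               if A[i][j] and P[i][j] > 0)

def num_components(A):
    """Count connected components via depth-first search."""
    n = len(A)
    seen, comps = set(), 0
    for s in range(n):
        if s in seen:
            continue
        comps += 1
        stack = [s]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(j for j in range(n) if A[v][j] and j not in seen)
    return comps

def maximize_entropy(n=20, iters=200, seed=1):
    rng = random.Random(seed)
    A = [[0] * n for _ in range(n)]
    X = sample_probs(n, rng)
    best = entropy(A, X)
    history = []
    for _ in range(iters):
        # Propose a random structural change: toggle one link.
        i, j = rng.sample(range(n), 2)
        A[i][j] = A[j][i] = 1 - A[i][j]
        h = entropy(A, X)
        if h > best:  # accept the "better" graph and record its metrics
            best = h
            m = sum(A[a][b] for a in range(n) for b in range(a + 1, n))
            history.append((best, m, num_components(A)))
        else:         # otherwise revert the change
            A[i][j] = A[j][i] = 1 - A[i][j]
    return history

hist = maximize_entropy()
```

In this sketch every removed link only deletes non-negative terms from the entropy sum, so accepted changes tend to add links, and the recorded entropy and link count both climb together, consistent with panel (a) of the figure.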

These metrics of the graph are recorded after every iteration and their progress is charted along with the increasing communication entropy; the results can be seen in the figure. The whole process is repeated for different values of p, the probability of a link existing.

It can be clearly seen that highly connected graphs with low average shortest path length and high clustering are usually \emph{``valuable''} crowds.
