# Entropy of a crowd network

A crowd can be represented as a complex network of people. The people are represented by nodes which are connected to each other by relationships or edges. These edges have different kind of semantics associated with them. For example, the relationship can be that of “friends” or “follows” etc. Many different kinds of networks can be imagined using different semantic relationships. We already have Facebook which is a “friend” network, Twitter a “follow” network, Linkedin a “professional” network. Now there are new more specialized networks popping up like Doximity a network for doctors, Familyleaf is a networks for family, Anglelist is a network in the startup world, and many more. Spigit is a network for innovation in the same vein.

We have seen that networks are useful when they are active and engaged. There are multiple ways of measuring “engagement”. Metrics ranges from number of sign ins, number of page views, content submitted and many more. I propose that the number of messages sent between the nodes is a pretty good way to measure engagement among a crowd network.

An experiment

Consider a crowd of people, represented by a network $G=(V,E)$ where $V=(v_1, v_2,...,v_n)$ is the set of vertices and $E=(e_1,e_2,...e_n$ is the set of edges. A relationship $\alpha(e_1)=(v_i,v_j)$ gives the end point of an edge. The graph is represented by an adjacency matrix A where $A_{i,j}=a_{ij}=1$ if link exists, 0 otherwise.

Consider two random variables $\theta$ and $\omega$ which represent the incoming and outgoing probability distributions. Entropy is the $\emph{uncertainty or unpredictability}$ in the random variable. It is the information content of the variable. The entropy is the maximum when it is most difficult to predict what the value of the random variable is going to be. Assuming that the incoming and outgoing probabilities between nodes are completely independent of each other, the structure of the graph which results in the highest entropy signifies the most optimally communicative nodes. In other words, graphs which have high entropy in their incoming and outgoing probability distributions are likely to be more engaged. The incoming and outgoing probabilities are represented by two n-by-n matrices $X$ and $Y$ where $X_{i,j}=x_{ij}$ is the probability of “i” $\emph{sending}$ a message to “j” and $Y_{i,j}=y_{ij}$ is the probability of “i” $\emph{getting}$ a message from “j”.

We first construct an ER graph in which the probability of a link existing between $i$ and $j$ is given by $0. The entropy of $\theta$ and $\omega$ is computed as follows,

$H_\theta = \sum_{i=1}^n \sum_{j=1}^n a_{ij}*x_{ij}log(\frac{1}{x_{ij}})$

$H_\omega = \sum_{i=1}^n \sum_{j=1}^n a_{ij}*y_{ij}log(\frac{1}{y_{ij}})$

The entropies are combined into a single weighted variable say “crowd value” as follows,

$V(\beta) = \beta H_\theta + (1-\beta)H_\omega$

To maximize the entropy of the graph we iteratively change the structure of the graph and compute the entropy for the resultant structure. If the entropy is higher than the previous stage the graph is accepted as a “better” version. At this point we compute some structural properties of the graph namely

These metrics of graph are recorded after every iteration and their progress is charted along with the increasing communication entropy. The results can be seen in the figure. The whole process is repeated for different values of $p$ which is the probability of a link existing.
It can be clearly seen that a highly connected graph with low average shortest path length and high clustering are usually $\emph{valueable"}$ crowds.