A crowd can be represented as a complex network of people. The people are represented by nodes which are connected to each other by relationships or edges. These edges have different kind of semantics associated with them. For example, the relationship can be that of “friends” or “follows” etc. Many different kinds of networks can be imagined using different semantic relationships. We already have Facebook which is a “friend” network, Twitter a “follow” network, Linkedin a “professional” network. Now there are new more specialized networks popping up like Doximity a network for doctors, Familyleaf is a networks for family, Anglelist is a network in the startup world, and many more. Spigit is a network for innovation in the same vein.
We have seen that networks are useful when they are active and engaged. There are multiple ways of measuring “engagement”. Metrics ranges from number of sign ins, number of page views, content submitted and many more. I propose that the number of messages sent between the nodes is a pretty good way to measure engagement among a crowd network.
Consider a crowd of people, represented by a network where is the set of vertices and is the set of edges. A relationship gives the end point of an edge. The graph is represented by an adjacency matrix A where if link exists, 0 otherwise.
Consider two random variables and which represent the incoming and outgoing probability distributions. Entropy is the in the random variable. It is the information content of the variable. The entropy is the maximum when it is most difficult to predict what the value of the random variable is going to be. Assuming that the incoming and outgoing probabilities between nodes are completely independent of each other, the structure of the graph which results in the highest entropy signifies the most optimally communicative nodes. In other words, graphs which have high entropy in their incoming and outgoing probability distributions are likely to be more engaged. The incoming and outgoing probabilities are represented by two n-by-n matrices and where is the probability of “i” a message to “j” and is the probability of “i” a message from “j”.
We first construct an ER graph in which the probability of a link existing between and is given by . The entropy of and is computed as follows,
The entropies are combined into a single weighted variable say “crowd value” as follows,
To maximize the entropy of the graph we iteratively change the structure of the graph and compute the entropy for the resultant structure. If the entropy is higher than the previous stage the graph is accepted as a “better” version. At this point we compute some structural properties of the graph namely
1. number of links
2. clustering coeffcicient
3. average shortest path length
4. number of components in the graph
These metrics of graph are recorded after every iteration and their progress is charted along with the increasing communication entropy. The results can be seen in the figure. The whole process is repeated for different values of which is the probability of a link existing.
It can be clearly seen that a highly connected graph with low average shortest path length and high clustering are usually crowds.