A crowd can be represented as a complex network of people. The people are represented by nodes which are connected to each other by relationships or edges. These edges have different kind of semantics associated with them. For example, the relationship can be that of “friends” or “follows” etc. Many different kinds of networks can be imagined using different semantic relationships. We already have Facebook which is a “friend” network, Twitter a “follow” network, Linkedin a “professional” network. Now there are new more specialized networks popping up like Doximity a network for doctors, Familyleaf is a networks for family, Anglelist is a network in the startup world, and many more. Spigit is a network for innovation in the same vein.
We have seen that networks are useful when they are active and engaged. There are multiple ways of measuring “engagement”. Metrics ranges from number of sign ins, number of page views, content submitted and many more. I propose that the number of messages sent between the nodes is a pretty good way to measure engagement among a crowd network.
An experiment
Consider a crowd of people, represented by a network where
is the set of vertices and
is the set of edges. A relationship
gives the end point of an edge. The graph is represented by an adjacency matrix A where
if link exists, 0 otherwise.
Consider two random variables and
which represent the incoming and outgoing probability distributions. Entropy is the
in the random variable. It is the information content of the variable. The entropy is the maximum when it is most difficult to predict what the value of the random variable is going to be. Assuming that the incoming and outgoing probabilities between nodes are completely independent of each other, the structure of the graph which results in the highest entropy signifies the most optimally communicative nodes. In other words, graphs which have high entropy in their incoming and outgoing probability distributions are likely to be more engaged. The incoming and outgoing probabilities are represented by two n-by-n matrices
and
where
is the probability of “i”
a message to “j” and
is the probability of “i”
a message from “j”.
We first construct an ER graph in which the probability of a link existing between and
is given by
. The entropy of
and
is computed as follows,
The entropies are combined into a single weighted variable say “crowd value” as follows,

increase/decrease the values quickly normalize to the theoretical limits.
To maximize the entropy of the graph we iteratively change the structure of the graph and compute the entropy for the resultant structure. If the entropy is higher than the previous stage the graph is accepted as a “better” version. At this point we compute some structural properties of the graph namely
1. number of links
2. clustering coeffcicient
3. average shortest path length
4. number of components in the graph
These metrics of graph are recorded after every iteration and their progress is charted along with the increasing communication entropy. The results can be seen in the figure. The whole process is repeated for different values of which is the probability of a link existing.
It can be clearly seen that a highly connected graph with low average shortest path length and high clustering are usually crowds.