    Identifying representative textual sources in blog networks
    We apply methods from social network analysis and visualization to facilitate a study of the Irish blogosphere from a cultural studies perspective. We focus on solving the practical issues that arise when the goal is to perform textual analysis of the corpus produced by a network of bloggers. Previous studies of blogging networks have noted the difficulty of identifying the extent and boundaries of these networks. As a response to calls for increasingly data-led approaches in media and cultural studies, we discuss a variety of social network analysis methods that can be used to identify which blogs can be seen as members of a posited "Irish blogging network". We identify hub blogs, communities of sites corresponding to different topics, and representative bloggers within these communities. Based on this study, we propose a set of analysis guidelines for researchers who wish to map out blogging networks.
    Detecting highly overlapping community structure by greedy clique expansion
    In complex networks it is common for each node to belong to several communities, implying a highly overlapping community structure. Recent advances in benchmarking indicate that existing community assignment algorithms that are capable of detecting overlapping communities perform well only when the extent of community overlap is kept to modest levels. To overcome this limitation, we introduce a new community assignment algorithm called Greedy Clique Expansion (GCE). The algorithm identifies distinct cliques as seeds and expands these seeds by greedily optimizing a local fitness function. We perform extensive benchmarks on synthetic data to demonstrate that GCE's good performance is robust across diverse graph topologies. Significantly, GCE is the only algorithm to perform well on these synthetic graphs, in which every node belongs to multiple communities. Furthermore, when put to the task of identifying functional modules in protein interaction data, and college dorm assignments in Facebook friendship data, we find that GCE performs competitively.
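The seed-and-expand idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The fitness function `k_in / (k_in + k_out) ** alpha`, the parameter names `alpha` and `min_size`, and all function names are assumptions made for illustration; the abstract does not specify them. Any handling of near-duplicate communities is omitted, and only exact duplicates are merged.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj, min_size=3):
    """Bron-Kerbosch (no pivoting); keeps maximal cliques with >= min_size nodes."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def fitness(adj, community, alpha=1.0):
    """Assumed local fitness: internal degree over total degree ** alpha."""
    k_in = sum(len(adj[v] & community) for v in community)   # 2 * internal edges
    k_out = sum(len(adj[v] - community) for v in community)  # boundary edges
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand_seed(adj, seed, alpha=1.0):
    """Greedily add the frontier node that most improves fitness; stop when none does."""
    community = set(seed)
    while True:
        frontier = set().union(*(adj[v] for v in community)) - community
        best, best_fit = None, fitness(adj, community, alpha)
        for v in sorted(frontier):
            f = fitness(adj, community | {v}, alpha)
            if f > best_fit:
                best, best_fit = v, f
        if best is None:
            return frozenset(community)
        community.add(best)


def greedy_clique_expansion(adj, min_size=3, alpha=1.0):
    """Expand every maximal clique seed; exact duplicate results collapse in the set."""
    return {expand_seed(adj, s, alpha) for s in maximal_cliques(adj, min_size)}


# Two 4-cliques that share node 3: the node ends up in both communities.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
communities = greedy_clique_expansion(build_adjacency(edges))
print(sorted(sorted(c) for c in communities))  # [[0, 1, 2, 3], [3, 4, 5, 6]]
```

In this toy graph neither seed grows, because adding any node from the other clique lowers the assumed fitness from 0.8 to about 0.78. The shared node therefore stays in two communities, which is the kind of overlap the abstract is concerned with.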
    Link Prediction with Social Vector Clocks
    State-of-the-art link prediction utilizes combinations of complex features derived from network panel data. Here we show that computationally less expensive features can achieve the same performance in the common scenario in which the data is available as a sequence of interactions. Our features are based on social vector clocks, an adaptation of the vector-clock concept introduced in distributed computing to social interaction networks. In fact, our experiments suggest that by taking into account the order and spacing of interactions, social vector clocks exploit different aspects of link formation so that their combination with previous approaches yields the most accurate predictor to date.
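As a rough illustration of the vector-clock idea applied to an interaction sequence, the sketch below keeps, for each person, the most recent time at which information from every other person could have reached them. The update rule, the directed `(time, sender, receiver)` input format, and the names `social_vector_clocks` and `latency` are assumptions for this example and are not taken from the abstract.

```python
from collections import defaultdict


def social_vector_clocks(interactions):
    """Compute one clock per node from (time, sender, receiver) tuples.

    clocks[v][u] is the latest timestamp of information about u that could
    have reached v along a time-respecting chain of interactions.
    """
    clocks = defaultdict(dict)
    for t, sender, receiver in sorted(interactions):
        # The sender's view of itself is always current at send time.
        clocks[sender][sender] = t
        # The receiver keeps the element-wise maximum of both clocks.
        receiver_clock = clocks[receiver]
        for node, stamp in clocks[sender].items():
            if stamp > receiver_clock.get(node, float("-inf")):
                receiver_clock[node] = stamp
    return {node: dict(clock) for node, clock in clocks.items()}


def latency(clocks, observer, target, now):
    """How out of date the observer's view of the target is, or None if no path exists."""
    stamp = clocks.get(observer, {}).get(target)
    return None if stamp is None else now - stamp


# a contacts b at time 1, then b contacts c at time 2.
clocks = social_vector_clocks([(1, "a", "b"), (2, "b", "c")])
print(clocks["c"])                         # {'a': 1, 'b': 2}
print(latency(clocks, "c", "a", now=5))    # 4
```

Here `c` never interacts with `a` directly, yet its clock holds an entry for `a` via `b`. A quantity such as `latency` is one way the order and spacing of interactions could be turned into a cheap pairwise feature.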
    Simmelian Backbones: Amplifying Hidden Homophily in Facebook Networks
    Empirical social networks are often aggregate proxies for several heterogeneous relations. In online social networks, for instance, interactions related to friendship, kinship, business, interests, and other relationships may all be represented as catchall 'friendships.' Because several relations are mingled into one, the resulting networks exhibit relatively high and uniform density. As a consequence, the variation in positional differences and local cohesion may be too small for reliable analysis. We introduce a method to identify the essential relationships in networks representing social interactions. Our method is based on a novel concept of triadic cohesion that is motivated by Simmel's concept of membership in social groups. We demonstrate that our Simmelian backbones are capable of extracting structure from Facebook interaction networks that makes them easy to visualize and analyze. Since all computations are local, the method can be restricted to partial networks such as ego networks, and scales to big data.
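The abstract does not spell out how triadic cohesion is measured, so the following is only a simplified stand-in and not the Simmelian backbone method itself. It scores each edge by the number of triangles it belongs to (common neighbours of its endpoints) and keeps edges above a threshold. The names `triadic_embeddedness`, `filter_by_embeddedness` and `min_triangles` are hypothetical.

```python
def triadic_embeddedness(adj):
    """Map each undirected edge to the number of triangles that contain it."""
    scores = {}
    for u, neighbours in adj.items():
        for v in neighbours:
            edge = frozenset((u, v))
            if edge not in scores:
                # Common neighbours of u and v close a triangle over this edge.
                scores[edge] = len(adj[u] & adj[v])
    return scores


def filter_by_embeddedness(adj, min_triangles=1):
    """Keep only edges embedded in at least min_triangles triangles."""
    return {edge for edge, count in triadic_embeddedness(adj).items()
            if count >= min_triangles}


# Two triangles joined by a single bridge edge (2, 3).
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
kept = filter_by_embeddedness(adj, min_triangles=1)
print(frozenset((2, 3)) in kept)  # False
print(len(kept))                  # 6
```

Each score depends only on the neighbourhoods of the two endpoints, so the computation is local, in line with the abstract's remark that the method can be restricted to partial networks such as ego networks.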
    Community detection: effective evaluation on large social networks
    (Oxford University Press, 2014)
    While many recently proposed methods aim to detect network communities in large datasets, such as those generated by social media and telecommunications services, most evaluation (i.e. benchmarking) of this research is based on small, hand-curated datasets. We argue that these two types of networks differ so significantly that, by evaluating algorithms solely on the smaller networks, we know little about how well they perform on the larger datasets. Recent work addresses this problem by introducing social network datasets annotated with meta-data that is believed to approximately indicate a 'ground truth' set of network communities. While such efforts are a step in the right direction, we find this meta-data problematic for two reasons. First, in practice, the groups contained in such meta-data may only be a subset of a network’s communities. Second, while it is often reasonable to assume that meta-data is related to network communities in some way, we must be cautious about assuming that these groups correspond closely to network communities. Here, we consider these difficulties and propose an evaluation scheme based on a classification task that is tailored to deal with them.
