  • Publication
    Choosing the number of groups in a latent stochastic block model for dynamic networks
    (Cambridge University Press, 2018-11-15)
    Latent stochastic block models are flexible statistical models that are widely used in social network analysis. In recent years, efforts have been made to extend these models to temporal dynamic networks, whereby the connections between nodes are observed at a number of different times. In this paper we extend the original stochastic block model by using a Markovian property to describe the evolution of nodes' cluster memberships over time. We recast the problem of clustering the nodes of the network into a model-based context, and show that the integrated completed likelihood can be evaluated analytically for a number of likelihood models. Then, we propose a scalable greedy algorithm to maximise this quantity, thereby estimating both the optimal partition and the ideal number of groups in a single inferential framework. Finally, we apply our methodology to both real and artificial datasets.
  • Publication
    Optimal Bayesian estimators for latent variable cluster models
    (Springer, 2017-10-31)
    In cluster analysis, interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be effectively obtained in a wide range of clustering models, including finite mixtures, infinite mixtures, hidden Markov models and block models for networks. However, due to the categorical nature of the clustering variables and the lack of scalable algorithms, summary tools that can interpret such samples are not available. We adopt a Bayesian decision theoretic approach to define an optimality criterion for clusterings, and propose a fast and context-independent greedy algorithm to find the best allocations. One important facet of our approach is that the optimal number of groups is automatically selected, thereby solving the clustering and the model-choice problems at the same time. We consider several loss functions to compare partitions, and show that our approach can accommodate a wide range of cases. Finally, we illustrate our approach on a variety of real-data applications for three different clustering models: Gaussian finite mixtures, stochastic block models and latent block models for networks.
  • Publication
    Properties of Latent Variable Network Models
    (Cambridge University Press, 2016-12-12)
    We derive properties of Latent Variable Models for networks, a broad class of models that includes the widely-used Latent Position Models. These include the average degree distribution, clustering coefficient, average path length and degree correlations. We introduce the Gaussian Latent Position Model, and derive analytic expressions and asymptotic approximations for its network properties. We pay particular attention to one special case, the Gaussian Latent Position Models with Random Effects, and show that it can represent the heavy-tailed degree distributions, positive asymptotic clustering coefficients and small-world behaviours that are often observed in social networks. Several real and simulated examples illustrate the ability of the models to capture important features of observed networks.
  • Publication
    Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion
    The integrated completed likelihood (ICL) criterion has proven to be a very popular approach in model-based clustering through automatically choosing the number of clusters in a mixture model. This approach effectively maximises the complete data likelihood, thereby including the allocation of observations to clusters in the model selection criterion. However, for practical implementation one needs to introduce an approximation in order to estimate the ICL. Our contribution here is to illustrate that through the use of conjugate priors one can derive an exact expression for the ICL, thereby avoiding any approximation. Moreover, we illustrate how one can find both the number of clusters and the best allocation of observations in one algorithmic framework. The performance of our algorithm is presented on several simulated and real examples.
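
Several of the abstracts above rely on an integrated completed likelihood that is available in closed form under conjugate priors. The following is a minimal sketch of that idea for a static, directed Bernoulli stochastic block model, assuming a symmetric Dirichlet prior on the group proportions and independent Beta priors on the block connection probabilities. It is not the dynamic model or the greedy search from the listed papers; the function names and the hyperparameters `alpha`, `a` and `b` are assumptions chosen for illustration.

```python
from collections import Counter
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_prior_allocations(z, K, alpha=1.0):
    """log p(z): group proportions integrated out under a symmetric
    Dirichlet(alpha) prior (Dirichlet-multinomial marginal)."""
    n = len(z)
    counts = Counter(z)
    out = lgamma(K * alpha) - K * lgamma(alpha) - lgamma(n + K * alpha)
    for k in range(K):
        out += lgamma(counts.get(k, 0) + alpha)
    return out


def exact_icl_sbm(adj, z, K, alpha=1.0, a=1.0, b=1.0):
    """log p(adj, z) for a directed Bernoulli block model without self-loops.
    Each block probability has a Beta(a, b) prior (an assumption) and is
    integrated out analytically."""
    n = len(z)
    edges = [[0] * K for _ in range(K)]  # observed edges per block pair
    pairs = [[0] * K for _ in range(K)]  # possible edges per block pair
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pairs[z[i]][z[j]] += 1
            edges[z[i]][z[j]] += adj[i][j]
    total = log_prior_allocations(z, K, alpha)
    for k in range(K):
        for l in range(K):
            e, m = edges[k][l], pairs[k][l]
            total += log_beta(a + e, b + m - e) - log_beta(a, b)
    return total


# Toy network: two fully connected groups with no edges between them.
true_z = [0, 0, 0, 1, 1, 1]
adj = [[int(i != j and true_z[i] == true_z[j]) for j in range(6)]
       for i in range(6)]

icl_two_groups = exact_icl_sbm(adj, true_z, K=2)
icl_one_group = exact_icl_sbm(adj, [0] * 6, K=1)
```

On this toy network the two-group partition receives a higher score than the single-group partition, which is the sense in which maximising the criterion selects the partition and the number of groups together.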