Now showing 1 - 10 of 27
  • Publication
    Bayesian exponential random graph models with nodal random effects
    We extend the well-known and widely used Exponential Random Graph Model (ERGM) by including nodal random effects to compensate for heterogeneity in the nodes of a network. The Bayesian framework for ERGMs proposed by Caimo and Friel (2011) yields the basis of our modelling algorithm. A central question in network models is the question of model selection and following the Bayesian paradigm we focus on estimating Bayes factors. To do so we develop an approximate but feasible calculation of the Bayes factor which allows one to pursue model selection. Two data examples and a small simulation study illustrate our mixed model approach and the corresponding model selection.
      231Scopus© Citations 26
  • Publication
    Inferring structure in bipartite networks using the latent block model and exact ICL
    (Cambridge University Press, 2017-02-01) ; ;
    We consider the task of simultaneous clustering of the two node sets involved in a bipartite network. The approach we adopt is based on use of the exact integrated complete likelihood for the latent blockmodel. Using this allows one to infer the number of clusters as well as cluster memberships using a greedy search. This gives a model-based clustering of the node sets. Experiments on simulated bipartite network data show that the greedy search approach is vastly more scalable than competing Markov chain Monte Carlo-based methods. Application to a number of real observed bipartite networks demonstrate the algorithms discussed.
      469Scopus© Citations 22
  • Publication
    Properties of Latent Variable Network Models
    (Cambridge University Press, 2016-12-12) ; ;
    We derive properties of Latent Variable Models for networks, a broad class ofmodels that includes the widely-used Latent Position Models. These include theaverage degree distribution, clustering coefficient, average path length and degreecorrelations. We introduce the Gaussian Latent Position Model, and derive analyticexpressions and asymptotic approximations for its network properties. Wepay particular attention to one special case, the Gaussian Latent Position Modelswith Random Effects, and show that it can represent the heavy-tailed degree distributions,positive asymptotic clustering coefficients and small-world behaviours thatare often observed in social networks. Several real and simulated examples illustratethe ability of the models to capture important features of observed networks.
      425Scopus© Citations 31
  • Publication
    Efficient MCMC for Gibbs Random Fields using pre-computation
    (The Institute of Mathematical Statistics and the Bernoulli Society, 2018-05-31) ; ;
    Bayesian inference of Gibbs random fields (GRFs) is often referred to as a doubly intractable problem, since the likelihood function is intractable. The exploration of the posterior distribution of such models is typically carried out with a sophisticated Markov chain Monte Carlo (MCMC) method, the exchange algorithm (Murray et al., 2006), which requires simulations from the likelihood function at each iteration. The purpose of this paper is to consider an approach to dramatically reduce this computational overhead. To this end we introduce a novel class of algorithms which use realizations of the GRF model, simulated offline, at locations specified by a grid that spans the parameter space. This strategy speeds up dramatically the posterior inference, as illustrated on several examples. However, using the pre-computed graphs introduces a noise in the MCMC algorithm, which is no longer exact. We study the theoretical behaviour of the resulting approximate MCMC algorithm and derive convergence bounds using a recent theoretical development on approximate MCMC methods.
      247Scopus© Citations 5
  • Publication
    Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion
    The integrated completed likelihood (ICL) criterion has proven to be a very popular approach in model-based clustering through automatically choosing the number of clusters in a mixture model. This approach effectively maximises the complete data likelihood, thereby including the allocation of observations to clusters in the model selection criterion. However for practical implementation one needs to introduce an approximation in order to estimate the ICL. Our contribution here is to illustrate that through the use of conjugate priors one can derive an exact expression for ICL and so avoiding any approximation. Moreover, we illustrate how one can find both the number of clusters and the best allocation of observations in one algorithmic framework. The performance of our algorithm is presented on several simulated and real examples.
      386Scopus© Citations 28
  • Publication
    Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels
    Monte Carlo algorithms often aim to draw from a distribution π by simulating a Markov chain with transition kernel P such that π is invariant under P. However, there are many situations for which it is impractical or impossible to draw from the transition kernel P. For instance, this is the case with massive datasets, where is it prohibitively expensive to calculate the likelihood and is also the case for intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace P by an approximation Pˆ. Using theory from the stability of Markov chains we explore a variety of situations where it is possible to quantify how 'close' the chain given by the transition kernel Pˆ is to the chain given by P. We apply these results to several examples from spatial statistics and network analysis.
      318Scopus© Citations 65
  • Publication
    Bayesian Model Selection for Exponential Random Graph Models via Adjusted Pseudolikelihoods
    (Taylor & Francis, 2018-06-11) ; ;
    Models with intractable likelihood functions arise in areas including network analysisand spatial statistics, especially those involving Gibbs random fields. Posterior parameter estimationin these settings is termed a doubly-intractable problem because both the likelihoodfunction and the posterior distribution are intractable. The comparison of Bayesian models isoften based on the statistical evidence, the integral of the un-normalised posterior distributionover the model parameters which is rarely available in closed form. For doubly-intractablemodels, estimating the evidence adds another layer of difficulty. Consequently, the selectionof the model that best describes an observed network among a collection of exponentialrandom graph models for network analysis is a daunting task. Pseudolikelihoods offer atractable approximation to the likelihood but should be treated with caution because they canlead to an unreasonable inference. This paper specifies a method to adjust pseudolikelihoodsin order to obtain a reasonable, yet tractable, approximation to the likelihood. This allowsimplementation of widely used computational methods for evidence estimation and pursuitof Bayesian model selection of exponential random graph models for the analysis of socialnetworks. Empirical comparisons to existing methods show that our procedure yields similarevidence estimates, but at a lower computational cost.
      379Scopus© Citations 14
  • Publication
    Bayesian model selection for the latent position cluster model for Social Networks
    (Cambridge University Press, 2017-03) ; ;
    The latent position cluster model is a popular model for the statistical analysis of network data. This model assumes that there is an underlying latent space in which the actors follow a finite mixture distribution. Moreover, actors which are close in this latent space are more likely to be tied by an edge. This is an appealing approach since it allows the model to cluster actors which consequently provides the practitioner with useful qualitative information. However, exploring the uncertainty in the number of underlying latent components in the mixture distribution is a complex task. The current state-of-the-art is to use an approximate form of BIC for this purpose, where an approximation of the log-likelihood is used instead of the true log-likelihood which is unavailable. The main contribution of this paper is to show that through the use of conjugate prior distributions, it is possible to analytically integrate out almost all of the model parameters, leaving a posterior distribution which depends on the allocation vector of the mixture model. This enables posterior inference over the number of components in the latent mixture distribution without using trans-dimensional MCMC algorithms such as reversible jump MCMC. Our approach is compared with the state-of-the-art latentnet (Krivitsky & Handcock, 2015) and VBLPCM (Salter-Townshend & Murphy, 2013) packages.
      319Scopus© Citations 6
  • Publication
    Efficient Bayesian inference for exponential random graph models by correcting the pseudo-posterior distribution
    Exponential random graph models are an important tool in the statistical analysis of data. However, Bayesian parameter estimation for these models is extremely challenging, since evaluation of the posterior distribution typically involves the calculation of an intractable normalizing constant. This barrier motivates the consideration of tractable approximations to the likelihood function, such as the pseudolikelihood function, which offers an approach to constructing such an approximation. Naive implementation of what we term a pseudo-posterior resulting from replacing the likelihood function in the posterior distribution by the pseudolikelihood is likely to give misleading inferences. We provide practical guidelines to correct a sample from such a pseudo-posterior distribution so that it is approximately distributed from the target posterior distribution and discuss the computational and statistical efficiency that result from this approach. We illustrate our methodology through the analysis of real-world graphs. Comparisons against the approximate exchange algorithm of Caimo and Friel (2011) are provided, followed by concluding remarks.
      212Scopus© Citations 18
  • Publication
    Model comparison for Gibbs random fields using noisy reversible jump Markov chain Monte Carlo
    The reversible jump Markov chain Monte Carlo (RJMCMC) method offers an across-model simulation approach for Bayesian estimation and model comparison, by exploring the sampling space that consists of several models of possibly varying dimensions. A naive implementation of RJMCMC to models like Gibbs random fields suffers from computational difficulties: the posterior distribution for each model is termed doubly-intractable since computation of the likelihood function is rarely available. Consequently, it is simply impossible to simulate a transition of the Markov chain in the presence of likelihood intractability. A variant of RJMCMC is presented, called noisy RJMCMC, where the underlying transition kernel is replaced with an approximation based on unbiased estimators. Based on previous theoretical developments, convergence guarantees for the noisy RJMCMC algorithm are provided. The experiments show that the noisy RJMCMC algorithm can be much more efficient than other exact methods, provided that an estimator with controlled Monte Carlo variance is used, a fact which is in agreement with the theoretical analysis.
      348Scopus© Citations 2