Now showing 1 - 10 of 27
  • Publication
    Deterministic Bayesian inference for the p* model
    (Journal of Machine Learning Research (JMLR), 2010) ;
    The p* model is widely used in social network analysis. The likelihood of a network under this model is impossible to calculate for all but trivially small networks. Various approximation have been presented in the literature, and the pseudolikelihood approximation is the most popular. The aim of this paper is to introduce two likelihood approximations which have the pseudolikelihood estimator as a special case. We show, for the examples that we have considered, that both approximations result in improved estimation of model parameters with respect to the standard methodological approaches. We provide a deterministic approach and also illustrate how Bayesian model choice can be carried out in this setting.
  • Publication
    Calibration of conditional composite likelihood for Bayesian inference on Gibbs random fields
    (Microtome Publishing, 2015-05-12) ;
    Gibbs random fields play an important role in statistics, however, the resulting likelihood is typically unavailable due to an intractable normalizing constant. Composite likelihoods offer a principled means to construct useful approximations. This paper provides a mean to calibrate the posterior distribution resulting from using a composite likelihood and illustrate its performance in several examples.
  • Publication
    A generalized multiple-try version of the Reversible Jump algorithm
    The Reversible Jump algorithm is one of the most widely used Markov chain Monte Carlo algorithms for Bayesian estimation and model selection. A generalized multiple-try version of this algorithm is proposed. The algorithm is based on drawing several proposals at each step and randomly choosing one of them on the basis of weights (selection probabilities) that may be arbitrarily chosen. Among the possible choices, a method is employed which is based on selection probabilities depending on a quadratic approximation of the posterior distribution. Moreover, the implementation of the proposed algorithm for challenging model selection problems, in which the quadratic approximation is not feasible, is considered. The resulting algorithm leads to a gain in efficiency with respect to the Reversible Jump algorithm, and also in terms of computational effort. The performance of this approach is illustrated for real examples involving a logistic regression model and a latent class model.
      313Scopus© Citations 10
  • Publication
    Bayesian model selection for the latent position cluster model for Social Networks
    (Cambridge University Press, 2017-03) ; ;
    The latent position cluster model is a popular model for the statistical analysis of network data. This model assumes that there is an underlying latent space in which the actors follow a finite mixture distribution. Moreover, actors which are close in this latent space are more likely to be tied by an edge. This is an appealing approach since it allows the model to cluster actors which consequently provides the practitioner with useful qualitative information. However, exploring the uncertainty in the number of underlying latent components in the mixture distribution is a complex task. The current state-of-the-art is to use an approximate form of BIC for this purpose, where an approximation of the log-likelihood is used instead of the true log-likelihood which is unavailable. The main contribution of this paper is to show that through the use of conjugate prior distributions, it is possible to analytically integrate out almost all of the model parameters, leaving a posterior distribution which depends on the allocation vector of the mixture model. This enables posterior inference over the number of components in the latent mixture distribution without using trans-dimensional MCMC algorithms such as reversible jump MCMC. Our approach is compared with the state-of-the-art latentnet (Krivitsky & Handcock, 2015) and VBLPCM (Salter-Townshend & Murphy, 2013) packages.
      262Scopus© Citations 5
  • Publication
    Optimal Bayesian estimators for latent variable cluster models
    (Springer, 2017-10-31) ;
    In cluster analysis interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be effectively obtained in a wide range of clustering models, including finite mixtures, infinite mixtures, hidden Markov models and block models for networks. However, due to the categorical nature of the clustering variables and the lack of scalable algorithms, summary tools that can interpret such samples are not available. We adopt a Bayesian decision theoretic approach to define an optimality criterion for clusterings, and propose a fast and context-independent greedy algorithm to find the best allocations. One important facet of our approach is that the optimal number of groups is automatically selected, thereby solving the clustering and the model-choice problems at the same time. We consider several loss functions to compare partitions, and show that our approach can accommodate a wide range of cases. Finally, we illustrate our approach on a variety of real-data applications for three different clustering models: Gaussian finite mixtures, stochastic block models and latent block models for networks.
      253Scopus© Citations 17
  • Publication
    Bayesian variational inference for exponential random graph models
    (Taylor & Francis, 2020-04-15) ;
    Deriving Bayesian inference for exponential random graph models (ERGMs) is a challenging “doubly intractable” problem as the normalizing constants of the likelihood and posterior density are both intractable. Markov chain Monte Carlo (MCMC) methods which yield Bayesian inference for ERGMs, such as the exchange algorithm, are asymptotically exact but computationally intensive, as a network has to be drawn from the likelihood at every step using, for instance, a “tie no tie” sampler. In this article, we develop a variety of variational methods for Gaussian approximation of the posterior density and model selection. These include nonconjugate variational message passing based on an adjusted pseudolikelihood and stochastic variational inference. To overcome the computational hurdle of drawing a network from the likelihood at each iteration, we propose stochastic gradient ascent with biased but consistent gradient estimates computed using adaptive self-normalized importance sampling. These methods provide attractive fast alternatives to MCMC for posterior approximation. We illustrate the variational methods using real networks and compare their accuracy with results obtained via MCMC and Laplace approximation.
      110Scopus© Citations 4
  • Publication
    Choosing the number of groups in a latent stochastic block model for dynamic networks
    (Cambridge University Press, 2018-11-15) ; ;
    Latent stochastic block models are flexible statistical models that are widely used in social network analysis. In recent years, efforts have been made to extend these models to temporal dynamic networks, whereby the connections between nodes are observed at a number of different times. In this paper we extend the original stochastic block model by using a Markovian property to describe the evolution of nodes cluster memberships over time. We recast the problem of clustering the nodes of the network into a model-based context, and show that the integrated completed likelihood can be evaluated analytically for a number of likelihood models. Then, we propose a scalable greedy algorithm to maximise this quantity, thereby estimating both the optimal partition and the ideal number of groups in a single inferential framework. Finally we propose applications of our methodology to both real and artificial datasets.
      231Scopus© Citations 8
  • Publication
    Model comparison for Gibbs random fields using noisy reversible jump Markov chain Monte Carlo
    The reversible jump Markov chain Monte Carlo (RJMCMC) method offers an across-model simulation approach for Bayesian estimation and model comparison, by exploring the sampling space that consists of several models of possibly varying dimensions. A naive implementation of RJMCMC to models like Gibbs random fields suffers from computational difficulties: the posterior distribution for each model is termed doubly-intractable since computation of the likelihood function is rarely available. Consequently, it is simply impossible to simulate a transition of the Markov chain in the presence of likelihood intractability. A variant of RJMCMC is presented, called noisy RJMCMC, where the underlying transition kernel is replaced with an approximation based on unbiased estimators. Based on previous theoretical developments, convergence guarantees for the noisy RJMCMC algorithm are provided. The experiments show that the noisy RJMCMC algorithm can be much more efficient than other exact methods, provided that an estimator with controlled Monte Carlo variance is used, a fact which is in agreement with the theoretical analysis.
      241Scopus© Citations 1
  • Publication
    Classification using distance nearest neighbours
    This paper proposes a new probabilistic classification algorithm using a Markov random field approach. The joint distribution of class labels is explicitly modelled using the distances between feature vectors. Intuitively, a class label should depend more on class labels which are closer in the feature space, than those which are further away. Our approach builds on previous work by Holmes and Adams (2002, 2003) and Cucala et al. (2009). Our work shares many of the advantages of these approaches in providing a probabilistic basis for the statistical inference. In comparison to previous work, we present a more efficient computational algorithm to overcome the intractability of the Markov random field model. The results of our algorithm are encouraging in comparison to the k-nearest neighbour algorithm.
      306Scopus© Citations 11
  • Publication
    Efficient model selection for probabilistic K nearest neighbour classification
    (Elsevier, 2015-02-03) ;
    Probabilistic K-nearest neighbour (PKNN) classification has been introduced to improve the performance of the original K-nearest neighbour (KNN) classification algorithm by explicitly modelling uncertainty in the classification of each feature vector. However, an issue common to both KNN and PKNN is to select the optimal number of neighbours, K. The contribution of this paper is to incorporate the uncertainty in K into the decision making, and consequently to provide improved classification with Bayesian model averaging. Indeed the problem of assessing the uncertainty in K can be viewed as one of statistical model selection which is one of the most important technical issues in the statistics and machine learning domain. In this paper, we develop a new functional approximation algorithm to reconstruct the density of the model (order) without relying on time consuming Monte Carlo simulations. In addition, the algorithms avoid cross validation by adopting Bayesian framework. The performance of the proposed approaches is evaluated on several real experimental datasets.
      344Scopus© Citations 12