Now showing 1 - 10 of 43
  • Publication
    Sentiment analysis of online media
    A joint model for annotation bias and document classification is presented in the context of media sentiment analysis. We consider an Irish online media data set comprising online news articles with user annotations of negative, positive or irrelevant impact on the Irish economy. The joint model combines a statistical model for user annotation bias and a Naive Bayes model for the document terms. An EM algorithm is used to estimate the annotation bias model, the unobserved biases in the user annotations, the classifier parameters and the sentiment of the articles. The joint modeling of both the user biases and the classifier is demonstrated to be superior to estimation of the bias followed by the estimation of the classifier parameters.
      948
  • Publication
    Sentiment Analysis of Online Media
    A joint model for annotation bias and document classification is presented in the context of media sentiment analysis. We consider an Irish online media data set comprising online news articles with user annotations of negative, positive or irrelevant impact on the Irish economy. The joint model combines a statistical model for user annotation bias and a Naive Bayes model for the document terms. An EM algorithm is used to estimate the annotation bias model, the unobserved biases in the user annotations, the classifier parameters and the sentiment of the articles. The joint modeling of both the user biases and the classifier is demonstrated to be superior to estimation of the bias followed by the estimation of the classifier parameters.
      591
  • Publication
    Joint Modelling of Multiple Network Views
    (Taylor and Francis, 2014-11-17) ;
    Latent space models (LSM) for network data were introduced by Holf et al. (2002) under the basic assumption that each node of the network has an unknown position in a D-dimensional Euclidean latent space: generally the smaller the distance between two nodes in the latent space, the greater their probability of being connected. In this paper we propose a variational inference approach to estimate the intractable posterior of the LSM. In many cases, different network views on the same set of nodes are available. It can therefore be useful to build a model able to jointly summarise the information given by all the network views. For this purpose, we introduce the latent space joint model (LSJM) that merges the information given by multiple network views assuming that the probability of a node being connected with other nodes in each network view is explained by a unique latent variable. This model is demonstrated on the analysis of two datasets: an excerpt of 50 girls from 'Teenage Friends and Lifestyle Study' data at three time points and the Saccharomyces cerevisiae genetic and physical protein-protein interactions.
      376
  • Publication
    Clustering ranked preference data using sociodemographic covariates
    Ranked preference data arise when a set of judges rank, in order of their preference, a set of objects. Such data arise in preferential voting systems and market research surveys. Covariate data associated with the judges are also often recorded. Such covariate data should be used in conjunction with preference data when drawing inferences about judges. To cluster a population of judges, the population is modelled as a collection of homogeneous groups. The Plackett-Luce model for ranked data is employed to model a judge’s ranked preferences within a group. A mixture of Plackett-Luce models is employed to model the population of judges, where each component in the mixture represents a group of judges. Mixture of experts models provide a framework in which covariates are included in mixture models. Covariates are included through the mixing proportions and the component density parameters. A mixture of experts model for ranked preference data is developed by combining a mixture of experts model and a mixture of Plackett-Luce models. Particular attention is given to the manner in which covariates enter the model. The mixing proportions and group specific parameters are potentially dependent on covariates. Model selection procedures are employed to choose optimal models. Model parameters are estimated via the ‘EMM algorithm’, a hybrid of the Expectation-Maximization and the Minorization-Maximization algorithms. Examples are provided through a menu survey and through Irish election data. Results indicate mixture modelling using covariates is insightful when examining a population of judges who express preferences.
      606
  • Publication
    Mixed membership models for rank data: Investigating structure in Irish voting data
    A mixed membership model is an individual level mixture model where individuals have partial membership of the profiles (or groups) that characterize a population. A mixed membership model for rank data is outlined and illustrated through the analysis of voting in the 2002 Irish general election. This particular election uses a voting system called proportional representation using a single transferable vote (PR-STV) where voters rank some or all of the candidates in order of preference. The data set considered consists of all votes in a constituency from the 2002 Irish general election. Interest lies in highlighting distinct voting profiles within the electorate and studying how voters affiliate themselves to these voting profiles. The mixed membership model for rank data is fitted to the voting data and is shown to give a concise and highly interpretable explanation of voting patterns in this election.
      304
  • Publication
    Preferences in college applications - a nonparametric Bayesian analysis of top-10 rankings
    Applicants to degree courses in Irish colleges and universities rank up to ten degree courses from a list of over five hundred. These data provide a wealth of information concerning applicant degree choices. A Dirichlet process mixture of generalized Mallows models are used to explore data from a cohort of applicants. We find strong and diverse clusters, which in turn gains us important insights into the workings of the system. No previously tried models or analysis technique are able to model the data with comparable accuracy.
      271
  • Publication
    Variational Bayesian inference for the Latent Position Cluster Model
    Many recent approaches to modeling social networks have focussed on embedding the actors in a latent “social space”. Links are more likely for actors that are close in social space than for actors that are distant in social space. In particular, the Latent Position Cluster Model (LPCM) [1] allows for explicit modelling of the clustering that is exhibited in many network datasets. However, inference for the LPCM model via MCMC is cumbersome and scaling of this model to large or even medium size networks with many interacting nodes is a challenge. Variational Bayesian methods offer one solution to this problem. An approximate, closed form posterior is formed, with unknown variational parameters. These parameters are tuned to minimize the Kullback-Leibler divergence between the approximate variational posterior and the true posterior, which known only up to proportionality. The variational Bayesian approach is shown to give a computationally efficient way of fitting the LPCM. The approach is demonstrated on a number of data sets and it is shown to give a good fit.
      724
  • Publication
    Clustering with the multivariate normal inverse Gaussian distribution
    Many model-based clustering methods are based on a finite Gaussian mixture model. The Gaussian mixture model implies that the data scatter within each group is elliptically shaped. Hence non-elliptical groups are often modeled by more than one component, resulting in model over-fitting. An alternative is to use a mean–variance mixture of multivariate normal distributions with an inverse Gaussian mixing distribution (MNIG) in place of the Gaussian distribution, to yield a more flexible family of distributions. Under this model the component distributions may be skewed and have fatter tails than the Gaussian distribution. The MNIG based approach is extended to include a broad range of eigendecomposed covariance structures. Furthermore, MNIG models where the other distributional parameters are constrained is considered. The Bayesian Information Criterion is used to identify the optimal model and number of mixture components. The method is demonstrated on three sample data sets and a novel variation on the univariate Kolmogorov–Smirnov test is used to assess goodness of fit.
      17366Scopus© Citations 56
  • Publication
    Variational Bayesian inference for the Latent Position Cluster Model for network data
    A number of recent approaches to modeling social networks have focussed on embedding the nodes in a latent “social space”. Nodes that are in close proximity are more likely to form links than those who are distant. This naturally accounts for reciprocal and transitive relationships which are commonly found in many network datasets. The Latent Position Cluster Model is one such model that also explicitly incorporates clustering by modeling the locations using a finite Gaussian mixture model. Observed covariates and sociality random effects may also be modeled. However, inference for the model via MCMC is cumbersome and thus scaling to large networks is a challenge. Variational Bayesian methods offer an alternative inference methodology for this problem. Sampling based MCMC is replaced by an optimization that requires many orders of magnitude fewer iterations to converge. A Variational Bayesian algorithm for the Latent Position Cluster Model is therefore developed and demonstrated.
      165Scopus© Citations 48