Informed sub-sampling MCMC: approximate Bayesian inference for large datasets

Maire, Florian; Friel, Nial; Alquier, Pierre

doi:10.1007/s11222-018-9817-3

Author(s)

Maire, Florian

Friel, Nial

Alquier, Pierre

URI

http://hdl.handle.net/10197/10403

Date Issued

2018-06-09

Date Available

2019-05-13T09:14:31Z

Abstract

This paper introduces a framework for speeding up Bayesian inference in the presence of large datasets. We design a Markov chain whose transition kernel uses an unknown fraction of fixed size of the available data that is randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation literature, the subsampling process is guided by the fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC, is a generic and flexible approach which, unlike existing scalable methodologies, preserves the simplicity of the Metropolis–Hastings algorithm. Even though exactness is lost, i.e., the chain distribution only approximates the posterior, we study and theoretically quantify this bias and show on a diverse set of examples that the method yields excellent performance when the computational budget is limited. When the maximum likelihood estimator is available and cheap to compute, we show that using it as the summary statistic is supported by theoretical arguments.
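The abstract does not spell out the update rules, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes a Gaussian location model with unit variance, the sample mean as summary statistic, a single-index swap to refresh the subset, a subset weight proportional to exp(-eps * squared distance between subset and full-data summaries), and a subset likelihood raised to the power N/n. All function names, the tuning values `eps` and `step`, and the prior are hypothetical choices made for illustration.

```python
import numpy as np

def summary(y):
    # Assumed summary statistic: the sample mean (the MLE of a Gaussian location).
    return np.mean(y)

def log_lik(theta, y):
    # Gaussian log-likelihood with unit variance, up to an additive constant.
    return -0.5 * np.sum((y - theta) ** 2)

def log_prior(theta):
    # N(0, 10^2) prior, chosen only for illustration.
    return -0.5 * (theta / 10.0) ** 2

def informed_subsampling_mcmc(y, n, n_iter, eps=1e4, step=0.02, seed=0):
    rng = np.random.default_rng(seed)
    N = len(y)
    s_full = summary(y)  # computed once on the full data

    # Start from a uniformly drawn subset of fixed size n.
    subset = rng.choice(N, size=n, replace=False)
    in_subset = np.zeros(N, dtype=bool)
    in_subset[subset] = True
    dist = (summary(y[subset]) - s_full) ** 2

    theta = 0.0
    log_post = (N / n) * log_lik(theta, y[subset]) + log_prior(theta)
    thetas = np.empty(n_iter)

    for t in range(n_iter):
        # Subset refresh: swap one index in the subset for one outside it.
        j = rng.integers(N)
        while in_subset[j]:
            j = rng.integers(N)
        pos = rng.integers(n)
        cand = subset.copy()
        cand[pos] = j
        cand_dist = (summary(y[cand]) - s_full) ** 2
        # Symmetric proposal, so the acceptance ratio is the ratio of weights:
        # subsets whose summary is closer to the full-data summary are favoured.
        if np.log(rng.uniform()) < eps * (dist - cand_dist):
            in_subset[subset[pos]] = False
            in_subset[j] = True
            subset, dist = cand, cand_dist
            log_post = (N / n) * log_lik(theta, y[subset]) + log_prior(theta)

        # Random-walk Metropolis step using only the n points in the subset.
        prop = theta + step * rng.normal()
        log_post_prop = (N / n) * log_lik(prop, y[subset]) + log_prior(prop)
        if np.log(rng.uniform()) < log_post_prop - log_post:
            theta, log_post = prop, log_post_prop
        thetas[t] = theta

    return thetas, subset
```

For example, `thetas, subset = informed_subsampling_mcmc(y, n=100, n_iter=2000)` on a NumPy array `y` of several thousand observations touches only 100 data points per iteration. The resulting chain targets an approximation of the posterior rather than the posterior itself, which matches the loss of exactness the abstract describes.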

Sponsorship

Science Foundation Ireland

Other Sponsorship

Insight Centre for Data Analytics

Labex ECODEC

Fondation du Risque

Type of Material

Journal Article

Publisher

Springer

Journal

Statistics and Computing

Volume

29

Issue

3

Start Page

449

End Page

482

Copyright (Published Version)

2018 Springer

Subjects

Bayesian inference

Big-data

Approximate Bayesian ...

Noisy Markov chain Mo...

DOI

10.1007/s11222-018-9817-3

Language

English

Status of Item

Peer reviewed

This item is made available under a Creative Commons License

https://creativecommons.org/licenses/by-nc-nd/3.0/ie/

Name

insight_publication.pdf

Size

1.24 MB

Format

Adobe PDF

Checksum (MD5)

9debaa5da2c5bdb45fbde639cde943af

Owning collection

Insight Research Collection
