Now showing 1 - 4 of 4
  • Publication
    Prediction of time series by statistical learning: general losses and fast rates
    We establish rates of convergences in statistical learning for time series forecasting. Using the PAC-Bayesian approach, slow rates of convergence √ d/n for the Gibbs estimator under the absolute loss were given in a previous work [7], where n is the sample size and d the dimension of the set of predictors. Under the same weak dependence conditions, we extend this result to any convex Lipschitz loss function. We also identify a condition on the parameter space that ensures similar rates for the classical penalized ERM procedure. We apply this method for quantile forecasting of the French GDP. Under additional conditions on the loss functions (satisfied by the quadratic loss function) and for uniformly mixing processes, we prove that the Gibbs estimator actually achieves fast rates of convergence d/n. We discuss the optimality of these different rates pointing out references to lower bounds when they are available. In particular, these results bring a generalization the results of [29] on sparse regression estimation to some autoregression.
      220Scopus© Citations 17
  • Publication
    Informed sub-sampling MCMC: approximate Bayesian inference for large datasets
    This paper introduces a framework for speeding up Bayesian inference conducted in presence of large datasets. We design a Markov chain whose transition kernel uses an unknown fraction of fixed size of the available data that is randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation literature, the subsampling process is guided by the fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC, is a generic and flexible approach which, contrary to existing scalable methodologies, preserves the simplicity of the Metropolis–Hastings algorithm. Even though exactness is lost, i.e the chain distribution approximates the posterior, we study and quantify theoretically this bias and show on a diverse set of examples that it yields excellent performances when the computational budget is limited. If available and cheap to compute, we show that setting the summary statistics as the maximum likelihood estimator is supported by theoretical arguments.
      220Scopus© Citations 5
  • Publication
    Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels
    Monte Carlo algorithms often aim to draw from a distribution π by simulating a Markov chain with transition kernel P such that π is invariant under P. However, there are many situations for which it is impractical or impossible to draw from the transition kernel P. For instance, this is the case with massive datasets, where is it prohibitively expensive to calculate the likelihood and is also the case for intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace P by an approximation Pˆ. Using theory from the stability of Markov chains we explore a variety of situations where it is possible to quantify how 'close' the chain given by the transition kernel Pˆ is to the chain given by P. We apply these results to several examples from spatial statistics and network analysis.
      244Scopus© Citations 58
  • Publication
    A Bayesian approach for noisy matrix completion: Optimal rate under general sampling distribution
    (Institute of Mathematical Statistics, 2015-04) ;
    Bayesian methods for low-rank matrix completion with noise have been shown to be very efficient computationally [3, 18, 19, 24, 28]. While the behaviour of penalized minimization methods is well understood both from the theoretical and computational points of view (see [7, 9, 16, 23] among others) in this problem, the theoretical optimality of Bayesian estimators have not been explored yet. In this paper, we propose a Bayesian estimator for matrix completion under general sampling distribution. We also provide an oracle inequality for this estimator. This inequality proves that, whatever the rank of the matrix to be estimated, our estimator reaches the minimax-optimal rate of convergence (up to a logarithmic factor). We end the paper with a short simulation study.
      548Scopus© Citations 18