University College Dublin
Model-Based and Nonparametric Approaches to Clustering for Data Compression in Actuarial Applications

File(s)
Clustering Paper NAAJ for research repository (2).pdf (3.34 MB)
Author(s)
O'Hagan, Adrian 
Ferrari, Colm 
URI
http://hdl.handle.net/10197/8168
Date Issued
2016
Date Available
2018-05-04T01:00:13Z
Abstract
Clustering is used by actuaries in a data compression process to make massive or nested stochastic simulations practical to run. A large data set of assets or liabilities is partitioned into a user-defined number of clusters, each of which is compressed to a single representative policy. The representative policies can then simulate the behavior of the entire portfolio over a large range of stochastic scenarios. Such processes are becoming increasingly important in understanding product behavior and assessing reserving requirements in a big-data environment. This article proposes a variety of clustering techniques that can be used for this purpose. Initialization methods for performing clustering compression are also compared, including principal components, factor analysis and segmentation. A variety of methods for choosing a cluster's representative policy is considered. A real data set comprising variable annuity policies, provided by Milliman, is used to test the proposed methods. It is found that the compressed data sets produced by the new methods, namely model-based clustering, Ward's minimum variance hierarchical clustering and k-medoids clustering, can replicate the behavior of the uncompressed (seriatim) data more accurately than those obtained by the existing Milliman method. This is verified within sample, by examining location variable totals of the representative policies versus the uncompressed data at the five levels of compression of interest. More crucially, it is also verified out of sample by comparing the distributions of the present values of several variables after 20 years across 1,000 simulated scenarios based on the compressed and seriatim data, using Kolmogorov-Smirnov goodness-of-fit tests and weighted sums of squared differences.
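As a rough illustration of the compression idea described in the abstract, the sketch below partitions a synthetic portfolio into a fixed number of clusters with a simple alternating k-medoids routine, keeps each medoid as the representative policy, and compares a variable total before and after compression. It is not the authors' implementation: the two policy variables, the squared Euclidean distance, the random data and the choice to weight each representative by its cluster size are all assumptions made for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_medoids(points, k, max_iter=100, seed=0):
    """Simple alternating k-medoids.

    Returns a dict mapping each medoid index to the list of member indices.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            if i in clusters:
                # A medoid always stays in its own cluster.
                clusters[i].append(i)
                continue
            nearest = min(medoids, key=lambda m: sq_dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = [
            min(members,
                key=lambda c: sum(sq_dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters

# Hypothetical seriatim data: (account_value, guarantee_value) per policy.
data_rng = random.Random(42)
policies = [
    (data_rng.uniform(10_000, 500_000), data_rng.uniform(10_000, 500_000))
    for _ in range(200)
]

clusters = k_medoids(policies, k=10)

# Each cluster is compressed to its medoid, weighted by cluster size (assumption).
compressed = [(policies[m], len(members)) for m, members in clusters.items()]

# Within-sample check: compare a variable total on compressed vs seriatim data.
seriatim_total = sum(p[0] for p in policies)
compressed_total = sum(rep[0] * weight for rep, weight in compressed)
rel_error = abs(compressed_total - seriatim_total) / seriatim_total
print(f"relative error in account value total: {rel_error:.4f}")
```

Here 200 policies are compressed to 10 representatives. The relative error printed at the end is only a toy analogue of the within-sample comparison of variable totals mentioned in the abstract; an out-of-sample check would additionally require projecting both data sets through simulated scenarios.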
Type of Material
Journal Article
Publisher
Taylor and Francis
Journal
North American Actuarial Journal
Volume
21
Issue
1
Start Page
107
End Page
146
Copyright (Published Version)
2016 Society of Actuaries
Keywords
  • Actuarial data compre...

  • Model-based clusterin...

DOI
10.1080/10920277.2016.1234398
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
Owning collection
Mathematics and Statistics Research Collection