Web robot detection in scholarly Open Access institutional repositories

Files in This Item:
File: RobotsLibHiTechAcceptedPostPrintRepository2016-06-20.pdf
Size: 621.94 kB
Format: Adobe PDF
Title: Web robot detection in scholarly Open Access institutional repositories
Authors: Greene, Joseph
Permanent link: http://hdl.handle.net/10197/7682
Date: Jul-2016
Online since: 2016-08-14T01:00:36Z
Abstract: Purpose -- This paper investigates the impact of web robots on usage statistics collected by Open Access institutional repositories (IRs) and techniques for mitigating their effects. Design/methodology/approach -- A review of the literature provides a comprehensive list of web robot detection techniques. Reviews of system documentation and open source code are carried out along with personal interviews to provide a comparison of the robot detection techniques used in the major IR platforms. An empirical test based on a simple random sample of downloads with 96.20% certainty is undertaken to measure the accuracy of an IR's web robot detection at a large Irish University. Findings -- While web robot detection is not ignored in IRs, there are areas where the two main systems could be improved. The technique tested here is found to have successfully detected 94.18% of web robots visiting the site over a two-year period (recall), with a precision of 98.92%. Due to the high level of robot activity in repositories, correctly labelling more robots has an exponential effect on the accuracy of usage statistics. Limitations -- This study is performed on one repository using a single system. Future studies across multiple sites and platforms are needed to determine the accuracy of web robot detection in OA repositories generally. Originality/value -- This is the only study to date to have investigated web robot detection in IRs. It puts forward the first empirical benchmarking of accuracy in IR usage statistics.
Type of material: Journal Article
Publisher: Emerald
Journal: Library Hi Tech
Volume: 34
Issue: 3
Start page: 500
End page: 520
Keywords: Open Access; Institutional repositories; Usage statistics; Downloads; Web robots
DOI: 10.1108/LHT-04-2016-0048
Language: en
Status of Item: Peer reviewed
This item is made available under a Creative Commons License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
Appears in Collections: UCD Library Staff Research Collection
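
Note on the accuracy measures named in the abstract: recall is the share of actual robot downloads that the detector labels as robots, and precision is the share of robot-labelled downloads that really are robots. The Python sketch below is illustrative only and is not the paper's method or code. The function names, the counts, and the 80% robot share are hypothetical, chosen to show why missed robots matter more when robots make up most of the traffic.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of downloads labelled as robots that really are robots."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of actual robot downloads that the detector caught."""
    return true_pos / (true_pos + false_neg)


def overcount_ratio(robot_share: float, recall_rate: float) -> float:
    """Reported 'human' downloads divided by real human downloads.

    Assumption (for illustration): precision is perfect, so no human
    download is mislabelled as a robot and only missed robots inflate
    the reported figure.
    """
    human = 1.0 - robot_share
    missed_robots = robot_share * (1.0 - recall_rate)
    return (human + missed_robots) / human


# Hypothetical counts, not data from the study.
p = precision(true_pos=90, false_pos=10)
r = recall(true_pos=90, false_neg=30)
print(f"precision={p:.2f} recall={r:.2f}")

# Hypothetical traffic mix: 80% robots, detector recall of 90%.
print(f"overcount={overcount_ratio(0.8, 0.9):.2f}x")
```

Under these assumed inputs, raising recall shrinks the overcount faster the larger the robot share is, because the missed robots are divided by a small human total.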
