Large-Scale Data Mining Techniques for Crop Management
Author(s)
Date Issued
2025
Date Available
2026-01-26T11:23:34Z
Abstract
This thesis introduces a modular and scalable predictive modelling architecture designed to address the computational and structural challenges of analysing large-scale, heterogeneous datasets. The core innovation is the two-stage Predictive Timeline Scheduling (PTS) model, a structured forecasting framework for predicting event-based actions across multiple timelines. The model first forecasts the number of future actions (e.g., application events) and then estimates their corresponding quantities and timings. This hierarchical approach addresses limitations of traditional machine learning models, including poor handling of temporally sequenced data, sensitivity to heterogeneity, and difficulty accommodating parallel event timelines. The univariate stage of the PTS model predicts action frequencies for a specific category, while the multivariate stage leverages inter-feature correlations to forecast timings and quantities across concurrent event sequences. This two-stage method imposes a temporal structure that enables accurate modelling of the overlapping, event-driven sequences essential for multi-dimensional forecasting. To address inconsistencies across diverse datasets, a comprehensive Fertiliser Dictionary was developed to standardise non-uniform identifiers and nutrient compositions. An Automated Standardised Data Mapping (ASDM) method was employed to harmonise variable representations, supported by a structured product dictionary that translates unstructured identifiers into numerical and categorical forms. This pre-processing framework improves dataset reliability and reusability for downstream predictive tasks by reducing noise from naming conflicts and incomplete data. The architecture integrates three data management frameworks, Slice Separate and Link (SSL), Layer Linking Stream (LLS), and Separate Order Connect (SOC), which support the structured alignment of temporal streams.
These frameworks isolate, link, and reconnect event sequences and feature layers at each prediction stage, reducing data complexity and optimising representation for time-sensitive forecasting. The system also supports modular data pipeline construction, allowing it to adapt to variable-length inputs while maintaining a consistent, structured prediction flow. Model evaluation reports accuracy metrics across training and test splits, with cross-validation to assess generalisability, and integrates auxiliary predictors for robust outcome validation. An application consistency validator checks that the predicted number of events and their total quantity align with aggregated outcomes, even in scenarios where multiple occurrences per target are possible. A yield consistency checker verifies cumulative predictions against expected outcomes, maintaining coherence under real-world variability. These measures keep model outputs realistic and consistent, particularly in multi-action scheduling scenarios. Although initially validated in digital agriculture, the methods are broadly applicable. The system architecture, prediction flow, and data transformation techniques are transferable to fields such as traffic control, logistics, energy forecasting, financial scheduling, and health informatics. These domains benefit from structured timeline prediction to support real-time or large-scale decision-making. This research establishes a flexible, scalable framework capable of managing complex data streams, contributing to the development of general-purpose, event-aligned forecasting systems with dynamic input handling and modular scaling.
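The application consistency check described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name, its parameters, and the 5% relative tolerance are all assumptions made for illustration. It only shows the general idea of checking that a predicted event count and the predicted per-event quantities agree with an aggregated total.

```python
from math import isclose
from typing import Sequence


def is_application_plan_consistent(
    predicted_count: int,
    predicted_quantities: Sequence[float],
    aggregated_total: float,
    rel_tol: float = 0.05,  # hypothetical tolerance, not taken from the thesis
) -> bool:
    """Return True if a predicted schedule is internally consistent.

    Hypothetical check: the stage-one event count must match the number
    of stage-two quantities, no quantity may be negative, and the
    quantities must sum to the aggregated total within a tolerance.
    """
    # The count forecast and the per-event forecasts must agree in length.
    if predicted_count < 0 or predicted_count != len(predicted_quantities):
        return False
    # Negative application quantities are treated as invalid here.
    if any(q < 0 for q in predicted_quantities):
        return False
    # Compare the summed quantities with the aggregated outcome.
    return isclose(
        sum(predicted_quantities), aggregated_total,
        rel_tol=rel_tol, abs_tol=1e-9,
    )
```

For example, under these assumptions a prediction of three events with quantities 40, 35, and 25 would pass against an aggregated total of 100, while the same quantities paired with a predicted count of two would be rejected.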
Type of Material
Master Thesis
Qualification Name
Master of Science (M.Sc.)
Publisher
University College Dublin. School of Computer Science
Copyright (Published Version)
2025 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
Ikhlaq2025.pdf
Size
2.69 MB
Format
Adobe PDF
Checksum (MD5)
36882fbcdcfe78750b0bc834f18b0e23
Owning collection