This study builds a Grammatical Evolution (GE) framework designed for very large datasets. The framework combines statistical techniques, such as appropriate error measures and data splitting; population-based improvements, such as massive parallelisation; and GE-specific techniques, such as grammar design and repeat management. With it, GE is applied for the first time to massive datasets such as the Higgs dataset (eleven million samples).