Kechadi, Tahar
Research Output
Showing 10 of 11 research outputs
- Publication: Mining Spatio-temporal Data at Different Levels of Detail
  In this paper we propose a methodology for mining very large spatio-temporal datasets. We propose a two-pass strategy for mining and manipulating spatio-temporal datasets at different levels of detail (i.e., granularities). The approach takes advantage of the multi-granular capability of the underlying spatio-temporal model to reduce the amount of data that is accessed initially. The approach is implemented and applied to real-world spatio-temporal datasets. We show that the technique deals easily with very large datasets without losing the accuracy of the extracted patterns, as demonstrated in the experimental results.
  Scopus© Citations: 6
- Publication: E-government Alerts Correlation Model (2014-11-19)
  Qatar's IT infrastructure is rapidly growing to encompass the evolution of businesses and the economic growth the country is increasingly witnessing throughout its industries. It is now evident that the country's e-government requirements and associated data management systems are becoming large in number, highly dynamic in nature, and exceptionally attractive targets for cybercrime. Protecting the sensitive data that e-government portals rely on for daily activities is not a trivial task. The techniques used to commit cybercrimes are growing in sophistication along with the firewalls protecting against them. Reaching a high level of data protection, in both wired and wireless networks, in order to face recent cybercrime approaches is a challenge that continually proves hard to achieve.
  In a common IT infrastructure, each deployed network device keeps a number of event logs locally in its memory. These logs are large in number, and analyzing them is therefore a time-consuming task for network administrators. In addition, a single network event often generates many redundant, similar event logs of the same class within short time intervals. This large amount of redundant logs makes them difficult to manage during a forensic investigation. In most cybercrime cases, a single alert log does not contain sufficient information about the background of malicious actions and invisible network attackers; the information for a particular malicious action or attacker is often distributed among multiple alert logs and multiple network devices. The forensic investigator's mission of detecting malicious activities and reconstructing incident scenarios is now very complex, considering both the number and the quality of these event logs.
- Publication: ADMIRE framework: Distributed Data Mining on Data Grid platforms (2006-09-14)
  In this paper, we present the ADMIRE architecture, a new framework for developing novel and innovative data mining techniques to deal with very large and distributed heterogeneous datasets in both commercial and academic applications. The main ADMIRE components are detailed, as well as its interfaces, which allow users to efficiently develop and implement their data mining applications and techniques on a Grid platform such as Globus Toolkit, DGET, etc.
- Publication: Variance-based Clustering Technique for Distributed Data Mining Applications
  Nowadays, huge amounts of data are naturally collected in distributed sites, and moving these data through the network to extract useful knowledge is almost unfeasible, for either technical or policy reasons. Furthermore, classical parallel algorithms cannot be applied, especially in loosely coupled environments. This requires the development of scalable distributed algorithms able to return the global knowledge by aggregating local results in an effective way. In this paper we propose a distributed algorithm based on independent local clustering processes and a global merging step based on minimum variance increase, which requires only a limited communication overhead. We also introduce the notion of distributed sub-cluster perturbation to improve the globally generated distribution. We show that this algorithm improves the quality of clustering compared to classical centralized ones and is able to find the real global nature or distribution of the data.
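  The minimum-variance-increase merge criterion mentioned in the entry above can be sketched with the standard Ward-style formula. This is a minimal illustration, assuming each local cluster is summarized only by its size and centroid; the function names and the summary format are illustrative, not the paper's implementation:

```python
import numpy as np

def variance_increase(n1, mu1, n2, mu2):
    """Growth of the within-cluster sum of squares if two clusters merge.

    Ward-style identity: SSE(merged) = SSE1 + SSE2
        + n1*n2/(n1+n2) * ||mu1 - mu2||^2,
    so the increase depends only on sizes and centroids -- no raw
    points need to travel between distributed sites.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    return n1 * n2 / (n1 + n2) * float(np.sum((mu1 - mu2) ** 2))

def best_merge(summaries):
    """Pick the pair of (size, centroid) summaries whose merge adds the least variance."""
    best_inc, best_pair = float("inf"), None
    for i in range(len(summaries)):
        for j in range(i + 1, len(summaries)):
            inc = variance_increase(*summaries[i], *summaries[j])
            if inc < best_inc:
                best_inc, best_pair = inc, (i, j)
    return best_pair, best_inc
```

  For example, with two near-identical clusters at the origin and one far away, `best_merge` prefers merging the two nearby ones, which is the behaviour a global merging step would repeat until the desired number of clusters remains.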
- Publication: Data Mining Techniques Applied to Wireless Sensor Networks for Early Forest Fire Detection
  Nowadays, forest fires are a serious threat to the environment and to human life. A monitoring system for forest fires should provide real-time monitoring of the target region and early detection of fire threats. In this paper, we propose a new approach based on the integration of Data Mining techniques into sensor nodes for forest fire detection. The approach is based on a clustered WSN in which each sensor node individually decides on detecting a fire using a Data Mining classifier. When a fire is detected, the corresponding node sends an alert through its cluster-head, which passes through gateways and other cluster-heads until it reaches the sink, in order to inform the firefighters. We use the CupCarbon simulator to validate and evaluate the proposed approach. Through extensive simulation experiments, we show that our approach provides a fast reaction to forest fires while consuming energy efficiently.
  Scopus© Citations: 24
- Publication: Use of Data Mining Techniques to Predict Short Term Adverse Events Occurrence in NB-UVB Phototherapy Treatments (International Journal of Machine Learning and Computing, 2018-04)
  The prediction of short-term adverse events in phototherapy treatment is important for the dermatologists who administer phototherapy, so that they can adjust the treatment and standardize clinical outcomes. Clinicians therefore need a modeling technique that can detect the potential occurrence of short-term adverse events in phototherapy treatments. Based on data mining, this study explores the significant features and the class distribution of training data for predicting short-term adverse events in NB-UVB phototherapy treatments. The experimental results highlight that acceptable prediction accuracy can be achieved by using the significant features, and that the performance of the classifiers can be significantly improved by sampling 40% of the negative-class samples in the training data, by hyperparameter tuning of the classifiers, and by using stacked classifiers in creating the prediction models.
  Scopus© Citations: 2
- Publication: An integrated model for financial data mining
  Nowadays, financial data analysis is becoming increasingly important in the business market. As companies collect more and more data from daily operations, they expect to extract useful knowledge from the collected data to help make reasonable decisions for new customer requests, e.g. user credit category, churn analysis, and real estate analysis. Financial institutes have applied different data mining techniques to enhance their business performance. However, a simplistic application of these techniques can raise performance issues. Besides, there are very few general models for both understanding and forecasting different financial fields. We present in this paper a new classification model for analyzing financial data, and we evaluate this model on different real-world data to show its performance.
- Publication: Data Reduction in Very Large Spatio-Temporal Data Sets
  Today, huge amounts of data with spatial and temporal components are being collected from sources such as meteorological stations and satellite imagery. Efficient visualisation of these datasets, as well as the discovery of useful knowledge from them, is therefore very challenging and is becoming a massive economic need. Data Mining has emerged as the technology for discovering hidden knowledge in very large volumes of data. Furthermore, data mining techniques can be applied to reduce the size of the raw data by extracting its useful knowledge in the form of representatives. As a consequence, instead of dealing with the large raw data, we can use these representatives for visualisation or analysis without losing important information. This paper presents a data reduction technique based on clustering to help analyse very large spatio-temporal data. We also present and discuss preliminary results of this approach.
  Scopus© Citations: 10
- Publication: Multi-objective Clustering Algorithm with Parallel Games
  Data mining and knowledge discovery are two important research fields that have grown rapidly over the last few decades due to the abundance of data collected from various sources. The exponentially growing volumes of generated data urge the development of mining techniques that feed the need for automatically derived knowledge. Clustering analysis (finding similar groups of data) is a well-established and widely used approach in data mining and knowledge discovery. In this paper, we introduce a clustering technique that uses game-theoretic models to tackle multi-objective application problems. The main idea is to exploit a specific type of simultaneous-move game, called a congestion game. Congestion games offer numerous advantages, ranging from being succinctly represented to possessing a Nash equilibrium that is reachable in polynomial time. The proposed algorithm has three main steps: 1) it starts by identifying the initial players (the cluster-heads); 2) it then establishes the initial clusters' composition by constructing the game to play and trying to find its equilibrium; 3) finally, it merges close clusters to obtain the final clusters. The experimental results show that the proposed clustering approach obtains good results and is very promising in terms of scalability and performance.
  Scopus© Citations: 1
- Publication: Missing Data Analysis Using Multiple Imputation in Relation to Parkinson's Disease
  Missing data is an omnipresent problem in neurological control diseases, such as Parkinson's Disease. Statistical analyses of the level of Parkinson's Disease may not be accurate if no adequate method for handling missing data is applied. In order to determine a useful way to treat missing data on Parkinson's stage, we propose a multiple imputation method based on the theory of Copulas, applied in the data pre-processing phase of the data mining process. Our goal in using the theory of Copulas is to estimate the multivariate joint probability distribution without constraints on the types of marginal distributions of the random variables that represent the dimensions of our datasets. To evaluate the proposed approach, we compared our algorithm with seven state-of-the-art imputation methods, namely mean, regression, min, max, K-nearest neighbors, Markov Chain Monte Carlo, and Expectation Maximization, on six dataset cases containing 5%, 15%, 25%, 35%, 45% and 50% missing data, respectively. The accuracy of each imputation method was evaluated using the Root Mean Square Error (RMSE). Our results indicate that the proposed method significantly outperforms the existing algorithms.
  Scopus© Citations: 4
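The RMSE evaluation named in the last entry can be illustrated with a minimal sketch: hide some entries, impute them, and score only the positions that were missing. The toy data, the 25% missingness rate, and the column-mean imputation baseline are assumptions for illustration, not the paper's setup:

```python
import numpy as np

def rmse_on_missing(original, imputed, mask):
    """Root Mean Square Error restricted to entries that were missing.

    `mask` is True where a value was removed and later imputed; only
    those positions count, so observed values do not dilute the score.
    """
    diff = (original - imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy evaluation: hide 25% of entries, then impute with column means.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
mask = rng.random(X.shape) < 0.25

X_missing = X.copy()
X_missing[mask] = np.nan
col_means = np.nanmean(X_missing, axis=0)         # means of observed values only
X_imputed = np.where(mask, col_means, X_missing)  # mean-imputation baseline

print(round(rmse_on_missing(X, X_imputed, mask), 3))
```

A stronger imputation method (regression, KNN, MCMC, EM, or the copula-based approach above) would simply replace the `np.where` line; the same `rmse_on_missing` score then ranks the methods, lower being better.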