Massive Data Pre-Processing with a Cluster-Based Approach