|
Should missing value imputation and outlier treatment be done before splitting the data into training and validation sets? Suppose I have already split my data into training and validation sets. In the training set, I imputed missing values with the median and capped values at the 1st and 99th percentiles. When imputing missing values and treating outliers in the validation set, should I reuse the median and capping values calculated on the training data, or would it be fine to calculate the median and percentiles from the validation set itself? And will the same process apply in future to a new data set that we score? I know it's not an SPSS question, but as many analytics professionals are active in this forum, I thought I would get an answer if I posted my question here :-)
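To make the question concrete, here is a minimal Python sketch of the first option, where the median and caps are learned on the training data and then reused on the validation data. It is only an illustration: the function names `fit_preprocessing` and `apply_preprocessing` and the simulated data are made up for this example, and I am assuming NumPy and a single numeric column.

```python
import numpy as np

def fit_preprocessing(train_col):
    """Learn the median and the 1st/99th percentile caps from training data only."""
    median = np.nanmedian(train_col)  # median ignoring missing values
    imputed = np.where(np.isnan(train_col), median, train_col)
    # Caps are computed after imputation, matching the order described above.
    lower, upper = np.percentile(imputed, [1, 99])
    return {"median": median, "lower": lower, "upper": upper}

def apply_preprocessing(col, params):
    """Impute and cap any data set using previously learned parameters."""
    filled = np.where(np.isnan(col), params["median"], col)
    return np.clip(filled, params["lower"], params["upper"])

# Hypothetical data: a simulated training column with some missing values,
# and a tiny validation column with a missing value and two extreme values.
rng = np.random.default_rng(0)
train = rng.normal(50, 10, size=1000)
train[::50] = np.nan
valid = np.array([np.nan, 45.0, 500.0, -300.0])

params = fit_preprocessing(train)                 # learned on training data only
train_clean = apply_preprocessing(train, params)
valid_clean = apply_preprocessing(valid, params)  # reuses the training values
```

The second option I am asking about would amount to calling `fit_preprocessing(valid)` instead. For a new data set to be scored, the first option would mean storing `params` and applying it unchanged.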
Thanks in anticipation!
|