On 9/25/06, Asma wrote:
>
> Hi,
>
> I want to develop a General Linear Model in SPSS with one set of data
> (training data) and test the model with another set of data (testing
> data).
> Can someone help me out with this.
>
There are several possible strategies you can use. One possibility is to
develop your model using ALL the available data. Then, to validate it,
partition your dataset into training and testing subsamples (randomly assign
50% of the observations to each subsample). Now, run the model using only the
training data. Save the parameters, then use them to estimate the dependent
variable for the testing data. Comparing those estimates with the observed
values shows how well your model predicts data it was not fitted to. (Note:
for your testing data, set the dependent variable to missing, after keeping a
copy of the observed values for that comparison, and make sure you specify an
option in your procedure to save the predicted values. Cases with a missing
dependent variable are left out of the estimation, but the procedure can
still save predicted values for them.)
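
Outside SPSS, the same hold-out procedure can be sketched in a few lines. The sketch below is in Python rather than SPSS syntax, and it uses a one-predictor least-squares regression as a stand-in for a full General Linear Model. The function names (random_split, fit_simple_ols, holdout_rmse), the fixed seed, and the use of root-mean-square error as the measure of success are all assumptions made for illustration. Testing cases get their outcome set to None, mirroring the "set the dependent variable to missing" trick.

```python
import random
from statistics import mean

def random_split(rows, train_fraction=0.5, seed=42):
    """Randomly assign each row to the training or the testing subsample."""
    rng = random.Random(seed)          # fixed seed (assumption) for reproducibility
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_simple_ols(rows):
    """Least-squares fit of y = a + b*x, skipping rows whose y is missing (None)."""
    pairs = [(x, y) for x, y in rows if y is not None]
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def holdout_rmse(rows, seed=42):
    """Fit on a random half of (x, y) rows and report RMSE on the other half."""
    train, test = random_split(rows, 0.5, seed)
    # Keep the observed test outcomes aside, then mark them missing so that
    # only the training rows contribute to the fitted parameters.
    actual = [y for _, y in test]
    masked = train + [(x, None) for x, _ in test]
    a, b = fit_simple_ols(masked)
    # "Save the predicted values" for the testing cases and compare.
    predicted = [a + b * x for x, _ in test]
    mse = mean((p - t) ** 2 for p, t in zip(predicted, actual))
    return (a, b), mse ** 0.5
```

For example, holdout_rmse([(x, 2 + 3 * x) for x in range(20)]) recovers an intercept of 2 and a slope of 3 with an RMSE of essentially zero, because that made-up data lies exactly on a line; with real data the hold-out RMSE is the number to look at.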

Some would argue that you should develop the model using only the training
data, but I think that choice is arbitrary, and it doesn't break any rules to
use all of the data at the start.

If you have time-series data, one way of validating is to partition on a
cutoff date: your training data consists of everything observed before the
cutoff, and your testing data is everything observed after it. Again, the
choice of cutoff point is arbitrary. The whole point of a validation exercise
is to test the model on data it hasn't seen yet.
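
A time-based split can be sketched the same way. This is a minimal Python illustration, not SPSS syntax; the function name split_by_cutoff and the (observation_date, value) record layout are hypothetical, and sending observations dated exactly on the cutoff to the testing set is an assumption, since the choice of boundary is as arbitrary as the cutoff itself.

```python
from datetime import date

def split_by_cutoff(records, cutoff):
    """Split (observation_date, value) records at a cutoff date.

    Training: observed strictly before the cutoff.
    Testing: observed on or after the cutoff (assumption: the cutoff day
    itself belongs to the testing set).
    """
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test

# Usage: monthly observations for 2006, split at 1 July.
records = [(date(2006, m, 1), m) for m in range(1, 13)]
train, test = split_by_cutoff(records, date(2006, 7, 1))
```

Unlike the random split, no shuffling is done here: the model is fitted on the earlier period and judged on how well it forecasts the later one.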