Hello everyone,
I am using SPSS to explore clusterings of my data. I want to apply k-means and use the Bayesian Information Criterion (BIC) and/or the Akaike Information Criterion (AIC) to determine the number of clusters. Is there any way to make SPSS calculate BIC and AIC for my data? Two-Step clustering has options to display these values. However, I do not fully understand what two-step clustering does, so I don't know whether I can simply take the values it gives and use the BIC/AIC minimum (i.e. the optimum) as the number of clusters for k-means. So my question is: can I take the BIC and AIC values given by two-step clustering and use them for k-means? Or does the two-step analysis process the data in some way before applying BIC/AIC (the first of the two steps, whatever that may be)? Or is there a more direct way to calculate BIC/AIC in SPSS? Are the calculations even the same regardless of the clustering technique? I can try writing some scripts if needed.
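For anyone who wants to script this outside SPSS, here is a minimal Python sketch of the "run k-means for each K and keep the BIC/AIC minimum" idea. It is not what SPSS Two-Step computes. It assumes one common reading of k-means (not the only one): a mixture of K spherical Gaussians that share one variance, with hard cluster assignments, so the log-likelihood follows from the within-cluster RSS plus a mixing-weight term. The function names (`kmeans`, `information_criteria`, `ic_by_k`, `demo_points`), the k-means++ seeding and the 10 restarts are all hypothetical choices made for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=100):
    """One run of Lloyd's k-means with k-means++ seeding -> (labels, rss)."""
    # k-means++ seeding: each new centroid is drawn with probability
    # proportional to its squared distance from the nearest chosen centroid.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(list(rng.choices(points, weights=weights)[0]))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    rss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, rss


def information_criteria(points, labels, rss, k):
    """(AIC, BIC) under the ASSUMED model: k spherical Gaussians sharing one
    variance, hard assignments, parameters = k*d means + 1 variance
    + (k - 1) mixing weights."""
    n, d = len(points), len(points[0])
    var = rss / (n * d)  # maximum-likelihood estimate of the shared variance
    if var == 0:
        raise ValueError("RSS is zero; k is too large for this data")
    counts = [labels.count(j) for j in range(k)]
    # Mixing-weight term: sum of n_k * ln(n_k / n) over non-empty clusters.
    log_lik = sum(c * math.log(c / n) for c in counts if c > 0)
    # Gaussian term evaluated at the ML variance estimate.
    log_lik -= 0.5 * n * d * (math.log(2 * math.pi * var) + 1)
    p = k * d + 1 + (k - 1)
    return 2 * p - 2 * log_lik, p * math.log(n) - 2 * log_lik


def ic_by_k(points, k_max=6, restarts=10, seed=0):
    """Map each k in 1..k_max to (AIC, BIC) of its lowest-RSS k-means run."""
    rng = random.Random(seed)
    scores = {}
    for k in range(1, k_max + 1):
        labels, rss = min((kmeans(points, k, rng) for _ in range(restarts)),
                          key=lambda run: run[1])
        scores[k] = information_criteria(points, labels, rss, k)
    return scores


def demo_points(seed=42):
    """Synthetic data: three well-separated 2-D blobs of 50 points each."""
    rng = random.Random(seed)
    centers = [(0, 0), (10, 0), (0, 10)]
    return [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
            for cx, cy in centers for _ in range(50)]


if __name__ == "__main__":
    scores = ic_by_k(demo_points())
    for k, (aic, bic) in scores.items():
        print(f"k={k}: AIC={aic:.1f}  BIC={bic:.1f}")
    print("k with the lowest BIC:", min(scores, key=lambda k: scores[k][1]))
```

Under these assumptions the mixing-weight term is what stops the criterion from rewarding a split of one round cluster into two halves; with the RSS term alone, such splits can lower the score enough to favour too many clusters.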
As a related note, would anyone have a pointer to a tutorial or similar explanation of BIC/AIC for dummies? I'm not a statistician, so all the explanations seem a bit confusing to me. Wikipedia lists simple formulas [1], but I am still left wondering about a few points. Do I just calculate it using RSS/n in place of L, or do I need to construct my own L somehow (and if so, what does SPSS use)? Do I apply it to the dataset while only varying the number of parameters (and is that equal to the number of clusters in this case?), or do I need to create the clusters first and then apply it? I could implement it myself if I understood the exact calculations. As you can see, I am a most confused dummy :)
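A hedged sketch of how the Wikipedia formulas connect to RSS: when the model errors are assumed i.i.d. normal, -2 ln L equals n ln(RSS/n) up to an additive constant, which gives the first pair of lines below (p = number of estimated parameters, n = number of observations). If k-means is read as a mixture of K spherical Gaussians with one shared variance (an assumption, not something SPSS documents for its k-means), with N points in d dimensions and n_k points in cluster k, the likelihood and parameter count become the remaining lines. Under that reading p is K*d + K rather than K, and the clusters have to be fitted first, because RSS and n_k both come from the fitted clustering.

```latex
% Gaussian-error (least-squares) case, additive constants dropped:
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2p,
\qquad
\mathrm{BIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + p \ln n

% K-means read as K spherical Gaussians with one shared variance
% (N points, d dimensions, n_k points in cluster k):
\hat{\sigma}^2 = \frac{\mathrm{RSS}}{N d},
\qquad
\ln \hat{L} = \sum_{k=1}^{K} n_k \ln\frac{n_k}{N}
  - \frac{N d}{2}\left(\ln\!\left(2\pi\hat{\sigma}^2\right) + 1\right),
\qquad
p = K d + K

\mathrm{AIC} = 2p - 2\ln\hat{L},
\qquad
\mathrm{BIC} = p \ln N - 2\ln\hat{L}
```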
Thanks for any help,
[1] http://en.wikipedia.org/wiki/Bayesian_information_criterion