SPSSX Discussion

Clustering procedure

MaaikeSmits

Clustering procedure

Hi,

I am running a hierarchical cluster analysis followed by a k-means cluster procedure. The 10 variables I am using as input for the clustering have a lot of missing data. I used multiple imputation to impute the missing values, but from studying the procedure I have learned that it is not meant to produce a single completed dataset. Instead, it generates multiple datasets, and pooled results are reported by the analyses that support MI. However, the clustering procedures do not seem to support MI.

Does anyone have an idea for a workaround that would let me use the imputed data in the clustering?

Thanks in advance,

Kind regards,
Maaike

Kirill Orlov

Re: Clustering procedure

Why necessarily use MI? Clustering is exploratory analysis, not significance testing. For quantitative features, SPSS has a decent single-imputation regression method and, better, EM imputation in the Missing Value Analysis procedure.
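To make the idea of single regression imputation concrete, here is a minimal Python sketch. It is not SPSS's implementation: it assumes one fully observed predictor `x`, one variable `y` with gaps coded as `None`, and a deterministic fill (no random residual added). The function names `fit_line` and `regression_impute` are made up for this illustration.

```python
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_impute(x: List[float], y: List[Optional[float]]) -> List[float]:
    """Fill missing y values with predictions from the complete (x, y) pairs."""
    observed = [(a, b) for a, b in zip(x, y) if b is not None]
    intercept, slope = fit_line([a for a, _ in observed],
                                [b for _, b in observed])
    # Observed values are kept as they are; only the gaps are predicted.
    return [b if b is not None else intercept + slope * a
            for a, b in zip(x, y)]


# Toy data (invented): y is missing for the third case.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, None, 8.0, 10.0]
filled = regression_impute(x, y)
```

The result is one completed variable that can be passed on to a clustering procedure, which is the point of preferring single imputation over MI here.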

If some of your features to impute are categorical, you may use hot-deck imputation (two macros for it are at http://www.spsstools.net/en/KO-spssmacros).

But the most important thing is to decide whether to impute at all. Imputations are forgery, whatever they say. How much missing data do you have? You say "a lot". If it is above 20%, forget about imputation and analyse just the complete cases.
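As a rough illustration of that check, the Python sketch below computes the share of missing cells and keeps only the complete cases. It assumes the 20% figure refers to the overall share of missing cells across the clustering variables (the post does not say whether it means cells, cases, or a single variable), and that missing values are coded as `None`. All names are hypothetical.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def missing_share(rows: Sequence[Row]) -> float:
    """Fraction of all cells that are missing (None)."""
    total = sum(len(r) for r in rows)
    missing = sum(1 for r in rows for v in r if v is None)
    return missing / total if total else 0.0


def complete_cases(rows: Sequence[Row]) -> List[list]:
    """Listwise deletion: keep only rows without any missing value."""
    return [list(r) for r in rows if all(v is not None for v in r)]


def choose_strategy(rows: Sequence[Row], threshold: float = 0.20) -> str:
    """Apply the rule of thumb: above the threshold, skip imputation."""
    if missing_share(rows) > threshold:
        return "complete cases"
    return "consider imputation"


# Toy data (invented): 4 of 12 cells are missing.
data = [
    [1.0, 2.0, 3.0],
    [4.0, None, 6.0],
    [7.0, 8.0, None],
    [None, None, 12.0],
]
strategy = choose_strategy(data)
usable = complete_cases(data)
```

Note how quickly listwise deletion shrinks the sample: in the toy data only one of four cases survives, which is worth checking before committing to complete-case clustering.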

In reply to MaaikeSmits's message of 21.10.2015 18:32.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Clustering-procedure-tp5730814.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Kirill Orlov

Re: Clustering procedure

In reply to this post by MaaikeSmits

-->If some of your features to impute are categorical

I meant to say: if the background variables (on which the imputed variable depends) are categorical. The variable being imputed can be categorical or quantitative.

Rich Ulrich

Re: Clustering procedure

In reply to this post by Kirill Orlov

Also - I don't know how other people figure it, but I prefer
imputation when the missing is At Random, and each Missing
is unrelated to the fact of some other item being missing.
That is mainly true when the Missing is accidental, rather than
meaningful in any sense. (And that is why I have never been
enthusiastic about imputation... folks do it far too casually.)

If there is "a lot" of missing, perhaps the first clustering ought
to be done to look at Missing/ Not missing. If there are one or
two big reasons for Missing, that should be sorted out at the start.
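A crude first pass at "looking at Missing / Not missing" is to turn each case into a 0/1 missingness pattern and tabulate the patterns. The Python sketch below does only that. It assumes missing values are coded as `None`, and the function names are hypothetical. A real cluster analysis could then be run on the indicator matrix instead of a simple tabulation.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def missing_indicators(rows: Sequence[Row]) -> List[Tuple[int, ...]]:
    """One tuple per case: 1 where the value is missing, 0 where observed."""
    return [tuple(1 if v is None else 0 for v in r) for r in rows]


def pattern_counts(rows: Sequence[Row]) -> List[Tuple[Tuple[int, ...], int]]:
    """Missingness patterns sorted from most to least frequent."""
    return Counter(missing_indicators(rows)).most_common()


# Toy data (invented): the second variable is missing for most cases.
data = [
    [1.0, None, 3.0],
    [2.0, None, 1.0],
    [None, 5.0, 2.0],
    [4.0, 6.0, 7.0],
    [3.0, None, 9.0],
]
patterns = pattern_counts(data)
```

If one or two patterns account for most of the incomplete cases, that points to a systematic reason for the missing data that should be understood before any imputation.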

--
Rich Ulrich

In reply to Kirill Orlov's message of Wed, 21 Oct 2015 20:56:37 +0300.