Hello,
I have recently encountered a problem regarding duplication in SPSS. I want to find the percentage of duplication across 100+ variables. For example, if only 80 variables are duplicated in 10 cases, then the percentage of duplication would be 80%. I could not find any way to do this in SPSS. If someone could help, I would really appreciate it. |
Where does the 80% come from? If you can pre-specify the group of variables you want to check beforehand, you can use the menu dialog Data > Identify Duplicate Cases.
Identifying near duplicates among a larger set of variables could be an interesting problem, though. |
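One possible reading of "near duplicates" is two cases that agree on most, but not all, of the variables. The thread never defines the term, so the sketch below is only an illustration in Python (not SPSS syntax): the data, the helper name `agreement_share`, and the 0.8 threshold are all made up for the example.

```python
from itertools import combinations

def agreement_share(a, b):
    """Fraction of variables on which two cases hold the same value."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical toy data: case label -> values of five variables.
cases = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, 2, 3, 4, 9],   # agrees with A on 4 of 5 variables
    "C": [7, 8, 3, 0, 0],   # agrees with A and B on 1 of 5 variables
}

threshold = 0.8  # assumed cut-off for calling two cases "near duplicates"

near_duplicates = [
    (i, j)
    for i, j in combinations(cases, 2)
    if agreement_share(cases[i], cases[j]) >= threshold
]
print(near_duplicates)
```

Comparing every pair of cases grows quadratically with the number of cases, so this only suits small files.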
Administrator
|
In reply to this post by muh.hassan
You are going to have to provide an example of what you are talking about.
A simple dummy data set with a precise definition of what you mean by duplication.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by muh.hassan
I agree completely with both Andy and David. More information is needed to really understand what you need.
This probably isn't going to do what you want unless what you need to do is what this will do. So. Suppose the variables are id and v1 to v100. Across v1 to v100, some variables may have the same value for a given case. Some cases have no variables with the same value. There may be a case where all variables (v1 to v100) have the same value.
I suggest: VARSTOCASES, followed by AGGREGATE breaking on id and value and computing the NU function. (I now assume you are interested in the maximum number of variables having the same value for each case.) Then AGGREGATE again, breaking on id, and compute the MAX function.
Gene Maguin
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of muh.hassan
Sent: Wednesday, April 20, 2016 8:01 AM
To: [hidden email]
Subject: Finding percentage of duplication variables
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Finding-percentage-of-duplication-variables-tp5731966.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
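The logic of Gene's suggestion (restructure to one row per case and variable, count rows per id and value, then take the maximum count per id) can be sketched as follows. This is plain Python rather than SPSS syntax, and the toy data and the function name `max_shared_value_count` are invented for illustration; it also ignores missing values.

```python
from collections import Counter

def max_shared_value_count(values):
    """Largest number of variables in one case that hold the same value."""
    # Counting per value plays the role of the first aggregate (break on id and value).
    counts = Counter(values)
    # Taking the maximum plays the role of the second aggregate (break on id).
    return max(counts.values())

# Hypothetical toy data: id -> values of v1 to v5.
cases = {
    1: [3, 3, 3, 7, 9],   # three variables share the value 3
    2: [1, 2, 3, 4, 5],   # no two variables share a value
    3: [4, 4, 4, 4, 4],   # all variables share one value
}

result = {case_id: max_shared_value_count(vals) for case_id, vals in cases.items()}
percent = {case_id: 100.0 * n / len(cases[case_id]) for case_id, n in result.items()}
print(result)
print(percent)
```

Dividing the per-case maximum by the number of variables gives one candidate "percentage of duplication" within a case; whether that is the figure the original poster wants is not settled in the thread.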
Administrator
|
"This probably isn't going to do what you want unless what you need to do is what this will do. "
I need to add that to my sig ;-)
|
Administrator
|
Well spotted, David! But let's not be too hard on Gene. IIRC, he's at U of Buffalo, so he has probably just been listening to too many political speeches recently.* ;-)
* For the benefit of those who don't follow US politics, or who are reading this in the archives, the Democratic & Republican primaries for NY State happened yesterday. RESULTS: http://www.nytimes.com/elections/results/new-york http://www.bbc.com/news/election-us-2016-36084957
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Maguin, Eugene
Sorry everyone for being too hasty in typing my first post. Let me just elaborate my problem.
Suppose we have variables v1 to v50. For each case, every variable's value is the same except for 5 variables, let's say v26 to v30.
Case 1: Duplication for just one variable, v1, whose value is the same for each case
Duplicate case = 49, Percent = 98
Primary case = 1, Percent = 2
Case 2: Duplication for the five variables v26 to v30, whose values are unique for each case
Primary case = 50, Percent = 100
Case 3: Duplication for all 50 variables, v1 to v50, which gives
Primary case = 50, Percent = 100
The problem lies in Case 3. The duplication exists for 45 variables out of 50, but the tool did not display that information. I need to get the information as a percentage for all 50 variables, i.e., in this case:
Duplication case = 44, Percent = 88
Primary case = 5, Percent = 10 |
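The primary/duplicate figures quoted for Case 1 and Case 2 are consistent with a duplicate-case check run on a file of 50 cases, where the first case with a given combination of values counts as primary and later matches count as duplicates. That reading is an assumption: the post does not name the tool or the number of cases. A minimal Python sketch under that assumption, with made-up data and a hypothetical helper `primary_and_duplicate_counts`:

```python
def primary_and_duplicate_counts(rows, columns):
    """Count primary (distinct) and duplicate cases for the chosen columns."""
    distinct = {tuple(row[c] for c in columns) for row in rows}
    primary = len(distinct)
    return primary, len(rows) - primary

# Hypothetical data: 50 cases, 50 variables (index 0..49 stands for v1..v50).
# v26 to v30 are unique per case; the other 45 variables are constant.
rows = []
for i in range(50):
    row = [1] * 50
    for j in range(25, 30):
        row[j] = 1000 * (j + 1) + i
    rows.append(row)

n_cases = len(rows)
summary = {}
for label, cols in [("v1 only", [0]),
                    ("v26 to v30", range(25, 30)),
                    ("v1 to v50", range(50))]:
    primary, duplicate = primary_and_duplicate_counts(rows, cols)
    summary[label] = (primary, duplicate)
    print(label, "primary:", primary, 100 * primary / n_cases,
          "duplicate:", duplicate, 100 * duplicate / n_cases)

# A different question: how many variables never vary across cases?
constant_vars = sum(1 for c in range(50) if len({row[c] for row in rows}) == 1)
print("constant variables:", constant_vars, 100 * constant_vars / 50)
```

With this data the check on all 50 variables reports every case as primary, because the five varying variables make each row distinct. Counting variables that never vary gives 45 of 50, which matches the "45 variables out of 50" in the post but not the 44 and 5 it asks for.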
Administrator
|
Please review the responses to your query and address each of them. I for one requested a simple sample data example and the results from such. What are we supposed to do with this second post? It is useless for answering your question. Pretend we are NOT looking over your shoulder and reading your mind! You need to 'elaborate' on what you mean by duplicate and provide an example of inputs and desired outputs. It seems your second post has been far too hasty as well. 'but the tool did not display that information.' What tool?
|
Administrator
|
Whatsamatta U, David? Is your ESPss on the fritz again? ;-)
|
In reply to this post by muh.hassan
Please elaborate on your question.
What is the context? What is a case? Are you looking for: -- duplicate cases? -- pattern responding to a test/questionnaire? etc.
Art Kendall
Social Research Consultants |
Administrator
|
In reply to this post by Bruce Weaver
More than likely. More likely is that I'm too lazy to help people who won't help themselves by providing clear examples of what they need to sort out ;-)
|
In reply to this post by muh.hassan
I think you are going to have to overcome your shyness, and tell us where the data come from. Depending on how informative that is, you might have to be explicit and say, also, What do 50 variables measure? (Are they all the same thing?) What does a line of data represent?
And, see below -
> Date: Thu, 21 Apr 2016 01:02:35 -0700
> From: [hidden email]
> Subject: Re: Finding percentage of duplication variables
> To: [hidden email]
>
> Sorry everyone for being too hasty in typing my first post. Let me just
> elaborate my problem.
> Suppose we have var v1 to v50, for each case every variable value is same
> except for 5 variables lets say v26 to v30.
Does "case", here, mean "line", or is it a reference to the "Case: 1", etc., listed below? If there are 10 lines does that mean:
- there may well be 450 values that are all the same;
- there may well be 45 values on each line that are the same;
- there may well be duplication between lines, so that v1 shows one value, v2 shows another, etc.
>
> Case : 1
Does this merely mean "example 1", as I expect? I will treat it as such.
> Duplication for just one variable v1 whose value is same for each case
> Duplicate case = 49, Percent = 98
> Primary case = 1, Percent = 2
Why is "Duplicate case = 49"? With "Percent = 98", this seems to be taken from a total of 50. Does 50 also represent the number of lines? Should this have been "duplicate cases for v1", followed similarly for "Primary cases" (i.e., unique values) for v1?
>
> Case : 2
> Duplication for the five variables v26 to v30 whose values are unique for
> each case
> Primary case = 50, Percent =100
Well, if v26-v30 each have unique values for the whole dataset, each line has a unique value and there would be 50 "Primary cases" for them. I don't understand where the "duplication for the five variables" comes in, unless you are saying that on each line, all the other values can be matched to one of {v26 to v30}.
>
> Case : 3
> Duplication for all the 50 variables v1 to v50 which gives
> Primary case = 50 , Percent = 100
"Duplication for all the 50 variables" seems to contradict the overall specification, that v26 to v30 are not "the same". And it seems to imply that 50-lines-times-50-variables gives 2500 values that are the same, so there would be only 1 (unique) primary case.
>
> The problem lies in Case # 3. The duplication exists for 45 variables out of
> 50 but the tool did not display that information. I need to get the
> information in percentage for all the 50 variables i.e in this case
>
> Duplication case = 44, Percent = 88
> Primary case = 5, Percent = 10
>
Well, I find this totally obscure. Where does 44 come from, and 5, and why do they /not/ add up to 50 lines?
--
Rich Ulrich |
Administrator
|
I'll bet your brain really hurts now Rich ;-)
|