Has anybody already worked this out?
CaseIDs are supposed to be unique.
Identify Duplicate Cases shows that this is not so.
A variable, MatchSequence, has been added to the file.
There are dozens of variables: strings of varying widths mixed with numbers
and dates.
I would like to see which variables are NOT the same within each group.
For the instance of *pairs* of almost duplicates, something like this:
DO IF MatchSequence GE 2.
  PRINT /CaseID.
  DO REPEAT MyVar= varlist.
    DO IF MyVar NE Lag(MyVar).
      PRINT /VarName MyVar Lag(MyVar).
    END IF.
  END REPEAT.
END IF.
Obviously, PRINT cannot output the variable's name or Lag(MyVar).
It is also not possible in syntax to put Lag(MyVar) into a single new
variable, because the variables differ in type and string width.
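To make the intended logic concrete, here is a minimal sketch of the same lag-style comparison written in plain Python instead of SPSS syntax. It assumes the cases have already been exported as a list of dicts, sorted so that members of a duplicate group are adjacent. The function name diff_with_previous and the sample variables (Name, Age, Visit) are made up for illustration; only CaseID and MatchSequence come from the file described above.

```python
def diff_with_previous(rows, id_var="CaseID", seq_var="MatchSequence"):
    """Yield (case_id, variable, previous_value, current_value) for every
    variable whose value differs from the preceding row (a LAG-like check)."""
    previous = None
    for row in rows:
        # Compare only the 2nd and later members of a group,
        # as in DO IF MatchSequence GE 2.
        if previous is not None and row[seq_var] >= 2:
            for name, value in row.items():
                if name in (id_var, seq_var):
                    continue
                if value != previous.get(name):
                    yield (row[id_var], name, previous.get(name), value)
        previous = row


# Made-up sample data: 101 is an almost duplicate pair, 102 an exact one.
rows = [
    {"CaseID": 101, "MatchSequence": 1, "Name": "Smith", "Age": 34, "Visit": "2019-03-01"},
    {"CaseID": 101, "MatchSequence": 2, "Name": "Smyth", "Age": 34, "Visit": "2019-03-02"},
    {"CaseID": 102, "MatchSequence": 1, "Name": "Jones", "Age": 51, "Visit": "2019-04-10"},
    {"CaseID": 102, "MatchSequence": 2, "Name": "Jones", "Age": 51, "Visit": "2019-04-10"},
]

differences = list(diff_with_previous(rows))
for case_id, name, old, new in differences:
    print(case_id, name, repr(old), repr(new))
```

Because each value keeps its own Python type, strings of different widths, numbers and dates can sit side by side in the output, which is the part that a single new SPSS variable cannot do.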
Another way to think of it: pull each set of cases that Identify Duplicate
Cases treats as a group into a matrix with one row per group member and one
column per variable.
Then transpose that matrix, keeping type and string width formatting, and
indicate where there are differences.
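The transposed layout might look something like the following sketch, again in plain Python and not SPSS syntax. It assumes the cases are available as a list of dicts keyed by variable name. The function name transposed_report, the column widths and the sample variables are hypothetical choices for illustration only.

```python
from itertools import groupby


def transposed_report(rows, id_var="CaseID"):
    """Return report lines: one block per duplicate group, one line per
    variable, one column per group member, '*' where members disagree."""
    lines = []
    # sorted() is stable, so the original order within each group is kept.
    ordered = sorted(rows, key=lambda r: r[id_var])
    for case_id, members in groupby(ordered, key=lambda r: r[id_var]):
        members = list(members)
        if len(members) < 2:
            continue  # a single case is not a duplicate group
        lines.append(f"{id_var} = {case_id}")
        for name in members[0]:
            if name == id_var:
                continue
            values = [m[name] for m in members]
            flag = "*" if any(v != values[0] for v in values[1:]) else " "
            # repr() keeps strings quoted and numbers bare, so types stay visible.
            cells = "  ".join(f"{v!r:<14}" for v in values)
            lines.append(f"{flag} {name:<10} {cells}".rstrip())
    return lines


# Made-up sample data: CaseID 101 occurs twice, 102 once.
rows = [
    {"CaseID": 101, "Name": "Smith", "Age": 34, "Visit": "2019-03-01"},
    {"CaseID": 102, "Name": "Jones", "Age": 51, "Visit": "2019-04-10"},
    {"CaseID": 101, "Name": "Smyth", "Age": 34, "Visit": "2019-03-02"},
]

report = transposed_report(rows)
print("\n".join(report))
```

Each variable becomes a row and each group member a column, so groups of three or more cases are handled the same way as pairs. The sketch assumes all CaseID values are of one type so that they can be sorted.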
-----
Art Kendall
Social Research Consultants