Remove Duplicate value from list of variable

GauravSrivastava
Hi All,

I have a data set with approximately 500 cases and 20 variables. Within each case I need the set of values across the variables (1 to 20) to be unique; a case can hold at most 20 values, possibly fewer. Any duplicate value that occurs needs to be removed. I am trying to do this with a loop. Can anyone help me with this?
My loop runs over the 20 variables and compares each one with the others, but it doesn't seem effective. Any input on this will be helpful.

                Var1 var2 var3....... var 20
Case 1 :     2     4     6    6  7  8    9  20

In the case above, I need to remove one of the 6s and shift the remaining data left.

Regards,
Gaurav
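
For reference, the requested result for a single case can be sketched outside SPSS. This is a minimal Python illustration, not SPSS syntax; the function name `dedupe_shift_left` and the use of `None` for an emptied cell are assumptions made for the example.

```python
def dedupe_shift_left(row):
    """Keep the first occurrence of each value and pad the row on the right."""
    # dict.fromkeys keeps keys in insertion order, so the first occurrence
    # of every value survives and later repeats are dropped.
    unique = list(dict.fromkeys(row))
    # Pad with None so the row keeps its original width.
    return unique + [None] * (len(row) - len(unique))


# The eight values shown for Case 1 in the question.
print(dedupe_shift_left([2, 4, 6, 6, 7, 8, 9, 20]))
# [2, 4, 6, 7, 8, 9, 20, None]
```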

Re: Remove Duplicate value from list of variable

Albert-Jan Roskam
----- Original Message -----

> From: GauravSrivastava <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Wednesday, November 7, 2012 9:19 AM
> Subject: [SPSSX-L] Remove Duplicate value from list of variable

Hi,

Does this do what you want?

data list free / var1 (f2) var2 (f2) var3 (f2) var4 (f2) var5 (f2) var6 (f2) var7 (f2) var8 (f2).
begin data
2 4 6 6 7 8 9 20
2 4 6 6 8 8 9 21
4 4 5 6 7 8 9 24
end data.
file handle fh / name = "%temp%/tmp.sav".
save outfile = fh.

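get file = fh.
* Where two neighbouring values are equal, shift the later values one position left.
vector v=var1 to var8.
loop #i=1 to 7.
+ do if ( v(#i) eq v(#i + 1) ).
+ loop #j = #i to 7.
+ compute v(#j) = v(#j + 1).
+ end loop.
+ end if.
end loop.
compute casenum = $casenum.
exe.
* Restructure to one record per value.
varstocases / make vars from var1 to var8 / index = colno (vars).
sort cases by casenum.
* Keep a record unless it repeats the previous value within the same case.
compute #x = vars eq lag(vars) and casenum eq lag(casenum).
select if any(0, $casenum-1, #x).
* Restructure back to one record per case.
casestovars /id = casenum /index=colno / drop = casenum.
apply dictionary from fh / newvars.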

Regards,
Albert-Jan
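
Both the VECTOR loop and the LAG step in this syntax compare a value only with its immediate neighbour, so the approach appears to rely on equal values sitting next to each other, as they do in the sorted sample rows. A rough Python sketch of that behaviour follows; the helper name `drop_adjacent_repeats` is hypothetical, and this is an illustration rather than a translation of the syntax.

```python
from itertools import groupby


def drop_adjacent_repeats(row):
    """Collapse each run of equal neighbouring values into a single value."""
    return [value for value, _run in groupby(row)]


# Sorted row, as in the sample data: every duplicate is removed.
print(drop_adjacent_repeats([2, 4, 6, 6, 8, 8, 9, 21]))  # [2, 4, 6, 8, 9, 21]

# Unsorted row: the two 6s are not neighbours, so both survive.
print(drop_adjacent_repeats([6, 7, 6, 8]))  # [6, 7, 6, 8]
```

Whether the real data are ordered within each case is not stated in the question; if they are not, a check against all values already kept would be needed instead.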

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Remove Duplicate value from list of variable

GauravSrivastava
Hi Albert-Jan,

Thanks for this.
I have noticed that when the data are restructured from variables to cases and then back from cases to variables, we lose the cases that have no answers at all (blank cases).

Regards,
Gaurav
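
The loss of blank cases can be illustrated with a small Python sketch of a wide-to-long-to-wide round trip. It assumes the wide-to-long step skips empty cells (represented here as `None`), so a case with no answers contributes no long-format records; the helper names and the `keep_nulls` flag are hypothetical.

```python
def wide_to_long(rows, keep_nulls=False):
    """Turn each row into (case, index, value) records."""
    records = []
    for case, row in enumerate(rows, start=1):
        for index, value in enumerate(row, start=1):
            if value is None and not keep_nulls:
                continue  # an empty cell produces no record
            records.append((case, index, value))
    return records


def long_to_wide(records, width):
    """Rebuild rows, keyed by case number, from (case, index, value) records."""
    wide = {}
    for case, index, value in records:
        wide.setdefault(case, [None] * width)[index - 1] = value
    return wide


rows = [[2, 4, 4], [None, None, None], [5, 5, 7]]

# Case 2 has no records in long format, so it never comes back.
print(sorted(long_to_wide(wide_to_long(rows), 3)))  # [1, 3]

# Keeping the empty records preserves the blank case.
print(sorted(long_to_wide(wide_to_long(rows, keep_nulls=True), 3)))  # [1, 2, 3]
```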



On Wed, Nov 7, 2012 at 5:15 PM, Albert-Jan Roskam <[hidden email]> wrote:



Re: Remove Duplicate value from list of variable

David Marso
Administrator
In reply to this post by GauravSrivastava
Probably a more tweezerly way to do this but I am pre-caffeinated.
----
DATA LIST LIST / var1 TO var8 (8f2).
BEGIN DATA
2 4 6 6 7 8 9 20
2 4 6 6 8 8 9 21
4 4 5 6 7 8 9 24
1 2 3 3 3 4 5  6
1 1 1 1 1 1 1 1
END DATA.
VECTOR old=var1 TO var8.
LOOP #=1 TO 3.
+  LOOP ##=1 TO 8.
+    LOOP ###=##+1 TO 8.
+      DO IF old(###)=old(##).
+        COMPUTE old(###)=OLD(###+1).
+        COMPUTE old(###+1)=$SYSMIS.
+      END IF.
+      DO IF SYSMIS(OLD(###)) AND ### LT 8.
+        COMPUTE OLD(###)=OLD(###+1).
+        COMPUTE OLD(###+1)=$SYSMIS.
+      END IF.
+    END LOOP.
+  END LOOP.
END LOOP.
LIST.


VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8

  2    4    6    7    8    9   20    .
  2    4    6    8    9   21    .    .
  4    5    6    7    8    9   24    .
  1    2    3    4    5    6    .    .
  1    .    .    .    .    .    .    .


Number of cases read:  5    Number of cases listed:  5




Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Remove Duplicate value from list of variable

David Marso
Administrator
SCRATCH THAT previous!! ;-((
I'll leave it be for posterity but first impulse is to destroy the evidence ;-)
--
Ah the miracle of coffee!
--
data list LIST / var1 TO var8 (8f2).
begin data
2 4 6 6 7 8 9 20
2 4 6 6 8 8 9 21
4 4 5 6 7 8 9 24
1 2 3 3 3 4 5  6
1 1 1 1 1 1 1 1
1 2 3 4 5 6 7 8
end data.
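* old = the original variables; #new = scratch vector that collects the unique values.
VECTOR old=var1 TO var8 /#new(8).
* # is the next free slot in #new; the first value is always kept.
COMPUTE #=2.
COMPUTE #new(1)=old(1).
LOOP ##=2 TO 8.
* Look for old(##) among the values already kept.
+  COMPUTE #found=0.
+  LOOP ###=1 TO #-1.
+    IF #new(###)=old(##) #found=1.
+  END LOOP IF #found.
* Not seen before, so append it to #new.
+  DO IF NOT(#found).
+    COMPUTE #new(#)=old(##).
+    COMPUTE #=#+1.
+  END IF.
END LOOP.
* Copy the unique values back and set the remaining variables to system-missing.
LOOP ##=1 TO #-1.
+  COMPUTE old(##)=#new(##).
END LOOP.
LOOP ##=# TO 8.
+  COMPUTE old(##)=$SYSMIS.
END LOOP.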
LIST.

VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8

  2    4    6    7    8    9   20    .
  2    4    6    8    9   21    .    .
  4    5    6    7    8    9   24    .
  1    2    3    4    5    6    .    .
  1    .    .    .    .    .    .    .
  1    2    3    4    5    6    7    8


Number of cases read:  6    Number of cases listed:  6

Re: Remove Duplicate value from list of variable

Bruce Weaver
Administrator
In reply to this post by David Marso
Here's another NPR* approach.  Not sure how it compares to David's VECTOR & LOOP method in terms of efficiency, but I *think* it requires somewhat less caffeine to follow the logic.  ;-)

* NPR = No Python required


NEW FILE.
DATASET CLOSE all.
DATA LIST LIST / var1 TO var8 (8f2).
BEGIN DATA
2 4 6 6 7 8 9 20
2 4 6 6 8 8 9 21
4 4 5 6 7 8 9 24
1 2 3 3 3 4 5  6
1 1 1 1 1 1 1 1
1 2 3 4 5 6 7 8
END DATA.

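* Restructure to long format with one record per value, keeping missing values.
VARSTOCASES
  /ID=row
  /MAKE var FROM var1 var2 var3 var4 var5 var6 var7 var8
  /INDEX=Index(8)
  /NULL=KEEP.

* Within each row, keep only the first record of each distinct value.
SORT CASES by row var.
MATCH FILES file = * / BY row var / FIRST = FirstRec.
SELECT IF FirstRec.
EXECUTE.

* Restore the original order, close the gaps in the index, and return to wide format.
SORT CASES by row index.
IF row EQ LAG(row) and index NE (LAG(index) + 1) index = LAG(index) + 1.
CASESTOVARS
  /ID=row
  /INDEX=Index
  /SEPARATOR=""
  /DROP=FirstRec.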
LIST.

Output:

     row var1 var2 var3 var4 var5 var6 var7 var8
 
       1   2    4    6    7    8    9   20    .
       2   2    4    6    8    9   21    .    .
       3   4    5    6    7    8    9   24    .
       4   1    2    3    4    5    6    .    .
       5   1    .    .    .    .    .    .    .
       6   1    2    3    4    5    6    7    8
 
Number of cases read:  6    Number of cases listed:  6
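
The restructure-based logic can be sketched in plain Python as follows. This is a loose analogue rather than a translation of the syntax: the `(row, index, value)` record layout and the function name `dedupe_via_long_format` are assumptions, and missing values are not handled in this sketch.

```python
def dedupe_via_long_format(rows):
    """Remove duplicates per row by going through (row, index, value) records."""
    # 1. Wide to long: one record per cell.
    records = [
        (r, i, value)
        for r, row in enumerate(rows, start=1)
        for i, value in enumerate(row, start=1)
    ]

    # 2. Sort by row and value, then keep the first record of each (row, value).
    records.sort(key=lambda rec: (rec[0], rec[2], rec[1]))
    firsts = []
    for rec in records:
        if not firsts or (firsts[-1][0], firsts[-1][2]) != (rec[0], rec[2]):
            firsts.append(rec)

    # 3. Restore the original order and renumber the index without gaps.
    firsts.sort(key=lambda rec: (rec[0], rec[1]))
    width = len(rows[0])
    result = [[None] * width for _ in rows]
    next_slot = [0] * len(rows)
    for r, _i, value in firsts:
        result[r - 1][next_slot[r - 1]] = value
        next_slot[r - 1] += 1
    return result


sample = [[2, 4, 6, 6, 7, 8, 9, 20], [1, 2, 3, 3, 3, 4, 5, 6]]
for row in dedupe_via_long_format(sample):
    print(row)
# [2, 4, 6, 7, 8, 9, 20, None]
# [1, 2, 3, 4, 5, 6, None, None]
```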


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Remove Duplicate value from list of variable

Jon K Peck
In reply to this post by GauravSrivastava
Another application of the SPSSINC TRANS extension command.  I just called the variables X1 to X8.  Change as needed.

begin program.
def uniq(*args):
    """Return the arguments in their original order with repeats removed."""
    newlist = []
    for item in args:
        if item not in newlist:
            newlist.append(item)
    return newlist
end program.

spssinc trans result = X1 to X8
/formula "uniq(X1,X2,X3,X4,X5,X6,X7,X8)".



Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621
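
The `uniq` function can be tried in plain Python before it is used inside SPSS. The sketch below assumes that a missing value reaches the function as `None`; under that assumption the first `None` is kept in place like any other value. `uniq_padded` is a hypothetical variant, not part of the post above, that drops missing values and pads the result to the input width.

```python
def uniq(*args):
    """Order-preserving de-duplication, as in the post above."""
    newlist = []
    for item in args:
        if item not in newlist:
            newlist.append(item)
    return newlist


def uniq_padded(*args):
    """Hypothetical variant: drop None, then pad with None to the input width."""
    kept = [item for item in uniq(*args) if item is not None]
    return kept + [None] * (len(args) - len(kept))


print(uniq(1, None, 3, 3, None, 6))         # [1, None, 3, 6]
print(uniq_padded(1, None, 3, 3, None, 6))  # [1, 3, 6, None, None, None]
```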




--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Remove-Duplicate-value-from-list-of-variable-tp5716071.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.


Re: Remove Duplicate value from list of variable

GauravSrivastava
In reply to this post by Bruce Weaver
Hi All,

Thanks for all your replies. It's always good to see so many ways to do the same thing.

Gaurav