How to remove duplicate (repeated) character in a variable

How to remove duplicate (repeated) character in a variable

albert_sun
Hi,

I would like help with the following two situations.

First, I have a string variable X with values such as AABC, CCDE, and DFFEE, and I want to create a variable Y with the values ABC, CDE, and DEF.

Second, I have a variable X with values such as "ABC,BCD, CDE, BCD", "CDE, DFE, CDE", and "DFE, EFT, DFE", and I want to create a variable Y with the values "ABC,BCD,CDE", "CDE, DFE", and "DFE, EFT".

Is there a way to do this in SPSS syntax?

Thanks,
Re: How to remove duplicate (repeated) character in a variable

David Marso
Administrator

1. See the INDEX and REPLACE functions for part 1.
2. For part 2, see this thread http://spssx-discussion.1045642.n5.nabble.com/Parsing-String-Name-Variable-tt5714037.html to parse the comma-separated values into a VECTOR, then LOOP over the elements and blank out any duplicates.
I have no SPSS on this machine, so give it a try and report back.
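For readers who want to see the logic of step 2 outside SPSS, here is a minimal Python sketch of the same idea: split the string into a list (the "vector"), loop over it, blank out any item that already appeared earlier, and rejoin what is left. The function name `drop_duplicate_items` and the choice of ", " as the output separator are assumptions made for illustration; they are not taken from the linked thread.

```python
def drop_duplicate_items(text, sep=","):
    """Split text on sep, blank out repeated items, and rejoin the rest."""
    # Build the "vector": one stripped element per separated item.
    items = [part.strip() for part in text.split(sep)]
    for i in range(len(items)):
        for j in range(i):
            # If an earlier element holds the same value, null out this one.
            if items[i] != "" and items[i] == items[j]:
                items[i] = ""
                break
    # Keep only the non-blank elements, in their original order.
    return ", ".join(item for item in items if item != "")


if __name__ == "__main__":
    for value in ["ABC,BCD, CDE, BCD", "CDE, DFE, CDE", "DFE, EFT, DFE"]:
        print(value, "->", drop_duplicate_items(value))
```

Unlike a set-based approach, this keeps the surviving items in their first-seen order.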

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: How to remove duplicate (repeated) character in a variable

David Marso
Administrator

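Untested:

* Copy x to y, then collapse each doubled letter to a single letter.
STRING y (A100).
COMPUTE y=x.
DO REPEAT char="AA" "BB" "CC" "DD" "EE" "FF" "GG" /rep="A" "B" "C" "D" "E" "F" "G".
IF (INDEX(y,char) GT 0) y=REPLACE(y,char,rep).
END REPEAT.

* Make one pass over y, swapping adjacent characters that are out of order.
* CHAR.LENGTH ignores trailing blanks, so padding is never swapped into the value.
STRING #ch1 #ch2 (A1).
LOOP #=1 TO CHAR.LENGTH(y)-1.
+  COMPUTE #ch1=CHAR.SUBSTR(y,#,1).
+  COMPUTE #ch2=CHAR.SUBSTR(y,#+1,1).
+  IF (#ch1 GT #ch2) y=REPLACE(y,CONCAT(#ch1,#ch2),CONCAT(#ch2,#ch1)).
END LOOP.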
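As a cross-check on what the syntax above is meant to produce, here is a small Python sketch of the same two steps: collapse runs of a repeated character, then put the remaining characters in order. It is not a translation of the SPSS code. It uses `itertools.groupby`, so a run of any length (for example "AAA") collapses to one character, and it does a full sort rather than a single pass of adjacent swaps. The function names are hypothetical.

```python
from itertools import groupby


def collapse_runs(text):
    """Collapse each run of a repeated character to a single character."""
    # groupby yields one (key, group) pair per run of equal characters.
    return "".join(key for key, _ in groupby(text))


def collapse_and_sort(text):
    """Collapse runs, then return the remaining characters in sorted order."""
    return "".join(sorted(collapse_runs(text.strip())))


if __name__ == "__main__":
    for value in ["AABC", "CCDE", "DFFEE"]:
        print(value, "->", collapse_and_sort(value))
```

Note that collapsing runs only removes consecutive repeats; a value such as "ABCABC" is left unchanged by `collapse_runs`.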

Re: How to remove duplicate (repeated) character in a variable

Jignesh Sutar
In reply to this post by albert_sun
You've only given examples in which the repeated characters are consecutive. I've added a fourth example in which the duplicates occur further along in the string, so you may need to tweak this to cover exactly the scenarios you need.



data list free / x1 (a99) x2 (a99).
begin data.
AABC "ABC,BCD, CDE, BCD"
CCDE "CDE, DFE, CDE"
DFFEE "DFE, EFT, DFE"
ABCABC "ABC,BCD, ABC, BCD"
end data.


begin program.
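def ddupe1(var):
    # Unique characters of var (blanks dropped), joined in sorted order.
    return ''.join(sorted(set(var)-set(" ")))
end program.
begin program.

def ddupe2(var):
    # Unique comma-separated items of var (whitespace stripped), joined in sorted order.
    return ", ".join(sorted(set([str(i).strip() for i in var.split(",")])))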
end program.

spssinc trans result=Y1 type=99 /formula "ddupe1(var=x1)".
spssinc trans result=Y2 type=99 /formula "ddupe2(var=x2)".
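One point worth checking against your requirement: the set-based functions above return the unique characters or items in sorted order, not in the order they first appear. The sketch below contrasts the two behaviours in plain Python. `dedupe_sorted` mirrors the idea of `ddupe1`; `dedupe_keep_order` is a hypothetical alternative that relies on `dict.fromkeys` preserving insertion order, which assumes Python 3.7 or later.

```python
def dedupe_sorted(text):
    """Unique non-blank characters in sorted order (same idea as ddupe1)."""
    return "".join(sorted(set(text) - set(" ")))


def dedupe_keep_order(text):
    """Unique non-blank characters in first-seen order (assumes Python 3.7+)."""
    # dict.fromkeys keeps one key per distinct character, in insertion order.
    return "".join(dict.fromkeys(text.replace(" ", "")))


if __name__ == "__main__":
    for value in ["AABC", "DFFEE", "ABCABC"]:
        print(value, "->", dedupe_sorted(value), "/", dedupe_keep_order(value))
```

For "DFFEE" the sorted version gives "DEF", which matches the requested output, while the order-preserving version gives "DFE".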
Re: How to remove duplicate (repeated) character in a variable

David Marso
Administrator
In reply to this post by David Marso
Part 2:

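* Copy x to y, then collapse each doubled letter to a single letter.
STRING y (A100).
COMPUTE y=x.
DO REPEAT char="AA" "BB" "CC" "DD" "EE" "FF" "GG" /rep="A" "B" "C" "D" "E" "F" "G".
IF (INDEX(y,char) GT 0) y=REPLACE(y,char,rep).
END REPEAT.

* Make one pass over y, swapping adjacent characters that are out of order.
* CHAR.LENGTH ignores trailing blanks, so padding is never swapped into the value.
STRING #ch1 #ch2 (A1).
LOOP #=1 TO CHAR.LENGTH(y)-1.
+  COMPUTE #ch1=CHAR.SUBSTR(y,#,1).
+  COMPUTE #ch2=CHAR.SUBSTR(y,#+1,1).
+  IF (#ch1 GT #ch2) y=REPLACE(y,CONCAT(#ch1,#ch2),CONCAT(#ch2,#ch1)).
END LOOP.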
