delete duplicate brand names in the concatenated variable


jagadishpchary
My SPSS file contains four variables, Var1 to Var4. I am trying to concatenate them into a new variable named “Brand_Main”. Below is the SPSS code.

DEFINE MRMAC (START = !CHAREND("/") / END = !CHAREND("/") / X = !TOKENS(1) /nbval=!TOKENS(1)).
string !X(a2500).
do repeat var = !START to !END.
if char.length(rtrim(!X)) ne 0 and char.length(valuelabels(var)) gt 0 !X = concat(rtrim(!X),';',valuelabels(var)).
if char.length(rtrim(!X)) = 0 !X = valuelabels(var).
end repeat.
!ENDDEFINE.
MRMAC START = Var1 / END = Var4 / X = Brand_Main.

However, when I do so, some brands appear more than once within a case. I would like to delete those duplicate brand names from the concatenated variable “Brand_Main” and keep only one occurrence of each, so that the result looks like the variable “Brand_Final” (which I created manually by deleting the duplicates). Please help me with the syntax for this task.


Sample_data.sav

Re: delete duplicate brand names in the concatenated variable

Andy W
Here is a Python solution.

******************************.
SPSSINC GETURI DATA
URI="http://spssx-discussion.1045642.n5.nabble.com/file/n5729898/Sample_data.sav"
FILETYPE=SAV DATASET=Samp.

BEGIN PROGRAM Python.
#unique function adapted via http://www.peterbe.com/plog/uniqifiers-benchmark (f5)
def UniqueList(x,str):
   seen = {}
   result = []
   for item in x:
       if item in seen: continue
       seen[item] = 1
       if item.strip() != '':
         result.append(item)
   return str.join(result)
END PROGRAM.

*Now use SPSSINC TRANS to call the function.
SPSSINC TRANS RESULT=Brand_Final2 TYPE=20
  /VARIABLES Var1 TO Var4
  /FORMULA "UniqueList(x=[<>],str=''';''')".
******************************.

The function would be a bit simpler if the original ordering of the variables need not be preserved (so one can use sets in Python to get the unique elements).
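
As a rough standalone illustration (plain Python 3.7+, run outside SPSS), similar order-preserving logic can be written with dict.fromkeys, which keeps only the first occurrence of each key. The name unique_join and the sample values are made up for this sketch, and unlike UniqueList it strips padding before comparing, which is an assumption about how the values may arrive.

```python
def unique_join(values, sep=";"):
    """Join the distinct, non-blank values in first-seen order."""
    # Strip padding first, since string values may arrive with trailing blanks.
    cleaned = (v.strip() for v in values)
    # dict.fromkeys keeps one key per distinct value, in insertion order (Python 3.7+).
    distinct = dict.fromkeys(v for v in cleaned if v)
    return sep.join(distinct)


print(unique_join(["Brand3", "Brand2", "Brand3", ""]))  # Brand3;Brand2
```

Because a dict remembers insertion order, the first mention of each brand decides its position in the result.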

A native SPSS solution might involve VARSTOCASES to eliminate duplicates. Another way would be to iterate over the list in a set of nested loops, something like below (untested):

VECTOR V = Var1 TO Var4.
LOOP #i = 1 TO 3.
  LOOP #j = (#i + 1) TO 4.
    IF V(#i) = V(#j) V(#j) = "".
  END LOOP.
END LOOP.

But to accommodate this via a macro one would either need to change how the parameters are entered via the macro, or compute a bunch of other superfluous stuff.
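
For anyone who wants to check the nested-loop idea outside SPSS, here is a small Python sketch of the same pairwise comparison. The function name blank_later_duplicates and the sample row are hypothetical; it only mirrors the loop logic and is not a replacement for the syntax above.

```python
def blank_later_duplicates(values):
    """Return a copy where every repeat of an earlier value is replaced by ''."""
    out = list(values)
    n = len(out)
    for i in range(n - 1):          # plays the role of LOOP #i = 1 TO 3 when n == 4
        for j in range(i + 1, n):   # plays the role of LOOP #j = (#i + 1) TO 4
            if out[i] == out[j]:
                out[j] = ""         # keep the first mention, blank the later one
    return out


row = ["Brand1", "Brand2", "Brand1", "Brand2"]
print(blank_later_duplicates(row))  # ['Brand1', 'Brand2', '', '']
```

The blanked positions can then be skipped when the values are concatenated.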
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: delete duplicate brand names in the concatenated variable

Mario Giesel
* Hi, something like this should work. Good luck, Mario.

STRING brandlist (A90).
COMPUTE brandlist = "".
IF (CHAR.INDEX(brandlist,var1) EQ 0) brandlist = CONCAT(brandlist,var1,";").
IF (CHAR.INDEX(brandlist,var2) EQ 0) brandlist = CONCAT(brandlist,var2,";").
IF (CHAR.INDEX(brandlist,var3) EQ 0) brandlist = CONCAT(brandlist,var3,";").
IF (CHAR.INDEX(brandlist,var4) EQ 0) brandlist = CONCAT(brandlist,var4,";").
EXECUTE.



In reply to Andy W's post of Tuesday, 23 June 2015, 15:10.


Mario Giesel
Munich, Germany

Re: delete duplicate brand names in the concatenated variable

Bruce Weaver
Administrator
Here's a more scalable variation on Mario's theme.

STRING brandlist (A90).
DO REPEAT v = var1 to var4.
- DO IF LENGTH(v) GT 0. /* Needed to avoid warnings in output.
-   IF CHAR.INDEX(brandlist,v) EQ 0 brandlist = CONCAT(brandlist,v,";").
- END IF.
END REPEAT.
* Remove the final semi-colon.
COMPUTE brandlist = RTRIM(brandlist,";").
LIST brand_final brandlist.

OUTPUT:
Brand_Final          brandlist 

Brand3;Brand2        Brand3;Brand2 
Brand2;Brand3        Brand2;Brand3 
Brand1;Brand2;Brand3 Brand1;Brand2;Brand3 
Brand1;Brand3        Brand1;Brand3 
Brand1;Brand2;Brand4 Brand1;Brand2;Brand4 
Brand1;Brand2;Brand3 Brand1;Brand2;Brand3 
Brand1;Brand3        Brand1;Brand3 
Brand1;Brand2Brand4  Brand1;Brand2;Brand4 
Brand1;Brand2;Brand3 Brand1;Brand2;Brand3 
Brand2;Brand3        Brand2;Brand3 
Brand2               Brand2
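
To check a listing like this programmatically rather than by eye, the two columns can be compared row by row. The sketch below simply retypes the rows from the output above (plain Python, outside SPSS). It flags one row where the manually created Brand_Final lacks a semicolon, which looks like a typo in the hand-made variable rather than a problem with the syntax, though that reading is an assumption.

```python
# (Brand_Final, brandlist) pairs retyped from the LIST output above.
rows = [
    ("Brand3;Brand2", "Brand3;Brand2"),
    ("Brand2;Brand3", "Brand2;Brand3"),
    ("Brand1;Brand2;Brand3", "Brand1;Brand2;Brand3"),
    ("Brand1;Brand3", "Brand1;Brand3"),
    ("Brand1;Brand2;Brand4", "Brand1;Brand2;Brand4"),
    ("Brand1;Brand2;Brand3", "Brand1;Brand2;Brand3"),
    ("Brand1;Brand3", "Brand1;Brand3"),
    ("Brand1;Brand2Brand4", "Brand1;Brand2;Brand4"),
    ("Brand1;Brand2;Brand3", "Brand1;Brand2;Brand3"),
    ("Brand2;Brand3", "Brand2;Brand3"),
    ("Brand2", "Brand2"),
]

# Collect the 1-based row numbers where the two columns differ.
mismatches = [(i, a, b) for i, (a, b) in enumerate(rows, start=1) if a != b]
print(mismatches)  # [(8, 'Brand1;Brand2Brand4', 'Brand1;Brand2;Brand4')]
```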

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: delete duplicate brand names in the concatenated variable

Art Kendall
If the data for var1 to var4 were the result of manual data entry, you might want to first check for variations in spelling, casing, and spacing.
Autorecode ... /group.
Art Kendall
Social Research Consultants

Re: delete duplicate brand names in the concatenated variable

Art Kendall
Also, if I recall correctly, you would need numeric values for MULT RESPONSE.

Is there any substantive meaning to the order of responses or to multiple mentions of a brand?

Re: delete duplicate brand names in the concatenated variable

David Marso
Administrator
In reply to this post by Bruce Weaver
What if the original data are
brand11 brand20 brand1 brand2 ;-)

brand1 and brand2 will be discarded by the INDEX query.

Might want to protect that with
IF INDEX(brandlist,CONCAT(v,';')) EQ 0 brandlist=CONCAT(brandlist,v,';').
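
The difference between the two tests is easy to reproduce in plain Python, where the in operator plays the role of INDEX. This is only an illustrative sketch: the helper build and its needle_of parameter are made up here. The last line shows a residual case not raised in the thread: a value that ends with another value still collides when only the trailing delimiter is checked.

```python
def build(values, needle_of):
    """Append each value plus ';' unless needle_of(value) already occurs in the list."""
    brandlist = ""
    for v in values:
        if needle_of(v) not in brandlist:
            brandlist += v + ";"
    return brandlist.rstrip(";")


data = ["brand11", "brand20", "brand1", "brand2"]

# Plain substring test: 'brand1' is found inside 'brand11', so it is dropped.
print(build(data, lambda v: v))        # brand11;brand20
# Delimiter-terminated test: 'brand1;' does not occur in 'brand11;brand20;'.
print(build(data, lambda v: v + ";"))  # brand11;brand20;brand1;brand2
# A value that ends with another value still collides with the trailing-only test.
print(build(["xbrand1", "brand1"], lambda v: v + ";"))  # xbrand1
```

If brand names can be suffixes of one another, testing for the delimiter on both sides of the value would close that gap as well.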

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: delete duplicate brand names in the concatenated variable

Bruce Weaver
Administrator
Well spotted, and nice fix.



Re: delete duplicate brand names in the concatenated variable

Jon K Peck
In reply to this post by jagadishpchary
Here is a very simple Python solution.  x, y, z, and w are the variables.  TYPE should be the string size for the result (20 in my example).  Duplicates are deleted, and the values are concatenated with a blank separator.

begin program.
def f(*args):
    return " ".join(set(args))
end program.

spssinc trans result= brand_main type=20
/formula "f(x,y,z,w)".


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




Original message from jagadishpchary, dated 06/23/2015 04:59 AM.

Re: delete duplicate brand names in the concatenated variable

jagadishpchary
In reply to this post by jagadishpchary
Thank you all for the wonderful code. However, in my data file the variables are sometimes string and sometimes numeric. Can I have common code that works on both types?


Re: delete duplicate brand names in the concatenated variable

David Marso
Administrator
Please see the ALTER TYPE command in the FM.
Next time it might be a good idea to mention this?

In reply to jagadishpchary's message of Thu, Jun 25, 2015, 5:42 AM.

Re: delete duplicate brand names in the concatenated variable

Andy W
In reply to this post by Jon K Peck
Yes, that was what I had in mind when I said using "set" could be simpler. Two things though (that are why my original function was more complicated):

 - set does not maintain the original ordering
 - set does not eliminate empty strings

In this example data, "set" happens to place the empty strings at the end (although I don't know if that will be consistent behavior). So following that function with RTRIM would work, as long as the ordering of the other brands does not matter.
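
Both caveats can be seen in a few lines of plain Python run outside SPSS. The sample values below are made up, and f is copied from the post being replied to. Since the iteration order of a set is arbitrary, the sketch checks only which pieces are present, not where they land.

```python
def f(*args):
    return " ".join(set(args))


result = f("Brand2", "Brand1", "Brand2", "")
parts = result.split(" ")

# Duplicates are gone, but the empty string is still one of the set's elements.
print(sorted(parts))  # ['', 'Brand1', 'Brand2']
# The position of each piece depends on set iteration order, so compare membership only.
print(set(parts) == {"", "Brand1", "Brand2"})  # True
```

Where the empty element ends up (start, middle, or end) is not something the code controls, which is why trimming only the right-hand side may not always be enough.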

Re: delete duplicate brand names in the concatenated variable

Art Kendall
In reply to this post by jagadishpchary
For consistency of approach between sets of string variables and sets of numeric variables, consider the following.

On the string sets:
 - using AUTORECODE
 - collapsing autorecoded categories if necessary

Then on both kinds of sets:
 - eliminating duplicate entries
 - creating the final string variable by concatenating value labels using the VALUELABEL function.
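
The last two steps can be sketched in plain Python as follows. This is only an illustration of the idea: the LABELS dictionary is a hypothetical stand-in for value labels defined in SPSS, None stands in for a missing value, and the function name is made up.

```python
# Hypothetical value labels, standing in for what VALUE LABELS would define in SPSS.
LABELS = {1: "Brand1", 2: "Brand2", 3: "Brand3"}


def labels_without_duplicates(codes, labels=LABELS, sep=";"):
    """Deduplicate numeric codes in first-seen order, then join their labels."""
    # Skip missing values (None) and codes that have no label.
    distinct = dict.fromkeys(c for c in codes if c is not None and c in labels)
    return sep.join(labels[c] for c in distinct)


print(labels_without_duplicates([3, 2, 3, None]))  # Brand3;Brand2
```

Deduplicating on the numeric codes rather than on the label text sidesteps spelling and substring problems in the labels themselves.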

Re: delete duplicate brand names in the concatenated variable

Jon K Peck
In reply to this post by Andy W
If some of the input variables are actually blank, then one blank value would be included in the output.  If that is a possibility, the blank can be easily eliminated by changing this


begin program.
def f(*args):
    return " ".join(set(args))
end program.

spssinc trans result= brand_main type=20
/formula "f(x,y,z,w)".

to this

begin program.
def f(*args):
    return " ".join(set([s for s in args if not s.strip() == ""]))
end program.

spssinc trans result= brand_main type=20
/formula "f(x,y,z,w)".



Andy,
I missed your earlier Python post.



Original message from Andy W, dated 06/25/2015 06:32 AM.

Re: delete duplicate brand names in the concatenated variable

jagadishpchary
In reply to this post by jagadishpchary
First of all, thank you all for providing solutions to my post. I have gone with the SPSS code that Bruce Weaver wrote.

In my earlier post, the data file was “Sample_data.sav”, in which all the variables were strings, and the provided code works as planned.
Now I have a new task: I need to get the same results when the code is run on numeric variables. So I modified the code accordingly (see below).

DEFINE MRMAC (START = !CHAREND("/") / END = !CHAREND("/") / X = !TOKENS(1) /nbval=!TOKENS(1)).
string !X(a2500).
do repeat var = !START to !END.
IF INDEX(!X,CONCAT(valuelabels(var),';')) EQ 0 !X = CONCAT(!X,valuelabels(var),";").
end repeat.
COMPUTE !X = RTRIM(!X,";").
!ENDDEFINE.
MRMAC START = Var1 / END = Var4 / X = Brand _Main.
EXECUTE.

The problem I am facing is this: when the first variable is empty, the concatenated value starts with the delimiter “;”, which should be removed. The results should look like “Brand_Final” (from which I removed the “;” manually; see the attached picture).

As I am new to SPSS coding, I would be grateful if you could check my code and let me know what changes are needed to get the data as shown in the variable “Brand_Final”.

FYI: the SPSS data file is attached.

Sample_data_numeric.sav

Re: delete duplicate brand names in the concatenated variable

David Marso
Administrator
Insert
DO IF valuelabels(var) NE "".
<code>
END IF.
In the appropriate place.
That leaves you scratching your head to discern exactly where to place it after consulting the FM.


Re: delete duplicate brand names in the concatenated variable

Bruce Weaver
Administrator
Alternatively, you could do this:

COMPUTE !X = LTRIM(RTRIM(!X,";"),";").

...instead of this:

COMPUTE !X = RTRIM(!X,";").

Also, your macro call has an unwanted space in the middle of variable name Brand_Main.  I.e., you have Brand _Main, but want Brand_Main.  

Finally, unless you have reasons for wanting to retain the A2500 formatting for Brand_Main, you might consider inserting this line at the end of your macro:

ALTER TYPE !X (AMIN).


HTH.
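
For comparison, here is a plain-Python sketch of the two ways of handling the stray delimiter: trimming it from both ends afterwards (as with LTRIM/RTRIM above) or never appending empty values in the first place (as with the DO IF suggestion). The function names and sample rows are made up, and the sketch leaves out the duplicate check, so it is not an exact model of the macro.

```python
def join_then_trim(labels, sep=";"):
    """Append every label plus a delimiter, then strip delimiters from both ends."""
    combined = "".join(label + sep for label in labels)
    return combined.strip(sep)


def skip_empty_then_join(labels, sep=";"):
    """Leave empty labels out before joining, so no stray delimiter is produced."""
    return sep.join(label for label in labels if label)


row = ["", "Brand2", "Brand3", ""]
print(join_then_trim(row))        # Brand2;Brand3
print(skip_empty_then_join(row))  # Brand2;Brand3

# In this simplified version, an empty value in the middle survives trimming.
print(join_then_trim(["Brand1", "", "Brand3"]))        # Brand1;;Brand3
print(skip_empty_then_join(["Brand1", "", "Brand3"]))  # Brand1;Brand3
```

Trimming fixes the ends only, while skipping empty values avoids the stray delimiter wherever the empty value occurs.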

David Marso wrote
Insert
DO IF var NE "".
<code>
END IF.
In the appropriate place.
Leaving you scratching your head to discern exactly where to place it after consulting the FM.

jagadishpchary wrote
First of all, I would thank you all for providing the solutions for my post. However, I have considered the SPSS code which “Bruce Weaver” has written.

As per my earlier post, the database which I have considered was “Sample data.sav” in that - all the variables were in string and the provided code is working in the same way as per the plan.
Now, I have a new task which I need to get the same results when run on numeric variables. So I modified the code accordingly (Below is the code)

DEFINE MRMAC (START = !CHAREND("/") / END = !CHAREND("/") / X = !TOKENS(1) /nbval=!TOKENS(1)).
string !X(a2500).
do repeat var = !START to !END.
IF INDEX(!X,CONCAT(valuelabels(var),';')) EQ 0 !X = CONCAT(!X,valuelabels(var),";").
end repeat.
COMPUTE !X = RTRIM(!X,";").
!ENDDEFINE.
MRMAC START = Var1 / END = Var4 / X = Brand _Main.
EXECUTE.

The problem I am facing is this: while concatenating, when the first variable is empty, the delimiter “;” is left at the start of the result, where it should be removed. The results should look the same as “brand_final” (from which I removed the “;” manually; picture attached).

As I am new to SPSS coding, I would ask you to kindly check my code and let me know what changes are needed to get the data as shown in the variable “Brand_Final”.

FYI – Attached the SPSS data file.

Sample_data_numeric.sav
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: delete duplicate brand names in the concatenated variable

jagadishpchary
In reply to this post by David Marso
Hi David:

I am having a problem while concatenating the variables. I have 5 variables (see the attached file data.sav); when I combine them, the data from the last variable is not displayed in the final concatenated variable. When I changed the variable order (see the attached file data1.sav) and combined them again, the concatenated variable had all the data. Could you please check the code and let me know where I am going wrong? (The code below was suggested by you earlier.)

Here is the SPSS code which I am using to merge the variables:

DEFINE MRMAC (START = !CHAREND("/") / END = !CHAREND("/") / X = !TOKENS(1) /nbval=!TOKENS(1)).
string !X(a2500).
do repeat var = !START to !END.
IF INDEX(!X,CONCAT(valuelabels(var),';')) EQ 0  and char.length(valuelabels(var)) gt 0 !X = CONCAT(!X,valuelabels(var),";").
end repeat.
COMPUTE !X = RTRIM(!X,";").
!ENDDEFINE.

MRMAC START = AGE / END = SOCLAS / X = TEST.
EXECUTE.

MRMAC START = SOCLAS / END = LOCAT/ X = TEST.
EXECUTE.
Data.sav
Data1.sav

Re: delete duplicate brand names in the concatenated variable

David Marso
Administrator
Please don't expect me to exert extra labor to download your files!!!
I did try one and it won't open.  System believes it to be an htm file.
Paste some small sample data between data list/begin data/end data along with a listing.
If you can't replicate with that then there's something fishy in your actual data.
As a general rule, don't expect anyone to do extra work when you could make it self contained in a post.
---


Re: delete duplicate brand names in the concatenated variable

jagadishpchary
In reply to this post by jagadishpchary
Hi David:

Here is the syntax

DATA LIST LIST
 / AGE SEX JOBR LOCAT SOCLAS.
BEGIN DATA.
1,2,1,1,2
5,2,3,2,1
3,2,2,2,2
5,2,4,2,2
5,1,3,2,2
END DATA.

LIST.

VALUE LABELS AGE
1 "18-24"
2 "25-34"
3 "35-44"
4 "45-54"
5 "55-64".

VALUE LABELS SEX
1 "MALE"
2 "FEMALE".

VALUE LABELS JOBR
1 "FULL TIME EDUCATION"
2 "PART TIME"
3 "RETIRED"
4 "NOT WORKING".

VALUE LABELS LOCAT
1 "RURAL"
2 "URBAN".

VALUE LABELS SOCLAS
1 "D"
2 "E".
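
A possible explanation for the missing last variable, offered as a sketch and not confirmed in the posts above: the INDEX test matches substrings, so the one-letter SOCLAS labels are found inside longer labels that are already in the result ("E;" occurs at the end of "FEMALE;", "D;" at the end of "RETIRED;"). The Python below mimics the macro's loop on the first case of the sample data. The function name `concat_labels` and its `exact` option are hypothetical; the second call shows one way to compare whole entries by putting a delimiter on both sides.

```python
def concat_labels(labels, exact=False):
    """Join labels with ';', skipping any label judged to be present already.

    exact=False mimics the macro's substring test (label + ';' found anywhere).
    exact=True wraps both sides in ';' so only whole entries count as matches.
    Both the function and the `exact` flag are illustrative assumptions.
    """
    result = ""
    for label in labels:
        if not label:                      # like char.length(...) gt 0
            continue
        if exact:
            seen = (";" + label + ";") in (";" + result)
        else:
            seen = (label + ";") in result
        if not seen:
            result += label + ";"
    return result.rstrip(";")              # like RTRIM(!X, ";")


# First case of the sample data (1,2,1,1,2), in the order AGE ... SOCLAS.
case1 = ["18-24", "FEMALE", "FULL TIME EDUCATION", "RURAL", "E"]

print(concat_labels(case1))              # 18-24;FEMALE;FULL TIME EDUCATION;RURAL
print(concat_labels(case1, exact=True))  # 18-24;FEMALE;FULL TIME EDUCATION;RURAL;E

# With SOCLAS first (as in the reordered file), "E" is added before "FEMALE".
print(concat_labels(["E"] + case1[:4]))  # E;18-24;FEMALE;FULL TIME EDUCATION;RURAL
```

In SPSS terms the whole-entry test might be written as INDEX(CONCAT(';',!X), CONCAT(';',valuelabels(var),';')) EQ 0, but that variant is untested here.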