Searching string-variables for numbers separated by comma and recoding into new variables

Claus D. Hansen
Dear list,

I have a question which I hope some of you may be able to help me with.

I have 6 string variables which all contain up to 13 numbers between 1 and
21. Each number is separated by a comma.

What I would like to do is write syntax that, instead of 6 variables
containing these numbers, gives me one variable for each of the numbers
between 1 and 21, coded dichotomously: if the number is present in any of the
6 variables, the variable for that value is coded 1, and if not it is coded 0.

So I need syntax that searches through each of the 6 variables and
recodes each of the values into the new variables.

I am afraid I'm not able to write syntax that reproduces the data, but I'll
give an example:

V112, for example, contains the following:

Idnr    V112
1       3,4,5,7,8,9
2
3       17,16
4       3,17
5       4
6
7
8       3,4,5,6,16,17
9       9

And I'd like it to look like this:

Idnr    V112            no1  no2  no3  no4  no5  no6  no7  no8 ... etc
1       3,4,5,7,8,9     0    0    1    1    1    0    1    1
2                       0    0    0    0    0    0    0    0
3       17,16           0    0    0    0    0    0    0    0
4       3,17            0    0    1    0    0    0    0    0
5       4               0    0    0    1    0    0    0    0

I hope this is a sufficient explanation of what I need to do, and that
someone will be able to help me!

Thank you in advance,

Claus

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Claus D. Hansen, MSc. Sociology
PhD-fellow, Research assistant
Department of Occupational Medicine
Herning Hospital
Gl. Landevej 61, DK-7400 Herning
Tlf.: +45 9927 2994
Email: [hidden email]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: longitudinal comparison

Maguin, Eugene
Svetlana,

This is Svetlana's reply to my posted questions yesterday.

>>The services are recorded as yes (a specific service received) or no. I
would like to look at the total number of services (e.g., outpatient
services) received within 1 year per child. Only 30% of children in
foster care receive mental health services (of any kind). I am
interested in cohort differences over time. Cohort by time
interactions would be interesting too. If I had only two years (i.e.,
two cohorts) then the comparison would be easy, but I have five.

OK, you have a count of services received during the previous year as the
DV. That variable has a moderately high proportion of 0's, so it is certainly
not normally distributed. I would say that your choices for modeling your DV
are either ordinal or Poisson.

Prior to SPSS 15, you'd have to use either Mixed or GLM, which would treat
the DV as a normally distributed variable. In SPSS 15 or later, I
believe that Genlin is the procedure to use. I have not used it and cannot
advise you on the details of it. Although there is a repeated measures
logistic example, I am not so sure that Genlin can handle repeated measures
of an ordinal variable. Try it and see. Perhaps list members who have used
Genlin will offer their experience (on the list, please, so I can learn
something also). If you don't get better help, I'd suggest writing to SPSS
tech support ([hidden email]) and presenting the problem as a question about
how to use Genlin. Dave Nichols is the senior statistician and is very
helpful.

Outside of SPSS, there are more options. I'd use Mplus, but I suspect that
more can be done in Stata and SAS.

Gene Maguin


Re: longitudinal comparison

Reutter, Alex
In v16, the ordinal multinomial distribution was added to Genlin.  In the dialogs, look at Analyze > Generalized Linear Models > Generalized Estimating Equations.

Cheers,
Alex
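
For anyone who prefers syntax to the dialogs, a minimal sketch of a Poisson GEE model in GENLIN might look like the following. The variable names (nservices, cohort, year, childid) are hypothetical, and the sketch assumes the data are in long format with one row per child per year; whether a repeated measures structure actually fits the cohort design discussed here is an assumption, not something the thread confirms. For the ordinal alternative in v16 or later, DISTRIBUTION=MULTINOMIAL with LINK=CUMLOGIT would take the place of the Poisson settings.

```spss
* Sketch only: nservices, cohort, year and childid are assumed names.
* Assumes long format, one row per child per year.
GENLIN nservices BY cohort year
  /MODEL cohort year cohort*year INTERCEPT=YES
    DISTRIBUTION=POISSON LINK=LOG
  /REPEATED SUBJECT=childid WITHINSUBJECT=year
    CORRTYPE=EXCHANGEABLE
  /PRINT SOLUTION.
```

The exchangeable working correlation is just one plausible starting point; other CORRTYPE choices may suit the data better.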


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Tuesday, August 05, 2008 8:52 AM
To: [hidden email]
Subject: Re: longitudinal comparison



Re: Searching string-variables for numbers separated by comma and recoding into new variables

Maguin, Eugene
In reply to this post by Claus D. Hansen
Claus,

>>I have 6 string variables which all contain up to 13 numbers between 1 and
21. Each number is separated by a comma.

>>What I would like to do is to make a syntax that instead of 6 variables
containing these numbers have a variable for each of the numbers between 1
and 21 coded dichotomously so that if the number is present in any of the 6
variables the variable for each value is coded 1 and if not it is coded as
0.

>>So I need syntax that searches through each of the 6 variables and
recodes each of the values into the new variables.


First, combine your six string vars into one much longer one using Concat.
Suppose the six vars are s1 to s6 (a40). This assumes there are no embedded
blanks and no blank vars: every var must have a number in it! Then,

*  combine vars into one long var.
String sall(a250).
Compute sall=concat(rtrim(s1),',',rtrim(s2),',',rtrim(s3),',',rtrim(s4),',',
   rtrim(s5),',',rtrim(s6)).

*  sall has to be long enough to have two blanks at the end.
Compute len=index(sall,'  ')-1.
Vector numbers(21).
*  locate commas, extract numbers.
Compute #last=1.
Loop #i=1 to len.
Do if (substr(sall,#i,1) eq ',').
+  compute value=number(substr(sall,#last,(#i-#last)),f2.0).
+  compute numbers(value)=1.
+  compute #last=#i+1.
End if.
End loop.
*  extract last number.
compute value=number(substr(sall,#last,(len-#last+1)),f2.0).
compute numbers(value)=1.
*  numbers that never occurred are still system-missing, so set them to 0.
Recode numbers1 to numbers21 (sysmis=0).
Execute.


*  if you have troubles let me know.

Gene Maguin
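
The syntax above requires every variable to contain a number. A possible variation that tolerates empty variables is sketched below; it is not from the thread. Instead of walking through the string comma by comma, it wraps the combined string in commas and asks, for each value from 1 to 21, whether ',value,' occurs anywhere. The names s1 to s6 (assumed a40, as above) and the output vector no1 to no21 are assumptions, and the sketch still assumes there are no blanks inside the lists (e.g. '3,17', not '3, 17').

```spss
* Sketch only: s1 to s6 and no1 to no21 are assumed names.
* Leading and trailing commas mean every number is enclosed by commas.
String #all (a260).
Compute #all = concat(',', rtrim(s1), ',', rtrim(s2), ',', rtrim(s3), ',',
   rtrim(s4), ',', rtrim(s5), ',', rtrim(s6), ',').
Vector no(21).
* The comparison yields 1 if ',k,' is found in the string and 0 otherwise.
Loop #k = 1 to 21.
+  compute no(#k) = (index(#all, concat(',', ltrim(string(#k, f2.0)), ',')) > 0).
End loop.
Execute.
```

Searching for the value with a comma on both sides keeps 1 from matching inside 11 or 21, and a case where all six variables are empty simply ends up with 0 in every new variable.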
