Life History Calendar Data


Life History Calendar Data

Laneah
I have a data set with 1010 cases, an ID variable, and 175 repeated measures coded (. = missing, 0 = not in condition, 1 = begin/end condition, 2 = ongoing). These come from a life history calendar that is measured annually, with each variable representing one month across a 14-year time span. Data are missing for a small and different portion of the sample each year.

Example:

ID m1 m2 m3 m4 m5 m6 m7

1   0   0   0   1   2    2   1
2   2   2   2   2   2    2   2
3   .    .   .    2   1    0   0
4   0   1   1   0   0    0   0
5   1   2   1   0   1    2   1
6   2   1   0   0   0    1   2

I want to generate two types of variables for each "episode" of the condition

1. the month the episode(s) begin
2. the number of months the episode lasts

If it makes this easier, these variables could all be converted into dummy variables.

If you have suggestions about where to find tutorials on this type of repeated-measures data, that would be great. Also, if no solution is offered for this problem, I would like general advice about whether loops, vectors, or arrays could help me work with these data and, if so, which I should learn first.

Sorry if this is a repost; I may have accidentally sent part of this message earlier.

Thanks,

Laneah

 

Re: Life History Calendar Data

Maguin, Eugene
Laneah,

This isn't very hard to do. The key element in the method is to define the
month variables as a vector so that you can bring the loop command into use.
But, before going into that, look over very carefully how I've modified your
data to see if it is how you want the result to be. I made assumptions that
you might not have made or made differently.

Gene Maguin


>>I have a data set with 1010 cases, an ID variable, and 175 repeated measures
coded (. = missing, 0 = not in condition, 1 = begin/end condition, 2 =
ongoing). These come from a life history calendar that is measured annually
with each variable representing a month across a 14 year time span. There is
missing data for a small and different portion of the sample each year.

I want to generate two types of variables for each "episode" of the
condition
1. the month the episode(s) begin
2. the number of months the episode lasts

Example:
ID m1  m2  m3  m4  m5  m6  m7  eb1 eb2 ed1 ed2
1   0   0   0   1   2   2   1   4   .   4   .
2   2   2   2   2   2   2   2   .   .   .   . (no start, no stop)
3   .   .   .   2   1   0   0   .   .   .   . (missing data for start)
4   0   1   1   0   0   0   0   2   .   2   .
5   1   2   1   0   1   2   1   1   5   3   3
6   2   1   0   0   0   1   2   6   .   .   . (start of episode 1 undefined,
                                              end of episode 2 undefined)

Eb(i)=episode begins, ed(i)=episode duration
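
The coding rules implied by the example table can be sketched in Python. This is only an illustration of the assumptions shown in the table, not the SPSS syntax discussed in the thread: the function name `find_episodes` is hypothetical, missing values are represented as `None`, an episode that opens with a 2 (start not observed) is skipped entirely, and a 0 or missing value inside an open episode leaves its duration undefined.

```python
from typing import List, Optional, Tuple

def find_episodes(months: List[Optional[int]]) -> List[Tuple[int, Optional[int]]]:
    """Return (begin_month, duration) pairs; duration is None if the end is undefined."""
    episodes = []
    start = None           # 1-based month in which the current episode began
    unknown_start = False  # True while inside an episode whose start was not observed
    for month, code in enumerate(months, start=1):
        if start is not None:             # inside an episode with a known start
            if code == 1:                 # closing 1: record begin and duration
                episodes.append((start, month - start + 1))
                start = None
            elif code != 2:               # 0 or missing interrupts the episode
                episodes.append((start, None))
                start = None
        elif unknown_start:               # inside an episode with no observed start
            if code != 2:                 # its closing 1 (or a 0/missing) ends it
                unknown_start = False     # nothing is recorded for this episode
        elif code == 1:
            start = month
        elif code == 2:
            unknown_start = True
    if start is not None:                 # data ended before the closing 1
        episodes.append((start, None))
    return episodes

# The six example cases from the table (None stands for "." / missing).
example = {
    1: [0, 0, 0, 1, 2, 2, 1],
    2: [2, 2, 2, 2, 2, 2, 2],
    3: [None, None, None, 2, 1, 0, 0],
    4: [0, 1, 1, 0, 0, 0, 0],
    5: [1, 2, 1, 0, 1, 2, 1],
    6: [2, 1, 0, 0, 0, 1, 2],
}
for case_id, row in example.items():
    print(case_id, find_episodes(row))
```

Each pair corresponds to one eb(i)/ed(i) column pair in the table; case 6, for instance, yields a begin month of 6 with an undefined duration.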

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Life History Calendar Data

Laneah
Gene,

Yes, the variables you have described are what I want, and I would need enough eb(i) and ed(i) pairs to accommodate all the episodes for whoever in the data set has the largest number of episodes. I do not know what that number is. Also, I can see that the data structure as it stands would need each episode to be in the form 1 2 2 2 1, with the "1s" capping the ends of each episode. Thus, if I keep the 1s and 2s, I would also need a method to convert instances such as ". 2 2 1" and "0 2 2 1" into "0 1 2 1" and so forth. Perhaps I could recode all 1s and 2s into 1s, giving up the start/stop information and keeping only "yes" or "no" for the condition. Would this make things easier?

The only drawback to using only dummies is that I would also like to compute an accumulated sum of time spent in the condition (across all episodes), and it would be nice to sum the 1s and 2s with each count representing a "two-week period". That way a person who starts sometime in one month and ends in the next month ("1 1") would have an average accumulation of one month (two two-week periods) rather than two months. By the way, if we keep the 1s and 2s, what happens to episodes that follow the pattern "1 1"?
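
The "two-week period" idea amounts to treating each code as a count of half-months: a begin/end month (1) contributes one half-month and an ongoing month (2) contributes two. A minimal Python sketch, assuming that interpretation and using `None` for missing months (the function name `months_in_condition` is hypothetical):

```python
def months_in_condition(months):
    """Sum the 1s and 2s as half-months and convert the total to months."""
    half_months = sum(code for code in months if code in (1, 2))
    return half_months / 2

print(months_in_condition([0, 1, 1, 0]))           # "1 1" counts as one month
print(months_in_condition([0, 0, 0, 1, 2, 2, 1]))  # 1 + 2 + 2 + 1 half-months
```

Under this scheme the total would be lost if the 1s and 2s were collapsed into a single dummy code first.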

I guess what I am asking is whether it is doable to "clean up" the episodes using the form "1 2... 1" first before creating the "eb(i) and ed(i)" variables using syntax, or whether I should convert to all dummy variables first.

Thanks,

Laneah



Re: Life History Calendar Data

Maguin, Eugene
Laneah,

I'm going to respond to your message in parts.

>>Yes, the variables that you have described are what I want, and I would need
enough eb(i) and ed(i) pairs to accommodate all the episodes for whoever in
the data set has the largest number of episodes. I do not know what that
number is.

You can get a rough idea by counting the 1s and dividing by two. If there
were no data problems, every episode would start and stop with a 1, and the
number of 1s divided by two would be the exact number. So that is one small
thing you can do to get an idea of your dataset.
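
As an illustration only, the count-the-1s estimate could be sketched in Python like this. The data layout (a dict of case ID to a list of monthly codes, with `None` for missing) and the function name `rough_episode_count` are assumptions made for the example.

```python
def rough_episode_count(months):
    """Number of 1s divided by two: exact only if every episode has both caps."""
    return sum(1 for code in months if code == 1) / 2

cases = {
    1: [0, 0, 0, 1, 2, 2, 1],
    3: [None, None, None, 2, 1, 0, 0],
    5: [1, 2, 1, 0, 1, 2, 1],
}
estimates = {case_id: rough_episode_count(row) for case_id, row in cases.items()}
print(estimates)
print(max(estimates.values()))  # rough upper bound on the eb/ed pairs needed
```

A fractional result, as for case 3, signals an episode with a missing start or end cap.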

>>Also, I can see that the data structure as it is would need to
have each episode in this form 1 2 2 2 1 with the "1s" capping the ends of
each episode. Thus, if I keep the 1s and 2s I would also need a method to
convert instances such as ". 2 2 1" and "0 2 2 1" into "0 1 2 1" and so
forth.

I think you just have to decide what to do, with the understanding that you
may do something different later. It might help to enumerate the problem
patterns so that you can decide ahead of time how to handle them in the
syntax. Of course, I know nothing about your project, but my list of problem
patterns would include

2 1 (occurring at the start. Possible?)
2 2 (occurring at the end. Possible?)
1 . 2 or 2 . 1 (anywhere)
0 . 2 or 2 . 0 (anywhere)
2 . . 0 (anywhere)

I'm sure there are quite a few more.

The second thing is to write syntax to find problems. You know there must be
certain sequences and there must not be other sequences. I think the only
allowable pair sequences are 00, 01, 10, 12, 21, and 22. The unallowable
pair sequences include 1., 2., 02, 20, 11. But, basically, anything other
than the allowed sequences is disallowed.

I think the easiest way to look for disallowed sequences is the following.
The key point is to convert sysmis to 9 and then convert the data vector,
m1-m175, to a string and use the index function. Assume there is a
disallowed sequence present and disprove that. Like this.

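* Convert each month to one character so the whole history is a single string.
Formats m1 to m175(f1.0).
Recode m1 to m175(sysmis=9).
String months(a175).
Do repeat m=m1 to m175/i=1 to 175.
+  compute substr(months,i,1)=string(m,f1.0).
End repeat.

* Count how many of the disallowed pairs occur in each case.
* Index returns 0 when the pair is absent, so a hit is found gt 0.
Compute error=0.
Do repeat x='19' '29' '02' '20' '11' '92'.
+  compute found=index(months,x).
+  if (found gt 0) error=error+1.
End repeat.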

Frequencies error.

What you do and how you do it after you have found disallowed sequences is
your decision. I don't think I can offer an opinion.
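
For readers more comfortable outside SPSS, the same string-and-search idea can be sketched in Python. This is an illustration under assumptions, not a replacement for the syntax: missing values are `None` and are written as "9", the list of disallowed pairs is copied from the post, and the function names are hypothetical.

```python
DISALLOWED = ("19", "29", "02", "20", "11", "92")

def to_string(months):
    """Write one character per month, using 9 for a missing value."""
    return "".join("9" if code is None else str(code) for code in months)

def disallowed_pairs(months):
    """Return the disallowed two-character sequences found in this case."""
    history = to_string(months)
    return [pair for pair in DISALLOWED if pair in history]

print(disallowed_pairs([0, 0, 0, 1, 2, 2, 1]))           # clean case
print(disallowed_pairs([None, None, None, 2, 1, 0, 0]))  # missing followed by 2
print(disallowed_pairs([0, 1, 1, 0, 0, 0, 0]))           # contains "11"
```

Returning the offending pairs rather than a bare count makes it easier to decide, pattern by pattern, how each problem should be handled.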

>>Perhaps I could recode all 1s and 2s into 1s and not having
start/stop information, but only "yes" or "no" for the condition. Would this
make things easier?

I don't think that this makes things easier because you still have to fix up
the disallowed sequences.


>>The only drawback to using only dummies is that I would also like to compute
an accumulated sum of time spent in the condition (across all episodes) and
it would be nice to sum the 1s and 2s with each count representing a
"two-week period". That way a person who starts sometime in one month and
ends in the next month "1 1" would have an average accumulation of one month
(two, two-week periods) rather than 2 months.

I don't understand what you are saying because I thought you said that each
data point represents something that did or did not happen in a given month.

>>By the way, if we keep 1s and
2s what happens to episodes that are in this pattern "1 1" ?

Isn't that an allowable pair sequence? Something started in one month and
ended the next month?


>>I guess what I am asking is whether it is doable to "clean up" the episodes
using the form "1 2... 1" first before creating the "eb(i) and ed(i)"
variables using syntax, or whether I should convert to all dummy variables
first.

Yes, I think it is doable. It may be a nightmare but I think you have to do
it because that will make the computation of the eb and ed 'easy'.
Otherwise, it will be hellish because you have to deal with exceptions. The
big, important problem is how to create a data transformation sequence that
is auditable. Meaning that you can say, 'I started here, applied these
transformations, and wound up over here'. You have to give yourself a way to
back out of mistakes or changed decisions. One way is to write syntax to
make every transformation. You have over 175,000 data points and the
transformation syntax could be 500 to 1,000 lines, easily. Alternatively,
you could edit the data by hand and save repeatedly to new datasets.

Gene Maguin


Re: Life History Calendar Data

Laneah
Gene,

Thanks for your help. I think I now see what I need to do to clean up the data set. Once it is clean, and assuming that I know the maximum number of episodes to code for, is the syntax for coding the episodes easy to create? I am assuming that all episodes would be coded with 1s at the beginning and end of each episode and that there are no missing data, as in this example: "0 0 0 1 2 2 1 0 0 1 1 0 0". This also assumes that "0 1 1 0" (started one month, ended the next) and "0 1 0" (began and ended in the same month) could be included in the data set.
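
For cleaned data of the kind described here, one possible reading of the rules can be sketched in Python. The assumptions are loud ones: there are no missing values, every multi-month episode has the form 1, zero or more 2s, 1, and a lone 1 not followed by a 2 or another 1 is a one-month episode. The function name `episodes_clean` is hypothetical, and the left-to-right matching is a choice, since a run such as "1 1 1" is inherently ambiguous.

```python
def episodes_clean(months):
    """Return (begin_month, duration) pairs for cleaned data, reading left to right."""
    episodes = []
    i, n = 0, len(months)
    while i < n:
        if months[i] != 1:
            i += 1
            continue
        begin = i + 1                       # 1-based month of the opening 1
        j = i + 1
        while j < n and months[j] == 2:     # skip over the ongoing months
            j += 1
        if j < n and months[j] == 1:        # closing 1 found
            episodes.append((begin, j - i + 1))
            i = j + 1
        elif j == i + 1:                    # "0 1 0": began and ended in one month
            episodes.append((begin, 1))
            i = j
        else:                               # "1 2 ... 0" means the data are not clean
            raise ValueError("episode starting in month %d has no closing 1" % begin)
    return episodes

print(episodes_clean([0, 0, 0, 1, 2, 2, 1, 0, 0, 1, 1, 0, 0]))
print(episodes_clean([0, 1, 0]))
```

The maximum length of the returned list across all cases gives the number of eb(i)/ed(i) pairs the data set would need.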

Laneah



Decent GPL reference, please!

mpirritano
Here's a concrete example of the obtuseness of the GPL reference manual.
I want to simply add the r value to a scatter-plot. I've looked up every
related term I can think of in the index. I often feel that the only way
to get anything from this manual would be to read it from cover to
cover.

Please, can anyone recommend a reasonable handbook to GPL?!

Thanks
matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648


Re: Decent GPL reference, please!

ViAnn Beadle
The GPL reference manual doesn't reference anything that GPL, by itself,
cannot do. You can, however, use an SPSS chart template to add a fit line
along with a legend containing the R-squared value. To create your own, create a
simple scatterplot using something like GRAPH. Within the chart editor, add
the fit line, which automatically creates both the line and the legend entry for
that line. Save the chart template, checking off only the fit line and
legend. Specify that template in your GGRAPH syntax. Note that since the
template is adding the line, you don't need to specify an element to create
the line.




Re: Decent GPL reference, please!

Marta Garcia-Granero
ViAnn Beadle wrote:

> The GPL reference manual doesn't reference anything that GPL, by itself,
> cannot do. You can however, use an SPSS chart template to add a fit line
> along with a legend containing the R-sqd value. To create your own create a
> simple scatterplot using something like GRAPH. Within the chart editor, add
> the fit line which automatically creates both line and the legend entry for
> that line. Save the chart template, checking off only the fit line and
> legend. Specify that template in your GGRAPH syntax. Note that since the
> template is adding the line, you don't need to specify an element to create
> the line.
>
ViAnn:

I think Matthew wanted r, not the fit line with r-square. The
statistical meaning (correlation vs linear regression) is quite
different, and should not be confused (see
http://www.bmj.com/cgi/content/full/315/7105/422 for an excellent
discussion on the topic).

Now, back to Matthew's original question (very interesting form, too):
is there a way to tweak GPL syntax to add a small box with the
correlation coefficient? GPL doesn't have to calculate it by itself; it
could be captured (via OMS, for instance) and then pasted into
the GPL code.
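
The "compute it elsewhere, then paste it in" step is essentially string templating. A minimal Python sketch of that idea follows; note that it swaps the OMS capture for a plain-Python Pearson calculation so that it runs without SPSS, and that the function names are hypothetical. The `GUIDE: text.footnote(label(...))` statement is the one used elsewhere in this thread.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def footnote_line(r):
    """Build the GPL statement that shows r as a chart footnote."""
    return 'GUIDE: text.footnote(label("r=%.3f"))' % r

r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print(footnote_line(r))
```

The resulting line would then be placed inside the BEGIN GPL / END GPL block before the syntax is submitted.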

Another question. I downloaded and printed the GPL guide back in SPSS 14
times. Is it worth downloading the new manual (dated PASW 17), printing it,
and sending the old one to recycling?

Regards,
Marta GG


--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/


Re: Decent GPL reference, please!

Marta Garcia-Granero
In reply to this post by mpirritano
Pirritano, Matthew wrote:
> Here's a concrete example of the obtuseness of the GPL reference manual.
> I want to simply add the r value to a scatter-plot. I've looked up every
> related term I can think of in the index. I often feel that the only way
> to get anything from this manual, would be to read it from cover to
> cover.
>

This is the only workaround I have found (add the r value as a
footnote). I think it could be automated by turning it into a macro that
reads the r value and adds it to the GPL code:

* Sample dataset *.
DATA LIST FREE/Glucagon HDL.
BEGIN DATA
72 1.03 75 1.06 76 0.92 88 1.04 71 0.91
82 0.81 73 0.90 80 0.95 69 1.12 78 1.08
83 1.13 80 1.16 113 1.22 91 1.24 62 1.09
79 1.01 84 1.10 87 1.15 86 1.12 85 1.08
97 1.15 93 1.34 81 1.19 110 1.20 97 1.11
96 1.13 98 1.35 99 1.19 81 1.18 93 1.21
END DATA.

* Compute r and add it manually to the GPL code *.
CORRELATIONS
  /VARIABLES=Glucagon HDL
  /PRINT=TWOTAIL NOSIG.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=HDL Glucagon
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: HDL=col(source(s), name("HDL"))
 DATA: Glucagon=col(source(s), name("Glucagon"))
 GUIDE: axis(dim(1), label("HDL Cholesterol (mmol/l)"))
 GUIDE: axis(dim(2), label("Glucagon (pg/ml)"))
 GUIDE: text.footnote(label("r=0.571"))
 ELEMENT: point(position(HDL*Glucagon),color.interior(color.black))
END GPL.

Anyway, instead of a footnote, I would have liked to add it as a text
box inside the graph, or as a legend. Is that possible?

Marta GG


--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/


Re: Decent GPL reference, please!

ViAnn Beadle
You can insert a text box using a chart template. The issue then is how to
create the chart template on the fly. I suspect that probably could be done
via Python. Here's a challenge to our Python experts!


Identify Straight-liners

Gyorgy Bea
In reply to this post by Marta Garcia-Granero
Do you know of any efficient method for identifying straight-liners in a survey? Is there any documentation on this topic? I think that, without a serious analysis of the data, excluding straight-liners could easily turn into data manipulation.
What do you think?

Beata





Re: Decent GPL reference, please!

Albert-Jan Roskam
In reply to this post by Marta Garcia-Granero
Hi Marta,

I have virtually no knowledge of GPL, but here's how to fill in the r value dynamically. It's also possible to edit the .sgt file (template) dynamically, but I thought this code would be cleaner and simpler. Moreover, a template means one more file to keep track of.

Probably the 'boxed' r can be achieved in a similar way. The Python code is like macro language, but with easier syntax.

Btw, I don't know what happened to the scale of the axes.

Cheers!!
Albert-Jan


* Sample dataset *.
DATA LIST FREE/Glucagon HDL.
BEGIN DATA
72 1.03 75 1.06 76 0.92 88 1.04 71 0.91
82 0.81 73 0.90 80 0.95 69 1.12 78 1.08
83 1.13 80 1.16 113 1.22 91 1.24 62 1.09
79 1.01 84 1.10 87 1.15 86 1.12 85 1.08
97 1.15 93 1.34 81 1.19 110 1.20 97 1.11
96 1.13 98 1.35 99 1.19 81 1.18 93 1.21
END DATA.


BEGIN PROGRAM.
import spss, spssaux

def make_scatter(x_var, y_var):

        # "rawdata" is an arbitrary name given to the data being plotted so
        # that it can be reactivated once the correlation table has been read.
        corr = r"""DATASET NAME rawdata.
DATASET DECLARE corrs.
OMS /SELECT TABLES /IF COMMANDS = ["Correlations"] SUBTYPES = ["Correlations"]
 /DESTINATION FORMAT = SAV NUMBERED = TableNumber_ OUTFILE = corrs.
CORRELATIONS
  /VARIABLES=%(y_var)s %(x_var)s
  /PRINT=TWOTAIL NOSIG.
OMSEND.
DATASET ACTIVATE corrs.
""" % locals()
        spss.Submit(corr)

        # Read r from the first row, last column of the captured table.
        curs = spss.Cursor([spss.GetVariableCount() - 1])
        r = curs.fetchone()[0]
        curs.close()

        # Switch back to the raw data before plotting; otherwise GGRAPH may
        # run against the correlation table, which distorts the axis scales.
        gpl = r"""DATASET ACTIVATE rawdata.
DATASET CLOSE corrs.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=%(x_var)s %(y_var)s
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: %(x_var)s=col(source(s), name("%(x_var)s"))
DATA: %(y_var)s=col(source(s), name("%(y_var)s"))
GUIDE: axis(dim(1), label("%(x_var)s "))
GUIDE: axis(dim(2), label("%(y_var)s "))
GUIDE: text.footnote(label("r=%(r)2.3f"))
ELEMENT: point(position(%(x_var)s*%(y_var)s),color.interior(color.black))
END GPL.
        """ % locals()
        spss.Submit(gpl)

make_scatter(x_var = "HDL", y_var = "glucagon")

END PROGRAM.



Re: Identify Straight-liners

Maguin, Eugene
In reply to this post by Gyorgy Bea
Beata,

What are 'straight-liners'? I've never heard that phrase before.

Gene Maguin


Re: Identify Straight-liners

statisticsdoc
In reply to this post by Gyorgy Bea
Beata,
 
I have some code to identify subjects whose responses follow a straight line or a zig-zag pattern.  When examining these cases, it is important to consider whether the pattern could reflect meaningful, content-based responding or whether the answers are inherently contradictory.  You might want to keep the cases, but note whether the response pattern occurs with unusually high frequency in certain data collection sites (e.g., in certain classrooms, or in sessions administered by a specific research assistant).  Unusually high levels of oddly patterned responding might indicate a need to improve aspects of the data collection process at those sites (I am assuming here that data may be collected from the same sites on future occasions).
 
Best,
 
Stephen Brand
 
 
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of Gyorgy Bea
Sent: Thursday, August 13, 2009 12:07 PM
To: [hidden email]
Subject: Identify Straight-liners

Do you know of any efficient method for identifying straight-liners in a survey? Is there any documentation on this topic? I think that without a serious analysis of the data, excluding straight-liners could easily turn into data manipulation.
What do you think?

Beata





Re: Life History Calendar Data

Maguin, Eugene
In reply to this post by Laneah
Laneah,

The whole enterprise is trickier because of the legitimacy of the 010
sequence. In principle it is solvable. Before giving you some code, I want to
do some testing this weekend with an example dataset. In preparation for
testing, are the following sequences legitimate?

Beginning at month 1:
2221

Ending at month 175:
12222



In terms of counting episodes, there are three issues: a) '010', b) '221' at
the start, c) '1222' at the end.

Issues b) and c) are not so easily solvable. If issue a) weren't present
then you could simply do this

Count episodes=m1 to m175(1).
*  the value of episodes at this point should always be an even number.
Compute episodes=episodes/2.

To bring issue a) in I think you have to do this

Build a 175 character string as described in a previous email (call it
months), then

Compute singles=0.
Loop #i=1 to 173.
+  if (substr(months,#i,3) eq '010') singles=singles+1.
End loop.
*  then this from above.
Count episodes=m1 to m175(1).
*  each '010' is a one-month episode in its own right, so add the singles back.
Compute episodes=(episodes-singles)/2 + singles.

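
A single pass over the month vector can also count episodes without building the string. What follows is only a sketch, under assumptions that have not been confirmed in this thread: m1 to m175 are adjacent in the file, a 1 followed directly by a 0 is a one-month episode, a 2 that appears while no episode is open (for example at month 1) opens one, and missing months leave the current state unchanged. The names nepisodes and #open are made up for illustration.

```spss
* Sketch only. nepisodes and #open are illustrative names.
* A 1 or a 2 while no episode is open starts a new episode.
* A second 1 ends the episode. A 0 while one is open means the 1 stood alone (010).
VECTOR m = m1 TO m175.
COMPUTE nepisodes = 0.
* Scratch variables keep their value across cases, so reset for each case.
COMPUTE #open = 0.
LOOP #i = 1 TO 175.
+  DO IF (#open = 0 AND (m(#i) = 1 OR m(#i) = 2)).
+    COMPUTE nepisodes = nepisodes + 1.
+    COMPUTE #open = 1.
+  ELSE IF (#open = 1 AND (m(#i) = 1 OR m(#i) = 0)).
+    COMPUTE #open = 0.
+  END IF.
END LOOP.
EXECUTE.
```

With this logic a sequence such as 0 1 0 counts as one episode, and 1 2 1 0 1 2 1 counts as two. Whether a run of 2s with no opening 1 should count as an episode at all is a decision for the PI, so treat that branch as a placeholder.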

To others: if you have ideas, especially about how this can be done in
long format, please post them, as I am congenitally oriented to a wide-format
framework. The question is, I'd guess, somewhat uncommon but not too much
so.

Gene Maguin


Re: Life History Calendar Data

Laneah
Gene,

I think for my purposes I can recode all the 2s in the first and last months as 1s so that I can work around situations b and c. Once I have the information coded into episodes, I can use the original beginning and end months to adjust the beginning and ending episodes for those cases. I think that occurrences of 010 are fairly rare, so I could hold those cases out and hand-code them.

Laneah



Re: Identify Straight-liners

Gyorgy Bea
In reply to this post by Maguin, Eugene
Straight-liners are respondents who follow a fixed pattern when completing the survey: they always choose the first option, the middle option, or the last option, or they zig-zag (1-3-1-3-1-3, etc.). I think this tendency mostly occurs when the questionnaire is self-administered.

Beata


From: Gene Maguin <[hidden email]>
To: [hidden email]
Sent: Thursday, August 13, 2009 11:06:12 PM
Subject: Re: Identify Straight-liners

Beata,

What are 'straight-liners'? I've never heard that phrase before.

Gene Maguin


Re: Identify Straight-liners

Gyorgy Bea
In reply to this post by statisticsdoc
Hi Stephen,

So basically all questions and items need to be analyzed first to see whether any pattern can be identified, but just because a respondent answered "in line" (always the first option) doesn't necessarily mean that his responses aren't "quality" responses.
Identifying respondents who answer in line is quite straightforward, I think; however, identifying zig-zag or other patterns can be more challenging. Could you suggest how the code for this should work? Is it possible to cover all possible combinations when identifying the pattern?
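
As a rough illustration of how such a check might look, here is a sketch that flags straight lines and simple two-step zig-zags. It assumes 20 numeric items named q1 to q20 that are adjacent in the file; those names and the flag names straight and zigzag are hypothetical. It does not cover every possible pattern, only repetition with a period of one or two items.

```spss
* Sketch only. q1 TO q20 and the flag names are hypothetical.
VECTOR q = q1 TO q20.
* Straight line: every valid answer is the same value.
COMPUTE straight = (MIN(q1 TO q20) = MAX(q1 TO q20)).
* Zig-zag: every answer equals the answer two items earlier.
COMPUTE zigzag = 1.
LOOP #i = 3 TO 20.
+  IF (q(#i) NE q(#i - 2)) zigzag = 0.
END LOOP.
* A straight line also repeats every two items, so keep the flags separate.
IF (straight = 1) zigzag = 0.
EXECUTE.
```

Comparisons that involve a missing item do not reset the zigzag flag, so cases with missing answers would need separate handling. Longer cycles (for example 1-2-3-1-2-3) could be checked the same way by changing the lag from 2 to 3, but no fixed set of lags will catch every pattern.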

Best regards,
Beata



From: Statisticsdoc <[hidden email]>
To: [hidden email]
Sent: Thursday, August 13, 2009 11:58:59 PM
Subject: Re: Identify Straight-liners

Beata,
 
I have some code to identify subjects whose responses follow a straight line or a zig-zag pattern.  When examining these cases, it is important to consider whether the pattern could reflect meaningful, content-based responding or whether the answers are inherently contradictory.  You might want to keep the cases, but note whether the response pattern occurs with unusually high frequency in certain data collection sites (e.g., in certain classrooms, or in sessions administered by a specific research assistant).  Unusually high levels of oddly patterned responding might indicate a need to improve aspects of the data collection process at those sites (I am assuming here that data may be collected from the same sites on future occasions).
 
Best,
 
Stephen Brand
 
 
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of Gyorgy Bea
Sent: Thursday, August 13, 2009 12:07 PM
To: [hidden email]
Subject: Identify Straight-liners

Do you know of any efficient method for identifying straight-liners in a survey? Is there any documentation on this topic? I think that without a serious analysis of the data, excluding straight-liners could easily turn into data manipulation.
What do you think?

Beata






Re: Life History Calendar Data

Maguin, Eugene
In reply to this post by Laneah
Laneah,

I haven't done any code writing and testing yet but I think I want to try to
accommodate 010 sequences in the main line code. Hand coding with a dataset
the size of yours won't be easy. If I were in your place, I'd want to avoid
as much of it as possible.

As to the 2s at the beginning and end, I am surprised that you'd want to
convert them to 1s. As I understand your data reduction goals, you want to
identify each episode, from which you can get a count of the number of
episodes, and extract the duration of each episode. Converting the 2s to 1s
would, it seems to me, overestimate the number of episodes and underestimate
the duration. I think it better to treat a beginning or ending 2 sequence as
a single episode of unknown (i.e., missing) duration. However, you are the
PI; what do you want?
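
To make the "unknown duration" idea concrete, here is a sketch of how cases whose record starts or ends inside an episode could be flagged, so that those durations can later be set to missing. It assumes m1 to m175 are adjacent in the file and that the first and last non-missing months are the relevant ones; the names open_start, open_end and #seen are illustrative only.

```spss
* Sketch only. open_start, open_end and #seen are illustrative names.
VECTOR m = m1 TO m175.
COMPUTE open_start = 0.
COMPUTE open_end = 0.
* Scratch variables keep their value across cases, so reset for each case.
COMPUTE #seen = 0.
LOOP #i = 1 TO 175.
+  DO IF (NOT MISSING(m(#i))).
+    IF (#seen = 0 AND m(#i) = 2) open_start = 1.
+    COMPUTE #seen = 1.
+    COMPUTE open_end = (m(#i) = 2).
+  END IF.
END LOOP.
EXECUTE.
```

After the loop, open_start is 1 when the first observed month is a 2, and open_end is 1 when the last observed month is a 2. A case with a 2 in every observed month gets both flags.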

If you decide that you want to recode beginning and ending 2s to 1s, how
should these two sequences be scored:

A) 221 to 111 or 122 to 111: three one-month episodes or one two-month
episode and one one-month episode?

B) 2221 to 1111 or 1222 to 1111: four one-month episodes or two two-month
episodes?

Gene Maguin


I think for my purposes I can recode all the 2s in the first and last months
as 1s so that I can work around situations b and c. Once I have the
information coded into episodes, I can use the original beginning and end
months to adjust the beginning and ending episodes for those cases. I think
that occurrences of 010 are fairly rare, so I could hold those cases out
and hand-code them.

Laneah


Order don't answer to the final row in custom table

Auberth Hurtado
In reply to this post by ViAnn Beadle
Hi all, I want to place the "don't answer" category (9) at the end of the
custom table. For example:

DATA LIST FREE/P1.
BEGIN DATA
1 1 1 1 9 9 9 2 2 3
END DATA.

VAL LAB P1
1 'E'
2 'R'
3 'B'
9 'DA' .

CTABLES
  /VLABELS VARIABLES=P1 DISPLAY=DEFAULT
  /TABLE P1 [C][COUNT F40.0]
  /CATEGORIES VARIABLES=P1 ORDER=D KEY=COUNT EMPTY=INCLUDE TOTAL=YES
    POSITION=AFTER.


The table looks like:

                Count
P1      E       4
        DA      3
        R       2
        B       1
        Total   10

And I want:

                Count
P1      E       4
        R       2
        B       1
        DA      3
        Total   10

How do I do that? Thanks for your help.
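
One possible approach, sketched here on the assumption that a fixed display order is acceptable, is to replace the ORDER and KEY keywords with an explicit category list on the CATEGORIES subcommand. The categories then appear in the order listed, so 9 can be placed last.

```spss
* Sketch only: an explicit category list fixes the row order, with 9 last.
CTABLES
  /VLABELS VARIABLES=P1 DISPLAY=DEFAULT
  /TABLE P1 [C][COUNT F40.0]
  /CATEGORIES VARIABLES=P1 [1, 2, 3, 9] TOTAL=YES POSITION=AFTER.
```

The trade-off is that the rows are no longer sorted by count automatically. In this example the listed order 1, 2, 3 happens to match the descending counts, but with other data the list would have to be adjusted by hand.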

Auberth.
