SPSSX Discussion

Recoding for more than one result

Art Kendall

Re: mixed models - time as DV

This worked okay on my machine.
I guess then that these are durations (intervals) rather than specific times?

data list list/thetime(time12.2).
begin data
31:22:33
24:00:00
23:59:55
42:10:10
end data.
compute secs = thetime.
formats secs (f16).
compute back = secs.
formats back (time12.2).
execute.
list.

Art

On 6/22/2011 3:29 PM, Parise, Carol A. wrote:

Thanks Art. I don't have the data in front of me today but i will check this out.

Do you know if there are differences when time is over 24:00:00 versus under 24:00:00? when working with hh:mm:ss in the past, i recall having some issues with this. some of these "tasks" took more than a day.

________________________________
From: Art Kendall [[hidden email]]
Sent: Wednesday, June 22, 2011 5:39 AM
To: Parise, Carol A.
Cc: [hidden email]
Subject: Re: [SPSSX-L] mixed models - time as DV

What format did you use to put the DV in?
try this syntax.  It appears that time12.2 format works on my machine.
data list list/thetime(time12.2).
begin data
1:22:33
24:00:00
23:59:55
12:10:10
end data.
compute secs = thetime.
formats secs (f16).
compute back = secs.
formats back (time12.2).
execute.
list.

I then saved that file and opened a new data file.
I copied secs into the new data file.
I changed the type to date and used the drop down list to get a time format.

Also, the only time I would expect the intercept to be exactly equal to the grand mean would be when all predictors were zero like with z-scores.
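
The point about the intercept and the grand mean can be illustrated with a small sketch. It uses ordinary least squares with one predictor rather than a mixed model, and the numbers are made up for illustration; the function name `ols_intercept_slope` is hypothetical.

```python
def ols_intercept_slope(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.0]

grand_mean = sum(y) / len(y)

# Raw predictor: the intercept is the predicted y at x = 0, not the grand mean.
raw_intercept, _ = ols_intercept_slope(x, y)

# Mean-centered predictor: the intercept now coincides with the grand mean.
mean_x = sum(x) / len(x)
centered_x = [xi - mean_x for xi in x]
centered_intercept, _ = ols_intercept_slope(centered_x, y)

print(grand_mean, raw_intercept, centered_intercept)
```

With the predictor centered at zero, the intercept is the predicted value at the average of the predictor, which is why it lines up with the mean of the outcome.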

Art Kendall
Social Research Consultants

On 6/21/2011 6:37 PM, Parise, Carol A. wrote:
Art,

This appears to work if I use the format dd-mm-yyyy hh:mm:ss. With any other format the numbers are not logical. But the date that shows up is kind of crazy: 16-MAR-1590. I'm a bit concerned that this may not be accurate. Although the time is in the ballpark, I would think that the overall mean I get using descriptives should be just about the same as what I get from this model, and it's off by a good amount.

Thank you!
Carol

________________________________
From: Art Kendall [[hidden email]]
Sent: Tuesday, June 21, 2011 2:58 PM
To: Parise, Carol A.
Cc: [hidden email]
Subject: Re: [SPSSX-L] mixed models - time as DV

try this as a workaround.
Time is in seconds since the start of the Gregorian calendar.
While you have the .spv file open, open a new data file <file><new><data>
Switch to the output file. Highlight and copy the intercept.
Switch to the data file. Paste the intercept.
Click the variables view tab.
change the format to time.
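
As a rough illustration of why a duration shown in a date-time format can display a sixteenth-century date: SPSS stores dates as seconds counted from midnight on 14 October 1582, so a number of seconds that is really a duration lands shortly after that origin when read as a date. The sketch below assumes that origin; the names `SPSS_EPOCH` and `spss_seconds_to_datetime` are hypothetical and not part of SPSS.

```python
from datetime import datetime, timedelta

# Assumed origin of SPSS date values: midnight, 14 October 1582.
SPSS_EPOCH = datetime(1582, 10, 14)

def spss_seconds_to_datetime(seconds):
    """Interpret a count of seconds as a date-time relative to the SPSS origin."""
    return SPSS_EPOCH + timedelta(seconds=seconds)

# One day of seconds read as a date is the day after the origin.
one_day = spss_seconds_to_datetime(86400)

# A duration of 31:22:33 (112953 seconds) read as a date-time.
as_date = spss_seconds_to_datetime(112953)

print(one_day.isoformat())
print(as_date.isoformat())
```

The time-of-day part still carries the hours, minutes and seconds beyond whole days, which is why the time looks plausible while the date does not.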

Art Kendall
Social Research Consultants

On 6/21/2011 5:40 PM, Parise, Carol A. wrote:

Hi all,

This is one of probably many questions I will likely be posting on using linear mixed models over the next few months. It's my first crack at using this and I'm slowly working through the lingo by reading as much as I can. I read the SPSS technical report on this and I found an example from a nice little article that I am using to mess with my data. http://www.indiana.edu/~statmath/stat/all/hlm/hlm.pdf

The article explains that the intercept term in the empty model is equivalent to the overall math achievement score.

My data have time in hh:mm:ss as the DV.

In my sample analysis, the scale of the intercept is not in hh:mm:ss and I can't seem to adjust it to be in this format with the "cell properties" when I click on the output.

It would be really helpful to have the correct scale versus just the p-value for interpretation purposes. Anyone have thoughts on how I can do this?

I am running version 14.0 and we won't be upgrading anytime soon.

Thanks much.
Carol

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall
Social Research Consultants

parisec

Re: mixed models - time as DV

great! hard to believe but these are actual times. it's kind of a crazy 'task'.

________________________________
From: Art Kendall [[hidden email]]
Sent: Wednesday, June 22, 2011 12:39 PM
To: Parise, Carol A.; SPSSX-L post
Subject: Re: [SPSSX-L] mixed models - time as DV

This worked okay on my machine.
I guess then that these are durations (intervals) rather than specific times?

data list list/thetime(time12.2).
begin data
31:22:33
24:00:00
23:59:55
42:10:10
end data.
compute secs = thetime.
formats secs (f16).
compute back = secs.
formats back (time12.2).
execute.
list.

Art
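
On the question of times over 24:00:00, the same round trip can be sketched outside SPSS. This is only an illustration of treating hh:mm:ss as a duration in seconds, not a description of how the TIME format is implemented; the function names `time_to_seconds` and `seconds_to_time` are hypothetical.

```python
def time_to_seconds(text):
    """Parse 'hh:mm:ss' (hours may exceed 23) into a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def seconds_to_time(total):
    """Format seconds as 'hh:mm:ss' without wrapping at 24 hours."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"

# The four durations from the syntax above.
durations = ["31:22:33", "24:00:00", "23:59:55", "42:10:10"]
as_seconds = [time_to_seconds(d) for d in durations]
round_trip = [seconds_to_time(s) for s in as_seconds]

print(as_seconds)
print(round_trip)
```

Because the hours are never reduced modulo 24, durations longer than a day survive the conversion in both directions.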

John F Hall

Re: Recoding for more than one result

In reply to this post by David Marso

David

I already tried a modified version of my own, but it still didn't match
yours. It's late over here, so I'll do your new one tomorrow. I've
generated a data set with variables from both your and my versions. All
versions are internally consistent when listing, but the values,
distributions and profiles are not the same between your HASH and my PROFILE
variables.

temp .
Select if <var> = <value> .
List var serial <var> d1 to d22 .

It's worth persevering with this as Judy says they will be doing a lot more
analyses with other data, so we might as well get it right. However I think
the main problem is the design of the data capture method with all those
strings, so I may well knock up a transfer instrument based on the current
data.

John

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
David Marso
Sent: 22 June 2011 21:21
To: [hidden email]
Subject: Re: Recoding for more than one result

"I need time to check why our versions give different results, so I'll get
back later."...

Hi John,
Upon inspection it appears you are omitting the first value of the
autorecoded variables (missing).
This will create an 'off by one' discrepancy.
Try this variation which does not process the first category.
HTH, David
---
GET FILE blah blah blah..../ DROP num1 TO num11.
AUTORECODE Axis1Cat1 to Axis1Cat11 /into num1 to num11 /group.
COMPUTE HASH=0.
NUMERIC D1 TO D21.
VECTOR D=D1 TO D21 /X=num1 TO num11.
LOOP #=1 TO 11.
+ DO IF X(#) NE 1.
+ COMPUTE D(X(#)-1)=1.
** This is not necessary, but if you insist ;-)**.
+ COMPUTE HASH=SUM(HASH,2**(X(#)-1)).
+ END IF.
END LOOP.

RECODE D1 TO D21 (symsis=0).
COMPUTE NRESP=SUM(D1 TO D21).
AGGREGATE OUTFILE *
/ BREAK D1 TO D21
/ Npattern=N
/ NResp=MAX(NRESP)
/ HashPatt=MAX(HASH).
SORT CASES BY NRESP(A) NPattern (D).
LIST.

John F Hall wrote:

>
> David
>
> Went back to scratch and ran your new syntax. It worked, but your
> combination variable HASH has a different distribution from my variable
> PATTERN. Part of the problem is that the data were originally entered as
> huge strings of multiple diagnoses, which I think have been spread out
> manually as 22 pairs of variables, the first with a short official code
> and
> the second with a description. It would have been easier if the
> description
> had been used as a variable label (and a lot fewer keystrokes!). When I
> was
> working and a client came in with data like this, I always designed a
> transfer sheet so that data could be coded and entered as numeric. Quite
> possibly this data set came from some official source records (? Data
> base).
> There wasn't even a serial number for each case.
>
> Your syntax is much shorter than mine and I need to investigate VECTOR
> further as it reminds me of the days when I wrote survey programs in ALGOL
> at Salford Univ in the 1960s. I think I asked for "dynamic variable
> creation" to be available when I organised the 1974 international
> conference
> at LSE to plan for a new release SPSS.
>
> I need time to check why our versions give different results, so I'll get
> back later.
>
> There was no confidential data in my post, just a list of diagnoses
> produced
> by your syntax.
>
>
> John
>
> [hidden email]
> www.surveyresearch.weebly.com
>
>

<SNIP>
David Marso

Re: Recoding for more than one result

Administrator

John,
My Hash was beginning at 2 rather than 1, hence the discrepancy.
David
--
GET FILE blah blah blah..../ DROP num1 TO num11.
AUTORECODE Axis1Cat1 to Axis1Cat11 /into num1 to num11 /group.
COMPUTE HASH=0.
NUMERIC D2 TO D22.
VECTOR D=D2 TO D22 /X=num1 TO num11.
LOOP #=1 TO 11.
+ DO IF X(#) NE 1.
+ COMPUTE D(X(#)-1)=1.
** This is not necessary, but if you insist ;-)**.
* Was previously 2**(X(#)-1) *.
+ COMPUTE HASH=SUM(HASH,2**(X(#)-2)).
+ END IF.
END LOOP.

RECODE D2 TO D22 (symsis=0).
COMPUTE NRESP=SUM(D2 TO D22).
AGGREGATE OUTFILE *
/ BREAK D2 TO D22
/ Npattern=N
/ NResp=MAX(NRESP)
/ HashPatt=MAX(HASH).
SORT CASES BY NRESP(A) NPattern (D).
LIST.
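
The dummy and power-of-two idea in the syntax above can be sketched in a few lines. This is an illustration under stated assumptions, not a translation of the SPSS job: codes are taken to be AUTORECODE-style integers where 1 is the blank category and 2 to 22 are diagnoses, each distinct code contributes to the hash once, and the function name `pattern_for_case` and the example codes are hypothetical.

```python
def pattern_for_case(codes, n_categories=22):
    """Return (dummies, hash_value) for one case.

    codes: integers where 1 means blank/missing and 2..n_categories are diagnoses.
    dummies: list of 0/1 flags whose positions stand for D2..D22.
    hash_value: sum of 2**(code - 2) over the distinct codes present,
                so the first real category contributes 1 rather than 2.
    """
    dummies = [0] * (n_categories - 1)
    hash_value = 0
    for code in codes:
        if code == 1:
            continue  # skip the blank category
        if dummies[code - 2] == 0:
            dummies[code - 2] = 1
            hash_value += 2 ** (code - 2)
    return dummies, hash_value

# Hypothetical case: codes 2, 5 (entered twice) and 3, plus one blank.
dummies, hash_value = pattern_for_case([2, 5, 5, 1, 3])
print(dummies[:4], hash_value)
```

Two cases with the same set of diagnoses get the same hash, which is what makes it usable as a single pattern variable.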

--

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

John F Hall

Re: Recoding for more than one result

David

I've spent quite some time trying to reconcile the differences. The 22 (or
21) dummy binary and intermediate **2 variables created are the same for
both methods, but the total diagnoses differ. I think this may be because
the same code can appear more than once in the 11 source variables, which
explains why you get 7 but I get 11 for the one case with 11 diagnoses.
Judy needs to check the original strings to make sure that a single letter
difference on entry has not created a separate value, but whichever profile
or pattern is generated by adding the 21 codes for valid diagnoses the
resulting variables are useful. All that needs doing now is to recode the
more frequent combinations into something manageable for reporting tables.

Thanks for your time: I've been using SPSS syntax since 1972, but I've
learned quite a lot from you and from others on the list about AUTORECODE,
AGGREGATE and even VECTOR. Even this 70-year-old dog can learn a few new
tricks.

John

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
David Marso
Sent: 23 June 2011 13:27
To: [hidden email]
Subject: Re: Recoding for more than one result

John,
My Hash was beginning at 2 rather than 1, hence the discrepancy.
David
--

David Marso

Re: Recoding for more than one result

Administrator

"I think this may be because
the same code can appear more than once in the 11 source variables, which
explains why you get 7 but I get 11 for the one case with 11 diagnoses."
Yes, my code counts each type of diagnosis only once. In most cases that is what one would do when working with MR/MD data within SPSS. I suspect running MR using values 1..22 would render proper counts using the Multiple Response procedure. OTOH MD would only map the dummy variables and underreport the overall incidence.
"Judy needs to check the original strings to make sure that a single letter
difference on entry has not created a separate value..."
Yes. As I mentioned previously, in the original tables you reported about a week ago there are 2 issues matching this description.
6 Childhood/Adolescent Disorder
7 Childhood/Adolscent Disorder
--
11 Learning Disorder/Disabilities
12 Learning Disorders/Disabilities
These should be rectified before moving forward ;-)
Note that D2 To D22 can then be simply D2 TO D20.
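
One way to catch labels that differ by a letter or two before autorecoding is a similarity scan. The sketch below uses Python's standard `difflib` module on the four labels quoted above; the 0.9 threshold and the function name `near_duplicates` are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

labels = [
    "Childhood/Adolescent Disorder",
    "Childhood/Adolscent Disorder",
    "Learning Disorder/Disabilities",
    "Learning Disorders/Disabilities",
]

def near_duplicates(values, threshold=0.9):
    """Return pairs of distinct labels whose similarity ratio meets the threshold."""
    pairs = []
    for a, b in combinations(values, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs

suspects = near_duplicates(labels)
for a, b in suspects:
    print(a, "<->", b)
```

The flagged pairs still need a human decision about which spelling is the intended one.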
---
Accounting for the TOTAL number of diagnoses rather than the count of distinct diagnoses ;=)
a few teensy weensy mods:
(move recode after the NUMERIC declaration, increment "dummies",
also corrected spelling of SYSMIS -DOH-).
GET FILE blah blah blah..../ DROP num1 TO num11.
AUTORECODE Axis1Cat1 to Axis1Cat11 /into num1 to num11 /group.
COMPUTE HASH=0.
NUMERIC D2 TO D22.
RECODE D2 TO D22 (SYSMIS=0).
VECTOR D=D2 TO D22 /X=num1 TO num11.
LOOP #=1 TO 11.
+ DO IF X(#) NE 1.
+ COMPUTE D(X(#)-1)=D(X(#)-1) + 1.
** This is not necessary, but if you insist ;-)**.
* Was previously 2**(X(#)-1) *.
+ COMPUTE HASH=SUM(HASH,2**(X(#)-2)).
+ END IF.
END LOOP.
COMPUTE NRESP=SUM(D2 TO D22).
AGGREGATE OUTFILE * / BREAK D2 TO D22
/ Npattern=N / NResp=MAX(NRESP) / HashPatt=MAX(HASH).
SORT CASES BY NRESP(A) NPattern (D).
LIST.
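
The difference between counting distinct diagnoses and counting every entry can be shown with a made-up case. The codes below are hypothetical and were chosen only so that 11 entries contain 7 distinct values, matching the 7 versus 11 discrepancy discussed above.

```python
from collections import Counter

# Hypothetical case: 11 diagnosis entries, some codes entered more than once.
# Code 1 would stand for a blank entry and is excluded from the counts.
case_codes = [2, 2, 3, 3, 4, 5, 5, 6, 7, 8, 8]

counts = Counter(code for code in case_codes if code != 1)

distinct_diagnoses = len(counts)        # each type counted once (0/1 dummies)
total_diagnoses = sum(counts.values())  # every entry counted (incremented dummies)

print(distinct_diagnoses, total_diagnoses)
```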
---
"All that needs doing now is to recode the more frequent combinations into something manageable for reporting tables."
These fall out of the AGGREGATE almost without any effort at all.

"Thanks for your time: I've been using SPSS syntax since 1972, but I've
learned quite a lot from you and from others on the list about AUTORECODE,
AGGREGATE and even VECTOR. Even this 70-year-old dog can learn a few new
tricks."
As soon as I can afford (when hell freezes over?) to get a copy of SPSS, this "old dog" will be learning some Python, OMS, GPL.
Maybe at that point some cognoscenti will throw a pebble at my head before I run off a cliff ;-)
Meanwhile my crusty old circa 2003 11.5 does everything I need it to do. People seem to prematurely Pythonize everything and overlook the basic QAD raw power and simplicity of MACROs (So they are completely retarded about simple math ;-). I can see the very powerful use of Python if you need access to the data dictionary/data cursor etc...OTOH, I solved that problem in my client applications using the SPSSIO.dll about hmmm 15 years ago. So I'm LMAO that SPSS/IBM inc has finally caught up to implementations I figured out using VB6 AGES ago. Happy they finally did so, but what took so long?
Follow the money?...
--

John F Hall wrote

David

I've spent quite some time trying to reconcile the differences. The 22 (or
21) dummy binary and intermediate **2 variables created are the same for
both methods, but the total diagnoses differ. I think this may be because
the same code can appear more than once in the 11 source variables, which
explains why you get 7 but I get 11 for the one case with 11 diagnoses.
Judy needs to check the original strings to make sure that a single letter
difference on entry has not created a separate value,
<SNIP...>

parisec

Re: mixed models - time as DV

In reply to this post by Maguin, Eugene

Hi all,

After a couple of months of contemplation, I have come up with a tentative model for the data I pasted below a couple of months ago.

>My goal is to determine if age, year and task have an effect on task time.
>Another variable i have in the data is number of times task is performed. i was originally thinking i needed to do repeated measures anova. but what these data don't show is that there are 28 tasks and 36 years and different people performed the tasks in different years. there are also several levels of correlation in the data: 1) the same person performed the same task in different years; 2) The same year had several different performers. This is why i turned to mixed models.

>The hypothesis is that age and number of times someone completes any task has an effect on task time. my descriptive analysis suggests that everyone improves their time on any task after doing any task twice. but the amount of improvement depends on age. people on the lowest quintiles of age improve a lot more than people in higher quintiles of age. i also know that there is a task effect. Some of the tasks have mean finish times that are longer than others just because the tasks are harder.

>ID Year TaskName Timehrs Age
>1 1986 ONE 2:15 40
>1 1990 ONE 3:00 44
>1 2000 ONE 1:59 45
>1 2005 TWO 4:05 50
>1 2007 TWO 3:58 52
>1 2008 TWO 3:42 53
>2 2001 TWO 3:00 30
>2 2002 THREE 1:35 31
>2 2003 THREE 1:55 33
>2 2005 TWO 2:25 35
>2 2006 THREE 2:40 36

Gene, I decided that this is a 2-level model:

Level 1: id, age and task number
Level 2: Year and task

Correlations I need to take into account:

1. The time that IDx has in YEARy is correlated with the time that IDx has in YEARz, i.e. someone's time on a task in 1986 will be correlated with their time in 1990.
2. The time that IDx has on TASKy is correlated with the time that IDx has on TASKz, i.e. someone's time for TASK 1 will be correlated with their time for TASK 3.

Other assumptions:

1. Task is a random variable. I view task as a random sample of all tasks of this type.
2. Year is a random variable. The year of the task is a random sample of any year the task was given.

I considered thinking of both year and taskname as /REPEATED based on a post by Ryan. But given the number of years and tasks included, it didn't make sense and with further thought, they are really like "school" in the example in Norusis.

Findings from descriptive analyses:

There is an interaction of age*number of times a task was completed: Number of times a task is completed improves a person's time but time improvement depends on age. Older people improve after doing a task 2 times but then the improvement stops. Younger people improve after up to 4 tries.

This is where i would appreciate feedback. Here is the code I am playing around with to model this.

sort cases by age5.
split file by age5.
MIXED timehrs BY TrialNum TaskName ID year
  /FIXED = TrialNum | SSTYPE(3)
  /METHOD = REML
  /PRINT = SOLUTION TESTCOV
  /RANDOM = Intercept | SUBJECT(id)
  /RANDOM = Intercept | SUBJECT(taskname)
  /RANDOM = Intercept | SUBJECT(year) COVTYPE(VC).
split file off.

The rationale for running the analysis separate for age categories is to handle the trialnum x age interaction. The 5 age categories made sense for this population. I find that for models with interaction terms, it is easier for interpretation to stratify by one of the variables in the interaction.

I'm not sure that this RANDOM statement will work but the goal of including it in this post is to show that putting these variables in the Subject list seems to be the only way i can get the model to run without hanging. I also read that "Variables in the Subject list identify clusters of cases that are independent" and each of the variables listed fit this.

I haven't nailed down the covariance structure yet. I need to look at the resulting covariance matrices and residuals. Suggestions on this are welcome.

I know this isn't the most sophisticated method that could be used to model these data but given its complexity, I think this accounts for the variance that needs to be included in a manner that will be interpretable. I was getting analysis paralysis and needed to actually model the data.

I may regret asking since ignorance is bliss ...Am I missing something huge here? Would this get completely reamed by a reviewer?

Many thanks
Carol

----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Tuesday, June 28, 2011 7:55 AM
To: [hidden email]
Subject: Re: mixed models - time as DV [SEC: UNOFFICIAL]

Carol,

I think a better initial model for your data is school kids who get two different tests, english and music, at multiple times in a year. Why? Your folks are tested on multiple tasks and are tested on a given task multiple times. Using the schoolkids example, I think you'd have a three level model.

Level 1: y(ijk)=B1-0(ij) + B1-1(jk)*Time + e1(ijk).
Level 2a: B1-0(ij)=B2-0(i) + B2-1*TestType + e2(ij).
Level 2b: I'm going to assume that slope term, B1-1, is constant across kids and test type.
Level 3a: B2-0(i)=B3-0 + B3-1*Gender + e3(i).
Level 3b: I'm going to assume that test type term, B2-1, is constant across kids.

My coefficient nomenclature is very odd but is B(level)-(term number: 0 = intercept, 1 = slope).

But here's what makes your data more difficult. Again in terms of schoolkids. Think about a whole school system, K-12, but really small, one class per grade. Lots of different subjects. Kids come and go. Others are there for the whole 12 years. Some are tested one or two times in one subject or in several subjects. Others are tested repeatedly.

You have some selection criteria for number of tasks per person but you have 28 tasks, I think. I don't have any experience with anything this complex.

Mixed assumes a normal distribution. It kind of sounds like you've got your data converted to decimal minutes. How normal do they look?

>>When you suggest an offset age as a polynomial are you thinking:

1) compute agediff = age at each task - age at first task
2) compute agediff squared
3) enter age at first task
4) enter agediff squared

So your level 1 equation is
Score(ijk) = B1-0 + B1-1*Time_offset + e1(ijk).

Time_offset is the test time points relative to first task at time=0. These are the time offset for task_type = i. Each task_type has its own time offsets.

Gene Maguin

-----Original Message-----
From: Parise, Carol A. [mailto:[hidden email]]
Sent: Monday, June 27, 2011 6:55 PM
To: 'Gene Maguin'; [hidden email]
Subject: RE: mixed models - time as DV [SEC: UNOFFICIAL]

Gene,

I'm not quite sure whether I have a two level or three level structure. In the examples I am reading, it seems obvious. For example, groups of students have the same teacher within a school. My thought is that I've got three tiers: task, year, person.

I have 28 tasks. I can think of this as task being the top tier since some tasks are known to be more difficult than others. Where things get muddled is that year is inherently tied to the task because weather plays a huge role within a task.

When you suggest an offset age as a polynomial are you thinking:

1) compute agediff = age at each task - age at first task
2) compute agediff squared
3) enter age at first task
4) enter agediff squared

Thanks. Everyone's comments have been very helpful.

Carol

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Wednesday, June 22, 2011 1:28 PM
To: [hidden email]
Subject: Re: mixed models - time as DV [SEC: UNOFFICIAL]

Carol,

I was thinking that if you computed a new age variable that was the offset from the first age at any task you could treat age at first task as a level two (person) variable and the age offset as a level 1 variable. Offset age could be modeled as a polynomial.

Year at first test would be a second person variable. Categorizing age makes results display easier, for sure. The empirical question is whether categorizing is more informative in a variance modeled sense.

Are you thinking of this dataset as having a two level or a three level structure?

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Parise, Carol A.
Sent: Wednesday, June 22, 2011 3:57 PM
To: [hidden email]
Subject: Re: mixed models - time as DV [SEC: UNOFFICIAL]

My thought was that person, year, and task would be considered random variables and quintile of age and 'task time' i.e. 1st, 2nd, 3rd, etc would be fixed.

I know someone is going to say "you shouldn't group data if you don't have to". I have read these references.

My descriptive data show that there is an improvement between task 1 and task 2 (regardless of the task) and that improvement diminishes after around 3 tries. However, younger age leads to higher improvement than older age. So I need to include an interaction.

Stratifying into quintiles makes this easy to see graphically but i'm not sure if this is the best way to handle this in the model. I've been reading a bit about entering splines into these models but i only know enough to be dangerous at this point.

________________________________________
From: Gosse, Michelle [[hidden email]]
Sent: Wednesday, June 22, 2011 12:44 PM
To: Parise, Carol A.; [hidden email]
Subject: RE: mixed models - time as DV [SEC: UNOFFICIAL]

Hi Carol,

How are you handling the "year" variable? I apologise if I have missed this information from previous postings.

Cheers
Michelle

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Parise, Carol A.
Sent: Thursday, 23 June 2011 7:41 AM
To: [hidden email]
Subject: Re: mixed models - time as DV

great! hard to believe but these are actual times. it's kind of a crazy 'task'.

________________________________
From: Art Kendall [[hidden email]]
Sent: Wednesday, June 22, 2011 12:39 PM
To: Parise, Carol A.; SPSSX-L post
Subject: Re: [SPSSX-L] mixed models - time as DV

This worked okay on my machine.
I guess then that these are durations (intervals) rather than specific times?

data list list/thetime(time12.2).
begin data
31:22:33
24:00:00
23:59:55
42:10:10
end data.
compute secs = thetime.
formats secs (f16).
compute back = secs.
formats back (time12.2).
execute.
list.
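
For anyone checking the arithmetic outside SPSS, the same round trip can be sketched in Python. This is only an illustration of what the syntax above does with the four sample values, assuming the time value is simply a count of seconds. The function names are made up for the example.

```python
def duration_to_seconds(text):
    """Convert 'h:mm:ss' to seconds; the hours part may exceed 23."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def seconds_to_duration(total):
    """Format a number of seconds as 'h:mm:ss' without wrapping at 24 hours."""
    total = int(round(total))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


values = ["31:22:33", "24:00:00", "23:59:55", "42:10:10"]
secs = [duration_to_seconds(v) for v in values]   # like: compute secs = thetime.
back = [seconds_to_duration(s) for s in secs]     # like: formats back (time12.2).

for v, s, b in zip(values, secs, back):
    print(v, s, b)
```

Nothing special happens at the 24:00:00 boundary in this arithmetic: a duration over a day is just a larger number of seconds.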

Art

On 6/22/2011 3:29 PM, Parise, Carol A. wrote:

Thanks Art. I don't have the data in front of me today but i will check this out.

Do you know if there are differences when time is over 24:00:00 versus under 24:00:00? when working with hh:mm:ss, in the past, i recall having some issues with this. some of these "tasks" took more than a day.

________________________________
From: Art Kendall [[hidden email]]
Sent: Wednesday, June 22, 2011 5:39 AM
To: Parise, Carol A.
Cc: [hidden email]
Subject: Re: [SPSSX-L] mixed models - time as DV

What format did you use to put the DV in?
try this syntax. It appears that time12.2 format works on my machine.
data list list/thetime(time12.2).
begin data
1:22:33
24:00:00
23:59:55
12:10:10
end data.
compute secs = thetime.
formats secs (f16).
compute back = secs.
formats back (time12.2).
execute.
list.

I then saved that file and opened a new data file.
I copied secs into the new data file.
I changed the type to date and used the drop down list to get a time format.

Also, the only time I would expect the intercept to be exactly equal to the grand mean would be when all predictors were zero like with z-scores.

Art Kendall
Social Research Consultants

On 6/21/2011 6:37 PM, Parise, Carol A. wrote:
Art,

This appears to work if I use the format: dd-mm-yyyy hh:mm:ss. Any other format and the numbers are not logical. But, the date that shows up is kind of crazy: 16-MAR-1590. I'm a bit concerned that this may not be accurate. Although the time is in the ballpark, I would think that the overall mean i get using descriptives should be just about the same as what i get from this model, and it's not. It's off by a good amount.

Thank you!
Carol

________________________________
From: Art Kendall [mailto:[hidden email]]
Sent: Tuesday, June 21, 2011 2:58 PM
To: Parise, Carol A.
Cc: [hidden email]
Subject: Re: [SPSSX-L] mixed models - time as DV

try this as a workaround.
Time is in seconds since the start of the Gregorian calendar.
While you have the .spv file open, open a new data file <file><new><data>.
Switch to the output file. Highlight and copy the intercept.
Switch to the data file. Paste the intercept.
Click the variables view tab.
change the format to time.
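
To see why the pasted number looks different under a date format than under a time format, here is a small Python sketch. It relies on the fact that SPSS date-time values count seconds from midnight on 14 October 1582. The intercept value of 112953 seconds is made up for the illustration.

```python
from datetime import datetime, timedelta

# SPSS date-time values count seconds from this instant.
SPSS_EPOCH = datetime(1582, 10, 14)


def as_spss_datetime(seconds):
    """Read a count of seconds the way a date-time format would."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


def as_hours_minutes_seconds(seconds):
    """Read the same count as an elapsed time (hours, minutes, seconds)."""
    hours, rem = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rem, 60)
    return hours, minutes, secs


intercept = 112953  # hypothetical intercept in seconds, not from the real output
print(as_spss_datetime(intercept))          # a date in October 1582
print(as_hours_minutes_seconds(intercept))  # elapsed hours, minutes, seconds
```

A duration shown with a date format therefore lands in the late 1500s, which is expected and does not mean the time part is wrong.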

Art Kendall
Social Research Consultants

On 6/21/2011 5:40 PM, Parise, Carol A. wrote:

Hi all,

This is one of probably many questions i will likely be posting on using linear mixed models over the next few months. It's my first crack at using this and i'm slowly working through the lingo by reading as much as I can. I read the SPSS technical report on this and i found an example from a nice little article that i am using to mess with my data.
http://www.indiana.edu/~statmath/stat/all/hlm/hlm.pdf

The article explains the interpretation of the intercept term in the empty model is equivalent to the overall math achievement score.

My data have time in hh:mm:ss as the DV.

In my sample analysis, the scale of the intercept is not in hh:mm:ss and i can't seem to adjust it to be in this format with the "cell properties" when i click on the output.

It would be really helpful to have the correct scale versus just the p-value for interpretation purposes. Anyone have thoughts on how I can do this?

I am running version 14.0 and we won't be upgrading anytime soon.

Thanks much.
Carol

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
