|
Hi, all.
I have a file in wide format with the following variables. ID Date1 VarA1 VarB1 VarC1 VarD1...Date2 VarA2 VarB2 VarC2 VarD2... The numerals identify the time of the assessment. The letters identify the repeating variables for each time of assessment. I have a macro that will identify the first and last assessment dates and variables. However, for most of my work I'm looking for the dates and variables which occurred within the time a patient was in a particular program. This last task is easy given that I can match the id's to a separate file with dates of program enrollment. However, in order to do that I think that I need to move this into long format with the following look. ID Date1 VarA1 VarB1 VarC1 VarD1 ID Date2 VarA2 VarB2 VarC2 VarD2 Once in long format, I know how to vector the first and last occurrences back into a wide format for analysis. My challenge is how to get the data into long format given that the dates and variables are all labeled differently and there are frequently 50 or more administrations of the assessment. Thanks. Brian ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
So, since the current data set is wide format, every record has the same number of variables but some records might have valid data for only the DATE1 variable set while others have valid data for all 50 variable sets and everything in between.
One way is to do a massive reordering of the current dataset to this ordering Date1 date2 ... date50, vara1 ... vara50, varb1 ... varb50, etc. Then you can do a (massive) casestovars. Lots of "make" statements on this one. There's no limit on the number of make statements. Another way is to do a (massive) varstocases to create a three variable dataset (personid, indexvar, datavar). Perhaps some people missed some time points and then came back in or missed some variables within a timepoint; you need to keep all variables in the dataset so the NULL subcommand has to be Keep! Probably the variableset date variables have a format different from the data variables. I think you should change format for the date variables to match that of the data variables (vara1, etc) because there will be a format mismatch between the variables being restructured. (At this point you'll the matching file operation and select the variables to be retained.) Then construct a variableset id variable and do a little sorting magic so that the date variable and the associated data variables for that time point all have the same variable set id. Then do a casestovars using personid and variableset id as the id variables. Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Brian Dates Sent: Monday, May 15, 2017 4:02 PM To: [hidden email] Subject: Reshaping from Wide to Long with Differing Dates and Variable Labels Hi, all. I have a file in wide format with the following variables. ID Date1 VarA1 VarB1 VarC1 VarD1...Date2 VarA2 VarB2 VarC2 VarD2... The numerals identify the time of the assessment. The letters identify the repeating variables for each time of assessment. I have a macro that will identify the first and last assessment dates and variables. However, for most of my work I'm looking for the dates and variables which occurred within the time a patient was in a particular program. This last task is easy given that I can match the id's to a separate file with dates of program enrollment. However, in order to do that I think that I need to move this into long format with the following look. ID Date1 VarA1 VarB1 VarC1 VarD1 ID Date2 VarA2 VarB2 VarC2 VarD2 Once in long format, I know how to vector the first and last occurrences back into a wide format for analysis. My challenge is how to get the data into long format given that the dates and variables are all labeled differently and there are frequently 50 or more administrations of the assessment. Thanks. Brian ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Administrator
|
I can't test this because my SPSS install has tanked ;-(
Do a massive VARSTOCASES COMPUTE @=1. VARSTOCASES /MAKE var FROM date1 TO @/INDEX=strindex(var). The use CHAR.INDEX to extract the numeric part from strindex. Then use CASESTOVARS to build the desired chopped sectioned data.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
Gene/David,
First...David, I'm sorry about the SPSS Install. Good luck. Second, thanks for the input; it seems to be going in the same direction...a massive VARSTOCASES. I'll try it and then get back re: it's success. Brian ________________________________________ From: SPSSX(r) Discussion [[hidden email]] on behalf of David Marso [[hidden email]] Sent: Monday, May 15, 2017 6:28 PM To: [hidden email] Subject: Re: Reshaping from Wide to Long with Differing Dates and Variable Labels I can't test this because my SPSS install has tanked ;-( Do a massive VARSTOCASES COMPUTE @=1. VARSTOCASES /MAKE var FROM date1 TO @/INDEX=strindex(var). The use CHAR.INDEX to extract the numeric part from strindex. Then use CASESTOVARS to build the desired chopped sectioned data. Maguin, Eugene wrote > So, since the current data set is wide format, every record has the same > number of variables but some records might have valid data for only the > DATE1 variable set while others have valid data for all 50 variable sets > and everything in between. > > One way is to do a massive reordering of the current dataset to this > ordering > Date1 date2 ... date50, vara1 ... vara50, varb1 ... varb50, etc. > Then you can do a (massive) casestovars. Lots of "make" statements on this > one. There's no limit on the number of make statements. > > Another way is to do a (massive) varstocases to create a three variable > dataset (personid, indexvar, datavar). Perhaps some people missed some > time points and then came back in or missed some variables within a > timepoint; you need to keep all variables in the dataset so the NULL > subcommand has to be Keep! Probably the variableset date variables have a > format different from the data variables. I think you should change format > for the date variables to match that of the data variables (vara1, etc) > because there will be a format mismatch between the variables being > restructured. (At this point you'll the matching file operation and select > the variables to be retained.) Then construct a variableset id variable > and do a little sorting magic so that the date variable and the associated > data variables for that time point all have the same variable set id. > Then do a casestovars using personid and variableset id as the id > variables. > > Gene Maguin > > > > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto: > SPSSX-L@.UGA > ] On Behalf Of Brian Dates > Sent: Monday, May 15, 2017 4:02 PM > To: > SPSSX-L@.UGA > Subject: Reshaping from Wide to Long with Differing Dates and Variable > Labels > > Hi, all. > > I have a file in wide format with the following variables. > > ID Date1 VarA1 VarB1 VarC1 VarD1...Date2 VarA2 VarB2 VarC2 VarD2... > > The numerals identify the time of the assessment. The letters identify the > repeating variables for each time of assessment. I have a macro that will > identify the first and last assessment dates and variables. However, for > most of my work I'm looking for the dates and variables which occurred > within the time a patient was in a particular program. This last task is > easy given that I can match the id's to a separate file with dates of > program enrollment. However, in order to do that I think that I need to > move this into long format with the following look. > > ID Date1 VarA1 VarB1 VarC1 VarD1 > ID Date2 VarA2 VarB2 VarC2 VarD2 > > Once in long format, I know how to vector the first and last occurrences > back into a wide format for analysis. My challenge is how to get the data > into long format given that the dates and variables are all labeled > differently and there are frequently 50 or more administrations of the > assessment. Thanks. > > Brian > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the command. To leave the > list, send the command SIGNOFF SPSSX-L For a list of commands to manage > subscriptions, send the command INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ----- Please reply to the list and not to my personal email. Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Reshaping-from-Wide-to-Long-with-Differing-Dates-and-Variable-Labels-tp5734156p5734158.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by bdates
The problem seems to be the long set of MAKE specs in CasesToVars. I would avoid them by using the technique that I used before CasesToVars existed,
setting up a loop and assigning variables to be written with XSAVE.
For these data, you have something like VECTOR items= Date1 to VarD55 (say) which will be 55*5 in length.
The you do a loop # = 1 to 55, and Compute Row= 5*# -4. COMPUTE Date= items(Row). COMPUTE VarA= iems(Row+1). COMPUTE VarB= items(Row+2). COMPUTE VarC= items(Row+3). COMPUTE VarD= items(Row+4).
XSAVE outfile= .... /vars= Id Row Date VarA VarB VarC VarD . end the loop and Execute.
-- Rich Ulrich From: SPSSX(r) Discussion <[hidden email]> on behalf of Brian Dates <[hidden email]>
Sent: Monday, May 15, 2017 4:02:27 PM To: [hidden email] Subject: Reshaping from Wide to Long with Differing Dates and Variable Labels Hi, all.
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
I have a file in wide format with the following variables. ID Date1 VarA1 VarB1 VarC1 VarD1...Date2 VarA2 VarB2 VarC2 VarD2... The numerals identify the time of the assessment. The letters identify the repeating variables for each time of assessment. I have a macro that will identify the first and last assessment dates and variables. However, for most of my work I'm looking for the dates and variables which occurred within the time a patient was in a particular program. This last task is easy given that I can match the id's to a separate file with dates of program enrollment. However, in order to do that I think that I need to move this into long format with the following look. ID Date1 VarA1 VarB1 VarC1 VarD1 ID Date2 VarA2 VarB2 VarC2 VarD2 Once in long format, I know how to vector the first and last occurrences back into a wide format for analysis. My challenge is how to get the data into long format given that the dates and variables are all labeled differently and there are frequently 50 or more administrations of the assessment. Thanks. Brian ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Administrator
|
You can use a single MAKE subcommand with an index and parse the index variable as I previously suggested.
If you take Rich's path note that LOOP has a BY clause which is useful. LOOP #=1 TO 55 BY 5. etc... Another approach would be to use RESHAPE in MATRIX language. Untested due to FUBARed SPSS . MATRIX. GET data / FILE */VARIABLES date1 TO varD50. GET id /FILE * /VARIABLE ID. COMPUTE n=NROW(data). COMPUTE p=NCOL(data)/5. SAVE ({KRONECKER(ID,MAKE(p,1,1),RESHAPE(data,n*p,5) } ) /OUTFILE * /VARIABLES date varA varB varC varD. END MATRIX.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
Administrator
|
Test solution for El Grande VarsToCases.
I opted for AGGREGATE rather than CASESTOVARS as I could not get the names of Root to appear as new variable names. This method also avoids any SORT which would typically be required. /* Simulate Brian's data */. DEFINE !renameAll (n !TOKENS(1) / list !CMDEND). !LET !vlist=!Null !DO !I=1 !TO !n !DO !Var !IN ( !List) !LET !vlist=!CONCAT(!vlist,!Var,!I," ") !DOEND !DOEND RENAME VARIABLES (X2 TO X251 = !vlist ). !ENDDEFINE . MATRIX. SAVE MAKE(10000,251,1)/OUTFILE * / VARIABLES X1 TO X251 . END MATRIX. DATASET NAME raw. RENAME VARIABLES x1=ID. SET PRINTBACK ON MPRINT ON. !renameAll n=50 list=date VarA VarB VarC VarD. /* Simulation complete */. /* Actual code */. COMPUTE ID=$CASENUM. COMPUTE @=1. VARSTOCASES / MAKE values FROM date1 TO @ /INDEX =vnames(values). SELECT IF vnames NE '@'. COMPUTE firstNumber=CHAR.INDEX(vnames,'0123456789',1). STRING Root (A8). COMPUTE NewIndex=NUMBER(CHAR.SUBSTR(vnames,firstNumber),F2). COMPUTE Root=CHAR.SUBSTR(vnames,1,firstNumber-1). DO REPEAT v=date VarA VarB VarC VarD /s="date" "VarA" "VarB" "VarC" "VarD". IF (Root EQ s) v= values. END REPEAT. AGGREGATE OUTFILE * / BREAK ID NewIndex / date VarA VarB VarC VarD=MAX(date VarA VarB VarC VarD).
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
Administrator
|
A tested solution using MATRIX.
SET MXLOOPS=50. MATRIX. GET data / FILE */VARIABLES date1 TO varD50. GET id /FILE * /VARIABLE ID. COMPUTE IndX=MAKE(NCOL(data)/5,1,1). LOOP #=1 TO NCOL(data)/5. COMPUTE IndX(#)=#. END LOOP. SAVE ({KRONEKER(ID,MAKE(NCOL(data)/5,1,1) ), KRONEKER(MAKE(NROW(data),1,1), IndX),RESHAPE(data,NROW(data)*NCOL(data)/5,5) }) /OUTFILE =*/VARIABLES =ID index date varA varB varC varD. END MATRIX.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
Administrator
|
In reply to this post by David Marso
Nice one, David. But try it with actual date variables. When I tried that (for Date1, Date2, and Date3 only), the date values were expressed as # of seconds since October 14, 1582, and I had to use ALTER TYPE to get a real date variable. I expect this will also be an issue for the MATRIX solution you posted.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
Administrator
|
Just after the MATRIX code use FORMATS date (ADATE). Or whatever date format rocks your world. On Tue, May 16, 2017 at 2:05 PM, Bruce Weaver [via SPSSX Discussion] <[hidden email]> wrote: Nice one, David. But try it with actual date variables. When I tried that (for Date1, Date2, and Date3 only), the date values were expressed as # of seconds since October 14, 1582, and I had to use ALTER TYPE to get a real date variable. I expect this will also be an issue for the MATRIX solution you posted.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
Already solved that one. I’m working on the CASESTOVARS syntax. Thanks everyone. B Brian Dates, M.A. From: SPSSX(r) Discussion [mailto:[hidden email]]
On Behalf Of David Marso Just after the MATRIX code use FORMATS date (ADATE). Or whatever date format rocks your world. On Tue, May 16, 2017 at 2:05 PM, Bruce Weaver [via SPSSX Discussion] <[hidden email]> wrote: Nice one, David. But try it with actual date variables. When I tried that (for Date1, Date2, and Date3 only), the date values were expressed as # of seconds since October 14, 1582, and I had to use ALTER TYPE
to get a real date variable. I expect this will also be an issue for the MATRIX solution you posted.
David Marso wrote Test solution for El Grande VarsToCases.
David Marso wrote You can use a single MAKE subcommand with an index and parse the index variable as I previously suggested.
Rich Ulrich wrote The problem seems to be the long set of MAKE specs in CasesToVars.
-- If you reply to this email, your message will be added to the discussion below: To unsubscribe from Reshaping from Wide to Long with Differing Dates and Variable Labels,
click here. Please reply to the list and not to my personal email.
View this message in context:
Re: Reshaping from Wide to Long with Differing Dates and Variable Labels |
|
Administrator
|
Brian, I got very close earlier but all my values were a constant. This is cleaner than having to know all of the variable names ;-) The Key here is /AUTOFIX=NO. COMPUTE @=1. VARSTOCASES / MAKE values FROM date1 TO @ /INDEX =vnames(values). SELECT IF vnames NE '@'. COMPUTE #firstNumber=CHAR.INDEX(vnames,'0123456789',1). STRING Root (A8). COMPUTE NewIndex=NUMBER(CHAR.SUBSTR(vnames,#firstNumber),F2). COMPUTE Root=CHAR.SUBSTR(vnames,1,#firstNumber-1). EXECUTE. DELETE VARIABLES vnames . CASESTOVARS /ID=ID NewIndex /INDEX=Root /AUTOFIX=NO. On Tue, May 16, 2017 at 2:46 PM, bdates [via SPSSX Discussion] <[hidden email]> wrote:
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
David,
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Thank-you very much. I'm separated from my SPSS right now, so I'll try this in the morning. Thanks to Bruce, Gene, and Rich as well. I really appreciate this.
B
From: SPSSX(r) Discussion [[hidden email]] on behalf of David Marso [[hidden email]]
Sent: Tuesday, May 16, 2017 4:29 PM To: [hidden email] Subject: Re: Reshaping from Wide to Long with Differing Dates and Variable Labels Brian,
I got very close earlier but all my values were a constant.
This is cleaner than having to know all of the variable names ;-)
The Key here is /AUTOFIX=NO.
COMPUTE @=1.
VARSTOCASES / MAKE values FROM date1 TO @ /INDEX =vnames(values). SELECT IF vnames NE '@'. COMPUTE #firstNumber=CHAR.INDEX(vnames,'0123456789',1). STRING Root (A8). COMPUTE NewIndex=NUMBER(CHAR.SUBSTR(vnames,#firstNumber),F2). COMPUTE Root=CHAR.SUBSTR(vnames,1,#firstNumber-1). EXECUTE. DELETE VARIABLES vnames . CASESTOVARS /ID=ID NewIndex /INDEX=Root /AUTOFIX=NO. On Tue, May 16, 2017 at 2:46 PM, bdates [via SPSSX Discussion]
<[hidden email]> wrote:
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" View this message in context: Re: Reshaping from Wide to Long with Differing Dates and Variable Labels Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
