|
I am sure I have seen this solution somewhere before, but I just can't find it right now.

My dataset has the following structure:

INTERVIEW_ID, QUESTION, ANSWER
1, Q1, A1
1, Q2, A1
1, Q3, A4
1, Q3, A5
1, Q4, A1
1, Q5, A3
2, Q1, A1
2, Q2, A2
2, Q3, A1
2, Q5, A1
etc....

and I would like to restructure the data to one line for each interview. The specific dataset would have to become:

INTERVIEW_ID, Q1, Q2, Q3.1, Q3.2, Q4, Q5
1, A1, A1, A4, A5, A1, A3
2, A1, A2, A1,,, A1

I am dealing with two particular aspects:
[1] I have some multiple response questions
[2] I have some missing values
and I would like to get a rectangular structure, cases by variables.

Can someone suggest solutions or previous postings that can help me solve this issue?

Thanks,
Luca

Luca Meyer
www.lucameyer.com
PASW Statistics v. 18.0.1 (13-nov-2009)
R version 2.9.2 (2009-08-24)
Mac OS X 10.6.3 (10D573) - kernel Darwin 10.3.0

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
See FILE TYPE MIXED under <help>.
INTERVIEW_ID would identify the case and QUESTION would identify the record within the case.

Art Kendall
Social Research Consultants

On 4/22/2010 9:07 AM, Luca Meyer wrote:
> [...] I would like to restructure the data to one line for each interview. [...]
|
Thanks Art,

Since I have already imported the data into SAV format, wouldn't it be possible to use CASESTOVARS to restructure the file?

Luca

On 22 Apr 2010, at 17:54, Art Kendall wrote:

> see FILE TYPE MIXED under <help>
> INTERVIEW_ID would identify the case and QUESTIONS would identify the record within the case.
|
If you post your attempt(s) at using CASESTOVARS, perhaps someone will spot the problem.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
|
At 09:07 AM 4/22/2010, Luca Meyer wrote:

> My dataset had the following structure: [...]

INTERVIEW_ID Q1 Q2 Q3.1 Q3.2 Q4.1 Q4.2 Q4.3 Q5

      1      A1 A1 A4   A5   A1             A3
      2      A1 A2 A1                       A1
      3      B1 B2 B3        B4a  B4b  B4c  B5

> I am dealing with two particular aspects: [...]

Your problem is that you don't know, at the start, which questions have multiple responses, so CASESTOVARS doesn't have enough to work with.

The trick is a transformation program that uses AGGREGATE to identify which questions have multiple responses, and then creates a variable "Response" that identifies responses rather than questions: where a question has multiple responses, "Response" has a different value for each response.

*  Using AGGREGATE, identify which questions have multiple responses .

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=INTERVIEW_ID QUESTION
   /NResp '# of responses to this question, this interview' = NU.

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=QUESTION
   /MaxResp 'Max # of responses to this question, any interview'
            = MAX(NResp).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |23-APR-2010 19:12:36       |
|-----------------------------|---------------------------|
[LongForm]

INTERVIEW_ID QUESTION ANSWER NResp MaxResp

      1      Q1       A1         1       1
      1      Q2       A1         1       1
      1      Q3       A4         2       2
      1      Q3       A5         2       2
      1      Q4       A1         1       3
      1      Q5       A3         1       1
      2      Q1       A1         1       1
      2      Q2       A2         1       1
      2      Q3       A1         1       2
      2      Q5       A1         1       1
      3      Q1       B1         1       1
      3      Q2       B2         1       1
      3      Q3       B3         1       2
      3      Q4       B4a        3       3
      3      Q4       B4b        3       3
      3      Q4       B4c        3       3
      3      Q5       B5         1       1

Number of cases read:  17    Number of cases listed:  17

*  Count responses within questions, and create a variable    .
*  identifying responses, rather than questions.              .

NUMERIC Resp# (F3).

DO IF    $CASENUM EQ 1
      OR INTERVIEW_ID NE LAG(INTERVIEW_ID)
      OR QUESTION     NE LAG(QUESTION).
.  COMPUTE Resp# = 1.
ELSE.
.  COMPUTE Resp# = LAG(Resp#) + 1.
END IF.

STRING Response (A5).

DO IF MaxResp EQ 1.
.  COMPUTE Response = QUESTION.
ELSE.
.  COMPUTE Response = CONCAT(RTRIM(QUESTION)
                            ,'.'
                            ,LTRIM(STRING(Resp#,F3))).
END IF.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |23-APR-2010 19:12:36       |
|-----------------------------|---------------------------|
[LongForm]

INTERVIEW_ID QUESTION ANSWER NResp MaxResp Resp# Response

      1      Q1       A1         1       1     1 Q1
      1      Q2       A1         1       1     1 Q2
      1      Q3       A4         2       2     1 Q3.1
      1      Q3       A5         2       2     2 Q3.2
      1      Q4       A1         1       3     1 Q4.1
      1      Q5       A3         1       1     1 Q5
      2      Q1       A1         1       1     1 Q1
      2      Q2       A2         1       1     1 Q2
      2      Q3       A1         1       2     1 Q3.1
      2      Q5       A1         1       1     1 Q5
      3      Q1       B1         1       1     1 Q1
      3      Q2       B2         1       1     1 Q2
      3      Q3       B3         1       2     1 Q3.1
      3      Q4       B4a        3       3     1 Q4.1
      3      Q4       B4b        3       3     2 Q4.2
      3      Q4       B4c        3       3     3 Q4.3
      3      Q5       B5         1       1     1 Q5

Number of cases read:  17    Number of cases listed:  17

*  Now the CASESTOVARS is straightforward. It can't be clicked up .
*  like this, though; the "/DROP" clause has to be added by hand  .

CASESTOVARS
   /ID      = INTERVIEW_ID
   /DROP    = QUESTION NResp MaxResp Resp#
   /INDEX   = Response
   /GROUPBY = VARIABLE .

Cases to Variables
|-----------------------------|---------------------------|
|Output Created               |23-APR-2010 19:12:37       |
|-----------------------------|---------------------------|
[LongForm]

Generated Variables
|--------|--------|------|
|Original|Response|Result|
|Variable|        |------|
|        |        |Name  |
|--------|--------|------|
|ANSWER  |Q1      |Q1    |
|        |Q2      |Q2    |
|        |Q3.1    |Q3.1  |
|        |Q3.2    |Q3.2  |
|        |Q4.1    |Q4.1  |
|        |Q4.2    |Q4.2  |
|        |Q4.3    |Q4.3  |
|        |Q5      |Q5    |
|--------|--------|------|

Processing Statistics
|---------------|---|
|Cases In       |17 |
|Cases Out      |3  |
|---------------|---|
|Cases In/Cases |5.7|
|Out            |   |
|---------------|---|
|Variables In   |7  |
|Variables Out  |9  |
|---------------|---|
|Index Values   |8  |
|---------------|---|

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |23-APR-2010 19:12:37       |
|-----------------------------|---------------------------|
[LongForm]

INTERVIEW_ID Q1 Q2 Q3.1 Q3.2 Q4.1 Q4.2 Q4.3 Q5

      1      A1 A1 A4   A5   A1             A3
      2      A1 A2 A1                       A1
      3      B1 B2 B3        B4a  B4b  B4c  B5

Number of cases read:  3    Number of cases listed:  3

=============================
APPENDIX: Test data, and code
=============================
(INTERVIEW_ID 3 added to originally posted test data)

DATA LIST LIST (", ")/
    INTERVIEW_ID, QUESTION, ANSWER
   (F2,           A2,       A3).
BEGIN DATA
1, Q1, A1
1, Q2, A1
1, Q3, A4
1, Q3, A5
1, Q4, A1
1, Q5, A3
2, Q1, A1
2, Q2, A2
2, Q3, A1
2, Q5, A1
3, Q1, B1
3, Q2, B2
3, Q3, B3
3, Q4, B4a
3, Q4, B4b
3, Q4, B4c
3, Q5, B5
END DATA.
DATASET NAME LongForm.
LIST.

*  Using AGGREGATE, identify which questions have multiple responses .

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=INTERVIEW_ID QUESTION
   /NResp '# of responses to this question, this interview' = NU.

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=QUESTION
   /MaxResp 'Max # of responses to this question, any interview'
            = MAX(NResp).

LIST.

*  Count responses within questions, and create a variable    .
*  identifying responses, rather than questions.              .

NUMERIC Resp# (F3).

DO IF    $CASENUM EQ 1
      OR INTERVIEW_ID NE LAG(INTERVIEW_ID)
      OR QUESTION     NE LAG(QUESTION).
.  COMPUTE Resp# = 1.
ELSE.
.  COMPUTE Resp# = LAG(Resp#) + 1.
END IF.

STRING Response (A5).

DO IF MaxResp EQ 1.
.  COMPUTE Response = QUESTION.
ELSE.
.  COMPUTE Response = CONCAT(RTRIM(QUESTION)
                            ,'.'
                            ,LTRIM(STRING(Resp#,F3))).
END IF.

LIST.

*  Now the CASESTOVARS is straightforward. It can't be clicked up .
*  like this, though; the "/DROP" clause has to be added by hand  .

CASESTOVARS
   /ID      = INTERVIEW_ID
   /DROP    = QUESTION NResp MaxResp Resp#
   /INDEX   = Response
   /GROUPBY = VARIABLE .

LIST.
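
One assumption in this solution is worth spelling out: the LAG() comparisons only number responses correctly when all rows for one interview, and within it all rows for one question, are adjacent. As far as I know, CASESTOVARS likewise expects the file to be sorted by the ID variable. The test data already arrive in that order. For a file that may not, a minimal sketch of a guard, to be run before the NUMERIC Resp# step, could look like the following. Seq is a hypothetical helper variable that is not part of the posted solution; it records the original row order so that multiple answers to the same question keep their sequence after sorting.

```
*  Sketch only: Seq is a hypothetical helper, not in the posted code .
*  Record the original row order, then make rows for the same         .
*  interview and question adjacent, as the LAG() logic requires.      .

COMPUTE Seq = $CASENUM.
SORT CASES BY INTERVIEW_ID QUESTION Seq.
```

If this guard is used, Seq would also have to be added to the /DROP list of the CASESTOVARS command, because it varies within an interview and would otherwise be spread across the output variables.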
|
Many thanks Richard,

I did indeed doubt that CASESTOVARS could recognize multiple responses without some kind of data map. The solution you propose works fine.

Luca

On 24 Apr 2010, at 01:19, Richard Ristow wrote:
