SPSSX Discussion

Casestovars simple question

Luca Meyer-3

Casestovars simple question

I am sure I have seen this solution somewhere before, but I just can't find it right now.

My dataset had the following structure:

INTERVIEW_ID, QUESTION, ANSWER
1, Q1, A1
1, Q2, A1
1, Q3, A4
1, Q3, A5
1, Q4, A1
1, Q5, A3
2, Q1, A1
2, Q2, A2
2, Q3, A1
2, Q5, A1
etc....

and I would like to restructure the data to one line for each interview. The specific dataset would have to become:

INTERVIEW_ID, Q1, Q2, Q3.1, Q3.2, Q4, Q5
1, A1, A1, A4, A5, A1, A3
2, A1, A2, A1,,, A1

I am dealing with two particular aspects:
[1] I have some multiple response questions
[2] I have some missing values
and I would like to get a rectangular structure cases by variables.

Can someone suggest a solution, or point me to a previous posting, that would help me solve this issue?

Thanks,
Luca

Luca Meyer
www.lucameyer.com
PASW Statistics v. 18.0.1 (13-nov-2009)
R version 2.9.2 (2009-08-24)
Mac OS X 10.6.3 (10D573) - kernel Darwin 10.3.0

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Re: Casestovars simple question

See FILE TYPE MIXED under <help>.
INTERVIEW_ID would identify the case and QUESTION would identify the
record within the case.

Art Kendall
Social Research Consultants


Luca Meyer-3

Re: Casestovars simple question

Thanks Art,

Since I have already imported it into SAV format, wouldn't it be possible to use CASESTOVARS to restructure the file?

Luca

On 22 Apr 2010, at 17:54, Art Kendall wrote:

> see FILE TYPE MIXED under <help>
> INTERVIEW_ID would identify the case and QUESTION would identify the record within the case.
>
> Art Kendall
> Social Research Consultants

Bruce Weaver

Re: Casestovars simple question

Administrator

Luca Meyer-3 wrote:

> Thanks Art,
>
> Since I have already imported it into SAV format, wouldn't it be possible to use CASESTOVARS to restructure the file?
>
> Luca

If you post your attempt(s) at using CASESTOVARS, perhaps someone will spot the problem.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Richard Ristow

Re: Casestovars simple question

In reply to this post by Luca Meyer-3

At 09:07 AM 4/22/2010, Luca Meyer wrote:

My dataset had the following structure:
|-----------------------------|---------------------------|
|Output Created               |23-APR-2010 18:31:41       |
|-----------------------------|---------------------------|
[LongForm]

INTERVIEW_ID QUESTION ANSWER

      1      Q1       A1
      1      Q2       A1
      1      Q3       A4
      1      Q3       A5
      1      Q4       A1
      1      Q5       A3
      2      Q1       A1
      2      Q2       A2
      2      Q3       A1
      2      Q5       A1
      3      Q1       B1
      3      Q2       B2
      3      Q3       B3
      3      Q4       B4a
      3      Q4       B4b
      3      Q4       B4c
      3      Q5       B5

Number of cases read:  17    Number of cases listed:  17

I would like to restructure the data to one line for each interview. The specific dataset would have to become:

INTERVIEW_ID Q1 Q2 Q3.1 Q3.2 Q4.1 Q4.2 Q4.3 Q5

      1      A1 A1 A4   A5   A1             A3
      2      A1 A2 A1                       A1
      3      B1 B2 B3        B4a  B4b  B4c  B5

I am dealing with two particular aspects:
[1] I have some multiple response questions
[2] I have some missing values
and I would like to get a rectangular structure cases by variables.

Your problem is that you don't know, at the start, which questions have multiple responses, so CASESTOVARS doesn't have enough to work on. The trick is a transformation program that uses AGGREGATE to identify which questions have multiple responses, and then creates a variable "Response" that identifies responses rather than questions: where a question has multiple responses, "Response" has a different value for each response.
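For readers who want to trace that logic outside SPSS, here is a minimal sketch in plain Python. It is an illustration only: it is not SPSS syntax and not part of the posted solution, and the function name `label_responses` and the tuple layout are assumptions made for the example. It counts responses per interview and question, takes the maximum per question, and appends a running number only for questions that have multiple responses somewhere in the file.

```python
from collections import Counter, defaultdict

def label_responses(rows):
    """rows: list of (interview_id, question, answer) tuples (hypothetical layout).

    Returns (interview_id, label, answer) tuples, where label is the question
    name, or 'question.n' if any interview has more than one answer to it.
    """
    # Responses per (interview, question): the counterpart of NResp.
    n_resp = Counter((iid, q) for iid, q, _ in rows)

    # Largest count per question over all interviews: the counterpart of MaxResp.
    max_resp = defaultdict(int)
    for (_, q), n in n_resp.items():
        max_resp[q] = max(max_resp[q], n)

    # Running response number within (interview, question): the counterpart of Resp#.
    seen = Counter()
    labelled = []
    for iid, q, answer in rows:
        seen[(iid, q)] += 1
        label = q if max_resp[q] == 1 else f"{q}.{seen[(iid, q)]}"
        labelled.append((iid, label, answer))
    return labelled

rows = [(1, "Q3", "A4"), (1, "Q3", "A5"), (2, "Q3", "A1"), (2, "Q5", "A1")]
for item in label_responses(rows):
    print(item)
```

Note that interview 2 gets the label Q3.1 even though it has only one answer to Q3, because another interview has two; that is what keeps the resulting column names consistent across interviews.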

* Using AGGREGATE, identify which questions have multiple responses .
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=INTERVIEW_ID QUESTION
   /NResp '# of responses to this question, this interview' = NU.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=QUESTION
   /MaxResp 'Max # of responses to this question, any interview'
   = MAX(NResp).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |23-APR-2010 19:12:36       |
|-----------------------------|---------------------------|
[LongForm]

INTERVIEW_ID QUESTION ANSWER NResp MaxResp

      1      Q1       A1         1       1
      1      Q2       A1         1       1
      1      Q3       A4         2       2
      1      Q3       A5         2       2
      1      Q4       A1         1       3
      1      Q5       A3         1       1
      2      Q1       A1         1       1
      2      Q2       A2         1       1
      2      Q3       A1         1       2
      2      Q5       A1         1       1
      3      Q1       B1         1       1
      3      Q2       B2         1       1
      3      Q3       B3         1       2
      3      Q4       B4a        3       3
      3      Q4       B4b        3       3
      3      Q4       B4c        3       3
      3      Q5       B5         1       1

Number of cases read:  17    Number of cases listed:  17

* Count responses within questions, and create a variable           .
* identifying responses, rather than questions.                     .
NUMERIC Resp# (F3).
DO IF    $CASENUM EQ 1
      OR INTERVIEW_ID NE LAG(INTERVIEW_ID)
      OR QUESTION     NE LAG(QUESTION).
.  COMPUTE Resp# = 1.
ELSE.
.  COMPUTE Resp# = LAG(Resp#) + 1.
END IF.
STRING Response (A5).
DO IF    MaxResp EQ 1.
.  COMPUTE Response = QUESTION.
ELSE.
.  COMPUTE Response = CONCAT(RTRIM(QUESTION)
                            ,'.'
                            ,LTRIM(STRING(Resp#,F3))).
END IF.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |23-APR-2010 19:12:36       |
|-----------------------------|---------------------------|
[LongForm]

INTERVIEW_ID QUESTION ANSWER NResp MaxResp Resp# Response

      1      Q1       A1         1       1     1 Q1
      1      Q2       A1         1       1     1 Q2
      1      Q3       A4         2       2     1 Q3.1
      1      Q3       A5         2       2     2 Q3.2
      1      Q4       A1         1       3     1 Q4.1
      1      Q5       A3         1       1     1 Q5
      2      Q1       A1         1       1     1 Q1
      2      Q2       A2         1       1     1 Q2
      2      Q3       A1         1       2     1 Q3.1
      2      Q5       A1         1       1     1 Q5
      3      Q1       B1         1       1     1 Q1
      3      Q2       B2         1       1     1 Q2
      3      Q3       B3         1       2     1 Q3.1
      3      Q4       B4a        3       3     1 Q4.1
      3      Q4       B4b        3       3     2 Q4.2
      3      Q4       B4c        3       3     3 Q4.3
      3      Q5       B5         1       1     1 Q5

Number of cases read:  17    Number of cases listed:  17

* Now the CASESTOVARS is straightforward. It can't be clicked up    .
* like this, though; the "/DROP" clause has to be added by hand     .
CASESTOVARS
 /ID      = INTERVIEW_ID
 /DROP    = QUESTION NResp MaxResp Resp#
 /INDEX   = Response
 /GROUPBY = VARIABLE .

Cases to Variables
|-----------------------------|---------------------------|
|Output Created               |23-APR-2010 19:12:37       |
|-----------------------------|---------------------------|
[LongForm]

Generated Variables
|--------|--------|------|
|Original|Response|Result|
|Variable|        |------|
|        |        |Name  |
|--------|--------|------|
|ANSWER  |Q1      |Q1    |
|        |Q2      |Q2    |
|        |Q3.1    |Q3.1  |
|        |Q3.2    |Q3.2  |
|        |Q4.1    |Q4.1  |
|        |Q4.2    |Q4.2  |
|        |Q4.3    |Q4.3  |
|        |Q5      |Q5    |
|--------|--------|------|

Processing Statistics
|---------------|---|
|Cases In       |17 |
|Cases Out      |3  |
|---------------|---|
|Cases In/Cases |5.7|
|Out            |   |
|---------------|---|
|Variables In   |7  |
|Variables Out  |9  |
|---------------|---|
|Index Values   |8  |
|---------------|---|

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |23-APR-2010 19:12:37       |
|-----------------------------|---------------------------|
[LongForm]

INTERVIEW_ID Q1 Q2 Q3.1 Q3.2 Q4.1 Q4.2 Q4.3 Q5

      1      A1 A1 A4   A5   A1             A3
      2      A1 A2 A1                       A1
      3      B1 B2 B3        B4a  B4b  B4c  B5

Number of cases read:  3    Number of cases listed:  3

=============================
APPENDIX: Test data, and code
=============================
(INTERVIEW_ID 3 added to originally posted test data)
DATA LIST LIST (", ")/
   INTERVIEW_ID, QUESTION, ANSWER
   (F2,          A2,       A3).
BEGIN DATA
   1, Q1, A1
   1, Q2, A1
   1, Q3, A4
   1, Q3, A5
   1, Q4, A1
   1, Q5, A3
   2, Q1, A1
   2, Q2, A2
   2, Q3, A1
   2, Q5, A1
   3, Q1, B1
   3, Q2, B2
   3, Q3, B3
   3, Q4, B4a
   3, Q4, B4b
   3, Q4, B4c
   3, Q5, B5
END DATA.
DATASET NAME LongForm.
LIST.

* Using AGGREGATE, identify which questions have multiple responses .
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=INTERVIEW_ID QUESTION
   /NResp '# of responses to this question, this interview' = NU.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK=QUESTION
   /MaxResp 'Max # of responses to this question, any interview'
   = MAX(NResp).
LIST.

* Count responses within questions, and create a variable           .
* identifying responses, rather than questions.                     .
NUMERIC Resp# (F3).
DO IF    $CASENUM EQ 1
      OR INTERVIEW_ID NE LAG(INTERVIEW_ID)
      OR QUESTION     NE LAG(QUESTION).
.  COMPUTE Resp# = 1.
ELSE.
.  COMPUTE Resp# = LAG(Resp#) + 1.
END IF.
STRING Response (A5).
DO IF    MaxResp EQ 1.
.  COMPUTE Response = QUESTION.
ELSE.
.  COMPUTE Response = CONCAT(RTRIM(QUESTION)
                            ,'.'
                            ,LTRIM(STRING(Resp#,F3))).
END IF.
LIST.

* Now the CASESTOVARS is straightforward. It can't be clicked up    .
* like this, though; the "/DROP" clause has to be added by hand     .
CASESTOVARS
 /ID      = INTERVIEW_ID
 /DROP    = QUESTION NResp MaxResp Resp#
 /INDEX   = Response
 /GROUPBY = VARIABLE .
LIST.
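
As a cross-check on the final restructuring step, the move from one row per response to one row per interview can also be sketched in plain Python. Again this is only an illustration under stated assumptions, not SPSS syntax: the function name `to_wide` and the input layout (tuples of interview ID, response label, answer) are hypothetical, and the columns are ordered by a plain string sort of the labels.

```python
def to_wide(labelled):
    """labelled: list of (interview_id, label, answer) tuples (hypothetical layout).

    Returns (columns, wide), where wide maps each interview_id to a dict with
    one entry per column; labels an interview never answered stay empty.
    """
    # One output column per distinct response label, in string-sorted order.
    columns = sorted({label for _, label, _ in labelled})

    wide = {}
    for interview_id, label, answer in labelled:
        # Start each interview with every column empty, then fill in answers.
        row = wide.setdefault(interview_id, dict.fromkeys(columns, ""))
        row[label] = answer
    return columns, wide

labelled = [
    (2, "Q1", "A1"), (2, "Q3.1", "A1"),
    (3, "Q1", "B1"), (3, "Q3.1", "B3"), (3, "Q4.1", "B4a"), (3, "Q4.2", "B4b"),
]
columns, wide = to_wide(labelled)
print(columns)
for interview_id in sorted(wide):
    print(interview_id, [wide[interview_id][c] for c in columns])
```

Filling every row with empty strings first is what makes the result rectangular: an interview that skipped a question still has that column, just with no value in it.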

Luca Meyer-3

Re: Casestovars simple question

Many thanks Richard,

I indeed had some doubts that CASESTOVARS could recognize multiple responses without some kind of datamap.

The solution you propose works all right.

Luca

On 24 Apr 2010, at 01:19, Richard Ristow wrote:

> Your problem is that you don't know, at the start, which questions have multiple responses, so CASESTOVARS doesn't have enough to work on. The trick is a transformation program that uses AGGREGATE to identify which questions have multiple responses, and then creates a variable "Response" that identifies responses rather than questions: where a question has multiple responses, "Response" has a different value for each response.