SPSSX Discussion

Combining Cases into Single Row

Classic

List

Threaded

4 messages Options

Miriam Fultz, Ph.D.

Combining Cases into Single Row

Dear Listers:

I'm sure this is a truly simple problem, but alas, I am running out of
time...

I have a dataset in which there are multiple rows for some of the cases and
one row for the others. For the cases with multiple rows, there are only
actual data differences for just three variables: Score, percent, and
testtype. The variables are defined as follows:

1) Score = science achievement score in one of four content/grade level
tests

2) Percent = associate with the test score

3) Testtype = identifies whether the testtype is 1=physical sci/fourth
grade; 2=physical science/fifth grade; 3=earth science/fourth grade; 4=earth
science/fifth grade.

If a student completed two tests (which they legitimately could), then they
have two rows. About half of the cases took one test (so just one row). I
found several cases who completed three tests (three rows for those cases).

Here is an example of the data (there are also demographic variables that
come after ID1 and the values of these across the rows for a case are the
same):

Dist School V3 V4 V5 TESTTYPE
V7 Score Percent Lastname
Firstname ID1

3 18 32 5 3 4
1 9 37.50 Smith
John 0000002222

3 18 32 5 3 2
1 10 41.67 Smith
John 0000002222

3 18 32 5 3 4
1 13 54.17 Jones
Mary 0000003333

3 18 32 5 3 2
1 15 62.50 Jones
Mary 0000003333

3 18 38 5 3 2
1 6 25.00 Brown
Robert 0000004444

So, the first four lines represent two cases with two rows per case and the
last is a person who just took one of the four tests. All identifying
information is fictitious.

What I need is to have one row per case. It seems to me that I would need
to have four score variables instead of the one as well as four percent
variables.

Any suggestions would be most appreciated.

Spousta Jan

Re: Combining Cases into Single Row

Hi Miriam,

Try Data -> Restructure in the menu, then select Restructure selected cases into variables and follow the wizard. Do not forget to save the file before because it replaces the data.

Regards

Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Miriam Fultz, Ph.D.
Sent: Monday, August 27, 2007 4:01 AM
To: [hidden email]
Subject: Combining Cases into Single Row

Dear Listers:

I'm sure this is a truly simple problem, but alas, I am running out of time...

I have a dataset in which there are multiple rows for some of the cases and one row for the others. For the cases with multiple rows, there are only actual data differences for just three variables: Score, percent, and testtype. The variables are defined as follows:

1) Score = science achievement score in one of four content/grade level tests

2) Percent = associate with the test score

3) Testtype = identifies whether the testtype is 1=physical sci/fourth grade; 2=physical science/fifth grade; 3=earth science/fourth grade; 4=earth science/fifth grade.

If a student completed two tests (which they legitimately could), then they have two rows. About half of the cases took one test (so just one row). I found several cases who completed three tests (three rows for those cases).

Here is an example of the data (there are also demographic variables that come after ID1 and the values of these across the rows for a case are the
same):

Dist School V3 V4 V5 TESTTYPE
V7 Score Percent Lastname
Firstname ID1

3 18 32 5 3 4
1 9 37.50 Smith
John 0000002222

3 18 32 5 3 2
1 10 41.67 Smith
John 0000002222

3 18 32 5 3 4
1 13 54.17 Jones
Mary 0000003333

3 18 32 5 3 2
1 15 62.50 Jones
Mary 0000003333

3 18 38 5 3 2
1 6 25.00 Brown
Robert 0000004444

So, the first four lines represent two cases with two rows per case and the last is a person who just took one of the four tests. All identifying information is fictitious.

What I need is to have one row per case. It seems to me that I would need to have four score variables instead of the one as well as four percent variables.

Any suggestions would be most appreciated.

_____

Tato zpráva a všechny připojené soubory jsou důvěrné a určené výlučně adresátovi(-ům). Jestliže nejste oprávněným adresátem, je zakázáno jakékoliv zveřejňování, zprostředkování nebo jiné použití těchto informací. Jestliže jste tento mail dostali neoprávněně, prosím, uvědomte odesilatele a smažte zprávu i přiložené soubory. Odesilatel nezodpovídá za jakékoliv chyby nebo opomenutí způsobené tímto přenosem.

This message and any attached files are confidential and intended solely for the addressee(s). Any publication, transmission or other use of the information by a person or entity other than the intended addressee is prohibited. If you receive this in error please contact the sender and delete the message as well as all attached documents. The sender does not accept liability for any errors or omissions as a result of the transmission.

-.- --

Richard Ristow

Re: Combining Cases into Single Row

In reply to this post by Miriam Fultz, Ph.D.

At 10:00 PM 8/26/2007, Miriam Fultz, Ph.D. wrote:

>I have a dataset in which there are multiple rows for some of the
>cases and one row for the others. For the cases with multiple rows,
>there are only actual data differences for just three variables:
>Score, percent, and testtype. The variables are defined as follows:
>
>1) Score = science achievement score in one of four content/grade
>level tests
>
>2) Percent = associate with the test score
>
>3) Testtype = identifies whether the testtype is 1=physical sci/fourth
>grade; 2=physical science/fifth grade; 3=earth science/fourth grade;
>4=earth science/fifth grade.
>
>If a student completed two tests (which they legitimately could), then
>they have two rows. About half of the cases took one test (so just
>one row). I found several cases who completed three tests (three rows
>for those cases).
>
>Here is an example of the data (there are also demographic variables
>that
>come after ID1 and the values of these across the rows for a case are
>the
>same):

|-----------------------------|---------------------------|
|Output Created |27-AUG-2007 15:27:01 |
|-----------------------------|---------------------------|
Dist School V3 V4 V5 TEST V7 Score Percent Lastname
Firstname ID1

3 18 32 5 3 4 1 9 37.50
Smith John 0000002222
3 18 32 5 3 2 1 10 41.67
Smith John 0000002222
3 18 32 5 3 4 1 13 54.17
Jones Mary 0000003333
3 18 32 5 3 2 1 15 62.50
Jones Mary 0000003333
3 18 38 5 3 2 1 6 25.00
Brown Robert 0000004444
Number of cases read: 5 Number of cases listed: 5

>So, the first four lines represent two cases with two rows per case
>and the last is a person who just took one of the four tests.
>
>I need to have one row per case. It seems to me that I would need
>four score variables instead of the one as well as four percent
>variables.

Indeed, so. Below is SPSS draft output to get that result. (Jan's
suggestion is how you do it.) You'll find only 2 of each variable,
because that's all their are in the test data, and CASESTOVARS creates
only as many variables as needed for the data it sees. In your real
data, with up to four records, it would create the four sets.

Now, as a recurring theme: Are you sure you want to do this?
CASESTOVARS changes data from 'long' organization (separate records for
each logical instance in each case) to 'wide' (multiple logical
instances per record).

Think what you're going to do next, and perhaps post a follow-up. Many
manipulations and analyses are easier with data in 'long' form.

That's said, here's the syntax and output. I clicked up the syntax
entirely from the menus.

SORT CASES BY Dist School ID1 TEST .
CASESTOVARS
/ID = Dist School ID1
/INDEX = TEST
/GROUPBY = INDEX .

Notes
|----------------------------|---------------------------|
|Output Created |27-AUG-2007 15:27:01 |
|----------------------------|---------------------------|
Generated Variables
|--------|----|---------|
|Original|TEST|Result |
|Variable| |---------|
| | |Name |
|--------|----|---------|
|Score |2 |Score.2 |
| |4 |Score.4 |
|--------|----|---------|
|Percent |2 |Percent.2|
| |4 |Percent.4|
|--------|----|---------|

Processing Statistics
|---------------|---|
|Cases In |5 |
|Cases Out |3 |
|---------------|---|
|Cases In/Cases |1.7|
|Out | |
|---------------|---|
|Variables In |12 |
|Variables Out |13 |
|---------------|---|
|Index Values |2 |
|---------------|---|

TEMPORARY.
STRING SPACE (A5).
LIST DIST to Percent.4 SPACE.

List
|-----------------------------|---------------------------|
|Output Created |27-AUG-2007 15:27:01 |
|-----------------------------|---------------------------|
Di Sch Firstnam Scor Percent Scor
Percent
st ool ID1 V3 V4 V5 V7 Lastname
e e.2 .2 e.4 .4 SPACE

3 18 0000002222 32 5 3 1
Smith John 10 41.67 9 37.50
3 18 0000003333 32 5 3 1
Jones Mary 15 62.50 13 54.17
3 18 0000004444 38 5 3 1
Brown Robert 6 25.00 . .

Number of cases read: 3 Number of cases listed: 3

===================
APPENDIX: Test data
===================
DATA LIST LIST SKIP=2/
Dist School V3 V4 V5 TEST V7 Score Percent Lastname Firstname ID1
(F2 F3 3(F2) F2 F2 F4 F6.2 2(A8) N10).
BEGIN DATA
Dist School V3 V4 V5 TEST V7 Score Percent Lastname Firstname ID1
Type
3 18 32 5 3 4 1 9 37.50
Smith John 0000002222
3 18 32 5 3 2 1 10 41.67
Smith John 0000002222
3 18 32 5 3 4 1 13 54.17
Jones Mary 0000003333
3 18 32 5 3 2 1 15 62.50
Jones Mary 0000003333
3 18 38 5 3 2 1 6 25.00
Brown Robert 0000004444
END DATA.
FORMATS ID1 (N10).
LIST.

Richard Ristow

Re: Combining Cases into Single Row

At 01:49 PM 8/28/2007, Miriam Fultz, Ph.D. wrote, off-list (I'm
replying on-list, by custom):

>My immediate need is to determine the number of students who took both
>the pre and post tests (for reporting purposes). The client has
>defined "match" as a student who took the pretest and any one posttest
>(the desired number of posttests completed is two). What would you
>recommend in order to determine the total number of pre-post matches
>-- AGGREGATE?

Exactly. AGGREGATE is a great friend; see below.

>Ultimately, I will conduct an MLM (HLM) analysis (but depending upon
>the timing of the report, perhaps an ANCOVA per the desires of the
>client). If I leave the file in a long form (which would facilitate
>the MLM), I also want to know the number of have pre-one post; pre-two
>posttests; pre -- three posttest scores.

The following is SPSS 15 draft output (WRR: not saved separately). Test
data is as in previous post, except shortening some names and formats
to give shorter printed output lines. I don't know what marks a
pre-test vs. a post-test. The following takes test types 1 and 2 as
pre-test, and 3 and 4 as post-test; but that's easy to change.

|-----------------------------|---------------------------|
|Output Created |28-AUG-2007 16:34:15 |
|-----------------------------|---------------------------|
Dist School V3 V4 V5 TEST V7 Score Percent LName FName ID1

3 18 32 5 3 4 1 9 37.50 Smith John 0000002222
3 18 32 5 3 2 1 10 41.67 Smith John 0000002222
3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333
3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333
3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444

Number of cases read: 5 Number of cases listed: 5

TEMPORARY.
NUMERIC PreTest PostTest (F2).
IF ANY (TEST,1,2) PreTest = 1.
IF ANY (TEST,3,4) PostTest = 1.

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK = Dist School ID1
/NPre 'Number of pre-tests' = SUM(PreTest)
/NPost 'Number of post-tests' = SUM(PostTest).

RECODE NPre NPost (SYSMIS = 0).
FORMATS NPre NPost (F2).

TEMPORARY.
STRING SPACE (A10).
LIST Dist TO NPost SPACE.

List
|-----------------------------|---------------------------|
|Output Created |28-AUG-2007 16:37:27 |
|-----------------------------|---------------------------|
Di Sch TE Scor NP
st ool V3 V4 V5 ST V7 e Percent LName FName ID1 re NPost
SPACE

3 18 32 5 3 4 1 9 37.50 Smith John 0000002222 1 1
3 18 32 5 3 2 1 10 41.67 Smith John 0000002222 1 1
3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333 1 1
3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333 1 1
3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444 1 0

Number of cases read: 5 Number of cases listed: 5

===================
APPENDIX: Test data
===================
(Identical to in previous post, except shortening names and formats to
give shorter printed output lines.)

DATA LIST LIST SKIP=2/
Dist School V3 V4 V5 TEST V7 Score Percent LName FName ID1
(F2 F3 3(F2) F2 F2 F4 F6.2 2(A6) N10).
BEGIN DATA
Dist School V3 V4 V5 TEST V7 Score Percent LName FName ID1
Type
3 18 32 5 3 4 1 9 37.50 Smith John 0000002222
3 18 32 5 3 2 1 10 41.67 Smith John 0000002222
3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333
3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333
3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444
END DATA.
FORMATS ID1 (N10).
LIST.