SPSSX Discussion

from many rows to one row

Classic

List

Threaded

10 messages Options

msherman

from many rows to one row

Dear list: I checked out the syntax examples at spsstools.net in order to take many rows of data for each participant and convert them into one row. The sample syntax files work fine but when I use my data they do not work. I believe the problem may have to do with missing values and I am not sure how to handle them.
I have the following variables.
ID, Sequencenumber, DV1 DV2. Highest sequence for any one subject is 5

The data looks something like this.
ID Sequencenumber DV1 DV2
1 1 10 8
1 2 . .
1 3 5 5
1 4 8 3
2 1 6 5
3 1 8 2
3 2 7 .
3 3 . 7
3 4 8 .
3 5 7 4
4 1 2 6
5 1 2 8
5 2 7 3

and I want to make it look like.

ID DV1_1 DV1_2 DV1_3 DV1_4 DV1_5 DV2_1 DV2_2 DV2_3 DV2_4 DV2_5
1 10 . 5 8 . 8 . 5 3 .
2 6 . . . . 5 . . . .
3 8 7 . 8 7 2 . 7 . 4

etc.

I welcome any advice. thanks, martin sherman

Judith Saebel

Re: from many rows to one row

Try the command casestovars (see below) or use the Restructure option
under the Data menu

data list fixed/id 1-2 Sequencenumber 4 DV1 6-7 DV2 9.

begin data
1 1 10 8
1 2
1 3 5 5
1 4 8 3
2 1 6 5
3 1 8 2
3 2 7
3 3 7
3 4 8
3 5 7 4
4 1 2 6
5 1 2 8
5 2 7 3
end data.

sort cases id Sequencenumber .
casestovars/
id = id/
index = Sequencenumber/
groupby = variable.

msherman

Re: from many rows to one row

The restructure command works perfectly. thank you. martin

>>> "Judith Saebel" <[hidden email]> 10/8/2007 9:41 PM >>>
Try the command casestovars (see below) or use the Restructure option
under the Data menu

data list fixed/id 1-2 Sequencenumber 4 DV1 6-7 DV2 9.

begin data
1 1 10 8
1 2
1 3 5 5
1 4 8 3
2 1 6 5
3 1 8 2
3 2 7
3 3 7
3 4 8
3 5 7 4
4 1 2 6
5 1 2 8
5 2 7 3
end data.

sort cases id Sequencenumber .
casestovars/
id = id/
index = Sequencenumber/
groupby = variable.

Richard Ristow

Re: from many rows to one row

In reply to this post by msherman

Sort of one of my soap-box speeches -

At 09:18 PM 10/8/2007, Martin Sherman wrote:

>I have the following variables.
>ID, Sequencenumber, DV1 DV2. Highest sequence for any one subject is
>5
>
>The data looks something like this.
>ID Sequencenumber DV1 DV2
>1 1 10 8
>1 2 . .
>1 3 5 5
>1 4 8 3
>2 1 6 5
>3 1 8 2
>3 2 7 .
>3 3 . 7
>3 4 8 .
>3 5 7 4
>4 1 2 6
>5 1 2 8
>5 2 7 3
>
>and I want to make it look like.
>
>ID DV1_1 DV1_2 DV1_3 DV1_4 DV1_5 DV2_1
>DV2_2 DV2_3 DV2_4 DV2_5
> 1 10 . 5 8
> . 8 . 5
> 3 .
> 2 6 . . .
> . 5 . .
> . .
> 3 8 7 . 8
> 7 2 . 7
> . 4

As you saw, Judith Saebel's advice to use CASESTOVARS, either directly
as syntax or via the Restructure wizard, is exactly what you want.

My advice or question: Do you really want to do this? For many
purposes, data in 'long' organization, as you have it, works better
than data in 'wide' organization, as it is after restructuring. It
depends on what you are going to do, so I won't go farther; but do give
one hard look whether the restructuring is a good idea.

-Best of regards,
Richard

Richard Ristow

Re: from many rows to one row

Responding to an off-list question, as it's probably of general
interest.

At 11:50 AM 10/10/2007, Elena Tuerk wrote:

>I've been struggling with this data management issue myself and had a
>question for you regarding your preference for "long" vs. "wide"
>organization. Can you still run paired samples t-tests for the same
>subject ID with a long organization? As it is, I've been restructuring
>data in order to get the format needed to run these analyses. I'd
>appreciate your perspective.

No, you need wide organization for a paired-sample t-test. That's true
for within-subjects factors in other procedures as well, including at
least GLM, MANOVA and GENLIN.

I preach long organization, but organization choice actually depends on
taste and circumstances. In yours, I'd consider long organization, with
CASESTOVARS for just the variables needed for analysis immediately
before running a procedure that requires wide organization. (But be
careful: CASESTOVARS irrevocably drops variables not included in the
restructured data. Make sure you're keeping a separate master copy of
the original data.)

Long organization is preferable or required for a few procedures -
notably MEANS, which I like for producing descriptive statistics. (But
don't do an ANOVA in MEANS if your BY variable is actually a
within-subjects factor.)

When you do data transformations with wide data, you commonly have to
do them separately for each variable that's part of a within-subjects
factor; repeating the code, or using a DO REPEAT, LOOP, or macro loop.
With long organization, you usually need only one set of transformation
code; the required loop is implicit in SPSS's own looping through the
cases.

However, if I were doing a lot of within-subjects analyses, I might
well go to wide organization and stay there. That's particularly true
if there are many variables that don't vary within subjects, and
there's not much need for looping over within-subjects variables in
transformation programs. (For example, sometimes you won't have to
transform those variables at all.)

Now - any help?

-Best wishes and best of luck,
Richard

Tuerk, Elena

Re: from many rows to one row

Richard,

Thanks for your comments, they were very helpful. I'm finding that it's most
efficient to maintain a long organization so as to only create one set of
variables for the items and one set of code (rather than write separate syntax
to score pre- and post- data) and then to convert the variables of interest
(e.g., scale scores) to pre- and post- for within-subject analyses. I've been
working with a relatively small sample, so I've been doing this
manually (which
is not a great idea, of course) because I've run into problems with the
CASETOVARS point-and-click menu.

To elaborate (and elicit help!), if I select the "participant id" as the
identifier variable and then the outcome variable "BDI Total Score" as the
index variable, with times of measurement identified by a third variable, I
keep running into errors. I would greatly appreciate any clarification so as
to make better use of this tool.

Best,

Elena

stace swayne

Inserting data

In reply to this post by Richard Ristow

Dear List,

I have 400 new ID numbers like the following (12644168). To fit the structure of my data, I want to insert the letter "A" in front of each ID number like the following (A12644168).

Can anyone, share syntax on how to do this.

Thanks,

Stace

---------------------------------
Take the Internet to Go: Yahoo!Go puts the Internet in your pocket: mail, news, photos & more.

Mark A Davenport MADAVENP

Re: Inserting data

This is not the most elegant way but it works.

The new string 'A12644168' will be called new_id.

I first create a new var called var01 which is all 'A's

STRING var01 (A1).
COMPUTE var01="A".
Then create the new id number.
STRING new_id (A9).
COMPUTE new_id=CONCAT(RTRIM(var01),var02).

You can then drop var01.
DELETE VAR var01.

Put it all together and it looks like this.
STRING var01 (A1).
COMPUTE var01="A".
STRING new_id (A9).
COMPUTE new_id=CONCAT(RTRIM(var01),var02).
DELETE VAR var01.

***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)

stace swayne <[hidden email]>
Sent by: "SPSSX(r) Discussion" <[hidden email]>
10/10/2007 03:57 PM
Please respond to
stace swayne <[hidden email]>

To
[hidden email]
cc

Subject
Inserting data

Dear List,

I have 400 new ID numbers like the following (12644168). To fit the
structure of my data, I want to insert the letter "A" in front of each ID
number like the following (A12644168).

Can anyone, share syntax on how to do this.

Thanks,

Stace

---------------------------------
Take the Internet to Go: Yahoo!Go puts the Internet in your pocket: mail,
news, photos & more.

Richard Ristow

Re: from many rows to one row

In reply to this post by Tuerk, Elena

At 03:19 PM 10/10/2007, Elena Tuerk wrote:

>I'm finding that it's most efficient to maintain a long organization
>so as to only create one set of variables for the items and one set of
>code (rather than write separate syntax to score pre- and post- data)
>and then to convert the variables of interest (e.g., scale scores) to
>pre- and post- for within-subject analyses.

Good. To my mind, an excellent strategy.

>[But] I've run into problems with the CASETOVARS point-and-click menu.
>If I select the "participant id" as the identifier variable and then
>the outcome variable "BDI Total Score" as the index variable, with
>times of measurement identified by a third variable, I keep running
>into errors. I would greatly appreciate any clarification.

I'd feel surer of this if I tested it, but I think you've hit a
maddening quirk of the CASESTOVARS menu (or wizard).

The quirky panel is Step 2 of 5: "Cases to Variables: Select
Variables". It has a window for "Identifier Variable(s)" and one for
"Index Variable(s)"; but, where the dickens to you put the variables
you're actually transposing, like "BDI Total Score"? Answer: you OMIT
THEM ALTOGETHER.

That's far from intuitive. It's not comfortable to do, even if you know
it's the right answer. But, try it.

Now, "Index Variable(s)"? You should have a variable in your data that
marks whether a case is the pre-test or the post-test for that subject.
Use that as the index variable.

Tuerk, Elena

Re: from many rows to one row

This worked beautifully. Thanks so much for your help.

Richard Ristow wrote:

> At 03:19 PM 10/10/2007, Elena Tuerk wrote:
>
>> I'm finding that it's most efficient to maintain a long organization
>> so as to only create one set of variables for the items and one set
>> of code (rather than write separate syntax to score pre- and post-
>> data) and then to convert the variables of interest (e.g., scale
>> scores) to pre- and post- for within-subject analyses.
>
> Good. To my mind, an excellent strategy.
>
>> [But] I've run into problems with the CASETOVARS point-and-click
>> menu. If I select the "participant id" as the identifier variable and
>> then the outcome variable "BDI Total Score" as the index variable,
>> with times of measurement identified by a third variable, I keep
>> running into errors. I would greatly appreciate any clarification.
>
> I'd feel surer of this if I tested it, but I think you've hit a
> maddening quirk of the CASESTOVARS menu (or wizard).
>
> The quirky panel is Step 2 of 5: "Cases to Variables: Select
> Variables". It has a window for "Identifier Variable(s)" and one for
> "Index Variable(s)"; but, where the dickens to you put the variables
> you're actually transposing, like "BDI Total Score"? Answer: you OMIT
> THEM ALTOGETHER.
>
> That's far from intuitive. It's not comfortable to do, even if you
> know it's the right answer. But, try it.
>
> Now, "Index Variable(s)"? You should have a variable in your data that
> marks whether a case is the pre-test or the post-test for that
> subject. Use that as the index variable.