SPSSX Discussion

A simple question

Susan Elgie

A simple question

Hi All,
I seldom do data manipulation, and am faced with a quandary and not much
time.
I have:

V1 V2 V3 V4
1.00 sysmis 2.00 3.00
4.00 sysmis sysmis 5.00
6.00 sysmis 7.00 sysmis
8.00 9.00 1.00 2.00

I need to move these over so that I have:
1.00 2.00 3.00 sysmis
4.00 5.00 sysmis sysmis
6.00 7.00 sysmis sysmis
8.00 9.00 1.00 2.00

In other words, move all the valid values into the beginning of the series,
and leave the missing at the end.

Can you help? Thanks!

Susan

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Katkowski, David

Re: A simple question

This should work, but only for four variables. If you have more, it would be best to create a macro or, better yet, wrap some Python around it.

do if sysmis(V1).
do if ~sysmis(V2).
compute V1=V2.
compute V2=$sysmis.
else if ~sysmis(V3).
compute V1=V3.
compute V3=$sysmis.
else if ~sysmis(V4).
compute V1=V4.
compute V4=$sysmis.
end if.
end if.

do if sysmis(V2).
do if ~sysmis(V3).
compute V2=V3.
compute V3=$sysmis.
else if ~sysmis(V4).
compute V2=V4.
compute V4=$sysmis.
end if.
end if.

do if sysmis(V3).
do if ~sysmis(V4).
compute V3=V4.
compute V4=$sysmis.
end if.
end if.
exe.

Susan Elgie

Re: A simple question

Thanks David. Yes I can see doing that with 4 variables, but I have several
series of more than 4 in a longitudinal study, a situation which I failed to
make clear in the original posting. I struggled with vectors and did not
quite get there although I thought the way might lie in that direction.
Don't think I can manage Python.

More ideas?

Thanks again, Susan

ViAnn Beadle

Re: A simple question

So the simple question was too simple, eh?

Try this:

vector new(4).
vector old= v1 to v4.
compute #i=1.
compute #j=1.
loop if #j< 5.
do if not(sysmis(old(#i))).
compute new(#j) = old(#i).
compute #j=#j+1.
end if.
compute #i=#i+1.
end loop.

Susan Elgie

Re: A simple question

ViAnn,
That did it! It gave a nasty-looking out of range error message, but the
proof is in the pudding, and the numbers were in the right place. Thank you
immensely.

Thank you also to David. I would be interested in seeing the Python
solution, but will probably use this one for now since I am in quite a
hurry. But there is a longer term to this study as well, and it would be
good to learn Python. (longitudinal studies are almost infinite aren't
they?)

Have good days everybody,

Susan

ViAnn Beadle

Re: A simple question

I think that a wide data structure is probably not the best way to maintain
your data in a longitudinal study, depending on the kind of analyses you are
doing. Generally speaking, a narrow structure is easiest to work with. This is
why time series data are usually organized so that cases are time points and
columns are the variables being measured.
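
As a rough illustration of the narrow layout described above, here is a minimal Python sketch (not SPSS syntax) that reshapes wide rows into one record per case and time point. The function name `wide_to_long`, the field names `id`, `time`, and `value`, and the sample rows are all hypothetical; `None` stands in for a system-missing value. Within SPSS itself, the VARSTOCASES command performs this kind of restructuring.

```python
def wide_to_long(rows, id_key="id", prefix="V"):
    """Turn wide rows (one column per time point) into long records.

    Missing values (None here, standing in for SYSMIS) are dropped.
    """
    long_rows = []
    for row in rows:
        for name, value in row.items():
            # Skip the id column and anything that is not part of the series
            if name == id_key or not name.startswith(prefix):
                continue
            if value is None:
                continue
            long_rows.append(
                {"id": row[id_key], "time": int(name[len(prefix):]), "value": value}
            )
    return long_rows


# Hypothetical data: the first two rows of the example, with an id added
wide = [
    {"id": 1, "V1": 1.0, "V2": None, "V3": 2.0, "V4": 3.0},
    {"id": 2, "V1": 4.0, "V2": None, "V3": None, "V4": 5.0},
]

for record in wide_to_long(wide):
    print(record)
```

In the long layout a missing measurement is simply an absent record, so the valid values for a case are already contiguous.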

Richard Ristow

Re: A simple question

In reply to this post by Susan Elgie

At 10:17 AM 5/7/2008, Susan Elgie wrote:

>I have:
|-----------------------------|---------------------------|
|Output Created |07-MAY-2008 15:47:16 |
|-----------------------------|---------------------------|
[Spacey]

V1 V2 V3 V4

1.00 . 2.00 3.00
4.00 . . 5.00
6.00 . 7.00 .
8.00 9.00 1.00 2.00

Number of cases read: 4 Number of cases listed: 4

>I need to move these over so that I have:
>1.00 2.00 3.00 sysmis
>4.00 5.00 sysmis sysmis
>6.00 7.00 sysmis sysmis
>8.00 9.00 1.00 2.00

As ViAnn Beadle writes, VECTOR/LOOP logic is good. It doesn't need a
separate "new" vector, though; see the following. Variable "#Hold_It"
is needed so the source value may be erased (made SYSMIS) without
losing it. (Assigning the source value to the destination value
directly and then erasing the source value would lose the value, when
source and destination are the same.)

VECTOR datum=v1 TO v4.
COMPUTE #From = 1.
COMPUTE #To = 1.
LOOP #From = 1 TO 4.
. DO IF NOT MISSING(datum(#From)).
. COMPUTE #Hold_It = datum(#From).
. COMPUTE datum(#From) = $SYSMIS.
. COMPUTE datum(#To) = #Hold_It.
. COMPUTE #To = #To + 1.
. END IF.
END LOOP.
LIST.

List
|-----------------------------|---------------------------|
|Output Created |07-MAY-2008 16:08:15 |
|-----------------------------|---------------------------|
V1 V2 V3 V4

1.00 2.00 3.00 .
4.00 5.00 . .
6.00 7.00 . .
8.00 9.00 1.00 2.00

Number of cases read: 4 Number of cases listed: 4

(ViAnn's solution also works, but the loop doesn't terminate until
MXLOOPS if any value in the "old" list is missing, because #j then never
reaches 5.)
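
To compare the two loops, here is a small Python simulation (an illustration only, not SPSS code). `None` stands in for SYSMIS, an out-of-range vector index is modeled as yielding a missing value, and the cap of 40 iterations is an assumed MXLOOPS setting. The function names are made up for this sketch.

```python
MXLOOPS = 40  # assumed loop cap; adjust to match the session's SET MXLOOPS value


def two_vector_loop(old):
    """Mimic the loop with separate read (#i) and write (#j) counters.

    Returns the compacted list and the number of iterations executed.
    """
    n = len(old)
    new = [None] * n
    i = j = 0  # zero-based here; the SPSS code is one-based
    iterations = 0
    while j < n and iterations < MXLOOPS:
        iterations += 1
        # Reading past the end is modeled as a missing value
        value = old[i] if i < n else None
        if value is not None:
            new[j] = value
            j += 1
        i += 1
    return new, iterations


def in_place_loop(values):
    """Mimic the indexed loop that compacts a single vector in place."""
    data = list(values)
    to = 0
    for frm in range(len(data)):
        if data[frm] is not None:
            hold = data[frm]   # keep the value before erasing its source slot
            data[frm] = None
            data[to] = hold
            to += 1
    return data


rows = [
    [1.0, None, 2.0, 3.0],
    [4.0, None, None, 5.0],
    [6.0, None, 7.0, None],
    [8.0, 9.0, 1.0, 2.0],
]
for row in rows:
    compacted, iterations = two_vector_loop(row)
    print(compacted, iterations, in_place_loop(row))
```

In this model both loops produce the same compacted values; the difference is only in how many iterations the first loop runs for rows that contain a missing value.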
===================
APPENDIX: Test data
===================
DATA LIST LIST/
V1 V2 V3 V4
(4F6.2).
* Copious warning messages: .
BEGIN DATA
1.00 sysmis 2.00 3.00
4.00 sysmis sysmis 5.00
6.00 sysmis 7.00 sysmis
8.00 9.00 1.00 2.00
END DATA.
DATASET NAME Spacey WINDOW=FRONT.
LIST.

Zdaniuk, Bozena-2

collinearity and stepwise regression

In reply to this post by Susan Elgie

Hello, everybody. Would collinearity (not a severe one) be less of a problem in a stepwise regression, since the variables are entered one at a time?
Thanks in advance for any thoughts on that.
Bozena

Bozena Zdaniuk, Ph.D.
University of Pittsburgh
UCSUR, 6th Fl.
121 University Place
Pittsburgh, PA 15260
Ph.: 412-624-5736
Fax: 412-624-4810
Email: [hidden email]

RD

Swank, Paul R

Re: collinearity and stepwise regression

The stepwise process itself is sensitive to data problems, including
multicollinearity.

Paul R. Swank, Ph.D.
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center - Houston

Mark A Davenport MADAVENP

Re: collinearity and stepwise regression

In reply to this post by Zdaniuk, Bozena-2

Stepwise comes with additional baggage that dwarfs any problem you might
have with collinearity. DON'T USE IT.

***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)

Peck, Jon

Re: collinearity and stepwise regression

In reply to this post by Zdaniuk, Bozena-2

I don't know that I would agree with never using stepwise methods, at least in an exploratory fashion. But no doubt, you can't take the results at face value.

However, stepwise does not help with high multi-collinearity, because in that situation you are dealing with two or more almost equivalent variables from the stepping point of view, and choosing one over another is nearly arbitrary and sensitive to very small differences. Forward and backward stepping helps a little, but it only really works if you don't care about the model.

HTH,
Jon Peck

Katkowski, David

Re: A simple question

In reply to this post by Richard Ristow

You can also use Python. Just set the "highv" variable to the highest index in the set of V(some number) variables.

BEGIN PROGRAM.
import spss

spss.Submit(r"get file 'c:/test.sav'.")

lowv = 1    # index of the first variable in the series (V1)
highv = 4   # index of the last variable in the series (V4)
syntax = ""

# One outer DO IF block per target variable V(i)
for i in range(lowv, highv):
    syntax = syntax + "do if (sysmis(V%s)).\n" % (i)
    syntax = syntax + """do if (~sysmis(V%(next)s)).
compute V%(cur)s=V%(next)s.
compute V%(next)s=$sysmis.\n""" % {'cur': i, 'next': i + 1}
    # One ELSE IF branch per later source variable V(j)
    for j in range(i + 2, highv + 1):
        syntax = syntax + """else if (~sysmis(V%(next)s)).
compute V%(cur)s=V%(next)s.
compute V%(next)s=$sysmis.\n""" % {'cur': i, 'next': j}
        if j == highv:
            syntax = syntax + "end if.\nend if.\n"

# Close the last block, which has no ELSE IF branches
syntax = syntax + 'end if.\nend if.\nexe.'

print(syntax)
spss.Submit(syntax)
END PROGRAM.

SR Millis-3

Re: collinearity and stepwise regression

In reply to this post by Peck, Jon

With the availability of methods like LAR/LASSO, Bayesian model averaging, and penalized maximum likelihood methods---to name a few---I cannot think of any reason to use stepwise methods, even in any exploratory context. Of course, SPSS needs to play catch-up with R/S-Plus, Stata, and SAS in making these newer and better variable selection techniques available in SPSS.

Scott Millis

Kooij, A.J. van der

Re: collinearity and stepwise regression

SPSS (the Leiden branch) is working on that. In Release 17, Ridge regression, the Lasso, and the Elastic Net will be CATREG options. Also, options are added for estimation of prediction error and model selection (.632 bootstrap and cross-validation).
CATREG performs linear regression (when applying the numeric optimal scaling level for all variables) and nonlinear regression (when applying scaling levels other than numeric: nominal, non-monotonic spline, ordinal, and monotonic spline).
So, in CATREG 17 penalized regression is available for both linear AND NONLINEAR regression.

Regards,

Anita van der Kooij
Data Theory Group
Leiden University

Richard Ristow

Re: A simple question

In reply to this post by Katkowski, David

At 05:14 PM 5/7/2008, Katkowski, David wrote:

>Can also use Python.

Right. And the posted Python code works.

There are various ways of using Python in SPSS. This one uses Python
as a macro processor analogous to SPSS's own macros, writing the loop
as 'unrolled' code:

do if (sysmis(V1)).
do if (~sysmis(V2)).
compute V1=V2.
compute V2=$sysmis.
else if (~sysmis(V3)).
compute V1=V3.
compute V3=$sysmis.
else if (~sysmis(V4)).
compute V1=V4.
compute V4=$sysmis.
end if.
end if.
do if (sysmis(V2)).
do if (~sysmis(V3)).
compute V2=V3.
compute V3=$sysmis.
else if (~sysmis(V4)).
compute V2=V4.
compute V4=$sysmis.
end if.
end if.
do if (sysmis(V3)).
do if (~sysmis(V4)).
compute V3=V4.
compute V4=$sysmis.
end if.
end if.

One's preferred style in programming is very much a taste, and *de
gustibus non disputandum est*. This is a comment from taste. I grant
that I'm notorious for preferring native SPSS code, and that I've
been twitted about it, with some justice.

However, my taste wouldn't be to use Python this way, for the same
reason I wouldn't use a macro loop. In either case, you've got two
different languages to read: the macro language (DEFINE or Python)
and the target language. I think it's considerably less clear.

And a good macro processor (Python is excellent) makes it too easy to
write complicated code. This Python logic has nested loops, emitting
one block for each *target* variable; and within each of those, three
lines for each *later source* variable. Code length goes roughly as
the square of the number of variables processed.
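
The growth in code length can be checked with a short sketch. This is a hypothetical rewrite of the generator (the function name `unrolled_syntax` is invented here, and it does not call the `spss` module); it only builds the syntax lines for V1 to Vn and counts them.

```python
def unrolled_syntax(n):
    """Return the unrolled DO IF syntax for variables V1..Vn as a list of lines."""
    lines = []
    for target in range(1, n):
        # Outer block: only act when the target slot is empty
        lines.append("do if (sysmis(V%d))." % target)
        for k, source in enumerate(range(target + 1, n + 1)):
            keyword = "do if" if k == 0 else "else if"
            # Three lines per later source variable
            lines.append("%s (~sysmis(V%d))." % (keyword, source))
            lines.append("compute V%d=V%d." % (target, source))
            lines.append("compute V%d=$sysmis." % source)
        lines.extend(["end if.", "end if."])
    return lines


for n in (4, 8, 16, 32):
    print(n, len(unrolled_syntax(n)))
```

For n variables this sketch emits 3(n-1) + 3n(n-1)/2 lines, so doubling the number of variables roughly quadruples the generated code.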

So, I'd stick with native VECTOR/LOOP logic, as less, simpler, and
cleaner code.

End of comments about taste. The Python code *does* work, just fine.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Juanito Talili

factor loadings vs item-total correlation

In reply to this post by SR Millis-3

One domain in a self-administered questionnaire has seven items, each scored on a 5-point Likert scale. This domain was subjected to a one-factor CFA, and the factor loadings of all seven items were found to be statistically significant.

Using the same data (the data used in the CFA), the item-total correlation was computed for each item, and the coefficients were found to be close to 1.0 (ranging from 0.7 to 0.9).

Do the item-total correlations validate the factor loadings? Or, are they two different things with different uses? Please comment.

Thank you.

SR Millis-3

Re: collinearity and stepwise regression

In reply to this post by Zdaniuk, Bozena-2

A stepwise method does not solve the problems caused by collinearity; rather, collinearity makes the variable selection arbitrary. See Frank Harrell's book, "Regression Modeling Strategies."

Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682

Alexander J. Shackman-2

Re: collinearity and stepwise regression

how severe is the collinearity, exactly? if extreme, collinear variables
could be collapsed using either pca/factor-analysis, or by taking the mean
of z-transformed vars

hth, alex
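
The second option above (the mean of z-transformed variables) might look like the following minimal Python sketch. The function names and the sample values are invented for illustration, and the sample standard deviation (n - 1 denominator) is an assumption about how the z-scores are computed.

```python
from statistics import mean, stdev


def zscores(xs):
    """Standardize a variable to mean 0 and sample standard deviation 1."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def composite(*variables):
    """Average the z-scores of several variables, case by case."""
    standardized = [zscores(v) for v in variables]
    return [mean(case) for case in zip(*standardized)]


# Hypothetical collinear predictors measured on five cases
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [3.1, 4.9, 7.2, 8.8, 11.0]

print(composite(x1, x2))
```

The composite then replaces the collinear variables as a single predictor, at the cost of no longer estimating their separate effects.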

Ornelas, Fermin-2

Re: collinearity and stepwise regression

I think the original question mentioned that the collinearity was not severe. Having said that, if the number of variables is not very large, I suggest first reducing the collinearity to a satisfactory level, i.e. variance proportions <= .5, condition index < 30, and VIF < 10. Once these criteria are satisfied, proceed to final model selection. Another point to consider: if the collinearity is not degrading and the purpose of the model is prediction, then the model should be fine. We know that if collinearity is severe, hypothesis tests are seriously questionable.

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
1789 W. Jefferson Street
Phoenix, AZ 85007
Tel: (602) 542-5639
E-mail: [hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Alexander J. Shackman
Sent: Thursday, May 08, 2008 9:36 AM
To: [hidden email]
Subject: Re: collinearity and stepwise regression

how severe is the collinearity, exactly? if extreme, collinear variables
could be collapsed using either pca/factor-analysis, or by taking the mean
of z-transformed vars

hth, alex

--
Alexander J. Shackman
Laboratory for Affective Neuroscience
Waisman Laboratory for Brain Imaging & Behavior
University of Wisconsin-Madison
1202 West Johnson Street
Madison, Wisconsin 53706

Telephone: +1 (608) 358-5025
FAX: +1 (608) 265-2875
EMAIL: [hidden email]
http://psyphz.psych.wisc.edu/~shackman

Gregory Hildebrandt

Re: collinearity and stepwise regression

In reply to this post by Zdaniuk, Bozena-2

Bozena,

Periodically, there are discussions of multicollinearity by the SPSS group,
and I have long been meaning to chime in. I did have an e-mail discussion
with Jon Peck of SPSS, who frequently contributes to these discussions, and I
indicated that I thought multicollinearity was the biggest challenge in
multivariate regression analysis. He disagreed, and indicated that causality
is the biggest challenge. I might have replied that a major reason
causality is so challenging is multicollinearity.

With respect to the specific question about using stepwise regression to
deal with multicollinearity, I believe that multicollinearity is precisely
the reason one shouldn't use stepwise regression. Two variables may be
highly correlated either directly, as shown by the simple correlation
coefficient, or in a more complex fashion that would be identified using
the Variance Inflation Factor, VIF = 1/(1 - Rk_sq), where Rk_sq is the
R-squared obtained when the kth independent variable is regressed on all the
others. In either case, stepwise regression can create problems. In stepwise
regression, the first of two highly correlated variables may be entered into
the regression under the default or a specified entry rule, but its
inclusion can block the second variable from entering. If both are deemed
theoretically important, say, the effect of both income and wealth on
consumption, then it is likely that stepwise regression will first enter
income. Then other variables in the model will come in using stepwise
regression. Wealth, however, will likely be excluded from the regression.
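
The blocking effect described above can be sketched numerically. This is only an illustration under stated assumptions: the three correlations are made up (not taken from any data set), there are exactly two predictors so that Rk_sq is simply the squared correlation between them, and the function names are hypothetical. The partial correlation stands in for what a stepwise procedure examines when deciding whether the second variable may enter.

```python
from math import sqrt

def vif_two_predictors(r_x1x2):
    """VIF when there are exactly two predictors, so Rk_sq equals r squared."""
    return 1.0 / (1.0 - r_x1x2 ** 2)

def partial_corr(r_yx2, r_yx1, r_x1x2):
    """Correlation of y with x2 after x1 has been partialled out of both."""
    return (r_yx2 - r_yx1 * r_x1x2) / sqrt((1 - r_yx1 ** 2) * (1 - r_x1x2 ** 2))

# Hypothetical correlations, chosen only to illustrate the point.
r_cons_income = 0.80    # consumption with income
r_cons_wealth = 0.78    # consumption with wealth
r_income_wealth = 0.95  # income with wealth

vif = vif_two_predictors(r_income_wealth)
r_wealth_given_income = partial_corr(r_cons_wealth, r_cons_income, r_income_wealth)

print(f"VIF for income and wealth: {vif:.2f}")
print(f"wealth vs. consumption before income enters: {r_cons_wealth:.2f}")
print(f"wealth vs. consumption after income enters:  {r_wealth_given_income:.2f}")
```

With these made-up numbers, income has the slightly higher simple correlation and would enter first. Wealth's correlation with consumption then drops from 0.78 to a partial correlation of about 0.11, so it would be unlikely to meet a typical entry criterion unless the sample is large.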

Personally, I believe the best approach is to specify the model as carefully
as possible (I reveal my background in economics), using theory, which in
pragmatic terms can be viewed as a good story, and to use this prior knowledge
to formulate the model. I believe the t tests should be given more weight
than multicollinearity indicators such as the VIF when revising the initial
specification. For example, I have a student writing a paper who has two
variables in a model with VIFs of around 37 or 38, yet both variables are
statistically significant. Not well known is the fact that SPSS has a
default tolerance level (tolerance is the reciprocal of the VIF) of .0001,
which is used to exclude variables. I have encountered situations in
which a statistically significant variable was excluded because it didn't
meet the tolerance criterion. I found this surprising, but, unfortunately,
didn't save the results.

I believe the best summary statement on multicollinearity was by Jon Peck in
the SPSS discussion group. He said: "There is no rule that tells you
whether the correlations are too high (if less than perfect). The
collinearity indications just tells you why your estimated variances are so
high."

Consider the t-statistic for the kth variable, which may have a high VIF in a
multiple regression analysis. For a two-tailed test in which the null
hypothesis is that Bk = 0, t_calculated = Bhatk/SE(Bhatk). In the
multiple regression context, SE(Bhatk) = SEE/sqrt(sum(xk_sq)*(1 - Rk_sq)),
where sum(xk_sq) is the sum of the squared deviations of Xk about its mean
value Xkbar. If one has a high Bhatk, a low SEE, and/or a high
sum(xk_sq), the t can still be significant when the VIF is high. This is
true even after accounting for the changes in Bhatk when the variable with
which it is correlated enters the regression. (One can compute the changes
in Bhatk using specification analysis.)
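
Written out with the VIF made explicit, the relationship above looks like this. It is only a restatement of the two formulas already given, using VIF_k = 1/(1 - R_k^2), with \sum x_k^2 standing for sum(xk_sq).

```latex
\[
SE(\hat{B}_k)
  = \frac{SEE}{\sqrt{\sum x_k^2 \,(1 - R_k^2)}}
  = \frac{SEE \,\sqrt{VIF_k}}{\sqrt{\sum x_k^2}},
\qquad
t_k = \frac{\hat{B}_k}{SE(\hat{B}_k)}
    = \frac{\hat{B}_k \,\sqrt{\sum x_k^2}}{SEE \,\sqrt{VIF_k}} .
\]
```

So, holding everything else fixed, a VIF of 37 multiplies the standard error by sqrt(37), roughly 6.1, and divides t by the same factor. A large enough Bhatk or sum(xk_sq), or a small enough SEE, can still leave t above the critical value.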

Until recently, I thought I understood a possible role for stepwise
regression. SPSS permits one to enter variables one block at a time. In
the Studenmund text, *Using Econometrics*, Chapter 8, "Multicollinearity,"
there is an example where one is trying to predict high school SAT
scores. The variables believed to be theoretically relevant are GPA,
APMATH, and APENG (one can question the direction of causation here). And
then there are other variables in the data set whose importance is not
known. So, I thought it would be interesting to first include the three
theoretically relevant variables in Block 1, and use the Enter method to
bring them simultaneously into the regression. I included the remaining
variables in Block 2, and used the Stepwise method to enter these
variables sequentially based on the default entry criterion. The data come
from a high school with many Asian students for whom English was a
second language, and only a sample of observations was used. Race entered
the regression in Block 2, and blocked English as a Second Language unless I
changed the entry criterion. The t-statistic for Race was about 1.0, and for
English as a Second Language somewhat less than 1.0. The author of the book
indicated that he and a group of experts agreed that, because of their
institutional knowledge of the situation, English as a Second Language
should be included in the final regression (despite the low t-statistic) and
Race excluded. If one accepts this view, stepwise regression doesn't work in
this situation either.

I would be interested in hearing more comments about multicollinearity.

Greg
