|
Hi All,
I seldom do data manipulation, and am faced with a quandary and not much time. I have:

V1    V2      V3      V4
1.00  sysmis  2.00    3.00
4.00  sysmis  sysmis  5.00
6.00  sysmis  7.00    sysmis
8.00  9.00    1.00    2.00

I need to move these over so that I have:

1.00  2.00  3.00    sysmis
4.00  5.00  sysmis  sysmis
6.00  7.00  sysmis  sysmis
8.00  9.00  1.00    2.00

In other words, move all the valid values into the beginning of the series, and leave the missing at the end.

Can you help? Thanks!

Susan

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
|
This should work, but only for four variables. If you have more, it would be best to create a macro or, better yet, wrap some Python around it.
do if sysmis(V1).
do if ~sysmis(V2).
compute V1=V2.
compute V2=$sysmis.
else if ~sysmis(V3).
compute V1=V3.
compute V3=$sysmis.
else if ~sysmis(V4).
compute V1=V4.
compute V4=$sysmis.
end if.
end if.

do if sysmis(V2).
do if ~sysmis(V3).
compute V2=V3.
compute V3=$sysmis.
else if ~sysmis(V4).
compute V2=V4.
compute V4=$sysmis.
end if.
end if.

do if sysmis(V3).
do if ~sysmis(V4).
compute V3=V4.
compute V4=$sysmis.
end if.
end if.
exe.
|
|
Thanks David. Yes I can see doing that with 4 variables, but I have several
series of more than 4 in a longitudinal study, a situation which I failed to make clear in the original posting. I struggled with vectors and did not quite get there, although I thought the way might lie in that direction. Don't think I can manage Python.

More ideas?

Thanks again, Susan
|
|
So the simple question was too simple, eh?
Try this:

vector new(4).
vector old= v1 to v4.
compute #i=1.
compute #j=1.
loop if #j< 5.
do if not(sysmis(old(#i))).
compute new(#j) = old(#i).
compute #j=#j+1.
end if.
compute #i=#i+1.
end loop.
|
|
ViAnn,
That did it! It gave a nasty-looking out of range error message, but the proof is in the pudding, and the numbers were in the right place.

Thank you immensely. Thank you also to David. I would be interested in seeing the Python solution, but will probably use this one for now since I am in quite a hurry. But there is a longer term to this study as well, and it would be good to learn Python. (Longitudinal studies are almost infinite, aren't they?)

Have good days everybody,
Susan
|
|
I think that a wide data structure is probably not the best way to maintain
your data in a longitudinal study, depending on the kind of analyses you are doing. Generally speaking, a narrow (long) structure is the easiest to work with. This is why time series data are usually organized so that cases are time points and columns are the variables being measured.
|
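To make the wide-versus-narrow point concrete, here is a minimal sketch in plain Python of turning wide rows into one record per case and time point. It is only an illustration: the function name `wide_to_long`, the record keys, and the use of `None` for system-missing are assumptions of this sketch, not anything from the posts above. Inside SPSS itself, the VARSTOCASES command is the native tool for this kind of restructuring.

```python
def wide_to_long(rows, names):
    """Turn wide rows (one list per case) into long records.

    None stands in for SPSS system-missing; missing cells are simply
    not emitted, so no left-shifting of values is needed afterwards.
    """
    long_rows = []
    for case_id, row in enumerate(rows, start=1):
        for time_index, (name, value) in enumerate(zip(names, row), start=1):
            if value is not None:
                long_rows.append(
                    {"case": case_id, "time": time_index,
                     "variable": name, "value": value}
                )
    return long_rows


# The four cases from the original question, with None for sysmis.
wide = [
    [1.0, None, 2.0, 3.0],
    [4.0, None, None, 5.0],
    [6.0, None, 7.0, None],
    [8.0, 9.0, 1.0, 2.0],
]
long_records = wide_to_long(wide, ["V1", "V2", "V3", "V4"])
```

In the long layout each valid measurement is its own record, so the "gaps" that had to be closed in the wide layout never appear.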
|
In reply to this post by Susan Elgie
At 10:17 AM 5/7/2008, Susan Elgie wrote:
>I have:

|-----------------------------|---------------------------|
|Output Created               |07-MAY-2008 15:47:16       |
|-----------------------------|---------------------------|
[Spacey]

    V1     V2     V3     V4

  1.00     .    2.00   3.00
  4.00     .      .    5.00
  6.00     .    7.00     .
  8.00   9.00   1.00   2.00

Number of cases read:  4    Number of cases listed:  4

>I need to move these over so that I have:
>1.00 2.00 3.00 sysmis
>4.00 5.00 sysmis sysmis
>6.00 7.00 sysmis sysmis
>8.00 9.00 1.00 2.00

As ViAnn Beadle writes, VECTOR/LOOP logic is good. It doesn't need a separate "new" vector, though; see the following. Variable "#Hold_It" is needed so the source value may be erased (made SYSMIS) without losing it. (Assigning the source value to the destination value directly and then erasing the source value would lose the value, when source and destination are the same.)

VECTOR datum=v1 TO v4.
COMPUTE #From = 1.
COMPUTE #To = 1.
LOOP #From = 1 TO 4.
. DO IF NOT MISSING(datum(#From)).
. COMPUTE #Hold_It = datum(#From).
. COMPUTE datum(#From) = $SYSMIS.
. COMPUTE datum(#To) = #Hold_It.
. COMPUTE #To = #To + 1.
. END IF.
END LOOP.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |07-MAY-2008 16:08:15       |
|-----------------------------|---------------------------|

    V1     V2     V3     V4

  1.00   2.00   3.00     .
  4.00   5.00     .      .
  6.00   7.00     .      .
  8.00   9.00   1.00   2.00

Number of cases read:  4    Number of cases listed:  4

(ViAnn's solution also works, but the loop doesn't terminate until MXLOOPS if the last value in the "old" list is missing.)

===================
APPENDIX: Test data
===================
DATA LIST LIST/
   V1 V2 V3 V4 (4F6.2).
* Copious warning messages: .
BEGIN DATA
1.00 sysmis 2.00 3.00
4.00 sysmis sysmis 5.00
6.00 sysmis 7.00 sysmis
8.00 9.00 1.00 2.00
END DATA.
DATASET NAME Spacey WINDOW=FRONT.
LIST.
|
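For readers who find the two-index logic easier to follow outside SPSS syntax, the same idea can be sketched in plain Python. This is an illustrative analogue only, not SPSS code: `None` stands in for SYSMIS, and the function name `compact_left` is made up for this sketch.

```python
def compact_left(row):
    """Move non-missing values to the front of the row, in order.

    Mirrors the VECTOR/LOOP logic: 'frm' scans every position, 'to'
    points at the next free slot, and the value is held in a temporary
    before the source cell is blanked, so nothing is lost when source
    and destination are the same cell.
    """
    values = list(row)          # work on a copy
    to = 0
    for frm in range(len(values)):
        if values[frm] is not None:
            held = values[frm]  # plays the role of #Hold_It
            values[frm] = None
            values[to] = held
            to += 1
    return values


rows = [
    [1.0, None, 2.0, 3.0],
    [4.0, None, None, 5.0],
    [6.0, None, 7.0, None],
    [8.0, 9.0, 1.0, 2.0],
]
shifted = [compact_left(r) for r in rows]
```

Because the scan index runs over a fixed range, the loop always ends after one pass per row, whatever the pattern of missing values.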
|
In reply to this post by Susan Elgie
Hello, everybody. Would collinearity (not a severe one) be less of a problem in a stepwise regression, since the variables are entered one at a time?
Thanks in advance for any thoughts on that.
Bozena

Bozena Zdaniuk, Ph.D.
University of Pittsburgh
UCSUR, 6th Fl.
121 University Place
Pittsburgh, PA 15260
Ph.: 412-624-5736
Fax: 412-624-4810
Email: [hidden email]
|
|
The stepwise process itself is sensitive to data problems, including
multicollinearity.

Paul R. Swank, Ph.D.
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center - Houston
|
|
In reply to this post by Zdaniuk, Bozena-2
Stepwise comes with additional baggage that dwarfs any problem you might
have with collinearity. DON'T USE IT.

Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more than an exact answer to an approximate question.' --a paraphrase of J. W. Tukey (1962)
|
|
In reply to this post by Zdaniuk, Bozena-2
I don't know that I would agree with never using stepwise methods, at least in an exploratory fashion. But no doubt, you can't take the results at face value.
However, stepwise does not help with high multi-collinearity, because in that situation you are dealing with two or more almost equivalent variables from the stepping point of view, and choosing one over another is nearly arbitrary and sensitive to very small differences. Forward and backward stepping helps a little, but it only really works if you don't care about the model.

HTH,
Jon Peck
|
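The "nearly arbitrary" choice can be illustrated with a toy sketch in plain Python. It assumes a deliberately simplified first step of forward selection, namely entering the predictor with the largest absolute correlation with the outcome, and ignores the F-to-enter thresholds a real stepwise procedure would apply. The data and the names `pearson` and `first_forward_step` are invented for this illustration.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_forward_step(predictors, y):
    """Name of the predictor with the largest |r| with the outcome."""
    return max(predictors, key=lambda name: abs(pearson(predictors[name], y)))


# Two almost identical predictors (they differ in one observation only).
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [1.0, 2.0, 3.0, 4.0, 5.2],
}

# Two outcomes that differ by 0.2 in a single observation.
y_a = [1.0, 2.0, 3.0, 4.0, 5.0]
y_b = [1.0, 2.0, 3.0, 4.0, 5.2]

choice_a = first_forward_step(predictors, y_a)
choice_b = first_forward_step(predictors, y_b)
```

With these made-up numbers a change of 0.2 in one outcome value is enough to flip which of the two predictors is entered first, which is the kind of sensitivity described above.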
|
In reply to this post by Richard Ristow
Can also use Python. Just specify the "highv" variable as the largest value for the set of V(some number) variables.
BEGIN PROGRAM.
import spss
spss.Submit(r"get file 'c:/test.sav'.")
lowv = 1
highv = 4
syntax = ""
for i in range(lowv,highv):
    syntax = syntax + "do if (sysmis(V%s)).\n" %(i)
    syntax = syntax + """do if (~sysmis(V%(next)s)).
compute V%(cur)s=V%(next)s.
compute V%(next)s=$sysmis.\n""" %{'cur':i, 'next':i+1}
    for j in range(i+2,highv+1):
        syntax = syntax + """else if (~sysmis(V%(next)s)).
compute V%(cur)s=V%(next)s.
compute V%(next)s=$sysmis.\n""" %{'cur':i, 'curmod':i+2, 'next':j}
        if j==highv:
            syntax = syntax + "end if.\nend if.\n"
syntax = syntax + 'end if.\nend if.\nexe.'
print syntax
spss.Submit(syntax)
END PROGRAM.
|
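The same generator idea can be tried outside SPSS. The sketch below is a standalone Python 3 variant that only builds the syntax text and returns it, so it can be inspected before anything is submitted. It is an assumption-laden illustration: the function name `build_shift_syntax` and the `prefix` parameter are invented here, and the `spss` module is not imported because it is available only inside an SPSS session.

```python
def build_shift_syntax(lowv, highv, prefix="V"):
    """Build unrolled DO IF syntax that shifts valid values left.

    For each target variable, emit one outer DO IF testing whether the
    target is missing, then one DO IF / ELSE IF branch per later
    variable that could supply a value.
    """
    lines = []
    for cur in range(lowv, highv):
        lines.append(f"do if (sysmis({prefix}{cur})).")
        for k, nxt in enumerate(range(cur + 1, highv + 1)):
            keyword = "do if" if k == 0 else "else if"
            lines.append(f"{keyword} (~sysmis({prefix}{nxt})).")
            lines.append(f"compute {prefix}{cur}={prefix}{nxt}.")
            lines.append(f"compute {prefix}{nxt}=$sysmis.")
        lines.append("end if.")  # closes the inner DO IF
        lines.append("end if.")  # closes the outer DO IF
    lines.append("exe.")
    return "\n".join(lines)


syntax = build_shift_syntax(1, 4)
print(syntax)
```

Within a BEGIN PROGRAM block the returned text could then be handed to `spss.Submit`, as in the post above. Printing it first makes it easy to check the generated commands for a longer series before running them.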
|
In reply to this post by Peck, Jon
With the availability of methods like LAR/LASSO, Bayesian model averaging, and penalized maximum likelihood methods---to name a few---I cannot think of any reason to use stepwise methods, even in any exploratory context. Of course, SPSS needs to play catch-up with R/S-Plus, Stata, and SAS in making these newer and better variable selection techniques available in SPSS.
Scott Millis
|
|
SPSS (the Leiden branch) is working on that. In Release 17, Ridge regression, the Lasso, and the Elastic Net will be CATREG options. Also, options are added for estimation of prediction error and model selection (.632 bootstrap and cross-validation).
CATREG performs linear regression (when applying the numeric optimal scaling level to all variables) and nonlinear regression (when applying scaling levels other than numeric: nominal, non-monotonic spline, ordinal, and monotonic spline). So, in CATREG 17 penalized regression is available for both linear AND NONLINEAR regression.

Regards,
Anita van der Kooij
Data Theory Group
Leiden University
|
|
In reply to this post by Katkowski, David
At 05:14 PM 5/7/2008, Katkowski, David wrote:
>Can also use Python.

Right. And the posted Python code works.

There are various ways of using Python in SPSS. This one uses Python as a macro processor analogous to SPSS's own macros, writing the loop as 'unrolled' code:

do if (sysmis(V1)).
do if (~sysmis(V2)).
compute V1=V2.
compute V2=$sysmis.
else if (~sysmis(V3)).
compute V1=V3.
compute V3=$sysmis.
else if (~sysmis(V4)).
compute V1=V4.
compute V4=$sysmis.
end if.
end if.
do if (sysmis(V2)).
do if (~sysmis(V3)).
compute V2=V3.
compute V3=$sysmis.
else if (~sysmis(V4)).
compute V2=V4.
compute V4=$sysmis.
end if.
end if.
do if (sysmis(V3)).
do if (~sysmis(V4)).
compute V3=V4.
compute V4=$sysmis.
end if.
end if.

One's preferred style in programming is very much a taste, and *de gustibus non disputandum est*. This is a comment from taste. I grant that I'm notorious for preferring native SPSS code, and that I've been twitted about it, with some justice.

However, my taste wouldn't be to use Python this way, for the same reason I wouldn't use a macro loop. In either case, you've got two different languages to read: the macro language (DEFINE or Python) and the target language. I think it's considerably less clear.

And a good macro processor (Python is excellent) makes it too easy to write complicated code. This Python logic has nested loops, emitting one block for each *target* variable; and within each of those, three lines for each *later source* variable. Code length goes roughly as the square of the number of variables processed.

So, I'd stick with native VECTOR/LOOP logic, as less, simpler, and cleaner code.

End of comments about taste. The Python code *does* work, just fine.
|
|
In reply to this post by Zdaniuk, Bozena-2
A stepwise method does not solve the problems caused by collinearity. Rather, with a stepwise method, collinearity makes the variable selection arbitrary. See Frank Harrell's book, "Regression Modeling Strategies."
Scott R Millis, PhD, MEd, ABPP (CN,CL,RP), CStat
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682

--- On Wed, 5/7/08, Zdaniuk, Bozena <[hidden email]> wrote:
> From: Zdaniuk, Bozena <[hidden email]>
> Subject: collinearity and stepwise regression
> To: [hidden email]
> Date: Wednesday, May 7, 2008, 4:33 PM
> Hello, everybody. Would collinearity (not a severe one) be less of a problem in a stepwise regression, since the variables are entered one at a time?
> Thanks in advance for any thoughts on that.
> Bozena
>
> Bozena Zdaniuk, Ph.D.
> University of Pittsburgh
> UCSUR, 6th Fl.
> 121 University Place
> Pittsburgh, PA 15260
> Ph.: 412-624-5736
> Fax: 412-624-4810
> Email: [hidden email]
|
|
How severe is the collinearity, exactly? If extreme, collinear variables could be collapsed using either PCA/factor analysis or by taking the mean of the z-transformed variables.

hth, alex

--
Alexander J. Shackman
Laboratory for Affective Neuroscience
Waisman Laboratory for Brain Imaging & Behavior
University of Wisconsin-Madison
1202 West Johnson Street
Madison, Wisconsin 53706
Telephone: +1 (608) 358-5025
FAX: +1 (608) 265-2875
EMAIL: [hidden email]
http://psyphz.psych.wisc.edu/~shackman
|
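The second suggestion in the message above (taking the mean of z-transformed variables) can be sketched in a few lines of Python. This is an illustration only, using the standard library; the function names `zscores` and `composite` are hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and complete data with no missing values is assumed.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize one variable: subtract its mean, divide by its sample SD."""
    centre = mean(values)
    spread = stdev(values)  # sample SD (n - 1); an assumption of this sketch
    return [(value - centre) / spread for value in values]


def composite(*columns):
    """Average the z-scores of several collinear variables, case by case."""
    standardized = [zscores(column) for column in columns]
    return [mean(case) for case in zip(*standardized)]


if __name__ == "__main__":
    income = [20, 35, 50, 65, 80]
    wealth = [100, 180, 240, 330, 400]
    print(composite(income, wealth))
```

The composite then replaces the collinear variables as a single predictor; whether that is substantively sensible depends on whether the variables measure the same underlying construct.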
|
I think the original question mentioned that collinearity was not severe. Having said that, if the number of variables is not very large, I suggest first reducing the collinearity to a satisfactory level, i.e. variance proportion coefficients <= .5, condition index < 30, and VIF < 10. After satisfying these criteria, proceed to final model selection. Another point to consider: if collinearity is not degrading and the purpose of the model is prediction, then the model should be fine. We know that if collinearity is severe, then hypothesis tests are seriously questionable.
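As a rough illustration of the VIF < 10 rule of thumb mentioned above, here is a minimal Python sketch for the simplest case of exactly two predictors, where the R-squared from regressing one predictor on the other is just the squared Pearson correlation. The function names are hypothetical, and the condition index and variance proportions are not computed here; with more than two predictors each VIF needs a full auxiliary regression, which SPSS reports directly.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    tolerance = 1.0 - pearson_r(x1, x2) ** 2
    return 1.0 / tolerance


def passes_vif_rule(vif, cutoff=10.0):
    """Rule of thumb from the post: VIF below the cutoff is acceptable."""
    return vif < cutoff


if __name__ == "__main__":
    x1 = [1, 2, 3, 4, 5]
    x2 = [2, 1, 4, 3, 5]
    vif = vif_two_predictors(x1, x2)
    print(round(vif, 3), passes_vif_rule(vif))
```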
Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
1789 W. Jefferson Street
Phoenix, AZ 85007
Tel: (602) 542-5639
E-mail: [hidden email]
|
|
In reply to this post by Zdaniuk, Bozena-2
Bozena,
Periodically, there are discussions of multicollinearity by the SPSS group, and I have always been meaning to chime in. I did have an e-mail discussion with Jon Peck of SPSS, who frequently contributes to these discussions, and I indicated that I thought multicollinearity was the biggest challenge in multivariate regression analysis. He disagreed, and indicated that causality is the biggest challenge. I might have replied that a major reason why causality is so challenging is multicollinearity.

With respect to the specific question about using stepwise regression to deal with multicollinearity, I believe that multicollinearity is precisely the reason one shouldn't use stepwise regression. If two variables are highly correlated, either directly (as shown by the simple correlation coefficient) or in a more complex fashion that would be identified using the Variance Inflation Factor, VIF_k = 1/(1 - Rk_sq), where Rk_sq is the R-squared obtained when the kth independent variable is regressed on all the others, then stepwise regression can create problems.

In stepwise regression, the first of two highly correlated variables may be entered into the regression under the default or a specified entry rule, but its inclusion can block the second variable from entering. If both are deemed theoretically important, say, the effect of both income and wealth on consumption, then it is likely that stepwise regression will first enter income. Then other variables in the model will come in using stepwise regression. Wealth, however, will likely be excluded from the regression.

Personally, I believe the best approach is to specify the model as carefully as possible (I reveal my background in economics), using theory, which in pragmatic terms can be viewed as a good story, and to use this prior knowledge to formulate the model.
I believe the t tests should be given more weight than multicollinearity indicators such as the VIF when revising the initial specification. For example, I have a student writing a paper who has two variables in a model with VIFs of around 37 or 38, yet both variables are statistically significant. Not well known is the fact that SPSS has a default tolerance level (tolerance is the reciprocal of the VIF) of .001, which is used to exclude variables. Yet, I have encountered situations in which a statistically significant variable was excluded because it didn't meet the tolerance criterion. I found this surprising, but, unfortunately, didn't save the results.

I believe the best summary statement on multicollinearity was by Jon Peck in the SPSS discussion group. He said: "There is no rule that tells you whether the correlations are too high (if less than perfect). The collinearity indications just tells you why your estimated variances are so high."

Consider the t-statistic for the kth variable, which may have a high VIF in a multiple regression analysis. For a two-tailed test in which the null hypothesis is that Bk = 0, t_calculated = Bhatk/SE(Bhatk). In the multiple regression context, SE(Bhatk) = SEE/sqrt(sum(xk_sq)*(1 - Rk_sq)), where sum(xk_sq) is the sum of the squared deviations of Xk about its mean value Xkbar. If one has a high Bhatk, a low SEE, and/or a high sum(xk_sq), the t can still be significant when the VIF is high. This is true even after accounting for the changes in Bhatk when the variable with which it is correlated enters the regression. (One can compute the changes in Bhatk using specification analysis.)

Until recently, I thought I understood a possible role for stepwise regression. SPSS permits one to enter variables one block at a time. In the Studenmund text, *Using Econometrics*, Chapter 8, "Multicollinearity," there is an example where one is trying to predict high school SAT scores.
The variables believed to be theoretically relevant are GPA, APMATH, and APENG (one can question the direction of causation here). And then there are other variables in the data set, whose importance is not known. So, I thought it would be interesting to first include the three theoretically relevant variables in Block 1, and use the Enter Method to bring them simultaneously into the regression. I included the remaining variables in Block 2, and used the Stepwise Method to enter these variables sequentially based on the default entry criterion.

For this high school, in which there were many Asian students for whom English was a second language (only a sample of observations was used), Race entered the regression in Block 2 and blocked English as a Second Language unless I changed the entry criterion. The t statistic for Race was about 1.0, and for English as a Second Language somewhat less than 1.0. The author of the book indicated that he and a group of experts agreed that, because of their institutional knowledge of the situation, English as a Second Language should be included in the final regression (despite the low t statistic) and Race excluded. If one accepts this view, stepwise regression doesn't work in this situation either.

I would be interested in hearing more comments about multicollinearity.

Greg
|
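For readers who prefer typeset notation, the standard-error argument in Greg's message can be restated as below. This is only a restatement of the formulas given in the message, with VIF_k substituted for 1/(1 - R_k^2); the comparison in the last line is a thought experiment that holds SEE and the sum of squared deviations fixed, which is an assumption of this sketch and not something that happens automatically in real data.

```latex
\begin{align*}
t_k &= \frac{\hat{B}_k}{SE(\hat{B}_k)}, \qquad
\mathrm{VIF}_k = \frac{1}{1 - R_k^2} \\[4pt]
SE(\hat{B}_k) &= \frac{SEE}{\sqrt{\sum x_k^2 \,\bigl(1 - R_k^2\bigr)}}
             = \frac{SEE}{\sqrt{\sum x_k^2}} \,\sqrt{\mathrm{VIF}_k} \\[4pt]
\mathrm{VIF}_k = 37 \;&\Rightarrow\; \sqrt{\mathrm{VIF}_k} \approx 6.08
\end{align*}
```

So a VIF of about 37 inflates the standard error by a factor of roughly 6 relative to the case R_k^2 = 0 with the same SEE and sum of squares. The t statistic shrinks by the same factor, which is why a large enough coefficient, a small enough SEE, or enough variation in X_k can still leave it significant.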
