|
I am trying (unsuccessfully) to write SPSS syntax to flag consecutive cases of zeroes in a large data set. For example, in a file of 35,000 cases, I need to find (and eventually delete) sequences in which 120 zeros appear, one after another. Then I need to delete them. All other zeros need to remain in the file.

Can anyone suggest command language that would help me do this? I've tried using the LAG function, but I can't get it to work.

Many thanks!
Simon Marshall.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
|
|
I'm assuming, based on your attempt to use LAG, that the sequence is across cases. Is it for a single variable?

Use the LEAVE command to retain the value from the prior case, and increment a counter variable whenever the variable in the current case equals 0.

* Flag every case at which the running count of zeros has reached 120.
COMPUTE deletecase=0.
IF (x eq 0) zerocounter=zerocounter+1.
IF (x ne 0) zerocounter=0.
IF (zerocounter ge 120) deletecase=1.
* LEAVE retains zerocounter across cases and initializes it to 0.
LEAVE zerocounter.
EXECUTE.
* Keep the unflagged cases, which deletes the flagged ones.
SELECT IF (deletecase eq 0).
...

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Simon Marshall
Sent: Friday, November 30, 2007 1:48 PM
To: [hidden email]
Subject: syntax to find n consecutive cases of zeroes.

I am trying (unsuccessfully) to write spss syntax to flag consecutive cases of zeroes in a large data set. For example, in a file of 35,000 cases, I need to find (and eventually delete) sequences in which 120 zeros appear, one after another. Then I need to delete them. All other zeros need to remain in the file.

Can anyone suggest command language that would help me do this? I've tried using the LAG function, but I can't get it to work.

Many thanks!
Simon Marshall.
|
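To see which cases a running counter of this kind actually flags, here is a minimal sketch in Python rather than SPSS. The function name `flag_from_nth_zero` and the sample data are made up for illustration, and a threshold of 3 stands in for 120. It suggests that the flag only switches on at the n-th zero of a run, so the first n-1 zeros of a long run are left unflagged.

```python
def flag_from_nth_zero(values, n):
    """Return one flag per value: True once the running count of
    consecutive zeros has reached n (mirrors zerocounter/deletecase)."""
    flags = []
    zerocounter = 0  # carried from case to case, like a LEAVE variable
    for x in values:
        # Increment on a zero, reset on anything else.
        zerocounter = zerocounter + 1 if x == 0 else 0
        flags.append(zerocounter >= n)
    return flags


# Hypothetical data: one run of five zeros and one run of two zeros.
data = [5, 0, 0, 0, 0, 0, 7, 0, 0]
print(flag_from_nth_zero(data, 3))
# -> [False, False, False, True, True, True, False, False, False]
```

With the five-zero run above, only its third, fourth and fifth zeros are flagged. Removing the whole run needs the run's total length on every case.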
|
In reply to this post by Simon Marshall
At 03:47 PM 11/30/2007, Simon Marshall wrote:
>I am trying (unsuccessfully) to write spss syntax to flag
>consecutive cases of zeroes in a large data set. For example, I need
>to find sequences in which 120 zeros appear, one after another. Then
>I need to delete them. All other zeros need to remain in the file.

LAG can't attach the length of the *whole* run of 0s to each record, because it can't 'know', at the beginning of the run, how long the run is. To summarize the whole run in one record, use AGGREGATE, like this (SPSS 14 draft output; not saved separately).

It counts the lengths of runs of NON-zeroes as well as runs of zeroes (in variable DATUM), but that shouldn't be a problem. It does NOT handle missing values of DATUM.

For the selection you're interested in, run the code below, and then

SELECT IF (DATUM EQ 0) AND RunLen GE 120.

Note that this keeps only the long runs of zeroes. To delete them from the file instead, negate the condition:

SELECT IF NOT ((DATUM EQ 0) AND (RunLen GE 120)).

Test dataset, followed by code:

|-----------------------------|---------------------------|
|Output Created               |30-NOV-2007 19:05:52       |
|-----------------------------|---------------------------|
 ID  DATUM

  1   1.23
  2    .00
  3    .00
  4   4.56
  5   7.89
  6    .00
  7   9.87
  8   6.54
  9   3.21
 10    .00
 11    .00
 12    .00
 13    .00
 14    .00
 15   2.46
 16    .00
 17    .00

Number of cases read:  17    Number of cases listed:  17

NUMERIC Run# (F3) /* Count, runs of 0s (or non-0s) */.
NUMERIC SoFar(F5) /* Length of run, up to this point */.

DO IF    $CASENUM EQ 1.
.  COMPUTE Run#  = 1.
.  COMPUTE SoFar = 1.
ELSE IF  (DATUM EQ 0) EQ (LAG(DATUM) EQ 0) /* Cute, huh? */.
.  COMPUTE Run#  = LAG(Run#).
.  COMPUTE SoFar = LAG(SoFar) + 1.
ELSE.
.  COMPUTE Run#  = LAG(Run#) + 1.
.  COMPUTE SoFar = 1.
END IF.

AGGREGATE OUTFILE=* MODE  =AddVariables
   /Break  =Run#
   /RunLen =MAX(SoFar).

LIST.
List
|-----------------------------|---------------------------|
|Output Created               |30-NOV-2007 19:05:52       |
|-----------------------------|---------------------------|
 ID  DATUM Run# SoFar RunLen

  1   1.23    1     1      1
  2    .00    2     1      2
  3    .00    2     2      2
  4   4.56    3     1      2
  5   7.89    3     2      2
  6    .00    4     1      1
  7   9.87    5     1      3
  8   6.54    5     2      3
  9   3.21    5     3      3
 10    .00    6     1      5
 11    .00    6     2      5
 12    .00    6     3      5
 13    .00    6     4      5
 14    .00    6     5      5
 15   2.46    7     1      1
 16    .00    8     1      2
 17    .00    8     2      2

Number of cases read:  17    Number of cases listed:  17

===================
APPENDIX: Test data
===================
DATA LIST LIST
   /ID DATUM.
BEGIN DATA.
 1 1.23
 2 0
 3 0
 4 4.56
 5 7.89
 6 0
 7 9.87
 8 6.54
 9 3.21
10 0
11 0
12 0
13 0
14 0
15 2.46
16 0
17 0
END DATA.
FORMATS ID (F3) DATUM (F6.2).
LIST.
|
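As a cross-check on the run-numbering idea in the post above, here is a short sketch in Python rather than SPSS. It groups consecutive values by whether they are zero and writes each group's length back to every member, which is roughly what Run# plus AGGREGATE with MAX(SoFar) achieves. The function name `run_lengths` is made up for illustration; the data are the 17 test values from the appendix, and the threshold of 5 is an arbitrary stand-in for 120.

```python
from itertools import groupby


def run_lengths(values):
    """For each value, the length of the run of zeros (or of non-zeros)
    that it belongs to."""
    lengths = []
    # groupby starts a new group whenever the "is zero" key changes,
    # which plays the role of incrementing Run#.
    for _, run in groupby(values, key=lambda v: v == 0):
        n = len(list(run))
        lengths.extend([n] * n)  # every member gets the whole run's length
    return lengths


datum = [1.23, 0, 0, 4.56, 7.89, 0, 9.87, 6.54, 3.21,
         0, 0, 0, 0, 0, 2.46, 0, 0]
runlen = run_lengths(datum)
print(runlen)
# -> [1, 2, 2, 2, 2, 1, 3, 3, 3, 5, 5, 5, 5, 5, 1, 2, 2]

# Delete runs of at least 5 zeros; shorter runs of zeros stay in the data.
kept = [v for v, n in zip(datum, runlen) if not (v == 0 and n >= 5)]
print(len(kept))
# -> 12
```

The printed lengths agree with the RunLen column in the listing above.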
|
In reply to this post by ViAnn Beadle
In addition to an earlier posting, and under the same proviso that you are looking for cases with zero in a particular variable, here is another answer.

Perhaps you want to do other things with the sequence: for instance, remove the whole sequence of n=120 consecutive cases with zeroes in a particular variable, including the first 119; or remove all sequences with at least n=120 zeroes. That is more difficult, since when the run of zeroes starts you do not yet know how long it will be. SPSS only knows after having seen the whole run.

There are several ways of solving the problem, but I think they always involve passing the file more than once. This is one:

1 Make a permanent variable zerocounter that (like ViAnn did) is incremented when the current case is zero, and set to zero otherwise. It tells you the number of consecutive zeroes up to and including the current case.
2 Make a permanent variable number equal to $CASENUM.
3 Sort cases in descending order of 'number'. Now you meet each sequence of zeroes with its last case, the one with the highest zerocounter, in front.
4 Make a permanent variable runlength that is equal to zero if zerocounter is zero, and equal to maximum(zerocounter, lag(runlength)) otherwise. Now every case carries the length of the run of zeroes it belongs to, its rank number within that run, and its rank number in the original order of the file.
5 Now you can do many things. You can, in particular, delete whole runs of 120 or more zeroes.
6 Drop runlength and zerocounter.
7 Sort, if you want, the cases back into their original order.

Regards,
Peter Das
Netherlands
|
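The seven steps above can be sketched as follows, in Python rather than SPSS. This is only an illustration under stated assumptions: the function names `zero_run_lengths` and `drop_long_zero_runs` are made up, the sample data are hypothetical, and walking the list backwards by index stands in for sorting in descending order and sorting back again.

```python
def zero_run_lengths(values):
    """For each value, the length of the run of zeros it belongs to
    (0 for non-zero values)."""
    # Pass 1 (step 1): running count of consecutive zeros.
    zerocounter = []
    count = 0
    for x in values:
        count = count + 1 if x == 0 else 0
        zerocounter.append(count)

    # Pass 2 (steps 3-4): visit the cases in reverse order, so the last
    # case of each run, which holds the full count, is met first.
    runlength = [0] * len(values)
    previous = 0  # runlength of the case visited just before this one
    for i in range(len(values) - 1, -1, -1):
        if zerocounter[i] == 0:
            runlength[i] = 0
        else:
            runlength[i] = max(zerocounter[i], previous)
        previous = runlength[i]
    return runlength


def drop_long_zero_runs(values, n):
    """Step 5: delete every run of at least n zeros, keep all else."""
    runlength = zero_run_lengths(values)
    return [x for x, r in zip(values, runlength) if r < n]


# Hypothetical data with zero runs of length 2, 3 and 1.
data = [1, 0, 0, 3, 0, 0, 0, 4, 0]
print(zero_run_lengths(data))
# -> [0, 2, 2, 0, 3, 3, 3, 0, 1]
print(drop_long_zero_runs(data, 3))
# -> [1, 0, 0, 3, 4, 0]
```

Because the second pass indexes backwards instead of physically re-sorting, the original order is preserved and step 7 has no counterpart here.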
