Administrator
|
"Another somewhat mysterious aspect of David's method is that LAG(SEQ) can be used on the right side of the COMPUTE statement that is bringing SEQ into existence as a variable. That this can be done may not be intuitively obvious! ;-) "
MAGIC!!! Obvious is BORING! I really don't expect novices to get it and could really care less if they do!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
One of the things that David's solutions use is what I'd call "nesting" and "recursion". I'm curious if spss has limits on either of those elements. For nesting, I think the question is how many sets of parentheses can be accommodated and for recursion how many times can the same function be called.
Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso Sent: Saturday, December 07, 2013 5:14 PM To: [hidden email] Subject: Re: Computing variables based on multiple rows in a tall-format file "Another somewhat mysterious aspect of David's method is that LAG(SEQ) can be used on the right side of the COMPUTE statement that is bringing SEQ into existence as a variable. That this can be done may not be intuitively obvious! ;-) " MAGIC!!! Obvious is BORING! I really don't expect novices to get it and could really care less if they do! Bruce Weaver wrote > I / > mostly / > agree with Art on this one, partly because there are some aspects of > David's one-liner that are a bit mysterious, especially to novices, I > expect. E.g., on Row 1, LAG(SEQ) and LAG(ID) both return SYSMIS. The > reason why David's code works is that those SYSMIS values are > appearing as arguments to the SUM function, and SUM returns a valid > result if at least one argument has a valid value--see the demo below. > > Another somewhat mysterious aspect of David's method is that LAG(SEQ) > can be used on the right side of the COMPUTE statement that is > bringing SEQ into existence as a variable. That this can be done may > not be intuitively obvious! ;-) > > Re Richard's DO-IF, I prefer to have both conditions that result in > SEQ = > 1 on a single conditional statement with an OR. And perhaps two > separate IF statements would be even easier for novice users to > understand. See below. > > > NEW FILE. > DATASET CLOSE all. > DATA LIST FREE / ID. > BEGIN DATA > 1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5 END DATA. > > * David's one-liner. > > COMPUTE SEQ1=SUM(1,LAG(SEQ1)*(LAG(ID) EQ ID)). > > * Variation on Richard's DO-IF. > * The two conditions for which SEQ2 = 1 are combined with OR. > > DO IF ($CASENUM EQ 1) OR (ID NE LAG(ID)) . > . COMPUTE SEQ2 = 1. > ELSE. > . COMPUTE SEQ2 = LAG(SEQ2) + 1. > END IF. > > * Two IF statements. > * This is somewhat less efficient in machine time than DO-IF, > * but possibly easier for the novice user to understand. > > IF ($CASENUM EQ 1) OR (ID NE LAG(ID)) SEQ3 = 1. > IF MISSING(SEQ3) SEQ3 = LAG(SEQ3) + 1. > > FORMATS ID SEQ1 to SEQ3(f5.0). > LIST. > > * The slightly mysterious thing about David's one-liner > * is that the result is not SYSMIS on Row 1, because > * on Row 1, LAG(SEQ1) and LAG(ID) both return SYSMIS. > > COMPUTE LagID = LAG(ID). > COMPUTE LagSEQ1 = LAG(SEQ1). > FORMATS LagID LagSEQ1(F5.0). > LIST. > > Output: > ID SEQ1 SEQ2 SEQ3 LagID LagSEQ1 > > 1 1 1 1 . . > 1 2 2 2 1 1 > 1 3 3 3 1 2 > 1 4 4 4 1 3 > etc. > > * The reason David's code does not return SYSMIS as a > * result is that those missing values appear within > * the SUM function: SUM will return a valid result > * if at lest one of the arguments is valid. If you > * compute a sum using plus signs, on the other hand, > * all variables must be valid. > > NEW FILE. > DATASET CLOSE all. > DATA LIST LIST / V1 to V3 (3f1). > BEGIN DATA > 1 2 3 > 1 2 . > 1 . . > . . . > END DATA. > > COMPUTE SumViaSUM = SUM(V1 to V3). > COMPUTE SumViaPlus = V1 + V2 + V3. > FORMATS SumViaSUM SumViaPlus (F5.0). > LIST. > > Output: > V1 V2 V3 SumViaSUM SumViaPlus > > 1 2 3 6 6 > 1 2 . 3 . > 1 . . 1 . > . . . . . > > Number of cases read: 4 Number of cases listed: 4 > > David Marso wrote >> Assumptions of my one liner. >> Something is equal to something else or it isn't (true=1 false=0). >> Multiplication of X by 0 = 0, by 1 = X >> 0+1 = 1, X+1 = X+1. >> Anybody having a problem with this might ponder their choice of >> careers or majors? >> >> Art Kendall wrote >>> <Don flame shields!> >>> Your solution does require fewer lines and characters in >>> the syntax, and most likely fewer internal operations. >>> However, I deny your assertion that the one line of code is >>> "reasonably intuitive". >>> It would help beginners to see and understand Rich's solution >>> and then see that it can be expressed more compactly. >>> >>> Your solution is "reasonably intuitive" only for people with at >>> least a moderate amount of experience at some computer >>> languages. >>> >>> However, Rich's do repeat syntax is much easier for people to >>> read. >>> It is my impression that many posts from this list are from >>> beginners. >>> It is also my impression that people searching the archives are >>> beginners. >>> >>> >>> The easier it is for people to read and understand the syntax >>> the easier it is to communicate the process to other people, >>> e.g., classmates working as peer reviewers, on the job QA >>> reviewers, triers-of-fact, archive users, process maintainers >>> and up-daters. >>> >>> Soapbox: Efficiency in terms of cognitive load trumps saving >>> storage space for syntax and often trumps saving small amounts >>> of processing time. Marginal labor cost is a greater >>> consideration than marginal cost of machine resources. >>> >>> A rhetorical question: How much computer time would it >>> take to >>> do David's solution compared to the computer time to do Rich's >>> solution? How many times of running the same code >>> would it take >>> to actually get a measurable difference in computer time, e.g., >>> 10 seconds? >>> >>> <remove flame shield.> >>> >>> >>> Art Kendall >>> Social Research Consultants >>> On 12/5/2013 6:54 AM, David Marso [via SPSSX Discussion] wrote: >>> >>> I don't see why people use that ponderous DO IF >>> $CASENUM = 1 .... ELSE blah blah blah (7 lines more or less) >>> approach when a counter can be build with !!!!! >>> >>> >>> COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)). >>> >>> >>> DATA LIST FREE / ID. >>> >>> BEGIN DATA >>> >>> 1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5 >>> >>> END DATA. >>> >>> COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)). >>> >>> LIST. >>> >>> >>> ID SEQ >>> >>> >>> 1.00 1.00 >>> 1.00 2.00 >>> 1.00 3.00 >>> 1.00 4.00 >>> 1.00 5.00 >>> 1.00 6.00 >>> 2.00 1.00 >>> 2.00 2.00 >>> 2.00 3.00 >>> 3.00 1.00 >>> 3.00 2.00 >>> 3.00 3.00 >>> 3.00 4.00 >>> 3.00 5.00 >>> 3.00 6.00 >>> 3.00 7.00 >>> 4.00 1.00 >>> 4.00 2.00 >>> 4.00 3.00 >>> 5.00 1.00 >>> 5.00 2.00 >>> 5.00 3.00 >>> 5.00 4.00 >>> >>> >>> >>> >>> Number of cases read: 23 Number of cases >>> listed: 23 >>> >>> >>> >>> Michael >>> Cohn wrote >>> I think this >>> solves all my problems! Many thanks to Andy and Richard for >>> their help. I wasn't familiar with the LAG and AGGREGATE >>> functions in SPSS but now I know what to start learning >>> about. >>> >>> >>> - Michael >>> >>> >>> ---------------------------------- >>> >>> Michael A. Cohn, PhD >>> >>> [hidden >>> email] >>> Osher Center for Integrative Medicine >>> >>> University of California, San Francisco >>> >>> >>> From: "Richard Ristow [via SPSSX Discussion]" < [hidden >>> email] <mailto: [hidden >>> email] >> >>> >>> Date: Wednesday, December 4, 2013 at 14:56 >>> >>> To: Michael Cohn < [hidden >>> email] <mailto: [hidden >>> email] >> >>> >>> Subject: Re: Computing variables based on multiple rows in a >>> tall-format file >>> >>> >>> At 03:23 PM 12/4/2013, Michael Cohn wrote: >>> >>> >>> >My data file is currently in "tall" format (one row per >>> measurement >>> >>> >per participant). Is there a way to generate variables >>> in each >>> >>> >record that are based on information in that user's >>> other records? >>> >>> >>> You'll get many answers; the fact is, all of these are quite >>> easy. >>> >>> The code I'm posting (not tested) assumes variables >>> >>> >>> PcptID -- Participant identifier >>> >>> Date -- Date stamp >>> >>> Outcome -- Outcome value >>> >>> >>> neither of the first two are ever missing; and your file is >>> sorted in >>> >>> ascending order on the first two. >>> >>> >>> >* A sequential index variable based on the record's >>> datestamp (i.e., >>> >>> >number a participant's responses 1, 2, 3... in >>> chronological order). >>> >>> >>> Various ways; here's a simple one, using transformation >>> language: >>> >>> >>> NUMERIC VisitSeq (F4). >>> >>> >>> DO IF $CASENUM EQ 1. >>> >>> . COMPUTE VisitSeq = 1. >>> >>> ELSE IF PcptID NE LAG(PcptID). >>> >>> . COMPUTE VisitSeq = 1. >>> >>> ELSE. >>> >>> . COMPUTE VisitSeq = LAG(VisitSeq) + 1. >>> >>> END IF. >>> >>> >>> >>> >* The length of time between the record and the earliest >>> record for that >>> >>> >participant >>> >>> >* The difference between the outcome variable in the >>> record and the minimum >>> >>> >value ever recorded for that participant. >>> >>> >>> In both cases, start by putting the minimum value for the >>> participant >>> >>> in every record for that participant, and then it's easy: >>> >>> >>> AGGREGATE OUTFILE=* MODE=ADDVARIABLES >>> >>> /BREAK=PcptID >>> >>> /Earliest 'Date of earliest record for >>> participant' = >>> MIN(Date) >>> >>> /MinOut 'Lowest outcome value for >>> participant' = >>> MIN(Outcome). >>> >>> >>> >It's easy to do these using a spreadsheet or a python >>> script ... >>> >>> >>> Actually, I think it's probably easier in long ('tall') form >>> in >>> >>> native SPSS than in either of those two. >>> >>> >>> -Best of luck, >>> >>> Richard >>> >>> >>> ===================== >>> >>> To manage your subscription to SPSSX-L, send a message >>> to >>> >>> [hidden >>> >>> email]</user/SendEmail.jtp?type=node&node=5723438&i=0> >>> (not to SPSSX-L), with no body text except the >>> >>> command. To leave the list, send the command >>> >>> SIGNOFF SPSSX-L >>> >>> For a list of commands to manage subscriptions, send the >>> command >>> >>> INFO REFCARD >>> >>> >>> >>> ________________________________ >>> >>> If you reply to this email, your message will be added to >>> the discussion below: >>> >>> >>> http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-based-on-multiple-rows-in-a-tall-format-file-tp5723431p5723438.html >>> To unsubscribe from Computing variables based on multiple >>> rows in a tall-format file, click here< >>> http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.j >>> tp?macro=unsubscribe_by_code&node=5723431&code=Y29obm1Ab2Npb >>> S51Y3NmLmVkdXw1NzIzNDMxfC03OTA5NjYwNDY=> >>> . >>> >>> NAML< >>> http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.j >>> tp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml& >>> base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template. >>> NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrum >>> bs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble% >>> 3Aemail.naml-send_instant_email%21nabble%3Aemail.naml> >>> >>> >>> Please >>> reply to the list and not to my personal email. >>> >>> Those desiring my consulting or training services please feel >>> free to email me. >>> >>> --- >>> >>> "Nolite dare sanctum canibus neque mittatis margaritas vestras >>> ante porcos ne forte conculcent eas pedibus suis." >>> >>> Cum es damnatorum possederunt porcos iens ut salire off >>> sanguinum cliff in abyssum?" >>> >>> >>> >>> >>> If you reply to this email, your >>> message will be added to the discussion below: >>> >>> http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-ba >>> sed-on-multiple-rows-in-a-tall-format-file-tp5723431p5723446.html >>> >>> >>> To start a new topic under SPSSX Discussion, email >>> >>> ml-node+s1045642n1068821h68@.nabble >>> >>> To unsubscribe from SPSSX Discussion, click >>> here . >>> NAML ----- Please reply to the list and not to my personal email. Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-based-on-multiple-rows-in-a-tall-format-file-tp5723431p5723500.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
I'm pretty sure the notion of recursion doesn't apply.
I can't think of where recursion would fit into standard SPSS syntax since aside from python SPSS doesn't really have functions except in certain uses of macro (but then I think of recursion as a primarily mathematical concept and MACRO doesn't know anything about math). I would be very interested in seeing a recursive method implemented in SPSS particularly in a macro. An example of recursion is say the Fibonacci sequence 1 1 2 3 5 8 13 21 .. such that Fib(n)=Fib(n-1)+Fib(n-2) Fib(1)=Fib(2)=1 or Factorial(N)=Factorial(N-1)*N I do however use nested functions to a great degree in my coding. I have yet to encounter any sort of misbehavior in standard syntax. MATRIX does have a nesting limit of 6 . If you blow past that you are rewarded with the following useless directive (useless because technical support will likely not be of any help whatsoever since the message is in all likelihood on the money ;-) >Error # 12540 >Parser stack overflow. The possible reason: nesting level is more than 6. A >program error has occurred: Please note the circumstances under which this >error occurred, attempting to replicate it if possible, and then notify >Technical Support. >Execution of this command stops. The MATRIX statement skipped. This resulted from the following monstrosity which I doubt I would ever push MATRIX. COMPUTE X=T({1,1,1,1,1;1,2,3,4,5;6,7,8,9,11}). COMPUTE Boogie1=DET(DET(INV(DET(INV(T(X)*X))*T(X)*X))*T(X)*X). PRINT Boogie1. COMPUTE Boogie2=DET(DET(DET(INV(DET(INV(T(X)*X))*T(X)*X))*T(X)*X)*T(X)*X). PRINT Boogie2. END MATRIX.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Maguin, Eugene
Dear Gene,
At 10:06 AM 12/9/2013, you wrote (in thread "Computing variables based on multiple rows in a tall-format file"): >One of the things that David's solutions use is what I'd call >"nesting" and "recursion". You did spark quite a discussion, with some real illumination on the working of macros -- I hope you've been enjoying it. I would never have imagined that the macro system's memory management was sophisticated enough to allow true recursive macros. Nor that they'd be as useful as it looks like they are. Happy New Year! -Best wishes, Richard ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |