Dear all,
I wanted to demonstrate to a colleague what happens if you match two files containing a variable with the same name but different contents (v1 in my example). However, the second example renders v1 as a column of system missing values. I can obviously preclude this by inserting an extra 'exe.', but what I'd like to know is why I need this. I mean, SPSS should first 'execute' [comp v1=2] and only then match the files, right? Why does the first example work while the second doesn't?

TIA,
Ruben

*example 1, normal.
datas clo all.
data list free/id.
begin data
1 2 3 4 5 6 7 8 9 10
end data.
datas nam d1.
comp v1=1.
data list free/id.
begin data
1 2 3 4 5 6 7 8 9 10
end data.
datas nam d2.
comp v1=2.
matc fil fil d1 /fil d2 /by id.
exe.

*example 2, system missings.
datas clo all.
data list free/id.
begin data
1 2 3 4 5 6 7 8 9 10
end data.
datas nam d1.
comp v1=1.
data list free/id.
begin data
1 2 3 4 5 6 7 8 9 10
end data.
datas nam d2.
comp v1=2.
*exe.
matc fil fil d2 /fil d1 /by id.
exe.
(Raynald, I'm copying you because this appears to be one, or perhaps two,
situations that require EXECUTE and are not covered in the section "Use
EXECUTE Sparingly" of Programming and Data Management for SPSS.)
At 03:51 AM 1/13/2010, Ruben van den Berg wrote:

>[The following example] renders v1 as a column of system missing values:

Output Created   14-JAN-2010 14:07:44

[d1B]
      id
    1.00
    2.00
    3.00
    4.00
    5.00
Number of cases read:  5    Number of cases listed:  5

Output Created   14-JAN-2010 14:07:44

[d2B]
      id
    1.00
    2.00
    3.00
    4.00
    5.00
Number of cases read:  5    Number of cases listed:  5

DATASET ACTIVATE d1B.
COMPUTE v1=1.
DATASET ACTIVATE d2B.
COMPUTE v1=2.
MATCH FILES FILE d2B
  /FILE d1B
  /by id.
LIST.

List
Output Created   14-JAN-2010 14:07:45

      id       v1
    1.00        .
    2.00        .
    3.00        .
    4.00        .
    5.00        .
Number of cases read:  5    Number of cases listed:  5

>I can preclude this by inserting an extra 'execute.' [before the
>MATCH FILES, which gives v1=2], but I'd like to know why I need
>this. I mean, SPSS should first 'execute' [comp v1=2] and only then
>match the files, right?

The reason, I think, is that SPSS doesn't do the 'EXECUTE' [that is, it doesn't run the pending transformations before MATCH FILES reads the data]. The reason for that is mostly good. Running the transformation program only when a procedure needs the data means that program and procedure, together, need only a single pass through the data. Running the program first (as happens when you use 'EXECUTE') requires three data passes: read the data, apply the transformation program, and write the data back; then read the data again to feed it to the procedure.(1)

The example, as written, tries to run two transformation programs in parallel: the one for d1B that calculates v1=1, and the one for d2B that calculates v1=2. Apparently SPSS can't run two transformation programs at once; it would, indeed, be difficult to support.
So, this is one of the (unusual) cases that won't run without EXECUTE.

By the by, and more mysterious to me, the following code also doesn't work without the EXECUTE(2):

* Add a serial number for instances within organizations: .
NUMERIC Instance (F3).
DO IF $CASENUM EQ 1.
.  COMPUTE Instance = 1.
ELSE IF org_id NE LAG(org_id).
.  COMPUTE Instance = 1.
ELSE.
.  COMPUTE Instance = LAG(Instance) + 1.
END IF.
EXECUTE.

* 'Unroll': .
VARSTOCASES
  /MAKE Value "Vbl's value" FROM q1 q2
  /INDEX = Name "Vbl's name" (Value)
  /KEEP = org_id Instance
  /NULL = KEEP.
.............................
(1) Versions of SPSS that use the Virtual Active File, I think 12.5 and beyond, may need fewer data passes.
(2) From posting
Date:    Mon, 23 Nov 2009 13:12:05 -0500
From:    Richard Ristow <[hidden email]>
Subject: Re: Repeat Analysis of the data.
To:      [hidden email]

=============================
APPENDIX: Test data, and code
=============================
data list free/id.
begin data
1 2 3 4 5
end data.
DATASET NAME d1B.
LIST.

data list free/id.
begin data
1 2 3 4 5
end data.
DATASET NAME d2B.
LIST.

DATASET ACTIVATE d1B.
COMPUTE v1=1.
DATASET ACTIVATE d2B.
COMPUTE v1=2.

MATCH FILES FILE d2B
  /FILE d1B
  /by id.
LIST.
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
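Richard's single-pass point is easiest to see in the ordinary case, where no EXECUTE is needed at all. The sketch below is a minimal illustration. COMPUTE is only queued, and the next procedure reads the data and applies the queued transformation in the same pass. LIST is used only because it prints the values; any procedure should behave the same way.

```spss
* Minimal sketch: pending transformations run when a procedure reads the data.
DATA LIST FREE /id.
BEGIN DATA
1 2 3 4 5
END DATA.
* COMPUTE is a transformation: it is queued, not run yet.
COMPUTE v1=2.
* LIST is a procedure: it reads the data and applies the queued COMPUTE in the same pass.
LIST.
```

Here v1 should come out as 2 for every case without any EXECUTE. The MATCH FILES example differs because MATCH FILES itself has to read a dataset whose transformations are still queued.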
Dear Richard and others,
Thanks for sharing your thoughts on this! However, I insist that both syntax examples should work from a logical point of view. Whether I get a column of system missings should not depend on which dataset is specified first in the "matc fil" command. My original syntax included an 'exe' after match files, so why doesn't this 'exe' start by carrying out the computation of v1 in d2 (which was requested before the match files command)?

>The example, as written, tries to run two transformation programs in parallel: the one for d1B that calculates v1=1, and
>the one for d2B that calculates v1=2.

I'm not sure about this: when the second data list command has been run, transformations on the first data set are carried out (or at least this seems so).

I pasted below another example where an extra 'execute' is needed, but I find this one more understandable since it involves the system variable $casenum, which probably changes during the 'sel if' command.

Kind regards and have a nice weekend!

*Doesn't work.
datas clo all.
data list free/v1.
begin data
1 2 3 4 5 6 7 8 9 10
end data.
comp id=$casenum.
sel if id ne 1.
exe.

*Works.
datas clo all.
data list free/v1.
begin data
1 2 3 4 5 6 7 8 9 10
end data.
comp id=$casenum.
exe.
sel if id ne 1.
exe.

Date: Thu, 14 Jan 2010 14:36:52 -0500
From: [hidden email]
Subject: Re: Run pending transformations
To: [hidden email]
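A possible reading of the $casenum case follows; it is my understanding, not confirmed in this thread. $CASENUM numbers cases as they are kept. When SELECT IF runs in the same pass as the COMPUTE, a rejected case does not use up a number, so the next case again sees $CASENUM = 1. Every case would then get id = 1 and be dropped.

If the goal is only to keep case 1 out of later procedures, FILTER avoids the intermediate EXECUTE, because it hides cases without deleting them. The sketch below assumes that use. The variable name keep is an illustrative choice.

```spss
* Hedged sketch: exclude case 1 without deleting cases, so $CASENUM is unaffected.
DATASET CLOSE ALL.
DATA LIST FREE /v1.
BEGIN DATA
1 2 3 4 5 6 7 8 9 10
END DATA.
COMPUTE id=$CASENUM.
* keep (illustrative name) is 0 for the first case and 1 for all others.
COMPUTE keep=(id NE 1).
* FILTER hides cases with keep=0 from procedures but does not delete them.
FILTER BY keep.
LIST.
* Turn the filter off when it is no longer wanted.
FILTER OFF.
```

If the cases really must be deleted, putting an EXECUTE between the COMPUTE and the SELECT IF (as in the '*Works.' version above) remains the straightforward fix.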
In reply to this post by Richard Ristow
At 01:55 PM 1/19/2010, Jon Fry wrote:
>The MATCH FILES example is a bug. I confirmed the behavior you
>report, and can explain it, but it needs fixing. Before the named
>datasets existed, MATCH FILES ran an implicit EXECUTE if there were
>pending transformations and the command included either FILE=* or
>TABLE=*. If the command did not include one of those, the results of
>the pending transformations were due to be discarded anyway. That
>logic was not updated for named datasets. It should execute pending
>transformations if the active dataset is named, as well.

Thank you!

>The VARSTOCASES example seems to be a bug as well. I have not looked
>into it yet.

Thank you, and ever onward.

-Best regards,
  Richard
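Jon Fry's description suggests a workaround that avoids the extra EXECUTE: refer to the active dataset as FILE=* instead of by its name. According to his description, the older logic runs pending transformations when FILE=* appears. The sketch below is example 2 rewritten that way and is untested here. It assumes d2 is still the active dataset when MATCH FILES runs.

```spss
* Hedged sketch (untested): example 2 with the active dataset referenced as FILE=*.
DATASET CLOSE ALL.
DATA LIST FREE /id.
BEGIN DATA
1 2 3 4 5 6 7 8 9 10
END DATA.
DATASET NAME d1.
COMPUTE v1=1.
DATA LIST FREE /id.
BEGIN DATA
1 2 3 4 5 6 7 8 9 10
END DATA.
DATASET NAME d2.
COMPUTE v1=2.
* d2 is the active dataset, so FILE=* refers to it.
* Per Jon Fry, FILE=* is what triggers the implicit run of pending transformations.
MATCH FILES /FILE=* /FILE=d1 /BY id.
LIST.
```

If this works as described, LIST should show v1 = 2. The same explanation may account for why example 1 only appears to work. This assumes MATCH FILES takes a variable that occurs in several files from the first file listed. Example 1 lists d1 first, and d1's COMPUTE seems to have been carried out when the second DATA LIST ran, as Ruben observed. The unexecuted v1 in d2 therefore never shows up.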
