Data management riddle (can you think of an easier way?)

David Marso
Administrator
Background (related to the DELETE VARIABLES / MATCH FILES issue posted previously):
I have a macro which requires certain variables, say {a b c}, to operate correctly.
Some of the files to be processed may not have all of a, b and c.
The process creates another set of variables {s}.
After processing, I want to leave the resulting data file unaltered except for the addition of {s}.
Here is a hackish version which I don't particularly care for.
Does anyone have a better (more elegant) way to do this?
--

*Goal : Retain a b c s .
DATA LIST FREE / a b c.
begin data
1 2 3 4 5 6
end data.

DO REPEAT v=@1 a b c @2 .
COMPUTE v=MAX(v,1).
END REPEAT.
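COMPUTE S=1.
DATASET NAME x0.
DATASET COPY x1.
* x0 keeps only the new result variable.
DATASET ACTIVATE x0.
MATCH FILES FILE=* /KEEP=s.
* x1 keeps the original variables, minus s and the helper block.
DATASET ACTIVATE x1.
MATCH FILES FILE=* /DROP=s @1 TO @2.
* Join the two back together, case by case.
MATCH FILES FILE=* /FILE=x0.
EXECUTE.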
 
** Goal retain b c s .
DATA LIST FREE / b c.
begin data
1 2 3 4 5 6
end data.

DO REPEAT v=@1 a b c @2 .
COMPUTE v=MAX(v,1).
END REPEAT.
COMPUTE S=1.
DATASET NAME x0.
DATASET COPY x1.
DATASET ACTIVATE x0.
MATCH FILES FILE=* /KEEP=s.
DATASET ACTIVATE x1.
MATCH FILES FILE=* /DROP=s @1 TO @2.
MATCH FILES FILE=* /FILE=x0.
EXECUTE.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Data management riddle (can you think of an easier way?)

David Marso
Administrator
NEVER MIND ;-(
Must caffeinate immediately!

SIMPLE:
DELETE VARIABLES @1 TO @2.
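
For reference, a minimal sketch of how that one-liner would slot into the second example from the first post (the file with only b and c). The EXECUTE is an addition here, on the assumption that DELETE VARIABLES will not run while transformations are pending; everything else is taken from the code above.

```spss
* Sketch only: the file has b and c, so a must be created behind the scenes.
DATA LIST FREE / b c.
BEGIN DATA
1 2 3 4 5 6
END DATA.

* Any variable that does not exist yet is created between @1 and @2.
DO REPEAT v=@1 a b c @2.
COMPUTE v=MAX(v,1).
END REPEAT.
COMPUTE s=1.

* Assumption: pending transformations must be run before DELETE VARIABLES.
EXECUTE.
DELETE VARIABLES @1 TO @2.
```

If this behaves as expected, the file should end up with b, c and s, since a was created between @1 and @2 and goes out with them.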

Re: Data management riddle (can you think of an easier way?)

David Marso
Administrator
Actually, not so simple when I look back at my code.
Here is a better representation of the issue.
Say {s} contains s1 s2 s3 and I have specified in the macro call that only s1 is to be retained.
A simple DELETE VARIABLES or a single MATCH FILES won't cut the mustard.


Re: Data management riddle (can you think of an easier way?)

Andy W
For your example it is total overkill to copy the dataset (although I frequently do it myself, so it may just be that the example you provide isn't a great one).

Using your example, something like this would work (using your SET RESULTS OFF ERRORS OFF suggestion). [I'm really surprised your DO REPEAT command doesn't throw an error to begin with.]

*****************************************************.
*set a sensible default in case a valid value is needed and a, b or c is missing.
compute #a = 0.
compute #b = 0.
compute #c = 0.
PRESERVE.
SET RESULTS OFF ERRORS OFF.
compute #a = a.
compute #b = b.
compute #c = c.
*or do repeat - whatever.
RESTORE.
compute S = #a + #b + #c.
exe.
*****************************************************.

I presume this is a good example of where Python's access to metadata is handy. I don't think I have any macros that work quite like this, though (ones that assume the existence of specific variables); I always force the user to enter them as parameters (or fall back on defaults if specific tokens are missing). This is hacky as well, but a bit more concise IMO.

I do not like changing the active dataset name, and prefer to provide it as a token to the macro (I realize why many would not like doing this, though).

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Data management riddle (can you think of an easier way?)

Jon K Peck
This would, of course, be trivial to do in Python.  This kind of thing is why we introduced programmability in the first place.  And without having to write any Python code, the SPSSINC SELECT VARIABLES extension command could be used to do this.  It is implemented in Python.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

In reply to the message from Andy W, sent to SPSSX-L on 05/08/2013 10:41 AM.


Re: Data management riddle (can you think of an easier way?)

David Marso
Administrator
In reply to this post by Andy W
Hi Andy,
It is the magic of MAX rather than DO REPEAT.
Maybe Jon can shed some light on the internals.
My first guess would have been that MAX would choke on the argument inside the function.
I suspect that when it parses COMPUTE Y it initializes some internal value 'Y' to SYSMIS. Then when it scans the arguments of the MAX function it sees 'Y' and returns 1, since MAX(1,$SYSMIS) -> 1.
Thanks for your example, but it won't do what I need.
Basically I have data files which might be stratified by {a b c} and are not necessarily sorted.
I want to be able to create each of {a b c} behind the scenes (if they don't exist), SORT, then pass the whole ball of wax to MATRIX, create some statistics (for each combination of strata), and then return to SPSS for reporting. Preferably I don't want to confuse people with a mysterious variable showing up, and I can't alter the existing values if the variables do exist.

I use MAX(1,y) because I know that these strata variables will have a minimum value of 1 and will never be missing if they do exist, so it will either create the variable and assign 1, or leave it unaltered.

I imagine Python metadata access would be a solution, but I have no intention of supporting possibly hundreds or thousands of users when their Python installation is FUBAR (I have seen enough silliness of that sort on this list over the years). At some point I may throw a little Python into the mix, but presently I am going native with MACRO and MATRIX. Consequently my code will work everywhere!

DATA LIST FREE/X.
BEGIN DATA
1 2
END DATA.
COMPUTE Y=MAX(1,Y).

LIST.
     X      Y

1.0000 1.0000
2.0000 1.0000

Number of cases read:  2    Number of cases listed:  2
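
A small companion check for the other branch, where the variable already exists. This is only a sketch: the Y values 3 and 5 are made up for illustration, and it assumes, as stated above, that existing values are never below 1 and never missing.

```spss
* Sketch only: Y already exists here, with illustrative values 3 and 5.
DATA LIST FREE / x y.
BEGIN DATA
1 3 2 5
END DATA.

* Same statement as above, now applied to an existing variable.
COMPUTE y=MAX(1,y).
LIST.
```

Since MAX(1,3) is 3 and MAX(1,5) is 5, Y should come out unchanged here.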



Re: Data management riddle (can you think of an easier way?)

Andy W
The logic I presented works the same even if you can't use scratch variables (e.g. compute a2, b2 and c2, then use MATCH FILES to drop those specific variables at the end). It isn't clear from your example, though, why your proposed DELETE VARIABLES does not do what you want (since S is outside the @1 to @2 block).

The only other idea I have is to initialize the set of S variables first, so that in the end the variable order in the data file will be

origvars svars addedvars

This way it should be a little easier to drop the unknown created variables and keep them separate from the S variables, rather than having the two intermingled (you can use another @* marker to separate the svars from the added vars). If whether to keep particular svars is an option of the macro, it should be simple to pass MATCH FILES /DROP the list of svars that should not be kept, and to include @1 TO @2 on that same DROP as well.

You may need to put a little more meat on your example to show why the current solutions (including your own) aren't sufficient. I realize you need to use MATCH FILES to match the output of MATRIX back to the original dataset, but it isn't clear from this example why you need MATCH FILES at all (nor how the MATRIX procedure fits in with the rest). It may be as simple as copying the original, doing whatever is needed in the sandbox, keeping only the needed variables at the end, and matching those back to the original. The problem with that is that you typically need to know the name of the original dataset. [I don't want you to change the name of my dataset, just as I don't want to see mysterious variables show up that I didn't ask for.]
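
A rough sketch of that ordering idea, for the case where only s1 is to be kept. The variable names s1 s2 s3 come from the earlier post; the three COMPUTE statements are made-up placeholders for the real processing, and the F8.2 format is an arbitrary choice.

```spss
* Sketch only: the file has b and c, and a is created behind the scenes.
DATA LIST FREE / b c.
BEGIN DATA
1 2 3 4 5 6
END DATA.

* Reserve the result variables first so they sit right after the original variables.
NUMERIC s1 s2 s3 (F8.2).

* Create any missing strata variables inside the @1 to @2 block.
DO REPEAT v=@1 a b c @2.
COMPUTE v=MAX(v,1).
END REPEAT.

* Placeholder computations standing in for the real processing.
COMPUTE s1=a+b+c.
COMPUTE s2=a*b*c.
COMPUTE s3=1.

* Keep s1 only by dropping the helper block and the unwanted results in one pass.
MATCH FILES FILE=* /DROP=@1 TO @2 s2 s3.
EXECUTE.
```

With this layout the variable order before the drop would be b c s1 s2 s3 @1 a @2, so the single DROP should leave b, c and s1.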