Match Files in which the BY clause is based on ranges

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Match Files in which the BY clause is based on ranges

wsu_wright
I often use the match files command to pull into a current file data
from validation tables using the "Table" option via the "By" clause.
For example, I may have a validation table of departments related to
major codes so it would be an easy:

SORT CASES BY major.
MATCH FILES file=* /TABLE=major_dept.sav /BY major.

But what happens when the match is not a one-to-one match but a range,
the 'BY' command would fail to find a matching record.

Let's say I have two files. File 1 contains dates which I wish to add a
semester ID to from a table that contains date ranges for each semester
as in:

File 1:
work_sdate
08/23/2010
07/18/2010
08/22/2010
01/18/2011
08/12/2010

File 2 (validation table of date ranges for each semester code):
sem_sdate  sem_edate sem
05/16/2010  08/07/2010  U
08/08/2010  12/25/2010  F
12/26/2010  05/14/2011  S
05/15/2011  08/06/2011  F

File 1 after merging sem code:
work_sdate  sem
08/23/2010  F
07/18/2010  U
08/22/2010  F
01/18/2011  S
08/12/2010  F


David,

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Match Files in which the BY clause is based on ranges

Jon K Peck
 You can't do that directly with MATCH.  One solution would be to calculate the beginning range date corresponding to the date in the main data set and then match on that, assuming that this can be determined by a formula.  It would also be easy using programmability to construct an interval lookup table instead of using matching.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        David Wright <[hidden email]>
To:        [hidden email]
Date:        04/10/2011 10:32 AM
Subject:        [SPSSX-L] Match Files in which the BY clause is based on ranges
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




I often use the match files command to pull into a current file data
from validation tables using the "Table" option via the "By" clause.
For example, I may have a validation table of departments related to
major codes so it would be an easy:

SORT CASES BY major.
MATCH FILES file=* /TABLE=major_dept.sav /BY major.

But what happens when the match is not a one-to-one match but a range,
the 'BY' command would fail to find a matching record.

Let's say I have two files. File 1 contains dates which I wish to add a
semester ID to from a table that contains date ranges for each semester
as in:

File 1:
work_sdate
08/23/2010
07/18/2010
08/22/2010
01/18/2011
08/12/2010

File 2 (validation table of date ranges for each semester code):
sem_sdate  sem_edate sem
05/16/2010  08/07/2010  U
08/08/2010  12/25/2010  F
12/26/2010  05/14/2011  S
05/15/2011  08/06/2011  F

File 1 after merging sem code:
work_sdate  sem
08/23/2010  F
07/18/2010  U
08/22/2010  F
01/18/2011  S
08/12/2010  F


David,

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: Match Files in which the BY clause is based on ranges

Albert-Jan Roskam
Hi David,

Check out the Python bisect module.  From http://docs.python.org/library/bisect.html:

The bisect() function can be useful for numeric table lookups. This example uses bisect() to look up a letter grade for an exam score (say) based on a set of ordered numeric breakpoints: 90 and up is an ‘A’, 80 to 89 is a ‘B’, and so on:
>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
... i = bisect(breakpoints, score)
... return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']


 
Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



From: Jon K Peck <[hidden email]>
To: [hidden email]
Sent: Sun, April 10, 2011 7:24:36 PM
Subject: Re: [SPSSX-L] Match Files in which the BY clause is based on ranges

 You can't do that directly with MATCH.  One solution would be to calculate the beginning range date corresponding to the date in the main data set and then match on that, assuming that this can be determined by a formula.  It would also be easy using programmability to construct an interval lookup table instead of using matching.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        David Wright <[hidden email]>
To:        [hidden email]
Date:        04/10/2011 10:32 AM
Subject:        [SPSSX-L] Match Files in which the BY clause is based on ranges
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




I often use the match files command to pull into a current file data
from validation tables using the "Table" option via the "By" clause.
For example, I may have a validation table of departments related to
major codes so it would be an easy:

SORT CASES BY major.
MATCH FILES file=* /TABLE=major_dept.sav /BY major.

But what happens when the match is not a one-to-one match but a range,
the 'BY' command would fail to find a matching record.

Let's say I have two files. File 1 contains dates which I wish to add a
semester ID to from a table that contains date ranges for each semester
as in:

File 1:
work_sdate
08/23/2010
07/18/2010
08/22/2010
01/18/2011
08/12/2010

File 2 (validation table of date ranges for each semester code):
sem_sdate  sem_edate sem
05/16/2010  08/07/2010  U
08/08/2010  12/25/2010  F
12/26/2010  05/14/2011  S
05/15/2011  08/06/2011  F

File 1 after merging sem code:
work_sdate  sem
08/23/2010  F
07/18/2010  U
08/22/2010  F
01/18/2011  S
08/12/2010  F


David,

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: Match Files in which the BY clause is based on ranges

Jon K Peck
I am adding an interval lookup function to be named vlookupinterval to the extendedTransforms.py module, since this question has come up a number of times.
It will be designed to work with the SPSSINC TRANS extension command and work like the vlookup function already contained in that module for exact match lookups.  That means that it will take an open dataset containing the intervals and values and no Python programming will be required to use it.

Regards,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Albert-Jan Roskam <[hidden email]>
To:        [hidden email]
Date:        04/10/2011 12:50 PM
Subject:        Re: [SPSSX-L] Match Files in which the BY clause is based on              ranges
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Hi David,

Check out the Python bisect module.  From
http://docs.python.org/library/bisect.html:

The
bisect() function can be useful for numeric table lookups. This example uses bisect() to look up a letter grade for an exam score (say) based on a set of ordered numeric breakpoints: 90 and up is an ‘A’, 80 to 89 is a ‘B’, and so on:
>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']


 
Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




From: Jon K Peck <[hidden email]>
To:
[hidden email]
Sent:
Sun, April 10, 2011 7:24:36 PM
Subject:
Re: [SPSSX-L] Match Files in which the BY clause is based on ranges


You can't do that directly with MATCH.  One solution would be to calculate the beginning range date corresponding to the date in the main data set and then match on that, assuming that this can be determined by a formula.  It would also be easy using programmability to construct an interval lookup table instead of using matching.


HTH,


Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        
David Wright <[hidden email]>
To:        
[hidden email]
Date:        
04/10/2011 10:32 AM
Subject:        
[SPSSX-L] Match Files in which the BY clause is based on ranges
Sent by:        
"SPSSX(r) Discussion" <[hidden email]>




I often use the match files command to pull into a current file data
from validation tables using the "Table" option via the "By" clause.
For example, I may have a validation table of departments related to
major codes so it would be an easy:

SORT CASES BY major.
MATCH FILES file=* /TABLE=major_dept.sav /BY major.

But what happens when the match is not a one-to-one match but a range,
the 'BY' command would fail to find a matching record.

Let's say I have two files. File 1 contains dates which I wish to add a
semester ID to from a table that contains date ranges for each semester
as in:

File 1:
work_sdate
08/23/2010
07/18/2010
08/22/2010
01/18/2011
08/12/2010

File 2 (validation table of date ranges for each semester code):
sem_sdate  sem_edate sem
05/16/2010  08/07/2010  U
08/08/2010  12/25/2010  F
12/26/2010  05/14/2011  S
05/15/2011  08/06/2011  F

File 1 after merging sem code:
work_sdate  sem
08/23/2010  F
07/18/2010  U
08/22/2010  F
01/18/2011  S
08/12/2010  F


David,

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Reply | Threaded
Open this post in threaded view
|

Re: Match Files in which the BY clause is based on ranges

Jon K Peck
In reply to this post by Jon K Peck
I have a new interval table lookup routine ready.  It is in the extendedTransforms.py module.  I will post it to the SPSS Community (www.ibm.com/developerworks/spssdevcentral) as soon as some paperwork is done.

Here's  how it works.
You create an SPSS dataset with the lower bound, upper bound, and the associated value. Suppose those variables are in variables lower, upper, and value in a dataset named table and your main dataset has a variable named x to use for the lookup.  You would use this with the SPSSINC TRANS extension command like this to create a new numeric variable named recodedvalue.

spssinc trans result=recodedvalue
/INITIAL 'extendedTransforms.vlookupinterval("lower", "value", "table", "upper")'
/FORMULA func(x).

This requires the Python programmability plugin, the SPSSINC TRANS extension command, and the new version of extendedTransforms.py.

If you omit the upper variable, the intervals are constructed from the values of lower, so each interval is defined by lower(i) to lower(i+1)

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Jon K Peck/Chicago/IBM@IBMUS
To:        [hidden email]
Date:        04/10/2011 11:31 AM
Subject:        Re: [SPSSX-L] Match Files in which the BY clause is based on              ranges
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




 You can't do that directly with MATCH.  One solution would be to calculate the beginning range date corresponding to the date in the main data set and then match on that, assuming that this can be determined by a formula.  It would also be easy using programmability to construct an interval lookup table instead of using matching.

HTH,


Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        
David Wright <[hidden email]>
To:        
[hidden email]
Date:        
04/10/2011 10:32 AM
Subject:        
[SPSSX-L] Match Files in which the BY clause is based on ranges
Sent by:        
"SPSSX(r) Discussion" <[hidden email]>




I often use the match files command to pull into a current file data
from validation tables using the "Table" option via the "By" clause.
For example, I may have a validation table of departments related to
major codes so it would be an easy:

SORT CASES BY major.
MATCH FILES file=* /TABLE=major_dept.sav /BY major.

But what happens when the match is not a one-to-one match but a range,
the 'BY' command would fail to find a matching record.

Let's say I have two files. File 1 contains dates which I wish to add a
semester ID to from a table that contains date ranges for each semester
as in:

File 1:
work_sdate
08/23/2010
07/18/2010
08/22/2010
01/18/2011
08/12/2010

File 2 (validation table of date ranges for each semester code):
sem_sdate  sem_edate sem
05/16/2010  08/07/2010  U
08/08/2010  12/25/2010  F
12/26/2010  05/14/2011  S
05/15/2011  08/06/2011  F

File 1 after merging sem code:
work_sdate  sem
08/23/2010  F
07/18/2010  U
08/22/2010  F
01/18/2011  S
08/12/2010  F


David,

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Reply | Threaded
Open this post in threaded view
|

Re: Match Files in which the BY clause is based on ranges

David Marso
Administrator
In reply to this post by wsu_wright
Use ADD FILES instead (BY will interleave the appropriate records IN will flag the desired cases
from
the master file) then use LAG to drag the appropriate values into the flagged records.
I call it the ADD -FLAG LAG AND DRAG- method (sounds almost like a medieval torture
technique).

HTH, David
------------
On Sun, 10 Apr 2011 12:29:26 -0400, David Wright <[hidden email]> wrote:

>I often use the match files command to pull into a current file data
>from validation tables using the "Table" option via the "By" clause.
>For example, I may have a validation table of departments related to
>major codes so it would be an easy:
>
>SORT CASES BY major.
>MATCH FILES file=* /TABLE=major_dept.sav /BY major.
>
>But what happens when the match is not a one-to-one match but a range,
>the 'BY' command would fail to find a matching record.
>
>Let's say I have two files. File 1 contains dates which I wish to add a
>semester ID to from a table that contains date ranges for each semester
>as in:
>
>File 1:
>work_sdate
>08/23/2010
>07/18/2010
>08/22/2010
>01/18/2011
>08/12/2010
>
>File 2 (validation table of date ranges for each semester code):
>sem_sdate  sem_edate sem
>05/16/2010  08/07/2010  U
>08/08/2010  12/25/2010  F
>12/26/2010  05/14/2011  S
>05/15/2011  08/06/2011  F
>
>File 1 after merging sem code:
>work_sdate  sem
>08/23/2010  F
>07/18/2010  U
>08/22/2010  F
>01/18/2011  S
>08/12/2010  F
>
>
>David,
>
>=====================
>To manage your subscription to SPSSX-L, send a message to
>[hidden email] (not to SPSSX-L), with no body text except the
>command. To leave the list, send the command
>SIGNOFF SPSSX-L
>For a list of commands to manage subscriptions, send the command
>INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Match Files in which the BY clause is based on ranges

Melissa Ives
In reply to this post by wsu_wright
Why not simply calculate SEM based on the start and end dates?  Is it because there are too many ranges?
(shouldn't May-Aug be U both years? I did that below.  UNTESTED)

If (range(work_sdate,date.mdy(5,15,2011),date.mdy(8,06,2011) or
  range(work_sdate,date.mdy(5,16,2010),date.mdy(8,07,2010)) SEM="U".
If (range(work_sdate,date.mdy(12,26,2010),date.mdy(5,14,2011)) SEM="S".
If (range(work_sdate,date.mdy(8,08,2010),date.mdy(12,25,2010)) SEM="F".

Melissa
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Wright
Sent: Sunday, April 10, 2011 11:29 AM
To: [hidden email]
Subject: [SPSSX-L] Match Files in which the BY clause is based on ranges

I often use the match files command to pull into a current file data from validation tables using the "Table" option via the "By" clause.
For example, I may have a validation table of departments related to major codes so it would be an easy:

SORT CASES BY major.
MATCH FILES file=* /TABLE=major_dept.sav /BY major.

But what happens when the match is not a one-to-one match but a range, the 'BY' command would fail to find a matching record.

Let's say I have two files. File 1 contains dates which I wish to add a semester ID to from a table that contains date ranges for each semester as in:

File 1:
work_sdate
08/23/2010
07/18/2010
08/22/2010
01/18/2011
08/12/2010

File 2 (validation table of date ranges for each semester code):
sem_sdate  sem_edate sem
05/16/2010  08/07/2010  U
08/08/2010  12/25/2010  F
12/26/2010  05/14/2011  S
05/15/2011  08/06/2011  F

File 1 after merging sem code:
work_sdate  sem
08/23/2010  F
07/18/2010  U
08/22/2010  F
01/18/2011  S
08/12/2010  F


David,

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Match Files in which the BY clause is based on ranges

hillel vardi
In reply to this post by wsu_wright
  Shalom

One more way to handle your request is to create a "Table" from file 2
and then use a simple match to file 1 .

Here is the syntax to transfer file 2 into "Table" format .


DATA LIST /sem_sdate  sem_edate sem (aDATE10 2x adate10 a3) .
BEGIN DATA
05/16/2010  08/07/2010  U
08/08/2010  12/25/2010  F
12/26/2010  05/14/2011  S
05/15/2011  08/06/2011  F
end data.
execute .
compute seqday= 60*60*24 .
compute sem_days=datedeff(  sem_edate,sem_sdate,"days") .
loop i=0  to sem_days .
compute sem_date=sem_sdate + i * seqday .
XSAVE outfile=tmp.sav / keep= sem_date sem .
end loop.
execute .
get file=tmp.sav .
formats sem_date(edate10) .


Hillel vardi
BGU

On 10/04/2011 19:29, David Wright wrote:

> I often use the match files command to pull into a current file data
> from validation tables using the "Table" option via the "By" clause.
> For example, I may have a validation table of departments related to
> major codes so it would be an easy:
>
> SORT CASES BY major.
> MATCH FILES file=* /TABLE=major_dept.sav /BY major.
>
> But what happens when the match is not a one-to-one match but a range,
> the 'BY' command would fail to find a matching record.
>
> Let's say I have two files. File 1 contains dates which I wish to add a
> semester ID to from a table that contains date ranges for each semester
> as in:
>
> File 1:
> work_sdate
> 08/23/2010
> 07/18/2010
> 08/22/2010
> 01/18/2011
> 08/12/2010
>
> File 2 (validation table of date ranges for each semester code):
> sem_sdate  sem_edate sem
> 05/16/2010  08/07/2010  U
> 08/08/2010  12/25/2010  F
> 12/26/2010  05/14/2011  S
> 05/15/2011  08/06/2011  F
>
> File 1 after merging sem code:
> work_sdate  sem
> 08/23/2010  F
> 07/18/2010  U
> 08/22/2010  F
> 01/18/2011  S
> 08/12/2010  F
>
>
> David,
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Match Files in which the BY clause is based on ranges

wsu_wright
In reply to this post by wsu_wright
Hillel, Melissa, David, Jon, Albert-Jan:

Thanks for all your replies to my syntax question, as always, SPSS
provides numerous methods to achieve the same result and because of that
I learn more every time I have these exchanges.

The python bisect option does work, including the new one from Jon, but
has limitations in distribution.  While I can run python from my
machine, this syntax segment is part of a larger script that is ran
across several offices many of which who do not have python installed.
Melissa your solution would work, however, as you noted, the start/end
dates do vary over time for each semester and with dates going back to
1980 and up to 2050 as part of the validation table, it would laborious
and error prone to manually set each term.  David’s add file method is
one that did not occur to me and does meet the needs of how the syntax
would be executed.  Ultimately I decided on a version of Hillel’s
method, I used it to generate the consecutive date structure to
re-populate the validation table on our production tables, so the
validation table now stores all possible dates.  Now our users can
simply use the match files method and be done with it in connection with
this validation table.

Thanks again for all of your help, it’s such a relief to have a
community that works together to find answers for others.

David


--------------------------------------------------------------------------------------------------
Subject: Re: Match Files in which the BY clause is based on ranges

From: "hillel vardi" <[hidden email]>
Shalom
One more way to handle your request is to create a "Table" from file 2
and then use a simple match to file 1 .
Here is the syntax to transfer file 2 into "Table" format .
DATA LIST /sem_sdate  sem_edate sem (aDATE10 2x adate10 a3) .
BEGIN DATA
05/16/2010  08/07/2010  U
08/08/2010  12/25/2010  F
12/26/2010  05/14/2011  S
05/15/2011  08/06/2011  F
end data.
execute .
compute seqday= 60*60*24 .
compute sem_days=datedeff(  sem_edate,sem_sdate,"days") .
loop i=0  to sem_days .
compute sem_date=sem_sdate + i * seqday .
XSAVE outfile=tmp.sav / keep= sem_date sem .
end loop.
execute .
get file=tmp.sav .
formats sem_date(edate10) .

Hillel vardi
BGU


----------------------------------------------------------------------------------------------
From: "Melissa Ives" <[hidden email]>

Why not simply calculate SEM based on the start and end dates?  Is it
because there are too many ranges?
(shouldn't May-Aug be U both years? I did that below.  UNTESTED)
If (range(work_sdate,date.mdy(5,15,2011),date.mdy(8,06,2011) or
    range(work_sdate,date.mdy(5,16,2010),date.mdy(8,07,2010)) SEM="U".
If (range(work_sdate,date.mdy(12,26,2010),date.mdy(5,14,2011)) SEM="S".
If (range(work_sdate,date.mdy(8,08,2010),date.mdy(12,25,2010)) SEM="F".


-----------------------------------------------------------------------------------------------
From: "David Marso" <[hidden email]>
To: [hidden email]

Use ADD FILES instead (BY will interleave the appropriate records IN
will flag the desired cases from the master file) then use LAG to drag
the appropriate values into the flagged records.
I call it the ADD -FLAG LAG AND DRAG- method (sounds almost like a
medieval torture technique).


------------------------------------------------------------------------------------------------
From: "Jon K Peck"
To: [hidden email]

I have a new interval table lookup routine ready.  It is in the
extendedTransforms.py module.  I will post it to the SPSS Community (
www.ibm.com/developerworks/spssdevcentral) as soon as some paperwork is
done.

Here's  how it works.
You create an SPSS dataset with the lower bound, upper bound, and the
associated value. Suppose those variables are in variables lower, upper,
and value in a dataset named table and your main dataset has a variable
named x to use for the lookup.  You would use this with the SPSSINC
TRANS extension command like this to create a new numeric variable named
recodedvalue.
spssinc trans result=recodedvalue
/INITIAL 'extendedTransforms.vlookupinterval("lower", "value", "table",
"upper")'
/FORMULA func(x).
This requires the Python programmability plugin, the SPSSINC TRANS
extension command, and the new version of extendedTransforms.py.
If you omit the upper variable, the intervals are constructed from the
values of lower, so each interval is defined by lower(i) to lower(i+1)


-----------------------------------------------------------------------------------------
From: "Albert-Jan Roskam"

Check out the Python bisect module.  From
http://docs.python.org/library/bisect.html:

The bisect()function can be useful for numeric table lookups. This
example uses bisect()to look up a letter grade for an exam score (say)
based on a set of ordered numeric breakpoints: 90 and up is an ‘A’, 80
to 89 is a ‘B’, and so on:
>>> def grade ( score , breakpoints = [ 60 , 70 , 80 , 90 ], grades =
>>> 'FDCBA' ):
... i = bisect ( breakpoints , score )
... return grades [ i ]
...
>>> [ grade ( score ) for score in [ 33 , 99 , 77 , 70 , 89 , 90 , 100
>>> ]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']


Original post from David Wright
I often use the match files command to pull into a current file data
from validation tables using the "Table" option via the "By" clause.
For example, I may have a validation table of departments related to
major codes so it would be an easy:

SORT CASES BY major.
MATCH FILES file=* /TABLE=major_dept.sav /BY major.

But what happens when the match is not a one-to-one match but a range,
the 'BY' command would fail to find a matching record.

Let's say I have two files. File 1 contains dates which I wish to add a
semester ID to from a table that contains date ranges for each semester
as in:

File 1:
work_sdate
08/23/2010
07/18/2010
08/22/2010
01/18/2011
08/12/2010

File 2 (validation table of date ranges for each semester code):
sem_sdate  sem_edate sem
05/16/2010  08/07/2010  U
08/08/2010  12/25/2010  F
12/26/2010  05/14/2011  S
05/15/2011  08/06/2011  F

File 1 after merging sem code:
work_sdate  sem
08/23/2010  F
07/18/2010  U
08/22/2010  F
01/18/2011  S
08/12/2010  F


David,

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD