Flag Records in one file that meet criteria in records in another file

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Flag Records in one file that meet criteria in records in another file

walden2000
I would like to flag records in File1 that meet criteria in File2 records.  For Example:

File1:  Name City Income
File 2:  City Desired_Income_Low Desired_Income_High

Something like this:

File1:
Name1 Atlanta           79500
Name2 New Orleans    80000
Name3 San Francisco  91000
Name4 New York       105000
Name5 Atlanta           81000

File2:
Atlanta      70000  80000
Houston     60000  75000
New York  100000 110000

Thank you,


Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

Maguin, Eugene
Walden,

What you want to do makes sense to you but not to me. You say:

>>I would like to flag records in File1 that meet criteria in File2 records.
For Example:

File1:  Name City Income
File 2:  City Desired_Income_Low Desired_Income_High

Something like this:

File1:
Name1 Atlanta        79500
Name2 New Orleans    80000
Name3 San Francisco  91000
Name4 New York      105000
Name5 Atlanta        81000

File2:
Atlanta    70000  80000
Houston    60000  75000
New York  100000 110000

I have no idea what 'criteria' means so I'll guess that you do this first.
I suggest a match files command with the table subcommand. Like this.

Match files file='file1'/table='file2'/by city.

The result is

ResultFile:
Name1 Atlanta        79500   70000  80000
Name2 New Orleans    80000
Name3 San Francisco  91000
Name4 New York      105000  100000 110000
Name5 Atlanta        81000   70000  80000

From there you can go on and do whatever you want. You don't say what
'criteria' are so you are on your own now.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

walden2000

You are correct.  I used a bad example.  This is what I am trying to do:

 

Given: 

 

File1:  Chronically delayed flights from previous month  (about 64 records)

 

Airline

Flight_Origen

Fligth_Destination

Fligth_Scheduled_Departure_Time

 

 

File2: Scheduled flights from current or future month (about 80000 records)

 

Airline

Flight_Origen

Fligth_Destination

Flight_Date

Fligth_Scheduled_Departure_Time

 

I need to flag the records on File2 with matching Airline, Flight Origen, Flight Destination and with a Fligth_Scheduled_Departure_Time that is within 30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1.

 

I was using a MATRIX but the arrays appear to loose their value once they exit the MATRIX.  Any hints?  Thank you,

 

Enrique Berrios

Houston, Texas

 

From: Gene Maguin [via SPSSX Discussion] <[hidden email]>
To: walden2000 <[hidden email]>
Sent: Fri, January 7, 2011 10:44:45 AM
Subject: Re: Flag Records in one file that meet criteria in records in another file

Walden,

What you want to do makes sense to you but not to me. You say:

>>I would like to flag records in File1 that meet criteria in File2 records.
For Example:

File1:  Name City Income
File 2:  City Desired_Income_Low Desired_Income_High

Something like this:

File1:
Name1 Atlanta        79500
Name2 New Orleans    80000
Name3 San Francisco  91000
Name4 New York      105000
Name5 Atlanta        81000

File2:
Atlanta    70000  80000
Houston    60000  75000
New York  100000 110000

I have no idea what 'criteria' means so I'll guess that you do this first.
I suggest a match files command with the table subcommand. Like this.

Match files file='file1'/table='file2'/by city.

The result is

ResultFile:
Name1 Atlanta        79500   70000  80000
Name2 New Orleans    80000
Name3 San Francisco  91000
Name4 New York      105000  100000 110000
Name5 Atlanta        81000   70000  80000

From there you can go on and do whatever you want. You don't say what
'criteria' are so you are on your own now.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3332121.html
To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here.

Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

Maguin, Eugene
Hi Enrique,

Ok. Much better! Thank you. A key question is whether there are multiple
records in file1 with the same values for Airline, Flight_Origin,
Flight_Destination, Flight_Scheduled_Departure_Time. I'll assume the answer
is NO. If the answer is YES, then something different will need to be done.

Then, the problem is just an 'add more variables to an existing file'
problem. Some assumptions:
1) File1 and file2 are sorted by the match variables.
2) Flight_Scheduled_Departure_Time are both time values on a 24 hour clock.


Match files file=file2/table=file1/rename=(Flight_Scheduled_Departure_Time=
   Flight_Scheduled_Departure_TimeFile1)/
   by Airline Flight_Origin Flight_Destination.
Execute. /* Richard will disagree on an Execute here, I think.

Compute flag=0.
If (abs(Flight_Scheduled_Departure_TimeFile1-
   Flight_Scheduled_Departure_Time) le 30) flag=1.

Do if (flag eq 1).
Print / Airline Flight_Origin Flight_Destination Flight_Date
   Flight_Scheduled_Departure_Time.
End if.
Execute.





Given:

File1:  Chronically delayed flights from previous month  (about 64 records)
Airline
Flight_Origin
Flight_Destination
Flight_Scheduled_Departure_Time

File2: Scheduled flights from current or future month (about 80000 records)
Airline
Flight_Origin
Flight_Destination
Flight_Date
Flight_Scheduled_Departure_Time

I need to flag the records on File2 with matching Airline, Flight Origin,
Flight Destination and with a Flight_Scheduled_Departure_Time that is within
30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1.

I was using a MATRIX but the arrays appear to loose their value once they
exit the MATRIX.  Any hints?  Thank you,

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

walden2000

Thank you for your prompt response. 

 

I see a small problem.  The answer is NO to the key question.  The records are unique in file1 --> Airline, Flight_Origin, Flight_Destination, Flight_Scheduled_Departure_Time.  However, the match statement is by -->  Airline Flight_Origin Flight_Destination.

 

There is the possiblility of having more than one flight in file1 with the same Airline Flight_Origin Flight_Destination but different Flight_Scheduled_Departure_Time.

 

We could have a file1 that looks like this:

 

CO MIA IAH 1215

CO MIA IAH 1745

RU IAH DFW 1458

 

and a file2 that looks like this:

 

CO MIA IAH 01/10/2011 1205

CO MIA IAH 01/11/2011 1220

CO MIA IAH 01/12/2011 1300

RU IAH DFW 01/10/2011 1045

 

Creating a conflict with the match command...

 

Thank you for your help,

 
Enrique Berrios
Houston, Texas


From: Gene Maguin [via SPSSX Discussion] <[hidden email]>
To: walden2000 <[hidden email]>
Sent: Fri, January 7, 2011 12:51:30 PM
Subject: Re: Flag Records in one file that meet criteria in records in another file

Hi Enrique,

Ok. Much better! Thank you. A key question is whether there are multiple
records in file1 with the same values for Airline, Flight_Origin,
Flight_Destination, Flight_Scheduled_Departure_Time. I'll assume the answer
is NO. If the answer is YES, then something different will need to be done.

Then, the problem is just an 'add more variables to an existing file'
problem. Some assumptions:
1) File1 and file2 are sorted by the match variables.
2) Flight_Scheduled_Departure_Time are both time values on a 24 hour clock.


Match files file=file2/table=file1/rename=(Flight_Scheduled_Departure_Time=
   Flight_Scheduled_Departure_TimeFile1)/
   by Airline Flight_Origin Flight_Destination.
Execute. /* Richard will disagree on an Execute here, I think.

Compute flag=0.
If (abs(Flight_Scheduled_Departure_TimeFile1-
   Flight_Scheduled_Departure_Time) le 30) flag=1.

Do if (flag eq 1).
Print / Airline Flight_Origin Flight_Destination Flight_Date
   Flight_Scheduled_Departure_Time.
End if.
Execute.





Given:

File1:  Chronically delayed flights from previous month  (about 64 records)
Airline
Flight_Origin
Flight_Destination
Flight_Scheduled_Departure_Time

File2: Scheduled flights from current or future month (about 80000 records)
Airline
Flight_Origin
Flight_Destination
Flight_Date
Flight_Scheduled_Departure_Time

I need to flag the records on File2 with matching Airline, Flight Origin,
Flight Destination and with a Flight_Scheduled_Departure_Time that is within
30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1.

I was using a MATRIX but the arrays appear to loose their value once they
exit the MATRIX.  Any hints?  Thank you,

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3332314.html
To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here.

Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

David Marso
Administrator
In reply to this post by walden2000
Looks like MATCH FILES will be problematic.
Instead, Use ADD FILES and LAG to drag the target arrival time from the "TABLE/CRITERIA" file into
the appropriate records.
Presuming both files are sorted by AIRLINE ORIG DEST ARRIVTIME.

ADD FILES / FILE ="TARGET"
                   / IN    =TARGET
                   / FILE = "TESTDATA"
                  / IN     =TEST
                  / BY AIRLINE ORIG DEST ARRIVTIME.
IF TARGET  CHECKTIME=ARRIVTIME.
IF TEST CHECKTIME=LAG(CHECKTIME).
HTH, David
---


On Fri, 7 Jan 2011 13:50:47 -0800, walden2000 <[hidden email]> wrote:

>
>Thank you for your prompt response.Â
>Â
>I see a small problem.  The answer is NO to the key question.  The records are
>unique in file1 --> Airline, Flight_Origin, Flight_Destination,
>Flight_Scheduled_Departure_Time.  However, the match statement is by -->Â
>Airline Flight_Origin Flight_Destination.
>Â
>There is the possiblility of having more than one flight in file1 with the same
>Airline Flight_Origin Flight_Destination but different
>Flight_Scheduled_Departure_Time.
>Â
>We could have a file1 that looks like this:
>Â
>CO MIA IAH 1215
>CO MIA IAH 1745
>RU IAH DFW 1458
>Â
>and a file2 that looks like this:
>Â
>CO MIA IAH 01/10/2011 1205
>CO MIA IAH 01/11/2011 1220
>CO MIA IAH 01/12/2011 1300
>RU IAH DFW 01/10/2011 1045
>Â
>Creating a conflict with the match command...
>Â
>Thank you for your help,
>
>Enrique Berrios
>Houston, Texas
>
>
>
>
>________________________________
>From: Gene Maguin [via SPSSX Discussion]
><[hidden email]>
>To: walden2000 <[hidden email]>
>Sent: Fri, January 7, 2011 12:51:30 PM
>Subject: Re: Flag Records in one file that meet criteria in records in another
>file
>
>Hi Enrique,
>
>Ok. Much better! Thank you. A key question is whether there are multiple
>records in file1 with the same values for Airline, Flight_Origin,
>Flight_Destination, Flight_Scheduled_Departure_Time. I'll assume the answer
>is NO. If the answer is YES, then something different will need to be done.
>
>Then, the problem is just an 'add more variables to an existing file'
>problem. Some assumptions:
>1) File1 and file2 are sorted by the match variables.
>2) Flight_Scheduled_Departure_Time are both time values on a 24 hour clock.
>
>
>Match files file=file2/table=file1/rename=(Flight_Scheduled_Departure_Time=
>  Ã‚ Flight_Scheduled_Departure_TimeFile1)/
>  Ã‚ by Airline Flight_Origin Flight_Destination.
>Execute. /* Richard will disagree on an Execute here, I think.
>
>Compute flag=0.
>If (abs(Flight_Scheduled_Departure_TimeFile1-
>  Ã‚ Flight_Scheduled_Departure_Time) le 30) flag=1.
>
>Do if (flag eq 1).
>Print / Airline Flight_Origin Flight_Destination Flight_Date
>  Ã‚ Flight_Scheduled_Departure_Time.
>End if.
>Execute.
>
>
>
>
>
>Given:
>
>File1:  Chronically delayed flights from previous month  (about 64 records)
>Airline
>Flight_Origin
>Flight_Destination
>Flight_Scheduled_Departure_Time
>
>File2: Scheduled flights from current or future month (about 80000 records)
>Airline
>Flight_Origin
>Flight_Destination
>Flight_Date
>Flight_Scheduled_Departure_Time
>
>I need to flag the records on File2 with matching Airline, Flight Origin,
>Flight Destination and with a Flight_Scheduled_Departure_Time that is within
>30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1.
>
>I was using a MATRIX but the arrays appear to loose their value once they
>exit the MATRIX. Â Any hints? Â Thank you,
>
>=====================
>To manage your subscription to SPSSX-L, send a message to
>[hidden email] (not to SPSSX-L), with no body text except the
>command. To leave the list, send the command
>SIGNOFF SPSSX-L
>For a list of commands to manage subscriptions, send the command
>INFO REFCARD
>
>
>________________________________
>
>View message @
>http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-
criteria-in-records-in-another-file-tp3331997p3332314.html
>
>To unsubscribe from Flag Records in one file that meet criteria in records in
>another file, click here.
>
>
>
>
>--
>View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-
in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3332516.html
>Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

walden2000

I apologize for doing a poor job describing my task.  Let me try again,

 

I have a number of flights that were identified last month as chronically late (by DOT standards).  This is what I know about these flights:

 

1 – Airline

2 – Origen

3 – Destination

4 – Schedule departure time

 

The task is to identify the flights scheduled in a future month with:

 

Same:

1 – Airline

2 – Origen

3 – Destination

and,

4 – Scheduled departure time within 30 minutes (plus or minus) of the Scheduled departure time of the identified flight (previous month).

 

The list of flights on the previous month file has unique flights.  Each data record represents aggregated data about a flight. 

The list of flights on the current a future month is structured such a way that each data record represents one flight for one day. 

 

This is what the files could look like:

 

FILE1:
CO MIA IAH 1215
CO MIA IAH 1745
RU IAH DFW 1458
 
and FILE2: (notice a date variable is included)
CO MIA IAH 1205 01/10/2011
CO MIA IAH 1220 01/11/2011
CO MIA IAH 1220 01/12/2011

RU IAH DFW 1045 01/10/2011

 

The task is to identify the flights on FILE2 that have a scheduled departure time within 30 minutes of a flight identified on FILE1.  Thank you for your help.

   

Enrique Berrios  



From: David Marso [via SPSSX Discussion] <[hidden email]>
To: walden2000 <[hidden email]>
Sent: Sat, January 8, 2011 11:46:22 PM
Subject: Re: Flag Records in one file that meet criteria in records in another file

Looks like MATCH FILES will be problematic.
Instead, Use ADD FILES and LAG to drag the target arrival time from the "TABLE/CRITERIA" file into
the appropriate records.
Presuming both files are sorted by AIRLINE ORIG DEST ARRIVTIME.

ADD FILES / FILE ="TARGET"
                   / IN    =TARGET
                   / FILE = "TESTDATA"
                  / IN     =TEST
                  / BY AIRLINE ORIG DEST ARRIVTIME.
IF TARGET  CHECKTIME=ARRIVTIME.
IF TEST CHECKTIME=LAG(CHECKTIME).
HTH, David
---


On Fri, 7 Jan 2011 13:50:47 -0800, walden2000 <[hidden email]> wrote:

>
>Thank you for your prompt response.Â
>Â
>I see a small problem.  The answer is NO to the key question.  The records are
>unique in file1 --> Airline, Flight_Origin, Flight_Destination,
>Flight_Scheduled_Departure_Time.  However, the match statement is by -->Â
>Airline Flight_Origin Flight_Destination.
>Â
>There is the possiblility of having more than one flight in file1 with the same
>Airline Flight_Origin Flight_Destination but different
>Flight_Scheduled_Departure_Time.
>Â
>We could have a file1 that looks like this:
>Â
>CO MIA IAH 1215
>CO MIA IAH 1745
>RU IAH DFW 1458
>Â
>and a file2 that looks like this:
>Â
>CO MIA IAH 01/10/2011 1205
>CO MIA IAH 01/11/2011 1220
>CO MIA IAH 01/12/2011 1300
>RU IAH DFW 01/10/2011 1045
>Â
>Creating a conflict with the match command...
>Â
>Thank you for your help,
>
>Enrique Berrios
>Houston, Texas
>
>
>
>
>________________________________
>From: Gene Maguin [via SPSSX Discussion]
><[hidden email]>
>To: walden2000 <[hidden email]>
>Sent: Fri, January 7, 2011 12:51:30 PM
>Subject: Re: Flag Records in one file that meet criteria in records in another
>file
>
>Hi Enrique,
>
>Ok. Much better! Thank you. A key question is whether there are multiple
>records in file1 with the same values for Airline, Flight_Origin,
>Flight_Destination, Flight_Scheduled_Departure_Time. I'll assume the answer
>is NO. If the answer is YES, then something different will need to be done.
>
>Then, the problem is just an 'add more variables to an existing file'
>problem. Some assumptions:
>1) File1 and file2 are sorted by the match variables.
>2) Flight_Scheduled_Departure_Time are both time values on a 24 hour clock.
>
>
>Match files file=file2/table=file1/rename=(Flight_Scheduled_Departure_Time=
>  Ã‚ Flight_Scheduled_Departure_TimeFile1)/
>  Ã‚ by Airline Flight_Origin Flight_Destination.
>Execute. /* Richard will disagree on an Execute here, I think.
>
>Compute flag=0.
>If (abs(Flight_Scheduled_Departure_TimeFile1-
>  Ã‚ Flight_Scheduled_Departure_Time) le 30) flag=1.
>
>Do if (flag eq 1).
>Print / Airline Flight_Origin Flight_Destination Flight_Date
>  Ã‚ Flight_Scheduled_Departure_Time.
>End if.
>Execute.
>
>
>
>
>
>Given:
>
>File1:  Chronically delayed flights from previous month  (about 64 records)
>Airline
>Flight_Origin
>Flight_Destination
>Flight_Scheduled_Departure_Time
>
>File2: Scheduled flights from current or future month (about 80000 records)
>Airline
>Flight_Origin
>Flight_Destination
>Flight_Date
>Flight_Scheduled_Departure_Time
>
>I need to flag the records on File2 with matching Airline, Flight Origin,
>Flight Destination and with a Flight_Scheduled_Departure_Time that is within
>30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1.
>
>I was using a MATRIX but the arrays appear to loose their value once they
>exit the MATRIX. Â Any hints? Â Thank you,
>
>=====================
>To manage your subscription to SPSSX-L, send a message to
>[hidden email] (not to SPSSX-L), with no body text except the
>command. To leave the list, send the command
>SIGNOFF SPSSX-L
>For a list of commands to manage subscriptions, send the command
>INFO REFCARD
>
>
>________________________________
>
>View message @
>http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-?by-user=t
criteria-in-records-in-another-file-tp3331997p3332314.html
>
>To unsubscribe from Flag Records in one file that meet criteria in records in
>another file, click here.
>
>
>
>
>--
>View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-?by-user=t
in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3332516.html
>Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3333621.html
To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here.

Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

David Marso
Administrator
In reply to this post by walden2000
Your description is just fine.  I believe we al understand the data situation.
What part of the SORT / ADD /LAG proposal doesn't  lead you towards a solution?
You really need to try out what people suggest and if you get stuck be specific about any problems or
ambiguities.  If you wish for someone to create your entire solution you may be looking in the wrong
place.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

walden2000
Hello David,
 
Thank you for your prompt reply.  The problem I see with the ADD FILES suggestion is that it matches the data records by:  AIRLINE, ORIG DEST and ARRVTIME.  The two files don't have matching ARRVTIME's.  The schedules are different from day to day - not all fligths operate at the same time everyday. 
 
The future file has a date variable that does not exist in the current file.  However, I could add a dummy date to the current file to combine the two files.
 
The task is to identify the records in the future file that are within 30 minutes (up or down) from any record in the current file - on a day by day bases.   
 
I appologize if I am asking for too much.  If that is the case - just ignore my question. 
 
Thank you,
 
Enrique Berrios
 


From: David Marso [via SPSSX Discussion] <[hidden email]>
To: walden2000 <[hidden email]>
Sent: Mon, January 10, 2011 4:20:29 PM
Subject: Re: Flag Records in one file that meet criteria in records in another file

Your description is just fine.  I believe we al understand the data situation.
What part of the SORT / ADD /LAG proposal doesn't  lead you towards a solution?
You really need to try out what people suggest and if you get stuck be specific about any problems or
ambiguities.  If you wish for someone to create your entire solution you may be looking in the wrong
place.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3335587.html
To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here.

Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in another

Richard Ristow
In reply to this post by David Marso
At 12:43 AM 1/9/2011, David Marso wrote:

>Use ADD FILES and LAG to drag the target arrival time from the
>"TABLE/CRITERIA" file into the appropriate records.
>Presuming both files are sorted by AIRLINE ORIG DEST ARRIVTIME.
>
>ADD FILES / FILE ="TARGET"
>                    / IN    = TARGET
>                    / FILE = "TESTDATA"
>                    / IN     =TEST
>                    / BY AIRLINE ORIG DEST DepartTime.
>IF TARGET  CHECKTIME=DepartTime.
>IF TEST CHECKTIME=LAG(CHECKTIME).

I think this is definitely the approach, but suggest a modification.
The inquirer is looking for a match of
>. Airline,
>. Flight Origin,
>. Flight Destination, and a
>. Flight_Scheduled_Departure_Time that is within 30 minutes (+/-) of
>Flight_Scheduled_Departure_Time in File1..

To pick up all cases where the times match approximately like this,
you can start by setting the TEST (File1) times back by 30 minutes.
This is very rough code, and among other things must be modified if
two File1 times are within an hour of each other, but something like:

DATASET ACTIVATE File1   WINDOW=FRONT.
COMPUTE MatchTime = DepartTime - TIME.HMS(0,30).
ADD FILES
    / FILE   = "File1" / IN    = FromFile1
    / RENAME = (DepartTime=Drop_It)
    / FILE   = "File2" / IN    = FromFile2
    / RENAME = (DepartTime=MatchTime)
    / BY AIRLINE ORIG DEST MatchTime
    / DROP   = Drop_It.

NUMERIC DepartTime1 DepartTime2 (TIME5).
VAR LABELS
         DepartTime1 'Departure time from file 1'
         DepartTime2 'Departure time from file 2'.

IF      FromFile1 DepartTime1 = MatchTime + TIME.HMS(0,30).
IF      FromFile2 DepartTime2 = MatchTime.
IF     (FromFile2
    AND  AIRLINE = LAG(AIRLINE)
    AND  ORIG    = LAG(ORIG)
    AND  DEST    = LAG(DEST)   )
                   DepartTime1 = LAG(DepartTime1).

*   Select only File2 records matching a File1 record  .
*   by the specified criteria                          .

SELECT IF NOT MISSING (DepartTime1)
    AND ABS(DepartTime1 - DepartTime2) LE TIME.HMS(0,30).

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

Jon K Peck
In reply to this post by walden2000
A different way to approach this problem would be to use fuzzy matching as provided by the FUZZY extension command available from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral).

FUZZY allows you to match "cases" and "controls" based on any number of criterion variables.  These can be specified to match exactly or within a per-variable tolerance.  No sorting is required.  If there are multiple eligible candidates, by default one is picked at random, but you can specify the number of matches to select.

FUZZY provides standard SPSS Statistics syntax, but it does not have a dialog box interface.

FUZZY requires the Python programmability plugin and at least Statistics version 17.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        walden2000 <[hidden email]>
To:        [hidden email]
Date:        01/11/2011 11:51 AM
Subject:        Re: [SPSSX-L] Flag Records in one file that meet criteria in              records in              another file
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Hello David,
 
Thank you for your prompt reply.  The problem I see with the ADD FILES suggestion is that it matches the data records by:  AIRLINE, ORIG DEST and ARRVTIME.  The two files don't have matching ARRVTIME's.  The schedules are different from day to day - not all fligths operate at the same time everyday.  
 
The future file has a date variable that does not exist in the current file.  However, I could add a dummy date to the current file to combine the two files.
 
The task is to identify the records in the future file that are within 30 minutes (up or down) from any record in the current file - on a day by day bases.    

I appologize if I am asking for too much.  If that is the case - just ignore my question.  
 
Thank you,
 
Enrique Berrios
 


From: David Marso [via SPSSX Discussion] <[hidden email]>
To:
walden2000 <
[hidden email]>
Sent:
Mon, January 10, 2011 4:20:29 PM
Subject:
Re: Flag Records in one file that meet criteria in records in another file


Your description is just fine.  I believe we al understand the data situation.
What part of the SORT / ADD /LAG proposal doesn't  lead you towards a solution?
You really need to try out what people suggest and if you get stuck be specific about any problems or
ambiguities.  If you wish for someone to create your entire solution you may be looking in the wrong
place.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD




View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3335587.html
To unsubscribe from Flag Records in one file that meet criteria in records in another file,
click here.



View this message in context: Re: Flag Records in one file that meet criteria in records in another file
Sent from the
SPSSX Discussion mailing list archive at Nabble.com.
Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in records in another file

Maguin, Eugene
In reply to this post by walden2000
Enrique,

A while back I offered a piece of code based on match files and table
command/subcommands. You replied that it didn't work and then others offer
pieces of code that haven't worked either. Maybe I misunderstand and
misremember (and I can't look back because I've deleted the old messages)
but the problem seems to be that there is no way to point at a flight (I'd
think that airline, flight number, destination would be sufficient to
uniquely identify the scheduled arrival time for any flight) and and say how
many times in the past that flight has been 'late'.

One question for you is how would you do this by hand? If you can't point to
a specific flight how can you tell how often that flight has been late in
the past?

Gene Maguin






________________________________

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
walden2000
Sent: Tuesday, January 11, 2011 10:33 AM
To: [hidden email]
Subject: Re: Flag Records in one file that meet criteria in records in
another file


Hello David,

Thank you for your prompt reply.  The problem I see with the ADD FILES
suggestion is that it matches the data records by:  AIRLINE, ORIG DEST and
ARRVTIME.  The two files don't have matching ARRVTIME's.  The schedules are
different from day to day - not all fligths operate at the same time
everyday.

The future file has a date variable that does not exist in the current file.
However, I could add a dummy date to the current file to combine the two
files.

The task is to identify the records in the future file that are within 30
minutes (up or down) from any record in the current file - on a day by day
bases.

I appologize if I am asking for too much.  If that is the case - just ignore
my question.

Thank you,

Enrique Berrios


________________________________

From: David Marso [via SPSSX Discussion] <[hidden email]>
To: walden2000 <[hidden email]>
Sent: Mon, January 10, 2011 4:20:29 PM
Subject: Re: Flag Records in one file that meet criteria in records in
another file

Your description is just fine.  I believe we al understand the data
situation.
What part of the SORT / ADD /LAG proposal doesn't  lead you towards a
solution?
You really need to try out what people suggest and if you get stuck be
specific about any problems or
ambiguities.  If you wish for someone to create your entire solution you may
be looking in the wrong
place.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



________________________________

View message @
http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-
meet-criteria-in-records-in-another-file-tp3331997p3335587.html
To unsubscribe from Flag Records in one file that meet criteria in records
in another file, click here
<http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macr
o=unsubscribe_by_code&node=3331997&code=YmVycmlvc2Nhc3RlbGxvbkB5YWhvby5jb218
MzMzMTk5N3wtNTAxOTgyOTA=&by-user=t> .


________________________________

View this message in context: Re: Flag Records in one file that meet
criteria in records in another file
<http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that
-meet-criteria-in-records-in-another-file-tp3331997p3336612.html>
Sent from the SPSSX Discussion mailing list archive
<http://spssx-discussion.1045642.n5.nabble.com/>  at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Flag Records in one file that meet criteria in another

Richard Ristow
At 02:36 PM 1/11/2011, Gene Maguin wrote:

>I'd think that airline, flight number, destination would be
>sufficient to uniquely identify the scheduled arrival time for any flight

Bingo; and, for that, MATCH FILES with TABLE for what's described as
'File1' would work.

But it may be (this is how I read the inquiries, following David
Marso) that flight numbers aren't available, or perhaps are subject
to change from month to month; and that the departure time is also
subject to change, and so can't serve as an exact-match key to
identify the 'same' flight. From walden2000's post 11:15 AM 1/10/2011:

>I have a number of flights that were identified last month as
>chronically late.  This is what I know about these flights:
>
>1 Airline
>2 Origin
>3 Destination
>4 Schedule departure time

I'm surprised at there not being a flight number; but, that's the description.

>The task is to identify the flights scheduled in a future month with same
>1 Airline
>2 Origin
>3 Destination
>and,
>4 Scheduled departure time within 30 minutes of the Scheduled
>departure time of the identified flight (previous month).

Hence, the suggested more-complicated ADD FILES/LAG logic.

I don't know how the Python code from Developer Central would handle
this, but it certainly could be done as a fuzzy-logic problem:
require exact match on airline, origin and destination, and evaluate
on closeness of departure time. That could also be done in native
SPSS: many-to-many match File1 and File2 with Airline, Origin and
Destination as key, and screen the resulting matches by proximity of
departure times.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD