I would like to flag records in File1 that meet criteria in File2 records. For Example:
File1: Name City Income File 2: City Desired_Income_Low Desired_Income_High Something like this: File1: Name1 Atlanta 79500 Name2 New Orleans 80000 Name3 San Francisco 91000 Name4 New York 105000 Name5 Atlanta 81000 File2: Atlanta 70000 80000 Houston 60000 75000 New York 100000 110000 Thank you, |
Walden,
What you want to do makes sense to you but not to me. You say: >>I would like to flag records in File1 that meet criteria in File2 records. For Example: File1: Name City Income File 2: City Desired_Income_Low Desired_Income_High Something like this: File1: Name1 Atlanta 79500 Name2 New Orleans 80000 Name3 San Francisco 91000 Name4 New York 105000 Name5 Atlanta 81000 File2: Atlanta 70000 80000 Houston 60000 75000 New York 100000 110000 I have no idea what 'criteria' means so I'll guess that you do this first. I suggest a match files command with the table subcommand. Like this. Match files file='file1'/table='file2'/by city. The result is ResultFile: Name1 Atlanta 79500 70000 80000 Name2 New Orleans 80000 Name3 San Francisco 91000 Name4 New York 105000 100000 110000 Name5 Atlanta 81000 70000 80000 From there you can go on and do whatever you want. You don't say what 'criteria' are so you are on your own now. Gene Maguin ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
You are correct. I used a bad example. This is what I am trying to do: Given: File1: Chronically delayed flights from previous month (about 64 records) Airline Flight_Origen Fligth_Destination Fligth_Scheduled_Departure_Time File2: Scheduled flights from current or future month (about 80000 records) Airline Flight_Origen Fligth_Destination Flight_Date Fligth_Scheduled_Departure_Time I need to flag the records on File2 with matching Airline, Flight Origen, Flight Destination and with a Fligth_Scheduled_Departure_Time that is within 30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1. I was using a MATRIX but the arrays appear to loose their value once they exit the MATRIX. Any hints? Thank you, Enrique Berrios Houston, Texas From: Gene Maguin [via SPSSX Discussion] <[hidden email]> To: walden2000 <[hidden email]> Sent: Fri, January 7, 2011 10:44:45 AM Subject: Re: Flag Records in one file that meet criteria in records in another file Walden, What you want to do makes sense to you but not to me. You say: >>I would like to flag records in File1 that meet criteria in File2 records. For Example: File1: Name City Income File 2: City Desired_Income_Low Desired_Income_High Something like this: File1: Name1 Atlanta 79500 Name2 New Orleans 80000 Name3 San Francisco 91000 Name4 New York 105000 Name5 Atlanta 81000 File2: Atlanta 70000 80000 Houston 60000 75000 New York 100000 110000 I have no idea what 'criteria' means so I'll guess that you do this first. I suggest a match files command with the table subcommand. Like this. Match files file='file1'/table='file2'/by city. The result is ResultFile: Name1 Atlanta 79500 70000 80000 Name2 New Orleans 80000 Name3 San Francisco 91000 Name4 New York 105000 100000 110000 Name5 Atlanta 81000 70000 80000 From there you can go on and do whatever you want. You don't say what 'criteria' are so you are on your own now. Gene Maguin ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3332121.html To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here. |
Hi Enrique,
Ok. Much better! Thank you. A key question is whether there are multiple records in file1 with the same values for Airline, Flight_Origin, Flight_Destination, Flight_Scheduled_Departure_Time. I'll assume the answer is NO. If the answer is YES, then something different will need to be done. Then, the problem is just an 'add more variables to an existing file' problem. Some assumptions: 1) File1 and file2 are sorted by the match variables. 2) Flight_Scheduled_Departure_Time are both time values on a 24 hour clock. Match files file=file2/table=file1/rename=(Flight_Scheduled_Departure_Time= Flight_Scheduled_Departure_TimeFile1)/ by Airline Flight_Origin Flight_Destination. Execute. /* Richard will disagree on an Execute here, I think. Compute flag=0. If (abs(Flight_Scheduled_Departure_TimeFile1- Flight_Scheduled_Departure_Time) le 30) flag=1. Do if (flag eq 1). Print / Airline Flight_Origin Flight_Destination Flight_Date Flight_Scheduled_Departure_Time. End if. Execute. Given: File1: Chronically delayed flights from previous month (about 64 records) Airline Flight_Origin Flight_Destination Flight_Scheduled_Departure_Time File2: Scheduled flights from current or future month (about 80000 records) Airline Flight_Origin Flight_Destination Flight_Date Flight_Scheduled_Departure_Time I need to flag the records on File2 with matching Airline, Flight Origin, Flight Destination and with a Flight_Scheduled_Departure_Time that is within 30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1. I was using a MATRIX but the arrays appear to loose their value once they exit the MATRIX. Any hints? Thank you, ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Thank you for your prompt response. I see a small problem. The answer is NO to the key question. The records are unique in file1 --> Airline, Flight_Origin, Flight_Destination, Flight_Scheduled_Departure_Time. However, the match statement is by --> Airline Flight_Origin Flight_Destination. There is the possiblility of having more than one flight in file1 with the same Airline Flight_Origin Flight_Destination but different Flight_Scheduled_Departure_Time. We could have a file1 that looks like this: CO MIA IAH 1215 CO MIA IAH 1745 RU IAH DFW 1458 and a file2 that looks like this: CO MIA IAH 01/10/2011 1205 CO MIA IAH 01/11/2011 1220 CO MIA IAH 01/12/2011 1300 RU IAH DFW 01/10/2011 1045 Creating a conflict with the match command...
Thank you for your help, From: Gene Maguin [via SPSSX Discussion] <[hidden email]> To: walden2000 <[hidden email]> Sent: Fri, January 7, 2011 12:51:30 PM Subject: Re: Flag Records in one file that meet criteria in records in another file Hi Enrique, Ok. Much better! Thank you. A key question is whether there are multiple records in file1 with the same values for Airline, Flight_Origin, Flight_Destination, Flight_Scheduled_Departure_Time. I'll assume the answer is NO. If the answer is YES, then something different will need to be done. Then, the problem is just an 'add more variables to an existing file' problem. Some assumptions: 1) File1 and file2 are sorted by the match variables. 2) Flight_Scheduled_Departure_Time are both time values on a 24 hour clock. Match files file=file2/table=file1/rename=(Flight_Scheduled_Departure_Time= Flight_Scheduled_Departure_TimeFile1)/ by Airline Flight_Origin Flight_Destination. Execute. /* Richard will disagree on an Execute here, I think. Compute flag=0. If (abs(Flight_Scheduled_Departure_TimeFile1- Flight_Scheduled_Departure_Time) le 30) flag=1. Do if (flag eq 1). Print / Airline Flight_Origin Flight_Destination Flight_Date Flight_Scheduled_Departure_Time. End if. Execute. Given: File1: Chronically delayed flights from previous month (about 64 records) Airline Flight_Origin Flight_Destination Flight_Scheduled_Departure_Time File2: Scheduled flights from current or future month (about 80000 records) Airline Flight_Origin Flight_Destination Flight_Date Flight_Scheduled_Departure_Time I need to flag the records on File2 with matching Airline, Flight Origin, Flight Destination and with a Flight_Scheduled_Departure_Time that is within 30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1. I was using a MATRIX but the arrays appear to loose their value once they exit the MATRIX. Any hints? Thank you, ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3332314.html To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here. |
Administrator
|
In reply to this post by walden2000
Looks like MATCH FILES will be problematic.
Instead, Use ADD FILES and LAG to drag the target arrival time from the "TABLE/CRITERIA" file into the appropriate records. Presuming both files are sorted by AIRLINE ORIG DEST ARRIVTIME. ADD FILES / FILE ="TARGET" / IN =TARGET / FILE = "TESTDATA" / IN =TEST / BY AIRLINE ORIG DEST ARRIVTIME. IF TARGET CHECKTIME=ARRIVTIME. IF TEST CHECKTIME=LAG(CHECKTIME). HTH, David --- On Fri, 7 Jan 2011 13:50:47 -0800, walden2000 <[hidden email]> wrote: > >Thank you for your prompt response. > >I see a small problem. The answer is NO to the key question.  The records are >unique in file1 --> Airline, Flight_Origin, Flight_Destination, >Flight_Scheduled_Departure_Time. However, the match statement is by --> >Airline Flight_Origin Flight_Destination. > >There is the possiblility of having more than one flight in file1 with the same >Airline Flight_Origin Flight_Destination but different >Flight_Scheduled_Departure_Time. > >We could have a file1 that looks like this: > >CO MIA IAH 1215 >CO MIA IAH 1745 >RU IAH DFW 1458 > >and a file2 that looks like this: > >CO MIA IAH 01/10/2011 1205 >CO MIA IAH 01/11/2011 1220 >CO MIA IAH 01/12/2011 1300 >RU IAH DFW 01/10/2011 1045 > >Creating a conflict with the match command... > >Thank you for your help, > >Enrique Berrios >Houston, Texas > > > > >________________________________ >From: Gene Maguin [via SPSSX Discussion] ><[hidden email]> >To: walden2000 <[hidden email]> >Sent: Fri, January 7, 2011 12:51:30 PM >Subject: Re: Flag Records in one file that meet criteria in records in another >file > >Hi Enrique, > >Ok. Much better! Thank you. A key question is whether there are multiple >records in file1 with the same values for Airline, Flight_Origin, >Flight_Destination, Flight_Scheduled_Departure_Time. I'll assume the answer >is NO. If the answer is YES, then something different will need to be done. > >Then, the problem is just an 'add more variables to an existing file' >problem. Some assumptions: >1) File1 and file2 are sorted by the match variables. >2) Flight_Scheduled_Departure_Time are both time values on a 24 hour clock. > > >Match files file=file2/table=file1/rename=(Flight_Scheduled_Departure_Time= >  Flight_Scheduled_Departure_TimeFile1)/ >  by Airline Flight_Origin Flight_Destination. >Execute. /* Richard will disagree on an Execute here, I think. > >Compute flag=0. >If (abs(Flight_Scheduled_Departure_TimeFile1- >  Flight_Scheduled_Departure_Time) le 30) flag=1. > >Do if (flag eq 1). >Print / Airline Flight_Origin Flight_Destination Flight_Date >  Flight_Scheduled_Departure_Time. >End if. >Execute. > > > > > >Given: > >File1:  Chronically delayed flights from previous month  (about 64 records) >Airline >Flight_Origin >Flight_Destination >Flight_Scheduled_Departure_Time > >File2: Scheduled flights from current or future month (about 80000 records) >Airline >Flight_Origin >Flight_Destination >Flight_Date >Flight_Scheduled_Departure_Time > >I need to flag the records on File2 with matching Airline, Flight Origin, >Flight Destination and with a Flight_Scheduled_Departure_Time that is within >30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1. > >I was using a MATRIX but the arrays appear to loose their value once they >exit the MATRIX.  Any hints?  Thank you, > >===================== >To manage your subscription to SPSSX-L, send a message to >[hidden email] (not to SPSSX-L), with no body text except the >command. To leave the list, send the command >SIGNOFF SPSSX-L >For a list of commands to manage subscriptions, send the command >INFO REFCARD > > >________________________________ > >View message @ >http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet- > >To unsubscribe from Flag Records in one file that meet criteria in records in >another file, click here. > > > > >-- >View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Flag-Records- in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3332516.html >Sent from the SPSSX Discussion mailing list archive at Nabble.com. > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
I apologize for doing a poor job describing my task. Let me try again, I have a number of flights that were identified last month as chronically late (by DOT standards). This is what I know about these flights: 1 – Airline 2 – Origen 3 – Destination 4 – Schedule departure time The task is to identify the flights scheduled in a future month with: Same: 1 – Airline 2 – Origen 3 – Destination and, 4 – Scheduled departure time within 30 minutes (plus or minus) of the Scheduled departure time of the identified flight (previous month). The list of flights on the previous month file has unique flights. Each data record represents aggregated data about a flight. The list of flights on the current a future month is structured such a way that each data record represents one flight for one day. This is what the files could look like: FILE1: RU IAH DFW 1045 01/10/2011 The task is to identify the flights on FILE2 that have a scheduled departure time within 30 minutes of a flight identified on FILE1. Thank you for your help. Enrique Berrios From: David Marso [via SPSSX Discussion] <[hidden email]> To: walden2000 <[hidden email]> Sent: Sat, January 8, 2011 11:46:22 PM Subject: Re: Flag Records in one file that meet criteria in records in another file Looks like MATCH FILES will be problematic. Instead, Use ADD FILES and LAG to drag the target arrival time from the "TABLE/CRITERIA" file into the appropriate records. Presuming both files are sorted by AIRLINE ORIG DEST ARRIVTIME. ADD FILES / FILE ="TARGET" / IN =TARGET / FILE = "TESTDATA" / IN =TEST / BY AIRLINE ORIG DEST ARRIVTIME. IF TARGET CHECKTIME=ARRIVTIME. IF TEST CHECKTIME=LAG(CHECKTIME). HTH, David --- On Fri, 7 Jan 2011 13:50:47 -0800, walden2000 <[hidden email]> wrote: > >Thank you for your prompt response. > >I see a small problem. The answer is NO to the key question.  The records are >unique in file1 --> Airline, Flight_Origin, Flight_Destination, >Flight_Scheduled_Departure_Time. However, the match statement is by --> >Airline Flight_Origin Flight_Destination. > >There is the possiblility of having more than one flight in file1 with the same >Airline Flight_Origin Flight_Destination but different >Flight_Scheduled_Departure_Time. > >We could have a file1 that looks like this: > >CO MIA IAH 1215 >CO MIA IAH 1745 >RU IAH DFW 1458 > >and a file2 that looks like this: > >CO MIA IAH 01/10/2011 1205 >CO MIA IAH 01/11/2011 1220 >CO MIA IAH 01/12/2011 1300 >RU IAH DFW 01/10/2011 1045 > >Creating a conflict with the match command... > >Thank you for your help, > >Enrique Berrios >Houston, Texas > > > > >________________________________ >From: Gene Maguin [via SPSSX Discussion] ><[hidden email]> >To: walden2000 <[hidden email]> >Sent: Fri, January 7, 2011 12:51:30 PM >Subject: Re: Flag Records in one file that meet criteria in records in another >file > >Hi Enrique, > >Ok. Much better! Thank you. A key question is whether there are multiple >records in file1 with the same values for Airline, Flight_Origin, >Flight_Destination, Flight_Scheduled_Departure_Time. I'll assume the answer >is NO. If the answer is YES, then something different will need to be done. > >Then, the problem is just an 'add more variables to an existing file' >problem. Some assumptions: >1) File1 and file2 are sorted by the match variables. >2) Flight_Scheduled_Departure_Time are both time values on a 24 hour clock. > > >Match files file=file2/table=file1/rename=(Flight_Scheduled_Departure_Time= >  Flight_Scheduled_Departure_TimeFile1)/ >  by Airline Flight_Origin Flight_Destination. >Execute. /* Richard will disagree on an Execute here, I think. > >Compute flag=0. >If (abs(Flight_Scheduled_Departure_TimeFile1- >  Flight_Scheduled_Departure_Time) le 30) flag=1. > >Do if (flag eq 1). >Print / Airline Flight_Origin Flight_Destination Flight_Date >  Flight_Scheduled_Departure_Time. >End if. >Execute. > > > > > >Given: > >File1:  Chronically delayed flights from previous month  (about 64 records) >Airline >Flight_Origin >Flight_Destination >Flight_Scheduled_Departure_Time > >File2: Scheduled flights from current or future month (about 80000 records) >Airline >Flight_Origin >Flight_Destination >Flight_Date >Flight_Scheduled_Departure_Time > >I need to flag the records on File2 with matching Airline, Flight Origin, >Flight Destination and with a Flight_Scheduled_Departure_Time that is within >30 minutes (+/-) of Flight_Scheduled_Departure_Time in File1. > >I was using a MATRIX but the arrays appear to loose their value once they >exit the MATRIX.  Any hints?  Thank you, > >===================== >To manage your subscription to SPSSX-L, send a message to >[hidden email] (not to SPSSX-L), with no body text except the >command. To leave the list, send the command >SIGNOFF SPSSX-L >For a list of commands to manage subscriptions, send the command >INFO REFCARD > > >________________________________ > >View message @ >http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-?by-user=t > >To unsubscribe from Flag Records in one file that meet criteria in records in >another file, click here. > > > > >-- >View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-?by-user=t >Sent from the SPSSX Discussion mailing list archive at Nabble.com. > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3333621.html To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here. |
Administrator
|
In reply to this post by walden2000
Your description is just fine. I believe we al understand the data situation.
What part of the SORT / ADD /LAG proposal doesn't lead you towards a solution? You really need to try out what people suggest and if you get stuck be specific about any problems or ambiguities. If you wish for someone to create your entire solution you may be looking in the wrong place. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Hello David,
Thank you for your prompt reply. The problem I see with the ADD FILES suggestion is that it matches the data records by: AIRLINE, ORIG DEST and ARRVTIME. The two files don't have matching ARRVTIME's. The schedules are different from day to day - not all fligths operate at the same time everyday.
The future file has a date variable that does not exist in the current file. However, I could add a dummy date to the current file to combine the two files.
The task is to identify the records in the future file that are within 30 minutes (up or down) from any record in the current file - on a day by day bases.
I appologize if I am asking for too much. If that is the case - just ignore my question.
Thank you,
From: David Marso [via SPSSX Discussion] <[hidden email]> To: walden2000 <[hidden email]> Sent: Mon, January 10, 2011 4:20:29 PM Subject: Re: Flag Records in one file that meet criteria in records in another file Your description is just fine. I believe we al understand the data situation. What part of the SORT / ADD /LAG proposal doesn't lead you towards a solution? You really need to try out what people suggest and if you get stuck be specific about any problems or ambiguities. If you wish for someone to create your entire solution you may be looking in the wrong place. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3335587.html To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here. |
In reply to this post by David Marso
At 12:43 AM 1/9/2011, David Marso wrote:
>Use ADD FILES and LAG to drag the target arrival time from the >"TABLE/CRITERIA" file into the appropriate records. >Presuming both files are sorted by AIRLINE ORIG DEST ARRIVTIME. > >ADD FILES / FILE ="TARGET" > / IN = TARGET > / FILE = "TESTDATA" > / IN =TEST > / BY AIRLINE ORIG DEST DepartTime. >IF TARGET CHECKTIME=DepartTime. >IF TEST CHECKTIME=LAG(CHECKTIME). I think this is definitely the approach, but suggest a modification. The inquirer is looking for a match of >. Airline, >. Flight Origin, >. Flight Destination, and a >. Flight_Scheduled_Departure_Time that is within 30 minutes (+/-) of >Flight_Scheduled_Departure_Time in File1.. To pick up all cases where the times match approximately like this, you can start by setting the TEST (File1) times back by 30 minutes. This is very rough code, and among other things must be modified if two File1 times are within an hour of each other, but something like: DATASET ACTIVATE File1 WINDOW=FRONT. COMPUTE MatchTime = DepartTime - TIME.HMS(0,30). ADD FILES / FILE = "File1" / IN = FromFile1 / RENAME = (DepartTime=Drop_It) / FILE = "File2" / IN = FromFile2 / RENAME = (DepartTime=MatchTime) / BY AIRLINE ORIG DEST MatchTime / DROP = Drop_It. NUMERIC DepartTime1 DepartTime2 (TIME5). VAR LABELS DepartTime1 'Departure time from file 1' DepartTime2 'Departure time from file 2'. IF FromFile1 DepartTime1 = MatchTime + TIME.HMS(0,30). IF FromFile2 DepartTime2 = MatchTime. IF (FromFile2 AND AIRLINE = LAG(AIRLINE) AND ORIG = LAG(ORIG) AND DEST = LAG(DEST) ) DepartTime1 = LAG(DepartTime1). * Select only File2 records matching a File1 record . * by the specified criteria . SELECT IF NOT MISSING (DepartTime1) AND ABS(DepartTime1 - DepartTime2) LE TIME.HMS(0,30). ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by walden2000
A different way to approach this problem
would be to use fuzzy matching as provided by the FUZZY extension command
available from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral).
FUZZY allows you to match "cases" and "controls" based on any number of criterion variables. These can be specified to match exactly or within a per-variable tolerance. No sorting is required. If there are multiple eligible candidates, by default one is picked at random, but you can specify the number of matches to select. FUZZY provides standard SPSS Statistics syntax, but it does not have a dialog box interface. FUZZY requires the Python programmability plugin and at least Statistics version 17. HTH, Jon Peck Senior Software Engineer, IBM [hidden email] 312-651-3435 From: walden2000 <[hidden email]> To: [hidden email] Date: 01/11/2011 11:51 AM Subject: Re: [SPSSX-L] Flag Records in one file that meet criteria in records in another file Sent by: "SPSSX(r) Discussion" <[hidden email]> Hello David, Thank you for your prompt reply. The problem I see with the ADD FILES suggestion is that it matches the data records by: AIRLINE, ORIG DEST and ARRVTIME. The two files don't have matching ARRVTIME's. The schedules are different from day to day - not all fligths operate at the same time everyday. The future file has a date variable that does not exist in the current file. However, I could add a dummy date to the current file to combine the two files. The task is to identify the records in the future file that are within 30 minutes (up or down) from any record in the current file - on a day by day bases. I appologize if I am asking for too much. If that is the case - just ignore my question. Thank you, Enrique Berrios From: David Marso [via SPSSX Discussion] <[hidden email]> To: walden2000 <[hidden email]> Sent: Mon, January 10, 2011 4:20:29 PM Subject: Re: Flag Records in one file that meet criteria in records in another file Your description is just fine. I believe we al understand the data situation. What part of the SORT / ADD /LAG proposal doesn't lead you towards a solution? You really need to try out what people suggest and if you get stuck be specific about any problems or ambiguities. If you wish for someone to create your entire solution you may be looking in the wrong place. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that-meet-criteria-in-records-in-another-file-tp3331997p3335587.html To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here. View this message in context: Re: Flag Records in one file that meet criteria in records in another file Sent from the SPSSX Discussion mailing list archive at Nabble.com. |
In reply to this post by walden2000
Enrique,
A while back I offered a piece of code based on match files and table command/subcommands. You replied that it didn't work and then others offer pieces of code that haven't worked either. Maybe I misunderstand and misremember (and I can't look back because I've deleted the old messages) but the problem seems to be that there is no way to point at a flight (I'd think that airline, flight number, destination would be sufficient to uniquely identify the scheduled arrival time for any flight) and and say how many times in the past that flight has been 'late'. One question for you is how would you do this by hand? If you can't point to a specific flight how can you tell how often that flight has been late in the past? Gene Maguin ________________________________ From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of walden2000 Sent: Tuesday, January 11, 2011 10:33 AM To: [hidden email] Subject: Re: Flag Records in one file that meet criteria in records in another file Hello David, Thank you for your prompt reply. The problem I see with the ADD FILES suggestion is that it matches the data records by: AIRLINE, ORIG DEST and ARRVTIME. The two files don't have matching ARRVTIME's. The schedules are different from day to day - not all fligths operate at the same time everyday. The future file has a date variable that does not exist in the current file. However, I could add a dummy date to the current file to combine the two files. The task is to identify the records in the future file that are within 30 minutes (up or down) from any record in the current file - on a day by day bases. I appologize if I am asking for too much. If that is the case - just ignore my question. Thank you, Enrique Berrios ________________________________ From: David Marso [via SPSSX Discussion] <[hidden email]> To: walden2000 <[hidden email]> Sent: Mon, January 10, 2011 4:20:29 PM Subject: Re: Flag Records in one file that meet criteria in records in another file Your description is just fine. I believe we al understand the data situation. What part of the SORT / ADD /LAG proposal doesn't lead you towards a solution? You really need to try out what people suggest and if you get stuck be specific about any problems or ambiguities. If you wish for someone to create your entire solution you may be looking in the wrong place. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ________________________________ View message @ http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that- meet-criteria-in-records-in-another-file-tp3331997p3335587.html To unsubscribe from Flag Records in one file that meet criteria in records in another file, click here <http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macr o=unsubscribe_by_code&node=3331997&code=YmVycmlvc2Nhc3RlbGxvbkB5YWhvby5jb218 MzMzMTk5N3wtNTAxOTgyOTA=&by-user=t> . ________________________________ View this message in context: Re: Flag Records in one file that meet criteria in records in another file <http://spssx-discussion.1045642.n5.nabble.com/Flag-Records-in-one-file-that -meet-criteria-in-records-in-another-file-tp3331997p3336612.html> Sent from the SPSSX Discussion mailing list archive <http://spssx-discussion.1045642.n5.nabble.com/> at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
At 02:36 PM 1/11/2011, Gene Maguin wrote:
>I'd think that airline, flight number, destination would be >sufficient to uniquely identify the scheduled arrival time for any flight Bingo; and, for that, MATCH FILES with TABLE for what's described as 'File1' would work. But it may be (this is how I read the inquiries, following David Marso) that flight numbers aren't available, or perhaps are subject to change from month to month; and that the departure time is also subject to change, and so can't serve as an exact-match key to identify the 'same' flight. From walden2000's post 11:15 AM 1/10/2011: >I have a number of flights that were identified last month as >chronically late. This is what I know about these flights: > >1 Airline >2 Origin >3 Destination >4 Schedule departure time I'm surprised at there not being a flight number; but, that's the description. >The task is to identify the flights scheduled in a future month with same >1 Airline >2 Origin >3 Destination >and, >4 Scheduled departure time within 30 minutes of the Scheduled >departure time of the identified flight (previous month). Hence, the suggested more-complicated ADD FILES/LAG logic. I don't know how the Python code from Developer Central would handle this, but it certainly could be done as a fuzzy-logic problem: require exact match on airline, origin and destination, and evaluate on closeness of departure time. That could also be done in native SPSS: many-to-many match File1 and File2 with Airline, Origin and Destination as key, and screen the resulting matches by proximity of departure times. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |