|
Hi All,
I have been trying to merge to dataset without success. Data1 (conteins the records of a specific action) Date patientid userid dose 01/01/2006 08:20:16 01 20 .2 01/01/2006 08:23:20 01 20 .5 01/01/2006 08:20:18 02 21 .6 Data2 (contains a list of multiple actions) Date patientid userid 01/01/2006 08:20:13 01 20 01/01/2006 08:20:15 01 20 01/01/2006 08:20:16 01 20 01/01/2006 08:20:18 01 20 01/01/2006 08:20:19 01 20 01/01/2006 08:23:18 01 20 01/01/2006 08:23:29 01 20 01/01/2006 08:23:22 01 20 01/01/2006 08:20:17 02 21 01/01/2006 08:20:19 02 21 01/01/2006 08:20:20 02 21 I would like to add the dose to data2 but the times are not exactly the same. Desired result Date patientid userid dose 01/01/2006 08:20:13 01 20 .2 01/01/2006 08:20:15 01 20 .2 01/01/2006 08:20:16 01 20 .2 01/01/2006 08:20:18 01 20 .2 01/01/2006 08:20:19 01 20 .2 01/01/2006 08:23:18 01 20 .5 01/01/2006 08:23:29 01 20 .5 01/01/2006 08:23:22 01 20 .5 01/01/2006 08:20:17 02 21 .6 01/01/2006 08:20:19 02 21 .6 01/01/2006 08:20:20 02 21 .6 Help would be greatly appreciated. Jose |
|
Jose,
It's not clear to me from your message exactly how you are doing the match files. In other words, what will be your BY variable(s)? It sounds like time might be. Is that true or does patient id or user id also figure in to the by variable list? It also might help to explain a tiny bit more about the meaning of the datasets and variables. Most important is to explain the By variable list. Gene Maguin |
|
In reply to this post by Jose Gerardo Benuzillo
Jose,
I've reformatted the data you sent so that I can read it easily and now have some questions. >>The line of data from the small dataset containing the prescription info should be contained in the bigger file. >>There are times when the patient/doctor interaction did not result in prescription. (I would like to select out these records). I don't understand this statement because in the final dataset every record has a prescription. >>90% of the time, the timestamps are exactly the same in both datasets but for the remaining 10% the times are slightly different 1 or two seconds apart. Given the way your final dataset looks, I would do the following in both datasets. Convert the m/d/y part of the date variable to a string. Extract the hr and min part of the date variable, convert to a 24 hour clock and then convert to a string variable. Do a match files using the Table subcommand and the first file as the Table file and patientid userid datestring and timestring as the by variables. So what does this do. It says that all Data2 records with the same values for the by variables will get the dose variable for the file 1 record with matching values for the by variables. I'd like to acknowledge that another way to do the same thing but without creating the string variables would be to copy the date variable in both files to a new name and then convert the seconds part of the date-time value to 0. I probably have a misplaced fear of the representational accuracy of the number of seconds since midnight of October 14, 14something, i.e., the start of the Gregorian calendar. Let me also say that my scheme will probably flop because you may have duplicate keys on the table file or the categorization of time into minutes doesn't account for your considering 2/1/2007 12:58:57 to be the same transaction as one at 2/1/2007 12:59:01. So you have to tell us why these records all get the same dose info even though there is an 11 second time span. 01/01/2006 08:23:18 01 20 .5 01/01/2006 08:23:29 01 20 .5 01/01/2006 08:23:22 01 20 .5 Date patientid userid dose 01/01/2006 08:20:16 01 20 .2 01/01/2006 08:23:20 01 20 .5 01/01/2006 08:20:18 02 21 .6 Data2 (contains a list of multiple actions) Date patientid userid 01/01/2006 08:20:13 01 20 01/01/2006 08:20:15 01 20 01/01/2006 08:20:16 01 20 01/01/2006 08:20:18 01 20 01/01/2006 08:20:19 01 20 01/01/2006 08:23:18 01 20 01/01/2006 08:23:29 01 20 01/01/2006 08:23:22 01 20 01/01/2006 08:20:17 02 21 01/01/2006 08:20:19 02 21 01/01/2006 08:20:20 02 21 Desired result Date patientid userid dose 01/01/2006 08:20:13 01 20 .2 01/01/2006 08:20:15 01 20 .2 01/01/2006 08:20:16 01 20 .2 01/01/2006 08:20:18 01 20 .2 01/01/2006 08:20:19 01 20 .2 01/01/2006 08:23:18 01 20 .5 01/01/2006 08:23:29 01 20 .5 01/01/2006 08:23:22 01 20 .5 01/01/2006 08:20:17 02 21 .6 01/01/2006 08:20:19 02 21 .6 01/01/2006 08:20:20 02 21 .6 |
|
Since this has come up -
At 02:55 PM 3/9/2007, Gene Maguin wrote ("Re: dificult merge"): >I probably have a misplaced fear of the representational accuracy of >the number of seconds since midnight of October 14, 14something, i.e., >the start of the Gregorian calendar. October 14, 1582. I have to look it up every time, myself. And you don't need to worry. SPSS numbers (which are 64-bit floating-point numbers following standard IEEE 754) have 53 bits of precision. There've been only about 1.3E10 seconds, less than 2**34 since October 14, 1582. An integer that size can be represented exactly with 9 bits to spare; or, full resolution is 2**-9 seconds, about 2 microseconds. Below is SPSS 15 draft output <WRR-not saved separately>: INPUT PROGRAM. . NUMERIC RightNow (DateTime26.4). . COMPUTE RightNow = $TIME. . END CASE. END FILE. END INPUT PROGRAM. NUMERIC Seconds (COMMA26.4). COMPUTE Seconds = RightNow. NUMERIC LogTime Log2Time (F7.4). COMPUTE LogTime = LG10(Seconds). COMPUTE Log2Time = LN(Seconds)/LN(2). LIST. List |-----------------------------|---------------------------| |Output Created |12-MAR-2007 20:33:22 | |-----------------------------|---------------------------| RightNow Seconds LogTime Log2Time 12-MAR-2007 20:33:22.4990 13,393,110,802.4990 10.1269 33.6408 Number of cases read: 1 Number of cases listed: 1 |
|
In reply to this post by Jose Gerardo Benuzillo
At 12:23 AM 3/9/2007, Jose Gerardo Benuzillo wrote:
>I have been trying to merge to dataset without success. With Gene Maguin, I've had a hard time knowing what keys you want to use, and what you want. You have two files, which I'm calling Dose - your "Data1 (conteins the records of a specific action)" Action - your "Data2 (contains a list of multiple actions)" I'm taking it that, a. The key is patient ID and date - that's fairly usual. (Your inputs aren't sorted that way, though.) b. You want to keep only 'Action' records c. To each 'Action' record, you want to attach the dose from the latest earlier 'dose' record; or, if there's no earlier dose record, from the first dose record. The key to 'merging' data where times are approximate, is to interleave (ADD FILES) rather than MATCH FILES, and then carry values across records. The following follows this logic; it is SPSS draft output. As always, syntax to load the test data follows the main listing (thanks, Gene). The result appears to match what is desired, except in the following instance: 01-JAN-2006 08:23:18 01 20 .5 Desired 01-JAN-2006 08:23:18 01 20 .2 Action I venture to suggest that, in this case, the 'desired' record was entered wrong. DATASET ACTIVATE Dose. LIST. List |-----------------------------|---------------------------| |Output Created |13-MAR-2007 12:16:19 | |-----------------------------|---------------------------| [Dose] Date patientid userid dose 01-JAN-2006 08:20:16 01 20 .2 01-JAN-2006 08:23:20 01 20 .5 01-JAN-2006 08:20:18 02 21 .6 Number of cases read: 3 Number of cases listed: 3 DATASET ACTIVATE Action. LIST. List |-----------------------------|---------------------------| |Output Created |13-MAR-2007 12:16:19 | |-----------------------------|---------------------------| [Action] Date patientid userid 01-JAN-2006 08:20:13 01 20 01-JAN-2006 08:20:15 01 20 01-JAN-2006 08:20:16 01 20 01-JAN-2006 08:20:18 01 20 01-JAN-2006 08:20:19 01 20 01-JAN-2006 08:23:18 01 20 01-JAN-2006 08:23:22 01 20 01-JAN-2006 08:23:29 01 20 01-JAN-2006 08:20:17 02 21 01-JAN-2006 08:20:19 02 21 01-JAN-2006 08:20:20 02 21 Number of cases read: 11 Number of cases listed: 11 * To 'carry backward' the earliest dose, create an . * artificial really early dose, with that value. . DATASET ACTIVATE Dose. DATASET DECLARE StrtDose. AGGREGATE OUTFILE = StrtDose /BREAK = patientid /Dose = FIRST(Dose). DATASET ACTIVATE StrtDose. NUMERIC Date (DATETIME20). COMPUTE Date = DATE.MDY(01,01,1900). . /*-- LIST /*-*/. * Interleave dose and event records, including the . * artificial starting dose . NEW FILE. ADD FILES /FILE=Dose /IN=Is_Dose /FILE=StrtDose /IN=Is_Start /FILE=Action /IN=Is_Act /BY patientid Date. STRING WhereFrom (A15). COMPUTE WhereFrom = '????????'. DO IF Is_Start. . COMPUTE WhereFrom = ' Start dose'. ELSE IF Is_Dose. . COMPUTE WhereFrom = ' Dose'. ELSE IF Is_Act. . COMPUTE WhereFrom = ' Action'. END IF. * Carry forward dose from 'dose' to 'action' records . IF Is_Act AND patientid EQ LAG(patientid) dose = LAG(dose). DATASET NAME AllEvent. LIST VARIABLES=Date to dose, Wherefrom. List |-----------------------------|---------------------------| |Output Created |13-MAR-2007 12:16:22 | |-----------------------------|---------------------------| [AllEvent] Date patientid userid dose WhereFrom 01-JAN-1900 00:00:00 01 . .2 Start dose 01-JAN-2006 08:20:13 01 20 .2 Action 01-JAN-2006 08:20:15 01 20 .2 Action 01-JAN-2006 08:20:16 01 20 .2 Dose 01-JAN-2006 08:20:16 01 20 .2 Action 01-JAN-2006 08:20:18 01 20 .2 Action 01-JAN-2006 08:20:19 01 20 .2 Action 01-JAN-2006 08:23:18 01 20 .2 Action 01-JAN-2006 08:23:20 01 20 .5 Dose 01-JAN-2006 08:23:22 01 20 .5 Action 01-JAN-2006 08:23:29 01 20 .5 Action 01-JAN-1900 00:00:00 02 . .6 Start dose 01-JAN-2006 08:20:17 02 21 .6 Action 01-JAN-2006 08:20:18 02 21 .6 Dose 01-JAN-2006 08:20:19 02 21 .6 Action 01-JAN-2006 08:20:20 02 21 .6 Action Number of cases read: 16 Number of cases listed: 16 * Keep only 'action' records, and compare with desired . NEW FILE. ADD FILES /FILE=Desired /IN=Is_Wantd /FILE=AllEvent /BY patientid Date. IF IS_Wantd EQ 1 WhereFrom = 'Desired'. SELECT IF Is_Wantd OR Is_Act. LIST VARIABLES=Date to dose, Wherefrom. List |-----------------------------|---------------------------| |Output Created |13-MAR-2007 12:16:22 | |-----------------------------|---------------------------| Date patientid userid dose WhereFrom 01-JAN-2006 08:20:13 01 20 .2 Desired 01-JAN-2006 08:20:13 01 20 .2 Action 01-JAN-2006 08:20:15 01 20 .2 Desired 01-JAN-2006 08:20:15 01 20 .2 Action 01-JAN-2006 08:20:16 01 20 .2 Desired 01-JAN-2006 08:20:16 01 20 .2 Action 01-JAN-2006 08:20:18 01 20 .2 Desired 01-JAN-2006 08:20:18 01 20 .2 Action 01-JAN-2006 08:20:19 01 20 .2 Desired 01-JAN-2006 08:20:19 01 20 .2 Action 01-JAN-2006 08:23:18 01 20 .5 Desired 01-JAN-2006 08:23:18 01 20 .2 Action 01-JAN-2006 08:23:22 01 20 .5 Desired 01-JAN-2006 08:23:22 01 20 .5 Action 01-JAN-2006 08:23:29 01 20 .5 Desired 01-JAN-2006 08:23:29 01 20 .5 Action 01-JAN-2006 08:20:17 02 21 .6 Desired 01-JAN-2006 08:20:17 02 21 .6 Action 01-JAN-2006 08:20:19 02 21 .6 Desired 01-JAN-2006 08:20:19 02 21 .6 Action 01-JAN-2006 08:20:20 02 21 .6 Desired 01-JAN-2006 08:20:20 02 21 .6 Action Number of cases read: 22 Number of cases listed: 22 =================== APPENDIX: Test data =================== * ............................................................. . * ....... Test data, reformatted by Gene Maguin: ........... . * ............................................................. . * Dose, was "Data1 (contains the records of a specific action)" . DATA LIST FIXED SKIP=2 / Date 05-23 (DATETIME) patientid 26-27 (F) userid 39-40 (F) dose 46-48 (F). Data List will read 1 records from the command file Variable Rec Start End Format Date 1 5 23 Datetime19.0 patientid 1 26 27 F2.0 userid 1 39 40 F2.0 dose 1 46 48 F3.0 BEGIN DATA Date patientid userid dose ----|---10----|---20----|---30----|---40----|---50----| 01/01/2006 08:20:16 01 20 .2 01/01/2006 08:23:20 01 20 .5 01/01/2006 08:20:18 02 21 .6 END DATA. FORMATS Date (DATETIME20) /patientid (N2) /userid (N2) /dose (F4.1). SORT CASES BY patientid Date. DATASET NAME Dose. . /*-- LIST /*-*/. * ............................................................. . * Action, was "Data2 (contains a list of multiple actions)" . DATA LIST FIXED SKIP=2 / Date 05-23 (DATETIME) patientid 26-27 (F) userid 38-39 (F). Data List will read 1 records from the command file Variable Rec Start End Format Date 1 5 23 Datetime19.0 patientid 1 26 27 F2.0 userid 1 38 39 F2.0 BEGIN DATA. Date patientid userid ----|---10----|---20----|---30----|---40----|---50----| 01/01/2006 08:20:13 01 20 01/01/2006 08:20:15 01 20 01/01/2006 08:20:16 01 20 01/01/2006 08:20:18 01 20 01/01/2006 08:20:19 01 20 01/01/2006 08:23:18 01 20 01/01/2006 08:23:29 01 20 01/01/2006 08:23:22 01 20 01/01/2006 08:20:17 02 21 01/01/2006 08:20:19 02 21 01/01/2006 08:20:20 02 21 END DATA. FORMATS Date (DATETIME20) /patientid (N2) /userid (N2). SORT CASES BY patientid Date. DATASET NAME Action. . /*-- LIST /*-*/. * ............................................................. . * Desired result . DATA LIST FIXED SKIP=2 / Date 05-23 (DATETIME) patientid 31-32 (F) userid 44-45 (F) dose 50-53 (F). Data List will read 1 records from the command file Variable Rec Start End Format Date 1 5 23 Datetime19.0 patientid 1 31 32 F2.0 userid 1 44 45 F2.0 dose 1 50 53 F4.0 BEGIN DATA Date patientid userid dose ----|---10----|---20----|---30----|---40----|---50----| 01/01/2006 08:20:13 01 20 .2 01/01/2006 08:20:15 01 20 .2 01/01/2006 08:20:16 01 20 .2 01/01/2006 08:20:18 01 20 .2 01/01/2006 08:20:19 01 20 .2 01/01/2006 08:23:18 01 20 .5 01/01/2006 08:23:29 01 20 .5 01/01/2006 08:23:22 01 20 .5 01/01/2006 08:20:17 02 21 .6 01/01/2006 08:20:19 02 21 .6 01/01/2006 08:20:20 02 21 .6 END DATA. FORMATS Date (DATETIME20) /patientid (N2) /userid (N2) /dose (F4.1). SORT CASES BY patientid Date. DATASET NAME Desired. . /*-- LIST /*-*/. |
|
In reply to this post by Richard Ristow
Hi Richard
2**-9 is about 2 nanoseconds; milliseconds are thousandths of seconds You were closer the first time ;-) Regards Adrian Barnett On Tue, 13 Mar 2007 00:46:20 -0400, Richard Ristow <[hidden email]> wrote: >Correction. Bother. At 08:38 PM 3/12/2007, I wrote: > >>SPSS numbers (which are 64-bit floating-point numbers following >>standard IEEE 754) have 53 bits of precision. There've been only about >>1.3E10, or less than 2**34, seconds, since October 14, 1582. An >>integer that size can be represented exactly with 9 bits to spare. > >So far so good. > >>Full resolution is 2**-9 seconds, about 2 microseconds. > >"2**-9" is right, but 2**-9 seconds is about 2 MILLIseconds. |
|
At 01:44 AM 3/16/2007, Adrian Barnett wrote:
>2**-9 is about 2 nanoseconds; milliseconds are thousandths of seconds > >You were closer the first time ;-) Actually, I stumbled over the rules of thumb, the first time. The easy rule is, 2**10 ~= 10**3 (d1024 vs. d1000); hence, the common '1K' for both. Somehow I managed to think of 2**-9 as about two of those 'cycles' of 2**10; of course, it's about one of them. 2**-10=1/1024, or about a millisecond; 2**-9 is about twice that, about two milliseconds |
|
In reply to this post by Richard Ristow
On Fri, 16 Mar 2007 09:46:27 -0400, Richard Ristow <[hidden email]>
wrote: >At 01:44 AM 3/16/2007, Adrian Barnett wrote: > >>2**-9 is about 2 nanoseconds; milliseconds are thousandths of seconds >> >>You were closer the first time ;-) > >Actually, I stumbled over the rules of thumb, the first time. > >The easy rule is, > >2**10 ~= 10**3 (d1024 vs. d1000); hence, the common '1K' for both. > >Somehow I managed to think of 2**-9 as about two of those 'cycles' of >2**10; of course, it's about one of them. > >2**-10=1/1024, or about a millisecond; 2**-9 is about twice that, about >two milliseconds Hi Richard Time for some humble pie - I used the wrong function on my calculator - what I thought was raising 2 to the power of minus 9 was in fact multiplying 2 by 10 to the minus 9. As you pointed out, 2**-10 is approximately equal to 10 to the -3.0103. Regards Adrian Barnett |
| Free forum by Nabble | Edit this page |
