dificult merge

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

dificult merge

Jose Gerardo Benuzillo
Hi All,

 

I have been trying to merge to dataset without success.

 

Data1 (conteins the records of a specific action)

Date                             patientid            userid
dose

01/01/2006 08:20:16       01                     20
.2

01/01/2006 08:23:20       01                     20
.5

01/01/2006 08:20:18       02                     21
.6

 

Data2 (contains a list of multiple actions)

Date                             patientid            userid

01/01/2006 08:20:13       01                     20

01/01/2006 08:20:15       01                     20

01/01/2006 08:20:16       01                     20

01/01/2006 08:20:18       01                     20

01/01/2006 08:20:19       01                     20

 

01/01/2006 08:23:18       01                     20

01/01/2006 08:23:29       01                     20

01/01/2006 08:23:22       01                     20

 

01/01/2006 08:20:17       02                     21

01/01/2006 08:20:19       02                     21

01/01/2006 08:20:20       02                     21

 

I would like to add the dose to data2 but the times are not exactly the
same.

 

Desired result

Date                             patientid            userid
dose

01/01/2006 08:20:13       01                     20
.2

01/01/2006 08:20:15       01                     20
.2

01/01/2006 08:20:16       01                     20
.2

01/01/2006 08:20:18       01                     20
.2

01/01/2006 08:20:19       01                     20
.2

 

01/01/2006 08:23:18       01                     20
.5

01/01/2006 08:23:29       01                     20
.5

01/01/2006 08:23:22       01                     20
.5

 

01/01/2006 08:20:17       02                     21
.6

01/01/2006 08:20:19       02                     21
.6

01/01/2006 08:20:20       02                     21
.6

 

Help would be greatly appreciated.

 

Jose

 

 

 

 

 

 

 

 

 

 

 
Reply | Threaded
Open this post in threaded view
|

Re: dificult merge

Maguin, Eugene
Jose,

It's not clear to me from your message exactly how you are doing the match
files. In other words, what will be your BY variable(s)? It sounds like time
might be. Is that true or does patient id or user id also figure in to the
by variable list? It also might help to explain a tiny bit more about the
meaning of the datasets and variables. Most important is to explain the By
variable list.

Gene Maguin
Reply | Threaded
Open this post in threaded view
|

Re: dificult merge

Maguin, Eugene
In reply to this post by Jose Gerardo Benuzillo
Jose,

I've reformatted the data you sent so that I can read it easily and now have
some questions.

>>The line of data from the small dataset containing the prescription info
should be contained in the bigger file.

>>There are times when the patient/doctor interaction did not result in
prescription. (I would like to select out these records).

I don't understand this statement because in the final dataset every record
has a prescription.

>>90% of the time, the timestamps are exactly the same in both datasets
but for the remaining 10% the times are slightly different 1 or two
seconds apart.

Given the way your final dataset looks, I would do the following in both
datasets. Convert the m/d/y part of the date variable to a string. Extract
the hr and min part of the date variable, convert to a 24 hour clock and
then convert to a string variable. Do a match files using the Table
subcommand and the first file as the Table file and patientid userid
datestring and timestring as the by variables. So what does this do. It says
that all Data2 records with the same values for the by variables will get
the dose variable for the file 1 record with matching values for the by
variables. I'd like to acknowledge that another way to do the same thing but
without creating the string variables would be to copy the date variable in
both files to a new name and then convert the seconds part of the date-time
value to 0. I probably have a misplaced fear of the representational
accuracy of the number of seconds since midnight of October 14, 14something,
i.e., the start of the Gregorian calendar. Let me also say that my scheme
will probably flop because you may have duplicate keys on the table file or
the categorization of time into minutes doesn't account for your considering
2/1/2007 12:58:57 to be the same transaction as one at 2/1/2007 12:59:01. So
you have to tell us why these records all get the same dose info even though
there is an 11 second time span.

01/01/2006 08:23:18       01           20      .5
01/01/2006 08:23:29       01           20      .5
01/01/2006 08:23:22       01           20      .5




Date                 patientid  userid    dose
01/01/2006 08:20:16  01           20      .2
01/01/2006 08:23:20  01           20      .5
01/01/2006 08:20:18  02           21      .6

Data2 (contains a list of multiple actions)
Date                 patientid userid
01/01/2006 08:20:13  01          20
01/01/2006 08:20:15  01          20
01/01/2006 08:20:16  01          20
01/01/2006 08:20:18  01          20
01/01/2006 08:20:19  01          20
01/01/2006 08:23:18  01          20
01/01/2006 08:23:29  01          20
01/01/2006 08:23:22  01          20
01/01/2006 08:20:17  02          21
01/01/2006 08:20:19  02          21
01/01/2006 08:20:20  02          21


Desired result

Date                  patientid  userid       dose
01/01/2006 08:20:13       01           20      .2
01/01/2006 08:20:15       01           20      .2
01/01/2006 08:20:16       01           20      .2
01/01/2006 08:20:18       01           20      .2
01/01/2006 08:20:19       01           20      .2
01/01/2006 08:23:18       01           20      .5
01/01/2006 08:23:29       01           20      .5
01/01/2006 08:23:22       01           20      .5
01/01/2006 08:20:17       02           21      .6
01/01/2006 08:20:19       02           21      .6
01/01/2006 08:20:20       02           21      .6
Reply | Threaded
Open this post in threaded view
|

Precision of SPSS time

Richard Ristow
Since this has come up -

At 02:55 PM 3/9/2007, Gene Maguin wrote ("Re: dificult merge"):


>I probably have a misplaced fear of the representational accuracy of
>the number of seconds since midnight of October 14, 14something, i.e.,
>the start of the Gregorian calendar.

October 14, 1582. I have to look it up every time, myself.

And you don't need to worry. SPSS numbers (which are 64-bit
floating-point numbers following standard IEEE 754) have 53 bits of
precision. There've been only about 1.3E10 seconds, less than 2**34
since October 14, 1582. An integer that size can be represented exactly
with 9 bits to spare; or, full resolution is 2**-9 seconds, about 2
microseconds. Below is SPSS 15 draft output <WRR-not saved separately>:

INPUT PROGRAM.
.  NUMERIC RightNow  (DateTime26.4).
.  COMPUTE RightNow = $TIME.
.  END CASE.
END FILE.
END INPUT PROGRAM.

NUMERIC Seconds  (COMMA26.4).
COMPUTE Seconds = RightNow.

NUMERIC LogTime Log2Time    (F7.4).

COMPUTE LogTime  = LG10(Seconds).
COMPUTE Log2Time = LN(Seconds)/LN(2).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |12-MAR-2007 20:33:22       |
|-----------------------------|---------------------------|

                   RightNow                    Seconds LogTime Log2Time

  12-MAR-2007 20:33:22.4990        13,393,110,802.4990 10.1269  33.6408

Number of cases read:  1    Number of cases listed:  1
Reply | Threaded
Open this post in threaded view
|

Re: dificult merge

Richard Ristow
In reply to this post by Jose Gerardo Benuzillo
At 12:23 AM 3/9/2007, Jose Gerardo Benuzillo wrote:

>I have been trying to merge to dataset without success.

With Gene Maguin, I've had a hard time knowing what keys you want to
use, and what you want.

You have two files, which I'm calling
Dose   - your "Data1 (conteins the records of a specific action)"
Action - your "Data2 (contains a list of multiple actions)"


I'm taking it that,
a. The key is patient ID and date - that's fairly usual. (Your inputs
aren't sorted that way, though.)
b. You want to keep only 'Action' records
c. To each 'Action' record, you want to attach the dose from the latest
earlier 'dose' record; or, if there's no earlier dose record, from the
first dose record.

The key to 'merging' data where times are approximate, is to interleave
(ADD FILES) rather than MATCH FILES, and then carry values across
records.

The following follows this logic; it is SPSS draft output. As always,
syntax to load the test data follows the main listing (thanks, Gene).

The result appears to match what is desired, except in the following
instance:

01-JAN-2006 08:23:18     01      20     .5 Desired
01-JAN-2006 08:23:18     01      20     .2    Action

I venture to suggest that, in this case, the 'desired' record was
entered wrong.

DATASET ACTIVATE Dose.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |13-MAR-2007 12:16:19       |
|-----------------------------|---------------------------|
[Dose]

                 Date patientid userid dose

01-JAN-2006 08:20:16     01      20     .2
01-JAN-2006 08:23:20     01      20     .5
01-JAN-2006 08:20:18     02      21     .6


Number of cases read:  3    Number of cases listed:  3


DATASET ACTIVATE Action.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |13-MAR-2007 12:16:19       |
|-----------------------------|---------------------------|
[Action]

                 Date patientid userid

01-JAN-2006 08:20:13     01      20
01-JAN-2006 08:20:15     01      20
01-JAN-2006 08:20:16     01      20
01-JAN-2006 08:20:18     01      20
01-JAN-2006 08:20:19     01      20
01-JAN-2006 08:23:18     01      20
01-JAN-2006 08:23:22     01      20
01-JAN-2006 08:23:29     01      20
01-JAN-2006 08:20:17     02      21
01-JAN-2006 08:20:19     02      21
01-JAN-2006 08:20:20     02      21

Number of cases read:  11    Number of cases listed:  11


* To 'carry backward' the earliest dose, create an    .
* artificial really early dose, with that value.      .

DATASET ACTIVATE Dose.
DATASET DECLARE  StrtDose.

AGGREGATE OUTFILE = StrtDose
    /BREAK = patientid
    /Dose  = FIRST(Dose).

DATASET ACTIVATE    StrtDose.
NUMERIC Date       (DATETIME20).
COMPUTE Date = DATE.MDY(01,01,1900).
.  /*--  LIST   /*-*/.

* Interleave dose and event records, including the    .
* artificial starting dose                            .
NEW FILE.
ADD FILES
    /FILE=Dose     /IN=Is_Dose
    /FILE=StrtDose /IN=Is_Start
    /FILE=Action   /IN=Is_Act
    /BY patientid Date.

STRING WhereFrom (A15).
COMPUTE    WhereFrom = '????????'.
DO IF      Is_Start.
.  COMPUTE WhereFrom = '   Start dose'.
ELSE IF    Is_Dose.
.  COMPUTE WhereFrom = '   Dose'.
ELSE IF    Is_Act.
.  COMPUTE WhereFrom = '   Action'.
END IF.

* Carry forward dose from 'dose' to 'action' records   .
IF     Is_Act
    AND patientid EQ LAG(patientid)   dose = LAG(dose).

DATASET NAME AllEvent.

LIST VARIABLES=Date to dose, Wherefrom.

List
|-----------------------------|---------------------------|
|Output Created               |13-MAR-2007 12:16:22       |
|-----------------------------|---------------------------|
[AllEvent]

                 Date patientid userid dose WhereFrom

01-JAN-1900 00:00:00     01       .     .2    Start dose
01-JAN-2006 08:20:13     01      20     .2    Action
01-JAN-2006 08:20:15     01      20     .2    Action
01-JAN-2006 08:20:16     01      20     .2    Dose
01-JAN-2006 08:20:16     01      20     .2    Action
01-JAN-2006 08:20:18     01      20     .2    Action
01-JAN-2006 08:20:19     01      20     .2    Action
01-JAN-2006 08:23:18     01      20     .2    Action
01-JAN-2006 08:23:20     01      20     .5    Dose
01-JAN-2006 08:23:22     01      20     .5    Action
01-JAN-2006 08:23:29     01      20     .5    Action
01-JAN-1900 00:00:00     02       .     .6    Start dose
01-JAN-2006 08:20:17     02      21     .6    Action
01-JAN-2006 08:20:18     02      21     .6    Dose
01-JAN-2006 08:20:19     02      21     .6    Action
01-JAN-2006 08:20:20     02      21     .6    Action

Number of cases read:  16    Number of cases listed:  16


* Keep only 'action' records, and compare with desired .
NEW FILE.
ADD FILES
    /FILE=Desired   /IN=Is_Wantd
    /FILE=AllEvent
    /BY patientid Date.

IF IS_Wantd EQ 1   WhereFrom = 'Desired'.

SELECT IF Is_Wantd OR Is_Act.

LIST VARIABLES=Date to dose, Wherefrom.

List
|-----------------------------|---------------------------|
|Output Created               |13-MAR-2007 12:16:22       |
|-----------------------------|---------------------------|
                 Date patientid userid dose WhereFrom

01-JAN-2006 08:20:13     01      20     .2 Desired
01-JAN-2006 08:20:13     01      20     .2    Action
01-JAN-2006 08:20:15     01      20     .2 Desired
01-JAN-2006 08:20:15     01      20     .2    Action
01-JAN-2006 08:20:16     01      20     .2 Desired
01-JAN-2006 08:20:16     01      20     .2    Action
01-JAN-2006 08:20:18     01      20     .2 Desired
01-JAN-2006 08:20:18     01      20     .2    Action
01-JAN-2006 08:20:19     01      20     .2 Desired
01-JAN-2006 08:20:19     01      20     .2    Action
01-JAN-2006 08:23:18     01      20     .5 Desired
01-JAN-2006 08:23:18     01      20     .2    Action
01-JAN-2006 08:23:22     01      20     .5 Desired
01-JAN-2006 08:23:22     01      20     .5    Action
01-JAN-2006 08:23:29     01      20     .5 Desired
01-JAN-2006 08:23:29     01      20     .5    Action
01-JAN-2006 08:20:17     02      21     .6 Desired
01-JAN-2006 08:20:17     02      21     .6    Action
01-JAN-2006 08:20:19     02      21     .6 Desired
01-JAN-2006 08:20:19     02      21     .6    Action
01-JAN-2006 08:20:20     02      21     .6 Desired
01-JAN-2006 08:20:20     02      21     .6    Action

Number of cases read:  22    Number of cases listed:  22
===================
APPENDIX: Test data
===================
*  ............................................................. .
*  .......   Test data, reformatted by Gene Maguin:  ........... .
*  ............................................................. .
*  Dose, was "Data1 (contains the records of a specific action)" .
DATA LIST FIXED SKIP=2 /
     Date       05-23 (DATETIME)
     patientid  26-27 (F)
     userid     39-40 (F)
     dose       46-48 (F).

Data List will read 1 records from the command file

Variable          Rec   Start     End  Format

Date                1       5      23  Datetime19.0
patientid           1      26      27  F2.0
userid              1      39      40  F2.0
dose                1      46      48  F3.0


BEGIN DATA
     Date                 patientid  userid    dose
----|---10----|---20----|---30----|---40----|---50----|
     01/01/2006 08:20:16  01           20      .2
     01/01/2006 08:23:20  01           20      .5
     01/01/2006 08:20:18  02           21      .6
END DATA.
FORMATS Date       (DATETIME20)
        /patientid  (N2)
        /userid     (N2)
        /dose       (F4.1).
SORT CASES BY patientid Date.
DATASET NAME Dose.

.  /*--  LIST   /*-*/.

*  ............................................................. .
*  Action, was "Data2 (contains a list of multiple actions)"     .
DATA LIST FIXED SKIP=2 /
     Date       05-23 (DATETIME)
     patientid  26-27 (F)
     userid     38-39 (F).

Data List will read 1 records from the command file

Variable          Rec   Start     End  Format

Date                1       5      23  Datetime19.0
patientid           1      26      27  F2.0
userid              1      38      39  F2.0

BEGIN DATA.
     Date                 patientid userid
----|---10----|---20----|---30----|---40----|---50----|
     01/01/2006 08:20:13  01          20
     01/01/2006 08:20:15  01          20
     01/01/2006 08:20:16  01          20
     01/01/2006 08:20:18  01          20
     01/01/2006 08:20:19  01          20
     01/01/2006 08:23:18  01          20
     01/01/2006 08:23:29  01          20
     01/01/2006 08:23:22  01          20
     01/01/2006 08:20:17  02          21
     01/01/2006 08:20:19  02          21
     01/01/2006 08:20:20  02          21
END DATA.
FORMATS Date       (DATETIME20)
        /patientid  (N2)
        /userid     (N2).
SORT CASES BY patientid Date.
DATASET NAME Action.

  .  /*--  LIST   /*-*/.

*  ............................................................. .
*  Desired result                                                .

DATA LIST FIXED SKIP=2 /
     Date       05-23 (DATETIME)
     patientid  31-32 (F)
     userid     44-45 (F)
     dose       50-53 (F).

Data List will read 1 records from the command file

Variable          Rec   Start     End  Format

Date                1       5      23  Datetime19.0
patientid           1      31      32  F2.0
userid              1      44      45  F2.0
dose                1      50      53  F4.0

BEGIN DATA
     Date                  patientid  userid       dose
----|---10----|---20----|---30----|---40----|---50----|
     01/01/2006 08:20:13       01           20      .2
     01/01/2006 08:20:15       01           20      .2
     01/01/2006 08:20:16       01           20      .2
     01/01/2006 08:20:18       01           20      .2
     01/01/2006 08:20:19       01           20      .2
     01/01/2006 08:23:18       01           20      .5
     01/01/2006 08:23:29       01           20      .5
     01/01/2006 08:23:22       01           20      .5
     01/01/2006 08:20:17       02           21      .6
     01/01/2006 08:20:19       02           21      .6
     01/01/2006 08:20:20       02           21      .6
END DATA.

FORMATS Date       (DATETIME20)
        /patientid  (N2)
        /userid     (N2)
        /dose       (F4.1).
SORT CASES BY patientid Date.
DATASET NAME Desired.

.  /*--  LIST   /*-*/.
Reply | Threaded
Open this post in threaded view
|

Re: Precision of SPSS time

Barnett, Adrian (HEALTH)-2
In reply to this post by Richard Ristow
Hi Richard
2**-9 is about 2 nanoseconds;  milliseconds are thousandths of seconds

You were closer the first time ;-)

Regards

Adrian Barnett

On Tue, 13 Mar 2007 00:46:20 -0400, Richard Ristow <[hidden email]>
wrote:

>Correction. Bother. At 08:38 PM 3/12/2007, I wrote:
>
>>SPSS numbers (which are 64-bit floating-point numbers following
>>standard IEEE 754) have 53 bits of precision. There've been only about
>>1.3E10, or less than 2**34, seconds, since October 14, 1582. An
>>integer that size can be represented exactly with 9 bits to spare.
>
>So far so good.
>
>>Full resolution is 2**-9 seconds, about 2 microseconds.
>
>"2**-9" is right, but 2**-9 seconds is about 2 MILLIseconds.
Reply | Threaded
Open this post in threaded view
|

Re: Precision of SPSS time

Richard Ristow
At 01:44 AM 3/16/2007, Adrian Barnett wrote:

>2**-9 is about 2 nanoseconds;  milliseconds are thousandths of seconds
>
>You were closer the first time ;-)

Actually, I stumbled over the rules of thumb, the first time.

The easy rule is,

2**10 ~= 10**3 (d1024 vs. d1000); hence, the common '1K' for both.

Somehow I managed to think of 2**-9 as about two of those 'cycles' of
2**10; of course, it's about one of them.

2**-10=1/1024, or about a millisecond; 2**-9 is about twice that, about
two milliseconds
Reply | Threaded
Open this post in threaded view
|

Re: Precision of SPSS time

Barnett, Adrian (HEALTH)-2
In reply to this post by Richard Ristow
On Fri, 16 Mar 2007 09:46:27 -0400, Richard Ristow <[hidden email]>
wrote:

>At 01:44 AM 3/16/2007, Adrian Barnett wrote:
>
>>2**-9 is about 2 nanoseconds;  milliseconds are thousandths of seconds
>>
>>You were closer the first time ;-)
>
>Actually, I stumbled over the rules of thumb, the first time.
>
>The easy rule is,
>
>2**10 ~= 10**3 (d1024 vs. d1000); hence, the common '1K' for both.
>
>Somehow I managed to think of 2**-9 as about two of those 'cycles' of
>2**10; of course, it's about one of them.
>
>2**-10=1/1024, or about a millisecond; 2**-9 is about twice that, about
>two milliseconds

Hi Richard
Time for some humble pie - I used the wrong function on my calculator - what
I thought was raising 2 to the power of minus 9 was in fact multiplying 2 by
10 to the minus 9. As you pointed out, 2**-10 is approximately equal to 10
to the -3.0103.

Regards
Adrian Barnett