Selecting the Most Recent Case Data

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Selecting the Most Recent Case Data

bdates
I have a data set, which is derived from a commercial database on which clinicians enter information at patient intake and every three months thereafter until discharge. The administration number (1st administration, 2nd administration, etc. are numbered.) However, due to patient ‘no shows’, cancellations, and/or clinician error, some of the administrations are not included in a download from the commercial data base. The total number of possible administrations is 40. When I import into SPSS some of the admission numbers may be missing for all patients, resulting in a situation like the one below in which there is no data for Time 2, Time 4, or Time 5. However, there may be some data for those administration times the next import from the database. I’m trying to write syntax which will locate the first (initial) scale scores and the most recent scale scores, and at the same time allow for the fact that some of the administration times will be missing for some imports and not for others. I’ve tried vectoring, using do if’s and so on, pretty much exhausting my understanding of the FM. Any time a variable is mentioned, e.g. Date2 or 4 or 5 in the example below, the syntax stops. If the data below are like what I need to deal with, and the second set is what I want to end up with, how might I get there without having to write syntax for each download?

PatientID   Date1      School1   Home1   Date3      School3   Home3   Date6       School6   Home6
129564     12/01/11    20 10       03/15/12    10      10    06/13/12 10 0
131873     03/22/13    30 20       07/01/13   20      20    ……………. …. ….

PatientID   Date1st      School1st   Home1st   DateLast      SchooLast   HomeLast  
129564     12/01/11         20     10            06/13/12           10             0
131873     03/22/13         30     20            07/01/13      20           20

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Selecting the Most Recent Case Data

Maguin, Eugene
Ok, so I assume that repeated downloads are cumulative in that each download includes all records in the database at that time, even if a given record has been downloaded multiple times. The other thing I assume is that each record is a transaction: patientid, date, school, home, etc. The data organization is long format, not wide format. Let's also assume that data values once entered are not subject to future editing.

If you stack the downloads, you have multiple copies of some transactions and one copy of others. The problem is to get rid of duplicates and there is an Identify Duplicates function in the data dropdown, data window. Or, you could write your own.

If the data comes out in wide format, then that is just an intervening varstocases command transform. Long format gives more flexibility for this problem, I think.
Gene Maguin






-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Brian Dates
Sent: Tuesday, October 25, 2016 3:56 PM
To: [hidden email]
Subject: Selecting the Most Recent Case Data

I have a data set, which is derived from a commercial database on which clinicians enter information at patient intake and every three months thereafter until discharge. The administration number (1st administration, 2nd administration, etc. are numbered.) However, due to patient ‘no shows’, cancellations, and/or clinician error, some of the administrations are not included in a download from the commercial data base. The total number of possible administrations is 40. When I import into SPSS some of the admission numbers may be missing for all patients, resulting in a situation like the one below in which there is no data for Time 2, Time 4, or Time 5. However, there may be some data for those administration times the next import from the database. I’m trying to write syntax which will locate the first (initial) scale scores and the most recent scale scores, and at the same time allow for the fact that some of the administration times will be missing for some imports and not for others. I’ve tried vectoring, using do if’s and so on, pretty much exhausting my understanding of the FM. Any time a variable is mentioned, e.g. Date2 or 4 or 5 in the example below, the syntax stops. If the data below are like what I need to deal with, and the second set is what I want to end up with, how might I get there without having to write syntax for each download?

PatientID   Date1      School1   Home1   Date3      School3   Home3   Date6       School6   Home6
129564     12/01/11    20 10       03/15/12    10      10    06/13/12 10 0
131873     03/22/13    30 20       07/01/13   20      20    ……………. …. ….

PatientID   Date1st      School1st   Home1st   DateLast      SchooLast   HomeLast  
129564     12/01/11         20     10            06/13/12           10             0
131873     03/22/13         30     20            07/01/13      20           20

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Selecting the Most Recent Case Data

Mike
In reply to this post by bdates
I had a situation somewhat situation to yours where
I was dealing with drug test results from weekly urine
samples.  In the context of the research intervention
I was involved in, we would use the first week of
participation in the research as "week1" and continue
for a year to "week52". I created a vector week1 to week52
for weekly test results even though few if any participants
had the full 52 weeks. Some skipped a week, some
skipped months, some just stopped after a while (some
of participants wound up in jail which made data collection,
uh, difficult). Regardless of which weeks were missing,
we wanted to see if the number of negative drug results
increased over time, presumably as a result of the
treatment intervention.

Working with different vectors of 52 vars for different
drug test results was fairly easy once the data was placed
in the appropriate week in the appropriate 52 week vector.

From the first set of numbers you provide below, it looks
like you have not set up a 40 administration date vector,
instead you only have variables for the administration
dates that you apparently already have.  Is this correct?
If it is, I can understand why vectors would fail.
For the data you have below you really should have six
dates (date1 to date6) even though date2, date4, date5,
are currently empty.  Presumably, those dates will be
filled in when new data come in.  So it seems to me
that you want three vectors: date1 to date 40, home1 to
home40, and school1 to school40.  This should allow
you to do all of the analyses you want (i.e., first date,
last date, number of missing dates, etc.).

Does this make any sense to you?

-Mike Palij
New York University
[hidden email]


----- Original Message -----
From: "Brian Dates" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, October 25, 2016 3:56 PM
Subject: Selecting the Most Recent Case Data


I have a data set, which is derived from a commercial database on which
clinicians enter information at patient intake and every three months
thereafter until discharge. The administration number (1st
administration, 2nd administration, etc. are numbered.) However, due to
patient ‘no shows’, cancellations, and/or clinician error, some of the
administrations are not included in a download from the commercial data
base. The total number of possible administrations is 40. When I import
into SPSS some of the admission numbers may be missing for all patients,
resulting in a situation like the one below in which there is no data
for Time 2, Time 4, or Time 5. However, there may be some data for those
administration times the next import from the database. I’m trying to
write syntax which will locate the first (initial) scale scores and the
most recent scale scores, and at the same time allow for the fact that
some of the administration times will be missing for some imports and
not for others. I’ve tried vectoring, using do if’s and so on, pretty
much exhausting my understanding of the FM. Any time a variable is
mentioned, e.g. Date2 or 4 or 5 in the example below, the syntax stops.
If the data below are like what I need to deal with, and the second set
is what I want to end up with, how might I get there without having to
write syntax for each download?

PatientID   Date1      School1   Home1   Date3      School3   Home3
Date6       School6   Home6
129564      12/01/11    20 10       03/15/12    10       10     06/13/12
10 0
131873      03/22/13    30 20       07/01/13    20
      20     ……………. …. ….

PatientID   Date1st      School1st   Home1st   DateLast      SchooLast
HomeLast
129564      12/01/11         20      10            06/13/12           10
0
131873      03/22/13         30      20            07/01/13       20
20

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Selecting the Most Recent Case Data

David Marso
Administrator
In reply to this post by bdates

First create a dummy dataset with DATE1 TO Date40, School1 TO School40, Home1 TO Home40 then ADD it to real data.
I would then splatter it with VARSTOCASES then use MATCH FILES with FIRST and LAST functions,
SELECT the FIRST and LAST and then use CASESTOVARS.

/* Untested code follows */.
MATRIX.
SAVE MAKE(1,120,-999)
 / OUTFILE *
 / VARIABLES DATE1 TO Date40  School1 TO School40  Home1 TO Home40 .
END MATRIX.

COMPUTE PatientID =-99999999.
DATASET NAME FakeData.
GET FILE <real data>.
DATASET NAME RealData.
ADD FILES / FILE FakeData / FILE RealData .
SELECT IF PatientID  NE -99999999.

VARSTOCASES
 /MAKE Date FROM  DATE1 TO Date40
 /MAKE School FROM School1 TO School40
 /MAKE Home FROM Home1 TO Home40.

MATCH FILES FILE *
  / FIRST=FirstDate
  / LAST=LastDate
  / BY PatientID  .

SELECT IF FirstDate OR LastDate .
EXECUTE.

DELETE VARIABLES FirstDate  LastDate .
CASESTOVARS/ID=PatientID  .

HTH, David

bdates wrote
I have a data set, which is derived from a commercial database on which clinicians enter information at patient intake and every three months thereafter until discharge. The administration number (1st administration, 2nd administration, etc. are numbered.) However, due to patient ‘no shows’, cancellations, and/or clinician error, some of the administrations are not included in a download from the commercial data base. The total number of possible administrations is 40. When I import into SPSS some of the admission numbers may be missing for all patients, resulting in a situation like the one below in which there is no data for Time 2, Time 4, or Time 5. However, there may be some data for those administration times the next import from the database. I’m trying to write syntax which will locate the first (initial) scale scores and the most recent scale scores, and at the same time allow for the fact that some of the administration times will be missing for some imports and not for others. I’ve tried vectoring, using do if’s and so on, pretty much exhausting my understanding of the FM. Any time a variable is mentioned, e.g. Date2 or 4 or 5 in the example below, the syntax stops. If the data below are like what I need to deal with, and the second set is what I want to end up with, how might I get there without having to write syntax for each download?

PatientID   Date1      School1   Home1   Date3      School3   Home3   Date6       School6   Home6
129564     12/01/11    20 10       03/15/12    10      10    06/13/12 10 0
131873     03/22/13    30 20       07/01/13   20      20    ……………. …. ….

PatientID   Date1st      School1st   Home1st   DateLast      SchooLast   HomeLast  
129564     12/01/11         20     10            06/13/12           10             0
131873     03/22/13         30     20            07/01/13      20           20

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Selecting the Most Recent Case Data

David Marso
Administrator

And here is a way to do it in place without restructuring the data.
---
/* Simulate some data */.
NEW FILE.
INPUT PROGRAM.
+  LOOP ID=1 TO 20.
+    LOOP T=1 TO 40.
+      END CASE.
+      LEAVE ID.
+    END LOOP.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.

COMPUTE Random=RV.UNIFORM(0,1).
SORT CASES BY ID Random.
IF $CASENUM EQ 1 OR ID NE LAG(ID) Counter = 1.
IF ID EQ LAG(ID) Counter=SUM(LAG(Counter),1).
EXECUTE.
SELECT IF Counter LE 20.
COMPUTE Date=TRUNC(RV.UNIFORM(1,50)).
COMPUTE School=TRUNC(RV.UNIFORM(1,50)).
COMPUTE Home=TRUNC(RV.UNIFORM(1,50)).
DATASET NAME Raw WINDOW=FRONT.
SORT CASES BY ID T.
DELETE VARIABLES Random Counter.
CASESTOVARS ID=ID /INDEX=T /GROUPBY Index.

/* Data simulated hopefully consistent with Brian's description */.

DEFINE !Core (  Vars !CHAREND ("/")  /Offset !CHAREND ("/")  / Vec   !CHAREND ("/")  /Index !CMDEND )

+  DO IF  NOT MISSING(!Vec(!Index  )).
+    DO REPEAT Var=!Vars / Offset=!Offset.
+      COMPUTE Var   = !Vec(!Index  + Offset).
+    END REPEAT.
+    BREAK.
+  END IF.

!ENDDEFINE .

COMPUTE @=1.
MATCH FILES /FILE * / KEEP ID @ ALL.
COMPUTE @@=1.
COUNT VarCount= @ TO @@ (LO THRU HI, MISSING).
VECTOR @@@=@ TO @@.

LOOP #=2 TO VarCount BY 3.
!Core Vars=DateFirst SchoolFirst HomeFirst /Offset 0 To 2 / Vec=@@@ /Index=# .
END LOOP.
LOOP #=VarCount-3 TO 1 BY -3.
!Core Vars=DateLast SchoolLast HomeLast   /Offset=0 To 2 / Vec=@@@   /Index=# .
END LOOP.
MATCH FILES
  /FILE *
  /KEEP ID DateFirst SchoolFirst HomeFirst DateLast SchoolLast HomeLast.
EXECUTE.

==============

David Marso wrote
First create a dummy dataset with DATE1 TO Date40, School1 TO School40, Home1 TO Home40 then ADD it to real data.
I would then splatter it with VARSTOCASES then use MATCH FILES with FIRST and LAST functions,
SELECT the FIRST and LAST and then use CASESTOVARS.

/* Untested code follows */.
MATRIX.
SAVE MAKE(1,120,-999)
 / OUTFILE *
 / VARIABLES DATE1 TO Date40  School1 TO School40  Home1 TO Home40 .
END MATRIX.

COMPUTE PatientID =-99999999.
DATASET NAME FakeData.
GET FILE <real data>.
DATASET NAME RealData.
ADD FILES / FILE FakeData / FILE RealData .
SELECT IF PatientID  NE -99999999.

VARSTOCASES
 /MAKE Date FROM  DATE1 TO Date40
 /MAKE School FROM School1 TO School40
 /MAKE Home FROM Home1 TO Home40.

MATCH FILES FILE *
  / FIRST=FirstDate
  / LAST=LastDate
  / BY PatientID  .

SELECT IF FirstDate OR LastDate .
EXECUTE.

DELETE VARIABLES FirstDate  LastDate .
CASESTOVARS/ID=PatientID  .

HTH, David

bdates wrote
I have a data set, which is derived from a commercial database on which clinicians enter information at patient intake and every three months thereafter until discharge. The administration number (1st administration, 2nd administration, etc. are numbered.) However, due to patient ‘no shows’, cancellations, and/or clinician error, some of the administrations are not included in a download from the commercial data base. The total number of possible administrations is 40. When I import into SPSS some of the admission numbers may be missing for all patients, resulting in a situation like the one below in which there is no data for Time 2, Time 4, or Time 5. However, there may be some data for those administration times the next import from the database. I’m trying to write syntax which will locate the first (initial) scale scores and the most recent scale scores, and at the same time allow for the fact that some of the administration times will be missing for some imports and not for others. I’ve tried vectoring, using do if’s and so on, pretty much exhausting my understanding of the FM. Any time a variable is mentioned, e.g. Date2 or 4 or 5 in the example below, the syntax stops. If the data below are like what I need to deal with, and the second set is what I want to end up with, how might I get there without having to write syntax for each download?

PatientID   Date1      School1   Home1   Date3      School3   Home3   Date6       School6   Home6
129564     12/01/11    20 10       03/15/12    10      10    06/13/12 10 0
131873     03/22/13    30 20       07/01/13   20      20    ……………. …. ….

PatientID   Date1st      School1st   Home1st   DateLast      SchooLast   HomeLast  
129564     12/01/11         20     10            06/13/12           10             0
131873     03/22/13         30     20            07/01/13      20           20

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Selecting the Most Recent Case Data

Mike
Nicely done.  The only difficulty that I see is whether
the new data received doesn't have anything "funny"
about it.  In the situation I described, the lab that
did the drug testing was pretty good at providing
structured data, so it was important to get the
start date correct, create the new vector and then
either match to an existing case or create a new
case.

It would be nice, however, if the OP would let us
know whether our suggestions are on point or not
(what is frustrating about some posters asking
advice is that they provide too little information at
first possibly because they think details are not
important or don't want to provide too much info
about their data for whatever reason).  It's hard
to tell if one is on the right track without feedback.

-Mike Palij
New York University
[hidden email]

P.S. Oh yeah, show off. ;-)


----- Original Message -----
From: "David Marso" <[hidden email]>
To: <[hidden email]>
Sent: Wednesday, October 26, 2016 2:22 AM
Subject: Re: Selecting the Most Recent Case Data


And here is a way to do it in place without restructuring the data.
---
/* Simulate some data */.
NEW FILE.
INPUT PROGRAM.
+  LOOP ID=1 TO 20.
+    LOOP T=1 TO 40.
+      END CASE.
+      LEAVE ID.
+    END LOOP.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.

COMPUTE Random=RV.UNIFORM(0,1).
SORT CASES BY ID Random.
IF $CASENUM EQ 1 OR ID NE LAG(ID) Counter = 1.
IF ID EQ LAG(ID) Counter=SUM(LAG(Counter),1).
EXECUTE.
SELECT IF Counter LE 20.
COMPUTE Date=TRUNC(RV.UNIFORM(1,50)).
COMPUTE School=TRUNC(RV.UNIFORM(1,50)).
COMPUTE Home=TRUNC(RV.UNIFORM(1,50)).
DATASET NAME Raw WINDOW=FRONT.
SORT CASES BY ID T.
DELETE VARIABLES Random Counter.
CASESTOVARS ID=ID /INDEX=T /GROUPBY Index.

/* Data simulated hopefully consistent with Brian's description */.

DEFINE !Core (  Vars !CHAREND ("/")  /Offset !CHAREND ("/")  / Vec
!CHAREND ("/")  /Index !CMDEND )

+  DO IF  NOT MISSING(!Vec(!Index  )).
+    DO REPEAT Var=!Vars / Offset=!Offset.
+      COMPUTE Var   = !Vec(!Index  + Offset).
+    END REPEAT.
+    BREAK.
+  END IF.

!ENDDEFINE .

COMPUTE @=1.
MATCH FILES /FILE * / KEEP ID @ ALL.
COMPUTE @@=1.
COUNT VarCount= @ TO @@ (LO THRU HI, MISSING).
VECTOR @@@=@ TO @@.

LOOP #=2 TO VarCount BY 3.
!Core Vars=DateFirst SchoolFirst HomeFirst /Offset 0 To 2 / Vec=@@@
/Index=#
.
END LOOP.
LOOP #=VarCount-3 TO 1 BY -3.
!Core Vars=DateLast SchoolLast HomeLast   /Offset=0 To 2 / Vec=@@@
/Index=# .
END LOOP.
MATCH FILES
  /FILE *
  /KEEP ID DateFirst SchoolFirst HomeFirst DateLast SchoolLast HomeLast.
EXECUTE.

==============


David Marso wrote

> First create a dummy dataset with DATE1 TO Date40, School1 TO
> School40,
> Home1 TO Home40 then ADD it to real data.
> I would then splatter it with VARSTOCASES then use MATCH FILES with
> FIRST
> and LAST functions,
> SELECT the FIRST and LAST and then use CASESTOVARS.
>
> /* Untested code follows */.
> MATRIX.
> SAVE MAKE(1,120,-999)
>  / OUTFILE *
>  / VARIABLES DATE1 TO Date40  School1 TO School40  Home1 TO Home40 .
> END MATRIX.
>
> COMPUTE PatientID =-99999999.
> DATASET NAME FakeData.
> GET FILE
> <real data>
> .
> DATASET NAME RealData.
> ADD FILES / FILE FakeData / FILE RealData .
> SELECT IF PatientID  NE -99999999.
>
> VARSTOCASES
>  /MAKE Date FROM  DATE1 TO Date40
>  /MAKE School FROM School1 TO School40
>  /MAKE Home FROM Home1 TO Home40.
>
> MATCH FILES FILE *
>   / FIRST=FirstDate
>   / LAST=LastDate
>   / BY PatientID  .
>
> SELECT IF FirstDate OR LastDate .
> EXECUTE.
>
> DELETE VARIABLES FirstDate  LastDate .
> CASESTOVARS/ID=PatientID  .
>
> HTH, David
> bdates wrote
>> I have a data set, which is derived from a commercial database on
>> which
>> clinicians enter information at patient intake and every three months
>> thereafter until discharge. The administration number (1st
>> administration, 2nd administration, etc. are numbered.) However, due
>> to
>> patient ‘no shows’, cancellations, and/or clinician error, some of
>> the
>> administrations are not included in a download from the commercial
>> data
>> base. The total number of possible administrations is 40. When I
>> import
>> into SPSS some of the admission numbers may be missing for all
>> patients,
>> resulting in a situation like the one below in which there is no data
>> for
>> Time 2, Time 4, or Time 5. However, there may be some data for those
>> administration times the next import from the database. I’m trying to
>> write syntax which will locate the first (initial) scale scores and
>> the
>> most recent scale scores, and at the same time allow for the fact
>> that
>> some of the administration times will be missing for some imports and
>> not
>> for others. I’ve tried vectoring, using do if’s and so on, pretty
>> much
>> exhausting my understanding of the FM. Any time a variable is
>> mentioned,
>> e.g. Date2 or 4 or 5 in the example below, the syntax stops. If the
>> data
>> below are like what I need to deal with, and the second set is what I
>> want to end up with, how might I get there without having to write
>> syntax
>> for each download?
>>
>> PatientID   Date1      School1   Home1   Date3      School3   Home3
>> Date6       School6   Home6
>> 129564      12/01/11    20 10       03/15/12    10       10
>> 06/13/12
>> 10 0
>> 131873      03/22/13    30 20       07/01/13    20
>>   20     ……………. ….
>> ….
>>
>> PatientID   Date1st      School1st   Home1st   DateLast
>> SchooLast
>> HomeLast
>> 129564      12/01/11         20      10            06/13/12
>> 10
>> 0
>> 131873      03/22/13         30      20            07/01/13       20
>> 20
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to

>> LISTSERV@.UGA

>>  (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to
email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante
porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff
in abyssum?"
--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Selecting-the-Most-Recent-Case-Data-tp5733385p5733392.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Selecting the Most Recent Case Data

bdates
In reply to this post by David Marso
Gene, Mike, David...

Perfect! Thank-you very much.

B

Brian Dates, M.A.
Director of Evaluation and Research | Evaluation & Research | Southwest Counseling Solutions
Southwest Solutions
1906 25th Street, Detroit, MI 48216
313-297-1391 office | 313-849-2702 fax
[hidden email] | www.swsol.org


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Wednesday, October 26, 2016 2:23 AM
To: [hidden email]
Subject: Re: Selecting the Most Recent Case Data

And here is a way to do it in place without restructuring the data.
---
/* Simulate some data */.
NEW FILE.
INPUT PROGRAM.
+  LOOP ID=1 TO 20.
+    LOOP T=1 TO 40.
+      END CASE.
+      LEAVE ID.
+    END LOOP.
+  END LOOP.
+  END FILE.
END INPUT PROGRAM.

COMPUTE Random=RV.UNIFORM(0,1).
SORT CASES BY ID Random.
IF $CASENUM EQ 1 OR ID NE LAG(ID) Counter = 1.
IF ID EQ LAG(ID) Counter=SUM(LAG(Counter),1).
EXECUTE.
SELECT IF Counter LE 20.
COMPUTE Date=TRUNC(RV.UNIFORM(1,50)).
COMPUTE School=TRUNC(RV.UNIFORM(1,50)).
COMPUTE Home=TRUNC(RV.UNIFORM(1,50)).
DATASET NAME Raw WINDOW=FRONT.
SORT CASES BY ID T.
DELETE VARIABLES Random Counter.
CASESTOVARS ID=ID /INDEX=T /GROUPBY Index.

/* Data simulated hopefully consistent with Brian's description */.

DEFINE !Core (  Vars !CHAREND ("/")  /Offset !CHAREND ("/")  / Vec !CHAREND ("/")  /Index !CMDEND )

+  DO IF  NOT MISSING(!Vec(!Index  )).
+    DO REPEAT Var=!Vars / Offset=!Offset.
+      COMPUTE Var   = !Vec(!Index  + Offset).
+    END REPEAT.
+    BREAK.
+  END IF.

!ENDDEFINE .

COMPUTE @=1.
MATCH FILES /FILE * / KEEP ID @ ALL.
COMPUTE @@=1.
COUNT VarCount= @ TO @@ (LO THRU HI, MISSING).
VECTOR @@@=@ TO @@.

LOOP #=2 TO VarCount BY 3.
!Core Vars=DateFirst SchoolFirst HomeFirst /Offset 0 To 2 / Vec=@@@ /Index=# .
END LOOP.
LOOP #=VarCount-3 TO 1 BY -3.
!Core Vars=DateLast SchoolLast HomeLast   /Offset=0 To 2 / Vec=@@@  
/Index=# .
END LOOP.
MATCH FILES
  /FILE *
  /KEEP ID DateFirst SchoolFirst HomeFirst DateLast SchoolLast HomeLast.
EXECUTE.

==============


David Marso wrote

> First create a dummy dataset with DATE1 TO Date40, School1 TO
> School40,
> Home1 TO Home40 then ADD it to real data.
> I would then splatter it with VARSTOCASES then use MATCH FILES with
> FIRST and LAST functions, SELECT the FIRST and LAST and then use
> CASESTOVARS.
>
> /* Untested code follows */.
> MATRIX.
> SAVE MAKE(1,120,-999)
>  / OUTFILE *
>  / VARIABLES DATE1 TO Date40  School1 TO School40  Home1 TO Home40 .
> END MATRIX.
>
> COMPUTE PatientID =-99999999.
> DATASET NAME FakeData.
> GET FILE
> <real data>
> .
> DATASET NAME RealData.
> ADD FILES / FILE FakeData / FILE RealData .
> SELECT IF PatientID  NE -99999999.
>
> VARSTOCASES
>  /MAKE Date FROM  DATE1 TO Date40
>  /MAKE School FROM School1 TO School40  /MAKE Home FROM Home1 TO
> Home40.
>
> MATCH FILES FILE *
>   / FIRST=FirstDate
>   / LAST=LastDate
>   / BY PatientID  .
>
> SELECT IF FirstDate OR LastDate .
> EXECUTE.
>
> DELETE VARIABLES FirstDate  LastDate .
> CASESTOVARS/ID=PatientID  .
>
> HTH, David
> bdates wrote
>> I have a data set, which is derived from a commercial database on
>> which clinicians enter information at patient intake and every three
>> months thereafter until discharge. The administration number (1st
>> administration, 2nd administration, etc. are numbered.) However, due
>> to patient ‘no shows’, cancellations, and/or clinician error, some of
>> the administrations are not included in a download from the
>> commercial data base. The total number of possible administrations is
>> 40. When I import into SPSS some of the admission numbers may be
>> missing for all patients, resulting in a situation like the one below
>> in which there is no data for Time 2, Time 4, or Time 5. However,
>> there may be some data for those administration times the next import
>> from the database. I’m trying to write syntax which will locate the
>> first (initial) scale scores and the most recent scale scores, and at
>> the same time allow for the fact that some of the administration
>> times will be missing for some imports and not for others. I’ve tried
>> vectoring, using do if’s and so on, pretty much exhausting my
>> understanding of the FM. Any time a variable is mentioned, e.g. Date2
>> or 4 or 5 in the example below, the syntax stops. If the data below
>> are like what I need to deal with, and the second set is what I want
>> to end up with, how might I get there without having to write syntax for each download?
>>
>> PatientID   Date1      School1   Home1   Date3      School3   Home3  
>> Date6       School6   Home6
>> 129564     12/01/11    20 10       03/15/12    10      10    06/13/12
>> 10 0
>> 131873     03/22/13    30 20       07/01/13   20      20    ……………. ….
>> ….
>>
>> PatientID   Date1st      School1st   Home1st   DateLast      SchooLast  
>> HomeLast  
>> 129564     12/01/11         20     10            06/13/12           10            
>> 0
>> 131873     03/22/13         30     20            07/01/13      20          
>> 20
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to

>> LISTSERV@.UGA

>>  (not to SPSSX-L), with no body text except the command. To leave the
>> list, send the command SIGNOFF SPSSX-L For a list of commands to
>> manage subscriptions, send the command INFO REFCARD





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Selecting-the-Most-Recent-Case-Data-tp5733385p5733392.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD