SPSSX Discussion

Aggregating observations if within a time frame

Classic

List

Threaded

12 messages Options

TheGhazi

Aggregating observations if within a time frame

Hello Listers,

I hope you'll doing well. I searched everywhere to find an answer to this question but i was unlucky.

I have a data that has more than 11 million observation (hospital visits) and i want to choose a sample of cases if certain binary variable is there.

The problem is that, i want to make sure that this is a true case. The way to do so is to use both ( var1=1) and (var2=1). Both variables don't have to be =1 in the same observation. They can be =1 within 7 days window (+/- 7 days from each others). Example:

ID Var1 Var1_date Var2 Var2_date
1 1 1/Jan/2015 0 0
1 0 0 1 27/DEC/2014
1 0 0 1 20/DEC/1995
2 1 1/June/2010 0 0
2 0 0 1 1/JUL/2005

As you can see, patient number 1 is a confirmed case because he has both var1 and var2 =1 within 7 days period. And patient 2 is not a case.

How to select only patient 1? and aggregate both visits? My iam is to look at annual incidence.

Thank you,

Jignesh Sutar

Re: Aggregating observations if within a time frame

Is there ever only one date available for each patient under Var1_Date? And is this date always prior to any date seen under Var2_Date?

It look like there can be multiple dates under Var2_Date, is the 7 day criteria to be judged between dates between Var1_Date and Var2_Date or could it be within Var2_Dates also?

Could you explain what the variables represent, what each record of row represents please.

TheGhazi

Re: Aggregating observations if within a time frame

Hey Jignesh,

Thank you for your interest.

I have many observations (hospital visits for each patients in each row). Each patients should have a disease code (var1) and a procedure code (var2) within 7 days of each others to be considered a case.

Dates of var1 and var2 could be prior or after each other. In my case it should be within 7 days before or after, and the 7 day criteria is to be judged between dates between Var1_Date and Var2_Date only.

I hope i answered all of your questions.

David Marso

Re: Aggregating observations if within a time frame

Administrator

In reply to this post by TheGhazi

As a start I would suggest looking at the SELECT IF command, the LAG function and CREATE or SHIFT VALUES commands (also SPLIT FILE and or CASESTOVARS might be useful).
Going one level deeper also see VECTOR and LOOP?
Meanwhile perhaps post additional sample records and the nature of the proposed AGGREGATE?
--

TheGhazi wrote

Hello Listers,

I hope you'll doing well. I searched everywhere to find an answer to this question but i was unlucky.

I have a data that has more than 11 million observation (hospital visits) and i want to choose a sample of cases if certain binary variable is there.

The problem is that, i want to make sure that this is a true case. The way to do so is to use both ( var1=1) and (var2=1). Both variables don't have to be =1 in the same observation. They can be =1 within 7 days window (+/- 7 days from each others). Example:

ID Var1 Var1_date Var2 Var2_date
1 1 1/Jan/2015 0 0
1 0 0 1 27/DEC/2014
1 0 0 1 20/DEC/1995
2 1 1/June/2010 0 0
2 0 0 1 1/JUL/2005

As you can see, patient number 1 is a confirmed case because he has both var1 and var2 =1 within 7 days period. And patient 2 is not a case.

How to select only patient 1? and aggregate both visits? My iam is to look at annual incidence.

Thank you,

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Jignesh Sutar

Re: Aggregating observations if within a time frame

In reply to this post by TheGhazi

How can a procedure take place (Var2) before a diagnosis is identified (Var1)?

On 10 March 2015 at 17:45, TheGhazi <[hidden email]> wrote:

Hey Jignesh,

Thank you for your interest.

I have many observations (hospital visits for each patients in each row).
Each patients should have a disease code (var1) and a procedure code (var2)
within 7 days of each others to be considered a case.

Dates of var1 and var2 could be prior or after each other. In my case it
should be within 7 days before or after, and the 7 day criteria is to be
judged between dates between Var1_Date and Var2_Date only.

I hope i answered all of your questions.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728972.html

Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

TheGhazi

RE: Aggregating observations if within a time frame

In reply to this post by David Marso

Great ideas, David.

I will think about it.

Thank you so much.

Date: Tue, 10 Mar 2015 11:02:01 -0700
From: [hidden email]
To: [hidden email]
Subject: Re: Aggregating observations if within a time frame

As a start I would suggest looking at the SELECT IF command, the LAG function and CREATE or SHIFT VALUES commands (also SPLIT FILE and or CASESTOVARS might be useful).
Going one level deeper also see VECTOR and LOOP?
Meanwhile perhaps post additional sample records and the nature of the proposed AGGREGATE?
--

TheGhazi wrote

Hello Listers,

I hope you'll doing well. I searched everywhere to find an answer to this question but i was unlucky.

I have a data that has more than 11 million observation (hospital visits) and i want to choose a sample of cases if certain binary variable is there.

The problem is that, i want to make sure that this is a true case. The way to do so is to use both ( var1=1) and (var2=1). Both variables don't have to be =1 in the same observation. They can be =1 within 7 days window (+/- 7 days from each others). Example:

ID Var1 Var1_date Var2 Var2_date
1 1 1/Jan/2015 0 0
1 0 0 1 27/DEC/2014
1 0 0 1 20/DEC/1995
2 1 1/June/2010 0 0
2 0 0 1 1/JUL/2005

As you can see, patient number 1 is a confirmed case because he has both var1 and var2 =1 within 7 days period. And patient 2 is not a case.

How to select only patient 1? and aggregate both visits? My iam is to look at annual incidence.

Thank you,

If you reply to this email, your message will be added to the discussion below:

http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728973.html

To unsubscribe from Aggregating observations if within a time frame, click here.
<a href="http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble:email.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble:email.naml-instant_emails%21nabble:email.naml-send_instant_email%21nabble:email.naml" rel=nofollow style="font:9px serif;" target=_blank>NAML

TheGhazi

RE: Aggregating observations if within a time frame

In reply to this post by Jignesh Sutar

It is a matter of coding, Jignesh.

The data is a result of 2 merged datasets. The diagnosis code gets coded at the end of the hospital stay while the procedure gets claimed right away.

Date: Tue, 10 Mar 2015 11:14:28 -0700
From: [hidden email]
To: [hidden email]
Subject: Re: Aggregating observations if within a time frame

How can a procedure take place (Var2) before a diagnosis is identified (Var1)?

On 10 March 2015 at 17:45, TheGhazi <[hidden email]> wrote:
Hey Jignesh,

Thank you for your interest.

I have many observations (hospital visits for each patients in each row).
Each patients should have a disease code (var1) and a procedure code (var2)
within 7 days of each others to be considered a case.

Dates of var1 and var2 could be prior or after each other. In my case it
should be within 7 days before or after, and the 7 day criteria is to be
judged between dates between Var1_Date and Var2_Date only.

I hope i answered all of your questions.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728972.html

Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

If you reply to this email, your message will be added to the discussion below:

http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728974.html

TheGhazi

RE: Aggregating observations if within a time frame

In reply to this post by Jignesh Sutar

It is a matter of coding, Jignesh.

The data is a result of 2 merged datasets. The diagnosis code gets coded at the end of the hospital stay while the procedure gets claimed right away.

Date: Tue, 10 Mar 2015 11:14:28 -0700
From: [hidden email]
To: [hidden email]
Subject: Re: Aggregating observations if within a time frame

How can a procedure take place (Var2) before a diagnosis is identified (Var1)?

If you reply to this email, your message will be added to the discussion below:

http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728974.html

Maguin, Eugene

Re: Aggregating observations if within a time frame

In reply to this post by TheGhazi

I think there may be a slight language issue because you say
>>I have many observations (hospital visits for each patients in each row).
I think you mean that "... millions of records. Each record is either of this type
1 1 1/Jan/2015 0 0, where var1+var1_date= disease code+DX? date
Or this type
1 0 0 1 27/DEC/201, where var2+var2_date is procedure code+Procedure data.
But never this type
1 1 1/Jan/2015 1 7/jan/2015
Even though you said " Both variables don't have to be =1 in the same observation." I'm assuming each record is an observation.
And never this type
1 0 0 0 0

Are the only valid values for var1 and var2 0 or 1?
Let's say this is so, then I'd do this.

Save all var1=1 records to file A0 keeping only id var1 var1_date.
Save all var2=1 records to file B0 keeping only id var2 var2_date.
Sort file A0 by id and var1_date then do a casestovars (as David pointed out) and save file A1.
Repeat for file B0 using var2_date to make file B1.
You have two files with one record per id: Match files.
Vectorize (Vector) var1_date and var2_date.
Set up double loops and set marker variable=1 if ABS(datediff(var1_date,var2_date,"days")) le 7 then Break to jump out of the loop.

Gene Maguin

ID Var1 Var1_date Var2 Var2_date
1 1 1/Jan/2015 0 0
1 0 0 1 27/DEC/2014
1 0 0 1 20/DEC/1995
2 1 1/June/2010 0 0
2 0 0 1 1/JUL/2005

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of TheGhazi
Sent: Tuesday, March 10, 2015 1:45 PM
To: [hidden email]
Subject: Re: Aggregating observations if within a time frame

Hey Jignesh,

Thank you for your interest.

I have many observations (hospital visits for each patients in each row).
Each patients should have a disease code (var1) and a procedure code (var2) within 7 days of each others to be considered a case.

Dates of var1 and var2 could be prior or after each other. In my case it should be within 7 days before or after, and the 7 day criteria is to be judged between dates between Var1_Date and Var2_Date only.

I hope i answered all of your questions.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728972.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Jignesh Sutar

Re: Aggregating observations if within a time frame

This makes a lot of assumptions about your data but may help as a stepping stone...

DATA LIST LIST / ID (f1.0) Var1 (f1.0) Var1_date (a10) Var2 (f1.0) Var2_date (a10).
BEGIN DATA.
1 1 01/01/2015 0 0
1 0 0 1 27/12/2014
1 0 0 1 20/12/1995
2 1 1/01/2010 0 0
2 0 0 1 1/12/2005
END DATA.

alter type Var1_date Var2_date (date).
numeric Type (f1.0).
value labels Type 1 "Diagnosis" 2 "Procedure".
compute Type=sum(Var1, Var2*2).
compute Date=sum(Var1_date, Var2_date).
alter type Date (date).
sort cases by ID Date.
do if ID=lag(ID) and Type<>lag(Type).
compute DayDiff=ctime.days(Date-(lag(date))).
end if.

do if DayDiff<8.
compute Filter=1.
end if.
create previousrecord= lead(Filter,1).
compute Filter=sum(Filter,previousrecord).
select if Filter=1.
exe.

David Marso

RE: Aggregating observations if within a time frame

Administrator

In reply to this post by TheGhazi

*/ So off the cuff (obviously untested but I'll bet something of this sort will work ) */.

DATASET NAME rawdata.
DATASET COPY work.
DATASET ACTIVATE work.
/* Retain only cases with var1 or var2 =1.
SELECT IF ANY(1,var1,var2) /* Could use SELECT IF var1 | var2. [ if (0,1) ]*/.
/* Presuming only one or other date is filled create a common field with both*/
COMPUTE date=MAX(date1,date2).
/* Could possibly omit this SORT if you know cases are already ordered */.
SORT CASES BY Unique_IDentifier date.
/*Compare adjacent cases and flag second matching ID with ABS(d2-d1) within 7 days*/.
COMPUTE flag=ABS((Unique_IDentifier EQ LAG(Unique_IDentifier))*DATEDIFF(date,LAG(date),'days')) LE 7.
/* Add the flag to both records */.
AGGREGATE OUTFILE * Mode ADDVARIABLES/ BREAK Unique_IDentifier / ValidCase=MAX(flag).
/* Add flagged records to raw data */.
MATCH FILES /FILE rawdata/TABLE * / BY Unique_IDentifier.
DATASET Name flagged.
SELECT IF ValidCase.

TheGhazi wrote

It is a matter of coding, Jignesh.
The data is a result of 2 merged datasets. The diagnosis code gets coded at the end of the hospital stay while the procedure gets claimed right away.

Date: Tue, 10 Mar 2015 11:14:28 -0700
From: [hidden email]
To: [hidden email]
Subject: Re: Aggregating observations if within a time frame

How can a procedure take place (Var2) before a diagnosis is identified (Var1)?

On 10 March 2015 at 17:45, TheGhazi <[hidden email]> wrote:
Hey Jignesh,

Thank you for your interest.

I have many observations (hospital visits for each patients in each row).

Each patients should have a disease code (var1) and a procedure code (var2)

within 7 days of each others to be considered a case.

Dates of var1 and var2 could be prior or after each other. In my case it

should be within 7 days before or after, and the 7 day criteria is to be

judged between dates between Var1_Date and Var2_Date only.

I hope i answered all of your questions.

--

View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728972.html

Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================

To manage your subscription to SPSSX-L, send a message to

[hidden email] (not to SPSSX-L), with no body text except the

command. To leave the list, send the command

SIGNOFF SPSSX-L

For a list of commands to manage subscriptions, send the command

INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728974.html

To unsubscribe from Aggregating observations if within a time frame, click here.

NAML

David Marso

RE: Aggregating observations if within a time frame

Administrator

Actually, don't use MODE ADDVARIABLES.

DATASET NAME rawdata.
DATASET COPY work.
DATASET ACTIVATE work.
/* Retain only cases with var1 or var2 =1.
SELECT IF ANY(1,var1,var2) /* Could use SELECT IF var1 | var2. [ if (0,1) ]*/.
/* Presuming only one or other date is filled create a common field with both*/
COMPUTE date=MAX(date1,date2).
/* Could possibly omit this SORT if you know cases are already ordered */.
SORT CASES BY Unique_IDentifier date.
/*Compare adjacent cases and flag second matching ID with ABS(d2-d1) within 7 days*/.
COMPUTE flag=ABS((Unique_IDentifier EQ LAG(Unique_IDentifier)) * DATEDIFF(date,LAG(date),'days')) LE 7.
/* Add the flag to both records */.
AGGREGATE OUTFILE * / BREAK Unique_IDentifier / ValidCase=MAX(flag).
/* Add flagged records to raw data */.
MATCH FILES /FILE rawdata/TABLE * / BY Unique_IDentifier.
DATASET Name flagged.
SELECT IF ValidCase.

David Marso wrote

*/ So off the cuff (obviously untested but I'll bet something of this sort will work ) */.

DATASET NAME rawdata.
DATASET COPY work.
DATASET ACTIVATE work.
/* Retain only cases with var1 or var2 =1.
SELECT IF ANY(1,var1,var2) /* Could use SELECT IF var1 | var2. [ if (0,1) ]*/.
/* Presuming only one or other date is filled create a common field with both*/
COMPUTE date=MAX(date1,date2).
/* Could possibly omit this SORT if you know cases are already ordered */.
SORT CASES BY Unique_IDentifier date.
/*Compare adjacent cases and flag second matching ID with ABS(d2-d1) within 7 days*/.
COMPUTE flag=ABS((Unique_IDentifier EQ LAG(Unique_IDentifier))*DATEDIFF(date,LAG(date),'days')) LE 7.
/* Add the flag to both records */.
AGGREGATE OUTFILE * Mode ADDVARIABLES/ BREAK Unique_IDentifier / ValidCase=MAX(flag).
/* Add flagged records to raw data */.
MATCH FILES /FILE rawdata/TABLE * / BY Unique_IDentifier.
DATASET Name flagged.
SELECT IF ValidCase.

TheGhazi wrote

It is a matter of coding, Jignesh.
The data is a result of 2 merged datasets. The diagnosis code gets coded at the end of the hospital stay while the procedure gets claimed right away.

Date: Tue, 10 Mar 2015 11:14:28 -0700
From: [hidden email]
To: [hidden email]
Subject: Re: Aggregating observations if within a time frame

How can a procedure take place (Var2) before a diagnosis is identified (Var1)?

On 10 March 2015 at 17:45, TheGhazi <[hidden email]> wrote:
Hey Jignesh,

Thank you for your interest.

I have many observations (hospital visits for each patients in each row).

Each patients should have a disease code (var1) and a procedure code (var2)

within 7 days of each others to be considered a case.

Dates of var1 and var2 could be prior or after each other. In my case it

should be within 7 days before or after, and the 7 day criteria is to be

judged between dates between Var1_Date and Var2_Date only.

I hope i answered all of your questions.

--

View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728972.html

Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================

To manage your subscription to SPSSX-L, send a message to

[hidden email] (not to SPSSX-L), with no body text except the

command. To leave the list, send the command

SIGNOFF SPSSX-L

For a list of commands to manage subscriptions, send the command

INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728974.html

To unsubscribe from Aggregating observations if within a time frame, click here.

NAML