help needed to sort longitudinal data

roh
Hi
I am new to SPSS and learning while I work on a dataset. The data I have contain ~1 million patient entries. I first selected the patient cases that were of interest to me, which came to ~1500 patients. Each of these 1500 patients has a unique ID to track them. Now I need to go back to the original data and check whether any of these patients have more than one entry (which I can track through the unique patient ID). What is the most efficient way to do that? So far I have been doing it manually, looking up one patient ID at a time, which is obviously taking a lot of time. Please suggest.
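Stated as code, the lookup is one counting pass over the full file rather than one ID at a time. A rough sketch in Python (not SPSS syntax; the function name and toy data are hypothetical):

```python
from collections import Counter


def patients_with_repeats(original_ids, selected_ids):
    """Return {ID: row count} for the selected IDs that occur more than once.

    original_ids: the ID of every row in the full file (hypothetical input).
    selected_ids: the IDs of the patients of interest (hypothetical input).
    """
    counts = Counter(original_ids)  # one pass over the full file
    # Counter returns 0 for IDs that never appear, so no KeyError here.
    return {pid: counts[pid] for pid in selected_ids if counts[pid] > 1}


# Toy data: patient 101 has three rows, patient 102 has one.
original = [101, 205, 101, 102, 330, 101, 205]
selected = [101, 102]
print(patients_with_repeats(original, selected))  # {101: 3}
```

The point is that the cost is one pass over the large file plus one cheap lookup per selected ID, instead of a separate search for each of the 1500 patients.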

Re: help needed to sort longitudinal data

David Marso
Administrator
See MATCH FILES.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Give not that which is holy unto the dogs, neither cast ye your pearls before swine, lest they trample them under their feet."
"When are the damned possessed pigs going to jump off the bloody cliff into the abyss?"

Re: help needed to sort longitudinal data

Bruce Weaver
Administrator
In reply to this post by roh
roh wrote
Hi
I am new to SPSS and learning while I work on a dataset. The data I have contain ~1 million patient entries. I first selected the patient cases that were of interest to me, which came to ~1500 patients. Each of these 1500 patients has a unique ID to track them.
I think this means that you now have two datasets, the original and a smaller dataset with about 1500 patients.  Each dataset has the same unique ID variable.  In the original dataset, there can be more than one case (row) per ID; but in the smaller dataset, there is only one row per ID.  Have I got it right so far?  If so, a first step might be to use MATCH FILES, as suggested by David.  I would use the /IN sub-command to flag the 1500 patients of particular interest.  Something like:

* Ensure both datasets are sorted by ID first.
DATASET ACTIVATE The1500.
SORT CASES BY ID.
DATASET ACTIVATE Original.
SORT CASES BY ID.

* Refer to open datasets by name, without quotes.
MATCH FILES
 /FILE = Original
 /TABLE = The1500
 /IN = Flag1500
 /BY ID.
EXECUTE.
DATASET NAME Merged.
DATASET ACTIVATE Merged.


* Next, you might want to number the cases within each ID,
* and get the total number of cases per ID.

DO IF ($CASENUM EQ 1) OR (ID NE LAG(ID)).
- COMPUTE RecWithinID = 1.
ELSE.
- COMPUTE RecWithinID = LAG(RecWithinID)+1.
END IF.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /NumRecs=MAX(RecWithinID).

FORMATS RecWithinID NumRecs (F5.0).

* Restrict the tables to the patients of interest.
TEMPORARY.
SELECT IF Flag1500 EQ 1.
FREQUENCIES RecWithinID NumRecs.

All of this is untested, and may need some tweaking (plus insertion of your own variable names), but it might at least give you some idea how to proceed.
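To make the logic of the syntax above easier to check, here is a rough Python equivalent working on a plain list of IDs. It is a sketch only: the function name and toy data are hypothetical, and the input is assumed to be sorted by ID already, as the syntax requires.

```python
def number_within_id(sorted_ids, flagged_ids):
    """Rough analogue of the syntax above, for IDs already sorted by ID.

    Returns one dict per row with:
      flag           1 if the ID is a patient of interest (like Flag1500 from /IN)
      rec_within_id  running count within the ID (like the LAG-based RecWithinID)
      num_recs       rows for that ID (like NumRecs = MAX(RecWithinID))
    """
    flagged = set(flagged_ids)
    rows = []
    for i, pid in enumerate(sorted_ids):
        if i == 0 or pid != sorted_ids[i - 1]:
            rec = 1      # first row of a new ID
        else:
            rec += 1     # previous row's number plus one
        rows.append({"id": pid, "flag": int(pid in flagged), "rec_within_id": rec})

    # Second pass: attach the per-ID maximum to every row of that ID,
    # as AGGREGATE with MODE=ADDVARIABLES does.
    num_recs = {}
    for row in rows:
        num_recs[row["id"]] = max(num_recs.get(row["id"], 0), row["rec_within_id"])
    for row in rows:
        row["num_recs"] = num_recs[row["id"]]
    return rows


rows = number_within_id([1, 1, 2, 3, 3, 3], flagged_ids=[1, 3])
print([r["rec_within_id"] for r in rows])  # [1, 2, 1, 1, 2, 3]
print([r["num_recs"] for r in rows])       # [2, 2, 1, 3, 3, 3]
```

Any flagged row with num_recs greater than 1 belongs to a patient of interest who has more than one entry.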

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: help needed to sort longitudinal data

Jon Peck
In reply to this post by roh
If you selected the cases based on some formula applied to the large dataset, just do that selection again and then use Data > Identify Duplicate Cases to see if any of the IDs occur more than once.
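Conceptually, flagging duplicates needs only one pass that remembers which IDs have already been seen. A small Python sketch of that idea (not the dialog's actual implementation; the function name is hypothetical, and treating the first row of each ID as the primary one is an assumption of this sketch):

```python
def mark_duplicates(ids):
    """Return 1/0 per row: 1 for the first row of each ID, 0 for later repeats."""
    seen = set()
    primary = []
    for pid in ids:
        primary.append(0 if pid in seen else 1)
        seen.add(pid)
    return primary


ids = [7, 3, 7, 9, 3, 7]
flags = mark_duplicates(ids)
print(flags)  # [1, 1, 0, 1, 0, 0]
# IDs that occur more than once are those with at least one 0 flag.
print(sorted({pid for pid, f in zip(ids, flags) if f == 0}))  # [3, 7]
```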

If the selection is not easily reproducible, make the 1500-case dataset the active file, use MATCH FILES with the ID variable as the key (BY), and then use Data > Identify Duplicate Cases. Note that both files need to be sorted by the ID variable for this.
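The sorting requirement comes from the fact that a match of this kind walks through both files in step, one case at a time. A rough Python sketch of such a sequential one-to-many match (the function name is hypothetical; this illustrates the general technique, not how SPSS implements MATCH FILES):

```python
def sorted_match(file_ids, table_ids):
    """One-to-many match of two ID lists, both sorted ascending.

    Returns one (id, in_table) pair per entry of file_ids; in_table is 1
    when the ID also appears in table_ids. Each list is walked only once,
    which works only because both are sorted.
    """
    for seq in (file_ids, table_ids):
        if any(a > b for a, b in zip(seq, seq[1:])):
            raise ValueError("input must be sorted by ID")

    result = []
    j = 0
    for pid in file_ids:
        # Skip table keys smaller than the current ID; they can never match later.
        while j < len(table_ids) and table_ids[j] < pid:
            j += 1
        in_table = int(j < len(table_ids) and table_ids[j] == pid)
        result.append((pid, in_table))
    return result


print(sorted_match([1, 1, 2, 4, 4], [1, 3, 4]))
# [(1, 1), (1, 1), (2, 0), (4, 1), (4, 1)]
```

If either list were unsorted, the pointer could skip past a key that is needed later, which is why the sketch refuses unsorted input.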

On Wed, Sep 7, 2016 at 7:33 PM, roh <[hidden email]> wrote:
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/help-needed-to-sort-longitudinal-data-tp5733075.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



--
Jon K Peck
[hidden email]
