SPSSX Discussion

help needed to sort longitudinal data

roh

help needed to sort longitudinal data

Hi
I am new to SPSS and am learning as I work on a dataset. The data have about a million patient entries. I first selected the patient cases of interest to me, which came to about 1500 patients. Each of these 1500 patients has a unique ID to track them. Now I need to go back to the original data and check whether these patients have more than one entry (I can track that through the unique patient ID). What is the most efficient way to do that? So far I have been doing it manually, looking up one patient ID at a time, which is obviously taking a lot of time. Please suggest.

David Marso

Re: help needed to sort longitudinal data

Administrator

See MATCH FILES.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Bruce Weaver

Re: help needed to sort longitudinal data

Administrator

In reply to this post by roh

roh wrote

Hi
I am new to SPSS and am learning as I work on a dataset. The data have about a million patient entries. I first selected the patient cases of interest to me, which came to about 1500 patients. Each of these 1500 patients has a unique ID to track them.

I think this means that you now have two datasets, the original and a smaller dataset with about 1500 patients. Each dataset has the same unique ID variable. In the original dataset, there can be more than one case (row) per ID; but in the smaller dataset, there is only one row per ID. Have I got it right so far? If so, a first step might be to use MATCH FILES, as suggested by David. I would use the /IN sub-command to flag the 1500 patients of particular interest. Something like:

* Ensure both datasets are sorted by ID first.
MATCH FILES
  /FILE = 'Original'
  /TABLE = 'The1500'
  /IN = Flag1500
  /BY ID.
EXECUTE.
DATASET NAME Merged.
DATASET ACTIVATE Merged.
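
The sorting mentioned in the first comment above could look something like the following, run before the MATCH FILES command. This is only a sketch: it assumes both files are already open as datasets named Original and The1500 (placeholder names), and that the key variable is called ID in both.

```spss
* Sketch only: assumes two open datasets named Original and The1500.
DATASET ACTIVATE Original.
SORT CASES BY ID (A).
DATASET ACTIVATE The1500.
SORT CASES BY ID (A).
```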

* Next, you might want to number the cases within each ID,
* and get the total number of cases per ID.

DO IF ($CASENUM EQ 1 OR ID NE LAG(ID)).
- COMPUTE RecWithinID = 1.
ELSE.
- COMPUTE RecWithinID = LAG(RecWithinID)+1.
END IF.

AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=ID
/NumRecs=MAX(RecWithinID).

FORMATS RecWithinID NumRecs (F5.0).
FREQUENCIES RecWithinID NumRecs.

All of this is untested, and may need some tweaking (plus insertion of your own variable names), but it might at least give you some idea how to proceed.
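
Once Flag1500, RecWithinID and NumRecs exist, one possible next step is to look only at the flagged patients and see how many records each one has. Like the syntax above, this is an untested sketch that relies on those variable names; TEMPORARY is used so that the selection applies only to the next procedure and no cases are actually deleted.

```spss
* Sketch only: assumes Flag1500, RecWithinID and NumRecs exist as created above.
* One row per flagged patient, then tabulate how many records each has.
TEMPORARY.
SELECT IF (Flag1500 EQ 1 AND RecWithinID EQ 1).
FREQUENCIES VARIABLES = NumRecs.

* List the flagged IDs that occur more than once in the original data.
TEMPORARY.
SELECT IF (Flag1500 EQ 1 AND RecWithinID EQ 1 AND NumRecs GT 1).
LIST VARIABLES = ID NumRecs.
```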

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Jon Peck

Re: help needed to sort longitudinal data

In reply to this post by roh

If you selected the cases based on some formula applied to the large dataset, just do that selection again and then use Data > Identify Duplicate Cases to see if any of the IDs occur more than once.

If the selection is not easily reproducible, then make the 1500-case dataset the active file and use MATCH FILES with the ID variable as the key (BY) and then use Data > Identify Duplicate Cases. Note that both files need to be sorted by the ID variable for this.
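
For anyone who prefers syntax to the menus, duplicate IDs can be flagged with something along these lines. It is a simplified sketch of the same idea, not the exact syntax the Identify Duplicate Cases dialog pastes; PrimaryFirst, PrimaryLast and DupFlag are illustrative variable names, and ID stands for your own key variable.

```spss
* Sketch only: run on the dataset whose IDs you want to check.
* PrimaryFirst, PrimaryLast and DupFlag are illustrative names.
SORT CASES BY ID (A).
MATCH FILES
  /FILE = *
  /BY ID
  /FIRST = PrimaryFirst
  /LAST = PrimaryLast.
* A case belongs to a duplicated ID unless it is both first and last for that ID.
COMPUTE DupFlag = (PrimaryFirst EQ 0 OR PrimaryLast EQ 0).
FREQUENCIES VARIABLES = DupFlag.
```

Cases with DupFlag equal to 1 belong to an ID that appears more than once.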

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/help-needed-to-sort-longitudinal-data-tp5733075.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
Jon K Peck
[hidden email]