Hi
I am new to SPSS and am learning while I work on a dataset. The data has about a million patient entries. I first selected the patient cases of interest to me, which came to about 1500 patients. Each of these 1500 patients has a unique ID to track them. Now I need to go back to the original data and check whether these patients have more than one entry (I can track that through the unique patient ID). What is the most efficient way to do that? So far I have been doing it manually, tracking one patient ID at a time, which is obviously taking a lot of time. Please suggest.
Administrator
|
See MATCH FILES.
Please reply to the list and not to my personal email.
Administrator
|
In reply to this post by roh
I think this means that you now have two datasets, the original and a smaller dataset with about 1500 patients. Each dataset has the same unique ID variable. In the original dataset, there can be more than one case (row) per ID; but in the smaller dataset, there is only one row per ID. Have I got it right so far?

If so, a first step might be to use MATCH FILES, as suggested by David. I would use the /IN sub-command to flag the 1500 patients of particular interest. Something like:

    * Ensure both datasets are sorted by ID first.
    MATCH FILES
      FILE = 'Original' /
      TABLE = 'The1500' / IN = Flag1500 /
      BY ID .
    EXECUTE.
    DATASET NAME Merged.
    DATASET ACTIVATE Merged.

    * Next, you might want to number the cases within each ID,
    * and get the total number of cases per ID.
    DO IF ($CASENUM EQ 1 OR ID NE LAG(ID)).
    - COMPUTE RecWithinID = 1.
    ELSE.
    - COMPUTE RecWithinID = LAG(RecWithinID)+1.
    END IF.
    AGGREGATE
      /OUTFILE=* MODE=ADDVARIABLES
      /BREAK=ID
      /NumRecs=MAX(RecWithinID).
    FORMATS RecWithinID NumRecs (F5.0).
    FREQUENCIES RecWithinID NumRecs.

All of this is untested, and may need some tweaking (plus insertion of your own variable names), but it might at least give you some idea how to proceed.

HTH.
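As a possible follow-up to the syntax above (equally untested): once Flag1500 and NumRecs exist, you could narrow things down to the patients of interest who have more than one entry. This is only a sketch. It assumes the dataset and variable names used above, and MultiEntry is a made-up name for illustration.

```spss
* Assumes the Merged dataset from above, with Flag1500, RecWithinID and NumRecs.
DATASET ACTIVATE Merged.
* MultiEntry is a hypothetical flag: 1 = patient of interest with more than one row.
COMPUTE MultiEntry = (Flag1500 EQ 1 AND NumRecs GT 1).
FORMATS MultiEntry (F1.0).
EXECUTE.
* Keep one row per ID temporarily, so the table counts patients rather than rows.
TEMPORARY.
SELECT IF (MultiEntry EQ 1 AND RecWithinID EQ 1).
FREQUENCIES VARIABLES = NumRecs.
```

Because of TEMPORARY, the SELECT IF applies only to the FREQUENCIES that follows, so no rows are dropped from the merged dataset.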
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
In reply to this post by roh
If you selected the cases by applying some formula to the large dataset, just run that selection again and then use Data > Identify Duplicate Cases to see whether any of the IDs occur more than once. If the selection is not easily reproducible, make the 1500-case dataset the active file, use MATCH FILES with the ID variable as the key (BY), and then use Data > Identify Duplicate Cases. Note that both files need to be sorted by the ID variable for this.

On Wed, Sep 7, 2016 at 7:33 PM, roh <[hidden email]> wrote:
Hi
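For anyone who prefers syntax to the menus: the Identify Duplicate Cases dialog pastes syntax built around MATCH FILES with /FIRST and /LAST. A stripped-down, untested sketch of that idea follows. ID stands in for your own ID variable, and FirstRec, LastRec and HasDup are names made up here for illustration.

```spss
* Sort so that all rows for one patient are adjacent.
SORT CASES BY ID (A).
* FirstRec and LastRec mark the first and last row within each ID.
MATCH FILES
  /FILE = *
  /BY ID
  /FIRST = FirstRec
  /LAST = LastRec.
* A row that is both first and last is the only row for that ID.
COMPUTE HasDup = NOT (FirstRec EQ 1 AND LastRec EQ 1).
FORMATS HasDup (F1.0).
FREQUENCIES VARIABLES = HasDup.
```

HasDup is 1 on every row belonging to an ID that appears more than once, so it can be used with FILTER or SELECT IF to list those patients.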