At 05:58 PM 9/20/2013, devoidx wrote:
>I have a humongous database with 1 billion cases. The data consist
>of patient visits to doctors' offices: each patient has its own ID
>value, and each case holds the information about one particular
>patient visit. I am trying to figure out how many unique patients
>my database has (i.e., how many unique ID values I have).
>
>My database is already sorted based on ID value.
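This is ideal for AGGREGATE with PRESORTED: since the file is already
sorted by ID, AGGREGATE can form one output case per patient from each
run of adjacent cases with the same ID, with no sorting needed.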
GET FILE=Humongous.
DATASET DECLARE Patients.
AGGREGATE OUTFILE=Patients
   /PRESORTED
   /BREAK=ID
   /NVisits 'Number of visits for this patient' = NU.
DATASET ACTIVATE Patients WINDOW=FRONT.
* Then, you just want the number of cases in the Patients file: .
SHOW N.
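* Or, if you want the value in an SPSS dataset (with one case, .
* holding the count in NPats alongside the constant NoBreak): .
DATASET DECLARE HowManyPats.
* NoBreak is a constant, so the whole file forms a single break group.
* Recent SPSS versions let you omit BREAK, making NoBreak unnecessary.
COMPUTE NoBreak = 1.
AGGREGATE OUTFILE=HowManyPats
   /BREAK=NoBreak
   /NPats 'Total number of patients' = NU.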
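
If your SPSS is recent enough, the helper variable can be dropped. What
follows is only a minimal sketch: it assumes a release in which AGGREGATE
accepts a missing BREAK subcommand and treats the whole file as one break
group (check the Command Syntax Reference for your version), and it
assumes the Patients dataset built above is the active one.

* Assumption: this SPSS version allows AGGREGATE without BREAK.
DATASET ACTIVATE Patients.
DATASET DECLARE HowManyPats.
AGGREGATE OUTFILE=HowManyPats
   /NPats 'Total number of patients' = NU.
DATASET ACTIVATE HowManyPats WINDOW=FRONT.

Under those assumptions, HowManyPats should end up with one case and the
single variable NPats.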