SPSSX Discussion

Another Aggregate problem ( super baffling)

devoidx

Another Aggregate problem ( super baffling)

Hi guys, this one is really baffling. I ran an AGGREGATE with CASEID as the break variable to calculate the max of variable X within each CASEID group. As you can see (and these are actual results), for CASEID 812271 all X values are zero, and yet X_max is 1! How the heck is this possible?

X X_max CASEID
.00 1.00 570964
.00 1.00 570964
.00 1.00 570964
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 570965

Art Kendall

Re: Another Aggregate problem ( super baffling)

What version of SPSS are you running? On what platform?

Have you applied all patches?

What is your syntax?

What are the formats for X and CASEID?
Is CASEID the result of any transformations?
Is the data presorted?

Do you get the problem if you paste this syntax into a syntax window and run it?

data list list/X (f3.2) X_max (f3.2) CASEID (n6).
begin data
.00 1.00 570964
.00 1.00 570964
.00 1.00 570964
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 570965
end data.
aggregate outfile= * mode=addvariables
 /break=caseid
 /testvar= max(x)
 /kount = nu.
list.

Art Kendall
Social Research Consultants

devoidx

Re: Another Aggregate problem ( super baffling)

Jon K Peck

Re: Another Aggregate problem ( super baffling)

The Command Syntax Reference (CSR) entry for PRESORTED says:

When PRESORTED is specified, if AGGREGATE is appending new variables to the active dataset rather than
writing a new file or replacing the active dataset, the cases must be sorted in ascending order by the
BREAK variables.
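For anyone following along, here is a minimal sketch of what that requirement means in practice, assuming you actually want PRESORTED together with MODE=ADDVARIABLES: sort by the break variable first. The output name X_max_sorted is made up for illustration, and this is untested against the poster's data.

```spss
* Sketch only. X_max_sorted is a hypothetical output variable name.
* Sort ascending by the break variable before relying on PRESORTED.
SORT CASES BY CASEID (A).
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /PRESORTED
  /BREAK=CASEID
  /X_max_sorted=MAX(X).
```

Without /PRESORTED, AGGREGATE does not require the file to be sorted, so the sort only matters when that subcommand is used.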

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

On 10/06/2013 01:06 PM, devoidx wrote:

Thanks a bunch, Art, for the response. The code you told me to run ran smoothly, and testvar returned zero as it should, so there must be something about the way my dataset is. The dataset I have is presorted by a different variable, PATIENTID, which I am not interested in. In the dataset, all CASEIDs of the same value are grouped together, but they aren't in ascending order (which shouldn't make a difference, since the example dataset you gave me wasn't either). I am running SPSS 22 (64-bit) on 64-bit Windows, and CASEID is a numeric variable. The syntax I ran:

AGGREGATE
 /OUTFILE=* MODE=ADDVARIABLES
 /BREAK=CASEID
 /X_max_1=MAX(X).

Thanks

devoidx

Re: Another Aggregate problem ( super baffling)

Thanks, but as shown in my syntax, I did not use /PRESORTED.

Art Kendall

Re: Another Aggregate problem ( super baffling)

What are the formats for X and CASEID?
Were they read in? If so, with what formats?

Is CASEID or X the result of any transformations? How did you get the variable CASEID? How did you get the variable X?
How is CASEID related to PATIENTID?

Art Kendall
Social Research Consultants

Art Kendall

Re: Another Aggregate problem ( super baffling)

In reply to this post by devoidx

Have you applied all of the patches?

I ran the syntax under Version 21 with all patches.
I have not yet installed v22, and I am not sure whether there are any patches for it.

Art Kendall
Social Research Consultants

Andy W

Re: Another Aggregate problem ( super baffling)

In reply to this post by devoidx

If you have a billion cases, and did not sort the dataset before the aggregate, are you sure there aren't any other cases of 812271 hanging out somewhere else in the dataset (that have a 1 for the X value?) It is clear from your data snippet the dataset isn't sorted by caseid.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

devoidx

Re: Another Aggregate problem ( super baffling)

In reply to this post by Art Kendall

Hi, to answer your questions: CASEID came with the database, and X is the result of my own recoding of another variable (if variable y is blabla then x=1, otherwise x=0). X does look accurate when I checked it. The problem is that for many CASEID groups, X_max returns 1 when X is actually 0 in the whole group. I should mention that X_max is always correct for groups whose X values do contain a 1, but it is incorrect in many, many groups where every X value is 0.

I've tried this with other variables too, and it's still the same problem: X_max is 1 in many cases where X is 0 for the particular CASEID group.

Your sample syntax worked fine for me in SPSS 22 too. It's within my humongous database that it keeps failing to produce the right number. I really don't understand it.

devoidx

Re: Another Aggregate problem ( super baffling)

Hi Andy, as far as I know, while the dataset isn't sorted by CASEID in ascending order, all cases with the same CASEID are bunched together. Regardless, I don't think you need to sort the dataset before running the aggregate. Only if you run /PRESORTED does the dataset need to be sorted by the break variable.

Andy W

Re: Another Aggregate problem ( super baffling)

I didn't say you needed it sorted; I said there might be another 812271 hanging out somewhere else in the dataset that DOES have a 1 for X. That would be my guess, before assuming SPSS has an error, so best to check for certain.

Try something like

temporary.
select if Caseid = 812271.
freq var X.
exe.

And see if it reports all 0's or some 1's in the frequency table. Unfortunately this does cause a data pass, sorry. You're learning the hard way that it is better to get the code right on a smaller subset beforehand and leave the sort of a billion cases for overnight.
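One way to test the "all bunched together" assumption directly is to count how many separate runs each CASEID forms in file order. This is a rough, untested sketch: run_start and n_runs are hypothetical variable names, and it costs extra data passes on a big file.

```spss
* Sketch only. run_start and n_runs are hypothetical variable names.
* Flag the first case of every contiguous run of the same CASEID.
COMPUTE run_start = ($CASENUM = 1 OR CASEID NE LAG(CASEID)).
* Count the runs per CASEID and attach the count to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=CASEID
  /n_runs=SUM(run_start).
* List the start of every run for CASEIDs that occur in more than one run.
TEMPORARY.
SELECT IF (n_runs > 1 AND run_start = 1).
LIST VARIABLES=CASEID n_runs.
```

If the listing is empty, every CASEID really is contiguous. Any CASEID that shows up is scattered across the file, and its other runs are where a stray X=1 could be hiding.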

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

David Greenberg

Re: Another Aggregate problem ( super baffling)

In reply to this post by Art Kendall

My own leaning would be to revise the bylaws unless Jackson tells us that there are reasons for not doing so. It would be good to have committees ready to go at the start of the fall semester. David

Rich Ulrich

Re: Another Aggregate problem ( super baffling)

In reply to this post by Andy W

"... get the code right on a smaller dataset" sounds like such a
good idea that I would put in N OF CASES <a million> and
see if the error is replicated in a tiny fraction of the time.
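A rough sketch of that trial run, assuming the first million cases are enough to reproduce the problem. X_max_test is a hypothetical output name. N OF CASES is not temporary here, so do not save over the full file afterward.

```spss
* Sketch only. X_max_test is a hypothetical output variable name.
* Keep only the first 1,000,000 cases of the active dataset.
N OF CASES 1000000.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=CASEID
  /X_max_test=MAX(X).
```

The last CASEID group may be cut off at the million-case boundary, so ignore that one group when comparing results.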

--
Rich Ulrich

devoidx

Re: Another Aggregate problem ( super baffling)

Thanks guys. My plan right now is to XSAVE the database into smaller chunks, sort each chunk by CASEID, add them all back together, and then try the aggregate again. It is so frustrating to troubleshoot a 1 billion case database. The syntax runs fine on a smaller subset of my dataset.

But the funny thing is that X=1 indicates the presence of an extremely rare disease in my dataset, and the way I am getting 1's for X_max in the aggregate, half of the patients end up having the rare disease, which is statistically impossible. It would be like saying half of the American population has an extra toe. So it can't be that there are X=1's in my dataset that I am not seeing; there just can't be enough X=1's to produce so many X_max=1's.

sigh...

Art Kendall

Re: Another Aggregate problem ( super baffling)

If there are patches, have you applied them?

Most likely it is a problem with your data or syntax.

Do a search in your syntax for the variable x.
Are there transforms you forgot about?

Did you make your syntax more readable by generating a variable called blabla rather than having a long IF command?

If x is a dichotomy, why do you have a decimal and zeros? (It shouldn't affect programming, but it does affect reading and understanding.) It makes me suspicious that you are not being careful about the readability of your syntax and listings.

Try what Andy said, but add some more new variables.

UNTESTED.

compute noblabla = missing(blabla).
compute check1 = caseid eq 812271.
compute check2 = range(caseid,812270.5,812271.99999999).
* checkid holds patientid for the suspect caseid, -1 if patientid is missing, 0 otherwise.
do if caseid eq 812271.
  do if missing(patientid).
    compute checkid = -1.
  else.
    compute checkid = patientid.
  end if.
else.
  compute checkid = 0.
end if.
formats noblabla check1 check2 (f1) checkid (f6).
* as Rich said.
N of cases 2000000.
frequencies variables = x noblabla blabla check1 check2 checkid.
* does x come up with more than 2 values?
* what is in x when blabla is missing?
temporary.
select if X_max ne 0.
save outfile = . . .

Art Kendall
Social Research Consultants
