Hi guys, this one is really baffling. I ran an AGGREGATE with CASEID as the break variable to calculate the max of variable X within each CASEID group. As you can see (and these are actual results), for CASEID "812271" (in bold) all X values are zero, and yet X_max is 1! How the heck is this possible?
X      X_max   CASEID
.00    1.00    570964
.00    1.00    570964
.00    1.00    570964
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    812271
.00    1.00    570965
|
What version of SPSS are you running? On what platform? Have you applied all patches?
What is your syntax?
What are the formats for X and CASEID? Is CASEID the result of any transformations?
Is the data presorted?
Do you get the problem if you paste this syntax into a syntax window and run it?

data list list / X (f3.2) X_max (f3.2) CASEID (n6).
begin data
.00 1.00 570964
.00 1.00 570964
.00 1.00 570964
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 812271
.00 1.00 570965
end data.
aggregate outfile=* mode=addvariables
  /break=caseid
  /testvar = max(x)
  /kount = nu.
list.

On 10/6/2013 2:02 PM, devoidx [via SPSSX Discussion] wrote:
> Hi guys, so this one is really baffling, [...]
Art Kendall
Social Research Consultants |
Thanks a bunch, Art, for the response. The code you told me to run ran smoothly, and testvar returned zero as it should, so there must be something about the way my dataset is set up.
The dataset I have is presorted by a different variable, PATIENTID, which I am not interested in. In the dataset, all cases with the same CASEID are grouped together, but the groups aren't in ascending order (which shouldn't make a difference, since the example dataset you gave me wasn't either). I am running SPSS 22 (64-bit) on 64-bit Windows, and CASEID is a numeric variable. The syntax I ran:

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=CASEID
  /X_max_1=MAX(X).

Thanks |
The CSR (Command Syntax Reference) entry for PRESORTED says:

"When PRESORTED is specified, if AGGREGATE is appending new variables to the active dataset rather than writing a new file or replacing the active dataset, the cases must be sorted in ascending order by the BREAK variables."

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: devoidx <[hidden email]>
To: [hidden email]
Date: 10/06/2013 01:06 PM
Subject: Re: [SPSSX-L] Another Aggregate problem ( super baffling)
Sent by: "SPSSX(r) Discussion" <[hidden email]>

> Thanks a bunch Art for the response, [...]

View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Another-Aggregate-problem-super-baffling-tp5722410p5722412.html
|
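For illustration, a minimal sketch of what meeting that requirement would look like if one did want to use /PRESORTED. It reuses the variable names from the syntax posted earlier in the thread (CASEID, X, X_max_1) and is untested against the actual data. Without /PRESORTED, AGGREGATE does not require the file to be sorted at all.

```spss
* Sort ascending by the break variable, as PRESORTED requires.
SORT CASES BY caseid (A).
* Append the group maximum to every case in the active dataset.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /PRESORTED
  /BREAK=caseid
  /X_max_1=MAX(X).
```

On a file of this size the sort is the expensive step, so it only pays off if the sorted order is needed for other reasons as well.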
Thanks, but as shown in my syntax, I did not use /PRESORTED.
|
What are the formats for X and CASEID? Were they read in? If so, with what formats?
Is CASEID or X the result of any transformations?
How did you get the variable CASEID? How did you get the variable X?
How is CASEID related to PATIENTID?

On 10/6/2013 3:54 PM, devoidx [via SPSSX Discussion] wrote:
> Thanks but as shown in my syntax, i did not use /presorted.
Art Kendall
Social Research Consultants |
In reply to this post by devoidx
Have you applied all of the patches?
I ran the syntax under Version 21 with all patches. I have not yet installed v22, and I am not sure whether there are any.

On 10/6/2013 5:39 PM, Art Kendall wrote:
Art Kendall
Social Research Consultants |
In reply to this post by devoidx
If you have a billion cases and did not sort the dataset before the aggregate, are you sure there aren't other records for 812271 hanging out somewhere else in the dataset (ones that have a 1 for X)? It is clear from your data snippet that the dataset isn't sorted by CASEID.
|
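One way to test that possibility for every ID at once, rather than one CASEID at a time, is to count how many separate runs each CASEID forms in file order. This is an untested sketch: `newblock`, `nblocks`, `nrows` and the dataset name `idruns` are made-up names, and the summary goes to a separate dataset so the working file is left as it is.

```spss
* Flag the first record of every run of identical CASEID values.
COMPUTE newblock = ($CASENUM = 1 OR caseid NE LAG(caseid)).
* Count runs and records per CASEID in a separate dataset.
DATASET DECLARE idruns.
AGGREGATE
  /OUTFILE='idruns'
  /BREAK=caseid
  /nblocks=SUM(newblock)
  /nrows=N.
DATASET ACTIVATE idruns.
* Any value above 1 means that CASEID is split across the file.
FREQUENCIES VARIABLES=nblocks.
```

If every CASEID really is bunched together, `nblocks` is 1 for all of them. An ID with `nblocks` above 1 has records elsewhere in the file that a listing of one block would not show.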
In reply to this post by Art Kendall
Hi, to answer your questions: CASEID came with the database, and X is the result of my own recoding of another variable, so if variable Y is blabla then X=1, otherwise X=0. X did look accurate when I checked it. The problem is that for many CASEID groups, X_max returns 1 when X is actually 0 throughout the group. I should mention that X_max is always correct for groups whose X values do contain a 1, but it is incorrect in many, many cases where X is 0.
And I've tried this with other variables too, with the same problem: X_max is 1 in many cases where X is 0 for the whole CASEID group. Your sample syntax worked fine for me in SPSS 22 too; it is only within my humongous database that it keeps failing to produce the right number. I really don't understand it. |
Hi Andy, as far as I know, while the dataset isn't sorted by CASEID in ascending order, all records with the same CASEID are bunched together. Regardless, I don't think you need to sort the dataset before running the aggregate; only if you specify /PRESORTED does the dataset need to be sorted by the break variable.
|
I didn't say you needed it sorted; I said there might be another 812271 hanging out somewhere else in the dataset that DOES have a 1 for X. That would be my guess, before assuming SPSS has an error, so it is best to check for certain.
Try something like

temporary.
select if Caseid = 812271.
freq var X.
exe.

and see whether it reports all 0's or some 1's in the frequency table. Unfortunately this does cause a data pass, sorry. You're learning the hard way that it is better to get the code right on a smaller subset beforehand and leave the sort of a billion cases for overnight. |
In reply to this post by Art Kendall
My own leaning would be to revise the bylaws unless Jackson tells us that there are reasons for not doing so. It would be good to have committees ready to go at the start of the fall semester. David
On Sun, Oct 6, 2013 at 5:44 PM, Art Kendall <[hidden email]> wrote:
|
In reply to this post by Andy W
"... get the code right on a smaller dataset" sounds like such a good idea that I would put in N OF CASES <a million> and see if the error is replicated in a tiny fraction of the time.

--
Rich Ulrich

> Date: Sun, 6 Oct 2013 16:42:05 -0700
> From: [hidden email]
> Subject: Re: Another Aggregate problem ( super baffling)
> To: [hidden email]
>
> I didn't say you needed it sorted, [...]
|
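A rough sketch of that trial run, assuming the variable names used earlier in the thread. `X_max_trial` is a made-up name, and the one-million limit is just the figure suggested above. The code is untested.

```spss
* Keep only the first million cases of the working file for a quick trial.
N OF CASES 1000000.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=caseid
  /X_max_trial=MAX(X).
* Compare each case's own X with the maximum for its group.
CROSSTABS /TABLES=X BY X_max_trial.
```

Two cautions: N OF CASES limits the active dataset from that point on, so the truncated file should not be saved over the original, and the CASEID group that straddles the cutoff will be incomplete.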
Thanks guys. My plan right now is to XSAVE the database into smaller chunks, sort each chunk by CASEID, add them all back together, and then try the aggregate again. It is so frustrating to be troubleshooting a 1-billion-case database. The syntax runs fine on a smaller subset of my dataset.
But the funny thing is that X=1 indicates the presence of an extremely rare disease in my dataset, and the way I am getting 1's for X_max in the aggregate, half of the patients end up having the rare disease, which is statistically impossible. It would be like saying half of the American population has an extra toe. So it can't be that there are X=1's in my dataset that I am not seeing; there just can't be enough X=1's to produce so many X_max=1's. Sigh... |
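The chunk-and-merge plan could look roughly like the following, shown with two chunks for brevity. All file names and the 500-million split point are placeholders, and the sketch is untested. It relies on ADD FILES with /BY interleaving files that are each already sorted on the BY variable.

```spss
* Write the working file out in two pieces during a single data pass.
DO IF ($CASENUM <= 500000000).
  XSAVE OUTFILE='chunk1.sav'.
ELSE.
  XSAVE OUTFILE='chunk2.sav'.
END IF.
EXECUTE.
* Sort each piece by CASEID and save it.
GET FILE='chunk1.sav'.
SORT CASES BY caseid (A).
SAVE OUTFILE='chunk1_sorted.sav'.
GET FILE='chunk2.sav'.
SORT CASES BY caseid (A).
SAVE OUTFILE='chunk2_sorted.sav'.
* Interleave the sorted pieces so the combined file is sorted by CASEID.
ADD FILES
  /FILE='chunk1_sorted.sav'
  /FILE='chunk2_sorted.sav'
  /BY caseid.
EXECUTE.
```

Splitting by case number can cut a CASEID group in two, but the /BY interleave brings its records back together in the merged file.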
If there are patches, have you applied them?
Most likely it is a problem with your data or syntax.
Do a search in your syntax for the variable X. Are there transforms you forgot about?
Did you make your syntax more readable by generating a variable called blabla rather than having a long IF command?
If X is a dichotomy, why do you have a decimal point and zeros? (It shouldn't affect programming, but it does affect reading and understanding.) It makes me suspicious that you are not being careful about the readability of your syntax and listings.
Try what Andy said, but add some more new variables. UNTESTED.

compute noblabla = missing(blabla).
compute check1 = caseid eq 812271.
compute check2 = range(caseid,812270.5,812271.99999999).
do if caseid eq 812271.
  do if missing(patientid).
    compute checkid = -1.
  else.
    compute checkid = patientid.
  end if.
else.
  compute checkid = 0.
end if.
formats noblabla check1 check2 (f1) checkid (f6).
* as Rich said.
N of cases 2000000.
frequencies variables = x noblabla blabla check1 check2 checkid.
* check whether x comes up with more than 2 values.
* check what is in x when blabla is missing.
temporary.
select if X_max_1 ne 0.
save outfile = . . .

On 10/6/2013 10:43 PM, devoidx [via SPSSX Discussion] wrote:
> Thanks guys My plan right now is to xsave the database into smaller chunks [...]
Art Kendall
Social Research Consultants |