Listers,

I am using python to loop through files and create new aggregated files, and then merge the results of those files into one list.

I am calculating the number of members in our program who saw their pcp at least once in the past year for each month of a year. In other words, including that month how many saw their pcp during that and the previous 11 months.

It appears to be working, but I’m unclear why the aggregated files do not exist after the program is done running. They must have existed at some point for the final merged file to have all of the data, but they do not appear in the folder in which they were created.

Here is the syntax that creates the files that are later merged. Filedate is a variable that gives each file a date-time stamp.

count = "1"
month1 = "8"
month2 = "7"
year1 = "2009"
year2 = "2010"
while int(count) < 13:
    print count, month1, year1, month2, year2
    spss.Submit ("""
get file ='D:\Data\AMM\medical_home_visits\medical_home_visits_individuals_%s'.

if (fromdatesvc_s >= date.moyr(%s,%s)) & (fromdatesvc_s <= date.moyr(%s,%s)) month_1011 = %s.
exe.

SORT CASES BY month_1011 membid.
AGGREGATE
  /OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_individuals_%s'
  /PRESORTED
  /BREAK=month_1011 membid
  /medical_home_visit_sum=SUM(medical_home_visit).

get file ='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_individuals_%s'.

FILTER OFF.
USE ALL.
SELECT IF (medical_home_visit_sum>0)& (sysmis(month_1011) = 0).
EXECUTE.

SORT CASES BY month_1011.
AGGREGATE
  /OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_visitor_%s'
  /PRESORTED
  /BREAK=month_1011
  /N_BREAK=N.
""" %(filedate, month1, year1, month2, year2, count, count, filedate, count, filedate, count, filedate))
    count = str(int(count) + 1)

    month1 = str(int(month1) + 1)
    if int(month1) > 12:
        month1 = "1"
        year1 = str(int(year1) + 1)

    month2 = str(int(month2) + 1)
    if int(month2) > 12:
        month2 = "1"
        year2 = str(int(year2) + 1)

The following syntax merges the files just created above. The files created above just have two fields with one line each containing the month and the number of members with at least one visit for the year which includes that month and the 11 prior months. This merge works but the files created above do not appear in the folder where they were created. But they must have existed at some point for this merge to work. Where’d they go?

spss.Submit("""
get file = 'D:\Data\AMM\medical_home_visits\1011_month1_medhome_visitor_%s'.
dataset name start.
""" % (filedate))

count = "2"
while int(count) < 13:
    spss.Submit("""
get file = 'D:\Data\AMM\medical_home_visits\1011_month%s_medhome_visitor_%s'.

dataset name addition.

DATASET ACTIVATE start.
ADD FILES /FILE=*
  /FILE='addition'.
EXECUTE.""" % (count, filedate))
    count = str(int(count) + 1)

spss.Submit("""
save outfile = 'D:\Data\AMM\medical_home_visits\merged_medical_home_visits_%s'
  /compressed.
""" % (filedate))

I know this is a lot to process but if someone could just explain what is going on here I’d appreciate it. I would prefer to have a complete understanding of what is happening.

Thanks
Matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648

My guess is that the files do actually exist but are not where you think they are.

Using the %s notation in long blocks of syntax is error prone. I would suggest that you use the named parameter notation instead. For example, this line,

/OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_individuals_%s'

would become

/OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%(count)s_medhome_individuals_%(filedate)s'

if I counted correctly. Once every placeholder in the block is named, your substitution syntax changes from

%(filedate, month1, year1, month2, year2, count, count, filedate, count, filedate, count, filedate)

to just

% locals()

Much more readable. Also, I don't see any reason for that exe line.

You might want to assign the generated syntax to a variable and print that to check what is actually being run.

HTH,
Jon Peck (no "h")
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: "Pirritano, Matthew" <[hidden email]>
To: [hidden email]
Date: 08/15/2011 08:45 AM
Subject: [SPSSX-L] python + aggregate loop = no new files?
Sent by: "SPSSX(r) Discussion" <[hidden email]>
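The two suggestions in this reply (named placeholders, and printing the generated syntax before submitting it) might look roughly like the sketch below. It is only an illustration: `build_filter` is a made-up helper name, the argument values are the first-iteration values from the original post, and the `spss.Submit` call is left as a comment so the snippet runs outside Statistics.

```python
def build_filter(count, month1, year1, month2, year2):
    """Return the date-window IF command with every placeholder named."""
    template = """
if (fromdatesvc_s >= date.moyr(%(month1)s,%(year1)s)) &
   (fromdatesvc_s <= date.moyr(%(month2)s,%(year2)s)) month_1011 = %(count)s.
"""
    # locals() maps each argument name to its value, so the order of the
    # placeholders in the template no longer matters.
    return template % locals()


# Assign the generated syntax to a variable and print it first.
cmd = build_filter(count="1", month1="8", year1="2009", month2="7", year2="2010")
print(cmd)
# Inside Statistics the same string would then be passed on:
# spss.Submit(cmd)
```

Because each value is looked up by name, a value that appears several times in the block (count and filedate in the original) only has to be supplied once.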
Jon,

You were right. I had thought of that, too. I just hadn’t thought to search for my filenames. I searched and found where they had been accumulating. Instead of the directory that I expected, ‘\1011’ had gotten changed to ‘A1’. Instead of the directory:

'D:\Data\AMM\medical_home_visits\1011_month%s_medhome_individuals_%s'

It became:

'D:\Data\AMM\medical_home_visitsA1_month%s_medhome_individuals_%s'

What is it doing? Is ‘\1011’ code for another character?

Thanks
Matt

That's a different problem. In Python literals, a backslash followed by one to three octal digits is an octal escape, so \101 means the character with octal value 101 (decimal 65), which is A. The fourth digit is left alone, which is how \1011 turned into A1. (A bare \1 would be the character with value 1: SOH, the start of heading control character in the old communications world.) It's particularly insidious when you have a filespec something like \temp\xyz, because \t means a tab character. To suppress this behavior, precede your literal with r, e.g.,

cmd = r""" text """

Alternatively, just use forward slashes everywhere in your file specs. Statistics is happy with either form.

Regards,
Jon Peck (no "h")

From: "Pirritano, Matthew" <[hidden email]>
To: [hidden email]
Date: 08/15/2011 09:13 AM
Subject: Re: [SPSSX-L] python + aggregate loop = no new files?
Sent by: "SPSSX(r) Discussion" <[hidden email]>

From: Jon K Peck [mailto:peck@...]
Sent: Monday, August 15, 2011 7:57 AM
To: Pirritano, Matthew
Cc: [hidden email]
Subject: Re: [SPSSX-L] python + aggregate loop = no new files?
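The escape behavior discussed in this reply can be checked with a few lines of plain Python. This is just a small demonstration using a shortened version of the path from the thread; the variable names are made up for the example.

```python
# Non-raw literal: "\101" is read as an octal escape (octal 101 = decimal 65 = "A"),
# and the "1" that follows is an ordinary character.
broken = 'medical_home_visits\1011_month'

# Raw literal: the backslash and all four digits survive unchanged.
raw = r'medical_home_visits\1011_month'

# Forward slashes sidestep escape processing entirely.
forward = 'medical_home_visits/1011_month'

# "\t" is a tab, so a folder named "temp" loses its first letter.
tab_path = 'D:\temp'

for label, value in [("broken", broken), ("raw", raw),
                     ("forward", forward), ("tab_path", tab_path)]:
    # repr() makes control characters and doubled backslashes visible.
    print("%-8s %r" % (label, value))
```

Printing with `%r` rather than `%s` is what makes the difference visible, since a tab or other control character is otherwise easy to miss in the output.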
Matthew,

"I would prefer to have a complete understanding of what is happening."

In the interests of furthering your understanding you would do well to have a better grasp of your SPSS code itself and its processing requirements!!! I know the evil dialog boxes tend to paste a lot of extraneous CRAP, and people superstitiously latch on to the literal paste and develop horrible habits which haunt their code. However, the following points are iterated on this list repeatedly! You probably use LARGE files, and the code you are running presently could use some reflection! No idea re Python, but:

1. GET RID of the SORTS and the PRESORTED.
2. GET RID of /OUTFILE='filename' on the first aggregate and use OUTFILE=*.
3. GET RID of *ALL* of the EXECUTES, they are completely unnecessary and slow down your job.
4. ADD FILES SUPPORTS up to 50 files at a time. Your 2 at a time approach with EXECUTE is rather painful.

ADD FILES / FILE = file1 / FILE = file2 / FILE = file3 ....... / FILE = file50.

----

SELECT IF NOT SYSMIS(month_1011).
AGGREGATE
  /OUTFILE=*
  /BREAK=month_1011 membid
  /medical_home_visit_sum=SUM(medical_home_visit).
SELECT IF (medical_home_visit_sum>0).
AGGREGATE
  /OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_visitor_%s'
  /BREAK=month_1011
  /N_BREAK=N.

Is not

if (fromdatesvc_s >= date.moyr(%s,%s)) & (fromdatesvc_s <= date.moyr(%s,%s)) month_1011 = %s.

the SAME as

if (fromdatesvc_s = date.moyr(%s,%s)) month_1011 = %s.

?

YMMV!!!
David
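Point 4 above (one ADD FILES command instead of twelve two-file merges) could be generated from Python along these lines. This is a sketch under stated assumptions, not tested against Statistics: `build_add_files` and the sample `filedate` value are hypothetical, the folder and file-name pattern are copied from the original post, and forward slashes are used so that no backslash escapes are involved.

```python
def build_add_files(filedate, months=range(1, 13)):
    """Build one ADD FILES command covering all monthly visitor files."""
    months = list(months)
    if len(months) > 50:
        # The 50-file ceiling is the limit quoted in the post above.
        raise ValueError("ADD FILES takes at most 50 files per command")
    folder = "D:/Data/AMM/medical_home_visits"
    lines = ["ADD FILES"]
    for month in months:
        lines.append("  /FILE='%s/1011_month%d_medhome_visitor_%s'"
                     % (folder, month, filedate))
    # The command terminator goes on the last subcommand line.
    return "\n".join(lines) + "."


cmd = build_add_files("20110815")  # hypothetical date stamp
print(cmd)
# Inside Statistics: spss.Submit(cmd), followed by the SAVE OUTFILE command.
```

Building the whole command as one string also makes it easy to print and inspect before it is submitted.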
David,

Thanks for the helpful pointers! : )

Thanks
Matt

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Monday, August 15, 2011 9:19 AM
To: [hidden email]
Subject: Re: python + aggregate loop = no new files?