SPSSX Discussion

python + aggregate loop = no new files?


mpirritano

python + aggregate loop = no new files?

Listers,

I am using Python to loop through files and create new aggregated files, and then merge the results of those files into one list.

I am calculating, for each month of a year, the number of members in our program who saw their PCP at least once in the past year. In other words, for each month, how many members saw their PCP during that month and the previous 11 months.

It appears to be working, but I’m unclear why the aggregated files do not exist after the program is done running. They must have existed at some point for the final merged file to have all of the data, but they do not appear in the folder in which they were created.

Here is the syntax that creates the files that are later merged. Filedate is a variable that gives each file a date-time stamp.

count = "1"
month1 = "8"
month2 = "7"
year1 = "2009"
year2 = "2010"

while int(count) < 13:
    print count, month1, year1, month2, year2
    spss.Submit("""
get file ='D:\Data\AMM\medical_home_visits\medical_home_visits_individuals_%s'.

if (fromdatesvc_s >= date.moyr(%s,%s)) & (fromdatesvc_s <= date.moyr(%s,%s)) month_1011 = %s.
exe.

SORT CASES BY month_1011 membid.
AGGREGATE
  /OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_individuals_%s'
  /PRESORTED
  /BREAK=month_1011 membid
  /medical_home_visit_sum=SUM(medical_home_visit).

get file ='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_individuals_%s'.

FILTER OFF.
USE ALL.
SELECT IF (medical_home_visit_sum>0)& (sysmis(month_1011) = 0).
EXECUTE.

SORT CASES BY month_1011.
AGGREGATE
  /OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_visitor_%s'
  /PRESORTED
  /BREAK=month_1011
  /N_BREAK=N.
""" % (filedate, month1, year1, month2, year2, count, count, filedate, count, filedate,
       count, filedate))
    count = str(int(count) + 1)

    month1 = str(int(month1) + 1)
    if int(month1) > 12:
        month1 = "1"
        year1 = str(int(year1) + 1)

    month2 = str(int(month2) + 1)
    if int(month2) > 12:
        month2 = "1"
        year2 = str(int(year2) + 1)
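The month and year bookkeeping in the loop above can also be done with integers rather than strings. What follows is only a sketch, not part of the original job: the function name rolling_windows is made up for illustration, and it is written with print() so that it also runs under Python 3. It is meant to reproduce the same twelve windows, starting with August 2009 through July 2010.

```python
def rolling_windows(start_month, start_year, n_windows=12):
    """Return (count, month1, year1, month2, year2) for consecutive 12-month windows.

    rolling_windows is a hypothetical helper, not an SPSS or Python library call.
    """
    windows = []
    for count in range(1, n_windows + 1):
        # Count months from year 0 so that rolling past December is plain arithmetic.
        first = start_year * 12 + (start_month - 1) + (count - 1)
        last = first + 11  # the window spans 12 calendar months
        windows.append((count,
                        first % 12 + 1, first // 12,
                        last % 12 + 1, last // 12))
    return windows


for window in rolling_windows(8, 2009):
    print(window)
```

Each tuple could then feed the syntax template in place of the string counters.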

The following syntax merges the files just created above. The files created above just have two fields with one line each containing the month and the number of members with at least one visit for the year which includes that month and the 11 prior months. This merge works but the files created above do not appear in the folder where they were created. But they must have existed at some point for this merge to work. Where’d they go?

spss.Submit("""
get file = 'D:\Data\AMM\medical_home_visits\1011_month1_medhome_visitor_%s'.
dataset name start.
""" % (filedate))

count = "2"
while int(count) < 13:
    spss.Submit("""
get file = 'D:\Data\AMM\medical_home_visits\1011_month%s_medhome_visitor_%s'.
dataset name addition.
DATASET ACTIVATE start.
ADD FILES /FILE=*
  /FILE='addition'.
EXECUTE.""" % (count, filedate))
    count = str(int(count) + 1)

spss.Submit("""
save outfile = 'D:\Data\AMM\medical_home_visits\merged_medical_home_visits_%s'
  /compressed.
""" % (filedate))

I know this is a lot to process but if someone could just explain what is going on here I’d appreciate it. I would prefer to have a complete understanding of what is happening.

Thanks

Matt

Matthew Pirritano, Ph.D.

Research Analyst IV

Medical Services Initiative (MSI)

Orange County Health Care Agency

(714) 568-5648

Jon K Peck

Re: python + aggregate loop = no new files?

My guess is that the files do actually exist but are not where you think they are.
Using the %s notation in long blocks of syntax is error-prone. I would suggest that you use the named parameter notation instead. For example, this line,
/OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_individuals_%s'
would become
/OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%(count)s_medhome_individuals_%(filedate)s'
if I counted correctly, and likewise for every other %s in the block, since named and positional placeholders cannot be mixed in one format string. Then your substitution syntax changes from
%(filedate, month1, year1, month2, year2, count, count, filedate, count, filedate,
count, filedate)
to just
% locals()

Much more readable.
Also, I don't see any reason for that exe line.

You might want to assign the generated syntax to a variable and print that to check what is actually being run.
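
As a rough sketch of both suggestions (named parameters, and printing the generated syntax before running it), something like the following could be used. The values in params are made up for illustration; in the original loop they would come from count and filedate, or from locals(). Forward slashes are used in the path so that no backslash can be read as an escape. The print() form also runs under Python 3.

```python
# Hypothetical values standing in for the loop variables in the original post.
params = {"count": "1", "filedate": "20110815"}

# Every placeholder is named, so the order of the values no longer matters.
cmd = """
AGGREGATE
  /OUTFILE='D:/Data/AMM/medical_home_visits/1011_month%(count)s_medhome_individuals_%(filedate)s'
  /BREAK=month_1011 membid
  /medical_home_visit_sum=SUM(medical_home_visit).
""" % params

# Inspect exactly what would be sent to Statistics before submitting it.
print(cmd)

# Inside SPSS Statistics the same string would then be run with:
# spss.Submit(cmd)
```

Printing cmd first makes a mangled path visible in the output before any file is written.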

HTH,

Jon Peck (no "h")
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: "Pirritano, Matthew" <[hidden email]>
To: [hidden email]
Date: 08/15/2011 08:45 AM
Subject: [SPSSX-L] python + aggregate loop = no new files?
Sent by: "SPSSX(r) Discussion" <[hidden email]>


mpirritano

Re: python + aggregate loop = no new files?

Jon,

You were right. I had thought of that, too; I just hadn’t thought to search for my filenames. I searched and found where they had been accumulating. Instead of going to the directory that I expected, the ‘\1011’ in the path had gotten changed to ‘A1’.

Instead of the directory: 'D:\Data\AMM\medical_home_visits\1011_month%s_medhome_individuals_%s'

It became: 'D:\Data\AMM\medical_home_visitsA1_month%s_medhome_individuals_%s'

What is it doing? Is ‘\1011’ code for another character?

Thanks

Matt


From: Jon K Peck [mailto:[hidden email]]
Sent: Monday, August 15, 2011 7:57 AM
To: Pirritano, Matthew
Cc: [hidden email]
Subject: Re: [SPSSX-L] python + aggregate loop = no new files?

Jon K Peck

Re: python + aggregate loop = no new files?

That's a different problem. In Python string literals, a backslash followed by up to three octal digits is an octal escape, so \101 means the character with octal value 101 (decimal 65), which is the letter A. That is why \1011 turned into A1. It's particularly insidious when you have a filespec something like \temp\xyz, because \t means a tab character. To suppress this behavior, precede your literal with r, e.g.,
cmd = r""" text """

Alternatively, just use forward slashes everywhere in your file specs. Statistics is happy with either form.
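
A small demonstration of the three cases may help. This is only a sketch using a shortened, made-up path fragment and made-up substitution values, and it uses print() so that it also runs under Python 3.

```python
# Plain literal: "\101" is read as an octal escape, i.e. chr(0o101) == "A".
plain = "medical_home_visits\1011_month"

# Raw literal: the r prefix leaves every backslash untouched.
raw = r"medical_home_visits\1011_month"

# Forward slashes need no special handling at all.
forward = "medical_home_visits/1011_month"

print(plain)    # medical_home_visitsA1_month
print(raw)      # medical_home_visits\1011_month
print(forward)  # medical_home_visits/1011_month

# "\t" is a single tab character, so "\temp" is four characters long, not five.
print(len("\temp"), len(r"\temp"))  # 4 5

# A raw template still accepts % substitution (the values here are made up).
template = r"D:\Data\1011_month%(count)s_%(filedate)s"
print(template % {"count": "1", "filedate": "20110815"})
```

The r prefix only changes how the literal is read; the resulting string behaves like any other.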

Regards,

Jon Peck (no "h")

From: "Pirritano, Matthew" <[hidden email]>
To: [hidden email]
Date: 08/15/2011 09:13 AM
Subject: Re: [SPSSX-L] python + aggregate loop = no new files?
Sent by: "SPSSX(r) Discussion" <[hidden email]>


David Marso

Re: python + aggregate loop = no new files?


In reply to this post by mpirritano

Matthew,
" I would prefer to have a complete understanding of what is happening."
In the interests of furthering your understanding, you would do well to have a better grasp of your SPSS code itself and its processing requirements!!! I know the evil dialog boxes tend to paste a lot of extraneous CRAP, and people superstitiously latch on to the literal paste and develop horrible habits which haunt their code. However, the following points are made on this list repeatedly!
You probably use LARGE files, and the code you are running presently could use some reflection!

No idea re Python, but:
1. GET RID of the SORTS and the PRESORTED.
2. GET RID of /OUTFILE='filename' on the first AGGREGATE and use /OUTFILE=*.
3. GET RID of *ALL* of the EXECUTEs; they are completely unnecessary and slow down your job.
4. ADD FILES supports up to 50 files at a time. Your 2-at-a-time approach with EXECUTE is rather painful:
ADD FILES /FILE=file1 /FILE=file2 /FILE=file3 ... /FILE=file50.
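
On the Python side, point 4 could be sketched roughly as follows: build one ADD FILES command that names all twelve monthly files, instead of submitting a GET and ADD FILES pair per file. The helper name build_add_files and the filedate value are made up for illustration, the file names follow the pattern from the original post, and forward slashes are used in the path. The print() form also runs under Python 3.

```python
def build_add_files(folder, filedate, n_months=12):
    """Return one ADD FILES command naming every monthly visitor file.

    build_add_files is a hypothetical helper, not part of the spss module.
    """
    file_lines = [
        "  /FILE='%s/1011_month%d_medhome_visitor_%s'" % (folder, month, filedate)
        for month in range(1, n_months + 1)
    ]
    # One command, one terminating period, no EXECUTE.
    return "ADD FILES\n" + "\n".join(file_lines) + ".\n"


cmd = build_add_files("D:/Data/AMM/medical_home_visits", "20110815")
print(cmd)

# Inside SPSS Statistics the command would then be run with:
# spss.Submit(cmd)
```

With twelve files this stays well under the 50-file limit mentioned above.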

----
SELECT IF NOT SYSMIS(month_1011) .
AGGREGATE
/OUTFILE=*
/BREAK=month_1011 membid
/medical_home_visit_sum=SUM(medical_home_visit).
SELECT IF (medical_home_visit_sum>0).

AGGREGATE
/OUTFILE='D:\Data\AMM\medical_home_visits\1011_month%s_medhome_visitor_%s'
/BREAK=month_1011
/N_BREAK=N.

IS not
if (fromdatesvc_s >= date.moyr(%s,%s)) & (fromdatesvc_s <= date.moyr(%s,%s)) month_1011 = %s.

the SAME as
if (fromdatesvc_s = date.moyr(%s,%s)) month_1011 = %s.

YMMV!!!

David
--


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

mpirritano

Re: python + aggregate loop = no new files?

David,

Thanks for the helpful pointers! : )

Thanks
Matt

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
David Marso
Sent: Monday, August 15, 2011 9:19 AM
To: [hidden email]
Subject: Re: python + aggregate loop = no new files?

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/python-aggregate-loop-no-new-files-tp4701209p4701462.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
