SPSSX Discussion

SPSS/Python Question

John Filben

SPSS/Python Question

I was hoping you could provide me with some advice. I saw that Python is an open source programming language intertwined with SPSS:

· Do you call Python code from within SPSS or do you call SPSS functions from within Python?

· If both are possible, then which is optimal?

· Is it hard to integrate?

· Can Python code be used to create additional SPSS datasets?

Regarding Python, does it do the following:

· Read multiple datasets one record at a time and compare values from each; then, based on if-then logic, write to multiple output files

· Load a lookup table and then process a different file; based on if-then logic, access and lookup values in the table

· Support modular “gosub” programming

· Sort files

· Date math and conversions

· Would it be able to support the following type of logic:

o Start

o Read Record from File 1

o Read Record from File 2

o Match Logic

§ If Key 1 < Key 2, Write to output file A

§ If Key 1 = Key 2, Write to output file B

§ If Key 1 > Key 2, Write to output file C

· Is there a good syntax and data editor, such as the one in SAS Enterprise Guide?

Can I just use R to do all the above?

Any help would be greatly appreciated.

Regards,

John

John Filben
Cell Phone - 773.401.2822
Email - [hidden email]

Jon K Peck

Re: SPSS/Python Question

First, I suggest that you download the Data Management book that is linked on SPSS Developer Central (www.spss.com/devcentral) for a detailed view of how to use programmability. You can also install the Python programmability plug-in for your SPSS version, which includes all the technical documentation.

Other answers below.
Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435

From:	John Filben <[hidden email]>
To:	[hidden email]
Date:	12/10/2009 11:23 AM
Subject:	[SPSSX-L] SPSS/Python Question
Sent by:	"SPSSX(r) Discussion" <[hidden email]>

I was hoping you could provide me with some advice. I saw that Python is an open source programming language intertwined with SPSS:
· Do you call Python code from within SPSS or do you call SPSS functions from within Python?
· If both are possible, then which is optimal?
· Is it hard to integrate?
· Can Python code be used to create additional SPSS datasets?
>>>You can use Python in your regular syntax stream (internal mode) or invoke SPSS from a Python program (external mode).
Either way, you can run SPSS commands from your Python code and call the APIs that work with SPSS.
If you are familiar with Python, using it with SPSS is very simple.
Python code can create new datasets or modify existing ones, both by issuing regular SPSS commands built with Python code and through specific APIs that can do these tasks.

Regarding Python, does it do the following:
· Read multiple datasets one record at a time and compare values from each; then, based on if-then logic, write to multiple output files
>>>You can certainly do this (V16 or later). The SPSSINC COMPARE DATASETS extension command, available from Developer Central, can serve as an example of how to do this.
· Load a lookup table and then process a different file; based on if-then logic, access and lookup values in the table
>>>General Python code can certainly do this. There is also a lookup function in the extendedTransforms module.
· Support modular “gosub” programming
>>>If you mean functions/subroutines/classes, certainly yes.
· Sort files
>>>Although Python has sorting functions, you would sort by issuing the appropriate SORT commands to SPSS.
· Date math and conversions
>>>Yes, both through native Python code and through various functions we provide to convert between SPSS and Python date/time representations.
· Would it be able to support the following type of logic:
o Start
o Read Record from File 1
o Read Record from File 2
o Match Logic
§ If Key 1 < Key 2, Write to output file A
§ If Key 1 = Key 2, Write to output file B
§ If Key 1 > Key 2, Write to output file C
>>> Certainly
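
To make the match logic above concrete, here is a minimal plain-Python sketch. It is only an illustration, not SPSS API code: the function name match_merge, the use of lists in place of real files, and the rule that leftover records go to A or C once the other input runs out are all assumptions. It also assumes both inputs are already sorted by key and that keys are unique within each input.

```python
def match_merge(file1, file2, key=lambda rec: rec[0]):
    """Split two key-sorted record streams into outputs A, B and C.

    A: records from file1 whose key is lower than the current file2 key
    B: pairs of records whose keys match
    C: records from file2 whose key is lower than the current file1 key
    """
    out_a, out_b, out_c = [], [], []
    it1, it2 = iter(file1), iter(file2)
    rec1, rec2 = next(it1, None), next(it2, None)

    while rec1 is not None and rec2 is not None:
        k1, k2 = key(rec1), key(rec2)
        if k1 < k2:
            out_a.append(rec1)
            rec1 = next(it1, None)
        elif k1 == k2:
            out_b.append((rec1, rec2))
            rec1, rec2 = next(it1, None), next(it2, None)
        else:
            out_c.append(rec2)
            rec2 = next(it2, None)

    # Assumption: once one input is exhausted, the rest of the other
    # input is treated as unmatched.
    while rec1 is not None:
        out_a.append(rec1)
        rec1 = next(it1, None)
    while rec2 is not None:
        out_c.append(rec2)
        rec2 = next(it2, None)
    return out_a, out_b, out_c


# Hypothetical sample data: (key, payload) tuples, sorted by key.
file1 = [(1, "a"), (2, "b"), (4, "d")]
file2 = [(2, "x"), (3, "y"), (4, "z"), (5, "w")]
a, b, c = match_merge(file1, file2)
print(a)
print(b)
print(c)
```

In a real job the three lists would be replaced by three open output files, and the inputs by readers over the two source files.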

· Is there a good syntax and data editor, such as the one in SAS Enterprise Guide?
>>>Besides the standard SPSS syntax and data editors, there are many IDEs available for Python.

Can I just use R to do all the above?
>>>There is also an R plug-in available from Developer Central for using R within SPSS. But for controlling and interacting with SPSS, the Python plug-in is a better approach.

J P-6

Re: SPSS/Python Question

Dear List,

I am struggling to format a table in a very precise way using CTABLES. The following syntax almost gives me what I need, but the SUBTOTAL gives me the overall total, not the total for two columns.

CTABLES
/TABLE (q1 + q2 + q3)[COUNT ROWPCT]
/CATEGORIES VARIABLES=q1 q2 q3 [1, 2, 3, 4, SUBTOTAL='Satisfied + Very Satisfied']
/CLABELS ROWLABELS=OPPOSITE.

Thanks,

John

Jon K Peck

Re: CTABLES Question

The SUBTOTAL keyword gives you the subtotal for all the categories preceding the SUBTOTAL keyword since the last subtotal. Since you have no preceding subtotal, it is adding up all the categories. It works this way, because otherwise, unless you go to the trouble of specifying a custom title (as you did here), the subtotal would be very misleading.

The simplest solution would be to insert another subtotal after the first two categories. If you don't want to do that, you can use hidden subtotals (HSUBTOTAL) in place of categories in order to stop the subtotal of interest from adding up the others.

For the second strategy, you would change this

[1, 2, 3, 4, SUBTOTAL='Satisfied + Very Satisfied']

to this

[1, HSUBTOTAL='1label', 2, HSUBTOTAL='2label', 3, 4, SUBTOTAL='Satisfied + Very Satisfied']

where the hsubtotal labels are whatever identifiers you want to have appear for those categories.

Another strategy would be to put the categories of interest first in the table:

[3, 4, SUBTOTAL='Satisfied + Very Satisfied' 1 2]

or use a subtotal above

[1, 2, SUBTOTAL='Satisfied + Very Satisfied' 3 4] POSITION=BEFORE
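
The accumulation rule described above (a subtotal adds up the categories since the previous subtotal) can be mimicked with a toy Python model. This is only a sketch of the rule as stated in this post, not of CTABLES itself: the subtotals function, the tuple encoding of SUBTOTAL/HSUBTOTAL, and the category counts are all made up for illustration, and POSITION=BEFORE is not modelled.

```python
def subtotals(spec, counts):
    """Return (label, total) for every subtotal in a category list.

    spec   : category values mixed with ("SUBTOTAL", label) or
             ("HSUBTOTAL", label) tuples, in table order
    counts : hypothetical count for each category value
    """
    results = []
    running = 0
    for item in spec:
        if isinstance(item, tuple):
            _kind, label = item
            results.append((label, running))
            running = 0  # any subtotal restarts the accumulation
        else:
            running += counts[item]
    return results


counts = {1: 5, 2: 10, 3: 20, 4: 15}  # made-up counts
label = "Satisfied + Very Satisfied"

# Original request: the subtotal picks up all four categories.
print(subtotals([1, 2, 3, 4, ("SUBTOTAL", label)], counts))

# Hidden subtotals after 1 and 2 stop them from being added in.
print(subtotals([1, ("HSUBTOTAL", "1label"), 2, ("HSUBTOTAL", "2label"),
                 3, 4, ("SUBTOTAL", label)], counts))

# Categories of interest first, then the subtotal.
print(subtotals([3, 4, ("SUBTOTAL", label), 1, 2], counts))
```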

HTH,

Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435

Humphrey Paulie

Changing the variables

Dear colleagues,

1. If I want to multiply or divide all the values of variables in my SPSS spreadsheet by a constant what should I do?

2. Is it mathematically OK to compute the mean of a set of correlation coefficients?

Thanks

Humphrey

Maguin, Eugene

Re: Changing the variables

Humphrey,

>>1. If I want to multiply or divide all the values of variables in my SPSS
spreadsheet by a constant what should I do?

I'll assume that you can use syntax. Suppose your variables are x1 to z27
and all are numeric.

Do repeat a=x1 to z27.
+ compute a=2*a.
End repeat.
Execute.

>>2. Is it mathematically OK to compute the mean of a set of correlation
coefficients?

Yes, why not.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bruce Weaver

Re: Changing the variables

Administrator

In reply to this post by Humphrey Paulie

Humphrey-6 wrote

Dear colleagues,
1. If I want to multiply or divide all the values of variables in my SPSS spreadsheet by a constant what should I do?
2. Is it mathematically OK to compute the mean of a set of correlation coefficients?
Thanks
Humphrey

Re number 2, it's hard to say without knowing a bit more about the situation. Why do you want to compute a mean? If you have the correlation between the same 2 variables in several studies, and are trying to come up with a pooled estimate, you'd be better off using standard meta-analytic techniques, which give a sort of weighted mean. That would also entail applying Fisher's r-to-z transformation before computing the pooled estimate, and then transforming back.
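
As a rough illustration of the transform, pool, back-transform idea, here is a minimal Python sketch of a fixed-effect pooled correlation. The function name and the sample (r, n) values are hypothetical, and the n - 3 weights are the usual inverse-variance weights for Fisher's z (an assumption about which weighting is wanted); a real meta-analysis would also look at heterogeneity and confidence intervals.

```python
import math


def pooled_correlation(studies):
    """Pool correlations via Fisher's r-to-z transformation.

    studies: iterable of (r, n) pairs, one per study, with n > 3.
    Each r is transformed to z = atanh(r), the z values are averaged
    with weights n - 3, and the mean is transformed back with tanh.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r, n in studies:
        z = math.atanh(r)   # Fisher's r-to-z
        w = n - 3           # inverse of the sampling variance of z
        weighted_sum += w * z
        weight_total += w
    return math.tanh(weighted_sum / weight_total)  # back to r


# Hypothetical correlations between the same two variables in three studies.
studies = [(0.30, 40), (0.50, 120), (0.45, 60)]
print(round(pooled_correlation(studies), 3))
```

The result differs a little from the simple arithmetic mean of the r values, both because of the weighting and because the averaging is done on the z scale.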

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Roberts, Michael

Reading NCPDP formatted data into SPSS

Hi list,

Just wanted to know whether anyone on this list has tried reading any NCPDP (National Council for Prescription Drug Programs) formatted data files into SPSS. If you have, you know the file is loaded with non-printing characters, non-printing delimiters, and sundry other booby traps. If anyone had some experience, I would appreciate any helpful pointers you may care to suggest.

TIA

Mike

Ruben Geert van den Berg

Re: Changing the variables

In reply to this post by Bruce Weaver

Dear Humphrey,

I think it can be somewhat tricky to compute means over correlations. For instance, if there are two groups of people in your data, then the mean correlation between v1 and v2 (calculated from each group separately) can be extremely different from the correlation between v1 and v2 disregarding group membership. For a classic example, try:

input program.
loop id=1 to 20.
do if id le 10.
compute group=1.
compute v1=rv.normal(10,2).
compute v2=rv.normal(10,2).
else.
compute group=2.
compute v1=rv.normal(20,2).
compute v2=rv.normal(20,2).
end if.
end case.
end loop.
end file.
end input program.

*Correlation regardless of group probably very high.

correlations v1 v2.

*Correlations for each group separately probably close to zero.

sort cases by group.
split file layered by group.
correlations v1 v2.
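
For readers without SPSS at hand, roughly the same experiment can be sketched in plain Python. The function names, the seed, and the larger group size (500 per group rather than 10, chosen so that the within-group correlations are reliably near zero) are assumptions made for this illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_per_group=500, seed=1):
    """Two groups with independent v1, v2 but different group means."""
    rng = random.Random(seed)
    groups = {}
    for group, mean in ((1, 10), (2, 20)):
        v1 = [rng.gauss(mean, 2) for _ in range(n_per_group)]
        v2 = [rng.gauss(mean, 2) for _ in range(n_per_group)]
        groups[group] = (v1, v2)

    # Correlation ignoring group membership.
    all_v1 = groups[1][0] + groups[2][0]
    all_v2 = groups[1][1] + groups[2][1]
    overall = pearson(all_v1, all_v2)

    # Correlation within each group separately.
    within = {g: pearson(v1, v2) for g, (v1, v2) in groups.items()}
    return overall, within


overall, within = simulate()
print("overall:", round(overall, 3))
print("within groups:", {g: round(r, 3) for g, r in within.items()})
```

The overall correlation comes out high purely because the two group means differ, while the within-group values (and hence their mean) stay close to zero.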

Kind regards,

Ruben

Roberts, Michael

Re: Reading NCPDP formatted data into SPSS

In reply to this post by Roberts, Michael

Hi Albert-Jan,

Thank you for the suggestion; I did change the setting to UNICODE=ON, but it was not much help. I ended up changing the various characters in a hex editor (changing the hex values to ones that correspond to printable ASCII characters), saving the file as text, and reading it into SPSS. My process has four steps altogether (I would like a more elegant method, however!) for reading in the NCPDP data file:

1. Using a hex editor, change the control (non-printing) characters to two distinct printable ASCII ones, and save the result as an ASCII file. The result is a long record with 1 case per record, but all variables are identifiable by the second distinct separator;

2. Read in the text file using the record identifier to separate the records/cases as defined in the hex editor;

3. Save the file as an ASCII file;

4. Read in the file saved in step 3, using variable separators to identify the different variables.

I know this is a clunky way to read in these data, but I can’t think of a better way to do it! There are about 6 non-printing characters identifying the various elements of a record in this type (NCPDP) of data file: ^B, ^C, ^], ^\, ^^, and Blank.

We will be working with potentially hundreds of these files, so any timesaving tips would be very much appreciated!

TIA

Mike

From: Albert-Jan Roskam [mailto:[hidden email]]
Sent: Saturday, December 12, 2009 4:26 PM
To: Roberts, Michael
Subject: Re: [SPSSX-L] Reading NCPDP formatted data into SPSS

Hi,

Did you try running the command SET UNICODE = ON before opening the file? That might help deal with funny characters.

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the face of ambiguity, refuse the temptation to guess.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Jon K Peck

Re: Reading NCPDP formatted data into SPSS

There is no need to suffer all that pain. If you know what the codes are for the non-printing characters, you can simply replace or delete them after reading in the data.

The only tricky part is how you represent these codes in SPSS functions such as REPLACE.
Here is code that would replace CR and LF with a blank.

compute strvar = replace(strvar, string(10, pib1),' ').
compute strvar = replace(strvar, string(13, pib1),' ').

This could also be handled, of course, with Python string functions.
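
A minimal Python sketch of the same replacement, using only the standard str.translate method. The function name blank_out and the default code list are illustrative choices, not part of any SPSS module.

```python
def blank_out(text, codes=(10, 13)):
    """Replace each character whose code is in `codes` with a blank.

    The defaults 10 and 13 are LF and CR, matching the two REPLACE
    calls above; pass other decimal codes for other control characters.
    """
    table = {code: " " for code in codes}
    return text.translate(table)


print(repr(blank_out("first line\r\nsecond line")))
```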

HTH,

Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435

Roberts, Michael

Re: Reading NCPDP formatted data into SPSS

Jon,

Thank you for the tip. My difficulty was in first reading the data into SPSS; the file is one long series of data (a single line), with different hex codes used as delimiters; oddly enough, neither CR nor LF is among them. The following definitions give an idea of what the file layout is like:

Ctrl B (^B) indicates “start of text”, which actually marks the start of a case/record.

Ctrl C (^C) indicates “end of text” – the end of the record.

Ctrl \ (^\) is a “file separator”, which really separates the variable names or headers/labels.

Ctrl ] (^]) separates a group of data – it is the “group separator”.

Ctrl ^ (^^) is the “record separator” or block-mode terminator.

Blank (20 in hex) is just blank space.

I was able to work out the above by looking at a smallish file in UltraEdit, cross-referencing the hex values with an ASCII table, and consulting some online documentation of the NCPDP data format. I then substituted pipe (‘|’) delimiters for ‘^B’ (start of text) to mark the start of each record, and blank spaces for the end of each record. By the way, and I am sure you already know this, with the exception of the blank, all the non-printing characters appear as little square boxes in text mode. Only in hex mode can I see the actual values!

I also used double quotes (") for the ‘^\’ (header separators), and blank spaces for the other separator characters. This allowed me to read in the data so that it looked like I had cases, and I saved it as a text file. Then I read the data in a second time, knowing that I could use the double quote as the delimiter for the variables. There is probably a way to combine both steps into one that you, or someone else on this list, knows of, but for the life of me, I cannot do it :-)
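
One way the two passes might be combined is to split the single-line stream directly on the control characters in Python. The sketch below is only a guess at a workable approach: it relies solely on the delimiter roles described in this post (start of text, end of text, file separator), treats every file separator as a field boundary, ignores the group and record separators, and uses made-up sample data rather than a real NCPDP file.

```python
STX = "\x02"  # Ctrl B, start of a record
ETX = "\x03"  # Ctrl C, end of a record
FS = "\x1c"   # Ctrl \, treated here as the field separator (assumption)


def parse_stream(raw):
    """Split one long delimited string into a list of field lists."""
    records = []
    # Everything before the first STX is ignored.
    for chunk in raw.split(STX)[1:]:
        body = chunk.split(ETX, 1)[0]  # drop anything after end-of-text
        fields = [f.strip() for f in body.split(FS) if f.strip()]
        records.append(fields)
    return records


# Hypothetical two-record stream on a single line.
raw = "\x02A1\x1cB2\x1cC3\x03\x02D4\x1cE5\x03"
for fields in parse_stream(raw):
    print(fields)
```

Each inner list could then be written out as one delimited line for GET DATA, or passed to SPSS through the Python plug-in.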

Thanking You

Sincerely

Mike

Steve Simon, P.Mean Consulting

Feedback wanted on SPSS videos

In reply to this post by Jon K Peck

I am using a trial version of Camtasia Studio to develop some tutorial
videos about PASW/SPSS. This would be part of a series of seminars
and/or handouts on "Jumpstart Statistics" which is my title for material
intended to help people whose research projects are stalled because they
don't know what the next step would be.

I'm not sure if the videos are worth all the time and trouble, and I'd
like some feedback from members of this list. Please send the feedback
directly to me ([hidden email]) rather than to the list.

The videos I developed show how to enter and analyze data from a two by
two table in PASW/SPSS. They can be found at:

http://www.pmean.com/09/TwoByTwo.html

If these work well, I might develop additional videos on importing data,
basic data entry and documentation, and manipulation of dates. I'm also
thinking about some simple examples of descriptive graphics and tabular
summaries (boxplots, scatterplots, and crosstabulations) and maybe a
simple introduction to regression models.

The audience for these videos would be beginning researchers about to
embark on their first research project.

The videos about two by two tables are still a bit rough around the
edges, but I would appreciate some feedback from you. In particular, do
the videos add value compared to the text pages that have static screen
shots and no audio?
---
Steve Simon, Standard Disclaimer
Two free webinars coming soon!
"What do all these numbers mean? Odds ratios,
relative risks, and number needed to treat"
Thursday, December 17, 2009, 11am-noon, CST.
"The first three steps in a descriptive
data analysis, with examples in PASW/SPSS"
Thursday, January 21, 2010, 11am-noon, CST.
Details at www.pmean.com/webinars

Albert-Jan Roskam

Re: Reading NCPDP formatted data into SPSS

In reply to this post by Roberts, Michael

Hi Mike,

I was doing something similar the other day. Basically, you want to convert certain ASCII characters to other ASCII characters, right? For instance "end of record" to "newline". Doesn't the NCPDP organization have some utility for this? I'd try something like the following (success not guaranteed ;-). It's up to you what the translation table looks like:

BEGIN PROGRAM PYTHON3.

import random

def create_testfile(path="d:/temp/ncpdp.txt"):
    # Write 10**5 random ASCII bytes (codes 0-127) as stand-in test data.
    with open(path, "wb") as f:
        f.write(bytes(random.randint(0, 127) for _ in range(10**5)))

create_testfile()

def ditch_punctuation(infile, outfile):
    # Specify decimal codes as 'from: to'; check e.g. www.asciitable.com
    translation_table = {1: 10, 2: 10, 30: 10, 32: 9}
    # Build a byte-level translation table (Python 3 bytes API).
    trans = bytes.maketrans(bytes(translation_table.keys()),
                            bytes(translation_table.values()))
    # Binary mode keeps the control characters intact while reading.
    with open(infile, "rb") as f_in, open(outfile, "wb") as f_out:
        for line in f_in:
            f_out.write(line.translate(trans))

ditch_punctuation(infile="d:/temp/ncpdp.txt", outfile="d:/temp/ncpdp_converted.txt")
END PROGRAM.

Roberts, Michael

Re: Reading NCPDP formatted data into SPSS

Jon, Albert-Jan,

Thank you for the input.

Jon: I am not sure how I would go about doing everything in one step, since after reading in the data I have only one long string, now in .sav format(?). Perhaps, now that I am thinking about it some more, I should simply ignore the second delimiter I created in the hex editor to separate the records(?). I am actually interested in the INPUT PROGRAM idea, since it looks like this will be an ongoing project for some time until we can automate a few things (a long time, I think!). Any pointers from you would be very much appreciated.

Albert-Jan: The NCPDP standard is pretty comprehensive; however, its implementation seems to vary depending upon who is doing the data collection, etc. As for a utility to read in the data, I was unable to find anything other than several commercial vendors offering a sort of comprehensive solution, which we don’t really need. Also, thank you for the program code – I will give it a shot shortly :-)

Thanking You

Sincerely

Mike

Jon K Peck

Re: Reading NCPDP formatted data into SPSS

Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435

From:	"Roberts, Michael" <[hidden email]>
To:	"[hidden email]" <[hidden email]>, Jon K Peck/Chicago/IBM@IBMUS
Cc:	"[hidden email]" <[hidden email]>
Date:	12/15/2009 12:26 PM
Subject:	RE: [SPSSX-L] Reading NCPDP formatted data into SPSS

Jon, Albert-Jan,

Thank you for the input.

>>>What I was thinking of was something along the lines of Albert-Jan's code. That would be a little preprocessor for the data that you would run before reading it into SPSS. Since you would already have Python installed, it could be run from a small batch (.bat) file. That's probably easier than an INPUT PROGRAM, but an INPUT PROGRAM can read records and build cases out of them based on whatever logic you need.
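
Since there may be hundreds of these files, such a preprocessor could loop over a whole folder. The following is a minimal sketch under stated assumptions: the convert_folder name, the folder arguments and the "*.txt" pattern are hypothetical, and the translation table simply reuses the example codes from Albert-Jan's post (1, 2 and 30 to newline, 32 to tab), which would need adjusting to the real delimiters.

```python
from pathlib import Path

# Example mapping only: decimal codes 1, 2, 30 -> newline (10), 32 -> tab (9).
TRANS = bytes.maketrans(bytes([1, 2, 30, 32]), bytes([10, 10, 10, 9]))


def convert_folder(in_dir, out_dir, pattern="*.txt"):
    """Translate every matching file in in_dir into out_dir.

    Returns the list of files written. Files are processed as raw
    bytes so that control characters are not altered on the way in.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob(pattern)):
        dst = out_dir / src.name
        dst.write_bytes(src.read_bytes().translate(TRANS))
        written.append(dst)
    return written
```

A one-line batch file could then call a script that runs convert_folder on the incoming folder, after which the converted files can be read with ordinary delimited-text syntax.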

Regards,

Jon
