|
I was hoping you could provide me with some advice. I saw that Python is an open source programming language intertwined with SPSS:
· Do you call Python code from within SPSS or do you call SPSS functions from within Python?
· If both are possible, then which is optimal?
· Is it hard to integrate?
· Can Python code be used to create additional SPSS datasets?
Regarding Python, does it do the following:
· Read multiple datasets one record at a time and compare values from each; then, based on if-then logic, write to multiple output files
· Load a lookup table and then process a different file; based on if-then logic, access and look up values in the table
· Support modular "gosub" programming
· Sort files
· Date math and conversions
· Would it be able to support the following type of logic:
  o Start
  o Read Record from File 1
  o Read Record from File 2
  o Match Logic
    § If Key 1 < Key 2, Write to output file A
    § If Key 1 = Key 2, Write to output file B
    § If Key 1 > Key 2, Write to output file C
· Is there a good syntax data editor such as that supported in SAS Enterprise Guide?
Can I just use R to do all the above?
Any help would be greatly appreciated. Regards, John
John Filben |
|
First, I suggest that you download the Data Management book that is linked on SPSS Developer Central (www.spss.com/devcentral) for a detailed view of how to use programmability. And/or install the Python programmability plug-in for your SPSS version. That includes all the technical documentation. Other answers below. Jon Peck SPSS, an IBM Company [hidden email] 312-651-3435
I was hoping you could provide me with some advice. I saw that Python is an open source programming language intertwined with SPSS:
· Do you call Python code from within SPSS or do you call SPSS functions from within Python?
· If both are possible, then which is optimal?
· Is it hard to integrate?
· Can Python code be used to create additional SPSS datasets?
>>>You can use Python in your regular syntax stream (internal mode) or invoke SPSS from a Python program (external mode). Either way, you can run SPSS commands from your Python code and call the APIs that work with SPSS. If you are familiar with Python, using it with SPSS is very simple. Python code can create new datasets or modify existing ones, both by issuing regular SPSS commands built with Python code and through specific APIs that can do these tasks.
Regarding Python, does it do the following:
· Read multiple datasets one record at a time and compare values from each; then, based on if-then logic, write to multiple output files
>>>You can certainly do this (V16 or later). The SPSSINC COMPARE DATASETS extension command, available from Developer Central, can serve as an example of how to do this.
· Load a lookup table and then process a different file; based on if-then logic, access and look up values in the table
>>>General Python code can certainly do this. There is also a lookup function in the extendedTransforms module.
· Support modular "gosub" programming
>>>If you mean functions/subroutines/classes, certainly yes.
· Sort files
>>>Although Python has sorting functions, you would sort by issuing appropriate SORT commands to SPSS.
· Date math and conversions
>>>Yes, both through native Python code and various functions we provide to convert between SPSS and Python date/time representations.
· Would it be able to support the following type of logic:
  o Start
  o Read Record from File 1
  o Read Record from File 2
  o Match Logic
    § If Key 1 < Key 2, Write to output file A
    § If Key 1 = Key 2, Write to output file B
    § If Key 1 > Key 2, Write to output file C
>>>Certainly.
· Is there a good syntax data editor such as that supported in SAS Enterprise Guide?
>>>Besides the standard SPSS syntax and data editors, there are many IDEs available for Python.
Can I just use R to do all the above?
>>>There is also an R plug-in available from Developer Central for using R within SPSS. But for controlling and interacting with SPSS, the Python plug-in is a better approach.
Any help would be greatly appreciated. Regards, John
John Filben
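For readers who want to see the Key 1 / Key 2 match logic from the question in plain Python, here is a minimal sketch. It is not SPSS-specific and uses no SPSS APIs. It assumes both inputs are already sorted on the key, that the key is the first comma-separated field, and that keys compare correctly as strings; the function name `match_merge` and the choice of which record goes to each output are illustrative assumptions, since the question does not say which record is written.

```python
import io


def match_merge(file1, file2, out_a, out_b, out_c,
                key=lambda line: line.split(",")[0]):
    """Walk two key-sorted files in step and route records by key comparison.

    Assumption: the record from file 1 is written to A and B, and the
    record from file 2 is written to C.
    """
    rec1 = file1.readline()
    rec2 = file2.readline()
    while rec1 and rec2:
        key1, key2 = key(rec1), key(rec2)
        if key1 < key2:
            out_a.write(rec1)          # Key 1 < Key 2
            rec1 = file1.readline()
        elif key1 == key2:
            out_b.write(rec1)          # Key 1 = Key 2
            rec1 = file1.readline()
            rec2 = file2.readline()
        else:
            out_c.write(rec2)          # Key 1 > Key 2
            rec2 = file2.readline()
    # One input is exhausted; drain whatever is left in the other.
    while rec1:
        out_a.write(rec1)
        rec1 = file1.readline()
    while rec2:
        out_c.write(rec2)
        rec2 = file2.readline()


# Small in-memory demonstration with made-up records.
file1 = io.StringIO("001,a\n002,b\n004,d\n")
file2 = io.StringIO("002,x\n003,y\n004,z\n005,w\n")
out_a, out_b, out_c = io.StringIO(), io.StringIO(), io.StringIO()
match_merge(file1, file2, out_a, out_b, out_c)
print(out_a.getvalue(), out_b.getvalue(), out_c.getvalue(), sep="---\n")
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(...)`. Reading one record at a time keeps memory use flat regardless of file size.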
|
|
Dear List,
I am struggling to format a table in a very precise way using CTABLES. The following syntax almost gives me what I need, but the SUBTOTAL gives me the overall total, not the total for two columns.
CTABLES
  /TABLE (q1 + q2 + q3)[COUNT ROWPCT]
  /CATEGORIES VARIABLES=q1 q2 q3 [1, 2, 3, 4, SUBTOTAL='Satisfied + Very Satisfied']
  /CLABELS ROWLABELS=OPPOSITE.
Thanks, John |
|
The SUBTOTAL keyword gives you the subtotal for all the categories preceding the SUBTOTAL keyword since the last subtotal. Since you have no preceding subtotal, it is adding up all the categories. It works this way because otherwise, unless you go to the trouble of specifying a custom title (as you did here), the subtotal would be very misleading.
The simplest solution would be to insert another subtotal after the first two categories. If you don't want to do that, you can use hiding subtotals in place of categories in order to stop the subtotal of interest from adding up the others. For the second strategy, you would change this
[1, 2, 3, 4, SUBTOTAL='Satisfied + Very Satisfied']
to this
[1, HSUBTOTAL='1label', 2, HSUBTOTAL='2label', 3, 4, SUBTOTAL='Satisfied + Very Satisfied']
where the HSUBTOTAL labels are whatever identifiers you want to have appear for those categories.
Another strategy would be to put the categories of interest first in the table:
[3, 4, SUBTOTAL='Satisfied + Very Satisfied', 1, 2]
or use a subtotal above:
[1, 2, SUBTOTAL='Satisfied + Very Satisfied', 3, 4] POSITION=BEFORE
HTH,
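To make the "since the last subtotal" rule concrete, here is a toy Python model of it. This is only an illustration of the accumulation rule described above, not CTABLES itself: the function name `subtotals`, the tuple encoding of the category list, and the counts are all made up for the example, and only the default position (subtotal after its categories) is modelled.

```python
def subtotals(spec, counts):
    """Return {label: sum} for each subtotal in a category spec.

    spec   -- list of category values and (kind, label) tuples, where kind
              is "SUBTOTAL" or "HSUBTOTAL"
    counts -- dict mapping category value to its (made-up) count
    """
    results = {}
    running = 0
    for item in spec:
        if isinstance(item, tuple):
            _kind, label = item
            results[label] = running   # everything since the last subtotal
            running = 0                # the next subtotal starts from here
        else:
            running += counts[item]
    return results


counts = {1: 5, 2: 10, 3: 20, 4: 15}   # hypothetical cell counts

# Original specification: a single subtotal sums all four categories.
print(subtotals([1, 2, 3, 4, ("SUBTOTAL", "Sat + Very Sat")], counts))

# Hiding subtotals after 1 and 2 reset the running sum, so the final
# subtotal covers only categories 3 and 4.
print(subtotals([1, ("HSUBTOTAL", "1label"),
                 2, ("HSUBTOTAL", "2label"),
                 3, 4, ("SUBTOTAL", "Sat + Very Sat")], counts))
```

The second call shows why the hiding-subtotal trick works: each HSUBTOTAL consumes the category before it, leaving only 3 and 4 for the final SUBTOTAL.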
|
|
|
Humphrey,
>>1. If I want to multiply or divide all the values of variables in my SPSS spreadsheet by a constant what should I do?
I'll assume that you can use syntax. Suppose your variables are x1 to z27 and all are numeric.
Do repeat a=x1 to z27.
+ compute a=2*a.
End repeat.
Execute.
>>2. Is it mathematically OK to compute the mean of a set of coefficients of correlation?
Yes, why not.
Gene Maguin
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Administrator
|
In reply to this post by Humphrey Paulie
Re number 2, it's hard to say without knowing a bit more about the situation. Why do you want to compute a mean? If you have the correlation between the same 2 variables in several studies, and are trying to come up with a pooled estimate, you'd be better off using standard meta-analytic techniques, which give a sort of weighted mean. That would also entail applying Fisher's r-to-z transformation before computing the pooled estimate, and then transforming back.
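As a rough illustration of the pooling described above, here is a short Python sketch. It assumes a simple fixed-effect approach in which each correlation is transformed with Fisher's z = atanh(r), weighted by n - 3 (the inverse of the approximate variance of z), averaged, and transformed back with tanh. The weighting scheme and the function name `pooled_correlation` are assumptions for the example; a real meta-analysis may call for a random-effects model instead.

```python
import math


def pooled_correlation(rs, ns):
    """Fixed-effect pooled correlation via Fisher's r-to-z transformation.

    rs -- correlations between the same two variables, one per study
    ns -- sample sizes of those studies (each must exceed 3)
    """
    zs = [math.atanh(r) for r in rs]       # r -> z
    weights = [n - 3 for n in ns]          # var(z) is approximately 1 / (n - 3)
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)                # z -> r


# Hypothetical studies: compare the pooled estimate with the plain mean.
rs = [0.2, 0.6]
ns = [30, 30]
print(pooled_correlation(rs, ns), sum(rs) / len(rs))
```

Even with equal sample sizes the pooled value differs from the arithmetic mean of the r values, because the averaging happens on the z scale.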
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
Hi list,
Just wanted to know whether anyone on this list has tried reading any NCPDP (National Council for Prescription Drug Programs) formatted data files into SPSS. If you have, you know the file is loaded with non-printing characters, non-printing delimiters, and sundry other booby traps. If anyone has had some experience, I would appreciate any helpful pointers you may care to suggest. TIA Mike |
|
In reply to this post by Bruce Weaver
Dear Humphrey,
I think it can be somewhat tricky to compute means over correlations. For instance, if there are two groups of people in your data, then the mean correlation between v1 and v2 (calculated from each group separately) can be extremely different from the correlation between v1 and v2 disregarding group membership. For a classic example, try:
INPUT PROGRAM.
LOOP id=1 TO 20.
DO IF id LE 10.
COMPUTE group=1.
COMPUTE v1=RV.NORMAL(10,2).
COMPUTE v2=RV.NORMAL(10,2).
ELSE.
COMPUTE group=2.
COMPUTE v1=RV.NORMAL(20,2).
COMPUTE v2=RV.NORMAL(20,2).
END IF.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
*Correlation regardless of group probably very high.
CORRELATIONS v1 v2.
*Correlations for each group separately probably close to zero.
SORT CASES BY group.
SPLIT FILE LAYERED BY group.
CORRELATIONS v1 v2.
Kind regards,
Ruben
> Date: Fri, 11 Dec 2009 07:47:35 -0800
> From: [hidden email]
> Subject: Re: Changing the variables
> To: [hidden email]
>
> Humphrey-6 wrote:
> >
> > Dear colleagues,
> > 1. If I want to multiply or divide all the values of variables in my SPSS
> > spreadsheet by a constant what should I do?
> > 2. Is it mathematically OK to compute the mean of a set of coefficients of
> > correlation?
> > Thanks
> > Humphrey |
|
In reply to this post by Roberts, Michael
Hi Albert-Jan,
Thank you for the suggestion; I did change the setting to Unicode=on, but it was not much help. I ended up changing the various characters in a hex editor (changed the hex values to correspond to ASCII printing characters), saved the file as text, and read it into SPSS. Using my process there are altogether four steps (I would like a more elegant method, however!) to reading in the NCPDP data file:
1. Using a hex editor, change the control characters (non-printing) to two distinct ASCII printing ones, and save it as an ASCII file. The result is a long record with 1 case per record, but all variables are identifiable by the second distinct separator;
2. Read in the text file using the record identifier to separate the records/cases as defined in the hex editor;
3. Save the file as an ASCII file;
4. Read in the file saved in step 3, using variable separators to identify the different variables.
I know this is a clunky way to read in these data, but I can't think of a better way to do it! There are about 6 non-printing characters identifying the various elements of a record in this type (NCPDP) of data file: ^B, ^C, ^], ^\, ^^, and Blank.
We will be working with potentially hundreds of these files, so any timesaving tips would be very much appreciated!
TIA
Mike
From: Albert-Jan Roskam [mailto:[hidden email]]
|
|
There is no need to suffer all that pain. If you know what the codes are for the non-printing characters, you can simply replace or delete them after reading in the data. The only tricky part is how you represent these codes in SPSS functions such as REPLACE. Here is code that would replace CR and LF with a blank.
compute strvar = replace(strvar, string(10, pib1),' ').
compute strvar = replace(strvar, string(13, pib1),' ').
This could also be handled, of course, by Python character functions.
HTH,
Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435
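As a rough idea of what the Python route might look like, here is a minimal sketch using only built-in string methods (no SPSS APIs). It mirrors the two REPLACE calls above by turning CR and LF into blanks. The second table extends the same idea to the control codes mentioned in this thread (^B, ^C, ^\, ^], ^^); mapping them all to a blank is purely an illustrative choice, and the names `CRLF_TO_BLANK`, `NCPDP_CONTROLS_TO_BLANK` and `replace_controls` are made up for the example.

```python
# Translation table equivalent to the two REPLACE calls: CR and LF -> blank.
CRLF_TO_BLANK = str.maketrans({"\r": " ", "\n": " "})

# Hypothetical extension: the ASCII control codes named in this thread.
# \x02 = ^B, \x03 = ^C, \x1c = ^\, \x1d = ^], \x1e = ^^
NCPDP_CONTROLS_TO_BLANK = str.maketrans(
    {ch: " " for ch in "\x02\x03\x1c\x1d\x1e"}
)


def replace_controls(text, table=CRLF_TO_BLANK):
    """Return text with every character in the table replaced in one pass."""
    return text.translate(table)


print(repr(replace_controls("line one\r\nline two")))
print(repr(replace_controls("\x02A\x1cB\x03", NCPDP_CONTROLS_TO_BLANK)))
```

A single `translate` call handles all the listed characters at once, so adding another code means adding one entry to the table rather than another replace step.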
|
|
Jon,
Thank you for the tip. My difficulty was in first reading the data into SPSS; the file is one long series of data (a single line), with different hex codes used as delimiters; oddly enough, CR and LF are not among those used. The following definitions give an idea of what the file layout is like:
Ctrl B (^B) indicates "start of text", which actually indicates the start of a case/record.
Ctrl C (^C) indicates "end of text", the end of the record.
Ctrl \ (^\) is a "file separator", which really separates the variable names or headers/labels.
Ctrl ] (^], hex 1D) separates a group of data; it is the "group separator".
Ctrl ^ (^^, hex 1E) is the record separator or block-mode terminator.
Blank (20 in hex) is just blank space.
I was able to gather the above by looking at a smallish file in UltraEdit and cross-referencing the hex values with ASCII, together with some online documentation of the NCPDP data formatting. I substituted pipe ('|') delimiters for '^B' (start of text) to mark the start of each record, and a blank space for the end of each record. By the way, and I am sure you already know this, with the exception of the blank value, all the other non-printing characters appear as little square boxes in text mode. Only in hex can I see the actual value!
I also substituted double quotes (") for '^\' (the header separators), and blank spaces for the other separator characters. This allowed me to read in the data so that it looked like I had cases, and I saved it as a text file. Then I read the data in a second time, knowing that I could use the double quote as the delimiter for the variables. There probably is a way to combine both steps into one that you, or someone else on this list, knows of, but for the life of me, I cannot do it :-)
Thanking You Sincerely
Mike
From: Jon K Peck [mailto:[hidden email]]
|
|
In reply to this post by Jon K Peck
I am using a trial version of Camtasia Studio to develop some tutorial videos about PASW/SPSS. This would be part of a series of seminars and/or handouts on "Jumpstart Statistics", which is my title for material intended to help people whose research projects are stalled because they don't know what the next step would be. I'm not sure if the videos are worth all the time and trouble, and I'd like some feedback from members of this list. Please send the feedback directly to me ([hidden email]) rather than to the list.
The videos I developed show how to enter and analyze data from a two by two table in PASW/SPSS. They can be found at: http://www.pmean.com/09/TwoByTwo.html
If these work well, I might develop additional videos on importing data, basic data entry and documentation, and manipulation of dates. I'm also thinking about some simple examples of descriptive graphics and tabular summaries (boxplots, scatterplots, and crosstabulations) and maybe a simple introduction to regression models. The audience for these videos would be beginning researchers about to embark on their first research project.
The videos about two by two tables are still a bit rough around the edges, but I would appreciate some feedback from you. In particular, do the videos add value compared to the text pages that have static screen shots and no audio?
---
Steve Simon, Standard Disclaimer
Two free webinars coming soon! "What do all these numbers mean? Odds ratios, relative risks, and number needed to treat" Thursday, December 17, 2009, 11am-noon, CST. "The first three steps in a descriptive data analysis, with examples in PASW/SPSS" Thursday, January 21, 2010, 11am-noon, CST. Details at www.pmean.com/webinars |
|
In reply to this post by Roberts, Michael
|
|
Jon, Albert-Jan,
Thank you for the input.
Jon: Not sure how I would go about doing everything in one step, since by reading in the data I have only 1 long string, now in .sav format(?) ... perhaps, now that I am thinking about it some more, I should simply ignore the second delimiter I created in hex to separate the record(?). I am actually interested in the INPUT PROGRAM idea, since it looks like this will be an ongoing project for some time until we can automate a few things (a long time, I think!). Any pointers from you would be very much appreciated.
Albert-Jan: The NCPDP standard is pretty comprehensive; however, its implementation seems to vary depending upon who is doing the data collection, etc. As far as a utility to read in the data, I was unable to find anything other than several commercial vendors offering a sort of comprehensive solution, which we don't really need. Also, thank you for the program code. I will give it a shot shortly :-)
Thanking You Sincerely
Mike
From: Albert-Jan Roskam [mailto:[hidden email]]
|
|
Jon Peck SPSS, an IBM Company [hidden email] 312-651-3435
Jon: Not sure how I would go about doing everything in one step, since by reading in the data I have only 1 long string, now in .sav format(?) ... perhaps, now that I am thinking about it some more, I should simply ignore the second delimiter I created in hex to separate the record(?). I am actually interested in the INPUT PROGRAM idea, since it looks like this will be an ongoing project for some time until we can automate a few things (a long time, I think!). Any pointers from you would be very much appreciated.
>>>What I was thinking of was something along the lines of Albert-Jan's code. That would be a little preprocessor for the data that you would run before reading it into SPSS. Since you would already have Python installed, that could be a little bat file. That's probably easier than an INPUT PROGRAM, but INPUT PROGRAMs can read records and build cases out of them based on whatever logic you need.
Regards,
Jon
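Albert-Jan's code is not reproduced in this thread, so the following is only a guess at what such a preprocessor might look like, not his program. It is a minimal Python sketch that assumes each record is framed by ^B (hex 02) and ^C (hex 03) and that fields inside a record are separated by ^\ (hex 1C), as described earlier in the thread. Real NCPDP files vary by implementation, so the layout, the function names `iter_records` and `convert`, and the sample data are all hypothetical.

```python
import csv
import io

STX, ETX, FS = "\x02", "\x03", "\x1c"   # ^B, ^C, ^\ (assumed roles)


def iter_records(text):
    """Yield each STX...ETX record as a list of stripped fields."""
    for chunk in text.split(STX)[1:]:      # text before the first STX is ignored
        body = chunk.split(ETX, 1)[0]      # drop anything after the ETX
        yield [field.strip() for field in body.split(FS)]


def convert(text, out):
    """Write one CSV row per record to the file-like object out."""
    writer = csv.writer(out)
    for fields in iter_records(text):
        writer.writerow(fields)


# Made-up sample: two records of three fields each, all on one line.
raw = "\x02A1\x1cB2\x1cC3\x03\x02D4\x1cE5\x1cF6\x03"
out = io.StringIO()
convert(raw, out)
print(out.getvalue())
```

The resulting comma-delimited text could then be read into SPSS as an ordinary delimited file. For real data, the sample string would be replaced by the contents of a file, and the output written to a file opened with `newline=""` as the `csv` module documentation recommends; looping over a folder of files would cover the "hundreds of files" case.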
|
