Getting stuff into XML output


Getting stuff into XML output

Simon Palmer-4
Hello everyone,

I have been working with XML generated by the OMS with some success, but would like to exercise finer control. I have a couple of specific questions but would be interested in any clues about where to find this type of information generally:

1. How do I set the number of decimals? For example, suppose I wanted to change the number of decimals from 2 to 1:

<category text="Mean">
  <cell decimals="2" number="3.9141414141414" text="3.91" />
</category>

I guess this might be determined directly by what is generated by the procedures (in this case DESCRIPTIVES), but I don't seem to be able to influence that without resorting to manual editing of the pivot table.

2. Can I insert text of my own into the structure so that it can be read by my XML parser? An obvious place (to me at least) is the comments section of the notes (apparently empty in the following snippet):

<pivotTable subType="Notes" text="Notes">
  <dimension axis="row" text="Contents">
    <category text="Output Created">
      <cell date="2010-01-06T10:44:32.51" format="datetime" text="06-JAN-2010 10:44:32" />
    </category>
    <category text="Comments">
      <cell text="" />
    </category>

Apologies if this is all documented but I have not been able to uncover it. If it is documented I would be grateful for being pointed in the right direction. Failing that I would appreciate any advice about my specific issues.

Thanks in anticipation,
Simon

Re: Getting stuff into XML output

Jon K Peck

Interesting questions.  See below.
Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435



From: Simon Palmer <[hidden email]>
To: [hidden email]
Date: 01/07/2010 04:39 PM
Subject: [SPSSX-L] Getting stuff into XML output
Sent by: "SPSSX(r) Discussion" <[hidden email]>





Hello everyone,

I have been working with XML generated by the OMS with some success, but would like to exercise finer control. I have a couple of specific questions but would be interested in any clues about where to find this type of information generally:

1. How do I set the number of decimals? For example, suppose I wanted to change the number of decimals from 2 to 1:

<category text="Mean">
  <cell decimals="2" number="3.9141414141414" text="3.91" />
</category>

I guess this might be determined directly by what is generated by the procedures (in this case DESCRIPTIVES), but I don't seem to be able to influence that without resorting to manual editing of the pivot table.

>>>Number formats in output are determined in many cases by the format of the variables used. So for your example, changing the number of decimals in the variable's format (for instance with the FORMATS command) would affect the decimals in the output. However, when there isn't a direct relationship between a variable's format and the statistic, this won't help. But see below for another approach.


2. Can I insert text of my own into the structure so that it can be read by my XML parser? An obvious place (to me at least) is the comments section of the notes (apparently empty in the following snippet):

<pivotTable subType="Notes" text="Notes">
  <dimension axis="row" text="Contents">
    <category text="Output Created">
      <cell date="2010-01-06T10:44:32.51" format="datetime" text="06-JAN-2010 10:44:32" />
    </category>
    <category text="Comments">
      <cell text="" />
    </category>

>>>You can't do this directly, but there are two ways to insert comments in the output. The COMMENT command (also written as *) writes text to the log block, and this can be included in your OMS output.
A better way, though, so that you don't get tangled up in all the other stuff in the log, is to use the TEXT extension command. This command, downloadable from SPSS Developer Central (www.spss.com/devcentral), creates a text block object that can even have HTML or RTF formatting in it. It's quite useful for annotating output.

Here's an example:
The syntax
TEXT "This comment was generated by the TEXT extension command."
/OUTLINE HEADING="Comment"
TITLE="Comment".

resulted in this capture with OMS.

        <command command="Comment" displayOutlineValues="label" displayOutlineVariables="label" ...
                <textBlock text="Comment">
                        <line>This comment was generated by the TEXT extension command.</line>
                </textBlock>
        </command>
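
As a rough sketch of how a downstream parser might pick that comment up, the following uses Python's standard xml.etree.ElementTree module on a simplified copy of the capture above. The attributes elided with "..." in the capture are simply left out, no XML namespace is assumed, and the helper name text_block_lines is made up for illustration; a real OMS file may need adjustments on all three counts.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the OMS capture quoted above. The real <command>
# element carries more attributes (elided in the post), and real OMS files
# may declare an XML namespace; both are left out of this sketch.
captured = """
<command command="Comment">
    <textBlock text="Comment">
        <line>This comment was generated by the TEXT extension command.</line>
    </textBlock>
</command>
"""


def text_block_lines(xml_text, title):
    """Return the <line> contents of every textBlock whose text attribute equals title."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for block in root.iter("textBlock"):
        if block.get("text") == title:
            # An empty <line/> has text None, so fall back to an empty string.
            lines.extend(line.text or "" for line in block.findall("line"))
    return lines


print(text_block_lines(captured, "Comment"))
```

Matching on the text attribute works here because the TITLE given to TEXT ends up as the textBlock's text in the capture shown above.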

If you want to do more with the XML within your SPSS session, you can have OMS write the XML to the XML workspace and then use Python programmability, starting with spss.EvaluateXPath, to retrieve and manipulate it before writing it to a file for downstream processing.

Python has excellent facilities for working with XML. I'd suggest ElementTree, which is part of the standard Python library, as the place to start if you want to go this way.
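
For the decimals question, one possible post-processing step with ElementTree is sketched below. It assumes, as in the snippet from the original question, that each numeric cell carries the full-precision value in its number attribute, and it rewrites text and decimals from that. The helper name set_cell_decimals is hypothetical, and the sketch ignores currency or percent formatting, which would be lost.

```python
import xml.etree.ElementTree as ET

# The cell from the original question, used here as test input.
snippet = """
<category text="Mean">
  <cell decimals="2" number="3.9141414141414" text="3.91" />
</category>
"""


def set_cell_decimals(root, decimals):
    """Rebuild text and decimals of every numeric cell from its number attribute."""
    for cell in root.iter("cell"):
        number = cell.get("number")
        if number is None:
            # Text-only or date cells have no number attribute; leave them alone.
            continue
        cell.set("text", "%.*f" % (decimals, float(number)))
        cell.set("decimals", str(decimals))
    return root


root = set_cell_decimals(ET.fromstring(snippet.strip()), 1)
cell = root.find("cell")
print(cell.get("text"), cell.get("decimals"))
```

The number attribute is left untouched, so the full-precision value stays available to whatever reads the file next.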






Re: Getting stuff into XML output

Albert-Jan Roskam
Interesting indeed. But when I ran one related example from the SPSS data management book (for v17, which is the version I'm using), I consistently got an error related to StartPython.exe that makes SPSS crash.

*python_retrieve_output_value.sps.
BEGIN PROGRAM.
import spss, spssaux
# Open the sample data file.
spss.Submit("GET FILE='c:/program files/spss/Employee data.sav'.")
# Run DESCRIPTIVES and capture its output as XML in the XML workspace.
cmd="DESCRIPTIVES VARIABLES=salary,salbegin,jobtime,prevexp."
desc_table,errcode=spssaux.CreateXMLOutput(cmd, omsid="Descriptives")
# Retrieve the formatted mean of Current Salary from the captured table.
meansal=spssaux.GetValuesFromXMLWorkspace(desc_table,
  tableSubtype="Descriptive Statistics", rowCategory="Current Salary",
  colCategory="Mean", cellAttrib="text")
if meansal:
  print "The mean salary is: ", meansal[0]
END PROGRAM.

Two notes: (1) Python 2.5 is in c:/program files/PyGTK/Python (not the default location). (2) I sometimes get errors related to explorer.exe on my PC. Yes, I need to reinstall the ghost image, and no, it's not much work ;-)

Any idea why this program is not working? I suspect this also explains why an extension command that I was recently making kept crashing.

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the face of ambiguity, refuse the temptation to guess.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In reply to Jon K Peck's message of Friday, January 8, 2010, 2:09 AM.

Re: Getting stuff into XML output

Jon K Peck

I don't get any misbehavior running this code in 17.0.2. (And I think your extension command problems were as I diagnosed on DevCentral.)

The mean salary is:  $34,419.57


In reply to Albert-Jan Roskam's message of 01/08/2010 04:28 AM.

Help with errors

Dean Tindall

Hey Guys,

I have a slight problem with a data file and am coming up against a brick wall. Your help will be most appreciated!

I received a data file with some errors in one of the variables. I have now also been sent an Excel file with corrections, along with the unique identifier common to both files. However, the new file only corrects the mistakes in the old one: it contains just the corrected variable and the unique identifier.

This prevents me from using the Add Variables and Add Cases commands: replacing the variable would leave some cases with no data for it, and adding cases would create cases with a response on only that one variable.

Therefore my question is: is there a way to “overwrite” specific cases within a variable, based on a common unique identifier, and leave the rest untouched?

I hope this is clear, although I have a feeling it is not.

Thanks in advance!

Dean

This email was sent from:- Nunwood Consulting Ltd. (registered in England and Wales no. 3135953) whose head office is based at:- 7, Airport West, Lancaster Way, Yeadon, Leeds, LS19 7ZA. Tel +44 (0) 845 372 0101 Fax +44 (0) 845 372 0102 Web http://www.nunwood.com Email [hidden email] This e-mail is confidential and intended solely for the use of the individual(s) to whom it is addressed. If you are not the intended recipient, be advised that you have received this e-mail in error and that any use, dissemination, forwarding, printing, copying of, or any action taken in reliance upon it, is strictly prohibited and may be illegal. To review our privacy policy please visit: www.nunwood.com/privacypolicy.html

Re: Help with errors

Art Kendall
Look up "UPDATE" in <help>.
Be sure not to overwrite any of the input files.

Art Kendall
Social Research Consultants


Re: Help with errors

Albert-Jan Roskam
In reply to this post by Dean Tindall
Hi,

Have a look under the UPDATE command in the syntax reference guide.

update file = 'oldfile.sav' / file = 'newfile.sav' / by=id.
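
To make the idea behind UPDATE concrete, here is a plain-Python sketch of the matching logic as I understand it: rows are matched on the key, and a value from the corrections file replaces the old one only when it is not missing. The function name, the sample rows and the variable names are all invented for illustration. The sketch also ignores details of the real command, such as the need for both files to be sorted by the key and what happens to corrections whose key is absent from the old file.

```python
def apply_corrections(master, corrections, key="id"):
    """Return copies of the master rows with non-missing corrected values applied."""
    by_key = {row[key]: row for row in corrections}
    updated = []
    for row in master:
        fixed = dict(row)  # copy, so the input rows are never modified
        for name, value in by_key.get(row[key], {}).items():
            if value is not None:  # a missing correction leaves the old value alone
                fixed[name] = value
        updated.append(fixed)
    return updated


# Made-up data: case 2 has a bad score that the corrections file fixes.
master = [
    {"id": 1, "age": 34, "score": 7},
    {"id": 2, "age": 51, "score": 99},
    {"id": 3, "age": 28, "score": 5},
]
corrections = [{"id": 2, "score": 9}]

result = apply_corrections(master, corrections)
for row in result:
    print(row)
```

Cases without a correction come through unchanged, which is the behaviour Dean asked for. Working on copies mirrors the advice elsewhere in this thread not to overwrite the input files.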


Cheers!!
Albert-Jan

In reply to Dean Tindall's message of Friday, January 8, 2010, 4:05 PM.

Re: Getting stuff into XML output

Simon Palmer-4
In reply to this post by Jon K Peck
Hello Jon,

Thank you for your response. It has been a while I know but I wanted to test this all out before posting again.

"A better way, though, so that you don't get tangled up in all the other stuff in the log, is to use the TEXT extension command."

This works very nicely, thanks, and I expect it will be useful in future. Unfortunately I did not provide enough context for the problem I was trying to solve, and I don't think TEXT will work in my current situation. To be specific, I have to report statistics on a set of variables for this year and last year. The command to run them is the same and the variables have the same names, so in my XML this year and last year are not readily distinguishable. My experience to date is that the XML is produced in the order of the commands in the syntax, so I can work out which block of XML relates to which year by its position. However, I can find nothing in the documentation that guarantees this. So I figured that if I could get something into the Notes pivot table that distinguished this year from last year, I could walk up and down the path via the common ancestor (the command node) to make sure I really was getting the data from the year I thought I was.

The TEXT extension you pointed me at produces a separate block with no unique ancestor for the two time periods, so it doesn't provide the guarantee I'm looking for. I expect if I were to go one step further and use your second suggestion...

"If you want to do more with the xml within your SPSS session, you can have OMS write the xml to the xmlworkspace and then use Python programmability, starting with spss.EvaluateXPath, to retrieve and manipulate it before writing it to a file for downstream processing."

... I could probably add in the annotations I require, but it seems like a big hammer for a little nail.

I have found a simple workaround, which is to rename the dataset to "CurrentYear" or "PreviousYear" as appropriate, because the dataset name is automatically passed through. But it's not conceptually ideal. The other approach I contemplated was to use CTABLES rather than DESCRIPTIVES, because I can pass through a title. But again, using CTABLES to get out some means and Ns seems unnecessarily heavy.

Anyway I wanted to thank you for your input.

Cheers,
Simon

Re: Getting stuff into XML output

Jon K Peck

See below.



From: Simon Palmer <[hidden email]>
To: [hidden email]
Date: 01/18/2010 08:55 PM
Subject: Re: [SPSSX-L] Getting stuff into XML output
Sent by: "SPSSX(r) Discussion" <[hidden email]>





Hello Jon,

Thank you for your response. It has been a while I know but I wanted to test this all out before posting again.

A better way, though, so that you don't get tangled up in all the other stuff in the log, is to use the TEXT extension command.

This works very nicely, thanks, and I expect it will be useful in future. Unfortunately I did not provide enough context for the problem I was trying to solve, and I don't think TEXT will work in my current situation. To be specific, I have to report statistics on a set of variables for this year and last year. The command to run them is the same and the variables have the same names, so in my XML this year and last year are not readily distinguishable. My experience to date is that the XML is produced in the order of the commands in the syntax, so I can work out which block of XML relates to which year by its position. However, I can find nothing in the documentation that guarantees this. So I figured that if I could get something into the Notes pivot table that distinguished this year from last year, I could walk up and down the path via the common ancestor (the command node) to make sure I really was getting the data from the year I thought I was.
>>>The order of the XML will always match the order of command execution. You will also always have an Active Dataset text block that identifies the file being used, so that might help. Of course, the data file name will also be in the Notes table. However, there is no direct way to add text to the Notes table. You could generate a pivot table in your syntax with whatever identifying information you want by using the programmability pivot table APIs in your job stream, but that wouldn't really be different from including a text block with that information. The output of the TEXT command can already be captured in the XML request, as can the Active Dataset block.
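
Given that the XML order matches execution order, a parser can pair each command block with a label purely by position. The sketch below does that with ElementTree on a heavily simplified, made-up document: the outputTree root, the absence of a namespace, the second mean value and the helper name means_by_label are all assumptions for illustration, not a description of a real OMS file.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified OMS-style output: two DESCRIPTIVES runs in execution order.
oms_xml = """
<outputTree>
  <command command="Descriptives" text="Descriptives">
    <pivotTable subType="Descriptive Statistics" text="Descriptive Statistics">
      <category text="Mean"><cell number="3.9141414141414" text="3.91" /></category>
    </pivotTable>
  </command>
  <command command="Descriptives" text="Descriptives">
    <pivotTable subType="Descriptive Statistics" text="Descriptive Statistics">
      <category text="Mean"><cell number="4.25" text="4.25" /></category>
    </pivotTable>
  </command>
</outputTree>
"""


def means_by_label(xml_text, labels):
    """Pair Descriptives blocks, in document order, with labels given in execution order."""
    root = ET.fromstring(xml_text.strip())
    commands = [c for c in root.iter("command") if c.get("command") == "Descriptives"]
    if len(commands) != len(labels):
        # Refuse to guess when the blocks and labels do not line up one to one.
        raise ValueError("expected %d Descriptives blocks, found %d"
                         % (len(labels), len(commands)))
    result = {}
    for label, command in zip(labels, commands):
        cell = command.find(".//category[@text='Mean']/cell")
        result[label] = float(cell.get("number"))
    return result


print(means_by_label(oms_xml, ["PreviousYear", "CurrentYear"]))
```

The count check is there so that an extra or missing run fails loudly instead of silently shifting the years.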

