Computation of Variable

Computation of Variable

Swetal Sindhvad
Dear SPSS users:
I used the Transform - Compute Variable function to compute a variable, and
now I want to see the computation for that variable, but I am unable to do
so. How does one see the actual computation, or formula, used to compute
the variable?
Thank you!

- Swetal Sindhvad

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Computation of Variable

Spousta Jan
Dear Swetal,

You can see the transformations in the journal file (the log), but only if journaling is turned on and set to Append mode.

In the menu, go to Edit -> Options and look up where the journal file is stored (on the General or File Locations tab, depending on your version). Then open that file in a text editor and search for the commands.

Best regards

Jan

Reading text data with

Roberts, Michael
In reply to this post by Swetal Sindhvad
Good Morning List,

I would appreciate some advice on how to read-in text data where the
data has header information followed by the related data like so:

Header information:
Line1: Var1 value, var2 value
Line2: Var3 value
Line3: var4 value
Line4: var5 value, var6 value
Line5: var6 value, var7 value, var8 value

Data:
Var9 var10, var11,...,varn

The pattern repeats after about 21 lines of data, with different header
information each time.  I have already tried the multi-line read-in, but
because of the differing formats, the resulting data is ridiculously
difficult to work with.  I would appreciate any suggestions.

TIA

Mike

Re: Computation of Variable

Melissa Ives
In reply to this post by Swetal Sindhvad
Two other options:
1) Click Paste instead of OK in the dialog to paste the generated syntax into a syntax window, or
2) Set your Viewer options (Edit--Options, Viewer tab) to "Display commands in the log" (check box at the bottom left); the syntax will then be included automatically in your output file.

Melissa

Re: Reading text data with

Pearmain, Michael
In reply to this post by Roberts, Michael
My advice would be to use a Python script to read the lines of data and build a list containing all the values,
e.g. [2, 3, 4, ade, 45, er].
From there you can loop over the list and write out a new file with one entry per line.
Then use a standard import in SPSS to read the data.
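
A minimal sketch of that idea in plain Python, assuming the values are separated by commas and/or whitespace. The sample lines and the helper name flatten_values are made up for illustration; they are not taken from the real file.

```python
import re

def flatten_values(lines):
    """Collect every comma- or whitespace-separated value into one flat list."""
    values = []
    for line in lines:
        # Split on commas and/or runs of whitespace, dropping empty pieces.
        values.extend(tok for tok in re.split(r"[,\s]+", line.strip()) if tok)
    return values

# Hypothetical input lines, standing in for lines read from the report file.
sample = ["2, 3, 4", "ade 45, er"]
values = flatten_values(sample)

# One entry per line, ready to be written to a new text file.
one_per_line = "\n".join(values)
```

Note that this flattening throws away which line each value came from, so it only fits data where position in the list is enough to identify a value.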

Mike



Re: Reading text data with

Maguin, Eugene
In reply to this post by Roberts, Michael
Michael,

Perhaps others will better understand what you are describing. I have a
couple of clarifying questions.

You say the structure is the following.

Header information:
Line1: Var1 value, var2 value
Line2: Var3 value
Line3: var4 value
Line4: var5 value, var6 value
Line5: var6 value, var7 value, var8 value

Data:
Var9 var10, var11,...,varn

May I assume that the data actually look like this?

Var1 2.073, var2 1.999
Var3 .0087
var4 99999
var5 234.5, var6 -12.89
var6 -1.00, var7 873.2, var8 10000
2.78 3.238, -1.34,..., 3.4

That is, the header section contains a combination of text that is the names
of the variables followed by the value of the variables.

Do you want to keep the header information? Or, can it be discarded?

Is there a single line of data containing the values for var9 to varn within
each header/data sequence?

Gene Maguin


Re: Reading text data with

Roberts, Michael
Gene,

Thank you for the question.

Your reading of the data layout for the header is right-on, although the
data is alphanumeric.  The second set of data is on two lines (two lines
make a case), and includes a field header for each variable, however.
One twist is that the header data is not consistent - while I show five
lines of data, sometimes one of those lines is not included - sort of
like an additional address element that does not exist for an address.
This layout has played havoc with my attempts to read the data into
SPSS!!!

The problem is that I need to keep the header information, since each of
the subsequent data cases are associated with the header data.  This
data file was generated by our systems persons from a mainframe as a
report, but is practically useless in its present form, and any help
would be very, very appreciated!

TIA

Mike



Re: Reading text data with

Richard Ristow
At 07:44 PM 3/11/2008, Roberts, Michael wrote:

[notes on data layout omitted]
>The second set of data is on two lines (two lines make a case), and
>includes a field header for each variable, however. One twist is
>that the header data is not consistent - while I show five lines of
>data, sometimes one of those lines is not included - sort of like an
>additional address element that does not exist for an address. [And]
>I need to keep the header information,

It looks like Gene's onto this one; but, it could be clearer if you'd
post a few cases, including ones with differing numbers of header lines.

Right now, I'd think about using an INPUT PROGRAM, but I haven't seen
your data, nor looked at your problem as hard as Gene has.



Re: Reading text data with

Maguin, Eugene
In reply to this post by Roberts, Michael
Michael,

When you look at your data file, how do you know 1) where one case ends and
another begins, and 2) within a case, where the header ends and the two data
lines begin? I think the problem is finding a pattern in the data structure
that you can use to structure the data-reading operation. It also sounds as
if the data structure varies, which will pose additional problems. While you
are working on using SPSS to read this, another possibility to investigate is
going back to the systems people and asking them to write a post-processing
program that restructures the data more to SPSS's liking. You might also
begin reading up on the INPUT PROGRAM command. I'm almost certain that you
will need a custom input procedure.

Also, I agree with Richard on the usefulness of posting some data. I'd
suggest that you post 3-4 cases of data. I also suggest that you select
cases that illustrate the variability in the data structure.

Gene Maguin


Re: Reading text data with

Roberts, Michael
In reply to this post by Richard Ristow
Thank you all for responding.  Here is what the simulated data looks
like, warts and all!  The second set of header data is missing the suite
information, so there are only 4 lines of data:

1RUN DATE: 02/06/08                     blah blah MANAGEMENT INFORMATION
SYSTEM                                PAGE:  1
                                                blah GROUP blah PROJECT

                                                         AD HOC 08-035

-blah BASE: 0100053  blah blah INC.

 1234 WEST blah STREET

 SUITE 123

 TAMPA             ,FL 33607-4173

-blah: 010005300     COUNTY: 29     COUNTY DESCRIPTION: HILLSBOROUGH

-INDIVIDUAL  NAME                                ADDRESS

 TYPE        TPI        TOMY   ZIP           STATUS
PHONE                  IND          BDATE     EDATE
+_______________________________________________________________________
_____________________________________________________________
 123456700   blah blah INC                blah blah      1234 W blah
AVENUE   TAMPA              FL     33000-0000
 90          1598841579                          1       555-555-5555
07/01/06  99/99/99
 543210101   Doe, John J., Md.          1234 N blah STREE
PLANT CITY         FL     33555-4302
 27          1003804675                          1       123-456-7890
07/01/06  99/99/99

1RUN DATE: 02/06/08                     blah blah MANAGEMENT INFORMATION
SYSTEM                                PAGE:  1185
                                                blah GROUP blah PROJECT

                                                         AD HOC 08-035

-blah BASE: 0150009  more blah blah, INC

 5678 SW ATH STREET

0CORAL GABLES      ,FL 33134

-blah: 015000900     COUNTY: 06     COUNTY DESCRIPTION: BROWARD

-INDIVIDUAL  NAME                                ADDRESS

 TYPE        TPI        TOMY   ZIP           STATUS
PHONE                  IND          BDATE     EDATE
+_______________________________________________________________________
_____________________________________________________________
 011234450   Junior, John DO             blah blah      3456 S Blah ROAD
FT LAUDERDALE      FL     33333-0000
 26          1235128794                          1      777-123-3456
07/01/06  99/99/99
 023456700   Brown, John A            The blah Group  1000 W Anywhere RD
CORAL SPRINGS      FL     33000-0000
 25          1740275528                          1      555-444-5555
07/01/06  99/99/99


TIA

Mike


Re: Reading text data with

Pearmain, Michael
I've said it before and I'll say it again:
use Python. A small program will do this.

Re: Reading text data with

Pearmain, Michael
To expand: make a map listing the variables, skip over the irrelevant information at the start if its length is fixed, and then read in what you need.
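
A rough sketch of what such a map could look like in Python. The column positions in FIELD_MAP, the field names, and the helper read_fixed are all hypothetical; the real positions would have to be measured on the unwrapped report file.

```python
# Hypothetical field map: variable name -> (start, end) column slice, 0-based.
FIELD_MAP = {
    "individual": (1, 10),
    "name": (13, 40),
}

def read_fixed(records, field_map, skip=0):
    """Skip a fixed number of leading lines, then slice each record by the map."""
    rows = []
    for record in records[skip:]:
        if not record.strip():
            continue  # ignore blank lines
        rows.append({name: record[start:end].strip()
                     for name, (start, end) in field_map.items()})
    return rows

# Made-up records: one line to pass over, then one data line.
records = ["1RUN DATE: 02/06/08", " 011234450   Junior, John DO"]
rows = read_fixed(records, FIELD_MAP, skip=1)
```

A fixed skip count only works if the leading block really has a constant length, which the posted sample suggests is not always the case (the suite line is sometimes missing).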



Re: Reading text data with

Richard Ristow
In reply to this post by Roberts, Michael
At 05:57 PM 3/12/2008, Roberts, Michael wrote:

>Thank you all for responding.  Here is what the simulated data looks
>like, warts and all!

OK. This is printer-image data, in the form for the line printers
that were standard for a long time on IBM mainframes and elsewhere.
It's hard to tell your line length, but on most of these printers the
maximum line length was 132 characters. Records were usually 133
characters long, with the first character controlling the printer:
blank - Print on next line
1     - Print at head of a new page
0     - Skip a line, then print
-     - Skip 2 lines, then print
+     - Print on top of the previous line (used for underscores, etc.)
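
As an illustration only, the control characters listed above can be decoded with a few lines of Python. The function names split_control and describe are invented for this sketch, and it assumes every record carries the control character in its first column.

```python
# Line feeds to emit before printing, per the list above:
# blank = next line, "0" = skip one line, "-" = skip two lines.
ADVANCE = {" ": 1, "0": 2, "-": 3}

def split_control(record):
    """Split a printer-image record into (control character, text)."""
    if not record:
        return " ", ""
    return record[0], record[1:].rstrip()

def describe(record):
    """Return (action, text) for one record."""
    ctl, text = split_control(record)
    if ctl == "1":
        action = "new page"
    elif ctl == "+":
        action = "overprint"
    elif ctl in ADVANCE:
        action = f"advance {ADVANCE[ctl]}"
    else:
        action = "unknown"  # first column is not a recognised control character
    return action, text
```

For example, describe("1RUN DATE: 02/06/08") reports a new page, which is a handy marker for the start of each page header when the file is read as data.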

I see the following types of line:

A. Page header - two lines, or three? (That is, should  "AD HOC
08-035" be on the second line?) This isn't really what they look
like; I've shortened them to fit here:
1RUN DATE: 02/06/08  blah blah MANAGEMENT INFORMATION SYSTEM   PAGE:  1
                                      blah GROUP blah PROJECT
                                                AD HOC 08-035
Conjecture: No information from these lines is needed in the final dataset.

B. "Base" groups of lines:
-blah BASE: 0100053  blah blah INC.
  1234 WEST blah STREET
  SUITE 123
  TAMPA             ,FL 33607-4173
Conjecture: The "base" is that 7-digit number. It, and the name and
address on these four lines, are retained and apply to all subsequent
data until the next "base" group.

C. "County" lines:
-blah: 010005300     COUNTY: 29     COUNTY DESCRIPTION: HILLSBOROUGH
Conjecture: There may be several counties for one 'base', though you
have only one per base in the example data. County data is retained
and applies to all subsequent data until the next "base" group or
"county" line.

D. "Individual" headers and records:
It looks like there are two lines with field headers, then a line
that prints underscores under them, then any number of lines each
with data for an individual. Here is what I can see, with "element"
names from a set of header lines, and data ("values") for two individuals:

Element       Value                     Value
INDIVIDUAL    011234450                 023456700
NAME          Junior, John DO           Brown, John A
??????        blah blah                 The blah Group
ADDRESS       3456 S Blah ROAD          1000 W Anywhere RD
(wrapped)     FT LAUDERDALE      FL     CORAL SPRINGS      FL
TYPE          ????                      ????
TPI           ????                      ????
TOMY          ????                      ????
ZIP           33333-0000                33000-0000
STATUS        26                        25
??????        1235128794                1740275528
??????        1                         1
PHONE         777-123-3456              555-444-5555
IND           ????                      ????
BDATE         07/01/06                  07/01/06
EDATE         99/99/99                  99/99/99

. I've matched values to element names partly by order, partly by
morphology ("777-123-3456" has the form of a phone number, for example)
. The addresses above are on two lines, but they're on a single line
in the data; I've wrapped them so the lines fit, above
. Where there's "??????" for an element name, there's a data element
that doesn't seem to match any name in the headers, after making the
assignments that seem clearly right
. Where there's "????" for a value, I don't see any data that seems
to match the name. (Are TYPE, TPI and TOMY part of the address, somehow?)
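
The matching "by morphology" described above can be sketched in Python. This is only an illustration: the patterns are inferred from the two sample individuals, and the names PATTERNS and guess_kind are made up for the example, not taken from the real file layout.

```python
import re

# Patterns guessed from the sample values only; the real file may need others.
PATTERNS = [
    ("phone", re.compile(r"^\d{3}-\d{3}-\d{4}$")),   # 777-123-3456
    ("zip", re.compile(r"^\d{5}-\d{3,4}$")),         # 33333-0000, 33000-000
    ("date", re.compile(r"^\d{2}/\d{2}/\d{2}$")),    # 07/01/06, 99/99/99
    ("number", re.compile(r"^\d+$")),                # 26, 1235128794
]


def guess_kind(token):
    """Return the first pattern name that matches the token, else 'text'."""
    for kind, pattern in PATTERNS:
        if pattern.match(token):
            return kind
    return "text"
```

A value such as "33000-000" is accepted as a ZIP only because the sample contains it; tighten the pattern if that turns out to be a typo in the simulated data.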
...............
I expect you need logic like this; or, anyway, this is how I did it
(in SAS) the last time I had to:

A. Read a line. Classify it into one of the above categories.

B. If it's a page header line, or one of the lines of element names
for individuals, ignore it. (However, if it's a line of element
names, that may be useful as an indication that individual data will follow.)

C. If it's the start of a "base" group, as indicated by "BASE:" being
the second token on the line, read the values from the four lines in
the group, and keep them (LEAVE statement) for future use. (What is
the meaning of the "blah" that precedes the word "BASE:"? Is it a
value that needs to be kept?)

D. If it's a "county" line ("COUNTY:" is the 4th token), read the
county (number) and description, and save for future use. Does the
"blah" that begins the line need to be saved?

E. If it's lines for an individual (probably indicated by preceding
lines of element names with underscores), read the elements as above,
except correct and fill in the things I couldn't get. Write a record
(END CASE) with the individual data plus the last preceding "base"
and "county" data.
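
Steps A to E can be sketched as a small state machine, shown here in Python rather than as an INPUT PROGRAM. Treat it as a rough outline under assumptions: every record starts with a carriage-control character, a base group is exactly four lines, and the field headers with underscores are printed again before each block of individuals. It looks for "BASE:" and "COUNTY:" anywhere among the tokens instead of at a fixed position, because the "blah" placeholders hide the real token count. The names classify and parse are hypothetical.

```python
def classify(record):
    """Classify one printer-image record (carriage control in column 1)."""
    control, body = record[:1], record[1:]
    tokens = body.split()
    if control == "1":
        return "page_header"
    if not tokens:
        return "blank"
    if "BASE:" in tokens:
        return "base"
    if "COUNTY:" in tokens:
        return "county"
    if set(body.strip()) <= {"_", " "}:
        return "underscores"
    return "other"


def parse(records):
    """Yield one dict per individual line, with the retained base and county."""
    base = county = None
    in_individuals = False
    lines = iter(records)
    for record in lines:
        kind = classify(record)
        if kind == "base":
            # Assumption: the base line plus the next three lines form the group.
            group = [record[1:].strip()]
            group += [next(lines, "").strip() for _ in range(3)]
            base, county, in_individuals = group, None, False
        elif kind == "county":
            county, in_individuals = record[1:].strip(), False
        elif kind == "page_header":
            in_individuals = False
        elif kind == "underscores":
            # Underscores follow the element names; individuals come next.
            in_individuals = True
        elif kind == "other" and in_individuals:
            yield {"base": base, "county": county,
                   "individual": record[1:].strip()}
```

The individual line is kept whole here; splitting it into elements needs the real column positions, which the shortened sample does not show.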

It'd be an INPUT PROGRAM, of course.

Python? Python's probably better suited to writing parsers in than
native SPSS is. I'm not sure how you'd do the path from external file
to Python to SPSS data file. Use Python without the SPSS interface,
to pre-process the file into easier-to-recognize lines, write that
out, then read in SPSS? Or how would you do it?
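
One way the pre-processing route might look, as a sketch only: Python flattens the report into one tab-delimited line per individual, and SPSS then reads that as ordinary delimited text. The column names in FIELDS and the function write_delimited are invented for the example, and the two rows stand in for whatever a parser would produce.

```python
import csv
import io

FIELDS = ["base", "county", "individual"]  # hypothetical column names


def write_delimited(rows, stream):
    """Write parsed rows as tab-delimited text with a header line."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Stand-in rows; values are taken from the simulated sample data.
rows = [
    {"base": "0100053", "county": "29", "individual": "011234450"},
    {"base": "0100053", "county": "29", "individual": "023456700"},
]
buffer = io.StringIO()
write_delimited(rows, buffer)
print(buffer.getvalue())
```

With a real file handle in place of the StringIO buffer, the output could be read in SPSS with GET DATA /TYPE=TXT, declaring the ID fields as strings so the leading zeros survive.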

-Onward, ever onward,
  Richard


Re: Reading text data with

Roberts, Michael
Good Afternoon Listers,

I want to thank all responders for taking the time to send helpful ideas
and solutions to my query.  I especially want to thank Gene and Richard
for their time, knowledge, and willingness to help with this knotty data
problem with only a skeleton dataset to go on.

The syntax and ideas from Gene and Richard have allowed me to extract
the text data which Richard described below as 'Printer image data' with
relative ease.  For those of you who haven't experienced it - this is
not the best formatted data to work with!  Am I glad there are such
talented and experienced members on this list!

Again, my thanks to you.
Sincerely

Mike


-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Thursday, March 13, 2008 2:20 AM
To: Roberts, Michael; [hidden email]
Cc: Gene Maguin; Pearmain, Michael
Subject: Re: Reading text data with

At 05:57 PM 3/12/2008, Roberts, Michael wrote:

>Thank you all for responding.  Here is what the simulated data looks
>like, warts and all!

OK. This is printer-image data, in the form for the line printers
that were standard for a long time on IBM mainframes and elsewhere.
It's hard to tell your line length, but on most of these printers the
maximum line length was 132 characters. Records were usually 133
characters long, with the first character controlling the printer:
blank - Print on next line
1     - Print at head of a new page
0     - Skip a line, then print
-     - Skip 2 lines, then print
+     - Print on top of the previous line (used for underscores, etc.)
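
As a small illustration of the table above, here is a Python sketch that splits one record into its control action and its printable text. It assumes the control character is always in column 1; the names CARRIAGE_CONTROL and split_record are made up for the example.

```python
# Carriage-control characters as listed above, mapped to descriptive labels.
CARRIAGE_CONTROL = {
    " ": "next_line",
    "1": "new_page",
    "0": "skip_1",
    "-": "skip_2",
    "+": "overprint",
}


def split_record(record):
    """Return (action, text) for one printer-image record."""
    if not record:
        return ("next_line", "")
    control, text = record[0], record[1:]
    # Any character not in the table is reported as "unknown".
    action = CARRIAGE_CONTROL.get(control, "unknown")
    return (action, text.rstrip())
```

Records flagged "overprint" are the ones most likely to hold only underscores, so they can usually be dropped before parsing the data lines.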

I see the following types of line:

A. Page header - two lines, or three? (That is, should "AD HOC
08-035" be on the second line?) This isn't really what they look
like; I've shortened them to fit here:
1RUN DATE: 02/06/08  blah blah MANAGEMENT INFORMATION SYSTEM   PAGE:  1
                                      blah GROUP blah PROJECT
                                                AD HOC 08-035
Conjecture: No information from these lines is needed in the final
dataset.

B. "Base" groups of lines:
-blah BASE: 0100053  blah blah INC.
  1234 WEST blah STREET
  SUITE 123
  TAMPA             ,FL 33607-4173
Conjecture: The "base" is that 7-digit number. It, and the name and
address on these four lines, are retained and apply to all subsequent
data until the next "base" group.

C. "County" lines:
-blah: 010005300     COUNTY: 29     COUNTY DESCRIPTION: HILLSBOROUGH
Conjecture: There may be several counties for one 'base', though you
have only one per base in the example data. County data is retained
and applies to all subsequent data until the next "base" group or
"county" line.

D. "Individual" headers and records:
It looks like there are two lines with field headers, then a line
that prints underscores under them, then any number of lines each
with data for an individual. Here is what I can see, with "element"
names from a set of header lines, and data ("values") for two
individuals:

Element       Value                     Value
INDIVIDUAL    011234450                 023456700
NAME          Junior, John DO           Brown, John A
??????        blah blah                 The blah Group
ADDRESS       3456 S Blah ROAD          1000 W Anywhere RD
(wrapped)     FT LAUDERDALE      FL     CORAL SPRINGS      FL
TYPE          ????                      ????
TPI           ????                      ????
TOMY          ????                      ????
ZIP           33333-0000                33000-000
STATUS        26                        25
??????        1235128794                1740275528
??????        1                         1
PHONE         777-123-3456              555-444-5555
IND           ????                      ????
BDATE         07/01/06                  07/01/06
EDATE         99/99/99                  99/99/99

. I've matched values to element names partly by order, partly by
morphology ("777-123-3456" has the form of a phone number, for example)
. The addresses above are on two lines, but they're on a single line
in the data; I've wrapped them so the lines fit, above
. Where there's "??????" for an element name, there's a data element
that doesn't seem to match any name in the headers, after making the
assignments that seem clearly right
. Where there's "????" for a value, I don't see any data that seems
to match the name. (Are TYPE, TPI and TOMY part of the address,
somehow?)
...............
I expect you need logic like this; or, anyway, this is how I did it
(in SAS) the last time I had to:

A. Read a line. Classify it into one of the above categories.

B. If it's a page header line, or one of the lines of element names
for individuals, ignore it. (However, if it's a line of element
names, that may be useful as an indication that individual data will
follow.)

C. If it's the start of a "base" group, as indicated by "BASE:" being
the second token on the line, read the values from the four lines in
the group, and keep them (LEAVE statement) for future use. (What is
the meaning of the "blah" that precedes the word "BASE:"? Is it a
value that needs to be kept?)

D. If it's a "county" line ("COUNTY:" is the 4th token), read the
county (number) and description, and save for future use. Does the
"blah" that begins the line need to be saved?

E. If it's lines for an individual (probably indicated by preceding
lines of element names with underscores), read the elements as above,
except correct and fill in the things I couldn't get. Write a record
(END CASE) with the individual data plus the last preceding "base"
and "county" data.

It'd be an INPUT PROGRAM, of course.

Python? Python's probably better suited to writing parsers in than
native SPSS is. I'm not sure how you'd do the path from external file
to Python to SPSS data file. Use Python without the SPSS interface,
to pre-process the file into easier-to-recognize lines, write that
out, then read in SPSS? Or how would you do it?

-Onward, ever onward,
  Richard
