Get data type text problem v16

Maguin, Eugene
I think there is either a problem with the documentation for the GET DATA
/TYPE=TXT command or with the command itself when invoked from the
menu. The circumstances are, perhaps, very limited.

I encountered problems with the following exemplar command. I also have a
second example.

GET DATA /TYPE=TXT
   /FILE='R:\Faculty Projects\xxxxxxxxx xxxxxx\School tttttttt ttttttttttttt\'+
   'TitleAbstract.txt'
   /DELCASE=LINE
   /DELIMITERS="#"
   /ARRANGEMENT=DELIMITED
   /FIRSTCASE=2
   /IMPORTCASE=ALL
   /VARIABLES=RID F4.0 Year A6 Final A1 Title A171 Abstract A1783.

There are two key points.
1) My file contains fields of text strings with varying lengths.
2) I did not pre-specify the width of the string fields--because I didn't
know them. (I assume this would be equivalent to specifying simply 'A'
format in syntax, but I don't know.)

I am guessing that the program reads a certain number of lines of data, but
not the whole file, and uses the result of that read operation to fix the
string field widths, which are then used to read the entire file. Obviously,
not having to pre-specify field widths is an immense advantage.

There is, of course, a post-hoc fix, which is to accept the default field
widths and then test whether the last character of each field is blank. If
it is, I think you can be assured that no data was truncated. If it is not,
you cannot be sure.
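
A minimal sketch of that check, assuming Title and Abstract were read at the
widths shown in the command above (A171 and A1783). The flag variable names
TitleFull and AbstractFull are made up for illustration.

```
* Flag cases whose last character position is not blank.
COMPUTE TitleFull = (SUBSTR(Title, 171, 1) NE ' ').
COMPUTE AbstractFull = (SUBSTR(Abstract, 1783, 1) NE ' ').
FREQUENCIES VARIABLES=TitleFull AbstractFull.
```

A value of 1 marks a case where the field is filled to its full width and so
may have been cut off; a value of 0 means the last position is blank.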

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Get data type text problem v16

Oliver, Richard
It would appear that the Text Wizard scans the first 250 cases to determine the string width. The maximum string width needs to be specified in order to avoid truncating string values. If you just specify A, I think that is equivalent to A1.

The fact that it scans only the first 250 cases is not currently documented -- but will be for the next release.

One possible workaround is to specify an overly long width for all string variables and then use ALTER TYPE to adjust all strings to the longest value for each variable, as in:

alter type  all (a=amin).
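
Put together, the workaround might look like the sketch below. The file path
and the string widths (A20, A2000, A5000) are placeholders chosen for
illustration, not values from the original command; any width known to exceed
the longest value would do.

```
* Read every string field with a deliberately generous width.
GET DATA /TYPE=TXT
   /FILE='C:\data\TitleAbstract.txt'
   /DELCASE=LINE
   /DELIMITERS="#"
   /ARRANGEMENT=DELIMITED
   /FIRSTCASE=2
   /IMPORTCASE=ALL
   /VARIABLES=RID F4.0 Year A20 Final A20 Title A2000 Abstract A5000.
* Shrink each string variable to the width of its longest value.
ALTER TYPE ALL (A=AMIN).
```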

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Sunday, March 29, 2009 12:19 PM
To: [hidden email]
Subject: Get data type text problem v16

Re: Get data type text problem v16

Maguin, Eugene
Richard,

Thanks for your reply. I certainly didn't expect anything on Sunday
afternoon!

Using ALTER TYPE would be useful. Thank you for pointing that out. I had seen
the command in the reference but had not looked at it.

Gene Maguin
