File import problems

Robert L
Hi all!

I have a text file with ; as the delimiter. It is fairly big, around 240 000 rows and 14 columns. The first 9 columns are numeric and the remaining 5 are string columns. When I use the Open Data procedure, the string variables are not recognized at all; they are set to empty numeric variables. It should be possible to import a mix of numeric and string variables, shouldn't it? Is there anything else I have missed?

This is the first row of the data file:

20123817621;193911111111;2;252301;065012;411;20120102;1;0;;;;;
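
The following small Python check is not part of SPSS. It splits the sample row on the delimiter and confirms two things: the row has 14 fields, and fields 10 to 14 are empty. This row therefore gives an importer no hint that those columns hold text.

```python
# First data row from the post; ";" is the delimiter.
row = "20123817621;193911111111;2;252301;065012;411;20120102;1;0;;;;;"
fields = row.split(";")

print(len(fields))  # 14 columns
print(fields[9:])   # columns 10-14: ['', '', '', '', '']
```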

I have solved the problem with other software, but it is rather disturbing that SPSS doesn't work the way I expected. Hints and explanations are most welcome.

Robert

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Robert Lundqvist

Re: File import problems

eharrigan
Hi Robert,

It is possible for SPSS to read in a mix of string and numeric variables. However, in my experience SPSS is not always good at guessing which type to use when reading in data, especially if the first row of data is blank. Step 5 of 6 of the Text Import Wizard lets you change the type and length of each variable, so you can review all your imported variables there and make sure they are read in correctly.

Emma

Re: File import problems

David Marso
Administrator
In reply to this post by Robert L
"It should be possible to import a mix of numerical and string variables, shouldn't it?"
Yes! And it is.
"Anything else I have missed?"
Maybe paste your syntax so we can see what you are actually running!
--
Should look something like:
--
GET DATA  /TYPE = TXT
 /FILE = 'C:\Users\David Marso\Desktop\SPSSJUNK\Sampledata.txt'
 /DELCASE = LINE
 /DELIMITERS = ";"
 /ARRANGEMENT = DELIMITED
 /FIRSTCASE = 1
 /IMPORTCASE = ALL
 /VARIABLES =
 V1 F11.0
 V2 F12.0
 V3 F1.0
 V4 F6.0
 V5 F6.0
 V6 F3.0
 V7 F8.0
 V8 F1.0
 V9 F1.0
 V10 A1
 V11 A1
 V12 A1
 V13 A1
 etc......
.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: File import problems

Robert L
In reply to this post by Robert L
Combining Emma's and David's replies: it seems that the problem lies in how far down in the file the first strings are encountered, not in whether or not there is a row with variable names. When I add such a row and use the Text Import Wizard, the following syntax is generated:

GET DATA
  /TYPE=TXT
  /FILE= "ov2012.d"
  /DELCASE=LINE
  /DELIMITERS=";"
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /IMPORTCASE=ALL
  /VARIABLES=
  v1 F11.0
  v2 F12.0
  v3 F1.0
  v4 F6.0
  v5 F6.0
  v6 F3.0
  v7 F8.0
  v8 F1.0
  v9 F1.0
  v10 F1.0
  v11 F1.0
  v12 F1.0
  v13 F1.0
  v14 F1.0.
CACHE.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.

This means that the strings in columns 10 to 14 are not recognized as such, in either type or width. I would also prefer not to set the length, since I don't know exactly how long the strings are. (Well, I do in this case, but the length could be different next time.)

I have experimented a bit, and the only way I get it to work is when a row containing actual strings, not just empty cells, appears near the beginning of the file. A row with strings at row 70 works; the same complete row two hundred rows further down does not. This is not the way SPSS should work.
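
The pattern described above is consistent with a type guess made from a fixed-size sample of rows. Blank cells carry no type information, so a column that is blank throughout the sample gets guessed as numeric. The Python sketch below imitates that behaviour under stated assumptions. The 200-row window is the figure Jon Peck gives later in the thread. `column_is_string` and `make_rows` are hypothetical names, not SPSS's actual import code.

```python
def column_is_string(rows, col, max_rows=None):
    """Return True if any non-empty cell in column `col` is non-numeric.

    max_rows=None scans every row; max_rows=200 mimics an importer that only
    samples the first 200 data rows. Hypothetical heuristic, not SPSS code.
    """
    sample = rows if max_rows is None else rows[:max_rows]
    for row in sample:
        value = row[col] if col < len(row) else ""
        if value == "":
            continue  # a blank cell says nothing about the type
        try:
            float(value)
        except ValueError:
            return True  # one non-numeric value makes it a string column
    return False  # only numbers or blanks seen: guessed numeric


def make_rows(n_rows, text_row):
    """Made-up file: column 0 is numeric, column 1 is blank except one text cell."""
    rows = [[str(i), ""] for i in range(1, n_rows + 1)]
    rows[text_row - 1][1] = "abc"  # text_row is a 1-based row number
    return rows


early = make_rows(300, text_row=70)
late = make_rows(300, text_row=270)

print(column_is_string(early, 1, max_rows=200))  # True: text inside the sample
print(column_is_string(late, 1, max_rows=200))   # False: text beyond the sample
print(column_is_string(late, 1))                 # True: a full scan finds it
```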

Could it be machine dependent? The system seems to skip at least one step in the import process when the necessary information is found too far down in the file.

Robert

Re: File import problems

Jon K Peck
The Wizard is designed to look no more than 200 rows down, for performance reasons. People read text files with tens of thousands of lines, so reading the whole file while setting up the wizard is impractical. The limit could be made larger, but that just moves the problem farther down.
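
One way around both the 200-row limit and the unknown string lengths is to scan the whole file yourself. You can then paste the resulting variable list into the /VARIABLES subcommand of GET DATA. What follows is a minimal Python sketch of that idea, not an SPSS feature.

- `variables_spec` and the second data row are made up for illustration.
- It assumes numeric columns hold integers, hence Fw.0.
- A column that is empty in every row still falls back to F1.0.

```python
import csv
import io


def is_number(value):
    """True if the cell parses as a number (float() accepts "065012")."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def variables_spec(text, delimiter=";", prefix="v"):
    """Build GET DATA /VARIABLES lines from a full scan of delimited text.

    Hypothetical helper, not an SPSS feature. Every row is read, so a string
    anywhere in the file is noticed. Assumes numeric columns hold integers
    (format Fw.0); string columns get Aw. In both cases w is the longest
    value observed. A column that is empty in every row falls back to F1.0.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    ncols = max(len(row) for row in rows)
    specs = []
    for j in range(ncols):
        # Non-empty cells of column j; short rows count as empty.
        values = [row[j] for row in rows if j < len(row) and row[j] != ""]
        width = max((len(v) for v in values), default=1)
        if all(is_number(v) for v in values):
            fmt = f"F{width}.0"
        else:
            fmt = f"A{width}"
        specs.append(f"  {prefix}{j + 1} {fmt}")
    return "\n".join(specs)


# The first row from the post plus a made-up second row with text in v10-v14.
sample = (
    "20123817621;193911111111;2;252301;065012;411;20120102;1;0;;;;;\n"
    "20123817622;193911111112;1;252301;065012;411;20120103;0;1;abc;de;f;x;yz\n"
)
print(variables_spec(sample))
```

For this sample, v1 to v9 come out with the same formats the wizard generated above (F11.0, F12.0, ...). Columns 10 to 14 become A3, A2, A1, A1 and A2. On a 240 000-row file, the full scan costs one pass over the data, which is the price the wizard avoids.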


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621