Hi all!
I have a text file with ; as the delimiter. It is fairly big: around 240 000 rows and 14 columns. The first 9 columns are numerical, the remaining 5 are string columns. When I use the Open Data procedure, the string variables are not recognized at all; they are set to empty numerical variables. It should be possible to import a mix of numerical and string variables, shouldn't it? Is there anything else I have missed? This is the first row of the data file:
20123817621;193911111111;2;252301;065012;411;20120102;1;0;;;;;
The problem was solved by using other software, but it is rather disturbing to see that it doesn't work as I thought it would. Hints and explanations are most welcome.
Robert
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Robert Lundqvist
|
Hi Robert,
It is possible for SPSS to read in a mix of string and numeric variables. However, sometimes SPSS is not very good at guessing what type to use when reading in data, especially if the first row of data is blank (in my experience). Step 5 of 6 of the text import wizard allows you to change the type and length of variables; you can review all your imported variables there to make sure they are reading in correctly!
Emma
Administrator
|
In reply to this post by Robert L
"It should be possible to import a mix of numerical and string variables, shouldn't it?"
Yes! And it is.
"Anything else I have missed?"
Maybe paste your syntax so we can see what you are actually running!
-- Should look something like: --
GET DATA
  /TYPE = TXT
  /FILE = 'C:\Users\David Marso\Desktop\SPSSJUNK\Sampledata.txt'
  /DELCASE = LINE
  /DELIMITERS = ";"
  /ARRANGEMENT = DELIMITED
  /FIRSTCASE = 1
  /IMPORTCASE = ALL
  /VARIABLES =
    V1 F11.2
    V2 F12.2
    V3 F1.0
    V4 F6.2
    V5 F6.2
    V6 F3.2
    V7 F8.2
    V8 F1.0
    V9 F1.0
    V10 A1
    V11 A1
    V12 A1
    V13 A1
    etc...... .
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Robert L
Combining Emma's and David's replies: it seems the problem lies in how far down in the file the first strings are encountered, not in whether there is a row with variable names. When I add such a row and use the file import wizard, the following syntax is generated:
GET DATA
  /TYPE=TXT
  /FILE= "ov2012.d"
  /DELCASE=LINE
  /DELIMITERS=";"
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /IMPORTCASE=ALL
  /VARIABLES=
    v1 F11.0
    v2 F12.0
    v3 F1.0
    v4 F6.0
    v5 F6.0
    v6 F3.0
    v7 F8.0
    v8 F1.0
    v9 F1.0
    v10 F1.0
    v11 F1.0
    v12 F1.0
    v13 F1.0
    v14 F1.0.
CACHE.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.
This means that the strings in columns 10 to 14 are not recognized as such: neither the type nor the width is right. And I would prefer to avoid setting the length, since I don't know exactly how long the strings are. (Well, I do in this case, but it could be another length next time.)
I have experimented a bit, and the only way I get it to work is when a row with strings, and not only empty cells, appears not too far from the beginning of the file. A row with strings at row 70 works; having the first complete row with strings two hundred rows further down does not. This is not the way SPSS should work. Could it be machine dependent? The system seems to skip at least one step in the import process when the necessary information is found too far down in the file.
Robert
Robert Lundqvist
|
The Wizard is designed to look no more
than 200 rows down, for performance reasons. People read text files
with tens of thousands of lines, so reading the whole file when setting up
the wizard is impractical. The limit could be made larger, but that
just moves the problem farther down.
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] new phone: 720-342-5621
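Given the 200-row scan limit described above, one possible workaround is to skip the wizard's guess and write the /VARIABLES list by hand, declaring columns 10 to 14 as strings. The sketch below reuses the file name and variable names from Robert's generated syntax. The A50 width is only an assumed upper bound (not a value from the thread), and FIRSTCASE=2 assumes the added header row; use FIRSTCASE=1 for the original file. ALTER TYPE with A=AMIN then shrinks each string variable to the longest value actually present, so the exact length need not be known in advance.

```spss
* Sketch only, file name and variable names are taken from the thread.
* A50 is an assumed upper bound for the string width, adjust as needed.
GET DATA
  /TYPE=TXT
  /FILE="ov2012.d"
  /DELCASE=LINE
  /DELIMITERS=";"
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /IMPORTCASE=ALL
  /VARIABLES=
    v1 F11.0
    v2 F12.0
    v3 F1.0
    v4 F6.0
    v5 F6.0
    v6 F3.0
    v7 F8.0
    v8 F1.0
    v9 F1.0
    v10 A50
    v11 A50
    v12 A50
    v13 A50
    v14 A50.
CACHE.
EXECUTE.
* Shrink the string variables to the minimum width that holds all values.
ALTER TYPE v10 TO v14 (A=AMIN).
```

Any value longer than the declared width would presumably be cut off on import, so the assumed bound should err on the generous side.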