SPSSX Discussion

Problem Reading File


Marcos Sanches

Problem Reading File

Hello all,

I am using the code below to read a pipe-delimited file. Here is an example of a data line that is not being read correctly:

"XXX"|"MARTIN DARVIN "DAN" DXXXX-2248"|"0"|"1"|"Client 2"|"GG"...

SPSS won't read the whole piece "MARLY DARVIN "DAN" DXXXX-9778" into a single variable, as it should. Instead, it treats the double quote in the middle of the string as a delimiter, splits the string there, and misreads everything further along the line.

My questions -

Is there a way to fix this so that SPSS will only cut the string at the pipes?

Is this a problem with SPSS, or is it a problem with the data file, which should not contain double quotes other than the qualifiers?

Note - I considered replacing the double quotes with blanks so that the file would no longer have qualifiers, which I think would solve the problem, but the file is huge and this would be a time-consuming task.

The reading syntax follows:

GET DATA /TYPE = TXT
  /FILE = 'P:\MRCE\01 Teams\Amy_Charles\GXI\Data\Preliminary\client2\wn_extract_enhanced.csv'
  /DELCASE = LINE
  /DELIMITERS = "|"
  /QUALIFIER = '"'
  /ARRANGEMENT = DELIMITED
  /FIRSTCASE = 2
  /IMPORTCASE = ALL
  /VARIABLES =
  X1 A37
  X2 F4.2
  ...

Thanks a lot!

Marcos

Albert-Jan Roskam

Re: Problem Reading File

Hi,

Untested:

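import csv
import spss, spssaux

# Split on the pipe only; csv.reader takes 'delimiter', not 'sep'.
csv_reader = csv.reader(open("d:/mydata.csv"), delimiter="|")
header = next(csv_reader)  # first line holds the variable names
with spss.DataStep():
    dataset = spss.Dataset(name=None)  # create a new, empty dataset
    dataset.varlist.extend(header)  # may be incorrect
    for row in csv_reader:
        dataset.cases.append(row)
# Save only after the data step has ended.
spssaux.SaveDataFile("d:/mydata.sav")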
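
A note on the reader above: by default csv.reader treats the double quote as a quote character. The following is a minimal plain-Python sketch (Python 3, no spss module needed; the helper names read_pipe_rows and strip_qualifier are made up for illustration). It switches quote handling off with csv.QUOTE_NONE, so the line is cut only at the pipes, and then strips one outer pair of qualifiers from each field:

```python
import csv
import io


def strip_qualifier(field):
    """Remove one surrounding pair of double quotes, if present."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1]
    return field


def read_pipe_rows(lines):
    """Yield rows split only at '|'; embedded double quotes are kept as data."""
    # QUOTE_NONE makes the reader treat '"' as an ordinary character.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield [strip_qualifier(field) for field in row]


# The sample line from the original post, wrapped in a file-like object.
sample = io.StringIO(
    '"XXX"|"MARTIN DARVIN "DAN" DXXXX-2248"|"0"|"1"|"Client 2"|"GG"\n'
)
rows = list(read_pipe_rows(sample))
print(rows[0])
# ['XXX', 'MARTIN DARVIN "DAN" DXXXX-2248', '0', '1', 'Client 2', 'GG']
```

For the real file, an open file object could be passed in place of the StringIO sample; the rows could then be appended to a dataset as in the untested code above.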

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From: Marcos Sanches <[hidden email]>
To: [hidden email]
Sent: Fri, September 10, 2010 2:56:33 PM
Subject: [SPSSX-L] Problem Reading File


David Marso

Re: Problem Reading File

Administrator

In reply to this post by Marcos Sanches

Here's one way to skin the cat.
You'll want to remove the BEGIN DATA ... END DATA block and read your external file instead.
You'll have to adjust the string variable lengths afterwards.
You will also want to change the vector length from 6 to your variable count
and the A40 to the length of your longest embedded string.
SUBSTR may need to be changed to CHAR.SUBSTR -or maybe not-???

* Another General Parser *.
* NON PiThong version ;-).
DATA LIST / X 1-255 (A).
BEGIN DATA
"XXX"|"MARTIN DARVIN "DAN" DXXXX-2248"|"0"|"1"|"Client 2"|"GG"
END DATA.

VECTOR PARSED(6, A40).
COMPUTE #0=0.
LOOP.
COMPUTE #1=INDEX(X,'|').
COMPUTE #0=#0+1.
IF #1>0 PARSED(#0)=SUBSTR(X,1,#1-1).
COMPUTE X=SUBSTR(X,#1+1).
END LOOP IF #1=0.
COMPUTE PARSED(#0)=X.
MATCH FILES / FILE * / DROP X.

LIST.
PARSED1: "XXX"
PARSED2: "MARTIN DARVIN "DAN" DXXXX-2248"
PARSED3: "0"
PARSED4: "1"
PARSED5: "Client 2"
PARSED6: "GG"

Number of cases read: 1 Number of cases listed: 1
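
For readers more at home in Python, the INDEX/SUBSTR loop above can be sketched roughly as follows. This only illustrates the same idea (the function name split_at_pipes is hypothetical) and is not a replacement for the syntax. Like the SPSS version, it leaves the qualifiers in place.

```python
def split_at_pipes(line, delimiter="|"):
    """Cut a line at every delimiter, ignoring double quotes entirely."""
    fields = []
    rest = line
    while True:
        # str.find plays the role of INDEX(X,'|'); it returns -1 when
        # no delimiter is left (INDEX returns 0 in that case).
        pos = rest.find(delimiter)
        if pos == -1:
            fields.append(rest)  # last field: whatever remains
            break
        fields.append(rest[:pos])  # like SUBSTR(X,1,#1-1)
        rest = rest[pos + 1:]      # like SUBSTR(X,#1+1)
    return fields


line = '"XXX"|"MARTIN DARVIN "DAN" DXXXX-2248"|"0"|"1"|"Client 2"|"GG"'
parsed = split_at_pipes(line)
for number, value in enumerate(parsed, start=1):
    print("PARSED%d: %s" % (number, value))
```

In everyday Python, line.split("|") gives the same list; the explicit loop is shown only because it mirrors the syntax step by step.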

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Art Kendall

Re: Problem Reading File

If you want to change all double quotes to blanks,
think about something like this:
Read each line as one long string.
Use the REPLACE function to replace the double quotes.
Write the new string to a .txt file.
Read the new .txt file as pipe-delimited.

REPLACE. REPLACE(a1, a2, a3[, a4]). String. In a1, instances of a2 are replaced with a3. The optional argument a4 specifies the number of occurrences to replace; if a4 is omitted, all occurrences are replaced. Arguments a1, a2, and a3 must resolve to string values (literal strings enclosed in quotes or string variables), and the optional argument a4 must resolve to a non-negative integer. For example, REPLACE("abcabc", "a", "x") returns a value of "xbcxbc" and REPLACE("abcabc", "a", "x", 1) returns a value of "xbcabc".
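
Since the file is reported to be huge, the replace step could also be done outside SPSS, one line at a time, so that the whole file never has to sit in memory. This is a rough sketch under assumptions: the function name, the file names, the two-column demo header, and the UTF-8 encoding are placeholders, not taken from the thread.

```python
import os
import tempfile


def blank_out_quotes(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path line by line, replacing each '"' with a blank."""
    n_lines = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(line.replace('"', " "))
            n_lines += 1
    return n_lines


# Small demonstration on a throwaway file (contents are illustrative only).
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, "extract.csv")
dst_file = os.path.join(workdir, "extract_noquotes.txt")
with open(src_file, "w", encoding="utf-8", newline="") as f:
    f.write('"ID"|"NAME"\n')
    f.write('"XXX"|"MARTIN DARVIN "DAN" DXXXX-2248"\n')

lines_written = blank_out_quotes(src_file, dst_file)
with open(dst_file, encoding="utf-8", newline="") as f:
    cleaned = f.read()
print(lines_written, "lines written")
```

Each field in the new file carries a leading and a trailing blank where the qualifiers were, so the /QUALIFIER subcommand would be dropped when reading it.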

Art Kendall
Social Research Consultants

On 9/12/2010 9:16 AM, David Marso wrote:
