loading TXT file where 3 OR 4 lines represent one record

Tibor Toth

Greetings,

I have a TXT file (over 4,000 lines) that I need to load into SPSS. Each record takes up either 3 or 4 lines (note below that Anna Smith has no ADD2 line). Could someone help me write an INPUT PROGRAM/DATA LIST statement to load this data into SPSS?

Peter Young
ADD1: Long street 1
ADD2: Suite2
CITY: Chicago
Steven Old
ADD1: Park Place 7
ADD2: Suite5
CITY: New York
Anna Smith
ADD1: Main Street 7
CITY: New York
Jessica Martin
ADD1: Townsend road
ADD2: Suite8
CITY: Newark

I need to load it into SPSS so that I end up with:

NAME            ADDRESS1             ADDRESS2      CITY
Peter Young     ADD1: Long street 1  ADD2: Suite2  CITY: Chicago
Steven Old      ADD1: Park Place 7   ADD2: Suite5  CITY: New York
Anna Smith      ADD1: Main Street 7                CITY: New York
Jessica Martin  ADD1: Townsend road  ADD2: Suite8  CITY: Newark

Thanks in advance,
Tibor

Tibor Tóth, Ph.D.
Center for Applied Demography & Survey Research
University of Delaware
285C Graham Hall
Newark, DE 19716
phone: (302)831-3320
e-mail: [hidden email]


Re: loading TXT file where 3 OR 4 lines represent one record

David Marso
Administrator
Why not keep it simple?
DATA LIST / txt (A50).
BEGIN DATA
Peter Young
ADD1: Long street 1
ADD2: Suite2
CITY: Chicago
Steven Old
ADD1: Park Place 7
ADD2: Suite5
CITY: New York
Anna Smith
ADD1: Main Street 7
CITY: New York
Jessica Martin
ADD1: Townsend road
ADD2: Suite8
CITY: Newark
END DATA.

STRING Name Address1 Address2 City (A50).
* Copy each tagged line into its own variable.
IF CHAR.SUBSTR(txt,1,4) EQ "ADD1" Address1=txt.
IF CHAR.SUBSTR(txt,1,4) EQ "ADD2" Address2=txt.
IF CHAR.SUBSTR(txt,1,4) EQ "CITY" City=txt.
* A line that follows a CITY line starts a new record: it holds the name and gets the next ID.
COMPUTE ID=1.
DO IF CHAR.SUBSTR(LAG(txt),1,4) EQ "CITY" AND ($CASENUM GT 1).
COMPUTE Name=txt.
COMPUTE ID=LAG(ID)+1.
ELSE.
IF ($CASENUM GT 1) ID=LAG(ID).
IF ($CASENUM EQ 1) Name=txt.
END IF.
* Collapse the lines of each record into one case.
DATASET DECLARE agg.
AGGREGATE OUTFILE agg
 /BREAK ID
 / Name Address1 Address2 City=
 MAX( Name Address1 Address2 City).
DATASET ACTIVATE agg.

LIST.


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: loading TXT file where 3 OR 4 lines represent one record

Jon K Peck
Because INPUT PROGRAMs have special features designed to handle this sort of data, and using an INPUT PROGRAM here would make the logic clearer.  In particular, REREAD and END CASE make handling this easy.

I don't have time to write the program right now - maybe later.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        David Marso <[hidden email]>
To:        [hidden email],
Date:        05/15/2014 02:51 PM
Subject:        Re: [SPSSX-L] loading TXT file where 3 OR 4 lines represent one              record
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/loading-TXT-file-where-3-OR-4-lines-represent-one-record-tp5726089p5726092.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: loading TXT file where 3 OR 4 lines represent one record

Bruce Weaver
Administrator
In reply to this post by David Marso
My first thought was to use "FILE TYPE mixed file" (as in the examples shown here); but that would work a whole lot better if the lines showing the names started with "NAME:".  So in the end, I was thinking along the same lines as David.  I suggest the following small changes, though.


STRING Name Address1 Address2 City (A50).
IF CHAR.SUBSTR(txt,1,4) EQ "ADD1" Address1=CHAR.SUBSTR(txt,7).
IF CHAR.SUBSTR(txt,1,4) EQ "ADD2" Address2=CHAR.SUBSTR(txt,7).
IF CHAR.SUBSTR(txt,1,4) EQ "CITY" City=CHAR.SUBSTR(txt,7).


And later, after the AGGREGATE and before listing the results...

DATASET ACTIVATE agg.
ALTER TYPE Name to CITY(amin).
FORMATS ID(F5.0).


This makes the output a bit tidier.  E.g.,

   ID Name           Address1      Address2 City
 
    1 Peter Young    Long street 1 Suite2   Chicago
    2 Steven Old     Park Place 7  Suite5   New York
    3 Anna Smith     Main Street 7          New York
    4 Jessica Martin Townsend road Suite8   Newark
 
Number of cases read:  4    Number of cases listed:  4

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: loading TXT file where 3 OR 4 lines represent one record

Rick Oliver-3
In reply to this post by David Marso
In a similar vein:

data list list (";") /tempvar (a100).
begin data
Peter Young
ADD1: Long street 1
ADD2: Suite2
CITY: Chicago
Steven Old
ADD1: Park Place 7
ADD2: Suite5
CITY: New York
Anna Smith
ADD1: Main Street 7
CITY: New York
Jessica Martin
ADD1: Townsend road
ADD2: Suite8
CITY: Newark
end data.
string name address1 address2 city(a100).
if char.index(tempvar, "ADD1:")>0 address1=tempvar.
if char.index(tempvar, "ADD2:")>0 address2=tempvar.
if char.index(tempvar, "CITY:")>0 city=tempvar.
if char.index(tempvar, "ADD")=0 and char.index(tempvar, "CITY")=0
   name=tempvar.
execute.
if char.length(name)=0 name=lag(name).
DATASET DECLARE aggfile.
AGGREGATE
  /OUTFILE='aggfile'
  /BREAK=name
  /address1=MAX(address1)
  /address2=MAX(address2)
  /city=MAX(city).


Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]




From:        David Marso <[hidden email]>
To:        [hidden email],
Date:        05/15/2014 03:55 PM
Subject:        Re: loading TXT file where 3 OR 4 lines represent one record
Sent by:        "SPSSX(r) Discussion" <[hidden email]>







Re: loading TXT file where 3 OR 4 lines represent one record

David Marso
Administrator
I don't like that version since name might not be unique in the file.
That's why I created the ID variable to use as a BREAK.
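
To illustrate the point about non-unique names, here is a minimal Python sketch (not SPSS syntax) that mimics both BREAK choices on a made-up file in which two different people are called Anna Smith. The second Anna Smith record, the function names, and the rule of bumping the ID on every name line are assumptions for illustration only; the string MAX mirrors what AGGREGATE does in the programs above.

```python
# Hypothetical data: two different people share the name Anna Smith.
SAMPLE = [
    "Anna Smith",
    "ADD1: Main Street 7",
    "CITY: New York",
    "Anna Smith",
    "ADD1: Oak Avenue 3",
    "ADD2: Suite9",
    "CITY: Boston",
]

TAGS = ("ADD1", "ADD2", "CITY")


def tag_lines(lines):
    """Return (record_id, name, tag, value) for every tagged line."""
    rows = []
    record_id = 0
    name = ""
    for line in lines:
        if line[:4] in TAGS:
            # Skip the 4-character tag, the colon and the blank after it.
            rows.append((record_id, name, line[:4], line[6:]))
        else:
            # Assumption: any untagged line is a name and starts a new record.
            record_id += 1
            name = line
    return rows


def aggregate(rows, break_on):
    """Collapse rows per break value, keeping the MAX of each string field."""
    key_index = {"id": 0, "name": 1}[break_on]
    cases = {}
    for row in rows:
        case = cases.setdefault(
            row[key_index], {"name": row[1], "ADD1": "", "ADD2": "", "CITY": ""}
        )
        tag, value = row[2], row[3]
        case[tag] = max(case[tag], value)
    return list(cases.values())


rows = tag_lines(SAMPLE)
by_name = aggregate(rows, "name")  # like /BREAK=name
by_id = aggregate(rows, "id")      # like /BREAK ID
print(len(by_name), len(by_id))
```

Breaking on the name collapses the two Anna Smiths into a single case whose fields are mixed from both addresses, while breaking on the running ID keeps them apart.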

Re: loading TXT file where 3 OR 4 lines represent one record

David Marso
Administrator
In reply to this post by Bruce Weaver
Incorporating Bruce's suggestions and simplifying a few elements results in

STRING Name Address1 Address2 City (A50).

DO REPEAT rectype= "ADD1" "ADD2" "CITY" /var=Address1 Address2 City.
+  IF CHAR.SUBSTR(txt,1,4) EQ rectype var=CHAR.SUBSTR(txt,7).
END REPEAT.

COMPUTE ID=MAX(LAG(ID),SUM(1,LAG(ID)*(CHAR.SUBSTR(LAG(txt),1,4) EQ "CITY"))).

IF NOT(ANY(CHAR.SUBSTR(txt,1,4),"ADD1","ADD2","CITY")) Name=txt.

ALTER TYPE Name Address1 Address2 City (AMIN).

DATASET DECLARE agg.
AGGREGATE OUTFILE agg /BREAK ID / Name Address1 Address2 City= MAX( Name Address1 Address2 City).
DATASET ACTIVATE agg.
LIST.
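
For anyone puzzling over the one-line COMPUTE for ID, here is a rough Python restatement of what it does for each line. It is only a sketch, not SPSS syntax: the function name is made up, and the first line is special-cased to stand in for how SPSS treats the missing LAG(ID) on case 1 (a missing value times 0 is 0, so the SUM comes out as 1 and MAX ignores the missing argument).

```python
def assign_ids(lines):
    """Return the record ID for every line, following the COMPUTE ID rule."""
    ids = []
    prev_id = None   # LAG(ID): not available on the first line
    prev_line = ""   # LAG(txt): blank on the first line
    for line in lines:
        # 1 if the previous line was a CITY line (it closed a record), else 0.
        after_city = int(prev_line[:4] == "CITY")
        if prev_id is None:
            current = 1
        else:
            # MAX(LAG(ID), SUM(1, LAG(ID) * after_city))
            current = max(prev_id, 1 + prev_id * after_city)
        ids.append(current)
        prev_id, prev_line = current, line
    return ids


SAMPLE = [
    "Peter Young", "ADD1: Long street 1", "ADD2: Suite2", "CITY: Chicago",
    "Steven Old", "ADD1: Park Place 7", "ADD2: Suite5", "CITY: New York",
    "Anna Smith", "ADD1: Main Street 7", "CITY: New York",
    "Jessica Martin", "ADD1: Townsend road", "ADD2: Suite8", "CITY: Newark",
]

print(assign_ids(SAMPLE))
```

The ID stays put within a record and goes up by one on the line after each CITY line, which is what lets AGGREGATE break on it.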


Re: loading TXT file where 3 OR 4 lines represent one record

Jon K Peck
In reply to this post by Tibor Toth
Here is an input program solution.  It reads each line according to its declared type; a line that matches no type is taken to be the name.
This mechanism is much more powerful than what is used here, since each record type could have an entirely different set of variables, but this approach should make it very clear exactly what input is expected.

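input program.
* Read the first five columns to find out what kind of line this is.
data list file="c:/temp/data.txt" / prefix(a5).
do if prefix eq "ADD1:".
* Reread the same line, starting after the tag and the single blank that follows it.
reread column=7.
data list /add1(a50).
else if prefix eq "ADD2:".
reread column=7.
data list /add2(a50).
else if prefix eq "CITY:".
reread column=7.
data list / city(a50).
* CITY is the last line of a record, so the case is written here.
end case.
else.
* No tag matched, so the whole line is the name.
reread column=1.
data list /name(a50).
end if.
end input program.
list.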






From:        Tibor Toth <[hidden email]>
To:        [hidden email],
Date:        05/15/2014 02:16 PM
Subject:        [SPSSX-L] loading TXT file where 3 OR 4 lines represent one record
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





Re: loading TXT file where 3 OR 4 lines represent one record

Bruce Weaver
Administrator
Very nice, Jon.  I was not aware of REREAD--it looks quite useful.  (I'd still add the ALTER TYPE with AMIN bit at the end though.)  ;-)

David, great use of DO-REPEAT in your latest post too.  It's much tidier than the original version.



Re: loading TXT file where 3 OR 4 lines represent one record

David Marso
Administrator
In reply to this post by Jon K Peck
<modified for readability 05/16/2014 22:30>
Here is a rewrite of Jon's program using a VECTOR approach, with the record types used to calculate indexes into the vector.
It seems to me that this is a rather fragile data structure, in that things will go into the crapper if a case is missing a CITY: line. The program writes a case only when it reads CITY:, so a record without one would be merged with the record that follows.
This observation also applies to the code I posted yesterday.

INPUT PROGRAM.
STRING name address1 address2 city (A50).
VECTOR stuff=address1 TO city.

DATA LIST  / #prefix(a5).
* The prefix is found at position 1, 6 or 11 of the lookup string, or 0 for a name line.
COMPUTE #vtype=CHAR.INDEX("ADD1:ADD2:CITY:",#prefix).

DO IF #vtype=0.
+  REREAD.
+  DATA LIST /name (A50).
ELSE.
+  REREAD column=7.
+  DATA LIST / #stuff (A50).
* Positions 1, 6 and 11 map to vector elements 1, 2 and 3.
+  COMPUTE stuff((#vtype + 4)/5 )=#stuff.
END IF.

* CITY is the last line of a record, so the case is written here.
DO IF #prefix="CITY:".
+  END CASE.
END IF .
END INPUT PROGRAM.

BEGIN DATA
Peter Young
ADD1: Long street 1
ADD2: Suite2
CITY: Chicago
Steven Old
ADD1: Park Place 7
ADD2: Suite5
CITY: New York
Anna Smith
ADD1: Main Street 7
CITY: New York
Jessica Martin
ADD1: Townsend road
ADD2: Suite8
CITY: Newark
END DATA.
ALTER TYPE name address1 address2 city (AMIN).
LIST.
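
One way around the missing-CITY problem is to close a record when the next name line turns up rather than when CITY: is read. The following is a minimal Python sketch of that idea, not SPSS syntax; the function and field names are made up, and it assumes that no name line begins with ADD1:, ADD2: or CITY:.

```python
# Map each 5-character tag to the (hypothetical) output field it fills.
PREFIXES = {"ADD1:": "address1", "ADD2:": "address2", "CITY:": "city"}


def read_records(lines):
    """Group lines into records, starting a new record at every name line."""
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        field = PREFIXES.get(line[:5])
        if field is None:
            # Untagged line: treat it as a name, which closes the previous record.
            if current is not None:
                records.append(current)
            current = {"name": line.strip(), "address1": "", "address2": "", "city": ""}
        elif current is not None:
            # Tagged line: store the text after the tag in the open record.
            current[field] = line[5:].strip()
    if current is not None:
        records.append(current)  # the last record has no following name line
    return records


# Steven Old deliberately has no CITY: line here.
SAMPLE = [
    "Peter Young", "ADD1: Long street 1", "ADD2: Suite2", "CITY: Chicago",
    "Steven Old", "ADD1: Park Place 7",
    "Anna Smith", "ADD1: Main Street 7", "CITY: New York",
]

for record in read_records(SAMPLE):
    print(record)
```

Because the name line opens the record, a missing ADD2: or CITY: line just leaves that field blank instead of spilling into the next case.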