Greetings,
I have a TXT file (over 4,000 lines) that I need to load into SPSS. Each record spans either 3 or 4 lines (note below that Anna Smith has no ADD2 line). Could someone help me write an INPUT PROGRAM/DATA LIST statement to load this data into SPSS?
Peter Young
ADD1: Long street 1
ADD2: Suite2
CITY: Chicago
Steven Old
ADD1: Park Place 7
ADD2: Suite5
CITY: New York
Anna Smith
ADD1: Main Street 7
CITY: New York
Jessica Martin
ADD1: Townsend road
ADD2: Suite8
CITY: Newark
And I need to load it into SPSS so that I end up with
NAME            ADDRESS1             ADDRESS2      CITY
Peter Young     ADD1: Long street 1  ADD2: Suite2  CITY: Chicago
Steven Old      ADD1: Park Place 7   ADD2: Suite5  CITY: New York
Anna Smith      ADD1: Main Street 7                CITY: New York
Jessica Martin  ADD1: Townsend road  ADD2: Suite8  CITY: Newark
Thanks in advance, Tibor
Tibor Tóth, Ph.D.
Center for Applied Demography & Survey Research
University of Delaware
285C Graham Hall
Newark, DE 19716
phone: (302)831-3320
e-mail: [hidden email]
|
Administrator
|
Why not keep it simple?
DATA LIST / txt (A50).
BEGIN DATA
Peter Young
ADD1: Long street 1
ADD2: Suite2
CITY: Chicago
Steven Old
ADD1: Park Place 7
ADD2: Suite5
CITY: New York
Anna Smith
ADD1: Main Street 7
CITY: New York
Jessica Martin
ADD1: Townsend road
ADD2: Suite8
CITY: Newark
END DATA.
STRING Name Address1 Address2 City (A50).
* Copy each tagged line into its own variable.
IF CHAR.SUBSTR(txt,1,4) EQ "ADD1" Address1=txt.
IF CHAR.SUBSTR(txt,1,4) EQ "ADD2" Address2=txt.
IF CHAR.SUBSTR(txt,1,4) EQ "CITY" City=txt.
* A line that follows a CITY line starts a new record.
COMPUTE ID=1.
DO IF CHAR.SUBSTR(LAG(txt),1,4) EQ "CITY" AND ($CASENUM GT 1).
+ COMPUTE Name=txt.
+ COMPUTE ID=LAG(ID)+1.
ELSE.
+ IF ($CASENUM GT 1) ID=LAG(ID).
+ IF ($CASENUM EQ 1) Name=txt.
END IF.
* Collapse the lines of each record into one case.
DATASET DECLARE agg.
AGGREGATE OUTFILE agg /BREAK ID
 / Name Address1 Address2 City= MAX( Name Address1 Address2 City).
DATASET ACTIVATE agg.
LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Because INPUT PROGRAMs have special features
designed to handle this sort of data, and using an INPUT PROGRAM here would
make the logic clearer. In particular, REREAD and END CASE make handling
this easy.
I don't have time to write the program right now - maybe later.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
|
Administrator
|
In reply to this post by David Marso
My first thought was to use "FILE TYPE mixed file" (as in the examples shown here); but that would work a whole lot better if the lines showing the names started with "NAME:". So in the end, I was thinking along the same lines as David. I suggest the following small changes, though.
STRING Name Address1 Address2 City (A50).
IF CHAR.SUBSTR(txt,1,4) EQ "ADD1" Address1=CHAR.SUBSTR(txt,7).
IF CHAR.SUBSTR(txt,1,4) EQ "ADD2" Address2=CHAR.SUBSTR(txt,7).
IF CHAR.SUBSTR(txt,1,4) EQ "CITY" City=CHAR.SUBSTR(txt,7).

And later, after the AGGREGATE and before listing the results...

DATASET ACTIVATE agg.
ALTER TYPE Name TO City (AMIN).
FORMATS ID(F5.0).

This makes the output a bit tidier. E.g.,

   ID Name           Address1      Address2 City
    1 Peter Young    Long street 1 Suite2   Chicago
    2 Steven Old     Park Place 7  Suite5   New York
    3 Anna Smith     Main Street 7          New York
    4 Jessica Martin Townsend road Suite8   Newark

Number of cases read:  4    Number of cases listed:  4

HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by David Marso
In a similar vein:
data list list (";") /tempvar (a100).
begin data
Peter Young
ADD1: Long street 1
ADD2: Suite2
CITY: Chicago
Steven Old
ADD1: Park Place 7
ADD2: Suite5
CITY: New York
Anna Smith
ADD1: Main Street 7
CITY: New York
Jessica Martin
ADD1: Townsend road
ADD2: Suite8
CITY: Newark
end data.
string name address1 address2 city(a100).
if char.index(tempvar, "ADD1:")>0 address1=tempvar.
if char.index(tempvar, "ADD2:")>0 address2=tempvar.
if char.index(tempvar, "CITY:")>0 city=tempvar.
if char.index(tempvar, "ADD")=0 and char.index(tempvar, "CITY")=0 name=tempvar.
execute.
if char.length(name)=0 name=lag(name).
DATASET DECLARE aggfile.
AGGREGATE
 /OUTFILE='aggfile'
 /BREAK=name
 /address1=MAX(address1)
 /address2=MAX(address2)
 /city=MAX(city).

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]
|
Administrator
|
I don't like that version since name might not be unique in the file.
That's why I created the ID variable to use as a BREAK.
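To illustrate the concern about non-unique names, here is a minimal sketch using made-up data. The second Anna Smith, the ID values, and the dataset names byname and byid are assumptions for the example, not taken from the original file. Breaking on Name should merge the two people into a single row, whereas breaking on ID keeps them separate.

```spss
* Hypothetical data: two different people who share a name.
DATA LIST LIST / ID (F2.0) Name (A20) City (A20).
BEGIN DATA
1 "Anna Smith" "New York"
2 "Anna Smith" "Newark"
END DATA.

* Breaking on Name collapses both people into one row.
DATASET DECLARE byname.
AGGREGATE OUTFILE=byname /BREAK=Name /City=MAX(City).

* Breaking on ID keeps one row per person.
DATASET DECLARE byid.
AGGREGATE OUTFILE=byid /BREAK=ID /Name=MAX(Name) /City=MAX(City).

DATASET ACTIVATE byname.
LIST.
DATASET ACTIVATE byid.
LIST.
```

Both AGGREGATE commands run before either DATASET ACTIVATE, because activating another dataset closes an unnamed working dataset.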
Administrator
|
In reply to this post by Bruce Weaver
Incorporating Bruce's suggestions and simplifying a few elements results in
STRING Name Address1 Address2 City (A50).
DO REPEAT rectype= "ADD1" "ADD2" "CITY" /var=Address1 Address2 City.
+ IF CHAR.SUBSTR(txt,1,4) EQ rectype var=CHAR.SUBSTR(txt,7).
END REPEAT.
COMPUTE ID=MAX(LAG(ID),SUM(1,LAG(ID)*(CHAR.SUBSTR(LAG(txt),1,4) EQ "CITY"))).
IF NOT(ANY(CHAR.SUBSTR(txt,1,4),"ADD1","ADD2","CITY")) Name=txt.
ALTER TYPE Name Address1 Address2 City (AMIN).
DATASET DECLARE agg.
AGGREGATE OUTFILE agg /BREAK ID
 / Name Address1 Address2 City= MAX( Name Address1 Address2 City).
DATASET ACTIVATE agg.
LIST.
In reply to this post by Tibor Toth
Here is an input program solution. It works by reading each record according to its declared type; if no type matches, the record is the name.
This mechanism is much more powerful than what is used here: each record type could have an entirely different set of variables. Still, this approach should make it very clear exactly what input is expected.

input program.
data list file="c:/temp/data.txt" / prefix(a5).
do if prefix eq "ADD1:".
reread column=6.
data list /add1(a50).
else if prefix eq "ADD2:".
reread column=6.
data list /add2(a50).
else if prefix eq "CITY:".
reread column=6.
data list / city(a50).
end case.
else.
reread column=1.
data list /name(a50).
end if.
end input program.
list.
|
Administrator
|
Very nice, Jon. I was not aware of REREAD--it looks quite useful. (I'd still add the ALTER TYPE with AMIN bit at the end though.) ;-)
David, great use of DO-REPEAT in your latest post too. It's much tidier than the original version.
Administrator
|
In reply to this post by Jon K Peck
<modified for readability 05/16/2014 22:30>
Here is a rewrite of Jon's program using a VECTOR approach, with the record types used to calculate indexes into the vector. It seems to me that this is a rather fragile data structure, in that things will go into the crapper if a case is missing a CITY: line. This observation also applies to the code I posted yesterday.

INPUT PROGRAM.
STRING name address1 address2 city (A50).
VECTOR stuff=address1 TO city.
DATA LIST / #prefix(a5).
COMPUTE #vtype=CHAR.INDEX("ADD1:ADD2:CITY:",#prefix).
DO IF #vtype=0.
+ REREAD.
+ DATA LIST /name (A50).
ELSE.
+ REREAD column=7.
+ DATA LIST / #stuff (A50).
+ COMPUTE stuff((#vtype + 4)/5 )=#stuff.
END IF.
DO IF #prefix="CITY:".
+ END CASE.
END IF.
END INPUT PROGRAM.
ALTER TYPE name address1 address2 city (AMIN).
BEGIN DATA
Peter Young
ADD1: Long street 1
ADD2: Suite2
CITY: Chicago
Steven Old
ADD1: Park Place 7
ADD2: Suite5
CITY: New York
Anna Smith
ADD1: Main Street 7
CITY: New York
Jessica Martin
ADD1: Townsend road
ADD2: Suite8
CITY: Newark
END DATA.
LIST.
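One way to reduce the fragility noted above is to start a new record at each name line instead of after each CITY line. The following is only a sketch along the lines of the AGGREGATE approach earlier in the thread, not tested code: the sample data (with Anna Smith's CITY line removed), the variable isname, and the dataset name agg2 are assumptions made for illustration. It also assumes that no name begins with one of the three prefixes.

```spss
* Hypothetical data: the second record has no CITY line.
DATA LIST / txt (A50).
BEGIN DATA
Peter Young
ADD1: Long street 1
CITY: Chicago
Anna Smith
ADD1: Main Street 7
Jessica Martin
ADD1: Townsend road
CITY: Newark
END DATA.

STRING Name Address1 Address2 City (A50).
* A line with none of the three prefixes is treated as a name line.
COMPUTE isname = 1 - ANY(CHAR.SUBSTR(txt,1,5),"ADD1:","ADD2:","CITY:").
* Start a new ID at every name line rather than after every CITY line.
COMPUTE ID = SUM(LAG(ID), isname).
IF (isname EQ 1) Name = txt.
DO REPEAT rectype = "ADD1:" "ADD2:" "CITY:" / var = Address1 Address2 City.
+ IF CHAR.SUBSTR(txt,1,5) EQ rectype var = CHAR.SUBSTR(txt,7).
END REPEAT.

* Collapse the lines of each record into one case.
DATASET DECLARE agg2.
AGGREGATE OUTFILE=agg2 /BREAK=ID
 /Name Address1 Address2 City = MAX(Name Address1 Address2 City).
DATASET ACTIVATE agg2.
LIST.
```

With this layout, a record that lacks a CITY line should come through with City blank rather than being merged into the following record.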