I am trying to reorganise data in a messy file where the variable names are listed, followed by a few lines of data values for different cases, then another row of variable names and then one or more rows of corresponding data values. I have been trying to use a VECTOR-LOOP-DO IF and LAG(vectorname, n) combination of commands but just can't get them to work. Any hints, please? An example is shown below.

INDIVIDUAL_BARCODE ID_CONSENT Q101 Q102 Q103 Q104A Q104_OTHER Q105 Q106 AGE_DIFFERENCE Q201
BAISIV 12397-01 1 2 52 1 10 Working at namola leuba 6 2 2
BAISIV 12397-02 1 1 13 3 9 1 2 2
BAISIV 12397-03 1 1 13 3 9 1 2 2
BAISIV 12397-05 1 2 17 4 9 1 2 2
BAISIV 12397-06 1 1 15 4 9 1 2 2
INDIVIDUAL_BARCODE ID_CONSENT Q101 Q102 Q103 Q104A Q104B Q105 Q106 AGE_DIFFERENCE Q201
BAISIV 11886-03 1 1 24 5 2 Clothing retailer 1 2 2
BAISIV 11886-04 1 1 18 5 9 1 2 2
BAISIV 11886-05 1 1 13 3 9 1 2 2
BAISIV 11886-07 2
INDIVIDUAL_BARCODE ID_CONSENT Q101 Q102 Q103 Q104A Q105 Q106 AGE_DIFFERENCE Q201 Q203aa
BAISIV 12398-01 1 2 29 4 5 1 2 2
BAISIV 12398-02 1 2 19 5 9 1 2 2
BAISIV 12398-07 2
BAISIV 12399-01 1 1 57 4 1 Veterenary officer 6 6 34 23
INDIVIDUAL_BARCODE ID_CONSENT Q101 Q102 Q103 Q104A Q105 Q106 AGE_DIFFERENCE Q201 Q202A
BAISIV 12396-01 1 1 24 6 5 1 2 1 1
BAISIV 12396-03 1 2 20 5 10 1 2 1 2
BAISIV 12396-03 1 2 20 5 10 1 2 1 2
BAISIV 12326-01 1 2 54 4 6 1 2 2

--
Professor Ntonghanwah Forcheh
Department of Statistics, University of Botswana
Private Bag UB00705, Gaborone, Botswana.
Office: +267 355 2696, Mobile: Orange +267 75 26 2963, Bmobile: 73181378, Mascom 754 21238; fax: +267 3185099; Alternative Email: [hidden email]
*Honesty is a Virtue, Freedom of the Mind is Power. Motto: Never be afraid to be honest, Never lie to yourself, Trust in the Truth and you will be forever free.* |
First, there is the question of how these lines of data are organized.
I think: this is like comma-delimited data, where "aaa" is the regular delimiter. When a line ends with "aaa", there is a missing value following that "aaa".

I think: there is a full set of possible questions, in a specific order, and each "set" of lines following "INDIVIDUAL..." has a subset of those.

I think: I don't know what would be intended by a line shown as all "aaa" markers, except a useless line of all missing values.

For processing, I would start by adding a couple of counters to the start of each line, one for Set and one for Line number; the "INDIVIDUAL..." lines would be numbered as 0. The rest of the line is processed as text and partitioned (A25 or larger) into the total number of variables, with the "aaa" removed. Most of these appear to hold small integers, but this way you don't have to predict where the long string answers occur.

Then I would pull out the "INDIVIDUAL..." lines to a separate file called INDEX, leaving behind DATAONLY. Using a master list of questions, I would match against INDEX to convert every VARNAME into its proper sequential number among the full list of questions, giving file INDEX#. That list can be MATCHed with /TABLE=INDEX# ... by set number back to DATAONLY. DATAONLY + INDEX# then has the variables, plus the columns where the questions belong in the full set. From there it is straightforward to use LOOP and VECTOR to assign each value to its proper slot, and finally to convert strings to numbers, after checking that the file was created with proper consistency.

Does this handle the problem?

--
Rich Ulrich

Date: Sun, 22 Sep 2013 23:11:05 +0200
From: [hidden email]
Subject: Extracting data from a messy file.
To: [hidden email]

I am trying to reorganise data in a messy file where the variable names are listed, followed by a few lines of data values for different cases, then another row of variable names and then one or more rows of corresponding data values.
Been trying to use a vector-LOOP-DO IF and LAG(vectorname, n) combination of commands but just can't get them to work. Any hints please? An example is shown below.

INDIVIDUAL_BARCODEaaa ID_CONSENTaaa Q101aaa Q102aaa Q103aaa Q104Aaaa Q104_OTHERaaa Q105aaa Q106aaa AGE_DIFFERENCEaaa Q201
BAISIV 12397-01aaa 1aaa 2aaa 52aaa 1aaa 10aaa Working at namola leubaaaa 6aaa 2aaa aaa 2
BAISIV 12397-02aaa 1aaa 1aaa 13aaa 3aaa 9aaa aaa 1aaa 2aaa aaa 2
BAISIV 12397-03aaa 1aaa 1aaa 13aaa 3aaa 9aaa aaa 1aaa 2aaa aaa 2
BAISIV 12397-05aaa 1aaa 2aaa 17aaa 4aaa 9aaa aaa 1aaa 2aaa aaa 2
BAISIV 12397-06aaa 1aaa 1aaa 15aaa 4aaa 9aaa aaa 1aaa 2aaa aaa 2
INDIVIDUAL_BARCODEaaa ID_CONSENTaaa Q101aaa Q102aaa Q103aaa Q104Aaaa Q104Baaa Q105aaa Q106aaa AGE_DIFFERENCEaaa Q201
BAISIV 11886-03aaa 1aaa 1aaa 24aaa 5aaa 2aaa Clothing retaileraaa 1aaa 2aaa aaa 2
BAISIV 11886-04aaa 1aaa 1aaa 18aaa 5aaa 9aaa aaa 1aaa 2aaa aaa 2
BAISIV 11886-05aaa 1aaa 1aaa 13aaa 3aaa 9aaa aaa 1aaa 2aaa aaa 2
BAISIV 11886-07aaa 2aaa aaa aaa aaa aaa aaa aaa aaa aaa
INDIVIDUAL_BARCODEaaa ID_CONSENTaaa Q101aaa Q102aaa Q103aaa Q104Aaaa Q105aaa Q106aaa AGE_DIFFERENCEaaa Q201aaa Q203aa
BAISIV 12398-01aaa 1aaa 2aaa 29aaa 4aaa 5aaa 1aaa 2aaa aaa 2aaa
BAISIV 12398-02aaa 1aaa 2aaa 19aaa 5aaa 9aaa 1aaa 2aaa aaa 2aaa
BAISIV 12398-07aaa 2aaa aaa aaa aaa aaa aaa aaa aaa aaa
BAISIV 12399-01aaa 1aaa 1aaa 57aaa 4aaa 1aaa Veterenary officeraaa 6aaa 6aaa 34aaa 23
INDIVIDUAL_BARCODEaaa ID_CONSENTaaa Q101aaa Q102aaa Q103aaa Q104Aaaa Q105aaa Q106aaa AGE_DIFFERENCEaaa Q201aaa Q202A
BAISIV 12396-01aaa 1aaa 1aaa 24aaa 6aaa 5aaa 1aaa 2aaa aaa 1aaa 1
BAISIV 12396-03aaa 1aaa 2aaa 20aaa 5aaa 10aaa 1aaa 2aaa aaa 1aaa 2
BAISIV 12396-03aaa 1aaa 2aaa 20aaa 5aaa 10aaa 1aaa 2aaa aaa 1aaa 2
aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa aaa
BAISIV 12326-01aaa 1aaa 2aaa 54aaa 4aaa 6aaa 1aaa 2aaa aaa 2aaa
|
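Rich's Set/Line-counter numbering step can be sketched outside SPSS. This hypothetical Python fragment (the function name and sample lines are invented for illustration; only the "aaa" delimiter comes from the posted data) shows how each header line opens a new set and each data line is counted within it:

```python
# Hypothetical sketch of the Set / Line-number counters described above.
# The "aaa" delimiter comes from the posted data; the function and the
# sample lines are invented for illustration.

def number_lines(lines):
    """Tag each raw line with (set_number, line_number, fields).

    Header lines (starting with INDIVIDUAL_BARCODE) open a new set and
    get line number 0; data lines count up from 1 within their set.
    """
    numbered = []
    set_no = 0
    line_no = 0
    for line in lines:
        fields = [f.strip() for f in line.split("aaa")]
        if fields and fields[0] == "INDIVIDUAL_BARCODE":
            set_no += 1
            line_no = 0
        else:
            line_no += 1
        numbered.append((set_no, line_no, fields))
    return numbered

sample = [
    "INDIVIDUAL_BARCODEaaaID_CONSENTaaaQ101",
    "BAISIV 12397-01aaa1aaa2",
    "INDIVIDUAL_BARCODEaaaID_CONSENTaaaQ102",
    "BAISIV 11886-03aaa1aaa24",
]
numbered = number_lines(sample)
```

From the numbered tuples, the header lines (line number 0) can be pulled into the INDEX file and the rest left behind as DATAONLY, as the reply describes.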
In reply to this post by Ntonghanwah Forcheh
Hi,

I hacked this together and it seems to do the job. Output is in a csv file; see here if the indentation gets messed up: http://pastebin.com/WvHxF44m

import os
import csv
import pprint

# assumes that nobody has filled in a question using 4 consecutive spaces
def read(infile):
    with open(infile, "rb") as f:
        records = {}
        allvarnames = set()
        for line in f.readlines():
            values = None
            if line.startswith("INDIVIDUAL_BARCODE"):
                varnames = line.rstrip("\n").split(" " * 4)
                allvarnames.update(varnames)
            else:
                values = line.rstrip("\n").split(" " * 4)
                id_value = values[0]
            if values:
                record = dict(zip(varnames, values))
                records[id_value] = record
        #pprint.pprint(records)
        return allvarnames, records

def write(data, outfile, missing=""):
    allvarnames, records = data
    with open(outfile, "wb") as w:
        writer = csv.writer(w)
        allvarnames = sorted(list(allvarnames))
        writer.writerow(allvarnames)
        for id_value, record in records.items():
            merged = []
            for varname in allvarnames:
                merged.append(record.get(varname, missing))
            writer.writerow(merged)

if __name__ == "__main__":
    infile = os.path.expanduser("~/Desktop/forcheh.txt")
    outfile = os.path.expanduser("~/Desktop/forcheh_out.csv")
    write(read(infile), outfile)

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
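The core move in that script — zip each header with its data row into a dict, then write every record against the sorted union of all variable names — can be demonstrated in isolation. The two sample records below are made-up fragments of the posted data:

```python
# Isolated demonstration of the merge step in the script above:
# each record is a dict keyed by the header it was read under, and the
# output rows are built against the sorted union of all variable names.
# The two sample records are made-up fragments of the posted data.

records = {
    "BAISIV 12397-01": {"ID": "BAISIV 12397-01", "Q101": "2", "Q104_OTHER": "Working"},
    "BAISIV 11886-03": {"ID": "BAISIV 11886-03", "Q101": "1", "Q104B": "Clothing"},
}

allvarnames = sorted({name for rec in records.values() for name in rec})

rows = []
for rec in records.values():
    # .get() fills variables this record never saw with "" (missing)
    rows.append([rec.get(name, "") for name in allvarnames])
```

Because each record only knows the variables from its own header block, `dict.get` with a missing-value default is what lines every case up under one common set of columns.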
|
In reply to this post by Ntonghanwah Forcheh
Shalom
Here is an example of reading the "messy file" using SPSS syntax.

* >>> use a text editor to change the delimiters ( 4 spaces ) to commas .
GET DATA
  /TYPE=TXT
  /FILE="L:\spss\messy_file.txt"
  /DELCASE=LINE
  /DELIMITERS=","
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
    V1 A18 V2 A10 V3 A4 V4 A4 V5 A4 V6 A5 V7 A23 V8 A4 V9 A14 V10 A14 V11 A6.
CACHE.
EXECUTE.
* >>>> if there are lines with more than 11 variables, add them .
DATASET NAME messy.
ALTER TYPE v1 to v11 (a30).
STRING #v1 to #v11 id vn1 vn2 (a30).
DO IF index(v1,'INDIVIDUAL_BARCODE') gt 0.
  DO REPEAT vvv=v1 to v11 / vv_vv=#v1 to #v11.
    COMPUTE vv_vv=vvv.
  END REPEAT.
ELSE.
  VECTOR vv2=v1 to v11.
  VECTOR vv1=#v1 to #v11.
  LOOP i=1 to 11.
    COMPUTE id=vv2(1).
    COMPUTE vn1=vv1(i).
    COMPUTE vn2=vv2(i).
    XSAVE OUTFILE="L:\spss\messy_file2.sav" /KEEP=id vn1 vn2.
  END LOOP.
END IF.
EXECUTE.
GET FILE="L:\spss\messy_file2.sav".
SORT CASES BY id vn1 vn2.
ADD FILES FILE=* /BY id vn1 vn2 /FIRST=first.
* >>>>> there were duplicate values in the example; I deleted them .
SELECT IF first eq 1.
CASESTOVARS
  /ID=id
  /INDEX=vn1
  /GROUPBY=VARIABLE.
* >>>> use ADD FILES FILE=* /KEEP=<variable names> to restore the original variable order .
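For readers who don't run SPSS, the reshaping that the XSAVE loop and CASESTOVARS perform here — long (id, varname, value) records pivoted into one wide case per id — can be sketched in plain Python (the sample ids and values are invented for illustration):

```python
# Sketch of the long-to-wide pivot that CASESTOVARS performs above:
# each (id, varname, value) triple lands in one wide row per id.
# Sample ids and values are invented for illustration.

long_rows = [
    ("BAISIV 12397-01", "Q101", "2"),
    ("BAISIV 12397-01", "Q102", "52"),
    ("BAISIV 12398-01", "Q101", "2"),
]

wide = {}
for case_id, varname, value in long_rows:
    # setdefault creates the case's row on first sight of its id,
    # then each varname becomes a column in that row
    wide.setdefault(case_id, {})[varname] = value
```

Going through a long intermediate file, as the syntax does, is what makes the varying headers harmless: every value carries its own variable name, so the pivot can align columns regardless of which block the value came from.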
Here is the file I used:

INDIVIDUAL_BARCODE,ID_CONSENT,Q101,Q102,Q103,Q104A,Q104_OTHER,Q105,Q106,AGE_DIFFERENCE,Q201
BAISIV 12397-01,1,2,52,1,10,Working at namola leuba,6,2,,2
BAISIV 12397-02,1,1,13,3,9,,1,2,,2
BAISIV 12397-03,1,1,13,3,9,,1,2,,2
BAISIV 12397-05,1,2,17,4,9,,1,2,,2
BAISIV 12397-06,1,1,15,4,9,,1,2,,2
INDIVIDUAL_BARCODE,ID_CONSENT,Q101,Q102,Q103,Q104A,Q104B,Q105,Q106,AGE_DIFFERENCE,Q201
BAISIV 11886-03,1,1,24,5,2,Clothing retailer,1,2,,2
BAISIV 11886-04,1,1,18,5,9,,1,2,,2
BAISIV 11886-05,1,1,13,3,9,,1,2,,2
BAISIV 11886-07,2,,,,,,,,
INDIVIDUAL_BARCODE,ID_CONSENT,Q101,Q102,Q103,Q104A,Q105,Q106,AGE_DIFFERENCE,Q201,Q203aa
BAISIV 12398-01,1,2,29,4,5,1,2,,2
BAISIV 12398-02,1,2,19,5,9,1,2,,2
BAISIV 12398-07,2,,,,,,,,
BAISIV 12399-01,1,1,57,4,1,Veterenary officer,6,6,34,23
INDIVIDUAL_BARCODE,ID_CONSENT,Q101,Q102,Q103,Q104A,Q105,Q106,AGE_DIFFERENCE,Q201,Q202A
BAISIV 12396-01,1,1,24,6,5,1,2,,1,1
BAISIV 12396-03,1,2,20,5,10,1,2,,1,2
BAISIV 12396-03,1,2,20,5,10,1,2,,1,2
BAISIV 12326-01,1,2,54,4,6,1,2,,2

Hillel Vardi
BGU

On 22/09/2013 23:11, Ntonghanwah Forcheh wrote:
|
In reply to this post by hillel vardi
Dear all, many thanks for the responses to my question. After posting, I "disappeared" and have only just returned to the office. I shall attempt the suggestions provided and give feedback very soon.
I notice that when the tabular data were posted, the table lines were converted to strings of aaa. These were not part of the data. My apologies.

forcheh

On Tue, Sep 24, 2013 at 12:04 AM, hillel vardi <[hidden email]> wrote:
|
The presence of the "aaa" strings was useful, in this way:
It showed that there is *some* delimiter between fields, not merely some blanks. It is a delimiter that you will conveniently make use of when reading the input. It identifies the text field, and it marks off some included missing fields.

--
Rich Ulrich

Date: Thu, 26 Sep 2013 15:31:04 +0200
From: [hidden email]
Subject: Re: Extracting data from a messy file.
To: [hidden email]

Dear all, many thanks for the responses to my question. After posting, I "disappeared" and have only just returned to the office. I shall attempt the suggestions provided and give feedback very soon.
I notice that when the tabular data were posted, the table lines were converted to strings of aaa. These were not part of the data. My apologies. forcheh
[snip, previous] |
________________________________
> From: Rich Ulrich <[hidden email]>
> To: [hidden email]
> Sent: Thursday, September 26, 2013 6:33 PM
> Subject: Re: [SPSSX-L] Extracting data from a messy file.
>
> The presence of the "aaa" strings was useful, in this way:
>
> It showed that there is *some* delimiter between fields, not merely
> some blanks. It is a delimiter that you will conveniently make use of
> when reading the input. It identifies the text field, and it marks
> off some included Missing fields.

There was no aaa in the plain text version I received. I concluded that 4 spaces were the delimiter, and the resulting file looked right. |
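The distinction the two readings rest on is easy to check: splitting on an explicit delimiter preserves empty (missing) fields, while a bare whitespace split collapses them and breaks multi-word values. The sample record below is invented for illustration:

```python
# An explicit 4-space delimiter keeps empty (missing) fields intact;
# a plain whitespace split loses them and breaks the two-word ID apart.
# The sample record is invented for illustration.

fields = ["BAISIV 11886-04", "1", "1", "18", "", "2"]
line = "    ".join(fields)       # 4-space delimited; one field is empty

by_delim = line.split(" " * 4)   # round-trips: missing field preserved
by_space = line.split()          # missing field gone, ID split in two
```

This is why both the Python script and the SPSS approach treat the four spaces (or the commas substituted for them) as a real delimiter rather than reading the file as freefield whitespace-separated data.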