SPSSX Discussion

Help with complex coding issue

hohmanz

Help with complex coding issue

Hello, I need help figuring out how to write code to deal with a complex data issue. I have a dataset for each of the participants in my study (n = 100). For each participant/dataset I have over 900,000 data points (physiology data). I have a time variable (min), three physiology measures, and an indicator variable (5 = capture physiology data, 0 = do not capture physiology data). An example dataset would look like this:
min ch1 ch2 ch3 indicator
.000000000 .004882810 -.001373290 4.25568 5
.000016667 .003967290 -.002288820 4.26025 5
.000033333 .003662110 -.001220700 4.26025 5
.000050000 .002746580 -.000915527 4.25873 5
.000066667 .000762939 -.001983640 4.26178 5
.000083333 .000457764 -.001831050 4.25873 5
.000100000 .001831050 -.000305176 4.26025 5
.000116667 .004577640 -.001525880 4.25873 0
.000133333 .004882810 .000762939 4.25873 0
.000150000 .003509520 .000000000 4.25873 0
.000166667 .001983640 .000000000 4.25720 5
.000183333 .002136230 -.000610352 4.26025 5
.000200000 .001983640 .000305176 4.25720 0
.000216667 .001831050 -.001373290 4.25873 0
.000233333 .003051760 -.001068120 4.25568 0

My issue is that the datasets are huge (more than 900,000 cases) and I have 100 datasets, so I need to write code to get the data in the form that I need for my analyses.

So what I need to do is pull out the data into separate datasets. I need a dataset that is the first set of 5, another for the second set of 5, and then one for the last set of 5. Each set corresponds with a stimulus presentation, so I will need to aggregate the values for each set separately. I cannot figure out how to write a do loop that will pull out the first set of 5 into a dataset, the second set of 5s into another dataset, etc. Any suggestions would be greatly appreciated. Also, if I need to clarify anything, please let me know.

Bruce Weaver

Re: Help with complex coding issue

Administrator

You wrote: "My issue is that the datasets are huge (more than 900,000 cases) and I have 100 datasets, so I need to write code to get the data in the form that I need for my analyses."

I'm sorry, but I could not work out what form your data need to be in for the intended analyses. Could you generate a small example of what that final data file would look like (i.e., how many rows per subject, what the variables are, etc)? Thanks.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

F. J. Kelley

Re: Help with complex coding issue

In reply to this post by hohmanz

I found the description somewhat confusing too and an example would be a huge help. However, to a degree, this suggests "code to write code", which might be more easily done with a macro.
--Joe

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

David Marso

Re: Help with complex coding issue

Administrator

In reply to this post by hohmanz

"
So what I need to do is pull out the data into separate datasets. "
Why do you believe this is necessary?
I very much doubt that is really the case!
See MOD function and AGGREGATE command?
---

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

David Marso

Re: Help with complex coding issue

Administrator

Oops, not MOD, but rather TRUNC after division by 5 and reduction by a smidge.
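* Toy data: 34 values read into a single variable x.
DATA LIST FREE / x.
BEGIN DATA
2 5 5 6 3 2 5 7 3 2 6 6 7 2 5 6 5 5 7 2 3 5 5 7 3 5 7 7 6 3 6 7 2 3
END DATA.
* Number the cases, then put each consecutive block of 5 cases into group 0, 1, 2, and so on.
COMPUTE @=$CASENUM.
COMPUTE gp=TRUNC(@/5-.0001).
LIST.
* Replace the active dataset with one row per group.
AGGREGATE OUTFILE=* /BREAK=gp /Mean_X=MEAN(x) /Med_X=MEDIAN(x).
LIST.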

X @ GP

2.00 1.00 .00
5.00 2.00 .00
5.00 3.00 .00
6.00 4.00 .00
3.00 5.00 .00
2.00 6.00 1.00
5.00 7.00 1.00
7.00 8.00 1.00
3.00 9.00 1.00
2.00 10.00 1.00
6.00 11.00 2.00
6.00 12.00 2.00
7.00 13.00 2.00
2.00 14.00 2.00
5.00 15.00 2.00
6.00 16.00 3.00
5.00 17.00 3.00
5.00 18.00 3.00
7.00 19.00 3.00
2.00 20.00 3.00
3.00 21.00 4.00
5.00 22.00 4.00
5.00 23.00 4.00
7.00 24.00 4.00
3.00 25.00 4.00
5.00 26.00 5.00
7.00 27.00 5.00
7.00 28.00 5.00
6.00 29.00 5.00
3.00 30.00 5.00
6.00 31.00 6.00
7.00 32.00 6.00
2.00 33.00 6.00
3.00 34.00 6.00

Number of cases read: 34 Number of cases listed: 34

After AGGREGATE:

GP MEAN_X MED_X

.00 4.20 5.00
1.00 3.80 3.00
2.00 5.20 6.00
3.00 5.00 5.00
4.00 4.60 5.00
5.00 5.60 6.00
6.00 4.50 4.50

Number of cases read: 7 Number of cases listed: 7

Rich Ulrich

Re: Help with complex coding issue

In reply to this post by hohmanz

Assuming that you want to do similar things with the different
sets of data, it will *probably* be handy, in the long run, to label
the lines as to which set they belong to, rather than break them
into different sets. Putting in the Casenumber and (Add 4; divide
by 5; truncate) gets you numbered sets of five. You can further
use MOD( ) to break out these sets, if they come in cycles of 3 or
whatever.

I would convert those times in (min) back to milliseconds in order to have it
readable as 1, 2, 3, .... Further, for readability and almost no loss of precision,
I would probably multiply the ch1 and ch2 by a million and round off.
"Hard to read" is a source of error that you should avoid whenever you can.

I'd consider doing something with the "4.25.... " numbers, but it is less
obvious what should be acceptable to all the people reading the data.

--
Rich Ulrich
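
A minimal sketch of the relabelling and rescaling suggested in the post above, in SPSS syntax. The variable names ms, set5, ch1_micro and ch2_micro are hypothetical, and the factor 60000 assumes that min really is recorded in minutes (the sample values step by .000016667 min, which is about 1 ms).

```spss
* Sketch only: ms, set5, ch1_micro and ch2_micro are hypothetical names.
* Time in whole milliseconds, assuming min is in minutes.
COMPUTE ms = RND(min * 60000).
* Numbered sets of five cases: add 4, divide by 5, truncate.
COMPUTE set5 = TRUNC(($CASENUM + 4) / 5).
* Rescale the two small channels so they read as integers.
COMPUTE ch1_micro = RND(ch1 * 1000000).
COMPUTE ch2_micro = RND(ch2 * 1000000).
EXECUTE.
```

With this formula, cases 1 to 5 get set5 = 1 and cases 6 to 10 get set5 = 2. Whether fixed sets of five are what the original poster actually needs is a separate question.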

Richard Ristow

Re: Help with complex coding issue

In reply to this post by hohmanz

At 03:55 PM 12/19/2014, hohmanz wrote:

>I have a dataset for each of my participants in a study (n = 100).
>For each participant/dataset I have over 900,000 data points
>(physiology data). I have a time variable (min), three physiology
>measures, and an indicator variable (5 = capture physiology data,
>and 0 = do not capture physiology data). So an example dataset would
>look like this
> min ch1 ch2 ch3 indicator
>.000000000 .004882810 -.001373290 4.25568 5
>.000016667 .003967290 -.002288820 4.26025 5
>.000033333 .003662110 -.001220700 4.26025 5

...
First of all, as Rich Ulrich wrote:
>it will *probably* be handy, in the long run, to label the lines as
>to which set they belong to, rather than break them into different sets.

I'd say this isn't handy; it's absolutely crucial. If I read you
correctly, currently each subject's data is in a separate file. You
should add whatever key identifies your subjects at the beginning of
every record of every file, followed by a record sequence number that
increases with time within each subject. That'll be tedious, and will
make your files bigger, but it removes enormous opportunities for
later confusion.

Rich Ulrich and David Marso suggested something like,
>Putting in the Casenumber and (Add 4; divide by 5; truncate) gets
>you numbered sets of five.

If I understand you correctly, you DON'T want to do that. They were
assuming you have the first record for 5 subjects, then the second
record for the same 5, etc., in your datasets. If I'

>My issue is that the datasets are huge (more than 900,000 cases) ...
That should be a pretty tractable size, in current SPSS
implementations on most current machines.

>and I have 100 datasets,
which means 90,000,000 total records, which may be a bit much if they
were all strung together in one file.

>I need to write code to get the data in the form that I need for my analyses.
Good; what form is that? What summarizing do you do, on each subject's data?

>So what I need to do is pull out the data into separate datasets. I
>need a dataset that is the first set of 5 ...
The first set of 5 subjects? The first 5 records, for all subjects?

You may or may not need to create separate datasets; and if you do
need to, you may or may not need to do it the way you describe. The
question is, what data preparation are you doing for your analysis?

I suspect that the solution will be to catenate all your data into
that file of 90,000,000 records (WITH subject IDs and record numbers
in each record); then, possibly, some SORT CASES to get the file in
the order you need for your summarizing; then, the summarizing itself.

You'll probably use AGGREGATE for your summarizing; with a file this
size, you may need to use the PRESORTED option on AGGREGATE.

But from your description, your problem shouldn't be fundamentally difficult.
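
A rough sketch of the stacking approach described above, shown for two participants only. The file names, the dataset names p001 and p002, and the variables subject and recno are all hypothetical, and it assumes each participant's data has already been saved as an SPSS data file in time order.

```spss
* Sketch only: file names, dataset names, subject and recno are hypothetical.
GET FILE='participant001.sav'.
COMPUTE subject = 1.
COMPUTE recno = $CASENUM.
EXECUTE.
DATASET NAME p001.

GET FILE='participant002.sav'.
COMPUTE subject = 2.
COMPUTE recno = $CASENUM.
EXECUTE.
DATASET NAME p002.

* Stack the files. Each is in time order, so the result is ordered by subject and recno.
ADD FILES /FILE=p001 /FILE=p002.
EXECUTE.

* PRESORTED tells AGGREGATE that the cases are already ordered by the break variable.
AGGREGATE
  /OUTFILE=*
  /PRESORTED
  /BREAK=subject
  /mean_ch1=MEAN(ch1)
  /mean_ch2=MEAN(ch2)
  /mean_ch3=MEAN(ch3).
```

The per-subject means here are only a placeholder for whatever summary is actually wanted. If the stacked file were not already in break order, a SORT CASES BY subject recno would be needed before using PRESORTED.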

David Marso

Re: Help with complex coding issue

Administrator

"If I understand you correctly, you DON'T want to do that. They were
assuming you have the first record for 5 subjects, then the second
record for the same 5, etc., in your datasets. If I' .."

Please don't misattribute what you think I assumed, Richard!
The OP stated something about doing something to the first 5 records, something to the second 5, etc.
Using the function I provided and AGGREGATE seems a bit more productive than busting the 900000 records into 180000 files (certainly a ridiculous proposal at best)!
The OP needs to clearly communicate the actual issue.

Richard Ristow

Re: Help with complex coding issue

In reply to this post by hohmanz

At 03:55 PM 12/19/2014, hohmanz wrote:

>Hello I need help figuring out how to write code to deal with a complex data
>issue.

Did this ever get resolved? You wrote,

>So what I need to do is pull out the data into separate datasets. I
>need a dataset that is the first set of 5, another for the second
>set of 5, and then one for the last set of 5

and we had differences in understanding of what you meant: the first
5 records for each subject? all records for the first 5 subjects?

Ask again, if you're still having difficulties. If you do, it may
help to expand on this:

>I need to write code to get the data in the form that I need for my analyses

If you can describe what form that is, it may open more possibilities
for how to go about it.

-Best wishes,
Richard

hohmanz

Re: Help with complex coding issue

In reply to this post by Richard Ristow

Sorry for the long delay in getting back to the thread, and I appreciate all of the comments that people have made. I now see the confusion caused by what I wrote. I did not mean sets of five cases; I meant sets of cases whose indicator value is 5. It might be easiest to focus first on how to get the first dataset organized, and then I can write a macro to loop through the rest of the datasets. For my first dataset (participant one), I have over 900,000 cases (physiological observations). There are 5 variables (min, ch1, ch2, ch3, and indicator). Min is the time elapsed since the first observation; ch1, ch2, and ch3 are the observations I will be analyzing; and indicator tells me whether an observation falls in a time period of interest (5) or not (0). What I need to do is pull out all of the cases that have an indicator of 5, which sounds easy. However, I want the cases pulled into different datasets based on the set in which they occur. Here is an abbreviated example:

.000000000 .004882810 -.001373290 4.25568 5
.000016667 .003967290 -.002288820 4.26025 5
.000033333 .003662110 -.001220700 4.26025 5
.000050000 .002746580 -.000915527 4.25873 0
.000066667 .000762939 -.001983640 4.26178 0
.000083333 .000457764 -.001831050 4.25873 0
.000100000 .001831050 -.000305176 4.26025 0
.000116667 .004577640 -.001525880 4.25873 0
.000133333 .004882810 .000762939 4.25873 0
.000150000 .003509520 .000000000 4.25873 0
.000166667 .001983640 .000000000 4.25720 5
.000183333 .002136230 -.000610352 4.26025 5
.000200000 .001983640 .000305176 4.25720 5
.000216667 .001831050 -.001373290 4.25873 5
.000233333 .003051760 -.001068120 4.25568 5
.000250000 .003967290 -.001068120 4.26331 5
.000266667 .003204350 -.001373290 4.25873 5
.000283333 .004119870 -.001373290 4.26178 5
.000300000 .003967290 -.001678470 4.25873 5
.000316667 .003967290 -.000915527 4.25720 5
.000333333 .004425050 -.001983640 4.26025 5
.000350000 .002593990 -.001983640 4.26483 5
.000366667 .000305176 -.001678470 4.25873 5
.000383333 .000457764 -.001373290 4.26025 5
.000400000 -.001678470 -.001220700 4.26025 0
.000416667 .000457764 -.000457764 4.25873 0
.000433333 .005493160 -.000610352 4.26331 0
.000450000 .004119870 -.000305176 4.26483 0
.000466667 .002441410 -.001068120 4.26483 0
.000483333 .005035400 -.001220700 4.26331 5
.000500000 .006866460 -.000305176 4.26483 5
.000516667 .007476810 -.000610352 4.26483 5
.000533333 .005340580 -.001678470 4.26483 5
.000550000 .004577640 -.000915527 4.26025 5
.000566667 .003814700 -.001678470 4.26636 5

So for the above example I would need three datasets: one for the first set of cases with indicator = 5, another for the second set of cases with indicator = 5, and one for the final set. As you can see from my abbreviated example, there is no fixed timeframe or number of cases per set; it varies within each dataset and between datasets. So I am looking for code that can loop through the cases, export them to a file based on their value of the indicator, and then stop and start a new file. Is this possible? Is this clearer? Again, sorry for taking so long to respond to this thread.

Thanks,

Zach

Art Kendall

Re: Help with complex coding issue

It is not clear why you want to split this in different data sets.

SPLIT FILE
or
stratification
or
other approaches might be suggested if you were to clarify the reason for making separate files.

Art Kendall
Social Research Consultants

David Marso

Re: Help with complex coding issue

Administrator

In reply to this post by hohmanz

"So I am look for code that can loop through cases and export them into a file based on their value of the indicator and then stop and start a new file. Is this possible?"
Again! I state this is misguided!
What are your INTENTIONS after this?
--

hohmanz

Re: Help with complex coding issue

I will probably then create an average for each of the ch1, ch2, and ch3 variables, which will be merged into another wide dataset I have. It is possible, though, that I will not compute the averages and will instead run an HLM analysis, which is why I want the data in that format. An alternative to creating three datasets would be to identify whether each case is in the first, second, or third section of 5s. So if I could create an indicator variable with the value 1, 2, or 3, that would work as well. If it would help, I can attach a text file of one dataset (there is no personally identifying information in it). I am not sure whether or how I can do that, but I would be more than happy to.

Thanks,

Zach

David Marso

Re: Help with complex coding issue

Administrator

Very simple with a LAG function ;-)

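* Number each run of consecutive Indicator = 5 cases as 1, 2, 3, and so on.
* Cases with Indicator = 0 carry the number of the preceding run (0 before the first run).
DO IF $CASENUM EQ 1.
+ IF (Indicator EQ 5) Group=1.
+ IF (Indicator EQ 0) Group=0.
ELSE.
+ IF (LAG(Indicator) EQ 5 OR Indicator EQ 0) Group=LAG(Group).
+ IF (LAG(Indicator) EQ 0 AND Indicator EQ 5) Group=LAG(Group)+1.
END IF.
EXECUTE.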

David Marso

Re: Help with complex coding issue

Administrator

Of course that can be expressed in one line of code as well ;-)

COMPUTE Group= SUM(LAG(Group),
($CASENUM EQ 1) AND Indicator EQ 5,
LAG(Indicator) EQ 0 AND Indicator EQ 5) .

*Alternatively*

COMPUTE Group= SUM(LAG(Group),SUM($CASENUM EQ 1, LAG(Indicator) EQ 0) AND Indicator EQ 5).
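
Once Group exists, the per-run averages the original poster mentioned could be obtained along these lines. This is a sketch only, assuming Group was created by one of the snippets in this thread; the output names mean_ch1, mean_ch2, mean_ch3 and n_cases are hypothetical.

```spss
* Sketch only: mean_ch1, mean_ch2, mean_ch3 and n_cases are hypothetical names.
* SELECT IF permanently drops the Indicator = 0 cases, so run this on a copy of the data.
SELECT IF (Indicator EQ 5).
* One row per run of 5s, with the mean of each channel and the number of cases in the run.
AGGREGATE
  /OUTFILE=*
  /BREAK=Group
  /mean_ch1=MEAN(ch1)
  /mean_ch2=MEAN(ch2)
  /mean_ch3=MEAN(ch3)
  /n_cases=N.
LIST.
```

For the HLM route, the AGGREGATE step would presumably be skipped: after the SELECT IF, the file stays in long format with Group identifying the stimulus presentation each case belongs to.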

hohmanz

Re: Help with complex coding issue

Excellent. Thanks, this will do. I appreciate everyone's help!