SPSSX Discussion

obtaining the average number of consecutive responses of a true response

msherman

obtaining the average number of consecutive responses of a true response

Dear List: I have a data set with 375 items, each answered with either a true or a false response. I want to identify people who are being careless or showing insufficient effort by giving the same response repeatedly. In particular, I want the average length of the runs of true for each individual. I have obtained some syntax (below) that captures the maximum run length of either T or F. I have done a Google search, but nothing comes close to giving me the average length of the true runs. Is this possible?

VECTOR v = v1 to v375.
COMPUTE #run = 1.
COMPUTE maxrun = 1.
LOOP #i = 2 to 375.
  DO IF v(#i) eq v(#i-1).
    COMPUTE #run = #run + 1.
    COMPUTE maxrun = max(maxrun, #run).
  ELSE.
    COMPUTE #run = 1.
  END IF.
END LOOP.
EXECUTE.
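
For comparison, the same VECTOR/LOOP pattern can be bent toward the average rather than the maximum. The following is only a sketch under assumptions that are not stated in the post: True is coded 1, there are no missing responses, and the output names n_true, n_true_runs and mean_true_run are made up for illustration. It counts the True responses and the number of times a run of True starts, then divides one by the other.

```spss
* Sketch only: assumes True = 1 and no missing values in v1 to v375.
VECTOR v = v1 TO v375.
COMPUTE #ntrue = 0.
COMPUTE #nruns = 0.
COMPUTE #prev = 0.
LOOP #i = 1 TO 375.
  * #cur is 1 when the current item is True, otherwise 0.
  COMPUTE #cur = (v(#i) = 1).
  IF (#cur = 1) #ntrue = #ntrue + 1.
  * A new run of True starts when a True follows a non-True (or is the first item).
  IF (#cur = 1 AND #prev = 0) #nruns = #nruns + 1.
  COMPUTE #prev = #cur.
END LOOP.
COMPUTE n_true = #ntrue.
COMPUTE n_true_runs = #nruns.
* Mean run length is left system-missing for a case with no True responses.
IF (#nruns > 0) mean_true_run = #ntrue / #nruns.
EXECUTE.
```

The scratch variables are reset at the top of each case because they are otherwise carried over from one case to the next. If the data do contain missing responses, the comparison against 1 returns missing and the counts above would need an explicit rule for how a gap splits or joins runs.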

Martin F. Sherman, Ph.D.

Professor of Psychology

Director of Master’s Education: Thesis Track

Department of Psychology

222 B Beatty Hall

4501 North Charles Street

Baltimore, MD 21210

[hidden email]

410-617-2417 tel

410-617-5341 fax

Rich Ulrich

Re: obtaining the average number of consecutive responses of a true response

The usual Runs test, which I expect to be more robust in general, tests the number of runs rather than the maximum length. You can obtain this by using FLIP to transpose the data, and then Nonparametric Tests to test each individual.

If you want to test the maximum length, as you propose, you can use the cdf of the binomial function to get p-values. If the proportions are not always near 0.5, you could compute the exact mean for that binomial.

Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of Martin Sherman <[hidden email]>
Sent: Saturday, November 5, 2016 3:11 PM
To: [hidden email]
Subject: obtaining the average number of consecutive responses of a true response

Andy W

Re: obtaining the average number of consecutive responses of a true response

Good call Rich, I would use VARSTOCASES and SPLIT FILE though. Example below, plus an example heatmap to visualize the incorrect responses.

**************************************************.
*SIMULATING EXAMPLE DATA.
SET SEED 10.
INPUT PROGRAM.
LOOP Person = 1 TO 40.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

*Simulating 100 variables, random T/F.
*Person 4 has weird run at 70 to 100.
*Person 7 has weird run 40 to 65.
*Person 32 has weird run 1 to 10.
VECTOR A(100).
LOOP #i = 1 TO 100.
COMPUTE A(#i) = RV.BERNOULLI(0.80).
END LOOP.
DO IF Person = 4.
RECODE A70 TO A100 (ELSE = 0).
ELSE IF Person = 7.
RECODE A40 TO A65 (ELSE = 0).
ELSE IF Person = 32.
RECODE A1 TO A10 (ELSE = 0).
END IF.
EXECUTE.

*CONDUCTING THE ANALYSIS
*Now you would reshape the dataset, then split file.
VARSTOCASES /MAKE A FROM A1 TO A100 /INDEX AnswerNum.
SPLIT FILE BY Person.
NPAR TESTS /RUNS(0.5)=A.
SPLIT FILE OFF.
*May want to use OMS to make it easier to flag folks.

*EXTRA VIZ - HEATMAP OF RUNS.
FORMATS A (F1.0) Person (F2.0).
VALUE LABELS A 0 'False' 1 'Correct'.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=AnswerNum Person A MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
PAGE: begin(scale(600px,900px))
SOURCE: s=userSource(id("graphdataset"))
DATA: AnswerNum=col(source(s), name("AnswerNum"), unit.category())
DATA: Person=col(source(s), name("Person"), unit.category())
DATA: A=col(source(s), name("A"), unit.category())
GUIDE: axis(dim(1), null())
GUIDE: axis(dim(2), label("Individual Incorrect Answers"))
GUIDE: legend(aesthetic(aesthetic.color.interior), null())
SCALE: cat(dim(2), sort.statistic(summary.mean(A)), reverse())
SCALE: cat(aesthetic(aesthetic.color.interior), map(("0", color.black),("1",color.white)))
ELEMENT: polygon(position(AnswerNum*Person), color.interior(A), color.exterior(color.white))
PAGE: end()
END GPL.
*This automatically sorts those with the most incorrect to the top of the graphic.
**************************************************.

I've written some code for a runs test for multiple groups, so it could be an OK exploratory tool for multiple-choice answers.

- http://stats.stackexchange.com/a/73170/1036
- code here, https://www.dropbox.com/sh/kr6qvukrw6xvue4/AABWSg-DAcoLoysqTKMyeRdNa

As it stands, the FLIP approach may be easier to use with that macro, since the macro won't work with SPLIT FILE. (It could, but it would take a little work.)

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Kirill Orlov

Re: obtaining the average number of consecutive responses of a true response

To the analysis and the graph suggested by Andy W above, I might add a trick to highlight
chains (runs) of 1s of different lengths with my macro function /*!runs()*/, which operates in a MATRIX session.
The highlighted dataset could then be plotted with GPL syntax similar to Andy's.

*Andy's simulated example dataset.
SET SEED 10.
INPUT PROGRAM.
LOOP Person = 1 TO 40.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
*Simulating 100 variables, random T/F.
*Person 4 has weird run at 70 to 100.
*Person 7 has weird run 40 to 65.
*Person 32 has weird run 1 to 10.
VECTOR A(100).
LOOP #i = 1 TO 100.
COMPUTE A(#i) = RV.BERNOULLI(0.80).
END LOOP.
DO IF Person = 4.
RECODE A70 TO A100 (ELSE = 0).
ELSE IF Person = 7.
RECODE A40 TO A65 (ELSE = 0).
ELSE IF Person = 32.
RECODE A1 TO A10 (ELSE = 0).
END IF.
EXECUTE.

*The code of the macro function, to read into memory.
*(you can find the function at http://www.spsstools.net/en/KO-spssmacros
*collection "Matrix - End Matrix functions").
define !runs(!pos= !token(1) /!pos= !charend('%') /!pos= !charend('%') /!pos= !charend(')'))
comp !4= !2.
comp @maxw= !3.
loop @w= 2 to @maxw.
-comp @w_= @w-1.
-comp @a= !4(:,1:(ncol(!4)-@w_)).
-comp @b= @a.
-loop @i= 2 to @w.
- comp @b= @b and !4(:,@i:(ncol(!4)-@w+@i)).
-end loop.
-comp !4(:,1:(ncol(!4)-@w_))= @a+@b.
-loop @i= 1 to @w_.
- comp @a= !4(:,2:ncol(!4)).
- comp @b= !4(:,1:(ncol(!4)-1)).
- comp !4(:,2:ncol(!4))= @a+(@a=@w_)&*(@b=@w).
-end loop.
end loop.
release @maxw,@w,@w_,@a,@b,@i.
!enddefine.

*Run the highlighting.
set mxloops 10000.
matrix.
get data /vari= A1 to A100 /names= names.
!runs(data%5%runs). /*I set argument maxw here to 5
save runs /out= * /names= names.
end matrix.
*In this example with maxw=5 all chains (runs) of length 1 will be coded as 1,
*of length 2 will be coded as 2, ..., of length 5+ will be coded as 5.

06.11.2016 16:43, Andy W wrote:

Good call Rich, I would use VARSTOCASES and SPLIT FILE though. Example below,
plus an example heatmap to visualize the incorrect responses.

msherman

Re: obtaining the average number of consecutive responses of a true response

I ran Kirill's syntax, changed maxw to 100, and obtained a data file that contains the runs of true. Here is one subject.

Here are the actual true and false responses (1 and 0):

.00 1.00 .00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .00 etc.

And here is the output:

.00 1.00 .00 12.00 12.00 12.00 12.00 12.00 12.00 12.00 12.00 12.00 12.00 12.00 12.00 .00

So this codes each run of true with its length, but now I want to obtain each participant’s average true-run length.

Just using the above, the average would be (1 + 12)/2 = 6.5, averaged across the two runs of true.

Now I want to do this across all 100 variables and get an average for each participant. Then I will examine the distribution across all participants to see which participants are outliers in regard to average run length.

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Kirill Orlov
Sent: Sunday, November 06, 2016 9:35 AM
To: [hidden email]
Subject: Re: obtaining the average number of consecutive responses of a true response

Maguin, Eugene

Re: obtaining the average number of consecutive responses of a true response

Martin,

Having not studied Kirill’s macro, I’m mystified by the results you got. You ran Andy’s example to create a dataset of 40 records of 100 variables, with a ‘1’ coded at probability .8, and passed it through the macro. What did you get? A record with 16 values for each of your 40 people, which for one person shows 12 strings each of length 12, all from a string of length 100 going into the macro? Or a record of the first 16 people with one value per person?

Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Martin Sherman
Sent: Sunday, November 06, 2016 12:10 PM
To: [hidden email]
Subject: Re: obtaining the average number of consecutive responses of a true response

msherman

Re: obtaining the average number of consecutive responses of a true response

Gene: My mistake. The syntax I ran is below. I created the data with p = .50 and changed maxw to 100.

Data generated

ID    v1    v2    v3    v4    v5    v6    v7    v8    v9    v10   v11   v12   v13   v14   ...   v100
1.00  .00   .00   .00   .00   1.00  .00   1.00  1.00  1.00  .00   1.00  1.00  .00   .00   etc
2.00  1.00  1.00  .00   .00   1.00  .00   1.00  .00   1.00  1.00  .00   1.00  .00   1.00  etc
3.00  1.00  .00   1.00  .00   1.00  .00   .00   1.00  .00   .00   .00   .00   1.00  .00
4.00  .00   .00   .00   1.00  .00   1.00  1.00  .00   1.00  .00   1.00  1.00  1.00  .00
5.00  1.00  1.00  1.00  1.00  .00   .00   .00   .00   .00   1.00  1.00  1.00  .00   .00

Data where strings of True are created

ID    v1    v2    v3    v4    v5    v6    v7    v8    v9    v10   v11   v12   v13   v14   ...   v100
1.00  .00   .00   .00   .00   1.00  .00   3.00  3.00  3.00  .00   2.00  2.00  .00   .00   etc
2.00  2.00  2.00  .00   .00   1.00  .00   1.00  .00   2.00  2.00  .00   1.00  .00   1.00
3.00  1.00  .00   1.00  .00   1.00  .00   .00   1.00  .00   .00   .00   .00   1.00  .00
4.00  .00   .00   .00   1.00  .00   2.00  2.00  .00   1.00  .00   3.00  3.00  3.00  .00
5.00  4.00  4.00  4.00  4.00  .00   .00   .00   .00   .00   3.00  3.00  3.00  .00   .00

Just using the above data, the averages would be

1.00 (1 + 3 + 2)/3 = 2.00
2.00 (2 + 1 + 1 + 2 + 1 + 1)/6 = 1.33
3.00 (1 + 1 + 1 + 1 + 1)/5 = 1.00
4.00 (1 + 2 + 1 + 3)/4 = 1.75
5.00 (4 + 3)/2 = 3.50

So at this point I need to count the number of times that 1 appeared, plus the number of times that 2 appeared (allowing for the fact that a 2 is written twice for each run of True, True), plus the number of times that 3 appeared (allowing for the fact that a 3 is written three times for each run of True, True, True), etc.

For example, here is a series containing runs of three different lengths (True = 1 and False = 0):

1 0 1 0 1 0 1 1 0 1 1 0 1 1 1 0 1 1 1

And now, coded by run length, we would have

1 0 1 0 1 0 2 2 0 2 2 0 3 3 3 0 3 3 3

so if there are 3 runs of length 1, the number I want is 3 x 1 = 3
if there are 2 runs of length 2, the number I want is 2 x 2 = 4
if there are 2 runs of length 3, the number I want is 2 x 3 = 6
total = 13

13/7 (distinct runs) = 1.86, the average length of the runs of true

Unless my thinking and math are off.
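
One way to do that counting without tallying each code separately: every cell in a run coded k contributes 1/k of a run, so the reciprocals of the nonzero codes sum to the number of runs, and the count of nonzero cells is the number of True responses. A rough sketch follows, assuming the coded values sit in A1 to A100 of the active dataset (as saved by the MATRIX block) and that maxw was at least as large as the longest run, so that every code equals the real run length. The names n_true_runs and mean_true_run are invented for illustration.

```spss
* Sketch only: assumes A1 to A100 hold the run-length codes (0 = False).
VECTOR r = A1 TO A100.
COMPUTE #ncells = 0.
COMPUTE #nruns = 0.
LOOP #i = 1 TO 100.
  DO IF (r(#i) > 0).
    * Each coded cell is one True response and 1/k of a run of length k.
    COMPUTE #ncells = #ncells + 1.
    COMPUTE #nruns = #nruns + 1 / r(#i).
  END IF.
END LOOP.
* Round to absorb floating-point error from summing the reciprocals.
COMPUTE n_true_runs = RND(#nruns).
IF (n_true_runs > 0) mean_true_run = #ncells / n_true_runs.
EXECUTE.
```

On the coded example just given, this counts 13 True cells and 3 + 4(1/2) + 6(1/3) = 7 runs, i.e. the same 13/7. With maxw smaller than the longest run the codes are capped, and the reciprocal trick no longer counts runs correctly.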

* Encoding: UTF-8.
*Andy's simulated example dataset.
SET SEED 10.
INPUT PROGRAM.
LOOP Person = 1 TO 40.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
*Simulating 100 variables, random T/F.
*Person 4 has weird run at 70 to 100.
*Person 7 has weird run 40 to 65.
*Person 32 has weird run 1 to 10.
VECTOR A(100).
LOOP #i = 1 TO 100.
COMPUTE A(#i) = RV.BERNOULLI(0.50).
END LOOP.
DO IF Person = 4.
RECODE A70 TO A100 (ELSE = 0).
ELSE IF Person = 7.
RECODE A40 TO A65 (ELSE = 0).
ELSE IF Person = 32.
RECODE A1 TO A10 (ELSE = 0).
END IF.
EXECUTE.

*The code of the macro function, to read into memory.
*(you can find the function at http://www.spsstools.net/en/KO-spssmacros
*collection "Matrix - End Matrix functions").
define !runs(!pos= !token(1) /!pos= !charend('%') /!pos= !charend('%') /!pos= !charend(')'))
comp !4= !2.
comp @maxw= !3.
loop @w= 2 to @maxw.
-comp @w_= @w-1.
-comp @a= !4(:,1:(ncol(!4)-@w_)).
-comp @b= @a.
-loop @i= 2 to @w.
- comp @b= @b and !4(:,@i:(ncol(!4)-@w+@i)).
-end loop.
-comp !4(:,1:(ncol(!4)-@w_))= @a+@b.
-loop @i= 1 to @w_.
- comp @a= !4(:,2:ncol(!4)).
- comp @b= !4(:,1:(ncol(!4)-1)).
- comp !4(:,2:ncol(!4))= @a+(@a=@w_)&*(@b=@w).
-end loop.
end loop.
release @maxw,@w,@w_,@a,@b,@i.
!enddefine.

*Run the highlighting.
set mxloops 10000.
matrix.
get data /vari= A1 to A100 /names= names.
!runs(data%100%runs). /*maxw changed from 5 to 100 to cover all 100 variables
save runs /out= * /names= names.
end matrix.
*With maxw=5, chains (runs) of length 1 are coded as 1, of length 2 as 2, ...,
*and of length 5+ as 5. With maxw=100, every run is coded with its full length.

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Maguin, Eugene
Sent: Monday, November 07, 2016 9:20 AM
To: [hidden email]
Subject: Re: obtaining the average number of consecutive responses of a true response

Kirill Orlov

Re: obtaining the average number of consecutive responses of a true response

In reply to this post by Maguin, Eugene

->Having not studied Kirill’s macro,

Eugene,
Why not open the description document from the link I gave to read what the function does? It simply marks each chain of ones with its length.
So that input row
0 1 0 1 1 1 0 0 1 1 1 1 0 0 0 1 1
becomes
0 1 0 3 3 3 0 0 4 4 4 4 0 0 0 2 2

I was hoping that that would be helpful for the OP in some way.
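
For anyone who would rather stay outside MATRIX, roughly the same marking can be sketched with ordinary transformation commands. This is not the !runs macro, just a two-pass illustration of the idea under some assumptions: the 0/1 responses are in A1 to A100 with no missing values, and the new variables r1 to r100 are hypothetical names. The forward pass numbers the positions inside each chain of ones; the backward pass copies the chain's final count back over the whole chain.

```spss
* Sketch only: assumes A1 to A100 are coded 0/1 with no missing values.
VECTOR a = A1 TO A100.
VECTOR r(100).
* Forward pass: position of each 1 within its chain, 0 elsewhere.
LOOP #i = 1 TO 100.
  DO IF (a(#i) = 1).
    DO IF (#i = 1).
      COMPUTE r(#i) = 1.
    ELSE.
      COMPUTE r(#i) = r(#i - 1) + 1.
    END IF.
  ELSE.
    COMPUTE r(#i) = 0.
  END IF.
END LOOP.
* Backward pass: spread the last count of each chain over the whole chain.
LOOP #i = 99 TO 1 BY -1.
  IF (r(#i) > 0 AND r(#i + 1) > 0) r(#i) = r(#i + 1).
END LOOP.
EXECUTE.
```

Unlike the macro with a small maxw, this version does not cap the codes, so each chain is always marked with its full length.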
