Question on extracting the variables I need

Snider-Lotz, Tom-2
Hello, all --

I've got a data file of item-level information from the administration of a computer-adaptive test to a number of test takers.  Each case represents one test taker, and includes information from 170 test items.  Each test taker sees a maximum of 30 items, and could take those items in any order.  The database contains 10 variables for each test item, and I am only interested in two of them:  the variable that indicates what order within the test taker's test the item was given (call it SEQ) and another variable (call it X).  In other words, I have 1700 labeled columns, but no more than 300 contain data for any given test taker.  Out of the 300, there are only 60 that are of interest.  SEQ and X are not consecutive variables in the database.

So (after a few columns of demographic info) each record has the structure:

X1, A1, B1, C1, D1, SEQ1, E1, F1, G1, H1, X2, A2, B2, C2, D2, SEQ2, E2, F2, G2, H2, X3, ...

where the A, B, C, D, etc are the variables that are *not* of interest.

For each test taker I need to create a variable (call it Y) equal to the value of X for the last item the test taker took (i.e., the item with the maximum value of SEQ).

I'd appreciate any suggestions on how to tackle this one.  I suspect that looping and maybe vectors are involved, but I can't seem to get a foothold on it.

Thanks!

 -- Tom


________________________________

Thomas G. Snider-Lotz, Ph.D.
Principal Scientist
PreVisor
1805 Old Alabama Road, Suite 150
Roswell, GA 30076
T:678.832.0555 F:770.642.6115
www.previsor.com

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: Question on extracting the variables I need

Luca Meyer-3
Hello Tom,

This syntax will yield a variable, called ORDER, which identifies the SEQ
variable that contains the largest value, and therefore the corresponding X
variable. The syntax assumes the variable names you provided in your example.
It does not compute Y; I think that requires a programmatic routine, which I
do not have time to build and test right now.

/* TESTED SPSS 15 - it assumes you have a directory called C:\temp\ */

/* first some sample data */
DATA LIST FREE /TESTID X1 A1 B1 SEQ1 C1 D1 X2 A2 B2 SEQ2 C2 D2 X3 A3 B3 SEQ3
C3 D3.
BEGIN DATA
1 100 20 20 1 20 20 200 40 40 2 40 40 300 60 60 3 60 60
2 200 20 20 2 20 20 300 40 40 3 40 40 100 60 60 1 60 60
END DATA.
SAVE OUTFILE "C:\TEMP\DATA.SAV".

/* then transpose the data and eliminate the variables I do not use */
/* NOTE: the MAKE and KEEP lists need to be manually extended up to SEQ170 */
VARSTOCASES  /MAKE SEQ FROM SEQ1 SEQ2 SEQ3
 /KEEP =  TESTID SEQ1 SEQ2 SEQ3
 /NULL = KEEP.

/* I associate an order sequence to the new file */
IF $CASENUM=1 ORDER=1.
DO IF $CASENUM>1.
IF LAG(TESTID)=TESTID ORDER=LAG(ORDER)+1.
IF LAG(TESTID)<>TESTID ORDER=1.
END IF.
SORT CASES BY TESTID (A) SEQ (D).

/* I select the max value for the sequence */
IF $CASENUM=1 SEL=1.
DO IF $CASENUM>1.
IF LAG(TESTID)<>TESTID SEL=1.
END IF.
SELECT IF SEL=1.
EXE.

/* get rid of some variables I do not need and temporarily keep the order file as a named dataset */
DEL VAR SEQ SEL.
DATASET NAME ORDER.

/* now I match the original file with the order file */
GET FILE "C:\TEMP\DATA.SAV".
SORT CASES BY TESTID.
MATCH FILES /FILE=* /FILE=ORDER /BY TESTID.
EXE.
DATASET CLOSE ORDER.

/* finally I compute the output variables */
STRING SEQ_MAX (A6) X_MAX(A4).
COMPUTE SEQ_MAX=CONCAT("SEQ",LTRIM(STRING(ORDER,F3.0))).
COMPUTE X_MAX=CONCAT("X",LTRIM(STRING(ORDER,F3.0))).
EXE.
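
The step left open above (computing Y itself) might be sketched with VECTOR and LOOP, as the original question suspected. This is only a sketch, not tested against the real file. It assumes that all item variables are numeric and contiguous in file order, that SEQ is system-missing (not 0) for items a test taker never saw, and that valid SEQ values are greater than 0. It uses the 6-variables-per-item layout of the sample data; #MAXSEQ and #I are scratch names chosen for illustration.

```spss
* Sample data with the same layout as above (6 variables per item).
DATA LIST FREE /TESTID X1 A1 B1 SEQ1 C1 D1 X2 A2 B2 SEQ2 C2 D2 X3 A3 B3 SEQ3 C3 D3.
BEGIN DATA
1 100 20 20 1 20 20 200 40 40 2 40 40 300 60 60 3 60 60
2 200 20 20 2 20 20 300 40 40 3 40 40 100 60 60 1 60 60
END DATA.

* One vector spans every item variable, so item i has X at 6*(i-1)+1 and SEQ at 6*(i-1)+4.
VECTOR V = X1 TO D3.

* Scratch variables keep their value across cases, so reset the running maximum for each case.
COMPUTE #MAXSEQ = 0.

* A missing SEQ makes the DO IF condition missing, so items not administered are skipped.
LOOP #I = 1 TO 3.
  DO IF (V(6*(#I - 1) + 4) > #MAXSEQ).
    COMPUTE #MAXSEQ = V(6*(#I - 1) + 4).
    COMPUTE Y = V(6*(#I - 1) + 1).
  END IF.
END LOOP.
EXECUTE.

LIST VARIABLES = TESTID Y.
```

With the sample data, both test takers end up with Y = 300 (X3 for TESTID 1, X2 for TESTID 2). For the layout described in the question (10 variables per item, X first and SEQ sixth), the vector would presumably run from X1 TO H170, the loop from 1 TO 170, and the offsets would become 10*(#I - 1) + 1 for X and 10*(#I - 1) + 6 for SEQ.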

HTH,
Luca


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Snider-Lotz, Tom
Sent: Tuesday, 5 February 2008 0:55
To: [hidden email]
Subject: Question on extracting the variables I need
