Finding strings in value labels (in numeric vars)

John F Hall

I am editing a massive data set created by someone else.  When using count to look for negative values in numeric variables I am encountering errors because some of the value labels for these variables are strings. 

 

count nines = popband to newjob3
payment to gor
(-9).

Error # 4574.  Command name: count
The COUNT command includes a list of counted variables in which some have
numeric values and some have string values.  All the counted variables must be
of the same type.
Execution of this command stops.

 

I have identified two such variables because “ZZZZ” appeared in the Missing column of the Data Editor, but only one of them is declared as String (not easy to spot when eyeballing 10,756 variables). 

 

There are 3322 variables implied in this count command, and several thousand in other lists.  Short of splitting the lists and finding the culprit(s) by systematic elimination,  does anyone know of a quick way to search for (complete, not partial) string values in value labels if the actual strings are unknown?

 

John F Hall (Mr)

[Retired academic survey researcher]

 

Email:   [hidden email] 

Website: www.surveyresearch.weebly.com

SPSS start page:  www.surveyresearch.weebly.com/1-survey-analysis-workshop

 

 

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Finding strings in value labels (in numeric vars)

spss.giesel@yahoo.de
Hi, John,
  As a first step I'd check the variable types with a Python script (spss.GetVariableType()) to find all string variables that meet your requirements.
Another, more pragmatic way without Python might be to sort the variables by type with
SORT VARIABLES, though I don't know how long that takes with ten thousand variables.

The second step would be conversion via
ALTER TYPE ...
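
A minimal sketch of the type check in the first step. It relies on spss.GetVariableType(index) returning 0 for a numeric variable and the declared width for a string variable. The function name find_string_variables and the demo dictionary are made up for illustration; the accessors are passed in as arguments so the sketch also runs outside SPSS.

```python
def find_string_variables(var_count, get_name, get_type):
    """Return (name, width) for every variable whose type code is non-zero.

    get_type(i) is expected to behave like spss.GetVariableType(i):
    0 for a numeric variable, otherwise the declared string width.
    """
    found = []
    for index in range(var_count):
        width = get_type(index)
        if width > 0:
            found.append((get_name(index), width))
    return found


if __name__ == "__main__":
    # Stand-in for the real dictionary (hypothetical names and widths).
    demo = [("popband", 0), ("ward", 4), ("newjob3", 0), ("councode", 2)]
    strings = find_string_variables(
        len(demo),
        lambda i: demo[i][0],
        lambda i: demo[i][1],
    )
    print(strings)  # prints [('ward', 4), ('councode', 2)]
```

Inside a BEGIN PROGRAM block the same function could presumably be called as find_string_variables(spss.GetVariableCount(), spss.GetVariableName, spss.GetVariableType) after import spss.
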

Good luck,
  Mario


(In reply to John F Hall, Friday, 20 May 2016, 11:01)



Re: Finding strings in value labels (in numeric vars)

John F Hall

Mario

 

There’s only one String variable in the whole file.  My problem is to find strings in the value labels for the other 10,755 variables, which are all Numeric.  How they got there in the first place, I have no idea.  For all I know, they may be the result of automatic archiving software rather than insufficient SPSS skill on the part of the people who wrote the original setup files.  My Python skills are zero: I leave all that to Jon Peck (although I have been able to tweak his code from time to time).  It will be quicker for me to run progressive 50/50 splits of the variable lists.  I just hope that future generations of teachers, students and researchers will appreciate the result of my efforts: a clean, ready-to-use cumulative “mother” file for all 32 waves of British Social Attitudes (1983 – 2014).
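
The progressive 50/50 splitting can be sketched as a recursive halving search. This is only an illustration: find_culprits and the fails callback are hypothetical names, and fails(sublist) stands for whatever check is used on a sublist (for example, "does COUNT reject this list?"). It must return True when the sublist contains at least one offending variable.

```python
def find_culprits(names, fails):
    """Return every variable that makes `fails` true, by repeated halving.

    `fails(sublist)` must return True when the sublist contains at least
    one offending variable, and False otherwise.
    """
    if not names or not fails(names):
        return []          # nothing wrong in this half, discard it
    if len(names) == 1:
        return list(names)  # narrowed down to a single culprit
    mid = len(names) // 2
    return find_culprits(names[:mid], fails) + find_culprits(names[mid:], fails)


if __name__ == "__main__":
    # Made-up example: two string variables hidden in a numeric list.
    bad = {"ward", "councode"}
    names = ["popband", "ward", "newjob3", "payment", "councode", "gor"]
    print(find_culprits(names, lambda sub: any(n in bad for n in sub)))
```

With 3322 variables and only a few culprits, each culprit is isolated after roughly a dozen halvings, which is why splitting is workable even by hand.
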

 

John

 

(In reply to Mario Giesel, 20 May 2016, 11:53)

 


Re: Finding strings in value labels (in numeric vars)

spss.giesel@yahoo.de
Just to be clear:
COUNT counts values, not value labels.
So if a value is a string, then the variable itself must be of type string, in my opinion.
Your error message indicates that numeric and string variables are mixed in your COUNT variable list:
"All the counted variables must be of the same type."

Therefore I assume you have to alter the type of at least one variable to get the command working.
I probably haven't understood your point so far. Could you post a concrete data listing that illustrates your problem?

Mario

(In reply to John F Hall, Friday, 20 May 2016, 13:10)



Re: Finding strings in value labels (in numeric vars)

Jon Peck
In reply to this post by John F Hall
I don't think COUNT looks at value labels.  There must be an unsuspected string variable in that list.  Use SPSSINC SELECT VARIABLES (on the Utilities menu) to define a macro listing all or selected variables by type.
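
To illustrate what "a macro listing variables" amounts to, here is a rough sketch that builds DEFINE ... !ENDDEFINE text from a list of names. It is not the extension's actual code; build_macro and the example names are hypothetical, and the 72-character wrap is just a cautious choice to keep syntax lines short.

```python
import textwrap


def build_macro(macro_name, names):
    """Return SPSS macro text whose body is the given variable list."""
    if not macro_name.startswith("!"):
        macro_name = "!" + macro_name
    # Wrap the list so no line of the macro body exceeds 72 characters.
    body = textwrap.fill(
        " ".join(names), width=72,
        break_long_words=False, break_on_hyphens=False,
    )
    return "DEFINE {0}()\n{1}\n!ENDDEFINE.".format(macro_name, body)


if __name__ == "__main__":
    print(build_macro("findstringvars", ["ward", "councode"]))
```

Once such a macro is defined, the macro name can stand in for the variable list in later commands, for example to display the dictionary entries of just those variables.
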


--
Jon K Peck
[hidden email]



Re: Finding strings in value labels (in numeric vars)

Maguin, Eugene
In reply to this post by John F Hall

John, I’m having difficulty visualizing what you are describing. Actually, I can’t visualize it. Perhaps I’m wrong, but value labels are always strings (text); it is the values that the text labels describe that are numeric. I’d like to see an example of what you are encountering. Gene Maguin

 

 

 

 


Re: Finding strings in value labels (in numeric vars)

John F Hall
In reply to this post by Jon Peck

Jon Peck’s suggestion was a neat trick.  Just tried that on all 10,756 variables.  It came up with more string vars than I found by eyeballing.

 

SPSSINC SELECT VARIABLES
MACRONAME="!findstringvars"
VARIABLES=year to  dhaname
/PROPERTIES TYPE=STRING LEVEL=NOMINAL ORDINAL SCALE
ROLE=ANY
/OPTIONS ORDER=ALPHA REVERSE=NO  PRINT=YES IFNONE=ERROR SEPARATOR=" ".

 

In total, 32 string variables were found (most of them holding administrative and sampling data).

 

disp dic /var
ConName1 LACode ccname cendco95 cendcoth censudha censusdc
conname consname council councode dcldrnam dhaname lad locauth
newdc nwarno95 nwarnoth nwname postcode ptycontr sector
sqdate2a sqdate2b ward wardname wardno95 wardnoth
wardo95 wardoth wardpaf xnamdcbc.

 

. . shows that the print and write formats for those variables are all alpha.  The alpha formats derive from my early attempts at reconciling incompatible formats for the same variables when I first had a shot at creating the cumulative file, but I wish whoever wrote the original setups had stuck to using the same type and format for each variable regardless of wave. 

 

Only two of these variables have strings as missing values:

 

Councode    “z”       “missing”
            “-2”      Skp,Eng+Wales
Ward        “zzzz”    “missing”

. . but there are dozens of alpha values in the other variables which need to be declared as missing.

 

I’ve listed the string variables in file order, and will then use the file positions to list numeric vars in blocks, skipping the strings.  Sounds horrendously clumsy and tedious, but it’s quicker for me.  Also I can use the blocks in other jobs.  I thought about using autorecode but that could raise more problems at this stage.
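
The block-building step could be sketched as follows: walk the variables in file order, break the run whenever a string variable turns up, and write each run of numeric variables as a "first TO last" range. The function name numeric_blocks and the sample order are assumptions made for illustration. Names are compared case-insensitively, as SPSS does.

```python
def numeric_blocks(names_in_file_order, string_names):
    """Split the file-order list into runs of consecutive non-string
    variables, written as SPSS 'first TO last' ranges."""
    skip = {name.lower() for name in string_names}
    blocks, run = [], []
    for name in names_in_file_order:
        if name.lower() in skip:
            if run:
                blocks.append(run)  # a string variable ends the current run
            run = []
        else:
            run.append(name)
    if run:
        blocks.append(run)          # keep the final run, if any
    return [
        r[0] if len(r) == 1 else "{0} TO {1}".format(r[0], r[-1])
        for r in blocks
    ]


if __name__ == "__main__":
    # Made-up file order with two string variables in the middle.
    order = ["year", "popband", "Ward", "newjob3", "payment", "councode", "gor"]
    print(numeric_blocks(order, ["ward", "Councode"]))
```

Each returned range can then be pasted into a COUNT variable list, and the same blocks can be reused in other jobs.
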

 

Due at a wedding reception in two hours, so desperately doing more detective work to find variables with remaining problems.  With any luck I’ll be finished by Christmas when my commentaries can be read out as horror stories in front of the Yuletide log.

 

John F Hall (Mr)

[Retired academic survey researcher]

 


(In reply to Jon Peck, 20 May 2016, 13:36)

 
