How to remove extraneous value labels. Keep only those that actually occur in data.


Art Kendall
I am posting this question in case somebody has already worked this out.

I have variables with way too many value labels.  I received string
variables with mixed spellings, capitalizations, and even languages. I
AUTORECODEd and then RECODEd to collapse categories. So now there are labels
for values that do not occur in the data.

One kludgy way to clean them up would be to run
FREQUENCIES VARIABLES=varlist /MISSING=INCLUDE.
Then highlight, copy, and paste the labels from the output file into a syntax
file.
Then edit that syntax file into a new set of value labels.

Any suggestions?



-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: How to remove extraneous value labels. Keep only those that actually occur in data.

Jon Peck
I think if you AUTORECODE again, the unused labels in the new variable would disappear.  Otherwise, this would take a small amount of Python code.  For CTABLES use, though, you can choose to exclude empty categories, so those phantom labels would never appear.
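
As a rough illustration of what that small amount of Python might look like, here is a minimal sketch under stated assumptions. The names format_value and used_value_labels_syntax are hypothetical, and the sketch assumes the value labels and the observed values have already been read out of the dataset into plain Python objects (that step is left out). It only builds the text of a VALUE LABELS command, relying on the fact that VALUE LABELS replaces the labels previously defined for the variable.

```python
def format_value(value):
    """Render a value or a label as an SPSS syntax literal."""
    if isinstance(value, str):
        # Strings are single-quoted; embedded single quotes are doubled.
        return "'" + value.replace("'", "''") + "'"
    number = float(value)
    # Show whole numbers without a trailing ".0".
    return str(int(number)) if number.is_integer() else repr(number)


def used_value_labels_syntax(varname, labels, observed):
    """Build a VALUE LABELS command keeping only labels whose value occurs.

    labels:   dict mapping value -> label (assumed already read from the file)
    observed: iterable of the values that actually occur in the data
    """
    occurring = set(observed)
    kept = sorted((v, lab) for v, lab in labels.items() if v in occurring)
    if not kept:
        # Nothing to keep: emit no command and leave the decision to the user.
        return ""
    pairs = ["  %s %s" % (format_value(v), format_value(lab)) for v, lab in kept]
    return "VALUE LABELS %s\n%s." % (varname, "\n".join(pairs))


# Hypothetical example: values 2 and 4 carry labels but never occur.
labels = {1.0: "Yes", 2.0: "No", 3.0: "Don't know", 4.0: "Oui"}
observed = [1.0, 3.0, 3.0, 1.0]
print(used_value_labels_syntax("answer", labels, observed))
```

The generated text could then be pasted into a syntax file and reviewed before it is run.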

On Fri, Oct 2, 2020 at 2:01 PM Art Kendall <[hidden email]> wrote:
I am posting this question in case somebody has already worked this out.

--
Jon K Peck
[hidden email]

Re: How to remove extraneous value labels. Keep only those that actually occur in data.

Kathy Reinig

STRING labelvar (A120).
COMPUTE labelvar=VALUELABEL(var1). 

I would think that if you turned it back into a string and then AUTORECODEd that string, you would be where you want to be.

Kathy Reinig
KJ Reinig Associates

On Oct 2, 2020, at 4:40 PM, Jon Peck <[hidden email]> wrote:

I think if you AUTORECODE again, the unused labels in the new variable would disappear.  Otherwise, this would take a small amount of Python code.  For CTABLES use, though, you can choose to exclude empty categories, so those phantom labels would never appear.

Re: How to remove extraneous value labels. Keep only those that actually occur in data.

Jon Peck
That assumes that all values are labelled.  The VALUELABEL function returns an empty string if the value does not have a value label.  (There is a Python function in the spssaux2.py module that can find unlabelled values.)
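
To make the caveat concrete, here is a simplified plain-Python simulation of the round trip (value to label string, then renumbering the distinct strings). The helpers valuelabel and autorecode are hypothetical stand-ins, not the SPSS implementations, and the real AUTORECODE has more options (sort order, blank handling) than this sketch models.

```python
def valuelabel(value, labels):
    """Stand-in for VALUELABEL: the label, or '' if the value has none."""
    return labels.get(value, "")


def autorecode(strings):
    """Stand-in for AUTORECODE: number the distinct strings in sorted order."""
    codes = {s: i for i, s in enumerate(sorted(set(strings)), start=1)}
    return [codes[s] for s in strings], codes


# Hypothetical data: 3.0 is labelled but never occurs; 33.0 and 34.0
# occur but have no labels.
labels = {1.0: "Clerical", 2.0: "Custodial", 3.0: "Manager"}
data = [1.0, 2.0, 1.0, 33.0, 34.0]

as_strings = [valuelabel(v, labels) for v in data]
new_values, new_labels = autorecode(as_strings)

print(as_strings)
print(new_values)
print(new_labels)
```

In this simulation the unused "Manager" label disappears as intended, but the two unlabelled values both become the empty string and end up with the same new code, so the distinction between them is lost.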

On Sat, Oct 3, 2020 at 7:44 PM Kathy Reinig <[hidden email]> wrote:

STRING labelvar (A120).
COMPUTE labelvar=VALUELABEL(var1). 

I would think that if you turned it back into a string, then autorecode that string you would be where you want to be.

Re: How to remove extraneous value labels. Keep only those that actually occur in data.

Art Kendall
Thank you for the feedback.

It would be a good quality assurance tool for data definition to be able
to get a warning if, say, a variable had 25 or 50 values and some or all
were unlabelled.

I have worked out a better formulation of what I need to clean up the value
labels. (Boy, do I wish these people had had input from experienced researchers
before they created their dataset.)
(The NGO will often not have SPSS available when it uses the reorganized
data.)

What I am trying to do is look at the *valid* values that occur across all
the countries and repeats.  A lot of work went into having the same values
indicate the reasons data are missing across all the variables.
Fortunately there are no valid negative values in the data.
Some variables could have their values in alphabetical order.
Ordinal variables need to have the labels in a particular order.

I have gone ahead with a manual kludge approach, accumulating a syntax file
this way:

Save an extra copy of the dataset as is.
Open SPSS and open the dataset.
DATASET COPY to get a working copy.
Open a new syntax file.
Go to the Variable View of the working copy.
For each variable:
    Right-click the variable name.
    Click <descriptive statistics>.
    Go to the output window.
    Copy and paste the variable name, old values, and labels into the syntax
    file.
    Edit the pasted text into RECODE and VALUE LABELS commands.
    Highlight and run those two commands.
    Go to the dataset window.
    Right-click the variable name again.
    Click <descriptive statistics>.
    Go to the output window.
    Eyeball the results.
    If okay, delete the contents of the output window;
        else delete the contents of the output window,
        DATASET CLOSE the working dataset, and DATASET COPY again.
-------------
A suggestion to IBM/SPSS

Many programs can present a list and allow the user to
slide rows up and down.
It would be great if SPSS had a tool with 3 or 4 columns:
-- a sequence number, which stays in place
-- possibly a column with the old value
-- the label
-- a column that contains 'valid' or 'missing'

The resulting RECODE, VALUE LABELS, and MISSING VALUES syntax would then be
pasted.
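
The kind of syntax such a tool might paste can be sketched in plain Python. This is only an illustration under assumptions of my own: the function name syntax_from_rows and the row layout are hypothetical, valid rows are renumbered 1, 2, 3, ... in the order given, and rows marked 'missing' keep their old codes so that missing-value codes stay the same across variables. MISSING VALUES accepts at most three discrete values per variable, so the sketch refuses more than that.

```python
def fmt(number):
    """Show whole numbers without a trailing '.0'."""
    number = float(number)
    return str(int(number)) if number.is_integer() else repr(number)


def syntax_from_rows(varname, rows):
    """Build RECODE, VALUE LABELS and MISSING VALUES text from ordered rows.

    rows: non-empty list of (old_value, label, status) tuples in the desired
          order, where status is 'valid' or 'missing'.
    """
    if not rows:
        raise ValueError("at least one row is required")
    recode, labels, missing = [], [], []
    next_value = 1
    for old_value, label, status in rows:
        if status == "missing":
            new_value = old_value          # missing codes are left unchanged
            missing.append(fmt(old_value))
        else:
            new_value = next_value         # valid rows get their position
            next_value += 1
            recode.append("(%s=%s)" % (fmt(old_value), fmt(new_value)))
        labels.append("  %s '%s'" % (fmt(new_value), label.replace("'", "''")))
    if len(missing) > 3:
        raise ValueError("MISSING VALUES takes at most three discrete values")
    commands = []
    if recode:
        commands.append("RECODE %s %s." % (varname, " ".join(recode)))
    commands.append("VALUE LABELS %s\n%s." % (varname, "\n".join(labels)))
    if missing:
        commands.append("MISSING VALUES %s (%s)." % (varname, ", ".join(missing)))
    return "\n".join(commands)


# Hypothetical ordinal variable whose labels must appear in a set order.
rows = [
    (7.0, "Low", "valid"),
    (2.0, "Medium", "valid"),
    (5.0, "High", "valid"),
    (-9.0, "Refused", "missing"),
]
print(syntax_from_rows("rating", rows))
```

Reordering the rows list and regenerating the text plays the role of sliding rows up and down in the proposed tool.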


 




Re: How to remove extraneous value labels. Keep only those that actually occur in data.

Jon Peck
You might find the Data > Define Variable Properties dialog helpful.  It's easy to see unlabelled values and inconsistent labels for a group of variables, and it can copy labels to or from other variables.

Here is an example of finding unlabelled values.  The function automatically excludes variables that have no value labels.  Here I have asked it to check all the categorical variables.  You could alternatively give it a specific list of variables or other selection specifications.

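begin program.
import spssaux, spssaux2

# Select every variable whose measurement level is nominal or ordinal.
varstocheck = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
# Find, for each selected variable, the values that have no value label.
unlabelled = spssaux2.FindUnlabelledValues(varstocheck)
for k, v in unlabelled.items():
    print k, v
end program.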

The output might look like this (from the employee data.sav file after tinkering with some values).
jobcat [14.0] 
educ [33.0, 34.0] 
minority [7.0] 
gender []
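
For readers without SPSS at hand, the idea behind such a check can be sketched in plain Python. This is not the spssaux2 implementation; the function names and the dictionary layout are hypothetical, and the data below are made up to mirror the sample output.

```python
def find_unlabelled(observed, labels):
    """Return the sorted distinct observed values that have no value label."""
    return sorted(set(observed) - set(labels))


def check_variables(data, value_labels):
    """Map each labelled variable to its list of unlabelled observed values.

    data:         dict of variable name -> list of observed values
    value_labels: dict of variable name -> dict of value -> label
    """
    report = {}
    for name, labels in value_labels.items():
        if not labels:
            continue  # skip variables that have no value labels at all
        report[name] = find_unlabelled(data[name], labels)
    return report


# Hypothetical data and labels.
data = {
    "jobcat": [1.0, 2.0, 14.0, 3.0],
    "educ": [12.0, 33.0, 16.0, 34.0],
    "gender": ["f", "m", "f"],
    "salary": [57000.0, 40200.0],
}
value_labels = {
    "jobcat": {1.0: "Clerical", 2.0: "Custodial", 3.0: "Manager"},
    "educ": {12.0: "High school", 16.0: "College"},
    "gender": {"f": "Female", "m": "Male"},
    "salary": {},
}
for name, values in check_variables(data, value_labels).items():
    print(name, values)
```

A variable with an empty list, like gender here, has labels for every value that occurs.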

On Sun, Oct 4, 2020 at 11:17 AM Art Kendall <[hidden email]> wrote:
Thank you for the feedback.

Re: How to remove extraneous value labels. Keep only those that actually occur in data.

Art Kendall
For later versions of SPSS (Python 3):

begin program python3.
import spssaux, spssaux2

varstocheck = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
unlabelled = spssaux2.FindUnlabelledValues(varstocheck)
for k, v in unlabelled.items():
    print(k, v)
end program.



