Scale Development

The development of an objective and reliable method for measuring the magnitude of various psychological dimensions from natural language was motivated by the recognition that diagnosticians and therapists use speech as their major source of information. In doing so, they assess how and what is said in an impressionistic manner, which leaves considerable room for distortion or error arising from incorrect empathic responses and inferences while evaluating the subject's talk. Two aims therefore became compelling: minimizing this error variance, and maximizing the uniformity and consistency of inferences about the speaker's subjective experience and the relative magnitude of the speaker's psychological states and conflicts. In probing the emotional reactions of subjects or patients, an effort was made to minimize guarding or covering. Hence the instructions used to elicit speech from subjects were purposely kept relatively ambiguous and unstructured: speakers were asked to tell about personal or dramatic life experiences. Such standardized instructions made it possible to compare individuals in a standard context, so that demographic and personality variables could be investigated while holding relatively constant the influence of the instructions for eliciting speech, the nature and personality of the interviewer, the context, and the situation. Once reliable and valid content analysis scales had been developed, the effects of varying these non-interviewee variables were investigated one by one.

Here, we focus on a content analysis method originally developed by Gottschalk and Gleser (1969). This technique can measure the magnitude of any mental or emotional state or trait that can be clearly defined and categorized. Rather than classifying content by words in isolation, it identifies the relationships and attitudes the subject reports. For example, a clause is scored for death anxiety if it makes "references to death, dying, threat of death, or anxiety about death" (Gottschalk and Gleser, 1969, p. 23). The score is further refined depending on whether the death reference applies to the speaker, to animate others, or to inanimate objects (destruction rather than death in this case), or whether the reference is a denial of death anxiety.
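
To make the clause-level bookkeeping concrete, the sketch below records a technician's coding of each clause for the death anxiety subscale and sums the weights. The category assignment is an input supplied by the trained scorer, not something the code infers from keywords, which reflects the point that the method classifies reported relationships rather than isolated words. The enum, the `CodedClause` type, and the weight values are hypothetical illustrations; the actual category weights are given in the scoring manuals.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class DeathReferent(Enum):
    """Hypothetical sub-categories for a death-anxiety reference."""
    SELF = "self"
    ANIMATE_OTHERS = "animate_others"
    INANIMATE = "inanimate"  # destruction rather than death
    DENIAL = "denial"        # denial of death anxiety


# Illustrative weights only (assumption); consult the manuals for real values.
DEATH_WEIGHTS = {
    DeathReferent.SELF: 3,
    DeathReferent.ANIMATE_OTHERS: 2,
    DeathReferent.INANIMATE: 1,
    DeathReferent.DENIAL: 1,
}


@dataclass
class CodedClause:
    text: str
    # Judgment made by the human scorer; None means "not scored for death anxiety".
    death_referent: Optional[DeathReferent] = None


def death_anxiety_weighted_sum(clauses: Iterable[CodedClause]) -> int:
    """Sum the weights of all clauses the scorer coded for death anxiety."""
    return sum(
        DEATH_WEIGHTS[c.death_referent]
        for c in clauses
        if c.death_referent is not None
    )


EXAMPLE = [
    CodedClause("I thought I was going to die", DeathReferent.SELF),
    CodedClause("my uncle passed away last spring", DeathReferent.ANIMATE_OTHERS),
    CodedClause("we drove to the coast", None),
]

print(death_anxiety_weighted_sum(EXAMPLE))  # 5
```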

The development of the Gottschalk-Gleser method of content analysis has involved a long series of steps.

It has required that the psychological dimensions to be measured (for example, anxiety, hostility outward, hostility inward, cognitive and intellectual impairment, social alienation-personal disorganization, depression, and hope) be precisely defined,
that the lexical cues by which a receiver of any verbal message infers the occurrence of any of these psychological states be carefully pinpointed,
and that the linguistic, principally syntactic, cues conveying intensity (for example, the word "very" in the proper context) be specified.
Next, whenever appropriate, differential weights were assigned to these semantic and linguistic cues according to the magnitude of subjective experience they convey.
Furthermore, a systematic means had to be devised for correcting for the number of words spoken per unit of time, so that an individual could be compared with himself on different occasions, or with others, with regard to the magnitude of any particular psychological state.
A series of weighted thematic categories had to be specified for every psychological dimension to be measured.
Research technicians were trained to score typescripts of human speech according to any one scale at an inter-scorer reliability of 0.80 or above.
Moreover, a set of construct-validation studies had to be carried out to recheck exactly what each content analysis scale measured; these validation studies have used four kinds of criterion measures: psychological, physiological, pharmacological, and biochemical.
On the basis of these construct-validation studies, the content categories of each scale and their assigned weights have been revised so as to maximize the correlations between the content analysis scores and these independent criterion measures.
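
The final step, revising weights to maximize correlations with criterion measures, can be illustrated with a brute-force search. This is a hypothetical sketch, not the authors' documented procedure: it assumes per-sample category frequencies, a single numeric criterion per sample (for example, a physiological measure), Pearson correlation as the fit criterion, and a small set of candidate integer weights. The data are made up for illustration.

```python
import itertools
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant (r undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def weighted_sums(freqs, weights):
    """Weighted category total for each speech sample."""
    return [sum(f * w for f, w in zip(row, weights)) for row in freqs]


def best_weights(freqs, criterion, levels=(1, 2, 3)) -> Tuple[Tuple[int, ...], float]:
    """Try every combination of candidate weights; keep the one with the highest r."""
    n_categories = len(freqs[0])
    best_w, best_r = None, -math.inf
    for w in itertools.product(levels, repeat=n_categories):
        r = pearson(weighted_sums(freqs, w), criterion)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r


# Hypothetical data: 4 speech samples x 2 content categories, plus a criterion score per sample.
FREQS = [[1, 0], [0, 2], [2, 1], [0, 0]]
CRITERION = [3.0, 4.0, 8.0, 0.0]

print(best_weights(FREQS, CRITERION))  # ((3, 2), 1.0)
```

With real data, a search like this would overfit a small sample, so any revised weights would need to be checked against new samples.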

This work, conducted empirically over many years, has resulted in scales that measure the following psychologically interesting states or traits:

Anxiety (including Death, Mutilation, Separation, Guilt, Shame, and Diffuse Anxiety subscales),
Hostility Outward (including Overt Hostility, Covert Hostility, and Total Hostility Outward subscales),
Hostility Inward,
Ambivalent Hostility (hostility originating externally and directed towards the self),
Social Alienation-Personal Disorganization,
Cognitive Impairment,
Hope,
Depression (with 7 subscales),
Health/Sickness (with health and sickness subscales),
Human Relations,
Achievement Strivings,
Dependency Strivings, and
Quality of Life.

Brief definition of scales

The theoretical framework from which this measurement approach has been developed is an eclectic one, drawing on behavioral and conditioning theory, psychoanalytic clinical theory, and linguistic theory. In addition, the formulation of these psychological states has been deeply influenced by the position that they all have biologic roots. Both the definition of each separate psychological scale and the selection of the specific verbal content items used as cues for inferring each dimension have been influenced by the decision that any psychological dimension measured by this content analysis approach should, whenever possible, be associated with some biologic characteristic of the individual, in addition to some psychological aspect or social situation.

The content analysis technician applying this procedure to typescripts of tape-recorded speech does not have to adopt one theoretical orientation or another. Rather, the technician follows a strictly empirical approach, scoring the occurrence of content or themes in each grammatical clause of speech according to the well-delineated language categories that make up each of the separate verbal behavior scales. Two manuals (Gottschalk, Winget, and Gleser, 1969; Gottschalk, 1982), a book (Gottschalk, 1995), and journal articles (Gottschalk, 1975; Gottschalk and Hoigaard-Martin, 1986) specify which verbal categories to look for and how heavily each occurrence is to be weighted. After coding content in this way, the technician follows prescribed mathematical calculations to arrive at a final score for the magnitude of a given psychological dimension.
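
The "prescribed mathematical calculations" combine weighted category counts with a correction for the amount of speech. The sketch below assumes one commonly cited form of the Gottschalk-Gleser correction, score = sqrt(100 × (Σ f_i × w_i + 0.5) / N), where f_i is the frequency of category i, w_i its weight, and N the number of words in the sample. The function name `gg_score` and the example counts are hypothetical, and the manuals remain the authority on the exact formula.

```python
import math
from typing import Mapping


def gg_score(counts: Mapping[str, int], weights: Mapping[str, float], n_words: int) -> float:
    """Assumed form: sqrt(100 * (sum_i f_i * w_i + 0.5) / N).

    counts:  how many clauses were coded in each category (f_i)
    weights: the weight of each category (w_i)
    n_words: total words in the speech sample (N)
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    weighted_total = sum(f * weights[cat] for cat, f in counts.items())
    # Rescale to a per-100-words rate; the 0.5 keeps samples with no coded
    # clauses above zero, and the square root compresses large counts.
    return math.sqrt(100 * (weighted_total + 0.5) / n_words)


# Hypothetical sample: one clause about death of self (weight 3) and one about
# death of others (weight 2) in a 550-word typescript.
print(gg_score({"death_self": 1, "death_others": 1},
               {"death_self": 3, "death_others": 2}, 550))  # 1.0

# A 200-word sample with nothing coded still gets a small positive score.
print(gg_score({}, {}, 200))  # 0.5
```

Because the score is a rate per words spoken, a long and a short interview with a similar density of coded themes receive roughly comparable scores, which is what allows comparison across occasions and speakers.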

While the utility of these scales has been demonstrated repeatedly through decades of research, widespread everyday use of content analysis of verbal behavior in research and clinical practice has been hampered by the relatively high training and performance requirements of applying the technique manually. For example, Gottschalk and Gleser (1969) recommend an inter-coder reliability coefficient of 0.80 or better relative to the scoring of qualified experts in the use of these scales. Achieving this level of familiarity and skill requires practice with previously published and unpublished scoring examples, as well as continual monitoring of trained scorers. Manual scoring is also not particularly quick, requiring not only trained content judgments but also extensive post-processing of scores to prepare scale-based summaries and analyses.
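
A trainee's progress toward the 0.80 criterion can be checked by having the trainee score the same typescripts as an expert and correlating the two sets of scores. This minimal sketch assumes the reliability coefficient is a Pearson correlation between per-typescript scale scores; the helper names and the scores are hypothetical.

```python
import math
from typing import Sequence, Tuple

RELIABILITY_THRESHOLD = 0.80  # minimum recommended in the text


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def meets_reliability(expert: Sequence[float], trainee: Sequence[float],
                      threshold: float = RELIABILITY_THRESHOLD) -> Tuple[float, bool]:
    """Correlate a trainee's scores with an expert's on the same typescripts."""
    if len(expert) != len(trainee) or len(expert) < 3:
        raise ValueError("need at least 3 paired scores")
    r = pearson(expert, trainee)
    return r, r >= threshold


# Hypothetical scale scores for five typescripts scored by both coders.
EXPERT = [1.2, 0.8, 2.1, 1.5, 0.6]
TRAINEE = [1.1, 0.9, 2.0, 1.7, 0.5]

r, ok = meets_reliability(EXPERT, TRAINEE)
print(f"r = {r:.2f}, meets 0.80: {ok}")  # r = 0.97, meets 0.80: True
```

A correlation is insensitive to a constant offset (a trainee who scores every typescript 0.5 higher than the expert can still reach r = 1.0), so it is worth also inspecting the raw differences between scorers.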

Brief digest of known uses

Summary of reliability studies

Summary of validity studies

Development of scoring norms

Many individuals, mostly researchers, have achieved an acceptable level of proficiency in the content and form analysis of verbal behavior, specifically in scoring scales based on the Gottschalk-Gleser content analysis method, and they have published excellent work using them. Some investigators and clinicians, however, have not wanted to take the time to acquire the expertise needed to use these content analysis scales reliably. Extending the use of the method required addressing these issues of training and inter-scorer reliability. The most effective means of doing so was to develop a computer program that could apply the method.

Computerizing the scoring system
