What Is ABA?

The seven dimensions of ABA identified in your text and in the Baer, Wolf, and Risley article are the guiding principles for the field of Applied Behavior Analysis. As a developing professional, it is important to understand these dimensions because each one is relevant to the work of a behavior analyst.

For this assignment, refer to the following articles assigned in the study for this unit:

  • Krentz, Miltenberger, and Valbuena’s “Using Token Reinforcement to Increase Walking for Adults With Intellectual Disabilities.”
  • Marsic, Berman, Barry, and McCloskey’s “The Relationship Between Intentional Self-Injurious Behavior and the Loudness Dependence of Auditory Evoked Potential in Research Volunteers.”

Then complete the following:

  • Identify which of the articles is behavior analytic and which is not, and provide an explanation for your choice.
    • Be sure to identify which of the seven dimensions of ABA are present in the behavior analytic article.
    • Analyze why the other article is not behavior analytic. How do you know the seven dimensions are not present?

Assignment Requirements

  • Written communication: Should be free of errors that detract from the overall message.
  • APA formatting: References and citations are formatted according to current APA style guidelines.
  • Resources: Minimum of 1–2 scholarly or professional resources.
  • Length: 2–3 double-spaced pages, in addition to the title page and reference page.
  • Font and font size: Times New Roman, 12 point.

    The Relationship Between Intentional Self-Injurious Behavior and the Loudness Dependence of Auditory Evoked Potential in Research Volunteers

    Angelika Marsic¹, Mitchell E. Berman², Tammy D. Barry¹, and Michael S. McCloskey³

    ¹The University of Southern Mississippi; ²Mississippi State University; ³Temple University

    Objective: Serotonergic (5-HT) functioning has been shown to be inversely associated with intentional self-injurious behaviors. The purpose of this study was to examine the association between three related self-report measures of intentional self-injurious behaviors (suicidal thoughts/behavior, history of nonsuicidal self-injury, history of severe self-harm when angry) and a putative electrophysiological index of 5-HT activity, the loudness dependence of auditory evoked potential (LDAEP). Method: Auditory evoked potentials were recorded from 41 men (mean age = 20.69, standard deviation [SD] = 2.98) during the administration of various tone loudness stimuli, followed by completion of the self-report measures. Results: The component slope was associated with all measures of self-injurious behavior in the expected direction. Conclusion: The LDAEP has the potential to be used as a noninvasive index of intentional self-harm disposition. Additional studies are needed using other populations, including women and treatment-seeking individuals, to determine if the LDAEP more broadly discriminates risk of self-injuring. © 2014 Wiley Periodicals, Inc. J. Clin. Psychol. 71:250–257, 2015.

    Keywords: Self-injury; 5-HT functioning; LDAEP; N1/P2; EEG

    The prevalence rate for intentional self-injurious behavior (SIB) ranges from 1% to 4% in adults (Briere & Gil, 1998; Klonsky, Oltmanns, & Turkheimer, 2003; Prinstein, 2008) and from 17% to 38% in college students, with lifetime prevalence estimates of 35% (e.g., Gratz, 2001; Whitlock, Eckenrode, & Silverman, 2006). Not unexpectedly, the rates of SIB are even higher in clinical populations (21%–61% in adolescents and young adults and 21% in adults; e.g., Briere & Gil, 1998; Darche, 1990; DiClemente, Ponton, & Hartley, 1991; Prinstein, 2008). The most severe form of SIB—suicide—is the second leading cause of death among 25- to 34-year-olds, the third leading cause of death among 15- to 24-year-olds, and the 11th leading cause of death overall in the United States (Centers for Disease Control and Prevention [CDC], 2007). There are approximately 100–200 attempts for every completed suicide among young adults aged 15–24 years (Goldsmith, Pellmar, Kleinman, & Bunney, 2002).

    One of the most extensively studied biological correlates of SIB is serotonergic (5-HT) neurotransmitter activity. Specifically, attenuated 5-HT functioning has been associated with SIBs across the spectrum of lethality (e.g., Arango et al., 1990; Audenaert et al., 2001; Kamali, Oquendo, & Mann, 2001; Malone, Corbitt, Li, & Mann, 1996; McCloskey, Ben-Zeev, Lee, Berman, & Coccaro, 2009). For example, a meta-analysis by Lester (1995) reviewed 27 neurochemical studies of the association between 5-HT and SIB involving 1202 psychiatric patients and controls. The results provided strong evidence for the role of serotonin in suicidal behavior. Individuals who had attempted suicide had lower levels of cerebrospinal fluid (CSF) 5-hydroxyindoleacetic acid (5-HIAA; a 5-HT metabolite) compared to psychiatric controls. Asberg (1997) reviewed 33 studies and found that low levels of CSF 5-HIAA were associated with

    Please address correspondence to: Angelika Marsic, 3264 Willowbrook Avenue, Palmdale, CA 93551. E-mail: Angelika.Marsic@gmail.com

    JOURNAL OF CLINICAL PSYCHOLOGY, Vol. 71(3), 250–257 (2015) © 2014 Wiley Periodicals, Inc. Published online in Wiley Online Library (wileyonlinelibrary.com/journal/jclp). DOI: 10.1002/jclp.22136

    Self-Injurious Behavior and the LDAEP 251

    suicidality in unipolar depression and personality disorders. Diminished levels of CSF 5-HIAA have also been found in depressed patients with a high-lethality suicide attempt compared to depressed individuals with a low-lethality suicide attempt (Mann & Malone, 1997). Furthermore, lower CSF 5-HIAA levels have been found in individuals engaging in nonlethal SIBs (López-Ibor, Saiz-Ruiz, & Pérez de los Cobos, 1985).

    Commonly used biological indexes of central 5-HT functioning can be costly and invasive (e.g., lumbar puncture; pharmacochallenge). However, a neurophysiological approach that takes advantage of electrical brain wave activity measured at the scalp (the loudness dependence of the auditory evoked potential: LDAEP; Hegerl & Juckel, 1993) may provide a noninvasive means to assess behaviorally relevant central 5-HT functioning. Although auditory evoked potentials are generated by a complex interrelationship of different neurotransmitters, there is mounting evidence that the LDAEP is most likely modulated by serotonergic system activities (Hegerl & Juckel, 1993). The LDAEP is a measure of auditory cortex activity as represented by the auditory evoked potential slopes (Hegerl, Gallinat, & Juckel, 2001). The intensity dependence of the auditory evoked N1/P2 component (i.e., dB level of the tone dependence) has been proposed to be inversely related to central serotonergic activity. That is, low serotonergic innervation of the auditory cortex ostensibly produces a more pronounced LDAEP N1/P2 component (i.e., increased N1/P2 amplitude to increasing intensity tones) and vice versa (Hegerl & Juckel, 1993).

    In humans, the N1/P2 component comprises two overlapping subcomponents generated by the superior temporal plane (mainly primary auditory cortex) and the lateral temporal gyri (secondary auditory cortex; Hegerl & Juckel, 1993). The N1/P2 component, occurring about 70–200 ms poststimulus, is used as a combined ratio parameter because it has higher loudness dependence reliability than when loudness dependence is measured separately for N1 and P2.

    In addition, the relationship with clinical features and personality factors is stronger with the loudness dependence of the combined parameter than with individual amplitudes (Hegerl, Gallinat, & Mrowinski, 1994). The N1/P2 component also exhibits prominent and stable interindividual differences. For example, Hegerl, Prochno, Ulrich, and Muller-Oerlinghausen (1988) found test-retest reliability of .77 for the Cz site (i.e., midline position of the central lobe) and .74 for the amplitude/stimulus intensity function (ASF) slope among healthy participants. ASF reflects the change in N1/P2 amplitude as tone intensity increases. Hegerl and Juckel (1993) reported a test-retest correlation of .90 for the intraindividual stability of the intensity dependence of the N1/P2 component, mainly generated by the activity of the primary auditory cortex.

    The experimental evidence for a relationship between the LDAEP and 5-HT was first observed in animals (Hegerl et al., 1993; Juckel, Molnar, Hegerl, Csepe, & Karmos, 1997). However, experimental studies in humans (i.e., using pharmacological agents to augment serotonin levels in the brain) of the LDAEP as an index of acute 5-HT changes have yielded mixed results. For example, 5-HT augmentation in a double-blind, placebo-controlled study was shown to produce a significant decrease in N1/P2 slope with increasing tone loudness, lending support for the validity of the LDAEP as a 5-HT index (Nathan, Segrave, Phan, O’Neill, & Croft, 2006). Follow-up studies with healthy participants have failed to replicate these findings (Guille et al., 2008; Uhl et al., 2006), indicating that, at least in healthy subjects, the LDAEP may not be a good indicator of acute changes in central 5-HT activity.

    The lack of evidence for acute changes in 5-HT activity as a function of pharmacochallenge with 5-HT agents does not preclude the use of the LDAEP as a valid biological indicator in vulnerable individuals. Several clinical studies have found a strong LDAEP in individuals characterized by psychiatric disorders ostensibly marked by 5-HT dysfunction. For example, Gallinat, Bottlender, and Juckel (2000) found that a significantly higher number of depressive patients fell into a strong LDAEP group (seemingly reflecting attenuated 5-HT activity), and that those same individuals exhibited a significant decrease in depressive symptoms after a selective serotonin reuptake inhibitor (SSRI) treatment compared to depressive patients with a less prominent LDAEP.

    Hegerl and colleagues (1998) found that patients with high levels of serotonin syndrome (i.e., enhanced central 5-HT activity) exhibited a weaker LDAEP than those with low serotonin syndrome. Chen and colleagues (2005) found a sharper LDAEP slope in a depression–suicide

    252 Journal of Clinical Psychology, March 2015

    group as opposed to a depression–nonsuicidal group, demonstrating a potential utility of the LDAEP in discriminating suicidality among depressed individuals (Chen et al., 2005). Based on these findings, O’Neill, Croft, and Nathan (2008) concluded that although evidence for the LDAEP as an indicator of acute serotonergic changes among humans is conflicting in nature, evidence for the LDAEP as a useful biological index of 5-HT functioning in vulnerable individuals is more compelling.

    The purpose of this study was to examine the relationship between intentional SIBs and the LDAEP in a sample of young adults. If the LDAEP reflects relatively stable central 5-HT activity, an association between the LDAEP slope and measures of SIBs should emerge. To date no study has examined the relationship between the LDAEP N1/P2 slope and intentional SIBs in a nonclinical population. Given that previous studies have found that an increase in the auditory evoked N1/P2 component slope with increasing tone loudness (i.e., strong LDAEP) is inversely related to indexes of central serotonergic activity, it was expected that strong LDAEP would be positively related to various measures of SIB.

    Method

    Participants

    Forty-one men recruited from undergraduate classes took part in the study. The majority of the participants self-identified as Caucasian (56.1%), followed by African American (39%) and “other” (4.9%) race or ethnicity. Participants ranged in age from 18 to 31 years (mean [M] = 20.69, SD = 2.98). A history of schizophrenia or mood, anxiety, or substance dependence disorder was exclusionary. In addition, any hearing impairment, a history of seizures, and a history of traumatic brain injury were exclusionary. Potential participants were excluded if they were currently under medical treatment. Participants were asked to not consume alcohol or caffeinated beverages in the 24 hours before the study day. The study was reviewed and approved by the Institutional Review Board for the Protection of Human Subjects.

    Measures

    Health Screening Questionnaire. A brief health screening questionnaire was created for the current study, including items on the participant’s age, gender, and race. Along with this demographic information, items addressing the health and psychiatric exclusion criteria listed above were included.

    Suicidal Behaviors Questionnaire (SBQ; Cole, 1988). The SBQ is a four-item self-report measure that assesses suicidal thoughts, plans, and behavior. The SBQ questions are as follows: “Have you ever thought about or attempted to kill yourself?”; “How often have you thought about killing yourself in the past year?”; “Have you ever told someone that you were going to commit suicide, or that you might do it?”; “How likely is it that you will attempt suicide one day?” Items are rated on a Likert-format scale, with values ranging from 0–6, 0–4, 0–2, and 0–4, respectively. Scores range from 0 to 16 (with higher scores implying greater suicidal disposition). The SBQ has adequate internal consistency (α = .80) for a nonclinical sample and good test-retest stability over time (r = .95; Cotton, Peters, & Range, 1995).

    Furthermore, the SBQ has good construct validity, as shown by a significant positive correlation (r = .69) with the Scale for Suicidal Ideation in a nonclinical sample (Cotton et al., 1995) and with laboratory measures of self-aggression (Berman & Walley, 2003). Internal consistency for the current sample was adequate (α = .71).
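As a minimal sketch of the scoring rule described above, assuming each response is already coded as an integer on the ranges given (0–6, 0–4, 0–2, 0–4), the SBQ total is simply the sum of the four items after a range check. The function name `score_sbq` is hypothetical; this is an illustration of the arithmetic, not a scoring script from the authors or from Cole (1988).

```python
# Hypothetical helper illustrating the SBQ scoring described in this study.
# Assumed item ranges: 0-6, 0-4, 0-2, 0-4, so totals run from 0 to 16.
SBQ_ITEM_MAX = (6, 4, 2, 4)


def score_sbq(responses):
    """Return the SBQ total for one participant's four item ratings."""
    if len(responses) != len(SBQ_ITEM_MAX):
        raise ValueError(f"expected {len(SBQ_ITEM_MAX)} items, got {len(responses)}")
    for item, (value, upper) in enumerate(zip(responses, SBQ_ITEM_MAX), start=1):
        if not 0 <= value <= upper:
            raise ValueError(f"item {item} must be in 0..{upper}, got {value}")
    return sum(responses)


print(score_sbq([0, 0, 0, 0]))  # 0 (lowest possible score)
print(score_sbq([6, 4, 2, 4]))  # 16 (highest possible score)
print(score_sbq([2, 1, 0, 1]))  # 4
```

Validating each item against its own maximum, rather than only checking the 0–16 total, catches miscoded items that would otherwise still produce a plausible-looking sum.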

    Deliberate Self-Harm Inventory (DSHI; Gratz, 2001). The DSHI is a 17-item self-report scale of nonsuicidal SIBs. It comprises nonsuicidal self-injury (NSSI) behaviors that do not have the goal of ending one’s life (e.g., self-cutting, burning, scratching, biting, and punching); items include, for example, “Have you ever intentionally (i.e., on purpose) carved words into your skin?” and “Have you ever intentionally (i.e., on purpose) used bleach, comet, or oven cleaner to scrub your skin?” Individuals endorse Yes or No for each item. A DSHI total score is obtained by summing the number of endorsed self-harm behaviors. The DSHI has shown adequate internal consistency (α = .82) and test-retest stability (r = .92; Gratz, 2001). Adequate correlations with related self-report measures of self-harm behaviors have been found (e.g., DSHI and the self-harm items on the Mental Health History Form, r = .49; Gratz, 2001). DSHI internal consistency for the current sample was also adequate (α = .84).

    The Life History of Aggression Scale-Self-Aggression subscale (LHA-SA; Coccaro, Berman, & Kavoussi, 1997). The LHA is an 11-item measure of past aggressive, self-aggressive, and antisocial behaviors. The LHA assesses the frequency and intensity of these behaviors, rather than aggressive traits or ideation, and it provides information about these from age 13 on. For the current study, we used the self-report two-item Self-Aggression subscale of the LHA (e.g., “Deliberately tried to physically hurt yourself in anger or desperation” and “Deliberately tried to end your life or kill yourself in anger or desperation”), which is rated on a 6-point scale, ranging from 0 (no occurrences) to 5 (more events than can be counted), reflecting the total number of occurrences. The LHA-SA was found in previous studies to have somewhat low internal consistency (α = .45) due to gender differences (females, α = .71; males, α = .18) but had adequate inter-rater agreement (r = .84) and test-retest reliability (r = .97; Coccaro et al., 1997). LHA-SA internal consistency for the men in this study, however, was adequate (α = .72).
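The internal-consistency values reported for each measure (for example, α = .72 for the LHA-SA in this sample) are Cronbach's alpha coefficients. A minimal sketch of the standard formula, α = k/(k − 1) × (1 − sum of item variances / variance of total scores), using only the Python standard library, is shown below. The function name and the toy data are hypothetical, and the article does not state which software the authors used.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a participants-by-items matrix (list of rows)."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*item_scores))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical two-item data: perfectly consistent items give alpha = 1.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

Sample variances are used throughout; because the same denominator appears in every variance, the ratio (and hence alpha) would be identical with population variances.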

    LDAEP. LDAEP stimulus presentation, data acquisition, and analyses were accomplished using equipment and software obtained from the James Long Company, including a 16-channel custom optically isolated bioamplifier. LDAEPs were recorded with 15 electrodes arranged according to the 10–20 electroencephalogram (EEG) electrode system, using M1 as reference and AFz as ground. Impedances were kept below 5 kΩ throughout testing. Pure sine tones (1000 Hz; some of 100 ms duration and some of 50 ms duration, each with a 10 ms rise and 10 ms fall time; interstimulus interval [ISI] randomized between 1800 and 2200 ms) at five intensities (60, 70, 80, 90, and 100 dB) were presented binaurally through headphones in pseudorandomized order.

    Data were collected at a sampling rate of 500 Hz with an analog bandpass filter (0.16–50 Hz). Seventy sweeps of each combination of stimulus intensity and tone duration were presented (700 sweeps in all: 350 with a 50 ms tone and 350 with a 100 ms tone). Poststimulus peak latencies were determined between 80–120 ms for the N1 component and 150–230 ms for the P2 component.
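The stimulus counts above (5 intensities × 2 tone durations × 70 sweeps = 700 sweeps) can be checked with a short sketch that builds one possible trial sequence. The article says only that presentation was "pseudorandomized", so the unconstrained shuffle and the uniform ISI draw between 1800 and 2200 ms are assumptions, and `build_schedule` is a hypothetical name, not part of the James Long Company software.

```python
import random
from collections import Counter

INTENSITIES_DB = (60, 70, 80, 90, 100)
DURATIONS_MS = (50, 100)
SWEEPS_PER_CONDITION = 70
ISI_RANGE_MS = (1800, 2200)


def build_schedule(seed=None):
    """Return a shuffled list of (intensity_db, duration_ms, isi_ms) trials."""
    rng = random.Random(seed)
    trials = [(db, dur)
              for db in INTENSITIES_DB
              for dur in DURATIONS_MS
              for _ in range(SWEEPS_PER_CONDITION)]
    rng.shuffle(trials)  # assumption: plain shuffle stands in for "pseudorandomized"
    # assumption: ISI drawn uniformly from the reported 1800-2200 ms range
    return [(db, dur, rng.uniform(*ISI_RANGE_MS)) for db, dur in trials]


schedule = build_schedule(seed=1)
print(len(schedule))  # 700
print(Counter(dur for _, dur, _ in schedule))  # 350 trials of each duration
minutes = sum(isi for _, _, isi in schedule) / 60000
print(f"about {minutes:.1f} min of interstimulus intervals")
```

With a mean ISI near 2000 ms, the intervals alone add up to roughly 23 minutes of recording, which is a useful sanity check on session length.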

    Procedure

    Upon arrival, participants completed the informed consent process, after which a brief screening interview was administered. If the participant did not meet any exclusionary criteria, then he was instructed to complete the demographic questionnaire, which was computer administered. Next, the participant was prepared for the EEG recording. An appropriately sized electrocap comprising 15 electrodes, following the International 10–20 system, was fitted on the participant’s head. The scalp was prepared by application of a mildly abrasive gel (OmniPrep). Electrooculography (EOG) electrodes were placed on the outer canthi of each eye and on the supraorbital and infraorbital ridge of the left eye, to allow for detection and removal of ocular artifacts.

    According to lab standards, each electrode site displayed impedance of less than 5 kΩ, while the impedance at the EOG sites was kept below 10 kΩ. The left mastoid electrode site was used as the reference during data collection. However, during analysis, the right mastoid was averaged with the left mastoid to serve as the final reference, avoiding the left or right hemisphere bias that is often found when only one reference site is used (Luck, 2005).

    The participant was instructed to refrain from moving his eyes during testing to ensure minimal contamination of the data. Specifically, a fixation point was displayed on the screen for the duration of the EEG experiment, and the participant was asked to softly focus on that point and refrain from any eye movement other than regular blinking. In addition, the participant was asked to refrain from making any body movements. After the EEG data collection, the participant completed the self-report measures of SIB (SBQ, DSHI, LHA-SA). Finally, the participant was debriefed and psychology course research credit was applied.

    Results

    EEG Analysis

    Prior to analyzing the N1 and P2 amplitudes, a grand mean waveform for each electrode site was created. Based on visual inspection of the grand mean waveform and findings from previous research, appropriate latency time intervals were determined (Hegerl & Juckel, 1993). Previous research demonstrated that the N1/P2 amplitude is most pronounced at the Cz site. Therefore, our analysis used the Cz site to be consistent with previous research studies. N1 amplitudes were determined by computing the average amplitude between a latency of 80 and 120 ms. P2 amplitudes were determined by computing the average amplitude between a latency of 150 and 230 ms. The N1/P2 amplitude was calculated as the difference between N1 and P2 (P2-N1) at the Cz site. Finally, the N1/P2 slope for each participant was calculated using tone intensity as the independent variable and N1/P2 amplitude as the dependent variable. Individual slopes were used in the bivariate analyses.
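The steps above (mean amplitudes in fixed latency windows at Cz, N1/P2 = P2 − N1, then a per-participant slope across tone intensity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each input is assumed to be an already averaged, artifact-free Cz waveform sampled at 500 Hz with index 0 at stimulus onset, the slope is an ordinary least-squares fit, and all function names and the synthetic signal are hypothetical.

```python
SAMPLE_RATE_HZ = 500  # sampling rate reported in the Method section


def window_mean(waveform, start_ms, end_ms, rate_hz=SAMPLE_RATE_HZ):
    """Mean amplitude from start_ms to end_ms (inclusive) after onset.

    Assumes waveform[0] is the sample at stimulus onset."""
    ms_per_sample = 1000 / rate_hz
    first = round(start_ms / ms_per_sample)
    last = round(end_ms / ms_per_sample)
    if last >= len(waveform):
        raise ValueError("window extends past the end of the epoch")
    segment = waveform[first:last + 1]
    return sum(segment) / len(segment)


def n1p2_amplitude(waveform):
    """P2 - N1 using the 80-120 ms (N1) and 150-230 ms (P2) windows."""
    n1 = window_mean(waveform, 80, 120)
    p2 = window_mean(waveform, 150, 230)
    return p2 - n1


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ldaep_slope(waveforms_by_db):
    """N1/P2 amplitude regressed on tone intensity (dB) for one participant."""
    intensities = sorted(waveforms_by_db)
    amplitudes = [n1p2_amplitude(waveforms_by_db[db]) for db in intensities]
    return ols_slope(intensities, amplitudes)


def synthetic_wave(db, n_samples=150):
    """Hypothetical test signal: N1 = -0.1*dB, P2 = +0.1*dB (arbitrary units)."""
    wave = [0.0] * n_samples
    for i in range(40, 61):   # samples covering 80-120 ms at 500 Hz
        wave[i] = -0.1 * db
    for i in range(75, 116):  # samples covering 150-230 ms at 500 Hz
        wave[i] = 0.1 * db
    return wave


waves = {db: synthetic_wave(db) for db in (60, 70, 80, 90, 100)}
print(round(ldaep_slope(waves), 6))  # 0.2
```

In the synthetic check, the N1/P2 amplitude is 0.2 × dB by construction, so a slope of 0.2 confirms the windowing and regression steps; a steeper slope in real data corresponds to the "strong LDAEP" the article links to attenuated 5-HT activity.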

    Statistical Analyses

    As would be expected, the frequency of reported thoughts and behaviors differed across measures. For the SBQ (which on face is strongly weighted to assessing suicidal thoughts), 16 of 41 participants positively endorsed at least one item. The rates for actual nonsuicidal SIBs on the DSHI were somewhat lower, with 11 of 41 participants endorsing at least one behavior. Rates were lowest for the LHA-SA, which assesses more serious forms of intentional self-harm behavior. Specifically, 7 of 41 participants positively endorsed one of the two LHA-SA items (six endorsed “Deliberately tried to physically hurt yourself in anger or desperation” and three endorsed “Deliberately tried to end your life or kill yourself in anger or desperation” at some time after 13 years of age).

    One-tailed Spearman rho bivariate analyses (α = .05) revealed that the measures of SIB were significantly correlated with one another. Specifically, SBQ scores were associated with scores on the DSHI (r = .53, p = .003) and the LHA-SA (r = .49, p = .001). Moreover, scores on the DSHI and the LHA-SA were strongly correlated (r = .53, p < .001). Exploratory analysis for the two LHA-SA items revealed that deliberate physical self-harm and deliberate attempt to end one’s life were also associated (r = .46, p = .001).
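The rank correlations reported here are Spearman's rho. As a sketch, rho can be computed as the Pearson correlation of the two variables' ranks, with tied values given their average rank (ties are common on short scales such as the LHA-SA). The t statistic t = ρ√((n − 2)/(1 − ρ²)) on n − 2 degrees of freedom is a common large-sample approximation for testing ρ against zero; whether the authors' software used it or an exact test is not stated. A one-tailed p value in the predicted (positive) direction would be the upper-tail probability of that t, for example from `scipy.stats.t.sf`. The helper names below are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples of at least 3 values")
    return pearson_r(average_ranks(x), average_ranks(y))


def rho_t_statistic(rho, n):
    """Large-sample t approximation with n - 2 df (undefined when |rho| == 1)."""
    return rho * sqrt((n - 2) / (1 - rho ** 2))


# Hypothetical example: a monotone but nonlinear relation gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
```

Because rho depends only on ranks, it suits skewed counts such as the DSHI total, where most participants endorse no items.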

    The LDAEP index (N1/P2 slopes) was positively correlated with scores on the SBQ (r = .31, p = .03), the DSHI (r = .42, p = .003), and the LHA-SA (r = .32, p = .02). Exploratory analyses for the two LHA-SA items and N1/P2 slopes revealed a nonsignificant association in the expected direction for deliberate self-harm when angered (r = .26, p = .052). Interestingly, despite the small proportion of participants with a history of actual suicide attempts in the sample, the LDAEP index was significantly related to this most lethal index of self-injury (r = .36, p = .01). Therefore, the overall pattern of associations suggests that higher slopes of N1/P2 amplitudes are related to higher scores on measures of SIBs across the spectrum of lethality.

    Discussion

    The LDAEP has been proposed as a putative, reliable, noninvasive index of 5-HT functioning in the central nervous system (Hegerl et al., 1994, 2001). Although an increasing number of studies has examined the relationship between the LDAEP and central serotonergic functioning in individuals with more serious psychopathologies marked by serotonin dysfunction (e.g., depression), to our knowledge this is the first study to date that has explored the relationship between the LDAEP and a history of SIB in a nonclinical population.

    Consistent with expectations, the LDAEP slope (with greater slopes indicative of attenuated 5-HT functioning) was positively associated with various indexes of SIB (including suicidal thoughts, nonsuicidal self-injury, and past suicide attempts), suggesting that the N1/P2 slope LDAEP index can identify SIB in a nonclinical population. The present findings support the notion that the LDAEP could potentially be used to prospectively discriminate individuals who are at risk of self-harming. This possibility can be tested in future studies by recruiting individuals with a notable history of self-injurious or suicidal behaviors and individuals who are experiencing urges currently and conducting LDAEP assessments across time.

    Limitations

    These findings also provide qualified evidence that the LDAEP is potentially a useful index of 5-HT functioning. However, this suggestion should be interpreted with caution, taking into consideration the study limitations. First, as with most event-related potential (ERP) studies, the sample size was relatively small. Second, this study relied exclusively on self-report measures of lifetime SIB. Behavioral measures of SIB (e.g., Implicit Association Tasks or self-shock tasks such as the Self-Aggression Paradigm; Berman & Walley, 2003) and information from third-party sources would provide a more comprehensive approach to understanding the boundaries of the relationship between the LDAEP and SIBs.

    Future studies should incorporate community and clinical samples, including men and women, which are more representative of people who engage in SIBs. Although the participants were screened for psychopathology, no measures were administered assessing subclinical levels of internalizing symptoms (e.g., depression and anxiety) that may have mediated the SIB–LDAEP association. Importantly, we did not use a structured or semistructured interview to fully assess various forms of psychopathology associated with self-harm, such as personality disorders.

Which theory of dreaming seems to best explain Arlene’s disturbing dreams, and why?

Read the following case studies below and answer the questions that follow.

Process

Your assignment must include the following:

1. A cover sheet

2. The answers to both Case Study 1 and Case Study 2 written in complete sentences

Formatting

Format your paper using a standard font, such as Times New Roman, 12 point, double-spaced. Set the margins at a standard 1 inch on all sides.

For the body of your paper, make a clear distinction when you’re answering the questions about Case Study 1 and answer questions 1–5 in complete sentences. Then move on to Case Study 2 and continue in the same format. For clarity, please include each question from the case study prior to your response.

Case Study 1

The Case of Arlene Amarosi, the Woman Who Dreams of Stress

Arlene Amarosi, a working mother, has been under a lot of stress this year. She has been having difficulty getting to sleep, and often lies in bed staring at the ceiling while worrying about her problems. As a result, she’s often tired throughout her workday and relies on coffee and caffeinated energy drinks to keep her going. Lately Arlene’s sleep has been disturbed even more often than usual. Several times over the past week she has been awakened by disturbing dreams. In these dreams she is always at work, struggling to keep up with an impossible workload. She is struggling with the new software that her company recently trained her to use, but no matter how fast she goes, she can’t keep up with the workflow. The dream ends when Arlene wakes up in a panic. It often takes Arlene hours to get back to sleep, and she has been feeling even more tired than usual during work.

Questions

1. Arlene is worried that her recent dream experiences indicate that something is wrong with her. If you were Arlene’s friend and wanted to reassure her, how would you help her to understand the normal experience of sleep and dreams?

2. Which theory of dreaming seems to best explain Arlene’s disturbing dreams, and why?

3. How might meditation help Arlene?

4. If you were Arlene’s health care provider, how would you advise her to overcome her insomnia?

5. What are some effects of her high caffeine intake on Arlene?

Case Study 2

John Buckingham, The New Guy On The Job

When John Buckingham moved across the country to take a new job, he didn’t expect to run into much difficulty. He would be doing the same kind of work he was used to doing, just for a new company. But when he arrived on his first day, he realized there was more for him to adjust to than he had expected.

Clearly, John had moved to a region where the culture was much more laid back and casual than he was used to. He showed up for his first day in his usual business suit only to find that almost all the other employees wore jeans, Western shirts, and cowboy boots. Many of them merely stared awkwardly when they first saw John, and then hurriedly tried to look busy while avoiding eye contact.

John got the message. On his second day at work John also wore jeans and a casual shirt, although he didn’t yet own a pair of cowboy boots. He found that people seemed more relaxed around him, but that they continued to treat him warily. It would be several weeks, after he’d gone out and bought boots and started wearing them to work, before certain people warmed up to John enough to even talk to him.

Questions

  1. What does the behavior of John’s co-workers toward John suggest about their attributions for his initial manner of dress?
  2. Describe the kinds of biases that might have affected John’s co-workers as they formed impressions of him on his first day. Could they have been using a faulty schema to understand him? Is there evidence of the halo effect?
  3. Why did John change his manner of dress so soon after starting his new job? What processes were likely involved in his decision to do so?
  4. John’s co-workers seemed very hesitant to “warm up” to John. How would you explain to John their initial reluctance to like him very much?
  5. If you were the human resources director for this company, what strategies would you employ to prevent experiences like John’s? How would you justify the implementation of these strategies to the company president?

 

Describe how the group will define operationally and measure the variables.

This is a Collaborative Learning Community (CLC) assignment.

Before beginning this assignment, each group should submit a filled-in copy of the CLC Agreement Form.

Each CLC team will design a quasi-experimental or a true experimental study investigating the impact of the independent variable on the dependent variable.

Address the following in 500-750 words:

1. Design either a quasi-experimental or a true experimental study to investigate the variables. What is the hypothesis? Describe the types of hypotheses with respect to testing. What does the experimental method allow that the correlational design does not?

2. Identify the independent variable. Identify the dependent variable.

3. Describe how the group will define operationally and measure the variables.

4. Describe how the group will obtain a random sample of participants.

5. Discuss how the group will ensure the study has high internal validity. Will the subjects be assigned randomly to the groups? Why or why not?

6. Are there any ethical concerns about the treatment of participants emerging from the experiment?

7. Considering the data presented, would you use a t or a z score? Why? Include the appropriate effect size.

8. Submit an SPSS output for the quasi or true experimental study.

Include at least two to four scholarly sources.

While APA style is not required for the body of this assignment, solid academic writing is expected and in-text citations and references should be presented using APA documentation guidelines, which can be found in the APA Style Guide, located in the Student Success Center.

This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.

7 Page Psychology Essay

Peter Kruyen

Using short tests and questionnaires for making decisions about individuals:

Using Short Tests and Questionnaires for Making Decisions about Individuals:

When is Short too Short?

Peter Mathieu Kruyen

Cover design by Roos Verhooren (info@roosverhooren.nl)
Printed by Ridderprint BV, Ridderkerk, the Netherlands
© Peter Mathieu Kruyen, 2012
No part of this publication may be reproduced or transmitted in any form or by any means, electronically or mechanically, including photocopying, recording or using any information storage and retrieval system, without the written permission of the author, or, when appropriate, of the publisher of the publication.
ISBN/EAN: 978-90-5335-614-2

This research was supported by a grant from the Netherlands Organisation for Scientific Research (NWO), grant number 400-05-179.


Dissertation to obtain the degree of doctor

at Tilburg University

on the authority of the rector magnificus,

prof. dr. Ph. Eijlander,

to be defended in public before a committee

appointed by the doctorate board

in the aula of the University

on Friday, 14 December 2012, at 14:15

by

Peter Mathieu Kruyen,

born on 26 July 1983 in Dordrecht

Promotor (supervisor): Prof. dr. K. Sijtsma

Copromotor (co-supervisor): Dr. W. H. M. Emons

Other members of the doctoral committee:

Prof. dr. M. Ph. Born

Prof. dr. R. R. Meijer

Prof. dr. M. J. P. M. van Veldhoven

Dr. L. A. van der Ark

Dr. A. V. A. M. Evers

Contents

1. Introduction

1.1 Test length and individual decision-making

1.2 Preliminaries: Test length and measurement precision

1.3 Overview of the thesis

2. On the shortcomings of shortened tests: A literature review

2.1 Introduction

2.2 Research questions

2.3 Technical terms

2.4 Method

2.5 Results

2.6 Discussion

Appendix: Coding scheme

3. Test length and decision quality: When is short too short?

3.1 Introduction

3.2 Background

3.3 Method

3.4 Results

3.5 Discussion

4. Assessing individual change using short tests and questionnaires

4.1 Introduction

4.2 Theory

4.3 Method

4.4 Results

4.5 Discussion

5. Shortening the S-STAI: Consequences for research and individual decision-making

5.1 Introduction

5.2 Background: Pitfalls of shortening the S-STAI

5.3 Method

5.4 Results

5.5 Discussion

Appendix: Explanation of strategies used to shorten the S-STAI

6. Conclusion and discussion

6.1 Conclusion

6.2 Discussion

References

Summary

Samenvatting (Summary in Dutch)

Woord van dank (Acknowledgments in Dutch)

Chapter 1: Introduction

1.1 Test Length and Individual Decision-Making

Psychological tests and questionnaires play an important role in individual decision-making in areas such as personnel selection, clinical assessment, and educational testing. To make informed decisions about individuals, psychologists are interested in constructs like motivation, anxiety, and reading level, which have been shown to be valid predictors of criteria such as job success, suitability for therapy, and mastery of reading skills. These unobservable constructs are measured by a collection of items comprising a test. Adding the scores on the items yields a respondent’s total score, or test score, which reflects the respondent’s level on the construct of interest. Total scores are used to decide, for example, which applicant to hire for a job, whether a patient benefited from a treatment, or whether a particular student needs additional reading help.

Before using a test for individual decision-making, test users need to be reasonably certain that decisions about individual respondents do not depend on one particular test administration (e.g., Emons, Sijtsma, & Meijer, 2007, p. 133; Hambleton & Slater, 1997). When total scores vary considerably across different (hypothetical) test administrations because of random influences such as mood or disturbing noises during the administration, the risk of incorrect individual decisions may be substantial. As a result, test users may reject a suitable applicant, continue an unsuccessful treatment, or deny additional help to a student with a low reading level. Incorrect decisions may have important negative consequences, such as a decline in the well-being of individual respondents and a waste of organizational resources.

In this PhD thesis, the focus is on the influence of random measurement error, or total-score unreliability, on test performance in relation to individual decision-making. Special attention is given to how test length relates to reliability, which is a group characteristic, and to measurement precision, which pertains to the measurement of individuals. Throughout, we concentrate on reliability issues in decision-making about individuals and, for the sake of simplicity, assume that tests are valid. Validity is a highly important topic that cannot be addressed in passing and warrants a PhD study of its own.

Generally, tests consisting of many items (say, at least 40) are more reliable than tests consisting of only a few items (say, 15 or fewer). Specifically, psychometric theory, the theory of psychological measurement, shows that the more items respondents answer, the smaller the relative influence of random errors on total scores (e.g., Allen & Yen, 1979, pp. 85-88; Nunnally & Bernstein, 1994, pp. 230-233). However, lengthening a test to minimize the relative influence of random errors meets numerous practical objections. For example, long tests may result in high administration costs. In other applications, test users do not want to burden respondents with many questions, for example, when critically ill patients need to be assessed. In short, test users and test constructors often do not appreciate long tests and questionnaires and prefer tests that are as short as possible, often within the limits of particular psychometric constraints.
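The length effect described above is stated qualitatively. One standard way to quantify it is the Spearman-Brown prophecy formula, $r_{\text{new}} = k r / (1 + (k - 1) r)$, where $k$ is the ratio of the new to the old test length. The Python sketch below is an illustration under that formula’s assumption that the added or removed items are parallel to the existing ones; the function name `spearman_brown` is hypothetical and not taken from the thesis.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after multiplying test length by `length_factor`.

    Spearman-Brown prophecy formula; assumes the added or removed items are
    parallel to the original ones. length_factor = new_length / old_length.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if length_factor <= 0.0:
        raise ValueError("length_factor must be positive")
    k = length_factor
    return k * reliability / (1.0 + (k - 1.0) * reliability)


# Shortening a 40-item test with reliability .96 to 5 items (k = 5/40).
print(round(spearman_brown(0.96, 5 / 40), 2))  # 0.75
```

Table 1.1 reports an alpha of .70 for the simulated 5-item test, somewhat below this prediction. The simulated items differ in difficulty and are therefore not parallel, which is one plausible reason the formula is only a rough guide here.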

Hence, short tests, including shortened versions of previously developed longer tests, abound in practice. In personnel psychology, for example, researchers developed a 12-item form of the Raven Advanced Progressive Matrices test (originally 36 items; Arthur & Day, 1994) and a 10-item scale measuring focus of attention (Gardner, Dunham, Cummings, & Pierce, 1989). In clinical psychology, examples include a 13-item version of the Beck Depression Inventory (originally 21 items; Beck & Beck, 1972) and a 3-item version of the Brief Pain Inventory (originally 11 items; Krebs et al., 2009). In personality psychology, examples include a 10-item and a 5-item questionnaire allegedly measuring the complete Big Five construct (Gosling, Rentfrow, & Swann, 2003). For comparison, the NEO PI-R contains 240 items to measure the Big Five (Costa & McCrae, 1992).

One way or another, psychologists need to deal with how using short tests affects the risk of making incorrect decisions about individuals. The goal of this thesis is to assess whether, based on psychometric considerations, short tests may be used for making decisions about individuals. Reliability alone does not guarantee correct decisions: total scores can be reliable, but if a test designed to measure reading level also measures another construct such as anxiety, the scores will be interpreted incorrectly. In this thesis, we focus on the relationship between test length and reliability, because reliability is a necessary (although not a sufficient) condition for tests to be valid (Nunnally & Bernstein, 1994, p. 214); that is, total scores that mostly reflect random measurement error cannot reflect the influence of the construct of interest, whereas a reliable test may or may not measure the intended construct. Validity deserves study in its own right but, for practical reasons, is simply assumed here.

We answer the following research questions:

1. To what extent do psychologists pay attention to the consequences of using short tests for making decisions about individuals?

2. How should one assess the risk of making incorrect individual decisions?

3. To what extent does test shortening increase the risk of making incorrect individual decisions?

4. What are minimal test-length requirements for making decisions about individuals with sufficient certainty?

1.2 Preliminaries: Test Length and Measurement Precision

Reliability is often assessed by coefficient alpha or the test-retest correlation (Nunnally & Bernstein, 1994, pp. 251-255). A test is deemed suitable for individual decision-making if its total-score reliability exceeds a minimum value set by a rule of thumb. However, for psychologists interested in individual decision-making, measurement precision is more important than reliability (Harvill, 1991; Mellenbergh, 1996; Nunnally & Bernstein, 1994, p. 260; Sijtsma & Emons, 2011).
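Coefficient alpha can be computed from a respondents-by-items score matrix with the usual formula: alpha equals $J/(J-1)$ times one minus the ratio of the summed item variances to the total-score variance. The Python sketch below is a minimal illustration; the function name and the five-respondent data set are hypothetical, and population variances are used throughout (the choice of denominator cancels in the ratio).

```python
from statistics import pvariance


def coefficient_alpha(item_scores: list[list[float]]) -> float:
    """Coefficient alpha for a matrix with one row per respondent, one column per item."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*item_scores))  # one tuple of scores per item
    sum_item_var = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum_item_var / total_var)


# Hypothetical 0/1 scores of five respondents on four items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(scores), 2))  # 0.8
```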

In this section, we show that the reliability coefficient conveys insufficient information to assess whether the total score is precise enough to be useful for individual decision-making. Specifically, we show that total-score reliability as estimated by coefficient alpha can be acceptable for a short test while the measurement precision of that test is much lower, even unacceptably low.

1.2.1 Theory

We studied the relationship between test length and measurement precision from the perspective of classical test theory (CTT). CTT assumes that a total score, which is the sum of the scores on the items in the test and is denoted $X_+$, equals the sum of true score $T$ and random measurement error $E$: $X_+ = T + E$. The statistical model of CTT assumes that the same test is administered an infinite number of times to a particular respondent and that these administrations are independent, so that different administrations can be considered replications. Due to the random processes reflected by the error component, replications produce a distribution of total scores, also known as the propensity distribution (Lord & Novick, 1968, pp. 29-30). The mean of the propensity distribution is defined as the respondent’s true score, and its variance is the respondent’s measurement-error variance. Figure 1.1 shows the hypothetical propensity distributions of two respondents, which differ with respect to both true score and error variance. Thus, in CTT different respondents may be measured with different precision.

Figure 1.1: Example of propensity distributions for two respondents with different true scores and error variances. (Density plotted against total score $X_+$, with the true scores of Respondent 1 and Respondent 2 marked.)
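The propensity distribution can be mimicked by simulation. The Python sketch below is a minimal illustration and not part of the study: it assumes normally distributed errors with a hypothetical standard deviation and keeps total scores continuous instead of integer-valued. Across many replications, the mean of the simulated $X_+$ values approximates $T$ and their variance approximates the respondent’s error variance.

```python
import random
from statistics import fmean, pvariance


def propensity_distribution(true_score: float, error_sd: float,
                            n_replications: int = 10_000,
                            seed: int = 0) -> list[float]:
    """Simulate independent replications X+ = T + E for one respondent.

    Assumes E ~ Normal(0, error_sd**2); true_score and error_sd are hypothetical.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_replications)]


# Two hypothetical respondents with different true scores and error variances.
for label, t, sd in [("Respondent 1", 4.0, 1.0), ("Respondent 2", 9.0, 2.0)]:
    replications = propensity_distribution(t, sd)
    print(label, "mean:", round(fmean(replications), 2),
          "variance:", round(pvariance(replications), 2))
```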

In the real world, instead of a propensity distribution, only one total score is available for each respondent. Hence, in practice one uses the sample of total scores from all individuals to estimate one common error variance, which may be considered the mean of all the error variances of the unobserved propensity distributions (Lord & Novick, 1968, p. 35). The square root of this mean error variance, known as the standard error of measurement (SEM), is used to quantify measurement precision for each individual. Let $S^2_{X_+}$ denote the total-score variance in the sample, $S^2_T$ the unobservable true-score variance, and $S^2_E$ the measurement-error variance. Given the definition of random measurement error, it can be shown that $S^2_{X_+} = S^2_T + S^2_E$. Using this result, total-score reliability in the sample is defined as

$$r_{XX'} = \frac{S^2_T}{S^2_{X_+}} = 1 - \frac{S^2_E}{S^2_{X_+}}.$$

The SEM can be derived to be equal to $S_E = S_{X_+}\sqrt{1 - r_{XX'}}$ (Allen & Yen, 1979, p. 89), and $r_{XX'}$ may be replaced by coefficient alpha when the SEM is estimated from the data.
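The two formulas above translate directly into code. In this Python sketch the function names and the example variance components ($S^2_T = 91$, $S^2_E = 9$) are hypothetical; the round trip checks that, under $X_+ = T + E$, the SEM equals the square root of the error variance.

```python
import math


def reliability_from_variances(total_var: float, error_var: float) -> float:
    """Total-score reliability r_XX' = 1 - S_E^2 / S_X+^2."""
    if total_var <= 0.0:
        raise ValueError("total-score variance must be positive")
    return 1.0 - error_var / total_var


def standard_error_of_measurement(total_sd: float, reliability: float) -> float:
    """SEM S_E = S_X+ * sqrt(1 - r_XX'); coefficient alpha may stand in for r_XX'."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return total_sd * math.sqrt(1.0 - reliability)


# Hypothetical components: S_T^2 = 91 and S_E^2 = 9, so S_X+^2 = 100 under X+ = T + E.
r = reliability_from_variances(total_var=91.0 + 9.0, error_var=9.0)
sem = standard_error_of_measurement(math.sqrt(91.0 + 9.0), r)
print(round(r, 2), round(sem, 2))  # 0.91 3.0 (the SEM equals sqrt(S_E^2) = 3)
```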

The SEM is used to estimate confidence intervals (CIs) for true score $T$ (Allen & Yen, 1979, p. 89). The narrower the CI, the more precise the estimate of $T$. CIs are computed as follows. The observed total score is taken as the estimate of the true score, $\hat{T} = X_+$, and the SEM is assumed to be the standard deviation of a normal distribution with mean $T$. For a 95% CI, the interval $X_+ \pm 1.96 S_E$ contains the respondent’s true score $T$ in 95% of the hypothetical test replications. However, in certain practical settings such as personnel selection, “few organizations can wait to be 95% sure of success” (Smith & Smith, 2005, p. 126). Apart from the incorrect interpretation of CIs expressed in this quotation, the result is that in these settings a lower confidence level is often chosen (e.g., a 68% CI, meaning $X_+ \pm S_E$), implying that organizations are willing to take a higher risk of making an incorrect decision about individual respondents.
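The CI computation can be sketched as follows, assuming the normal-theory interval described above with the observed total score as the estimate of $T$. The function name and the example values ($X_+ = 20$, SEM $= 2.12$) are hypothetical; `statistics.NormalDist` (Python 3.8+) supplies the $z$ value for a given confidence level.

```python
from statistics import NormalDist


def true_score_ci(total_score: float, sem: float,
                  confidence: float = 0.95) -> tuple[float, float]:
    """Normal-theory CI X+ ± z * S_E, using the observed total score as estimate of T."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return total_score - z * sem, total_score + z * sem


# Hypothetical respondent: X+ = 20 on a test with SEM = 2.12.
for level in (0.95, 0.68):
    low, high = true_score_ci(20.0, 2.12, level)
    print(f"{level:.0%} CI: [{low:.2f}, {high:.2f}], width {high - low:.2f}")
```

Asking for exactly 68% gives $z \approx 0.99$; the interval $X_+ \pm S_E$ used in practice corresponds to a confidence level of about 68.3%.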

 

1.2.2 Method

We conducted a computational study to illustrate the relation between test length, reliability, and measurement precision. Let $J$ be the number of items in the test, and let the score on item $j$ be denoted by $X_j$. Items may be dichotomously scored (i.e., 0 for an incorrect answer and 1 for a correct answer) or polytomously scored (e.g., $0, 1, \ldots, m$ for rating-scale items). We define the range of the scale as the difference between the maximum possible total score and the minimum possible total score. For $J$ dichotomous items, the scale range equals $J$, and for $J$ rating-scale items it equals $J \times m$. For dichotomous items, we studied how the ratio of the CI width to the scale range, henceforth called the relative CI, relates to test length.
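The relative CI is a simple ratio. The Python sketch below (function name hypothetical) divides a CI width by the scale range $J \times m$; applied to the 95% CI widths reported in Table 1.1, it reproduces that table’s 95% relative-CI column after rounding to two decimals (.21, .29, .34, .42, .58).

```python
def relative_ci(ci_width: float, n_items: int, max_item_score: int = 1) -> float:
    """CI width divided by the scale range J * m (m = 1 for dichotomous items)."""
    scale_range = n_items * max_item_score
    if scale_range <= 0:
        raise ValueError("scale range must be positive")
    return ci_width / scale_range


# 95% CI widths reported in Table 1.1 (dichotomous items, so m = 1).
widths_95 = {40: 8.27, 20: 5.85, 15: 5.13, 10: 4.16, 5: 2.92}
for n_items, width in widths_95.items():
    print(n_items, round(relative_ci(width, n_items), 2))
```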

For 1,000 respondents, item scores were simulated using the item response model known as the Rasch model (Embretson & Reise, 2000, p. 67; Rasch, 1980). The Rasch model is defined as follows. Instead of a true score, the model uses a latent variable, denoted $\theta$, as the person variable of interest. Without much loss of generality, we assumed that $\theta$ has a standard normal distribution. Items are characterized by their difficulty, here denoted $\delta_j$, which is expressed on the same scale as the latent person variable $\theta$. The Rasch model expresses the probability of a 1 score on dichotomous item $j$ as a function of the latent variable $\theta$ and the item difficulty $\delta_j$:

$$P(X_j = 1 \mid \theta, \delta_j) = \frac{\exp[a(\theta - \delta_j)]}{1 + \exp[a(\theta - \delta_j)]}. \tag{1.1}$$

Constant $a$ expresses the common discrimination power of the $J$ items in the test. The higher $a$, the higher the probability that a respondent with a low $\theta$ value relative to the item location $\delta_j$ scores 0 and that a respondent with a high $\theta$ value relative to $\delta_j$ scores 1. Because increasing $a$ for all items increases total-score reliability, $a$ can be used in a simulation study to manipulate the reliability of the total score. For $J = 40$, we chose item difficulty values between $-1.5$ and $1.5$ such that distances between adjacent values were equal throughout. Item 1 is the easiest item, implying that of all 40 items it has the highest probability of a 1 score for each $\theta$ value, and item 40 is the most difficult item, implying the lowest probability for each $\theta$ value. By choosing $a = 2.9$, we found that for $J = 40$ the reliability estimated by coefficient alpha equaled .96. This is high but not unrealistic for a 40-item test. Next, to obtain tests consisting of 20, 15, 10, and 5 items, we removed items from the 40-item test such that in each test the item difficulties of the remaining items were spread at approximately equal distances between $-1.5$ and $1.5$.
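The simulation design can be sketched in Python. This is a minimal reconstruction under stated assumptions, not the author’s code: the seed, the helper names (`rasch_probability`, `simulate`, `coefficient_alpha`, `pvar`), and the rule for choosing which items to keep are illustrative choices. Because the data are random, we would expect the printed alphas to fall in the neighborhood of the values in Table 1.1, not to match them exactly.

```python
import math
import random


def rasch_probability(theta: float, delta: float, a: float) -> float:
    """Equation 1.1: P(X_j = 1 | theta, delta_j), common discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - delta)))


def pvar(values: list[float]) -> float:
    """Population variance (denominator n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def coefficient_alpha(rows: list[list[int]]) -> float:
    """Coefficient alpha for a respondents-by-items matrix of 0/1 scores."""
    n_items = len(rows[0])
    item_var = sum(pvar(list(col)) for col in zip(*rows))
    total_var = pvar([sum(row) for row in rows])
    return n_items / (n_items - 1) * (1.0 - item_var / total_var)


def simulate(n_persons: int, deltas: list[float], a: float, seed: int) -> list[list[int]]:
    """Draw theta ~ N(0, 1) per person, then a 0/1 score per item from Eq. 1.1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        data.append([int(rng.random() < rasch_probability(theta, d, a)) for d in deltas])
    return data


J_FULL = 40
# Equally spaced difficulties from -1.5 (item 1, easiest) to 1.5 (item 40, hardest).
deltas = [-1.5 + 3.0 * i / (J_FULL - 1) for i in range(J_FULL)]
full = simulate(1000, deltas, a=2.9, seed=2012)  # seed is arbitrary

alphas = {}
for n_items in (40, 20, 15, 10, 5):
    # Keep items whose difficulties stay roughly evenly spread over [-1.5, 1.5].
    keep = [round(i * (J_FULL - 1) / (n_items - 1)) for i in range(n_items)]
    alphas[n_items] = coefficient_alpha([[row[k] for k in keep] for row in full])
    print(n_items, round(alphas[n_items], 2))
```

Shortened tests are formed by selecting columns of the same simulated 40-item data set, which mirrors removing items from the 40-item test rather than simulating each length afresh.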

 

1.2.3 Results

Table 1.1 shows coefficient alpha as an estimate of total-score reliability, the SEM, the widths of the 68% and 95% CIs, and the corresponding relative CIs. Removing items from the test caused coefficient alpha to decrease from .96 for $J = 40$ to .70 for $J = 5$. The latter alpha value is still acceptable for some practical applications (Kline, 2000, p. 524). The SEM and the width of the CI also decreased as the test grew shorter. For example, for $J = 40$ the SEM was 2.12 and the 95% CI covered a range of 8.27 scale points, but for $J = 5$ the SEM was 0.74 and the 95% CI covered only 2.92 scale points.

Table 1.1: Test length and measurement precision for five test lengths.

| $J$ | Coefficient alpha | SEM | 68% CI width | 68% relative CI | 95% CI width | 95% relative CI |
|---|---|---|---|---|---|---|
| 40 | .96 | 2.12 | 4.22 | .11 | 8.27 | .21 |
| 20 | .92 | 1.49 | 2.98 | .15 | 5.85 | .29 |
| 15 | .90 | 1.31 | 2.62 | .17 | 5.13 | .34 |
| 10 | .85 | 1.06 | 2.12 | .21 | 4.16 | .42 |
| 5 | .70 | 0.74 | 1.49 | .30 | 2.92 | .58 |

Smaller SEMs and CIs seem to suggest greater measurement precision, but this would be the wrong conclusion, as shown by the relative CIs, which increased substantially as the scale range decreased. Figure 1.2 shows the relative CI at the midpoint of each scale. As $J$ decreases, the CI spans an increasingly large part of the scale. This means that only if respondents differ to a large degree will their differences on the scale be significant. However, the vast