Development and Validation of the Atrial Fibrillation Effect on QualiTy-of-Life (AFEQT) Questionnaire in Patients With Atrial Fibrillation
Background— Atrial fibrillation (AF) has a deleterious impact on health-related quality-of-life (HRQoL), but measuring this outcome is difficult. A comprehensive, validated, disease-specific questionnaire to measure the spectrum of QoL domains affected by AF and its treatment is not available. We developed and validated a 20-item questionnaire, Atrial Fibrillation Effect on QualiTy-of-life (AFEQT), in a 6-center, prospective, observational study.
Methods and Results— Factor analyses established 4 conceptual domains (Symptoms, Daily Activities, Treatment Concern, and Treatment Satisfaction) from which individual domain and global scores were calculated. Participants from 6 centers completed the AFEQT at baseline, at month 1, and at month 3. Psychometric analyses included internal consistency and known-group validity. Test-retest reliability was assessed by comparing 1-month changes in scores among those with no change in therapy. Effect size was used to assess responsiveness after intervention. Among 219 patients age 62±11.9 years, 94% completed the AFEQT at baseline and 3 months; 66% had paroxysmal, 24% persistent, 5% longstanding persistent, and 5% permanent AF. Internal consistency was >0.88 for all scales. Lower AFEQT scores were observed with increased AF severity, categorized as asymptomatic, mild, moderate and severe, respectively: 71.2±20.6, 71.3±19.2, 57.9±19.0, and 42.0±21.2. Intraclass correlations for Overall, Symptoms, Daily Activities, Treatment Concern, and Satisfaction scores were 0.8, 0.5, 0.8, 0.7, and 0.7, respectively. Changes in 3-month scores were larger after ablation than with pharmacological adjustments, and both were greater than those observed in stable patients.
Conclusions— This initial validation of AFEQT supports its use as an outcome in studies and a means to clinically follow patients with AF.
Atrial fibrillation (AF) is the most common cardiac arrhythmia, affecting more than 5.6 million people in the United States and with a projected prevalence of 15.9 million by 2050.1 Although stroke prevention, rate control, and rhythm control are important goals of AF treatment, minimizing the symptoms, physical limitations, and quality-of-life (QoL) decrements associated with AF is equally important.2 A measurement tool to accurately and reliably quantify the effect of AF on patients' QoL would be useful for both clinical and research purposes. Although some symptom scales exist, there is currently no validated, comprehensive, disease-specific measure for English speakers that quantifies the impact of AF and its treatment on the full spectrum of patients' health status, including their symptoms, function, and QoL.3–8
To address this gap, we developed a novel disease-specific health status instrument, the Atrial Fibrillation Effect on QualiTy-of-life (AFEQT) questionnaire, to explicitly measure patients' perceptions of their symptoms, functional impairment, treatment concerns, and satisfaction with treatment. By adhering to the strict methodological criteria established by the Food and Drug Administration in its draft recommendations for patient-reported outcomes,9 we captured patients' perspectives on the most important manifestations of AF in their lives and prospectively validated the AFEQT instrument, establishing its reproducibility in stable patients and its sensitivity to clinical change, or responsiveness, in patients undergoing therapeutic interventions. AFEQT is intended as a tool that could reflect the impact of AF from patients' perspectives, provide an outcome measure for use in clinical trials and serve as a means for quantifying and improving the quality of AF treatment.10
Development of the AFEQT: Item Generation (Phase I) and Reduction/Refinement (Phase II)
A detailed description of the initial development of AFEQT is provided in the online-only Data Supplement. In brief, the AFEQT tool was developed based on a literature review of the QoL of patients with AF and interviews with patients and clinical experts to generate 117 potentially important candidate questions and to create a conceptual framework for how the clinical manifestations of AF impact the lives of patients. To assess AF patients' perceptions of the importance of each potential item, the list was administered to 148 AF patients with a rating questionnaire that provided 5 responses, ranging from “not important” (1) to “extremely important” (5). An open-ended response item was also included so patients could add issues that were not included on the original list. After distilling the potential number of questions down to 42 (online-only Data Supplement), cognitive interviews were conducted 1:1 with 12 patients to ensure that the questionnaire's instructions, questions, and response options were easy to understand and answer. Revisions to wordings and responses were then incorporated, and an additional 12 patients were interviewed to confirm the clarity and comprehensiveness of the instrument.
Overview of Study Design to Establish the Psychometric Properties of the AFEQT Questionnaire
An overview of the study design is shown in Figure 1. To establish the validity, reliability, and responsiveness of the AFEQT questionnaire, we planned to prospectively enroll a sample of 60 patients in 3 subgroups of AF patients, based on our expectations of their changes in health status. We thus recruited patients who we expected to be relatively stable, those we thought would have a small improvement in their symptoms, and a group in whom we anticipated a more substantial change in their clinical status. Specifically, group 1 (the stable group) included patients with no planned changes in their treatment; group 2 (patients expected to have, on average, a small improvement) was composed of patients having a planned adjustment to their AF medications; and group 3 (patients expected to experience a relatively large change in their health status) comprised those planning to undergo AF ablation. Because there were no estimates of means or standard deviations for the AFEQT tool, sample size was defined by the collective experience of the authors, and no formal power calculations could be performed. If patients' “planned” treatment strategy changed before the follow-up assessment, these patients were switched into the appropriate treatment group before scoring or analyzing their results. For example, if at the time of enrollment a patient was scheduled to have an ablation (group 3), but the procedure was postponed beyond the 3-month follow-up, the patient was switched to group 1.
Each patient completed a battery of existing instruments, as described below, to better refine the AFEQT instrument and to determine the validity, reliability, and responsiveness of the final questionnaire. For groups 2 and 3, questionnaires were completed before any changes in AF treatment. For all study groups, questionnaires were also completed at 1 and 3 months after the initial visit. In addition to the questionnaires, demographic and medical history data of enrolled patients were collected at baseline. Patients and physicians also completed an Atrial Fibrillation Global Change Form at month 3 to provide a global assessment of the change in patients' AF status over the course of observation. Each institution's ethics review board for protection of human subjects approved the study protocol.
Establishing the Validity of the AFEQT Questionnaire
Procedures and Participants
We prospectively recruited patients from 1 Canadian and 5 US sites between August 2008 and July 2009. English-speaking adults with documented paroxysmal, persistent, longstanding persistent, or permanent AF, mirroring the frequency and type of AF reported in clinical practice, were eligible to participate. Definitions of AF type were those of the 2007 HRS/EHRA/ECAS Expert Consensus Atrial Fibrillation Criteria (online-only Data Supplement Table 1A). Patients were approached to participate at the time of their scheduled clinic or procedural visit, and participating patients signed an informed consent form.
Study Instruments Used in Psychometric Analyses
The initial 42-item AFEQT questionnaire was developed to assess the impact of AF and its treatment on patients' symptoms, functioning, and daily activities through the following 6 domains: Symptoms, Social Functioning, Physical Functioning, Emotional Functioning, Treatment Concerns, and Treatment Satisfaction. It was developed for use as a self-administered outcome measure, using a 4-week recall frame with equidistant (similar differences in clinical states across the range of values), 7-point Likert responses ranging from the most severe limitation/symptoms to no limitation/symptoms. The AFEQT Overall score consists of 39 of the 42 items because the 3 satisfaction items do not assess patients' health status. The raw scores of 1 to 7 were transformed to a 0 to 100 scale, where a score of 0 indicated the most severe symptoms or disability and a score of 100 indicated no limitation or disability. Thus, higher scores on the AFEQT instrument indicate better health status. For calculation of the AFEQT Overall summary and subscale scores, calculations were based on actual responses, thereby adjusting the score for missing responses: score = 100 − [(sum of severity for all questions answered − number of questions answered) × 100] / (total number of questions answered × 6).
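The rescaling described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the official scoring algorithm: the function name `afeqt_score` is hypothetical, the raw coding is assumed to run from 1 (no limitation) to 7 (most severe) so that the rescaled score is 100 for no limitation, and the requirement that at least 50% of items be answered is taken from the missing-data rule stated later in the article.

```python
from typing import Optional, Sequence


def afeqt_score(responses: Sequence[Optional[int]],
                min_fraction: float = 0.5) -> Optional[float]:
    """Rescale 7-point item responses to 0-100 using answered items only.

    Assumption: items are coded 1 (no limitation) to 7 (most severe).
    None marks a missing response.
    """
    answered = [r for r in responses if r is not None]
    if any(r < 1 or r > 7 for r in answered):
        raise ValueError("responses must be between 1 and 7")
    # No score is returned when fewer than half of the items were answered.
    if not responses or len(answered) < min_fraction * len(responses):
        return None
    # Subtracting 1 per item gives 0-6 severity points per answered item.
    severity = sum(answered) - len(answered)
    # Divide by the maximum possible severity for the items actually answered.
    return 100.0 - 100.0 * severity / (6 * len(answered))


print(afeqt_score([4, 4, None, None]))  # 2 of 4 items answered, mid-scale
```

Because the denominator uses only the answered items, a patient who skips an item is scored on the same 0 to 100 scale as one who completes every item.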
The 36-Item Short Form Health Survey is a well-validated and reliable general health status measure.11 It is self-administered and assesses 8 domains: physical functioning, role limitations due to physical health, bodily pain, general health perceptions, vitality, social functioning, role limitations due to emotional problems, and mental health.11 Summary Mental and Physical Component scores are generated using a proprietary algorithm supplied by the instrument developers.11 A 4-week recall period is used and the scoring method used normalized responses to 50±10, with higher scores indicating better health status, as recommended by its authors.
The EuroQol (EQ-5D) is a well-validated, generic health-related quality-of-life measure.12 It is self-administered and has 5 dimensions assessing mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. It also includes a visual analog scale that asks respondents to rate their current health from 0 to 100, with 0 representing death and 100 indicating perfect health. The 5-item questionnaire can also be transformed to a societal-based utility score, ranging from 0 to 1, with higher scores reflecting better health status.12
University of Toronto Atrial Fibrillation Severity Scale Version 3
The Atrial Fibrillation Severity Scale (AFSS) is a 19-item self-administered questionnaire developed to capture subjective and objective ratings of AF related symptoms, health care utilization, and AF disease burden, including frequency, duration, and severity of episodes.13 The AF symptom burden score is derived from the AFSS summary score that averages the frequency, duration, and patient perceived severity of AF episodes. A higher score indicates greater AF burden.13
Symptom Checklist: Frequency and Severity, Version 3
The Symptom Checklist (SCL) is a self-administered questionnaire with a 4-week recall period that contains 16 questions and assesses AF symptom frequency and severity, with scores ranging from 0 to 64 and 0 to 48, respectively. Higher scores indicate more frequent and more severe symptoms.14
Generalized Anxiety Disorder-7
The Generalized Anxiety Disorder (GAD)-7 is a self-administered 7-item Generalized Anxiety Disorder scale that is a subset of the PRIME MD questionnaire.15 GAD-7 total score for the 7 items ranges from 0 to 21. Scores of 5, 10, and 15 represent cut-points for mild, moderate, and severe anxiety, respectively.15
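As a small illustration of the cut-points above, the following Python sketch sums the 7 items and maps the total to a severity category. The function name and the label "minimal" for totals below 5 are assumptions made for this example; the 0 to 3 item range follows from the 0 to 21 total.

```python
from typing import Sequence


def gad7_category(item_scores: Sequence[int]) -> str:
    """Map 7 GAD-7 item scores (each 0-3) to a severity category."""
    if len(item_scores) != 7 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected 7 item scores, each between 0 and 3")
    total = sum(item_scores)  # ranges from 0 to 21
    if total >= 15:
        return "severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"  # label assumed for totals below the first cut-point


print(gad7_category([1, 2, 1, 2, 1, 2, 1]))  # total of 10
```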
AF Patient and Physician Global Change Forms
Following the approaches of Jaeschke and Juniper,16,17 we developed single-item surveys that quantify, from patients' and physicians' perspectives, the direction (no change/improvement/worsening) and magnitude of clinical change that has occurred between enrollment and the follow-up visit. The scale is administered by first asking whether any clinical change has occurred (deterioration versus no change versus improvement) and, if a clinical change has occurred, the magnitude of clinical change is assessed using a 15-point Likert-type response ranging from a very large and clinically important deterioration to a very large and clinically important improvement.
Standard sociodemographic information was collected (age, race, sex, marital status, highest education level, and current occupation/work status) as well as type, duration, current AF severity, and last episode of AF before enrollment. Additional AF-related medical history items, including previous cardioversions, current AF medications, and most recent ejection fraction were also collected.
Psychometric Validation of the AFEQT
The first step in this validation study was to confirm the underlying factor structure of the AFEQT instrument for use in measuring symptom burden and activity limitation for AF patients. Then, using data collected throughout this study (Figure 1), we sought to evaluate the reliability, construct validity, and responsiveness of the AFEQT Overall and domain scores in patients who were presumed to be stable and in those with a change in their treatment.
Factor analysis (FA) is a statistical technique that is used to uncover the latent structure for a large set of items to determine the number of factors underlying the items and to identify which items measure the same underlying factor.18 Items that are highly correlated with each other are grouped together to form factors. Exploratory FA, which made no a priori assumptions about number of underlying factors or the connection of items to specific factors, was conducted in development phases of AFEQT (see online-only Data Supplement). In this study, we subjected 39 of the 42 items (excluding the treatment satisfaction scale) to a confirmatory FA to see whether the same construct was observed with a priori expectations, based on our preliminary development of the instrument (online-only Data Supplement) and input from clinical experts. Because the 3 satisfaction items do not assess patients' health status and were not included in the Overall AFEQT score, they were not included in the FA. Based on the confirmatory findings, the resulting number of questions was subjected to the reliability, validity, and responsiveness analyses described below.
Internal consistency, or reliability, examines consistency of items within a scale and quantifies the degree to which each item is measuring aspects of the same underlying domain. In this analysis, internal reliability of AFEQT and its subscales was examined using Cronbach coefficient α, in which a value of 0.90 or higher is excellent and 0.80 or higher is sufficient.19
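For readers unfamiliar with the statistic, a minimal Python sketch of Cronbach α is given below. It uses the standard formula α = k/(k−1) × (1 − Σ item variances / variance of the total score) and assumes complete data (no missing responses); the function name and the example data are hypothetical and do not come from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """rows: one entry per respondent, each holding k item scores (no missing data)."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = sum(variance(column) for column in zip(*rows))
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical responses from 4 patients to a 2-item scale.
example = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(example), 3))
```

Values near 1 arise when the items rise and fall together across respondents, which is what the thresholds of 0.80 and 0.90 cited above are meant to capture.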
The intraclass correlation coefficient (ICC) was used to assess test-retest reliability. The ICC measures how stable responses are over time, with values ≥0.7 considered acceptable. It was calculated from the change in AFEQT scores at month 1 in the subset of group 1 patients who reported no change in their clinical status on their Patient Global Change Form. Month 1 AFEQT data were used instead of month 3 data because month 1 is closer to baseline, so patients would be less likely to have had a change in their health status.
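The article does not state which form of the ICC was computed. Purely as an illustration, the Python sketch below implements one common choice, the two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1)), for scores measured at baseline and month 1. The function name and the example values are hypothetical.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """scores: one row per patient, one column per assessment (eg, baseline, month 1)."""
    n = len(scores)     # number of patients
    k = len(scores[0])  # number of assessments per patient
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: patients, assessments, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores for 3 stable patients at baseline and month 1.
print(icc_2_1([(50, 50), (60, 60), (70, 70)]))
```

With this form, a systematic shift between the two assessments lowers the ICC even when patients keep the same rank order, which suits a test-retest question about stable patients.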
Convergent and Divergent Validity
When no clear gold standard exists for quantifying a property, such as HRQoL, the most assuring method to establish the validity of a new test is convergent validity, in which the new measure is shown to be correlated with other measures that are thought to quantify the same concept. Such correlations are considered to be high when the correlation coefficients are ≥0.4. Conversely, divergent validity is demonstrated when domain items that are thought to measure different concepts have low correlations (r<0.4).20 We examined the convergent and divergent validity of the AFEQT subscale and total scores by estimating their association with well-established questionnaires, including the SF-36, EQ-5D, SCL, AFSS, and GAD-7. This was done by computing Pearson correlation coefficients between the AFEQT total and domain scores and each SF-36 domain score, EQ-5D total score, SCL severity and frequency scores, AFSS total symptoms and total AF burden scores, and GAD-7 score. In general, AFEQT domains that assess physical functions (ie, daily activities) were hypothesized to correlate strongly (r≥0.4) with the Physical Function and Role Physical SF-36 domains but poorly (r<0.4) with the mental or emotional function domain instruments (ie, the Mental Health SF-36 domain or the GAD-7, which assesses anxiety). We also hypothesized that, because the AFEQT Symptoms domain, the SCL severity score, and the AFSS total symptoms score all measure the construct of AF symptoms, patients' AFEQT Symptoms domain scores would be highly (|r|≥0.4) and negatively correlated with the SCL severity and AFSS total symptoms scores (a negative correlation is predicted because low scores on the AFEQT Symptoms domain but high scores on the SCL severity and AFSS total symptoms scales indicate worse symptoms).
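The correlation step can be illustrated with a short Python sketch that computes a Pearson coefficient and compares its absolute value with the 0.4 threshold described above. The function names and the paired scores are hypothetical and are not study data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_strength(r: float, threshold: float = 0.4) -> str:
    """Classify by absolute value, so strong negative correlations count as high."""
    return "high" if abs(r) >= threshold else "low"


# Hypothetical paired scores: higher AFEQT Symptoms scores mean fewer symptoms,
# higher SCL severity scores mean worse symptoms, so r should be negative.
afeqt_symptoms = [90, 75, 60, 40, 20]
scl_severity = [2, 8, 15, 22, 30]
r = pearson_r(afeqt_symptoms, scl_severity)
print(round(r, 2), correlation_strength(r))
```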
Known Group Validity
Known group validity assesses whether AFEQT can discriminate between groups that are known to be clinically different.20 AFEQT Overall scores at baseline were evaluated and compared between patients grouped according to physicians' clinical evaluation of symptom severity at baseline. Physicians' clinical assessment of their patients' AF symptoms was explicitly collected on the case report forms in the following categories: asymptomatic, mild, moderate, and severe. We hypothesized that patients with more severe AF symptoms would have worse health status, as indicated by lower AFEQT Overall scores. We also compared AFEQT scores of patients whose most recent AF symptoms occurred within 4 weeks versus more than 4 weeks before baseline. This comparison was used as an additional estimate of the association with symptom severity, assuming that patients would consider more recent episodes of AF to be more severe than more remote episodes. We hypothesized that patients who reported AF symptoms within the past 4 weeks would have lower Overall AFEQT scores than those who had AF symptoms >4 weeks before completing AFEQT.
The responsiveness of an instrument is its ability to detect clinically meaningful changes in a patient's health status over time. Changes in AFEQT Overall and domain scores as well as other instruments' scores from baseline to month 3 in group 2 and 3 patients were used to evaluate responsiveness to change over time.
A variety of responsiveness statistics are available.21 We selected effect size as a measure of the change in instruments' scores within each group and calculated it for all questionnaires. To calculate the effect size, the change in mean scores for each instrument was divided by the standard deviation (SD) of mean scores at baseline for that instrument, an approach recommended for quantifying the magnitude and meaning of changes in health status measures.22 As a benchmark for the interpretation of the effect sizes, we used the Cohen approach of defining effect sizes of 0.2, 0.5, and 0.8 as indicative of small, moderate, and large clinical changes.23 A series of paired t tests were conducted to compare changes in scores for all questionnaires.
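The effect size calculation described above can be sketched as follows in Python: the change in mean score is divided by the SD of baseline scores and then compared with the Cohen benchmarks of 0.2, 0.5, and 0.8. The function names, the label "negligible" for values below 0.2, and the example scores are assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Change in mean score divided by the SD of the baseline scores."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def cohen_label(es: float) -> str:
    """Interpret an effect size against the Cohen benchmarks (0.2, 0.5, 0.8)."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed for values below the first benchmark


# Hypothetical scores for 3 patients at baseline and at month 3.
baseline_scores = [40, 50, 60]
month3_scores = [50, 60, 70]
es = effect_size(baseline_scores, month3_scores)
print(es, cohen_label(es))
```

Standardizing by the baseline SD lets instruments with different score ranges, such as AFEQT and the SF-36, be compared on a common scale.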
Handling of Missing Data
Missing values in summary scores were handled according to the directions of the authors of each instrument. For AFEQT, responses to at least 50% of the items in each domain are required to calculate a meaningful score. Throughout all analyses, probability values of less than 0.05 were considered statistically significant. All reported probability values were based on 2-tailed tests. All statistical analyses were performed with SAS version 9.1 (SAS Institute, Cary, NC).
Sociodemographic and Medical Characteristics of Participants
Of 219 participants who consented to participate, 5 were excluded because of a previous AF ablation, leaving 214 who met study criteria. Overall, 34 (16%) patients had a change in “planned” treatment strategy: 3 previously planned ablations were postponed, so these patients were switched into group 1; 1 patient previously documented as having no change in therapy was ablated, so this patient was switched into group 3; and 30 patients previously documented as having no change in therapy had a change in pharmacological therapy and were switched to group 2. The final allocation of the 214 patients was as follows: 68 were stable and did not have a change in their treatment (group 1), 68 received AF medication adjustments (group 2), 77 underwent AF ablation (group 3), and 1 patient was not allocated to any subgroup because treatment data were missing. This patient was included in the demographic summary but not in any subsequent validation analysis.
The sociodemographic and medical characteristics of the 214 qualified respondents are presented in Table 1 by total population and stratified by subgroups. For the total population, most respondents were white (97%), married (75%), male (58%), and college-educated (51%), and the mean age was 62±11.9 years. Sixty-six percent of participants had paroxysmal AF, 24% had persistent AF, 5% had longstanding persistent AF, and 5% had permanent AF. Mean duration of AF history was approximately 6 years and, based on physicians' assessment, >70% of participants had mild to moderate severity of AF symptoms. The most recent AF episodes occurred 38.5±148.8 days before enrollment. Forty-five percent of the subjects had previously received cardioversion, and the majority were treated with either rhythm or rate control antiarrhythmic agents. Similar trends for demographic and medical characteristics were observed in all 3 subgroups except for the most recent AF episodes, which occurred 148.5±327.2 days, 27.9±71.4 days, and 29±51.6 days before enrollment for groups 1, 2, and 3, respectively. Of the 213 patients allocated to a subgroup, 210 participants (68 in group 1, 66 in group 2, and 76 in group 3) had evaluable data at month 3.
The initial 42-item AFEQT took patients an average of 9.3 minutes (range, 3 to 43 minutes; SD, 6.4; median, 8) to complete at baseline and 7.3 minutes (range, 2 to 30 minutes; SD, 3.5; median, 6) to complete at month 3. Although this completion time appears to be acceptable when compared with other questionnaires, such as the SF-36, about 13% of patients took ≥15 minutes to complete the 42-item version of the questionnaire. Except for the item asking patients about their sexual relationship, which had an excessive proportion (15%) of missing responses, the completion rate of the 42-item AFEQT questionnaire was high, with completion rates for individual items ranging from 94% to 100% at baseline and 92% to 100% at month 3.
A confirmatory factor analysis of the 39 items (excluding the 3 satisfaction items), performed to test the construct of the instrument against a priori expectations, indicated 3 rather than 5 domains. The activity-related items from the previous Physical Function and Social Function domains collapsed into Factor 1, disease and treatment concern items from the previous Emotional Function and Treatment Concerns domains pooled into Factor 2, and the same symptom-related items remained as Factor 3 (Table 2). On further review of each item from these 3 new factors, although some items had acceptable loading (0.4) onto the appropriate factors, others had much higher loading (0.66). In addition, 1 item had excessive missing data, and others were not responsive at follow-up, were duplicative, or were not relevant to most patients. Given that 13% of patients took >15 minutes to complete the 42-item version of the questionnaire and the greater feasibility of deploying a shorter instrument in future studies, the authors agreed to further shorten the initial 42-item AFEQT questionnaire. As a result, 21 items were deleted, either because they had a correlation of r<0.66 with the specified factor or for the other reasons listed above. Detailed reasons for deletion of each item are listed in Table 2. Although items 4 (“lightheadedness or dizziness”) and 25 (“feeling worried or anxious that AF can start anytime”) had a correlation of <0.66 with factors 3 and 2, respectively, clinicians considered them clinically important items to retain because these symptoms and concerns are so common among AF patients. The resulting questionnaire included 4 items in the Symptoms domain, 8 in the Daily Activities domain, and 6 in the Treatment Concern domain.
Separate item correlation testing, conducted for the fourth Satisfaction domain, revealed that 1 of the 3 satisfaction items, quantifying patients' satisfaction with health care provider about their treatment, did not correlate with the other 2 satisfaction items that assessed patients' perception of the quality of their AF treatment. As such, that item was removed, leaving the final version of AFEQT with 20 items (online-only Data Supplement), which was used for the formal psychometric testing reported below. A copy of the instrument, its scoring instructions, and implementation manual and a license for its use can be obtained at www.AFEQT.org.
Although the greatest insight into the impact of AF on patients' health status is likely to be obtained by evaluating the domains individually, the Overall score was designed to provide a more comprehensive picture of patients' health status. To confirm that the Overall score adequately represented the individual domains, we performed correlation analyses, which showed that each AFEQT domain score correlated highly and significantly (P<0.0001) with the Overall AFEQT score. The Symptoms domain correlated at r=0.7, the Daily Activities domain at r=0.9, and the Treatment Concerns domain at r=0.8 with the Overall summary AFEQT score. One can thus conclude that a patient who is severely affected by AF, as reflected by a low Overall AFEQT score, is likely to have low scores on the individual domains.
Cronbach α reliability coefficient was >0.88 for the AFEQT Overall score and the 4 domains: Symptoms (0.95), Daily Activities (0.94), Treatment Concern (0.90), and Treatment Satisfaction (0.88), supporting internal consistency of AFEQT.
ICCs were calculated for the 43 patients from group 1 who responded that they were “about the same” in their overall well-being with respect to their AF at month 1. The ICCs for the Overall (0.8), Daily Activities (0.8), Treatment Concern (0.7), and Treatment Satisfaction (0.7) scores indicated acceptable test-retest reliability of AFEQT. The exception was the Symptoms domain, for which the ICC was 0.5.
Convergent and Divergent Validity
We hypothesized that, in general, AFEQT domains that assess physical functions (ie, daily activities) would correlate strongly with the Physical Function or Role Physical SF-36 domains but poorly with mental or emotional function domain instruments (ie, the Mental Health SF-36 domain or the GAD-7, which assesses anxiety). As expected, results in Table 3 showed that the AFEQT Daily Activities domain had higher correlations with the SF-36 Physical Functioning, Role Physical, Social Functioning, and Vitality domains and the PCS score than with the SF-36 Mental Health and Role Emotional domains and the MCS score. Conversely, the AFEQT Treatment Concern domain had lower correlations with the SF-36 Physical Functioning, Role Physical, Social Functioning, and Vitality domains and the PCS score than with the SF-36 Mental Health and Role Emotional domains and the MCS score. The GAD-7 instrument (which assesses anxiety) also had a higher absolute correlation with the AFEQT Treatment Concern domain than with the AFEQT Daily Activities domain, whereas the AFEQT Symptoms domain had higher correlations with the Symptom Checklist Severity and Frequency scores and the AFSS Total Symptoms score than with the GAD-7 score, the EQ-5D, or any of the SF-36 domain scores. All correlations were highly statistically significant (P<0.0001). Table 3 further confirms that the direction of correlation was as expected. Satisfaction items did not have a potential criterion standard with which they could be compared.
Known Group Validity
To establish the ability of the AFEQT Overall score to discriminate among different severities of patients' AF, as assessed by clinicians, we compared mean Overall scores according to physicians' clinical categorization of patients into asymptomatic, mild, moderate, or severe AF. The corresponding AFEQT Overall scores for the asymptomatic, mild, moderate, and severe groups were 71.2±20.6, 71.3±19.2, 57.9±19.0, and 42.0±21.2, respectively (P<0.001). Discriminant validity was also evaluated by comparing AFEQT Overall scores among patients who had AF symptoms within 4 weeks versus >4 weeks before baseline AFEQT completion. The mean Overall AFEQT score was 57.1±21.8 in the 158 patients who had AF symptoms within the past 4 weeks and 66.6±22.7 in the 52 patients who had AF symptoms >4 weeks before completing the questionnaire (P<0.01).
In contrast to the smaller mean changes in stable patients in group 1, those in groups 2 and 3 demonstrated substantially larger changes in 3-month scores (Tables 4 and 5). The AFEQT Overall score increased by 5.14±14.4 points in group 1 (no treatment change), by 9.8±20.0 points in group 2 (treated with pharmacological therapy), and by 23.0±19.7 points in group 3 (treated with AF ablation). Judged against the Cohen effect size benchmarks for clinically meaningful change, the mean changes from baseline in groups 2 and 3 were consistent with moderate and large responsiveness (effect sizes of 0.5 to 1.1), as compared with the smaller effect size of 0.2 in group 1. Greater effect sizes were also observed for the Symptoms, Daily Activities, Treatment Concern, and Treatment Satisfaction domains in group 3 than for the same domains in group 2 (Figure 2). Although the responsiveness of the SCL and AFSS was comparable to that of AFEQT, generic instruments such as the SF-36 and EQ-5D yielded smaller effect sizes in either group 2 or group 3 (Figure 2).
Quantifying patients' perceptions of their disease with patient-reported outcomes is becoming an increasingly important method for defining the efficacy of new treatments and determining the quality of health care. To assess such outcomes, disease-specific questionnaires must be constructed to be valid measures of the constructs that they purport to quantify, to be reproducible in stable patients, and to be sensitive to clinical change. By following the draft patient-reported outcome guidelines issued by the Food and Drug Administration,9 we developed a new, comprehensive, disease-specific health status measure for patients with AF. We then conducted a series of steps to refine and reduce the instrument and to demonstrate its validity, reliability, and responsiveness in a distinct cohort of patients. The final 20-item instrument provides a 4-item Symptoms score, an 8-item Daily Activities score, a 6-item Treatment Concerns score, and a 2-item Treatment Satisfaction scale. The first 3 of these domains can be grouped to form an Overall score that was also shown to be valid, reliable, and responsive to clinical change.
In contrast to generic HRQoL instruments, disease-specific questionnaires, like AFEQT, allow patients to quantify their limitations and the extent to which they are attributable to a specific disease. In prior studies, lack of specificity and sensitivity to change in AF status has been reported with AF studies that used the SF-36, including the AIRCRAFT and CTAF studies.24,25 Whether the lack of an observed treatment effect on patients' health status outcomes is due to limited therapeutic benefits or insensitivity of the SF-36 is unknown, but there have been numerous examples of the increased sensitivity of disease-specific measures, as compared with the SF-36, in other cardiovascular conditions.26,27 Although there have been other attempts to create disease-specific health status measures in AF, such as the AFSS,13 SCL,14 and, more recently, the Swedish short symptom rating scale28 and the Brazilian quality of life in AF patients (QLAF),7 these instruments do not capture the disease impact on patients' physical and emotional function. Another recently published instrument, AFQLQ, which assesses symptoms, severity of symptoms, anxiety, and limitation of daily activities related to AF and AF treatment, was used in a Japanese population and is not yet available in English.29 Similarly, the AF-QoL, an AF-specific questionnaire assessing physical and psychometric components, was recently validated, but only in the Spanish population.8 The concurrent activities of so many groups to develop disease-specific health status measures underscore the global perception of the importance of such measures and the failure of existing instruments to capture the range of clinical manifestations of AF. The AFSS evaluates AF treatment and symptoms, whereas the SCL, short symptom rating scale, and QLAF focus on the impact of AF symptoms on patients. These measures do not, however, quantify the other important domains of daily activities and treatment concerns associated with AF.
We believe that the AFEQT instrument represents a significant methodological advance over previous measures because its items were explicitly generated from patients' perspectives and its framework was broadened to include daily activities, treatment concerns, and satisfaction with treatment in a single instrument.
AFEQT was also shown to be an internally reliable instrument, as indicated by very high Cronbach α coefficients (>0.88). Its reproducibility was supported by satisfactory test-retest ICCs in stable patients for the Overall score and for the Treatment Concern, Daily Activities, and Treatment Satisfaction domain scores. The ICC for the Symptoms domain score, however, was 0.5. This lower ICC may be attributable to the variable nature of AF symptoms, which can fluctuate even in patients whose therapy is unchanged.
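As an illustration of the internal-consistency statistic mentioned above, the following is a minimal sketch of the standard Cronbach α formula, α = k/(k−1) × (1 − Σ item variances / variance of total score). The function name and the toy item responses are hypothetical and are not AFEQT study data; the sketch does not reproduce the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists, one per questionnaire item, each holding
    that item's score for every respondent (same respondent order).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses (illustration only): 3 items, 5 respondents.
toy_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(toy_items), 2))
```

Values close to 1 indicate that the items of a scale vary together across respondents, which is the sense in which coefficients >0.88 are described as very high.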
This study also showed appropriate construct validity of AFEQT by demonstrating adequate convergent and divergent correlations (r≥0.4 and r<0.4, respectively) of AFEQT domains with other commonly used and well-established questionnaires. The findings supported our hypotheses that the AFEQT Treatment Concern domain would correlate better (r≥0.4) with mental or emotional function domains of other instruments than with their physical function domains, and that the AFEQT Daily Activities domain would correlate less well (r<0.4) with mental or emotional function domains of other instruments.
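The 0.4 cutoff can be made concrete with a short sketch that computes a Pearson correlation between two sets of domain scores and labels it as convergent or divergent. The function names and the use of the absolute value of r (to allow for instruments scored in opposite directions) are assumptions for illustration, not details taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def classify_correlation(r, threshold=0.4):
    """Label a correlation using the r>=0.4 / r<0.4 rule.

    Assumption: the magnitude of r is used, so that a strong negative
    correlation between oppositely scored scales still counts as convergent.
    """
    return "convergent" if abs(r) >= threshold else "divergent"


# Hypothetical domain scores for six patients (illustration only).
domain_a = [40, 55, 60, 70, 85, 90]
domain_b = [35, 50, 65, 60, 80, 95]
r = pearson_r(domain_a, domain_b)
print(classify_correlation(r))
```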
Importantly, these preliminary findings suggest that the AFEQT instrument is sensitive in discriminating the severity of patients' AF and in detecting changes in their AF over time. For example, the AFEQT Overall scores distinguished patients with asymptomatic or mildly symptomatic AF from those with moderate or severe AF, and patients who had AF symptoms within 4 weeks of questionnaire completion from those whose symptoms occurred >4 weeks earlier. It was also able to distinguish, in a proportional and anticipated way, the changes in patients' conditions associated with no change in treatment as compared with changes in pharmacological therapy and ablation therapy. AFEQT was also more responsive than the generic SF-36 and EQ-5D questionnaires, and as responsive as disease-specific instruments such as the SCL and AFSS.
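Responsiveness of this kind is commonly summarized as an effect size. The sketch below assumes one widely used definition, the mean change in score divided by the standard deviation of the baseline scores; the exact formula used in this study is not restated here, and the function name and scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Standardized change: mean(follow_up - baseline) / SD(baseline).

    Assumption: paired scores, one baseline and one follow-up value
    per patient, listed in the same patient order.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical scores on a 0-100 scale (illustration only).
baseline_scores = [40, 50, 60]
month3_scores = [50, 60, 70]
print(effect_size(baseline_scores, month3_scores))
```

Under this definition, a larger effect size after an intervention than in patients with no change in therapy indicates that the score moves more where clinical change is expected.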
Although we accomplished the primary goal of this study by providing preliminary evidence to support the reliability, initial validity, and responsiveness of this questionnaire for use in an AF population, our findings should be interpreted in the context of the following potential limitations. First, although we conducted extensive interviews and surveys to develop the initial AFEQT, we may have inadvertently missed important subsets of patients who experience their AF differently than those studied. In particular, most of our patients were symptomatic, and there may be subtle manifestations of the disease in seemingly asymptomatic individuals that were not captured. In addition, AF is predominantly a disease of the elderly, and the mean age of our population, 62.1±12 years, was lower than that of the majority of AF patients. However, most clinical trials in AF have involved patients of a similar mean age. Finally, we used a longer instrument in our validation studies that was condensed after these studies to minimize the response burden while preserving the desired psychometric properties of the tool. It would be valuable to replicate these analyses in a future population to confirm the validity, reliability, and responsiveness of the final tool.
The AFEQT questionnaire is a novel, disease-specific HRQoL instrument explicitly developed for use as an outcome measure in clinical trials, as a tool for disease management, and as a potential marker of health care quality. It was developed with patient input and shown to be feasible, reliable, valid, and responsive to treatment. Future work applying the AFEQT in a variety of settings will enable patients' experiences of their AF, and its treatment, to be a foundation for a more thorough understanding of how best to improve the efficacy of AF therapy and the quality of care for AF patients.
Sources of Funding
The study was supported by St Jude Medical, Atrial Fibrillation Division.
John Spertus, Paul Dorian, Steve Lewis, Anil Bhandari, and Caroline Burk received consulting fees from St Jude Medical.
The online-only Data Supplement is available at http://circep.ahajournals.org/cgi/content/full/CIRCEP.110.958033/DC1.
- Received June 16, 2010.
- Accepted November 18, 2010.
- © 2011 American Heart Association, Inc.
- Miyasaka Y,
- Barnes M,
- Gersh B,
- Cha S,
- Bailey K,
- Abhayaratna W,
- Seward J,
- Tsang T
- Calkins H,
- Brugada J,
- Packer DL,
- Cappato R,
- Chen SA,
- Crijns HJ,
- Damiano RJ Jr.,
- Davies DW,
- Haines DE,
- Haissaguerre M,
- Iesaka Y,
- Jackman W,
- Jais P,
- Kottkamp H,
- Kuck KH,
- Lindsay BD,
- Marchlinski FE,
- McCarthy PM,
- Mont JL,
- Morady F,
- Nademanee K,
- Natale A,
- Pappone C,
- Prystowsky E,
- Raviele A,
- Ruskin JN,
- Shemin RJ
- Arribas F,
- Ormaetxe JM,
- Peinado R,
- Perulero N,
- Ramírez P,
- Badia X
Patient-reported outcome measures: use in medical product development to support labeling claims. Available at: http://www.fda.gov/OHRMS/DOCKETS/98fr/06d-0044-gdl0001.pdf.
- Spertus J
- Brooks R,
- Rabin R,
- de Charro F
- Bubien R,
- Kay G,
- Jenkins L
- Streiner DL,
- Norman GR
- Fayers P,
- Hays R
- Hays R,
- Revicki DA
- Cohen J
- Weerasooriya R,
- Davis M,
- Powell A,
- Szili-Torok T,
- Shah C,
- Whalley D,
- Kanagaratnam L,
- Heddle W,
- Leitch J,
- Perks A,
- Ferguson L,
- Bulsara M
- Harden M,
- Nystrom B,
- Kulich K,
- Carlsson J,
- Bengtson A,
- Edvardsson N
Although the impact of atrial fibrillation (AF) on health-related quality-of-life is acknowledged by patients and clinicians, there is currently no comprehensive, validated, disease-specific questionnaire to capture the extent to which AF affects patients. Knowledge of health-related quality-of-life impairment has been limited to data generated from generic and symptom-focused questionnaires rather than an instrument with indicators identified as relevant and important to patients. To address these limitations, the authors developed and validated a patient-guided AF health-related quality-of-life questionnaire. A key difference from commonly used AF questionnaires is that development of the Atrial Fibrillation Effect on QualiTy-of-life (AFEQT) included focus groups to ensure that the items were generated by patients and reflected patients' perception of their importance. In addition, psychometric validation of AFEQT included internal consistency and known-group validity, which assesses whether AFEQT can discriminate between groups that are known to be clinically different. Test-retest reliability was assessed by comparing 1-month changes in scores among those with no change in therapy, and effect size was used to assess responsiveness of AFEQT following standard-of-care interventions. The initial validation of the AFEQT questionnaire supports its use as an outcome measure in clinical studies to assess the impact of AF and its treatment on patients.