Data fusion: its advantage in public health

Khalil, George Magdy

2015

Abstract

Maximizing the utility of surveys without adding questions is of utmost importance to surveillance systems. Public health agencies need to keep the ever-decreasing number of participants from breaking off after an interview has started. A common reason participants break off is the length of the survey. It is therefore important that organizations conducting surveillance investigate innovative techniques for combining data from multiple, less extensive surveys. Data fusion is one such technique that has been used to integrate databases to save time and money. Health insurance status is a good topic for validating data fusion because this variable is common to many data sources and has a body of literature documenting factors associated with being insured. Besides data availability, respondents are thought to be accurate in reporting health insurance status and type (Call et al., 2008a). The goal of this research was to create "statistical twins" based on health insurance status from two data sources. Matched respondents were considered "statistical twins" and used to test whether data fusion is an effective method of predicting a variable not originally asked in the survey, given the respondent's profile. Data from the Behavioral Risk Factor Surveillance System (BRFSS) survey and the National Health Interview Survey (NHIS) were matched by first harmonizing the variables from the two data sources. A propensity score was calculated, which was then used to perform Mahalanobis and Nearest Neighbor matching across the two surveys. The efficiency of the match was then validated. Of the 297,734 BRFSS respondents, 88.2% reported being covered by health insurance, while 83.0% of the 27,921 NHIS respondents reported currently being insured. Propensity scores were left-modal for both the NHIS and the BRFSS.
Quantile-Quantile (QQ) plots, which plot the quantiles of one data set against those of another, revealed that after the match the empirical distributions were similar in the BRFSS and NHIS groups. Among the matching algorithms, 2-to-1 Nearest Neighbor (NN) matching produced the estimate closest to that of the original BRFSS respondents (86.2% [86.0, 86.5] versus 88.2% [88.1, 88.3], respectively). This is quite good considering that national estimates differ by a few percentage points from survey to survey. Our imputed estimates are not within the confidence interval of the BRFSS. However, being within the narrow BRFSS confidence interval may be too rigorous a standard because of the very large sample size of the BRFSS. Sensitivities and specificities reveal that 2-to-1 NN with replacement and Mahalanobis matching were more accurate than Nearest Neighbor methods with a caliper, without replacement, or with 1-to-1 matching.
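
The matching and validation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes the propensity scores have already been estimated, uses invented toy values, and treats an imputed donor average of at least 0.5 as "insured". All function names and data below are hypothetical.

```python
def match_with_replacement(recipient_scores, donor_scores, k=2):
    """For each recipient, return the indices of the k donors whose
    propensity scores are closest (k-to-1 nearest neighbor matching).
    Donors may be reused across recipients (matching with replacement)."""
    matches = []
    for r in recipient_scores:
        ranked = sorted(range(len(donor_scores)),
                        key=lambda j: abs(donor_scores[j] - r))
        matches.append(ranked[:k])
    return matches


def impute_from_donors(matches, donor_values, threshold=0.5):
    """Impute a binary variable for each recipient from its matched donors.
    Assumption: the recipient is coded 1 when the mean donor value
    reaches the threshold (ties at 0.5 count as 1)."""
    imputed = []
    for donor_idx in matches:
        mean_value = sum(donor_values[j] for j in donor_idx) / len(donor_idx)
        imputed.append(1 if mean_value >= threshold else 0)
    return imputed


def sensitivity_specificity(actual, predicted):
    """Compare imputed values with the values the recipients reported."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented for illustration, not from BRFSS or NHIS).
recipient_scores = [0.91, 0.22, 0.85, 0.30]   # survey receiving the imputed value
recipient_insured = [1, 0, 1, 1]              # reported status, used for validation
donor_scores = [0.88, 0.25, 0.92, 0.15, 0.80]
donor_insured = [1, 0, 1, 0, 1]

matches = match_with_replacement(recipient_scores, donor_scores, k=2)
imputed = impute_from_donors(matches, donor_insured)
sens, spec = sensitivity_specificity(recipient_insured, imputed)
print(matches, imputed, sens, spec)
```

Because health insurance status is reported in both surveys, the imputed value can be checked against what each recipient actually reported, which is what makes sensitivity and specificity computable here. A caliper variant would simply discard donors whose score difference exceeds a chosen maximum before ranking.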

Details

Record ID

8385

Record Created

2024-12-05

Title

Data fusion: its advantage in public health

Author

Khalil, George Magdy

College or School

College of Public Health

Department

Health Promotion and Behavior

Date

2015

Publisher

University of Georgia

Content Type

Dissertation

Language

English

Dissertation/Thesis Note

Graduate

Degree Type

Doctor of Public Health

Name of Granting Institution

University of Georgia, Winter 2015

Year Degree Granted

2015

Keywords

Data Fusion, Data Integration, Matching, BRFSS, NHIS

System Control Number

9949334748502959
