Basic Survey Data

The survey data was collected in 8 waves over a four-year period (see Table 5: Basic and Network Survey Questionnaires for the dates when data was collected).  Qualtrics software was used to design and implement the survey instrument and to collect online answers from participants.  Collected data was exported into Stata (a statistical and data processing application) for cleaning and processing, and then merged with the previously cleaned data waves.  The resulting file of 8 merged waves contains, for each study participant, information on variables from each of the surveys that they completed.

722 students took at least one of the surveys, 698 of whom enrolled in the study.  Some students took the survey and filled out consent forms but decided not to participate.  We include their limited data here because, if they were named in network surveys by study participants, they receive a five-digit case number code indicating that we have survey data on them.

The data file exists as both a Stata .dta file and a raw .csv file.  A Stata-generated codebook was created (see Basic Survey Codebook).  There are over 3000 variables.  The file is organized by wave.  A suffix “_N” attached to a variable name indicates which of the 8 survey waves the variable’s values come from, where N is the wave number.  Within waves, variables are listed in the data file roughly in the order in which questions were asked in the survey.  All non-identifying raw variables are contained in this file, along with numerous variables constructed from the original raw ones.  Constructed variables usually follow the set of raw variables from which they were constructed.  For example, the 44 items in the Big Five Personality Inventory are followed by the 5 constructed personality measures: Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness.
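As a rough illustration of the wave-suffix convention, the sketch below groups variable names by their trailing wave number.  It assumes the suffix is an underscore followed by a single digit from 1 to 8 at the very end of the name.  The variable names shown are hypothetical and are not taken from the codebook.

```python
import re
from collections import defaultdict

# Assumption: a wave suffix is "_" plus one digit 1-8 at the very end of the name.
WAVE_SUFFIX = re.compile(r"^(?P<stem>.+)_(?P<wave>[1-8])$")


def split_wave(name):
    """Return (stem, wave) for a wave-suffixed name, or (name, None) otherwise."""
    match = WAVE_SUFFIX.match(name)
    if match is None:
        return name, None
    return match.group("stem"), int(match.group("wave"))


def group_by_wave(names):
    """Map each wave number to the list of variable names carrying that suffix."""
    groups = defaultdict(list)
    for name in names:
        _, wave = split_wave(name)
        if wave is not None:
            groups[wave].append(name)
    return dict(groups)


# Hypothetical variable names, for illustration only.
example_names = ["case_id", "extraversion_1", "extraversion_3", "selfesteem_1"]
waves = group_by_wave(example_names)
```

Names without a wave suffix (such as an identifier column) are left out of the grouping, so they can be handled separately when reshaping the file from wide to long.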

Copies of the Qualtrics survey instruments for each wave are in the Documentation Section.  Table 5: Basic and Network Survey Questionnaires provides links to the .pdf and MS Word .doc versions of each survey.

Not all questions were asked in each wave.  Table 4: Basic Survey Questions by Wave contains information on which sets of variables were asked in which waves.  Variable names generated by Qualtrics begin with “Q” and have the form “Qn.m”, where “n” refers to the section block of the survey and “m” to the question’s number within that block.  These names appear in the .doc and .pdf copies of the survey.  During data cleaning in Stata, all names were changed to meaningful variable names indicating what the variable measures.  Variable labels were added and include information on the question (sometimes including the Qualtrics question number).  This way users can go to the .pdf or .doc file for a specific survey and look up the exact question for the variable of interest.  Users can also use Table 4 to identify where in the survey specific questions are located.  We employed many mental health and social-psychological scales developed by others.  Table 6: Basic Survey Scales contains links to documents with the items we used and the methods for scale construction.
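To make the “Qn.m” convention concrete, the following sketch parses a Qualtrics name into its block and question numbers and pulls any such references out of a variable label.  The label text is hypothetical, and the assumption that question numbers appear in labels in exactly this form should be checked against the codebook.

```python
import re

# Assumption: Qualtrics names look like "Q<block>.<question>", e.g. "Q3.12".
QUALTRICS_NAME = re.compile(r"^Q(\d+)\.(\d+)$")
QUALTRICS_IN_TEXT = re.compile(r"\bQ(\d+)\.(\d+)\b")


def parse_qualtrics_name(name):
    """Return (block, question) as integers, or None if the name does not match."""
    match = QUALTRICS_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_qualtrics_refs(label):
    """Return every (block, question) pair mentioned inside a variable label."""
    return [(int(b), int(q)) for b, q in QUALTRICS_IN_TEXT.findall(label)]


# Hypothetical label, for illustration only.
refs = find_qualtrics_refs("Q3.12 How often did you exercise this semester?")
```

The returned block number can then be used to locate the corresponding section in the .pdf or .doc copy of the survey.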

Here is a brief summary of the data collected through these surveys:

    • Family Background & Demographic Variables:  
      • Information on parents’ education, income, religion, race/ethnicity, place of birth, immigration status, and languages spoken. 
      • Information on where a student is from is numerically coded in terms of region so that researchers can identify students that came from the same region of the US.  
      • Information on a participant’s gender, sexual orientation and identities, and race/ethnicity.  A simple “race” variable is constructed, but users can construct their own indicators from the very detailed race and ethnicity questions asked in the first survey. 
    • Student Background: Information on a participant’s prior schooling, activities, clubs, interests, religious affiliation, and aspirations.
    • Self-reported course, grade information, and major:  Students were asked to list the courses they took each semester and their semester GPA.  The course identification numbers we use are based on Notre Dame’s Course Record Numbers (CRNs) provided by the respondent and are formatted as YYSSAANNNN.  YY is the academic year 20YY-20(YY+1), and SS is 01 for Fall, 02 for Spring, or 03 for Summer.  AA is a 2-digit code for the department or area the course is in, anonymized so that two CRNs with the same AA are courses in the same area.  NNNN is the anonymized CRN for a course in a specific area in a specific semester.  This coding allows users to identify pairs of students who took the same class and pairs of students who took classes in the same department (either in the same semester or across different semesters).  All self-reported course and grade data are reported together at the end of the data file.  See the note in the Basic Survey Codebook for these variables.  Note that the codes for CRNs used in the basic survey are different from the coding of CRNs in the administrative data obtained from the Registrar.  We discuss that data and its coding in the Course and Grades Data section.  

      Students were asked to list their majors, which are numerically coded so that researchers can identify two students with the same major and whether a student changes major, but not the exact major.  Users can identify whether two students’ majors are in the same college (if not the exact same major) using the numerical code (see note in Basic Survey Codebook).
    • Statuses:  Information on whether a student works or is a student-athlete, and on dorm residence.  Numeric codes for dorms allow users to identify students residing in the same dorm and changes in dorm residency, but not the specific dorm.  In later surveys, students are asked about study abroad experiences. In order to maintain anonymity, the place where a student studied is numerically coded, allowing researchers to identify two participants who studied abroad in the same place, but not to identify the place.  
    • Activities:  A battery of questions asks about the frequency of various activities and things the student did during the semester, such as drug and alcohol consumption, exercise, volunteering, discussing health issues, playing video games, listening to music, and reading for pleasure.
    • Clubs:  The names of up to 10 clubs the student belonged to were collected in each wave.  Clubs are coded using the schema in Table 7: Consolidated Activities & Clubs Coding Schema, allowing users to identify two participants who are in similar types of clubs but not participants who are in the exact same club. See discussion below on coding.
    • Cultural Activities:  Open-ended questions were asked about cultural activities attended and coded using the schema in Table 7.   See discussion below on coding.
    • Physical Activities:  Participants were asked in the Network Survey to list up to five physical activities that they like to do.  We then asked them whether each of their alters likes to do each of these things in order to get data on similarity of interests.  This data is available in the Network Survey Data file, but we also merge the coded physical activity data for the study participant into this basic survey data file. The codes used are the same as those for Clubs and Cultural Activities and can be found in Table 7.
    • Musical Tastes: Participants were asked about the types of music they like and dislike.  In waves 2, 5 and 8 these were asked as open-ended questions and then coded. In wave 7 respondents were asked to select genres.
    • Personality – Big 5:  The standard BIG 5 inventory was used to compute measures of Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness. 
    • Social-Psychological Scales:  As detailed in Table 6 (with links to documentation on each scale), we employed a number of well-known measures:
      • Self-Esteem
      • Trust
      • Self-Regulation (two versions, see Table 6)
      • SELSA (Social and Emotional Loneliness Scale for Adults), see Table 6 for types
      • Self-Efficacy (two versions)
    • Health-Related Questions:
      • Subjective assessments of overall health and happiness
      • Diet
      • Height and weight
      • Health Satisfaction (Body Image) Inventory
      • Sexual orientation and activity
      • Disability and conditions
      • Medication and drug use
      • Hereditary diseases and conditions
      • Major Life Changes [coding of up to three mentions in terms of (1) type of event and (2) whether a positive or negative event. See codebook]
      • Disabilities, mental health issues and additional health issues measured at various times with open-ended questions.  Two codes were created for each mention: whether the condition was chronic or acute (code A), and the type of condition (code B), i.e. physical/bodily, mental, sleep, or disease. 
    • Mental Health: As detailed in Table 6 (with links to documentation on each scale), we asked questions for the following scales:
      • STAI (State-Trait Anxiety Inventory) 
      • BAI (Beck Anxiety Inventory)
      • Stress 
      • BDI (Beck Depression Inventory)
      • CES-D (Center for Epidemiologic Studies – Depression Scale)
    • Sleep (see Table 6):
      • PSQI (Pittsburgh Sleep Quality Index)
      • MEQ (Morningness-Eveningness Questionnaire)
    • Physical Activity and Sleep Expectations
    • Political Attitudes and Views
    • Computer, Phone, Fitbit and Technology Usage
    • Fitbit Use
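
The YYSSAANNNN course identifier described in the list above can be unpacked as in the sketch below.  It assumes the identifier is stored as, or can be converted to, a 10-digit string; the example codes are invented for illustration and do not correspond to real courses.

```python
from typing import NamedTuple

SEMESTERS = {"01": "Fall", "02": "Spring", "03": "Summer"}


class CourseCode(NamedTuple):
    academic_year: str  # e.g. "2015-2016"
    semester: str       # "Fall", "Spring" or "Summer"
    area: str           # anonymized 2-digit department/area code
    course: str         # anonymized 4-digit course number


def decode_course_code(code):
    """Split a YYSSAANNNN identifier (int or str) into its four components."""
    text = str(code)
    if len(text) != 10 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected a 10-digit code, got {code!r}")
    yy, ss, aa, nnnn = text[:2], text[2:4], text[4:6], text[6:]
    if ss not in SEMESTERS:
        raise ValueError(f"unknown semester code {ss!r} in {code!r}")
    start = 2000 + int(yy)
    return CourseCode(f"{start}-{start + 1}", SEMESTERS[ss], aa, nnnn)


def same_class(a, b):
    """Two students took the same class if their full codes are identical."""
    return str(a) == str(b)


def same_area(a, b):
    """Two courses are in the same department/area if their AA parts match."""
    return decode_course_code(a).area == decode_course_code(b).area


# Invented example code, for illustration only.
example = decode_course_code(1501120345)
```

Comparing full codes identifies classmates, while comparing only the AA component identifies students who took courses in the same area in the same or different semesters.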

All variables are numerically coded, with value labels supplied in the Stata data file indicating the meaning of the numeric codes.  In some cases, notes in the codebook provide additional information on the meaning of a code.  A variable name containing “rc” indicates that the original variable was a text variable from an open-ended question that was manually coded into the categories represented by the value labels.  Variables that were asked in Qualtrics as “check all that apply” are coded “1” for presence of the trait, and missing otherwise.
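Because “check all that apply” items are coded 1 or missing, analysts often want a 0/1 indicator instead.  The sketch below shows one way to do that, under the assumption that a missing value means “not checked” only when the respondent actually took the survey containing the question; the `took_survey` flag and the item names are hypothetical and would need to be derived from the data.

```python
def recode_check_all(items, took_survey=True):
    """Convert check-all-that-apply items from 1/missing to 1/0.

    items: dict mapping item name -> 1 (checked) or a missing value
           (None, or NaN if the file was loaded with a numeric library).
    took_survey: if False, values are returned unchanged, since missing
           then means "not asked" rather than "not checked".
    """
    if not took_survey:
        return dict(items)
    # Anything other than 1 (including None and NaN) is treated as unchecked.
    return {name: 1 if value == 1 else 0 for name, value in items.items()}


# Hypothetical item names, for illustration only.
recoded = recode_check_all({"lang_english": 1, "lang_spanish": None})
```

Leaving the values untouched for respondents who did not take the survey keeps true non-response distinguishable from an unchecked box.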

A major coding effort involved coding the numerous clubs, cultural activities, and physical activities students engaged in.  A unified coding scheme was developed and a protocol implemented for manually coding all text entries.  We wanted a unified system because we expected overlap between clubs and activities.  The coding scheme is detailed in Table 7: Consolidated Activities & Clubs Coding Schema, and was used to code all the club, cultural activity and physical activity text data from open-ended questions. 

All participants enrolled in the NetHealth study at the time of the release of a survey were asked to complete the survey.  For Tier 1 subjects recruited prior to arriving on campus, the Wave 1 survey was completed prior to arrival at ND.  For Tier 2 subjects, the first survey was completed in the Fall of 2015.  Tier 3 subjects who joined the study in the Spring of 2016 completed a combined Wave 1 and Wave 2 survey.  When merging survey data for Tier 3 subjects, answers to Wave 1 questions are flagged as Wave 1 survey items so that researchers can easily compare, for example, background questions (on race, religion, family background, high school) across participants in all the tiers.

For the final survey (Wave 8) we asked all current and former NetHealth participants to take the survey.  Response rates were low for former participants, but their data is included.  

Note:  Not all variables are included in this released Basic Survey data.  Any identifying information has been omitted. This includes information obtained from questions about phone numbers, email and other addresses, and names used by the respondent.