Collection Lists

Simplified Overview of Potential Collection Points to be Explored

Participants have the option to skip any question or measurement.

  • Acoustic properties of the breathing (listening to the lungs)
    • From the chest
    • From the mouth
    • From the nose
    • Tapping on chest
  • Breathing rate
    • Both audio and accelerometer
  • Breathing volume (accelerometer with phone on chest)
  • Blood oxygen using standard thumbprint scanners (not investigated yet, but reports suggest they may be similar to the Fitbit sensor, which can sense blood oxygen levels)
  • Vocal cords
  • Pupil dilation
  • Background noise
  • Manually completed survey questions (age, gender, COVID-19 status, shortness of breath, etc.)
  • Changes compared to individual’s own previous baseline measurements
  • Any other ideas that come up as the project iterates daily (the platform will allow changes to data collection to iterate even hourly if needed)
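
As an illustration of the breathing rate item above, here is a minimal sketch of estimating breaths per minute from an accelerometer trace recorded with the phone on the chest. Everything in it is an assumption made for illustration: the function names, the 50 Hz sample rate, the half-second smoothing window and the zero-crossing approach are not part of the project's platform, and real recordings would need far more careful filtering.

```python
import math

def moving_average(samples, window):
    """Smooth a signal with a simple centered moving average."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed

def breaths_per_minute(accel_z, sample_rate_hz, smooth_seconds=0.5):
    """Estimate breathing rate by counting upward zero crossings.

    accel_z is a hypothetical list of chest-axis accelerometer readings.
    Each upward crossing of the mean-removed signal counts as one breath.
    """
    if len(accel_z) < 2:
        return 0.0
    window = max(1, int(smooth_seconds * sample_rate_hz))
    smoothed = moving_average(accel_z, window)
    mean = sum(smoothed) / len(smoothed)
    centered = [value - mean for value in smoothed]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a < 0 <= b)
    duration_minutes = len(accel_z) / sample_rate_hz / 60.0
    return crossings / duration_minutes

# Synthetic test signal (assumed values): one breath every 4 seconds,
# sampled at 50 Hz for 60 seconds.
rate_hz = 50
signal = [
    math.sin(2 * math.pi * 0.25 * (i / rate_hz) - math.pi / 2)
    for i in range(60 * rate_hz)
]
estimate = breaths_per_minute(signal, rate_hz)
```

On this clean synthetic signal the estimate should come out close to 15 breaths per minute. Counting zero crossings is only one simple option; an audio-based estimate could be compared against it to cross-check the two sources.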

More Examples of a Detailed List of Derivatives

These are examples of what the collection questions and platform should be able to support. Not all machine learning teams need to agree on the usefulness of each one, as each team can generate their own metadata that other teams can later leverage (more details about that below).

As the list shows, this extends well beyond what individuals and doctors would be able to observe manually.

  • Version of app and version of collection list
  • Date and time of sample
  • Manually inputted information
    • Age (requires 16+ or age of consent)
    • Physical gender (male, female, “prefer not to say / other”)
    • Location: only options being “Canada” or “Outside Canada” for now, for privacy.
    • When was the last time you smoked more than 100 tobacco cigarettes in a month? (recently, i.e. less than a year ago; 1-4 years ago; 5+ years ago; never)
    • Weight category (underweight, athletic, average, overweight)
    • Known pre-existing lung-related health issues
    • Known pre-existing heart-related health issues
    • Ethnicity? (not yet certain if this will be relevant)
    • Pregnant
    • Model of phone (so we know what sensors and features it has)
    • Current symptoms
      • Fever (define)
      • Shortness of breath
      • Cough
      • Stuffy nose
      • Runny nose
    • Medically determined COVID-19 diagnosis within the past two weeks (none, doctor negative, doctor positive, lab test negative, lab test positive)
      • Maximum answer of all samples for that individual
    • Currently in a hospital (no, in waiting rooms, admitted)
    • Overall how do you physically feel compared to two weeks ago (if sick for longer than that, then compared to before the onset of symptoms)
      • Maximum answer of all samples for that individual
    • Self-assessment of how accurately they were able to complete the samples and how conducive their environment was to the process (probably invalid, poor, so-so, normal, very good)
  • Cleaning data for numerous data sets
    • Cropping out initial “getting into position” noise
    • Cutting out talking or loud sporadic noises
    • Cropping to start and end on a breath rather than in middle
    • Normalizing volumes across the individual’s own past measurements
  • Background noise (individual doing nothing)
    • Average dB
    • Max dB
    • Average dB within frequencies of particular concern
    • Max dB within frequencies of particular concern
  • Phone on chest with normal breathing (5 minutes)
    • Accelerometer distance travelled
      • Approximate lung capacity
      • Average angle of phone
      • Change to lung capacity compared to baselines
  • Phone on chest with deep slow breathing (~30 seconds)
    • Left and right lungs separate
    • Potentially one in the middle or both higher versus lower?
    • Accelerometer distance travelled
      • Approximate lung capacity
      • Average angle of phone
      • Change to lung capacity compared to baselines
  • Actual cough or contrived coughing (at least 15 with pauses)
  • Phone between chest and stomach (~30 seconds)
    • Ratio of diaphragm versus chest breathing
  • Phone near or in mouth (to be determined) with normal breathing and jaw open wide (2 minutes)
  • Phone near or in mouth (to be determined) with deep slow breathing and jaw open wide (~30 seconds)
  • Phone near or in mouth (to be determined) saying long “ahhhh” while exhaling for 4 seconds x 3 times
  • Phone near or beside nose
  • Applicable to several breath-related samples
    • Audio from multiple-microphone phones, with the auxiliary microphones used to cancel noise or isolate the target sound
    • Breaths per hour
    • Normalized to a set dB level of the breath sounds (the level of the breath sound itself, regardless of background noise, if different)
    • Variability between each breath (standard deviation and other metrics)
    • Filtered audio to specific frequencies
    • List of start of each breath to chop audio into individual breaths
    • Removal of background noise (based on background noise sample)
    • Accelerometer data to indicate how stable the phone was during samples
    • Frequency correlation to COVID-19 acoustics
  • Combined Derivatives
    • Normalizing breaths per hour by age or versus age average
    • Normalizing breaths per hour by gender or versus gender average
    • Normalizing breaths per hour by age and gender
    • All metrics as compared to:
      • previous measurement
      • first measurement where asymptomatic and not diagnosed
      • difference between the first diagnosed and symptomatic measurement and the asymptomatic one
      • acceleration of breathing rate or other metrics (as seen when graphed over many samples)
    • Combining all of above into a single probability factor and possibly a confidence of probability factor.
    • Misc
      • Is the first breath after moving into position very different from the others?
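
A few of the derivatives listed above can be sketched in code: background noise levels in dB, breaths per hour with the variability between breaths, and the change of a metric against the individual's own baseline. This is only an illustrative sketch under stated assumptions. The function names, the frame size, the use of levels relative to full scale (reference amplitude 1.0) and the example numbers are all hypothetical, and the levels within particular frequencies are left out because they would need a band filter first.

```python
import math
import statistics

def rms_db(samples, reference=1.0):
    """Level of a block of audio samples in dB relative to `reference`."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (reference * reference))

def background_levels(samples, frame_size):
    """Average and maximum level of a background-noise recording.

    The average is the energy average over the whole recording; the
    maximum is the loudest fixed-size frame.
    """
    frames = [
        samples[i:i + frame_size]
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    if not frames:  # recording shorter than one frame
        frames = [samples]
    return {
        "average_db": rms_db(samples),
        "max_db": max(rms_db(frame) for frame in frames),
    }

def breath_stats(breath_starts_s):
    """Breaths per hour and breath-to-breath variability.

    breath_starts_s is a list of breath start times in seconds, such as
    the list used to chop audio into individual breaths.
    """
    if len(breath_starts_s) < 2:
        raise ValueError("need at least two breath start times")
    intervals = [b - a for a, b in zip(breath_starts_s, breath_starts_s[1:])]
    mean_interval = statistics.mean(intervals)
    spread = statistics.stdev(intervals) if len(intervals) > 1 else 0.0
    return {
        "breaths_per_hour": 3600.0 / mean_interval,
        "interval_stdev_s": spread,
    }

def change_vs_baseline(current, baseline):
    """Relative change of a metric against the individual's own baseline."""
    return (current - baseline) / baseline

# Made-up example values: a quiet stretch followed by a louder one,
# and a perfectly regular breath every 4 seconds.
levels = background_levels([0.1, -0.1] * 100 + [0.5, -0.5] * 100, frame_size=100)
stats = breath_stats([0.0, 4.0, 8.0, 12.0])
delta = change_vs_baseline(990.0, stats["breaths_per_hour"])
```

Keeping each derivative as a small independent function fits the idea that each team can generate its own metadata for other teams to leverage later.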

Example Collection Concerns

  • Safety first (better to be safe than to improve collection)
  • Posture (even lying down, no pillow if possible so head is straight)
  • Head facing straight
  • Position of microphone
  • Model of microphone
  • Whether microphone should be covered to reduce vibration (as with mouth measurements)
  • Whether a book or blanket should be placed over the phone to couple the chest and phone more closely
  • Performing measurements while alone in the room
  • Possible background noise from fans or furnaces (turn off if possible)
  • Indoors (so no wind)
  • No talking the entire time
  • Define deep breathing: should we say all the way in and out to the maximum they can, or just generally a few deep breaths like during relaxation techniques?

Misc Brainstorming

  • Rapid push on the chest (by themselves or someone else) versus breathing?
  • Listening to abdominal sounds? (may be a couple days before respiratory symptoms emerge; are there noises before then?)

Contributing To This List

If you are a development team, visit the Machine Learning and Data Analysis page.

If you are a medical professional wishing to get involved, even for a few hours, contact us at

If you are seeking to point something out or suggest a few things but otherwise not seeking to be heavily involved, please connect with us on social media; we want to hear from you! You can find us on Twitter and Facebook.