Catching Some Z’s: An Analysis of Factors of a Good Night’s Sleep
This study sets out to determine whether exercise frequency, bedtime, smoking status, alcohol consumption, caffeine consumption, age, or gender can affect our sleep quality, and to find different ways we can change our lifestyle in order to increase our sleep quality. From examining the data, we can find some very strong evidence that exercising more, going to bed around 22:00 – 23:00, not smoking, drinking at most 1 oz of alcohol in a day, drinking more caffeine in a day, and being 30 – 40 years old can all greatly increase sleep quality by maximizing how much of our sleep is deep sleep and by minimizing how many times we wake up during the middle of the night.
Do people who exercise generally have higher quality sleep?
Is there any correlation between bedtime and the amount and quality of deep sleep someone gets?
Do smoking, alcohol, and caffeine negatively impact the quality and amount of deep sleep we get in a night or just sleep in general?
Do age or gender have an impact on the amount and quality of sleep?
Significance
Sleep is something that impacts almost all aspects of our daily lives. It gives us energy throughout the day, and sleep also allows us to recharge mentally and physically. However, it often seems like we can’t get enough of it. This study will help to give us some answers as to what we can do to get more and better quality sleep. Sleep is a very complex process with multiple moving parts, and this study will help to shine a light on what factors affect our sleep, and how we can improve our sleep.
Background
Sleep is primarily composed of 3 broad stages: light sleep, deep sleep, and REM sleep. As you sleep, you cycle through these stages several times, with light sleep and REM stages generally growing longer and deep sleep stages growing shorter as the night goes on. According to sleepfoundation.org, the functions of each sleep stage are as follows:
From these descriptions, it is easy to see that deep sleep is one of the most important stages of sleep, since it is directly responsible for how effective our sleep is and for how rested we actually feel in the morning. As a result, I will focus on factors that specifically impact how much deep sleep we typically get in a night.
Data Source
The data used for this study was collected by students from ENSIAS National School for Computer Science in Morocco. Primarily, it was collected through a collection of self-reported surveys, actigraphy (monitoring of sleep and activity cycles), and polysomnography (recording of vitals and brain activity during sleep).
For this study, I primarily decided to use deep sleep percentage and the number of awakenings in a given night as measures of sleep quality, and I also examine how some of these factors affect the total amount of sleep in a given night. The higher the deep sleep percentage and the fewer the awakenings, the higher the quality of the sleep. Specifically, I examine relationships between variables that I believe could have some tangible effect on our quality of sleep, such as:
Exercise Frequency, Bedtime, Alcohol Consumption, Caffeine Consumption, Awakenings, and Age are all classified as numerical variables initially. However, upon working with the data set, I found that it would be more appropriate to reclassify these variables as factors, since there are relatively few unique values between observations. I also found that the results from the analysis of age and bedtime in relation to deep sleep percentage and awakenings were easier to work with and interpret when put into age and time groups.
Throughout the analysis, I use a combination of box plots separated by group to see relationships between each categorical variable and awakening and to see differences between each group of the categorical variables. In addition, I also use box plots and violin plots separated by group to see the impact that each variable has on the percentage of deep sleep that people get in a given night. Additionally, throughout the data set there are some observations that are missing some of the values I am interested in. However, they make up a very small proportion of the data. As a result, I decided to replace all missing values with the mode of the relevant category. Additionally, there was only one observation in Caffeine Consumption of 100 mg, so I decided to remove it for clarity.
Variables
Analysis of Awakenings and Exercise Frequency
From the percentage bar chart of awakenings by exercise frequency, we can see a few things that immediately jump out. From the study, people who exercise 5 times a week are much more likely than other groups to have 0 awakenings through the night, and if they do wake up, it will only be once. People who exercise 4 times a week also are much more likely than people who exercise less to only wake up 0-1 times during a given night. Additionally, as people exercise more frequently, they are much less likely to wake up 4 times during a given night. As a broad, overall trend, we can see that the more people exercise, the less likely they are to wake up multiple times during the night. From this, it is fair to say, based on my earlier criteria for high quality sleep, that exercising more frequently can increase sleep quality.
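A percentage bar chart of this kind is built from a conditional distribution: within each exercise group, the share of people with each number of awakenings. The sketch below shows one way to compute those shares in base R. The vectors are hypothetical toy values, not the study's data, and the names `exercise`, `awakenings`, and `row_pct` are illustrative.

```r
# Hypothetical toy data, not taken from the study's data set.
exercise   <- factor(c(0, 0, 0, 0, 5, 5, 5, 5))
awakenings <- factor(c(0, 2, 4, 4, 0, 0, 0, 1))

# Cross-tabulate the two factors: rows are exercise groups, columns are awakenings.
counts <- table(exercise, awakenings)

# margin = 1 divides each row by its own total, so every exercise group sums to 100%.
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct
```

Because each row is scaled by its own total, groups of different sizes can be compared directly, which is what makes statements like "more likely than other groups" meaningful.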
Analysis of Deep Sleep Percentage and Exercise Frequency
From the box plots of deep sleep percentage by exercise frequency, we can see a few trends. First, the median deep sleep percentage of all exercise frequency groups hovers between 56% and 60%. Additionally, we can see that the distribution of deep sleep percentage is heavily left skewed for people who exercise 0-1 times per week, and it is slightly right skewed for people who exercise 2-5 times per week. The distribution for people who exercise 1 time a week has the highest spread, with people who don’t exercise at all following closely behind. People who exercise 5 times a week have the tightest spread, and people who exercise 4 times a week are the only category that combines the highest median value with relatively low spread and few outliers. From this, we can see that people who exercise 2 or more times a week typically have consistently better sleep quality than those who exercise 0-1 times a week.
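The quantities read off a box plot (median, spread, direction of skew) can also be computed directly. The following is a minimal sketch in base R using hypothetical toy values, not the study's data. The data frame `toy` and its columns are illustrative names.

```r
# Hypothetical toy data, not taken from the study's data set.
toy <- data.frame(
  exercise = factor(c(0, 0, 0, 0, 0, 4, 4, 4, 4, 4)),
  deep_pct = c(20, 40, 55, 58, 60, 57, 59, 60, 61, 63)
)

# Median (the centre line of each box) and interquartile range (the box height).
medians <- tapply(toy$deep_pct, toy$exercise, median)
iqrs    <- tapply(toy$deep_pct, toy$exercise, IQR)

# A mean well below the median hints at left skew (a long tail of low values).
means <- tapply(toy$deep_pct, toy$exercise, mean)

data.frame(median = medians, IQR = iqrs, mean = means)
```

In this toy example the group that does not exercise has both a larger IQR and a mean below its median, which is the numeric counterpart of a wide, left-skewed box.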
Analysis of Awakenings and Bedtime
After looking at the conditional distribution of awakenings by bedtime, we can see a few trends. First, people who go to bed from 1-2:00 are much more likely than other groups to wake up 4 times in a given night. Additionally, people who go to bed from 22-23:00 are the most likely to have 0 - 1 awakenings throughout the night. We can also see an important overall trend. Starting from 22 - 23:00, the later someone goes to bed, the more likely they are to wake up during the night more than 1 time. Additionally, people who go to bed too early, like from 21- 22:00, are also much more likely to wake up more than once in a night. From this graph, we can see that the best time to go to bed to minimize awakenings is from 22 - 23:00.
Analysis of Deep Sleep Percentage and Bedtime
From the violin plots that show the distribution of deep sleep percentage by bedtime, we can see a few different things. First, the median of all groups hovers around 58%. However, the groups primarily vary from each other in terms of spread and shape. People who go to sleep from 21 - 23:00 have the smallest spread of data. By looking at the probability density portion of the violin plot, we can also see that people who go to sleep from 21 - 22:00 are more likely to get about 60% deep sleep, and that people who go to bed from 22 - 23:00 are more likely to get 58% deep sleep in a given night. Additionally, we can see that the distributions for people who go to bed from 23 - 24:00 and from 1 - 3:00 have a much wider spread and are extremely left skewed. This means that these ranges are much more inconsistent in deep sleep percentage. The probability density portion of the plot shows this as well, since the peaks of probability density are much shallower than those of the previously mentioned groups. The amount of deep sleep someone in these groups gets is much more varied and, as a result, inconsistent.
Analysis of Awakenings and Smoking Status
When looking at the conditional distribution of awakenings by smoking status, one thing becomes very clear: smokers and non-smokers share incredibly similar distributions of awakenings. Nonsmokers are marginally more likely to have 0 or 4 awakenings than smokers, and smokers are more likely to wake up 1-3 times in a given night than nonsmokers. However, the differences between the distributions of smoker and nonsmoker awakenings are so small that they do not appear to be meaningful. As a result, we can say that smoking doesn’t seem to have a strong influence on whether someone wakes up more often or not.
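The comparison above is made by eye from the plot. One way to check whether two such distributions really differ is a chi-squared test of independence, sketched below with base R's `chisq.test()`. This is an illustration only: the counts are hypothetical, and the study does not report having run this test.

```r
# Hypothetical counts; rows are smoking status, columns are number of awakenings.
counts <- matrix(
  c(40, 60, 50, 30, 20,
    21, 29, 26, 15,  9),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    smoking    = c("No", "Yes"),
    awakenings = c("0", "1", "2", "3", "4")
  )
)

# Null hypothesis: the distribution of awakenings is the same for both groups.
result <- chisq.test(counts)
result$p.value
```

A large p-value, as these nearly proportional toy rows produce, means the data give no evidence that the two distributions differ. It does not prove that they are identical.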
Analysis of Deep Sleep Percentage and Smoking Status
There are some key differences between the distribution of deep sleep percentage between smokers and nonsmokers based off of the box plots. People who do not smoke have a slightly higher median deep sleep percentage than smokers. Additionally, the nonsmoking group has the highest overall observation of deep sleep percentage. The distribution of smoker deep sleep percentage is heavily left skewed with very high spread, while the distribution of nonsmoker deep sleep percentage is slightly right skewed, with a much tighter spread. From this, we can conclude that, while the median percentage of deep sleep is comparable between the two categories, nonsmokers typically and more consistently have a higher deep sleep percentage than smokers do. As a result, we can say that there seems to be a relationship between smoking and poor sleep quality.
Analysis of Total Hours Asleep and Smoking Status
Once again, there are some key differences between the distribution of hours asleep for smokers and nonsmokers. While both distributions are slightly left skewed, it is important to note that nonsmokers have a higher median amount of sleep in a night than smokers, by about 1 hour. Additionally, the distribution of hours asleep for smokers has a much higher spread than that of nonsmokers. As a result, it is fair to say that nonsmokers typically get more sleep in a given night than smokers, and the amount of sleep that nonsmokers get is much more consistent than that which smokers get.
Analysis of Awakenings and Alcohol Consumption
When looking at the conditional distribution of awakenings by alcohol consumption, we can notice a few general trends. First, people who have 0 oz in the 24 hours before bedtime are much more likely to have 0-1 awakenings in a night than to wake up 2 or more times. People who only have 1 oz follow a similar trend, but they are more likely to wake up 1 time than people who have no alcohol. From there, people who drink 2 oz or more are much more likely to wake up 2 or more times, and people who drink 4-5 oz of alcohol are much more likely than other groups to wake up 4 times in a given night. From this, we can see a general trend. The more alcohol someone drinks, the more likely they are to wake up in the night, and the more likely they are to wake up multiple times.
Analysis of Deep Sleep Percentage and Alcohol Consumption
After examining the box plots that show the distribution of deep sleep percentage by alcohol consumption, we can see a few crucial things. People who drink no alcohol have the second highest median percentage of deep sleep, and the distribution of deep sleep percentage for this group has the smallest spread. People who drink 1 oz of alcohol have the highest median percentage of all categories and the second smallest spread of all groups. However, the distribution for people who drink 1 oz is heavily left skewed. For people who drink 2 - 5 oz of alcohol in a day, the median value quickly drops off from that of people who only drink 0-1 oz of alcohol. Additionally, the distribution for people who drink 2 or more oz also has a very large spread. From this, we can see the trend that an individual’s deep sleep percentage is greatly and negatively affected by drinking more than 1 oz of alcohol. Not only are they more likely to get less deep sleep in a given night, but they also are very inconsistent in the amount of deep sleep they get when compared to those who drink 0-1 oz of alcohol.
Analysis of Total Hours Asleep and Alcohol Consumption
From the distribution of hours asleep by alcohol consumption, we can notice a few important things. First, people who drink 0 oz of alcohol in the 24 hours before bedtime have the highest median number of hours asleep among all groups, with a distribution that is slightly left skewed. Additionally, people who drink only 1 oz of alcohol have the second highest median number of hours asleep, following closely behind people who drink no alcohol. From 2 oz of alcohol and beyond, the median hours asleep for all groups noticeably drops off from the median value of about 6.3 for people who drink 0-1 oz of alcohol, with the median for most other groups hovering around 5 hours asleep. Additionally, the spread for all groups is about the same, but the groups who drink 2 or more oz of alcohol have a slightly larger spread than those who drink 0 - 1 oz of alcohol. From this, we can conclude that people who drink 0 - 1 oz of alcohol typically get more hours of sleep in a given night than those who drink 2 or more oz of alcohol in a day.
Analysis of Awakenings and Caffeine Consumption
When looking at the conditional distribution of awakenings by caffeine consumption, there are a few important things to note. Surprisingly, among all groups, people who drink no caffeine in a day are more likely to wake up multiple times during the night than not and are the most likely to wake up in general. Also, people who drink 200 mg of caffeine in a day are much more likely to have 0 awakenings during the night when compared to other groups, and they are also the least likely to wake up 3 - 4 times during a given night. People who drink 75 mg of caffeine are also the most likely to wake up 1 time during a night and the least likely to wake up more than once. However, among the rest of the groups, the distribution of awakenings is relatively similar. From this plot, we can see that people who drink a lot of caffeine are more likely to wake up 0-1 times in a given night than people who drink no caffeine or only small amounts. Additionally, the less caffeine someone drinks, the more likely they are to wake up 4 times in a night.
Analysis of Deep Sleep Percentage and Caffeine Consumption
When looking at the box plot, we can see a few things about how caffeine consumption relates to deep sleep percentage. People who drink 75 milligrams have the highest median deep sleep percentage and the tightest spread. People who have 200 milligrams of caffeine have the second highest median deep sleep percentage, with a moderate spread and no outliers. People who drink no caffeine or 50 milligrams of caffeine have the lowest median amount of deep sleep, and their distributions are also heavily left skewed with very high spread. Overall, there doesn’t seem to be much of a trend, since the medians fluctuate between groups. Even so, 75 milligrams of caffeine seems to be the optimal amount of caffeine to drink to maximize deep sleep.
Analysis of Total Hours Asleep and Caffeine Consumption
When looking at the distribution of hours asleep by caffeine consumption, a few things immediately stand out. First, people who drink 200 mg of caffeine in a day have the highest median hours of sleep of 6.6 hours. Additionally, the distribution of hours asleep for 200 mg of caffeine has the smallest spread. Overall, however, the remaining groups of caffeine consumption have very comparable distributions. The median value of hours asleep hovers around 6 hours, and they all have very similar spread with no outliers. From this, we can say that drinking less than 200 mg of caffeine doesn’t seem to have an effect on time asleep. However, people who drink 200 mg of caffeine seem to more consistently get more sleep in a night than others.
Analysis of Awakenings and Age
After looking at the percentage bar chart of awakenings by age group, we can see a few things. First, people from age 30 - 50 are the most likely to have one awakening or less in a night. People aged 20 and under and 60 - 70 are the most likely to wake up more than once in a night, and people age 50 - 60 are the most likely to wake up 4 times in one night. Additionally, people who are 20 or under are also the most likely to wake up during the night in general. From this, we can also see a general trend between age and the number of awakenings in a given night. The younger or older someone is, the more likely they are to wake up at all, and the more likely they are to wake up multiple times. The more middle aged you are (30 - 50), the less likely you are to wake up during the night, and the less likely that you will wake up more than once in a night.
Analysis of Deep Sleep Percentage and Age
When examining the distribution of deep sleep percentage by age group, we see a few notable things. First, people who are 20 and under have the smallest median percentage of deep sleep at 35% and a very wide spread. However, it is also important to note that this distribution is heavily right skewed. Among all other groups, the median deep sleep percentage hovers around 60%. People aged 20-30 and 60-70, however, have a very large spread and are also heavily left skewed. People from these age groups typically are inconsistent with the percentage of deep sleep that they actually get in a night. Additionally, people aged 30 - 50 have the lowest spread overall, so people from this age range generally have a consistent deep sleep percentage. Once again, middle aged people tend to have the highest quality sleep, because they most consistently have the highest deep sleep percentage.
Analysis of Total Hours Asleep and Age
When looking at the box plots of the distribution of hours asleep by age, we can see a few important things. First, people aged 20 or under typically get the least sleep, with a median value of 4.96 hours. People aged 30 - 40 tend to get the most sleep of all groups, with a median value of 6.44 hours. All other groups have very comparable median values, and all groups also have very similar spread. From this, it is easy to see that people aged 30 - 40 typically get the most sleep, while people 20 and under most consistently get the least sleep among all age groups.
Analysis of Awakenings and Gender
After looking at the conditional distribution of awakenings by gender, we can see a few notable things. First, females are more likely to have 0 awakenings in a given night, and females are also more likely to wake up 1 time in a night. Males, on the other hand, are more likely to wake up more than 1 time in a given night. Males overall are more likely to wake up more often in a night than females are. The difference, however, is relatively small.
Analysis of Deep Sleep Percentage and Gender
When looking at the distribution of deep sleep percentage by gender, we see a slightly different story from awakenings and gender. Females have a slightly higher median deep sleep percentage of 59% when compared to the male distribution’s median of 58%, but this is a very marginal difference of little significance. The distribution for males, however, has a noticeably lower spread than the distribution for females. The distribution for females is also heavily left skewed, while the distribution for males is slightly right skewed. From this, we can see that males typically and more consistently have a higher deep sleep percentage than females, despite the median percentage being about the same.
Analysis of Total Hours Asleep and Gender
When looking at the boxplots of Hours asleep by gender, we can note a few important things. First, the median value of time asleep for females is slightly higher than males at 6.12 hours compared to 6.02 hours. However, this difference is insignificant. Additionally, the distribution from the first to third quartile of both genders is nearly identical. However, the distribution for males has a slightly smaller spread than the distribution for females. Overall, we can see that gender has very little impact on the amount of time someone sleeps for.
Do people who exercise generally have higher quality sleep?
People who exercise absolutely tend to have higher quality sleep than people who don’t. We can see that people who exercise more are much more likely than people who don’t to have 0 awakenings throughout the night. Additionally, people who exercise more are more likely to only wake up once during the night than people who exercise less, and people who don’t regularly exercise are the most likely to wake up 4 times in a night. The more we exercise, the less likely we are to wake up during the middle of the night. Additionally, people who exercise more are more likely to consistently have more deep sleep in a night than people who don’t. From our previous definition of quality sleep, we can easily say that people who exercise have higher quality sleep because more of their sleep consists of deep sleep and they are also considerably less likely to wake up in the night than people who don’t exercise.
Is there any correlation between bedtime and the quality and amount of deep sleep someone gets?
There is a relationship between someone’s bedtime and the amount of deep sleep they get. While the median value of deep sleep tends to stay the same across different bedtime groups, the spread of the distribution grows the later it is. Additionally, the spread of deep sleep percentage also grows if our bedtime is too early. This leaves us with a Goldilocks situation, where some bedtimes are too early, and others are too late. From looking at the violin plots, we can see that the best bedtime for deep sleep is from 22:00 - 23:00, since people who go to bed at this time most consistently see high amounts of deep sleep. Additionally, when looking at awakenings, people who go to bed during this time are also the least likely to wake up in the middle of the night, and if they do wake up they are also the least likely of all bedtimes to wake up more than once. As a result of these two findings, in order to get the best possible sleep that we can, we should aim to go to bed around 22:00 - 23:00.
Do smoking, alcohol, and caffeine negatively impact the quality and amount of deep sleep we get in a night or just sleep in general?
Smoking does not seem to have any effect on the number of times that we wake up in the middle of the night. However, smoking has a very visible effect on the amount of deep sleep and sleep in general that we get on a given night. People who don’t smoke tend to have a higher deep sleep percentage than people who do smoke. Additionally, people who don’t smoke also tend to get more hours of sleep in a night than people who do smoke. In order to maximize sleep quality, it is better to not smoke.
Drinking alcohol seems to have a very big impact on the amount of deep sleep and sleep overall that we get in a night. People who don’t drink alcohol are the least likely to wake up in the middle of the night, and only drinking 1 oz of alcohol only increases your chances of waking up once in a night. However, people who drink 2 oz of alcohol or more are much more likely to wake up in the night and to wake up multiple times. People who drink 2 oz or more of alcohol also tend to have significantly less deep sleep in a given night, and they also spend significantly less time asleep. From this, we can conclude that if you want to drink alcohol and still have the best possible sleep, it is best to limit yourself to 1 oz of alcohol.
Caffeine has a very counterintuitive relationship to sleep based on this study. Drinking more caffeine makes you less likely to wake up during the middle of the night, and if you do wake up, it is very unlikely that you will wake up more than twice. Drinking less caffeine, on the other hand, makes it slightly more likely that you will wake up in the middle of the night. People who drink more caffeine throughout the day also typically have more deep sleep in a night. Specifically, drinking 75 mg of caffeine seems to yield the highest and most consistent deep sleep percentage. Finally, people who drink more caffeine throughout the day tend to sleep more than people who drink less caffeine. Surprisingly enough, to increase sleep quality we should drink more caffeine throughout the day.
Do age or gender have an impact on the amount and quality of sleep?
Age does seem to have some impact on the amount and quality of our sleep. People in their 30s tend to have the fewest awakenings throughout the night, the highest and most consistent deep sleep percentage, and the most sleep overall. However, people younger than 30 tend to wake up more often, get less deep sleep, and get less sleep overall. Additionally, people over 30 also tend to follow the same trend, waking up more often, getting less deep sleep, and getting less sleep overall. The younger or older you are, the lower your sleep quality. On the other hand, those in their 30’s typically experience higher quality sleep.
Gender also seems to have an impact on sleep quality. Overall, males tend to wake up slightly more often than females do. Males, however, typically get more deep sleep in a given night than females do. Between the two genders, there is almost no difference in the amount of sleep that people get. Overall, it’s a bit of a wash as to which gender has better sleep quality. Males are more likely to wake up in the night, but they are also more likely to get more deep sleep in a night than females.
One major limitation of this study is that sleep is influenced by a lot of different and complex components that can intertwine and interact with each other. Attributing sleep quality to the percentage of deep sleep and the number of awakenings is a very large simplification that excludes many other potential factors.
Another limitation of the study is that many of the variables, such as Caffeine Consumption and Alcohol Consumption, have a very narrow range of given values. As a result, in this study I was forced to examine these variables as categorical variables, whereas with more data I would have treated them as quantitative variables. This limited the analysis that I could perform in this study, and it can potentially hide relationships between variables. Additionally, Caffeine Consumption and Alcohol Consumption record the total amount of caffeine and alcohol consumed in the 24 hours before bedtime. This is a very large time span, and there’s a good chance that the alcohol and caffeine has already made its way through the body’s system by bedtime. Caffeine and alcohol could have very important effects on our quality of sleep if they are ingested closer to bedtime, and we wouldn’t be able to see it using this data.
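To illustrate what treating such a variable as quantitative could look like, the sketch below computes a Spearman rank correlation between alcohol consumption and deep sleep percentage. This is an assumption about one possible approach, not an analysis performed in this study, and the values are hypothetical toy data.

```r
# Hypothetical toy data, not taken from the study's data set.
alcohol_oz <- c(0, 0, 1, 1, 2, 3, 4, 5)
deep_pct   <- c(60, 58, 61, 59, 50, 45, 40, 35)

# Spearman's rho uses ranks, so it only assumes a monotonic relationship
# and is less sensitive to outliers than Pearson's correlation.
rho <- cor(alcohol_oz, deep_pct, method = "spearman")
rho
```

A value of `rho` near -1 would indicate that deep sleep percentage falls steadily as alcohol consumption rises. With only a handful of distinct values, as in the real data set, many ties would make such an estimate coarse.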
For future studies, I would include more variables that could potentially affect sleep, such as diet or screen time before bed in order to gain a more accurate sense of sleep quality factors. Additionally, the data that I would collect would be more in depth and have a greater breadth of values, so that in the future I can more easily treat these variables as quantitative data.
---
title: "Catching Some Z's"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: journal
      primary: "#042975"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
```
<style>
.chart-title { /* chart_title */
  font-size: 22px; /* old 20 */
}
body { /* Normal */
  font-size: 18px; /* old 16 */
}
.section.sidebar {
  background-color: rgba(255, 255, 255, 1);
}
</style>
<head>
<base target="_blank">
</head>
Introduction
===
Column {data-width=450}
-----------------------------------------------------------------------
### Abstract
<font size = 4>**Catching Some Z's: An Analysis of Factors of a Good Night's Sleep**</font>
This study sets out to determine whether exercise frequency, bedtime, smoking status, alcohol consumption, caffeine consumption, age, or gender can affect our sleep quality, and to find different ways we can change our lifestyle in order to increase our sleep quality. From examining the data, we can find some very strong evidence that exercising more, going to bed around 22:00 – 23:00, not smoking, drinking at most 1 oz of alcohol in a day, drinking more caffeine in a day, and being 30 – 40 years old can all greatly increase sleep quality by maximizing how much of our sleep is deep sleep and by minimizing how many times we wake up during the middle of the night.
### Research Questions
- Do people who exercise generally have higher quality sleep?
- Is there any correlation between bedtime and the amount and quality of deep sleep someone gets?
- Do smoking, alcohol, and caffeine negatively impact the quality and amount of deep sleep we get in a night or just sleep in general?
- Do age or gender have an impact on the amount and quality of sleep?
Column {.tabset data-width=550}
-----------------------------------------------------------------------
### Background and Significance
**Significance**
Sleep is something that impacts almost all aspects of our daily lives. It gives us energy throughout the day, and sleep also allows us to recharge mentally and physically. However, it often seems like we can't get enough of it. This study will help to give us some answers as to what we can do to get more and better quality sleep. Sleep is a very complex process with multiple moving parts, and this study will help to shine a light on what factors affect our sleep, and how we can improve our sleep.
**Background**
Sleep is primarily composed of 3 broad stages: light sleep, deep sleep, and REM sleep. As you sleep, you cycle through these stages several times, with light sleep and REM stages generally growing longer and deep sleep stages growing shorter as the night goes on. According to sleepfoundation.org, the functions of each sleep stage are as follows:
- Light sleep serves as the transition from wakefulness and REM sleep to deep sleep.
- Deep sleep is responsible for repairing and restorative processes. During this phase, the body repairs itself, and the mind has a chance to rest and consolidate memories from the day.
- REM sleep serves as the transition from deep sleep to light sleep and wakefulness. Dreams occur during this phase of sleep.
From these descriptions, it is easy to see that deep sleep is one of the most important stages of sleep, since it is directly responsible for how effective our sleep is and for how rested we actually feel in the morning. As a result, I will focus on factors that specifically impact how much deep sleep we typically get in a night.
**Data Source**
The data used for this study was collected by students from ENSIAS National School for Computer Science in Morocco. Primarily, it was collected through a collection of self-reported surveys, actigraphy (monitoring of sleep and activity cycles), and polysomnography (recording of vitals and brain activity during sleep).
### Methods
For this study, I primarily decided to use deep sleep percentage and the number of awakenings in a given night as measures of sleep quality, and I also examine how some of these factors affect the total amount of sleep in a given night. The higher the deep sleep percentage and the fewer the awakenings, the higher the quality of the sleep. Specifically, I examine relationships between variables that I believe could have some tangible effect on our quality of sleep, such as:
- Exercise Frequency
- Bedtime
- Smoking Status
- Alcohol Consumption
- Caffeine Consumption
- Age
- Gender
Exercise Frequency, Bedtime, Alcohol Consumption, Caffeine Consumption, Awakenings, and Age are all classified as numerical variables initially. However, upon working with the data set, I found that it would be more appropriate to reclassify these variables as factors, since there are relatively few unique values between observations. I also found that the results from the analysis of age and bedtime in relation to deep sleep percentage and awakenings were easier to work with and interpret when put into age and time groups.
Throughout the analysis, I use a combination of box plots separated by group to see relationships between each categorical variable and awakening and to see differences between each group of the categorical variables. In addition, I also use box plots and violin plots separated by group to see the impact that each variable has on the percentage of deep sleep that people get in a given night. Additionally, throughout the data set there are some observations that are missing some of the values I am interested in. However, they make up a very small proportion of the data. As a result, I decided to replace all missing values with the mode of the relevant category. Additionally, there was only one observation in Caffeine Consumption of 100 mg, so I decided to remove it for clarity.
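The missing-value handling described above can be sketched in base R. This is a minimal illustration on a made-up data frame (`toy` and `stat_mode` are hypothetical names, not part of the dashboard code): it reports the share of missing values per column and then fills each gap with that column's mode.

```r
# Hypothetical toy data standing in for two of the survey columns.
toy <- data.frame(
  Awakenings         = c(0, 1, 1, NA, 2, 1, 0, NA),
  Exercise.frequency = c(3, 3, 0, 1, 3, NA, 0, 3)
)

# Share of missing values in each column.
missing_share <- colMeans(is.na(toy))

# Most frequent non-missing value. table() ignores NA by default and
# sorts its values, so ties go to the smallest value.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Replace each column's missing entries with that column's mode.
imputed <- as.data.frame(lapply(toy, function(col) {
  col[is.na(col)] <- stat_mode(col)
  col
}))

missing_share
imputed
```

Checking `missing_share` first is one way to confirm that the missing values really are a small proportion of the data before imputing them.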
Data at a Glance
===
```{r dataSetup}
pacman::p_load(DT, knitr, tidyverse, plotly, conflicted)
conflict_prefer("select", "dplyr")
conflict_prefer("filter", "dplyr")
sleep <- read_csv("Sleep_Efficiency.csv", col_types = "ddfTTdddddfffff")
colnames(sleep) <- make.names(colnames(sleep))
sleep$Bedtime <- format(sleep$Bedtime, "%Y-%m-%d %H:%M:%S")
sleep$Wakeup.time <- format(sleep$Wakeup.time, "%Y-%m-%d %H:%M:%S")
sleep <- sleep %>%
mutate(Hours.asleep = Sleep.duration * Sleep.efficiency)
# Return the most frequent non-missing value of column x (given as a string).
# Named get_mode() so that it does not mask base::mode().
get_mode <- function(df, x) {
  x_sym <- sym(x) # Converts the string x to a symbol.
  df %>%
    filter(!is.na(!!x_sym)) %>% # Ignores missing values when counting.
    count(!!x_sym) %>% # The unquote operator !! makes x_sym refer to a column.
    arrange(desc(n)) %>%
    slice(1) %>% # Keeps the row for the most frequent value.
    pull(!!x_sym)
}
# Replace missing values in Awakenings with the mode
sleep$Awakenings[is.na(sleep$Awakenings)] <- get_mode(sleep, "Awakenings")
# Replace missing values in Caffeine Consumption with the mode
sleep$Caffeine.consumption[is.na(sleep$Caffeine.consumption)] <- get_mode(sleep, "Caffeine.consumption")
# Replace missing values in Alcohol Consumption with the mode
sleep$Alcohol.consumption[is.na(sleep$Alcohol.consumption)] <- get_mode(sleep, "Alcohol.consumption")
# Replace missing values in Exercise Frequency with the mode
sleep$Exercise.frequency[is.na(sleep$Exercise.frequency)] <- get_mode(sleep, "Exercise.frequency")
### Formatting for pop up text boxes
font <- list(
family = "Arial",
size = 15,
color = "white"
)
label <- list(
bgcolor = "#707372",
bordercolor = "transparent",
font = font
)
```
Column {data-width=650}
-----------------------------------------------------------------------
### Data
```{r show_table}
datatable(sleep, rownames = FALSE, colnames = (c("ID", "Age", "Gender", "Bedtime", "Wakeup Time", "Sleep Duration", "Sleep Efficiency", "REM Sleep Percentage", "Deep Sleep Percentage", "Light Sleep Percentage", "Awakenings", "Caffeine Consumption (mg)", "Alcohol Consumption (oz)", "Smoking Status", "Exercise Frequency", "Hours Asleep")),
options = list(columnDefs = list(list(className = 'dt-center', targets = 1:5)), pageLength = 20))
```
Column {data-width=350}
-----------------------------------------------------------------------
### Variable Explanations
**Variables**
- ID = A unique identifier for each test subject
- Age = Age of subject
- Gender = Male / female
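- Bedtime = Date and time the subject went to bed (Year-Month-Day Hour:Minute:Second)
- Wakeup time = Date and time the subject woke up (Year-Month-Day Hour:Minute:Second)
- Sleep duration = Amount of time between bedtime and wakeup time (hours)
- Sleep efficiency = Proportion of time in bed that was actually spent asleep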
- REM sleep percentage = Percentage of time spent in REM sleep
- Deep Sleep Percentage = Percentage of time spent in deep sleep
- Light Sleep Percentage = Percentage of time spent in light sleep
- Awakenings = # of times subject woke up during the night
- Caffeine Consumption = Amount of caffeine consumed in the 24 hours before bedtime (mg)
- Alcohol Consumption = Amount of alcohol consumed in the 24 hours before bedtime (oz)
- Smoking status = Yes/No
- Exercise Frequency = # of times the subject exercises per week
- Hours Asleep = Amount of time actually spent asleep
    - Calculated from Sleep duration * Sleep efficiency
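
As a quick worked example of the Hours Asleep calculation, here is a sketch using made-up values for a single subject (the numbers are hypothetical and not taken from the data set):

```r
# Hypothetical values for one subject, not taken from the data set.
sleep_duration   <- 8.0   # hours between bedtime and wakeup time
sleep_efficiency <- 0.85  # proportion of time in bed spent asleep

# Hours actually spent asleep.
hours_asleep <- sleep_duration * sleep_efficiency
hours_asleep
```

With these inputs, a subject who spends 8 hours in bed at 85% efficiency is credited with 6.8 hours of actual sleep.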
Exercise
===
Column {.tabset data-width=550}
-----------------------------------------------------------------------
### Awakenings
```{r ExerciseAwakenings}
sleep$Exercise.frequency <- factor(sleep$Exercise.frequency, levels = c("0.0","1.0","2.0","3.0","4.0","5.0"))
sleep$Awakenings <- factor(sleep$Awakenings, levels = c("0.0","1.0","2.0","3.0","4.0","5.0"))
Awakenings_percent <- sleep %>%
count(Exercise.frequency, Awakenings) %>%
group_by(Exercise.frequency) %>%
mutate(freq = round(n / sum(n), 4)) %>%
ungroup()
ggplot(Awakenings_percent, aes(x = Exercise.frequency, y = freq, fill = Awakenings)) +
geom_bar(aes(text = paste0(
"Exercise Frequency: ", Exercise.frequency, "\n",
"Awakenings: ", Awakenings, "\n",
"Percent: ", paste0(freq*100, "%"))),
stat = "identity", position = "stack") +
scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.2)) +
xlab("Exercise Frequency") +
ylab("Percentage of Awakenings") +
ggtitle("Distribution of Awakenings by Exercise Frequency") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) +
scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) -> p
ggplotly(p, tooltip = "text") %>%
style(hoverlabel = label)
```
### Deep Sleep Percentage
```{r ExerciseSleep}
ggplot(sleep, aes(x = Exercise.frequency, y = Deep.sleep.percentage)) +
geom_boxplot(fill = "turquoise") +
xlab("Exercise Frequency") +
ylab("Deep Sleep Percentage") +
ggtitle("Distribution of Deep Sleep Percentage by Exercise Frequency") +
theme(plot.title = element_text(hjust = 0.5, size = 18), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
Column {data-width=450}
----------------------
### Analysis
**Analysis of Awakenings and Exercise Frequency**
From the percentage bar chart of awakenings by exercise frequency, a few things immediately jump out. In this data set, people who exercise 5 times a week are much more likely than other groups to have 0 awakenings through the night, and when they do wake up, it is only once. People who exercise 4 times a week are also much more likely than people who exercise less to wake up only 0-1 times during a given night. Additionally, as people exercise more frequently, they are much less likely to wake up 4 times during a given night. As a broad, overall trend, the more people exercise, the less likely they are to wake up multiple times during the night. Based on my earlier criteria for high quality sleep, it is fair to say that exercising more frequently is associated with higher sleep quality.
**Analysis of Deep Sleep Percentage and Exercise Frequency**
From the box plots of deep sleep percentage by exercise frequency, we can see a few trends. First, the median deep sleep percentage of all exercise frequency groups hovers between 56% and 60%. The distribution of deep sleep percentage is heavily left skewed for people who exercise 0-1 times per week, and it is slightly right skewed for people who exercise 2-5 times per week. People who exercise 1 time a week have the highest spread, with people who don't exercise at all following closely behind. People who exercise 5 times a week have the tightest spread, and people who exercise 4 times a week are the only group that combines the highest median with relatively low spread and few outliers. From this, we can see that people who exercise 2 or more times a week typically have more consistent, higher quality sleep than those who exercise 0-1 times a week.
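
The percentages in the stacked bar chart are conditional proportions: within each exercise group, the share of subjects with each awakening count. A minimal base R sketch of that calculation follows, using made-up observations (`toy` is a hypothetical data frame, not the study data):

```r
# Made-up observations (not the study data): one row per subject.
toy <- data.frame(
  exercise   = c(0, 0, 0, 0, 5, 5, 5, 5),
  awakenings = c(0, 2, 4, 4, 0, 0, 1, 0)
)

# Cross-tabulate, then divide each row by its total so that every
# exercise group sums to 1 (margin = 1 means "by row").
counts <- table(toy$exercise, toy$awakenings)
cond   <- prop.table(counts, margin = 1)
round(cond, 2)
```

Because each row sums to 1, groups of different sizes can be compared directly, which is why the charts show percentages instead of raw counts.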
Bedtime
===
Column {.tabset data-width=550}
-------------------------------
### Awakenings
```{r BedtimeAwakenings}
# Process the 'Bedtime' column
sleep$Hour <- hour(sleep$Bedtime)
# Function to categorize bedtime hours into one-hour bins such as "22-23:00".
# Hours outside 21:00-02:59 return NA, since they have no bin in the factor levels below.
categorize_hour <- function(hour) {
  if (hour >= 21 || hour <= 2) {
    paste0(hour, "-", hour + 1, ":00")
  } else {
    NA_character_
  }
}
sleep$Time_Category <- sapply(sleep$Hour, categorize_hour)
sleep$Time_Category <- factor(sleep$Time_Category, levels = c("21-22:00", "22-23:00", "23-24:00","0-1:00", "1-2:00", "2-3:00"))
Awakenings_percent <- sleep %>%
count(Time_Category, Awakenings) %>%
group_by(Time_Category) %>%
mutate(freq = round(n / sum(n), 4)) %>%
ungroup()
ggplot(Awakenings_percent, aes(x = Time_Category, y = freq, fill = Awakenings)) +
geom_bar(aes(text = paste0(
"Time Category: ", Time_Category, "\n",
"Awakenings: ", Awakenings, "\n",
"Percent: ", paste0(freq*100, "%"))),
stat = "identity", position = "stack") +
scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.2)) +
xlab("Bedtime (Military Time)") +
ylab("Percentage of Awakenings") +
ggtitle("Distribution of Awakenings by Bedtime") +
theme(axis.text.x = element_text(angle=20, hjust=1), plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) +
scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) -> p
ggplotly(p, tooltip = "text") %>%
style(hoverlabel = label)
```
### Deep Sleep Percentage
```{r BedtimeSleep}
p <- plot_ly(data = sleep,
x = ~Time_Category,
y = ~Deep.sleep.percentage,
type = 'violin',
fillcolor = 'turquoise',
box = list(visible = TRUE, width = 0.1,
line = list(color = 'black'))) %>%
plotly::layout(
title = "Distribution of Deep Sleep Percentage by Bedtime",
xaxis = list(title = "Bedtime (Military Time)", titlefont = list(size = 22), tickfont = list(size = 18)),
yaxis = list(title = "Deep Sleep Percentage", titlefont = list(size = 22), tickfont = list(size = 18)),
violinmode = "overlay"
)
p %>%
style(hoverlabel = label)
```
Column {data-width=450}
----------------------
### Analysis
**Analysis of Awakenings and Bedtime**
After looking at the conditional distribution of awakenings by bedtime, we can see a few trends. First, people who go to bed from 1-2:00 are much more likely than other groups to wake up 4 times in a given night. Additionally, people who go to bed from 22-23:00 are the most likely to have 0-1 awakenings throughout the night. We can also see an important overall trend: starting from 22-23:00, the later someone goes to bed, the more likely they are to wake up more than once during the night. People who go to bed too early, such as from 21-22:00, are also much more likely to wake up more than once in a night. From this graph, we can see that the best time to go to bed to minimize awakenings is from 22-23:00.
**Analysis of Deep Sleep Percentage and Bedtime**
From the violin plots that show the distribution of deep sleep percentage by bedtime, we can see a few different things. First, the median of all groups hovers around 58%. However, the groups vary from each other in spread and shape. People who go to sleep from 21-23:00 have the smallest spread. By looking at the probability density portion of the violin plot, we can also see that people who go to sleep from 21-22:00 most often get about 60% deep sleep, and that people who go to bed from 22-23:00 most often get about 58% deep sleep in a given night. In contrast, the distributions for people who go to bed from 23-24:00 and from 1-3:00 have a much wider spread and are extremely left skewed. The probability density portion of the plot shows this as well, since the density peaks for these groups are much shallower than those of the previously mentioned groups. The amount of deep sleep that someone in these groups gets is much more varied and, as a result, less consistent.
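
The spread and skew read off the violin plots can also be summarised numerically. The sketch below uses made-up deep sleep percentages for two bedtime groups (hypothetical values, not the study data) and compares the median, interquartile range, and mean of each group. A mean well below the median is a rough sign of left skew.

```r
# Made-up deep sleep percentages for two bedtime groups (not the study data).
toy <- data.frame(
  bedtime    = rep(c("22-23:00", "1-2:00"), each = 6),
  deep_sleep = c(56, 57, 58, 58, 59, 60,   # tight, roughly symmetric
                 20, 25, 55, 58, 60, 62)   # wide, with a long lower tail
)

# Median, interquartile range, and mean for each group.
med <- tapply(toy$deep_sleep, toy$bedtime, median)
iqr <- tapply(toy$deep_sleep, toy$bedtime, IQR)
avg <- tapply(toy$deep_sleep, toy$bedtime, mean)

cbind(median = med, iqr = iqr, mean = avg)
```

In this toy example the two medians are close, but the later bedtime group has a much larger interquartile range and a mean far below its median, which is the same pattern the violin plots are described as showing.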
Smoking
===
Column {.tabset data-width=550}
-------------------------------
### Awakenings
```{r smokingAwakenings}
Awakenings_percent <- sleep %>%
count(Smoking.status, Awakenings) %>%
group_by(Smoking.status) %>%
mutate(freq = round(n / sum(n), 4)) %>%
ungroup()
ggplot(Awakenings_percent, aes(x = Smoking.status, y = freq, fill = Awakenings)) +
geom_bar(aes(text = paste0(
"Smoking Status: ", Smoking.status, "\n",
"Awakenings: ", Awakenings, "\n",
"Percent: ", paste0(freq*100, "%"))),
stat = "identity", position = "stack") +
scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.2)) +
xlab("Smoking Status") +
ylab("Percentage of Awakenings") +
ggtitle("Distribution of Awakenings by Smoking Status") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) +
scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) -> p
ggplotly(p, tooltip = "text") %>%
style(hoverlabel = label)
```
### Deep Sleep Percentage
```{r smokingDeepSleep}
ggplot(sleep, aes(x = Smoking.status, y = Deep.sleep.percentage)) +
geom_boxplot(fill = "turquoise") +
xlab("Smoking Status") +
ylab("Deep Sleep Percentage") +
ggtitle("Distribution of Deep Sleep Percentage by Smoking Status") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
### Hours Asleep
```{r smokingAsleep}
ggplot(sleep, aes(x = Smoking.status, y = Hours.asleep)) +
geom_boxplot(fill = "turquoise") +
xlab("Smoking Status") +
ylab("Hours Asleep") +
ggtitle("Distribution of Hours Asleep by Smoking Status") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
Column {data-width=450}
----------------------
### Analysis
**Analysis of Awakenings and Smoking Status**
When looking at the conditional distribution of awakenings by smoking status, one thing becomes very clear: smokers and nonsmokers share very similar distributions of awakenings. Nonsmokers are marginally more likely to have 0 or 4 awakenings than smokers, and smokers are more likely to wake up 1-3 times in a given night than nonsmokers. However, the differences between the smoker and nonsmoker distributions are so marginal that they do not appear to be meaningful. As a result, we can say that smoking doesn't seem to have a strong influence on how often someone wakes up.
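
The comparison above is made by eye from the bar chart. If a formal check were wanted, one option would be a chi-squared test of independence. This is an assumption about how such a check could be done, not something the analysis above reports, and the counts below are hypothetical:

```r
# Hypothetical counts of subjects by smoking status and awakenings
# (illustrative only; these are not the study's numbers).
counts <- matrix(
  c(40, 50, 30, 20, 25,    # nonsmokers with 0, 1, 2, 3, 4 awakenings
    20, 27, 17, 11, 10),   # smokers with 0, 1, 2, 3, 4 awakenings
  nrow = 2, byrow = TRUE,
  dimnames = list(Smoking = c("No", "Yes"), Awakenings = as.character(0:4))
)

# Chi-squared test of independence between smoking status and awakenings.
result <- chisq.test(counts)
result$p.value
```

A large p-value would be consistent with the two groups having the same distribution of awakenings, while a small one would suggest that the distributions differ.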
**Analysis of Deep Sleep Percentage and Smoking Status**
There are some key differences between the distributions of deep sleep percentage for smokers and nonsmokers in the box plots. People who do not smoke have a slightly higher median deep sleep percentage than smokers. Additionally, the nonsmoking group has the highest overall observation of deep sleep percentage. The distribution for smokers is heavily left skewed with very high spread, while the distribution for nonsmokers is slightly right skewed with a much tighter spread. From this, we can conclude that, while the median percentage of deep sleep is comparable between the two categories, nonsmokers typically and more consistently have a higher deep sleep percentage than smokers do. As a result, we can say that there seems to be a relationship between smoking and poor sleep quality.
**Analysis of Total Hours Asleep and Smoking Status**
Once again, there are some key differences between the distributions of hours asleep for smokers and nonsmokers. While both distributions are slightly left skewed, it is important to note that nonsmokers have a higher median amount of sleep in a night than smokers, by about 1 hour. Additionally, the distribution of hours asleep for smokers has a much higher spread than that of nonsmokers. As a result, it is fair to say that nonsmokers typically get more sleep in a given night than smokers, and the amount of sleep that nonsmokers get is much more consistent.
Alcohol
===
Column {.tabset data-width=550}
-------------------------------
### Awakenings
```{r AlcoholAwakenings}
sleep$Alcohol.consumption <- factor(sleep$Alcohol.consumption, levels = c("0.0", "1.0", "2.0", "3.0", "4.0", "5.0"))
Awakenings_percent <- sleep %>%
count(Alcohol.consumption, Awakenings) %>%
group_by(Alcohol.consumption) %>%
mutate(freq = round(n / sum(n), 4)) %>%
ungroup()
ggplot(Awakenings_percent, aes(x = Alcohol.consumption, y = freq, fill = Awakenings)) +
geom_bar(aes(text = paste0(
"Alcohol Consumption: ", Alcohol.consumption, "\n",
"Awakenings: ", Awakenings, "\n",
"Percent: ", paste0(freq*100, "%"))),
stat = "identity", position = "stack") +
scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.2)) +
xlab("Alcohol Consumption (oz)") +
ylab("Percentage of Awakenings") +
ggtitle("Distribution of Awakenings by Alcohol Consumption") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) +
scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) -> p
ggplotly(p, tooltip = "text") %>%
style(hoverlabel = label)
```
### Deep Sleep Percentage
```{r AlcoholDeepSleep}
sleep$Alcohol.consumption <- factor(sleep$Alcohol.consumption, levels = c("0.0", "1.0", "2.0", "3.0", "4.0", "5.0"))
ggplot(sleep, aes(x = Alcohol.consumption, y = Deep.sleep.percentage)) +
geom_boxplot(fill = "turquoise") +
xlab("Alcohol Consumption (oz)") +
ylab("Deep Sleep Percentage") +
ggtitle("Distribution of Deep Sleep Percentage by Alcohol Consumption") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
### Hours Asleep
```{r AlcoholAsleep}
ggplot(sleep, aes(x = Alcohol.consumption, y = Hours.asleep)) +
geom_boxplot(fill = "turquoise") +
xlab("Alcohol Consumption (oz)") +
ylab("Hours Asleep") +
ggtitle("Distribution of Hours Asleep by Alcohol Consumption") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
Column {data-width=450}
----------------------
### Analysis
**Analysis of Awakenings and Alcohol Consumption**
When looking at the conditional distribution of awakenings by alcohol consumption, we can notice a few general trends. First, people who have 0 oz of alcohol in the 24 hours before bedtime are much more likely to have 0-1 awakenings in a night than to wake up 2 or more times. People who have only 1 oz follow a similar trend, but they are more likely to wake up 1 time than people who have no alcohol. From there, people who drink 2 oz or more are much more likely to wake up 2 or more times, and people who drink 4-5 oz of alcohol are much more likely than other groups to wake up 4 times in a given night. From this, we can see a general trend: the more alcohol someone drinks, the more likely they are to wake up in the night, and the more likely they are to wake up multiple times.
**Analysis of Deep Sleep Percentage and Alcohol Consumption**
After examining the box plots that show the distribution of deep sleep percentage by alcohol consumption, we can see a few crucial things. People who drink no alcohol have the second highest median percentage of deep sleep, and the distribution for this group has the smallest spread. People who drink 1 oz of alcohol have the highest median of all groups and the second smallest spread, although their distribution is heavily left skewed. For people who drink 2-5 oz of alcohol in a day, the median quickly drops off from that of people who drink only 0-1 oz. The distributions for people who drink 2 or more oz also have a very large spread. From this, we can see that an individual's deep sleep percentage is greatly and negatively affected by drinking more than 1 oz of alcohol. Not only do these people tend to get less deep sleep in a given night, but the amount of deep sleep they get is also very inconsistent when compared to those who drink 0-1 oz of alcohol.
**Analysis of Total Hours Asleep and Alcohol Consumption**
From the distribution of hours asleep by alcohol consumption, we can notice a few important things. First, people who drink 0 oz of alcohol in the 24 hours before bedtime have the highest median hours asleep among all groups, and their distribution is slightly left skewed. People who drink only 1 oz of alcohol have the second highest median hours asleep, following closely behind people who drink no alcohol. From 2 oz of alcohol and beyond, the median hours asleep noticeably drops off from the median of about 6.3 hours for people who drink 0-1 oz of alcohol, with the median for most other groups hovering around 5 hours. The spread for all groups is about the same, although the groups who drink 2 or more oz of alcohol have a slightly larger spread than those who drink 0-1 oz. From this, we can conclude that people who drink 0-1 oz of alcohol typically get more hours of sleep in a given night than those who drink 2 or more oz of alcohol in a day.
Caffeine
===
Column {.tabset data-width=550}
-------------------------------
### Awakenings
```{r CaffeineAwakenings}
# Only one observation has 100 mg of caffeine, so it is removed for clarity.
sleep <- sleep %>%
  filter(Caffeine.consumption != "100.0")
sleep$Caffeine.consumption <- factor(sleep$Caffeine.consumption, levels = c("0.0", "25.0", "50.0", "75.0", "200.0"))
Awakenings_percent <- sleep %>%
count(Caffeine.consumption, Awakenings) %>%
group_by(Caffeine.consumption) %>%
mutate(freq = round(n / sum(n), 4)) %>%
ungroup()
ggplot(Awakenings_percent, aes(x = Caffeine.consumption, y = freq, fill = Awakenings)) +
geom_bar(aes(text = paste0(
"Caffeine Consumption: ", Caffeine.consumption, "\n",
"Awakenings: ", Awakenings, "\n",
"Percent: ", paste0(freq*100, "%"))),
stat = "identity", position = "stack") +
scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.2)) +
xlab("Caffeine Consumption (mg)") +
ylab("Percentage of Awakenings") +
ggtitle("Distribution of Awakenings by Caffeine Consumption") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) +
scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) -> p
ggplotly(p, tooltip = "text") %>%
style(hoverlabel = label)
```
### Deep Sleep Percentage
```{r CaffeineDeepSleep}
ggplot(sleep, aes(x = Caffeine.consumption, y = Deep.sleep.percentage)) +
geom_boxplot(fill = "turquoise") +
xlab("Caffeine Consumption (mg)") +
ylab("Deep Sleep Percentage") +
ggtitle("Distribution of Deep Sleep Percentage by Caffeine Consumption (mg)") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
### Hours Asleep
```{r CaffeineAsleep}
ggplot(sleep, aes(x = Caffeine.consumption, y = Hours.asleep)) +
geom_boxplot(fill = "turquoise") +
xlab("Caffeine Consumption (mg)") +
ylab("Hours Asleep") +
ggtitle("Distribution of Hours Asleep by Caffeine Consumption (mg)") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
Column {data-width=450}
----------------------
### Analysis
**Analysis of Awakenings and Caffeine Consumption**
When looking at the conditional distribution of awakenings by caffeine consumption, there are a few important things to note. Surprisingly, people who drink no caffeine in a day are more likely than not to wake up multiple times during the night, and among all groups they are the most likely to wake up at all. People who drink 200 mg of caffeine in a day are much more likely to have 0 awakenings during the night than other groups, and they are also the least likely to wake up 3-4 times during a given night. People who drink 75 mg of caffeine are the most likely to wake up 1 time during a night and the least likely to wake up more than once. Among the rest of the groups, however, the distribution of awakenings is relatively similar. From this plot, we can see that people who drink a lot of caffeine are more likely to wake up 0-1 times in a given night than people who drink little or no caffeine. Additionally, the less caffeine someone drinks, the more likely they are to wake up 4 times in a night.
**Analysis of Deep Sleep Percentage and Caffeine Consumption**
When looking at the box plots, we can see a few things about how caffeine consumption relates to deep sleep percentage. People who drink 75 milligrams have the highest median deep sleep percentage and the tightest spread. People who have 200 milligrams of caffeine have the second highest median deep sleep percentage, with a moderate spread and no outliers. People who drink no caffeine or 50 milligrams of caffeine have the lowest median deep sleep percentage, and their distributions are heavily left skewed with very high spread. Overall, there doesn't seem to be much of a trend, since the box plots fluctuate between groups. Even so, 75 milligrams of caffeine seems to be the optimal amount to drink to maximize deep sleep.
**Analysis of Total Hours Asleep and Caffeine Consumption**
When looking at the distribution of hours asleep by caffeine consumption, a few things immediately stand out. First, people who drink 200 mg of caffeine in a day have the highest median time asleep, at 6.6 hours. The distribution of hours asleep for 200 mg of caffeine also has the smallest spread. The remaining caffeine consumption groups, however, have very comparable distributions: their median hours asleep hover around 6 hours, and they all have very similar spread with no outliers. From this, we can say that drinking less than 200 mg of caffeine doesn't seem to have an effect on time asleep. However, people who drink 200 mg of caffeine seem to more consistently get more sleep in a night than others.
Biological Factors
===
Column {.tabset data-width=600}
----------------------------------
### Age Awakenings
```{r AgeAwakenings}
sleep <- sleep %>%
mutate(Age.group = case_when(
Age < 20 ~ "20-",
Age >= 20 & Age <30 ~ "20-30",
Age >= 30 & Age < 40 ~ "30-40",
Age >= 40 & Age < 50 ~ "40-50",
Age >= 50 & Age < 60 ~ "50-60",
Age >= 60 & Age < 70 ~ "60-70"
))
Awakenings_percent <- sleep %>%
count(Age.group, Awakenings) %>%
group_by(Age.group) %>%
mutate(freq = round(n / sum(n), 4)) %>%
ungroup()
ggplot(Awakenings_percent, aes(x = Age.group, y = freq, fill = Awakenings)) +
geom_bar(aes(text = paste0(
"Age Group: ", Age.group, "\n",
"Awakenings: ", Awakenings, "\n",
"Percent: ", paste0(freq*100, "%"))),
stat = "identity", position = "stack") +
scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.2)) +
xlab("Age Group") +
ylab("Percentage of Awakenings") +
ggtitle("Distribution of Awakenings by Age Group") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) +
scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) -> p
ggplotly(p, tooltip = "text") %>%
style(hoverlabel = label)
```
### Age Deep Sleep
```{r AgeDeepSleep}
ggplot(sleep, aes(x = Age.group, y = Deep.sleep.percentage)) +
geom_boxplot(fill = "turquoise") +
xlab("Age Group") +
ylab("Deep Sleep Percentage") +
ggtitle("Distribution of Deep Sleep Percentage by Age Group") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
### Age Hours Asleep
```{r AgeAsleep}
ggplot(sleep, aes(x = Age.group, y = Hours.asleep)) +
geom_boxplot(fill = "turquoise") +
xlab("Age Group") +
ylab("Hours Asleep") +
ggtitle("Distribution of Hours Asleep by Age Group") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
### Gender Awakenings
```{r GenderAwakenings}
Awakenings_percent <- sleep %>%
count(Gender, Awakenings) %>%
group_by(Gender) %>%
mutate(freq = round(n / sum(n), 4)) %>%
ungroup()
ggplot(Awakenings_percent, aes(x = Gender, y = freq, fill = Awakenings)) +
geom_bar(aes(text = paste0(
"Gender: ", Gender, "\n",
"Awakenings: ", Awakenings, "\n",
"Percent: ", paste0(freq*100, "%"))),
stat = "identity", position = "stack") +
scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.2)) +
xlab("Gender") +
ylab("Percentage of Awakenings") +
ggtitle("Distribution of Awakenings by Gender") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) +
scale_fill_manual(name = "Awakenings", values = c("salmon", "#fce24e", "#02de7f", "#34ebdb", "#b673f0")) -> p
ggplotly(p, tooltip = "text") %>%
style(hoverlabel = label)
```
### Gender Deep Sleep
```{r GenderDeepSleep}
ggplot(sleep, aes(x = Gender, y = Deep.sleep.percentage)) +
geom_boxplot(fill = "turquoise") +
xlab("Gender") +
ylab("Deep Sleep Percentage") +
ggtitle("Distribution of Deep Sleep Percentage by Gender") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
### Gender Hours Asleep
```{r GenderAsleep}
ggplot(sleep, aes(x = Gender, y = Hours.asleep)) +
geom_boxplot(fill = "turquoise") +
xlab("Gender") +
ylab("Hours Asleep") +
ggtitle("Distribution of Hours Asleep by Gender") +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 16)) -> p
ggplotly(p) %>%
style(hoverlabel = label)
```
Column {.tabset data-width=400}
----------------------------------
### Age Analysis
**Analysis of Awakenings and Age**
After looking at the percentage bar chart of awakenings by age group, we can see a few things. First, people aged 30-50 are the most likely to have one awakening or fewer in a night. People under 20 and people aged 60-70 are the most likely to wake up more than once in a night, and people aged 50-60 are the most likely to wake up 4 times in one night. People under 20 are also the most likely to wake up during the night in general. From this, we can see a general trend between age and the number of awakenings in a given night. The younger or older someone is, the more likely they are to wake up at all, and the more likely they are to wake up multiple times. The closer someone is to middle age (30-50), the less likely they are to wake up during the night, and the less likely they are to wake up more than once in a night.
**Analysis of Deep Sleep Percentage and Age**
When examining the distribution of deep sleep percentage by age group, we see a few notable things. First, people under 20 have the smallest median percentage of deep sleep, at 35%, and a very wide spread. It is also important to note that this distribution is heavily right skewed. Among all other groups, the median deep sleep percentage hovers around 60%. People aged 20-30 and 60-70, however, have a very large spread and heavily left skewed distributions, so people in these age groups are typically inconsistent in the percentage of deep sleep that they actually get in a night. People aged 30-50 have the lowest spread overall, so people in this age range generally have a consistent deep sleep percentage. Once again, middle aged people tend to have the highest quality sleep, because they most consistently have the highest deep sleep percentage.
**Analysis of Total Hours Asleep and Age**
When looking at the box plots of the distribution of hours asleep by age, we can see a few important things. First, people under 20 typically get the least sleep, with a median value of 4.96 hours. People aged 30-40 tend to get the most sleep of all groups, with a median value of 6.44 hours. All other groups have very comparable median values, and all groups have very similar spread. From this, it is easy to see that people aged 30-40 typically get the most sleep, while people under 20 most consistently get the least sleep among all age groups.
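
The age groups used in these plots are ten-year bins that include their lower bound and exclude their upper bound. As an illustrative alternative to spelling out each condition, the same binning can be sketched with base R's `cut()`. The ages below are made up, and the labels simply mirror the ones used in the charts:

```r
# Made-up ages, not taken from the data set.
ages <- c(9, 19, 20, 29, 30, 45, 59, 60, 69)

# right = FALSE makes each bin closed on the left, e.g. [20, 30),
# matching conditions of the form "Age >= 20 & Age < 30".
age_group <- cut(
  ages,
  breaks = c(-Inf, 20, 30, 40, 50, 60, 70),
  labels = c("20-", "20-30", "30-40", "40-50", "50-60", "60-70"),
  right  = FALSE
)
table(age_group)
```

Note that with this binning a 20-year-old falls in the "20-30" group, so the "20-" label covers only subjects younger than 20.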
### Gender Analysis
**Analysis of Awakenings and Gender**
After looking at the conditional distribution of awakenings by gender, we can see a few notable things. First, females are more likely than males to have 0 awakenings in a given night, and they are also more likely to wake up exactly 1 time in a night. Males, on the other hand, are more likely to wake up more than 1 time in a given night. Overall, males are likely to wake up more often in a night than females are. The difference, however, is relatively small.
**Analysis of Deep Sleep Percentage and Gender**
When looking at the distribution of deep sleep percentage by gender, we see a slightly different story from awakenings and gender. Females have a slightly higher median deep sleep percentage of 59%, compared to the male distribution's median of 58%, but this is a very marginal difference. The more notable difference is in the shape of the distributions. The distribution for males has a noticeably smaller spread than the distribution for females. The distribution for females is also heavily left skewed, while the distribution for males is slightly right skewed. From this, we can see that males more consistently reach a high deep sleep percentage than females, despite the median percentage being about the same.
**Analysis of Total Hours Asleep and Gender**
When looking at the box plots of hours asleep by gender, we can note a few important things. First, the median time asleep for females is slightly higher than for males, at 6.12 hours compared to 6.02 hours. However, this difference is negligible. Additionally, the range from the first to the third quartile is nearly identical for both genders, although the distribution for males has a slightly smaller overall spread than the distribution for females. Overall, we can see that gender has very little impact on the amount of time someone sleeps.
Conclusions
===
Column { data-height=700}
-------------------------
### Conclusions
**Do people who exercise generally have higher quality sleep?**
People who exercise tend to have higher quality sleep than people who don't. People who exercise more are much more likely than people who exercise less to have 0 awakenings throughout the night. They are also more likely to wake up only once during the night, while people who don't regularly exercise are the most likely to wake up 4 times in a night. The more we exercise, the less likely we are to wake up during the middle of the night.
Additionally, people who exercise more are more likely to consistently have more deep sleep in a night than people who don't. From our previous definition of quality sleep, we can easily say that people who exercise have higher quality sleep because more of their sleep consists of deep sleep and they are also considerably less likely to wake up in the night than people who don't exercise.
**Is there any correlation between bedtime and the quality and amount of deep sleep someone gets?**
There is a relationship between someone's bedtime and the amount of deep sleep they get. While the median deep sleep percentage tends to stay the same across different bedtime groups, the spread of the distribution grows the later the bedtime is. The spread of deep sleep percentage also grows if the bedtime is too early. This leaves us with a Goldilocks situation, where some bedtimes are too early and others are too late. From looking at the violin plots, we can see that the best bedtime for deep sleep is from 22:00 - 23:00, since people who go to bed at this time most consistently see high amounts of deep sleep. Additionally, when looking at awakenings, people who go to bed during this time are the least likely to wake up in the middle of the night, and if they do wake up, they are the least likely of all bedtime groups to wake up more than once. As a result of these two findings, in order to get the best possible sleep, we should aim to go to bed around 22:00 - 23:00.
**Do smoking, alcohol, and caffeine negatively impact the quality and amount of deep sleep we get in a night or just sleep in general?**
Smoking does not seem to have any effect on the number of times that we wake up in the middle of the night. However, smoking has a very visible effect on the amount of deep sleep and sleep in general that we get on a given night. People who don't smoke tend to have a higher deep sleep percentage than people who do smoke. Additionally, people who don't smoke also tend to get more hours of sleep in a night than people who do smoke. In order to maximize sleep quality, it is better to not smoke.
Drinking alcohol seems to have a large impact on the amount of deep sleep and sleep overall that we get in a night. People who don't drink alcohol are the least likely to wake up in the middle of the night, and drinking just 1 oz of alcohol increases only the chance of waking up once in a night. However, people who drink 2 oz of alcohol or more are much more likely to wake up in the night and to wake up multiple times. People who drink 2 oz or more of alcohol also tend to have noticeably less deep sleep in a given night, and they also spend noticeably less time asleep. From this, we can conclude that if you want to drink alcohol and still have the best possible sleep, it is best to limit yourself to 1 oz of alcohol.
Caffeine has a very counterintuitive relationship to sleep based on this study. People who drink more caffeine are less likely to wake up during the middle of the night, and if they do wake up, they are very unlikely to wake up more than twice. Conversely, people who drink less caffeine are slightly more likely to wake up in the middle of the night. People who drink more caffeine throughout the day also typically have more deep sleep in a night. Specifically, drinking 75 mg of caffeine seems to yield the highest and most consistent deep sleep percentage. Finally, people who drink more caffeine throughout the day tend to sleep more than people who drink less caffeine. Surprisingly enough, this data suggests that to increase sleep quality we should drink more caffeine throughout the day.
**Do age or gender have an impact on the amount and quality of sleep?**
Age does seem to have some impact on the amount and quality of our sleep. People in their 30s tend to have the fewest awakenings throughout the night, the highest and most consistent deep sleep percentage, and the most sleep overall. People younger than 30 tend to wake up more often, get less deep sleep, and get less sleep overall. People older than their 30s tend to follow the same trend, waking up more often, getting less deep sleep, and getting less sleep overall. The younger or older you are, the lower your sleep quality. On the other hand, those in their 30s typically experience higher quality sleep.
Gender also seems to have an impact on sleep quality. Overall, males tend to wake up slightly more often than females do. Males, however, more consistently get a high percentage of deep sleep in a given night than females do. There is almost no difference between the two genders in the total amount of sleep that people get. Overall, it's a bit of a wash as to which gender has better sleep quality. Males are more likely to wake up in the night, but they are also more likely to get more deep sleep in a night than females.
Discussion {data-orientation=rows}
===
Row { data-height=700}
-------------------------
### Limitations and Future Study
One major limitation of this study is that sleep is influenced by a lot of different and complex components that can intertwine and interact with each other. Attributing sleep quality to the percentage of deep sleep and the number of awakenings is a very large simplification that excludes many other potential factors.
Another limitation of the study is that many of the variables, such as Caffeine Consumption and Alcohol Consumption, have a very narrow range of given values. As a result, I was forced to examine these variables as categorical variables, whereas with more data I would have treated them as quantitative variables. This limited the analysis that I could perform, and it can potentially hide relationships between variables. Additionally, Caffeine Consumption and Alcohol Consumption record the total amount of caffeine and alcohol consumed in the 24 hours before bedtime. This is a very large time span, and there's a good chance that much of the alcohol and caffeine has already made its way through the body's system by bedtime. Caffeine and alcohol could have very important effects on our quality of sleep if they are ingested closer to bedtime, and we wouldn't be able to see those effects using this data.
For future studies, I would include more variables that could potentially affect sleep, such as diet or screen time before bed, in order to gain a more accurate sense of the factors behind sleep quality. Additionally, the data that I would collect would be more in-depth and cover a wider range of values, so that I could more easily treat these variables as quantitative data.
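To illustrate the categorical versus quantitative distinction described above, here is a minimal Python sketch that looks at the same variable both ways: as a set of levels summarized by group medians, and as a number summarized by a Pearson correlation. The caffeine and deep sleep values are hypothetical toy data, not results from this study, and the helper name `pearson` is made up for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median

# Hypothetical toy data; NOT values from the study's dataset.
caffeine_mg = [0, 0, 25, 25, 50, 50, 75, 75]
deep_sleep_pct = [50, 54, 53, 57, 56, 60, 59, 63]

# Categorical view: treat each caffeine level as a label and compare group medians.
groups = defaultdict(list)
for mg, pct in zip(caffeine_mg, deep_sleep_pct):
    groups[mg].append(pct)
group_medians = {mg: median(values) for mg, values in sorted(groups.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Quantitative view: treat caffeine as a number and measure the linear association.
r = pearson(caffeine_mg, deep_sleep_pct)

print("group medians:", group_medians)
print("Pearson r:", round(r, 3))
```

With only a handful of distinct levels, as in this dataset, the correlation adds little beyond the group comparison. With a wider range of recorded values, the quantitative view could reveal the shape of the relationship rather than just differences between a few categories.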
Row { data-height=300}
-------------------------
### About the Author
I am Bryan Kohler, a junior undergraduate student at the University of Dayton. I am pursuing a Bachelor's degree in Computer Science with a minor in Data Analytics, and I expect to graduate in May 2024.
You can connect with me via [LinkedIn](https://www.linkedin.com/in/bryan-kohler/).
Column {.sidebar data-width=500 data-padding=10}
------------------------
<img src="sheeps4.jpg" width="100%" height="auto" style="float: right;">
### References
**Data Source**
- The dataset used in this project can be downloaded from [Kaggle](https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency).
**Sleep Background Information**
- The background information about deep sleep and the different sleep stages can be found at [sleepfoundation.org](https://www.sleepfoundation.org/stages-of-sleep/deep-sleep).