Which of the following is NOT an assumption of a t-test?
The variances of the groups being compared are equal (for the independent-samples t-test).
The observations are independent.
The data is normally distributed.
The data is categorical.
Suppose 60% of emails in your inbox are spam and 40% are legitimate. Also, 95% of spam emails contain the word 'free,' while only 1% of legitimate emails do. If an email contains the word 'free,' what's the probability it's spam?
0.57
0.05
0.004
0.993
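The spam question above is an application of Bayes' theorem: P(spam | 'free') = P('free' | spam) P(spam) / P('free'). A minimal sketch of that arithmetic follows; the function name posterior_spam is a hypothetical helper chosen for illustration, and the inputs are the percentages stated in the question.

```python
def posterior_spam(p_spam, p_word_given_spam, p_word_given_legit):
    """Return P(spam | word) via Bayes' theorem (illustrative helper)."""
    p_legit = 1.0 - p_spam
    # Joint probabilities of (class, word appears)
    joint_spam = p_spam * p_word_given_spam
    joint_legit = p_legit * p_word_given_legit
    # Normalize by the total probability that the word appears
    return joint_spam / (joint_spam + joint_legit)

# Values taken from the question: 60% spam, 95% and 1% word rates
result = posterior_spam(0.60, 0.95, 0.01)
print(round(result, 3))
```

The denominator is the total probability of seeing the word, 0.57 + 0.004 = 0.574, so the posterior is 0.57 / 0.574.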
In multiple regression, what does a high variance inflation factor (VIF) indicate?
Low multicollinearity among predictor variables
Heteroscedasticity in the residuals
High multicollinearity among predictor variables
A good fit of the regression model
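For intuition on the VIF question above, here is a minimal sketch of the special case with exactly two predictors, where the VIF of each predictor reduces to 1 / (1 - r^2), with r the Pearson correlation between them. The helper names and the toy data are assumptions made for illustration; with more than two predictors, r^2 is replaced by the R^2 from regressing one predictor on all the others.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for the two-predictor case only: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Toy data (assumed): x2_similar nearly duplicates x1, x2_unrelated does not
x1 = [1, 2, 3, 4, 5]
x2_similar = [1.1, 2.0, 2.9, 4.2, 5.1]
x2_unrelated = [1, -2, 2, -2, 1]

vif_high = vif_two_predictors(x1, x2_similar)
vif_low = vif_two_predictors(x1, x2_unrelated)
print(vif_high, vif_low)
```

A VIF near 1 means the predictor shares almost no linear information with the other one; the value grows without bound as the two predictors become closer to collinear.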
An autoregressive (AR) model uses ______ values of the time series to predict future values.
Future
Past
Average
Random
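To make the autoregressive idea concrete, the sketch below fits the simplest case, an AR(1) model x[t] = c + phi * x[t-1], by ordinary least squares on lagged pairs and then forecasts one step ahead. The function names and the toy series are assumptions for illustration; real series are noisy and are usually fitted with a dedicated library.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (illustrative helper)."""
    previous = series[:-1]   # x[t-1]
    current = series[1:]     # x[t]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(previous, current))
    sxx = sum((p - mean_prev) ** 2 for p in previous)
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi

def forecast_next(series, c, phi):
    """One-step-ahead forecast from the most recent observation."""
    return c + phi * series[-1]

# Toy series (assumed), generated exactly by x[t] = 2 + 0.5 * x[t-1]
series = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(series)
next_value = forecast_next(series, c, phi)
print(c, phi, next_value)
```

Because the toy series has no noise, the fit recovers the generating coefficients, and the forecast depends only on the last observed value.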
What is the purpose of calculating a correlation coefficient?
To measure the strength and direction of the linear relationship between two variables.
To determine the cause-and-effect relationship between two variables.
To test the significance of the difference between two means.
To predict the value of one variable based on the value of another variable.
A 95% confidence interval for a population mean is calculated to be (60, 80). What is the correct interpretation of this interval?
If we were to repeatedly sample from this population, 95% of the time the sample mean would fall between 60 and 80.
If we were to repeatedly construct confidence intervals using this method, 95% of them would contain the true population mean.
We are 95% confident that the sample mean falls between 60 and 80.
There is a 95% probability that the true population mean falls between 60 and 80.
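The repeated-sampling reading of a confidence interval can be checked by simulation. The sketch below draws many samples from a normal population, builds a 95% interval around each sample mean using a known standard deviation (a z-interval), and counts how often the interval contains the true mean. The population mean of 70, standard deviation of 10, sample size of 25, and the function name are all assumptions chosen for illustration, not values implied by the question.

```python
import random
from math import sqrt

def coverage_rate(true_mean=70.0, sigma=10.0, n=25, trials=5000, seed=0):
    """Fraction of simulated 95% z-intervals that contain true_mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)  # known-sigma interval half-width
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(rate)
```

The printed fraction should land close to 0.95. Each individual interval either contains the true mean or does not; the 95% describes the long-run behavior of the procedure.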
Which of the following sampling techniques is most likely to introduce bias?
Convenience sampling
Simple random sampling
Stratified random sampling
Systematic sampling
If a residual plot shows a funnel shape (increasing spread of residuals as the predicted value increases), what does this indicate?
Heteroscedasticity
Autocorrelation
Homoscedasticity
Multicollinearity
In PCA, what does a scree plot help determine?
The amount of variance explained by each variable
The presence of multicollinearity
The correlation between principal components
The optimal number of principal components to retain
In simple linear regression, what does the slope of the regression line represent?
The point where the regression line crosses the y-axis
The average value of the dependent variable
The strength of the relationship between the variables
The expected change in the dependent variable for a one-unit increase in the independent variable
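The roles of slope and intercept in simple linear regression can be seen in a short least-squares sketch. The function name and the toy data (hours and score) are assumptions for illustration; the data lie exactly on the line score = 3 + 2 * hours so the fitted coefficients are easy to check by eye.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data (assumed): score = 3 + 2 * hours, with no noise
hours = [1, 2, 3, 4, 5]
score = [5, 7, 9, 11, 13]

slope, intercept = least_squares_line(hours, score)

def predict(h):
    """Predicted score for h hours using the fitted line."""
    return intercept + slope * h

# Difference in prediction between two inputs one unit apart
step = predict(4) - predict(3)
print(slope, intercept, step)
```

The difference between predictions at inputs one unit apart equals the fitted slope, while the intercept is the prediction at an input of zero.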