Advantages and Disadvantages of Linear Regression

At 5/27/2023

1. What are the disadvantages of the linear regression model?

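One of the most significant drawbacks of the linear regression model is its sensitivity to outliers: because the fit minimizes squared error, a few extreme points can pull the fitted line toward them and distort the overall result. Another notable drawback is overfitting, which can occur when the model has many features relative to the number of observations. Conversely, underfitting is also a significant disadvantage, since a linear model cannot capture a relationship that is not linear.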
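The effect of a single outlier can be seen in a small sketch. The data below is made up for illustration, and `fit_line` is a hypothetical helper that implements ordinary least squares for one feature.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [2 * x + 1 for x in xs]
clean_slope, clean_intercept = fit_line(xs, ys)

# Corrupt only the last observation to simulate one outlier.
ys_outlier = ys[:-1] + [60]
outlier_slope, outlier_intercept = fit_line(xs, ys_outlier)

print(f"clean fit:    slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"with outlier: slope={outlier_slope:.2f}, intercept={outlier_intercept:.2f}")
```

Five of the six points still lie on the original line, yet the fitted slope and intercept both move substantially once the one point is changed.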

2. Why is Naive Bayes called naive?

We call it naive because its assumptions are optimistic and rarely hold in real-world applications. It assumes that all of the features in the dataset are independent of one another and equally important. In other words, we treat the predictors as independent, and we treat every predictor as having an equal effect on the outcome (for example, the day being windy does not carry more weight than any other feature in deciding whether to play golf).
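A minimal sketch of what the independence assumption buys us: the score of each class is just its prior multiplied by one likelihood per observed feature. All probabilities below are invented for the golf example, and `naive_bayes_posterior` is a hypothetical helper, not part of any library.

```python
from math import prod

# Hypothetical, made-up probabilities for the "play golf" example.
priors = {"play": 0.6, "no_play": 0.4}
likelihoods = {
    "play": {"windy": 0.3, "sunny": 0.5},
    "no_play": {"windy": 0.6, "sunny": 0.4},
}


def naive_bayes_posterior(observed, priors, likelihoods):
    """Score each class as prior * product of per-feature likelihoods, then normalize."""
    scores = {
        label: priors[label] * prod(likelihoods[label][f] for f in observed)
        for label in priors
    }
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


posterior = naive_bayes_posterior(["windy", "sunny"], priors, likelihoods)
print(posterior)
```

Each feature enters the product as a single factor with no extra weighting and no interaction terms, which is exactly the simplification the name refers to.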

3. How does Random Forest handle missing values?

Random Forest methods commonly rely on two ways of handling missing values. The first is to drop the data points that have missing values. This is not recommended, because not all of the available data is used. The second is to fill in the missing values with the median (for numerical features) or the mode (for categorical features). This approach paints with too broad a brush for datasets that have many gaps and significant structure.
There are other methods of filling in missing values, such as measuring the similarity between data points and estimating each missing value as a weighted combination of the values from similar points.
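The median/mode strategy described above can be sketched with the standard library alone. The column values are invented, `None` stands in for a missing entry, and `impute_column` is a hypothetical helper that would run as a preprocessing step before fitting the forest.

```python
from statistics import median, mode


def impute_column(values, numeric):
    """Replace None with the median (numeric) or mode (categorical) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical columns with gaps.
ages = [25, None, 40, 31, None]
colors = ["red", "blue", None, "red"]

filled_ages = impute_column(ages, numeric=True)
filled_colors = impute_column(colors, numeric=False)

print(filled_ages)
print(filled_colors)
```

Every gap in a column receives the same fill value, which is why this approach loses detail when a dataset has many missing entries.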

4. Why does XGBoost perform better than SVM?

One reason is the treatment of missing values, which XGBoost is designed to handle internally. The missing values are interpreted in such a way that if any trend exists in them, it is captured by the model. The user marks missing entries with a value that does not occur among the other observations and passes that value to the model as a parameter. At each node, XGBoost tries the alternative branches for the missing values it encounters and learns which path missing values should take in the future. A Support Vector Machine (SVM), on the other hand, does not perform well with missing data, and it is usually better to impute the missing values before running an SVM.
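The idea of learning a path for missing values can be illustrated with a simplified sketch. This is not XGBoost's implementation: XGBoost scores splits with gradient-based statistics, whereas the hypothetical helper `best_default_direction` below uses plain squared error on made-up data to keep the example short.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_default_direction(rows, threshold):
    """rows: list of (feature_value_or_None, target). Returns 'left' or 'right'."""
    left = [y for x, y in rows if x is not None and x < threshold]
    right = [y for x, y in rows if x is not None and x >= threshold]
    missing = [y for x, y in rows if x is None]
    # Try sending all missing rows down each branch and keep the lower-loss option.
    loss_if_left = sse(left + missing) + sse(right)
    loss_if_right = sse(left) + sse(right + missing)
    return "left" if loss_if_left <= loss_if_right else "right"


# Hypothetical rows: the targets of the missing rows resemble the high-feature group.
rows = [(1.0, 10.0), (2.0, 11.0), (8.0, 30.0), (9.0, 31.0), (None, 29.0), (None, 32.0)]
direction = best_default_direction(rows, threshold=5.0)
print(direction)
```

Because the rows with missing features have targets close to those of the right-hand group, sending them right gives the lower loss, and that branch becomes the default for missing values at this node.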