In this lesson, we learn how data observations can be influential in different ways. If an observation has a response value that is very different from the value predicted by a model, then that observation is called an outlier. On the other hand, if an observation has a particularly unusual combination of predictor values (e.g., one predictor has a very different value for that observation compared with all the other observations), then that observation is said to have high leverage. Thus, there is a distinction between outliers and high-leverage observations, and each can affect our regression analyses differently. It is also possible for an observation to be both an outlier and have high leverage. It is therefore important to know how to detect outliers and high-leverage data points. Once we've identified any outliers and/or high-leverage data points, we then need to determine whether or not the points actually have an undue influence on our model. This lesson addresses all these issues using the following measures: studentized residuals (or internally studentized residuals); (unstandardized) deleted residuals (or PRESS prediction errors); and studentized deleted residuals (or externally studentized residuals).

In this section, we learn the distinction between outliers and high-leverage observations. An outlier is a data point whose response y does not follow the general trend of the rest of the data. A data point has high leverage if it has "extreme" predictor x values. With a single predictor, an extreme x value is simply one that is particularly high or low. With multiple predictors, extreme x values may be particularly high or low for one or more predictors, or may be "unusual" combinations of predictor values (e.g., with two predictors that are positively correlated, an unusual combination might be a high value of one predictor paired with a low value of the other). Note that, for our purposes, we consider a data point to be an outlier only if it is extreme with respect to the other y values, not the x values.

A data point is influential if it unduly influences any part of a regression analysis, such as the predicted responses, the estimated slope coefficients, or the hypothesis test results. Outliers and high-leverage data points have the potential to be influential, but we generally have to investigate further to determine whether or not they actually are. One advantage of the case in which we have only one predictor is that we can look at simple scatter plots to identify any outliers and influential data points. Let's take a look at a few examples that should help to clarify the distinction between the two types of extreme values.
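To make the measures named in this lesson concrete, here is a minimal sketch of how leverages, studentized residuals, deleted residuals, and studentized deleted residuals could be computed for a one-predictor least-squares fit. It assumes NumPy is available. The function name regression_diagnostics and the ten data points are made up purely for illustration; the last point is deliberately given an extreme x value and a response that falls off the trend of the others.

```python
import numpy as np


def regression_diagnostics(x, y):
    """Leverages and residual-based measures for a one-predictor least-squares fit.

    Hypothetical helper written for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])  # design matrix: intercept and slope
    p = X.shape[1]                        # number of estimated coefficients

    # Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages h_ii.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)

    e = y - H @ y                         # ordinary residuals
    sse = float(e @ e)
    mse = sse / (n - p)

    # Internally studentized: residual scaled by its estimated standard error.
    studentized = e / np.sqrt(mse * (1.0 - h))
    # Deleted (PRESS) residual: prediction error when point i is left out.
    deleted = e / (1.0 - h)
    # Externally studentized: like the above, but the error variance is
    # estimated without point i.
    studentized_deleted = e * np.sqrt((n - p - 1) / (sse * (1.0 - h) - e**2))

    return {
        "leverage": h,
        "studentized": studentized,
        "deleted": deleted,
        "studentized_deleted": studentized_deleted,
    }


# Made-up data: nine points near y = 2 + 3x, plus one point with an
# extreme x value whose response does not follow that trend.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 20], dtype=float)
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1, 20.2, 22.8, 26.1, 28.9, 40.0])

diag = regression_diagnostics(x, y)
for i in range(x.size):
    print(
        f"point {i}: leverage={diag['leverage'][i]:.3f}  "
        f"studentized={diag['studentized'][i]:.2f}  "
        f"deleted={diag['deleted'][i]:.2f}  "
        f"studentized_deleted={diag['studentized_deleted'][i]:.2f}"
    )
```

The deleted residuals here use the shortcut e / (1 - h), which gives the same values as refitting the line with each point left out in turn, so no refitting loop is needed. A large leverage or a large studentized deleted residual only flags a point for a closer look; whether the point is actually influential still has to be judged by how much the fit changes without it.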