What is the variable used to make a prediction?

Linear regression is a method we can use to quantify the relationship between one or more predictor variables and a response variable.

One of the most common reasons for fitting a regression model is to use the model to predict the values of new observations.

We use the following steps to make predictions with a regression model:

Step 1: Collect the data.
Step 2: Fit a regression model to the data.
Step 3: Verify that the model fits the data well.
Step 4: Use the fitted regression equation to predict the values of new observations.
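The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full analysis: the data values are invented for the example, the helper names are hypothetical, and R-squared stands in for the fuller set of model checks a real analysis would need.

```python
# Step 1: collect the data.
# Hypothetical values, invented for illustration only (not from the article).
weights = [120, 135, 150, 160, 175, 180]          # predictor, in pounds
heights = [57.0, 60.0, 62.5, 65.0, 67.5, 69.0]    # response, in inches


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y that the fitted line accounts for."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Step 2: fit a regression model to the data.
b0, b1 = fit_simple_linear_regression(weights, heights)

# Step 3: verify that the model fits the data well (one rough check only).
r2 = r_squared(weights, heights, b0, b1)

# Step 4: use the fitted equation to predict a new observation.
predicted_height = b0 + b1 * 170

print(f"Height = {b0:.4f} + {b1:.4f}*(weight), R-squared = {r2:.3f}")
print(f"Predicted height at 170 pounds: {predicted_height:.1f} inches")
```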

The following examples show how to use regression models to make predictions.

Example 1: Make Predictions with a Simple Linear Regression Model

Suppose a doctor collects data for height (in inches) and weight (in pounds) on 50 patients.

She then fits a simple linear regression model using “weight” as the predictor variable and “height” as the response variable.

The fitted regression equation is as follows:

Height = 32.7830 + 0.2001*(weight)

After checking that the assumptions of the linear regression model are met, the doctor concludes that the model fits the data well.

She can then use the model to predict the height of new patients based on their weight.

For example, suppose a new patient weighs 170 pounds. Using the model, we would predict that this patient would have a height of 66.8 inches:

Height = 32.7830 + 0.2001*(170) = 66.8 inches
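The same calculation can be written as a small Python function. The coefficients come from the fitted equation in this example; the function name is hypothetical.

```python
def predict_height(weight_lb):
    """Predicted height (inches) from the fitted equation in Example 1."""
    intercept = 32.7830   # from the fitted regression equation
    slope = 0.2001        # inches of height per pound of weight
    return intercept + slope * weight_lb


print(round(predict_height(170), 1))  # 66.8
```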

Example 2: Make Predictions with a Multiple Linear Regression Model

Suppose an economist collects data for total years of schooling, weekly hours worked, and yearly income on 30 individuals.

He then fits a multiple linear regression model using “total years of schooling” and “weekly hours worked” as the predictor variables and “yearly income” as the response variable.

The fitted regression equation is as follows:

Income = 1,342.29 + 3,324.33*(years of schooling) + 765.88*(weekly hours worked)

After checking that the assumptions of the linear regression model are met, the economist concludes that the model fits the data well.

He can then use the model to predict the yearly income of a new individual based on their total years of schooling and weekly hours worked.

For example, suppose a new individual has 16 years of total schooling and works an average of 40 hours per week. Using the model, we would predict that this individual would have a yearly income of $85,166.77:

Income = 1,342.29 + 3,324.33*(16) + 765.88*(40) = $85,166.77
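With more than one predictor, the prediction is still a matter of plugging values into the fitted equation. A short Python sketch, using the coefficients from this example (the function name is hypothetical) and the new individual's 16 years of schooling and 40 weekly hours:

```python
def predict_income(years_of_schooling, weekly_hours_worked):
    """Predicted yearly income from the fitted equation in Example 2."""
    return (1342.29
            + 3324.33 * years_of_schooling
            + 765.88 * weekly_hours_worked)


print(f"${predict_income(16, 40):,.2f}")  # $85,166.77
```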

On Using Confidence Intervals

When using a regression model to make predictions on new observations, the value predicted by the regression model is known as a point estimate.

Although the point estimate represents our best guess for the value of the new observation, it’s unlikely to exactly match the value of the new observation.

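So, to capture this uncertainty, we can create a confidence interval: a range of values that is likely to contain the true value with a certain level of confidence. (When the target is a single new observation rather than a population parameter, this range is more precisely called a prediction interval.)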

For example, instead of predicting that a new individual will be 66.8 inches tall, we may create the following confidence interval:

95% Confidence Interval = [64.8 inches, 68.8 inches]

We would interpret this interval to mean that we’re 95% confident that the true height of this individual is between 64.8 inches and 68.8 inches.
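To show where such an interval can come from, here is a minimal Python sketch for simple linear regression. It applies the standard textbook formula for an interval around the prediction for one new observation (often called a prediction interval). The data, the function name, and the hard-coded t critical value (2.306 for 95% confidence with 8 degrees of freedom) are assumptions for illustration, so the result will not match the interval quoted above.

```python
import math

# Hypothetical data, invented for illustration only (n = 10 patients).
weights = [120, 128, 135, 142, 150, 155, 160, 168, 175, 180]
heights = [56.9, 58.1, 60.2, 61.0, 62.9, 63.5, 65.2, 66.1, 67.9, 68.7]

# Two-sided 95% t critical value for n - 2 = 8 degrees of freedom.
T_CRIT_95_DF8 = 2.306


def interval_for_new_observation(x, y, x_new, t_crit):
    """Return (point_estimate, lower, upper) for one new observation."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual standard error, with n - 2 degrees of freedom.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))

    point = intercept + slope * x_new
    # The leading "1 +" accounts for the scatter of an individual
    # observation around the regression line.
    margin = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_mean) ** 2 / sxx)
    return point, point - margin, point + margin


point, lower, upper = interval_for_new_observation(
    weights, heights, 170, T_CRIT_95_DF8
)
print(f"Point estimate: {point:.1f} inches")
print(f"95% interval: [{lower:.1f} inches, {upper:.1f} inches]")
```

The interval is centered on the point estimate and widens as the new weight moves away from the average weight in the sample.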

Cautions on Making Predictions

Keep in mind the following when using a regression model to make predictions:

1. Only use the model to make predictions within the range of data used to estimate the regression model.

For example, suppose we fit a regression model using the predictor variable “weight” and the weight of individuals in the sample we used to estimate the model ranged between 120 pounds and 180 pounds.

It would be invalid to use the model to estimate the height of an individual who weighed 200 pounds because this falls outside of the range of the predictor variable that we used to estimate the model.

It’s possible that the relationship between weight and height is different outside of the range of 120 to 180 pounds, so we shouldn’t use the model to estimate the height of an individual who weighs 200 pounds.

2. Only use the model to make predictions for the population you sampled.

For example, suppose an economist draws a sample from a population in which everyone lives in a particular city.

We should only use the fitted regression model to predict the yearly income of individuals in this city since the entire sample that was used to fit the model lived in this city.
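The first caution can be enforced in code by refusing to predict outside the observed range. The sketch below is one possible guard, not a standard API: it reuses the height equation from Example 1 and assumes, as in the caution above, that the sample weights ranged from 120 to 180 pounds. The function and parameter names are hypothetical.

```python
def predict_height_within_range(weight_lb, min_weight=120, max_weight=180):
    """Predict height, but only for weights inside the sampled range.

    The default bounds are the assumed range of the data used to fit
    the model (120 to 180 pounds).
    """
    if not (min_weight <= weight_lb <= max_weight):
        raise ValueError(
            f"weight {weight_lb} is outside the range "
            f"[{min_weight}, {max_weight}] used to fit the model"
        )
    return 32.7830 + 0.2001 * weight_lb


print(round(predict_height_within_range(170), 1))  # 66.8

try:
    predict_height_within_range(200)
except ValueError as error:
    print(f"Refused to extrapolate: {error}")
```

The second caution cannot be checked from the predictor values alone; whether a new individual belongs to the sampled population is a judgment about how the data were collected.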

Using Linear Regression to Predict an Outcome
By Deborah J. Rumsey

Statistical researchers often use a linear relationship to predict the (average) numerical value of Y for a given value of X using a straight line (called the regression line).

If you know the slope and the y-intercept of that regression line, then you can plug in a value for X and predict the average value for Y. In other words, you predict (the average) Y from X.

If you establish at least a moderate correlation between X and Y through both a correlation coefficient and a scatterplot, then you know they have some type of linear relationship.

Never do a regression analysis unless you have already found at least a moderately strong correlation between the two variables. (A good rule of thumb is that the correlation should be at or beyond either positive or negative 0.50.) If the data don’t resemble a line to begin with, you shouldn’t try to use a line to fit the data and make predictions (but people still try).

Before moving forward to find the equation for your regression line, you have to identify which of your two variables is X and which is Y. When doing correlations, the choice of which variable is X and which is Y doesn’t matter, as long as you’re consistent for all the data. But when fitting lines and making predictions, the choice of X and Y does make a difference.

So how do you determine which variable is which? In general, Y is the variable that you want to predict, and X is the variable you are using to make that prediction. For example, say you are using the number of times a population of crickets chirps to predict the temperature. In this case you would make the variable Y the temperature, and the variable X the number of chirps. Hence Y can be predicted by X using the equation of a line if a strong enough linear relationship exists.

Statisticians call the X-variable (cricket chirps in this example) the explanatory variable, because if X changes, the slope tells you (or explains) how much Y is expected to change in response. Therefore, the Y variable is called the response variable. Other names for X and Y include the independent and dependent variables, respectively.

In the case of two numerical variables, you can come up with a line that enables you to predict Y from X, if (and only if) the following two conditions are met:

The scatterplot must form a linear pattern.
The correlation, r, is moderate to strong (typically beyond 0.50 or –0.50).

Some researchers actually don’t check these conditions before making predictions. Their claims are not valid unless the two conditions are met.

But suppose the correlation is high; do you still need to look at the scatterplot? Yes. In some situations the data have a somewhat curved shape, yet the correlation is still strong; in these cases making predictions using a straight line is still invalid. Predictions in these cases need to be made based on other methods that use a curve instead.
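A small Python sketch shows why a high correlation alone is not enough. The data are invented for illustration (Y is exactly X squared, so the pattern is curved by construction), and the helper name is hypothetical. The residuals from the best-fit line serve here as a rough numerical stand-in for looking at the scatterplot.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical curved data: y is exactly x squared.
x = list(range(1, 11))
y_curved = [xi ** 2 for xi in x]

r = pearson_r(x, y_curved)
passes_rule_of_thumb = abs(r) >= 0.50
print(f"r = {r:.3f}, passes the 0.50 rule of thumb: {passes_rule_of_thumb}")

# Fit the least-squares line and inspect the residuals in order of x.
x_mean = sum(x) / len(x)
y_mean = sum(y_curved) / len(y_curved)
slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y_curved))
         / sum((a - x_mean) ** 2 for a in x))
intercept = y_mean - slope * x_mean
residuals = [b - (intercept + slope * a) for a, b in zip(x, y_curved)]

# Positive at both ends and negative in the middle: a curved pattern
# that a straight line cannot capture, despite the high correlation.
print([round(e, 1) for e in residuals])
```

The correlation clears the rule of thumb easily, yet the residuals run positive, then negative, then positive again, which is the numerical signature of the curve a scatterplot would show.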