Using PCA vs Linear Regression



















I'm looking to analyze data from a study. Similar previous studies have used either PCA or hierarchical linear regression to analyze their data, and I've used both PCA and linear regression before. From my understanding, PCA breaks the data down into principal components and is useful for learning which factors may be strong indicators of our dependent variable, and linear regression can be used to compare correlations.

How should I be approaching this? If I simply want to find out what correlates most strongly with my study's dependent variable, what would be the best option? Can I use PCA and then hierarchical linear regression?







Tags: regression, pca

asked May 28 at 20:01 by 4ntibody; edited May 29 at 0:19 by Ben























          4 Answers













































PCA does not involve a dependent variable: all the variables are treated the same. It is primarily a dimension-reduction method.



                    Factor analysis also doesn't involve a dependent variable, but its goal is somewhat different: It is to uncover latent factors.



                    Some people use either the components or the factors (or a subset of them) as independent variables in a later regression. This can be useful if you have a lot of IVs: If you want to reduce the number while losing as little variance as possible, that's PCA. If you think these IVs represent some factors, that's FA.



                    If you think there are factors, then it may be best to use FA; but if you are just trying to reduce the number of variables, then there is no guarantee that the components will relate well to the DV. Another method is partial least squares. That does include the DV.
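As a rough illustration of why the components need not relate well to the DV, here is a minimal sketch using NumPy on simulated data. Everything in it is an assumption made for the example (the two predictors, their variances, the DV that depends only on the low-variance predictor, and the helper `r_squared`); it is not taken from the asker's study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two simulated predictors: x1 has large variance, x2 has small variance.
x1 = rng.normal(scale=10.0, size=n)
x2 = rng.normal(scale=1.0, size=n)
X = np.column_stack([x1, x2])

# Hypothetical DV that depends only on the low-variance predictor.
y = 3.0 * x2 + rng.normal(scale=0.5, size=n)


def r_squared(features, target):
    """R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return 1.0 - np.sum(resid**2) / np.sum((target - target.mean()) ** 2)


# PCA via SVD of the centred predictors. The DV is never used here.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                   # component scores, ordered by variance
explained = s**2 / np.sum(s**2)      # share of predictor variance per component

r2_pc1 = r_squared(scores[:, :1], y)  # regression on the first component only
r2_all = r_squared(X, y)              # regression on both original predictors

print(f"variance explained by PC1: {explained[0]:.3f}")
print(f"R^2 using PC1 only:        {r2_pc1:.3f}")
print(f"R^2 using both predictors: {r2_all:.3f}")
```

In this setup the first component keeps almost all of the variance in the predictors but says very little about the DV, because the DV was built from the direction PCA discards. A method that uses the DV when building components, such as partial least squares, is designed to avoid that.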







answered May 28 at 20:38 by Peter Flom





























These techniques are not mutually exclusive, and they can be complementary.



PCA is a dimension-reduction technique. The number of dimensions in your dataset corresponds to the number of variables you measure per case. For example, imagine your data is survey data, and you administered a 100-item questionnaire. Each individual who completed the questionnaire is represented by a single point in 100-dimensional space. The goal of PCA is to simplify this space so that as much of the spread (variance) of the points as possible is preserved in fewer dimensions. This simplification can help you describe the data more elegantly, and it can also reveal the dominant trends in your data. A great explanation of PCA can be found here: Making sense of principal component analysis, eigenvectors & eigenvalues
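The questionnaire example can be sketched in a few lines of NumPy. The data below are simulated under an assumption made purely for illustration: 300 respondents whose 100 item responses are noisy mixtures of 3 underlying traits. The names `traits`, `weights` and `responses` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_items, n_traits = 300, 100, 3

# Hypothetical survey: each of the 100 item responses is a noisy mix of 3 traits.
traits = rng.normal(size=(n_people, n_traits))
weights = rng.normal(size=(n_traits, n_items))
responses = traits @ weights + rng.normal(scale=0.5, size=(n_people, n_items))

# PCA: centre each item, then take the singular value decomposition.
centred = responses - responses.mean(axis=0)
_, singular_values, Vt = np.linalg.svd(centred, full_matrices=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)

# Keep the first 3 components: each person becomes a point in 3-D, not 100-D.
reduced = centred @ Vt[:3].T

print(reduced.shape)                 # (300, 3)
print(explained_ratio[:3].sum())     # share of total variance kept
```

With real survey data the number of components to keep is a judgment call, often guided by how quickly `explained_ratio` drops off.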



                        Hierarchical linear regression is used to determine whether a predictor (or set of predictors) explains variance in an outcome variable over and above some other predictor (or set of predictors). For example, you may want to know if exercising (IV1) or eating well (IV2) is a better predictor of cardiovascular health (DV). Hierarchical linear regression can help answer this question.
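A minimal sketch of the "over and above" idea, again with NumPy and simulated data: fit the model with the first block of predictors, then with both blocks, and compare the R² values. The variables `exercise`, `diet` and `health`, the coefficients used to simulate them, and the helper `r_squared` are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Hypothetical data: exercise (IV1), diet quality (IV2), cardiovascular health (DV).
exercise = rng.normal(size=n)
diet = 0.5 * exercise + rng.normal(size=n)          # the two IVs are correlated
health = 1.0 * exercise + 0.8 * diet + rng.normal(size=n)


def r_squared(predictors, outcome):
    """R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    resid = outcome - design @ coef
    return 1.0 - np.sum(resid**2) / np.sum((outcome - outcome.mean()) ** 2)


r2_step1 = r_squared([exercise], health)            # block 1: IV1 only
r2_step2 = r_squared([exercise, diet], health)      # block 2: IV1 and IV2
delta_r2 = r2_step2 - r2_step1                      # variance explained beyond IV1

print(f"step 1 R^2: {r2_step1:.3f}")
print(f"step 2 R^2: {r2_step2:.3f}")
print(f"delta R^2:  {delta_r2:.3f}")
```

In practice the increment is usually accompanied by a significance test for the change in R², which this sketch leaves out.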



If your data is complex (i.e., you have many variables), you can apply PCA to reduce the number of variables and find the "latent variables". These components can then be used as predictors in the hierarchical linear regression.



                        Best of luck!







answered May 28 at 20:22 by unicoder; edited May 28 at 20:42














• Thank you for everybody's quick comments and insight! I now know what I need to do. – 4ntibody, May 28 at 21:21















As other answers have said, PCA and linear regression (in general) are different tools.

PCA is an unsupervised method (it takes in only the data, with no dependent variable), whereas linear regression (in general) is a supervised learning method. If you have a dependent variable, a supervised method would be suited to your goals.

If you're trying to find out which directions in your data (linear combinations of your variables) capture most of the variation, PCA is a useful tool.







answered May 28 at 23:11 by Alexander



























If you are just looking for correlation between variables, you can estimate this simply with the correlation coefficient. It will tell you the strength and direction of the linear relationship between two variables.
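For instance, a small sketch with NumPy's `np.corrcoef`, assuming a handful of candidate predictors and one dependent variable. The variable names and the simulated relationships are hypothetical and only there to make the example runnable.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Hypothetical study data: three candidate predictors and one dependent variable.
predictors = {
    "age": rng.normal(size=n),
    "dose": rng.normal(size=n),
    "noise": rng.normal(size=n),
}
outcome = 2.0 * predictors["dose"] + 0.5 * predictors["age"] + rng.normal(size=n)

# Pearson correlation of each predictor with the outcome.
correlations = {
    name: np.corrcoef(values, outcome)[0, 1]
    for name, values in predictors.items()
}

# Rank predictors by absolute correlation, strongest first.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)

for name in ranked:
    print(f"{name}: r = {correlations[name]:+.3f}")
```

Each coefficient here looks at one predictor at a time, so it does not account for overlap between predictors the way a regression does.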







answered May 29 at 7:39 by Juan







































