Yishuo School District (33) | SPSS Statistical Analysis (43) Binary Logistic Regression Analysis
Share interests, spread happiness, broaden knowledge, and leave good memories! Hello everyone, this is the editor of LearningYard. Welcome back; we will do our best to keep bringing you more and better content.
The regression analyses we have covered so far all used quantitative dependent variables, but in practice a dependent variable can be either quantitative or qualitative. Examples of qualitative dependent variables include negative versus positive results or survival versus death in medicine, whether or not a consumer makes a purchase, and whether or not a company goes public (IPO) in finance.
Many statistical methods can handle qualitative dependent variables, such as discriminant analysis, probit analysis, logistic regression analysis and log-linear analysis. Of these, logistic regression is the most widely used in the social sciences. Depending on the number of categories of the dependent variable, logistic regression is divided into binary logistic regression and multinomial logistic regression. The dependent variable in a binary logistic regression model can take only two values, 1 and 0 (a dummy variable). Let's use an example to briefly introduce the binary logistic regression model.
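To make the model concrete, here is a minimal Python sketch of how a binary logistic model turns a linear combination of the independent variables into a probability that y = 1. The function name and the coefficient values are hypothetical placeholders, not estimates from the example that follows.

```python
import math

def predicted_probability(intercept, coefs, xs):
    """Return P(y = 1) under a binary logistic model."""
    # Linear predictor: b0 + b1*x1 + b2*x2 + ...
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    # The logistic function maps any real z into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and ratios, for illustration only
p = predicted_probability(0.0, [0.5, 0.2, 1.0], [1.0, 2.0, 0.3])
print(round(p, 3))
```

When every coefficient is zero the function returns 0.5, which is the case where the independent variables carry no information about y.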
An important function of auditing is to identify companies that are in poor financial condition. Misclassifying such companies can have disastrous consequences. The table below lists several financial ratios for 66 companies, 33 of which went bankrupt two years later (y=0) and 33 of which remained solvent over the same period (y=1). Fit a logistic regression model with the variables x1 (undistributed profits/total assets), x2 (pre-tax profits/total assets), and x3 (sales/total assets).
The first step is to analyze and organize the data. There are three independent variables, all of which are quantitative. The dependent variable is qualitative and takes two values (0 and 1), so this is a typical problem for binary logistic regression. In SPSS, we define the three independent variables x1, x2 and x3 and the dependent variable y, then enter the data and save the file.
The second step is to set up the binary logistic regression analysis. Select the menu "Analyze -> Regression -> Binary Logistic" to open the binary logistic regression dialog box, and set it as shown below.
The third step is to review and interpret the main results.
The figure below shows the case processing summary, which reports the number of records that entered the model.
The following figure is the dependent variable encoding table. SPSS recodes the two categories of the dependent variable internally as 0 and 1, and the model estimates the probability of the category coded 1. In this example the two categories occur the same number of times (33 each). The table shows that "bankruptcy after two years" is coded 0 and "solvency after two years" is coded 1.
The following figure is the initial classification table of the model. At this stage the model contains only the constant term and no independent variables. The left side of the table shows the observed values, and the right side shows the model's predicted values and the percentage classified correctly. The model predicts that every company will still be solvent in two years, so the prediction accuracy is 50%.
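The 50% figure follows directly from the group sizes. A constant-only model assigns every case to the same category, so its accuracy equals the share of that category in the sample. A small Python sketch using the counts from this example (the function name is our own):

```python
def baseline_accuracy(n_zero, n_one):
    """Accuracy of a constant-only model that predicts one category for every case."""
    # The model predicts whichever category is at least as frequent
    return max(n_zero, n_one) / (n_zero + n_one)

# 33 bankrupt (y = 0) and 33 solvent (y = 1) companies
print(baseline_accuracy(33, 33))  # 0.5
```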
The following two tables show the test results for this initial model. The coefficient of the constant term is 0.000 and its significance probability (p-value) is 1, so the constant term is not significant. The p-values of x1, x2 and x3 are 0.000, 0.000 and 0.094 respectively. At the 5% significance level, x1 and x2 are significant, while x3 is not.
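A quick check of these numbers, as a sketch: in a constant-only model the constant equals the log-odds of y = 1 in the sample, and with 33 cases in each group that is ln(33/33) = 0. The p-values below are the rounded values quoted above; the variable names in the code are our own.

```python
import math

n_solvent, n_bankrupt = 33, 33

# Constant of the constant-only model: log-odds of y = 1 in the sample
constant = math.log(n_solvent / n_bankrupt)
print(constant)  # 0.0

# Rounded p-values as reported in the text
p_values = {"x1": 0.000, "x2": 0.000, "x3": 0.094}
alpha = 0.05  # 5% significance level

significant = [name for name, p in p_values.items() if p < alpha]
print(significant)  # ['x1', 'x2']
```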
The following figure shows the omnibus tests of model coefficients. Three likelihood ratio tests are reported: between steps (Step), between blocks (Block), and between models (Model). Since there is only one group of independent variables in this example and all variables are included in the model by forced entry, the results of the three tests are identical, and the model is statistically significant.
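As a rough sketch of what these tests compute: each likelihood ratio chi-square is the drop in -2 log-likelihood between two nested models. For the constant-only model in this example, the -2 log-likelihood can be worked out from the group sizes alone. The fitted model's value comes from the SPSS output and is not reproduced here, so the number passed in below is a hypothetical placeholder, and the function names are our own.

```python
import math

def null_minus2ll(n_zero, n_one):
    """-2 log-likelihood of the constant-only model."""
    n = n_zero + n_one
    p = n_one / n  # predicted probability of y = 1 for every case
    loglik = n_one * math.log(p) + n_zero * math.log(1 - p)
    return -2.0 * loglik

def lr_chi_square(minus2ll_reduced, minus2ll_full):
    """Likelihood ratio chi-square: improvement of the fuller model."""
    return minus2ll_reduced - minus2ll_full

d0 = null_minus2ll(33, 33)
print(round(d0, 3))  # 91.495

# Hypothetical -2LL for the fitted model, for illustration only
print(round(lr_chi_square(d0, 20.0), 3))  # 71.495
```

With a single step and a single block, all three rows of the table compare the same pair of models, so they produce the same statistic.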
The following figure is the model summary, which mainly shows the -2 log-likelihood value and two coefficients of determination (Cox & Snell R square and Nagelkerke R square). Judging from these values, the model fits the data well.
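The two coefficients of determination in this table are pseudo R-squared values derived from the log-likelihoods. Here is a minimal sketch of the calculation, assuming the standard Cox & Snell and Nagelkerke definitions. The fitted-model value below is a hypothetical placeholder, since the actual figure appears only in the output table, and the function name is our own.

```python
import math

def pseudo_r_squared(minus2ll_null, minus2ll_model, n):
    """Return (Cox & Snell R^2, Nagelkerke R^2) from -2 log-likelihoods."""
    cox_snell = 1.0 - math.exp((minus2ll_model - minus2ll_null) / n)
    # Largest value Cox & Snell can reach for this sample
    max_cox_snell = 1.0 - math.exp(-minus2ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke

n = 66
minus2ll_null = -2.0 * n * math.log(0.5)  # constant-only model, 33 vs 33 cases
minus2ll_model = 20.0                     # hypothetical value, for illustration only

cs, nk = pseudo_r_squared(minus2ll_null, minus2ll_model, n)
print(round(cs, 3), round(nk, 3))
```

Nagelkerke's version rescales Cox & Snell so that a perfect fit gives 1, which is why it is always the larger of the two.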
The following figure shows the classification table for the fitted model. The prediction accuracy has now reached 97%.
The following figure shows the fitted logistic model. From left to right, the table lists the coefficient (B), standard error (S.E.), Wald chi-square value, degrees of freedom (df), significance probability and Exp(B) for each variable and for the constant term. Since all regression coefficients are positive, the corresponding Exp(B) values (odds ratios) are greater than 1, which means that the larger the values of x1, x2 and x3, the more likely "solvency after two years" is relative to "bankruptcy after two years".
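The link between the sign of B and the size of Exp(B) is simple arithmetic: Exp(B) is e raised to the power B, so it exceeds 1 exactly when B is positive. A short sketch with made-up coefficients (not the estimates from the table):

```python
import math

# Hypothetical coefficients, for illustration only
coefficients = {"x1": 0.3, "x2": 0.2, "x3": 1.1}

for name, b in coefficients.items():
    odds_ratio = math.exp(b)  # Exp(B): multiplicative change in the odds per unit increase
    print(name, round(odds_ratio, 3), odds_ratio > 1)
```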
If the predicted probability p is less than 0.5, the case is classified into the "bankruptcy after two years" group; otherwise it is classified into the "solvency after two years" group. The prediction results are shown in the figure below, where PRE_1 is the predicted probability and PGR_1 is the predicted group.
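The classification rule can be sketched in a few lines of Python. The probabilities below are hypothetical stand-ins for the PRE_1 column, and the function name is our own.

```python
def classify(p, cutoff=0.5):
    """Return 0 (bankrupt after two years) if p < cutoff, else 1 (solvent)."""
    return 0 if p < cutoff else 1

# Hypothetical predicted probabilities, standing in for the PRE_1 column
pre_1 = [0.03, 0.48, 0.50, 0.97]
pgr_1 = [classify(p) for p in pre_1]
print(pgr_1)  # [0, 0, 1, 1]
```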
Preview of the next issue: In this issue, we learned how to carry out binary logistic regression in practice. In the next issue, we will learn about clustering and discriminant analysis.
That's all for today's sharing. If you have your own ideas about today's article, please leave us a message. See you tomorrow, and have a happy day!
References: Baidu Baike; 《SPSS 23 统计分析实用教程》 (SPSS 23 Statistical Analysis Practical Tutorial)