1 Introduction
What is a latent variable? What is the latent class logit model? What does the model do? What are the advantages and disadvantages of using this model?
lclogit is a Stata module that uses the EM algorithm to fit latent class conditional logit models, also known as discrete mixture logit models (Pacifico and Yoo, 2013). Traditionally, latent class models are estimated using gradient-based optimization techniques such as the Newton-Raphson or Berndt–Hall–Hall–Hausman (BHHH) algorithm (Berndt et al., 1974). With these techniques, as the number of parameters or latent classes increases, it becomes difficult to locate the maximum of the likelihood and the gradient takes longer to compute. Bhat (1997) and Train (2008) stated that the expectation-maximisation (EM) algorithm can be used in place of these traditional algorithms, as it is numerically more stable and estimates models with a large number of parameters efficiently.
Compared with the mixed logit model, the latent class logit model has a lower computational cost and a shorter processing time. The EM algorithm iterates until the log likelihood converges. The discrete mixture approach is also more flexible and easier to implement.
2 EM algorithm for latent class logit
Train (2008) built on the latent class research of Bhat (1997) to show that the EM algorithm can be used with a large number of parameters. This paper uses the EM algorithm as implemented in Pacifico and Yoo (2013). Let N be the number of agents, J the number of alternatives and T the number of choice scenarios. Also let ynjt be a binary variable equal to 1 if agent n chooses alternative j in scenario t, and 0 otherwise.
$$P_n(\beta_c) = \prod_{t=1}^{T} \prod_{j=1}^{J} \left( \frac{\exp(\beta_c x_{njt})}{\sum_{k=1}^{J} \exp(\beta_c x_{nkt})} \right)^{y_{njt}} \tag{1}$$
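Equation 1 shows the choice probability of a conditional logit model. Here Pn is the probability of agent n's observed sequence of choices, βc is the vector of choice model parameters for class c, xnjt is the vector of attributes of alternative j faced by agent n in scenario t, and C is the number of classes. The remaining symbols are as defined above.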
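As a rough illustration of equation 1, the Python sketch below multiplies the logit probabilities of the chosen alternatives across scenarios. Python is used here only for illustration and is not part of lclogit; the function name, the list-based data layout and the toy attribute values are assumptions made for this example.

```python
import math

def sequence_probability(beta, x, chosen):
    """Probability of one agent's observed choice sequence for one class.

    beta   : list of K coefficients for the class
    x      : x[t][j] is the list of K attributes of alternative j in scenario t
    chosen : chosen[t] is the index of the alternative picked in scenario t
    """
    prob = 1.0
    for t, alternatives in enumerate(x):
        utilities = [sum(b * a for b, a in zip(beta, attrs)) for attrs in alternatives]
        top = max(utilities)  # subtract the maximum before exponentiating, for stability
        weights = [math.exp(u - top) for u in utilities]
        prob *= weights[chosen[t]] / sum(weights)
    return prob

# Hypothetical toy data: 2 scenarios, 4 alternatives, 2 attributes per alternative.
x = [
    [[7, 5], [9, 1], [0, 0], [0, 5]],
    [[7, 0], [9, 5], [0, 1], [0, 5]],
]
chosen = [3, 2]

# With all coefficients at zero every alternative is equally likely: 0.25 * 0.25.
print(sequence_probability([0.0, 0.0], x, chosen))
```

Only the alternative with ynjt = 1 contributes in each scenario, which is why the inner product over j in equation 1 reduces to a single term per scenario in the loop.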
$$\pi_{cn}(\theta) = \frac{\exp(\theta_c z_n)}{1 + \sum_{l=1}^{C-1} \exp(\theta_l z_n)} \tag{2}$$
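Equation 2 gives the weight for class c, πcn(θ), which takes the fractional multinomial logit form. The unconditional likelihood of agent n is the weighted average of equation 1 over the C classes, using these weights. Here zn contains the variables that enter the class membership model, and θ = (θ1, θ2, …, θC−1) denotes the class membership parameters. The parameters of class C are normalized to zero, which gives the 1 in the denominator.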
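To make equation 2 concrete, the following Python sketch computes the class shares for a single agent. It is illustrative only: the function name and data layout are assumptions, and the coefficients are the class membership estimates reported in Table 3 (with Class 5 as the reference class), applied to an agent whose membership variable equals 27, as in the rows listed in Table 1.

```python
import math

def class_shares(theta, z):
    """Class shares for one agent under a multinomial logit membership model.

    theta : list of C-1 coefficient vectors; the last class is the reference
            class, whose coefficients are fixed at zero
    z     : the agent's membership variables, including the constant
    """
    scores = [sum(t * v for t, v in zip(theta_c, z)) for theta_c in theta]
    scores.append(0.0)  # reference class: exp(0) = 1, the "1 +" in the denominator
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Membership coefficients (x1, constant) for Classes 1 to 4, taken from Table 3.
theta = [[-0.011, 0.902], [0.024, -0.556], [-0.022, 0.172], [-0.027, 1.119]]
z = [27, 1]  # membership variable value and the constant

shares = class_shares(theta, z)
print([round(s, 3) for s in shares])
```

The shares sum to one by construction, and an agent with different characteristics would receive a different set of shares.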
$$\ln L(\beta, \theta) = \sum_{n=1}^{N} \ln \sum_{c=1}^{C} \pi_{cn}(\theta) P_n(\beta_c) \tag{3}$$
Equation 3 gives the sample log likelihood, which is obtained by summing the log unconditional likelihood of each agent.
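A minimal Python sketch of equation 3 follows, assuming the class shares and the class-specific sequence probabilities have already been computed for each agent. The function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
import math

def sample_log_likelihood(shares, seq_probs):
    """Sum over agents of the log of the share-weighted sequence probability.

    shares    : shares[n][c] is the share of class c for agent n
    seq_probs : seq_probs[n][c] is the probability of agent n's choice
                sequence under the parameters of class c
    """
    total = 0.0
    for pi_n, p_n in zip(shares, seq_probs):
        total += math.log(sum(pi * p for pi, p in zip(pi_n, p_n)))
    return total

# Hypothetical values for 2 agents and 2 classes.
shares = [[0.5, 0.5], [0.5, 0.5]]
seq_probs = [[0.2, 0.6], [0.4, 0.4]]
print(sample_log_likelihood(shares, seq_probs))
```

The logarithm is taken after averaging over classes, so the log likelihood does not separate into class-specific pieces; this is what makes direct maximization harder than the weighted problems solved in the EM steps.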
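$$\begin{aligned}
\beta^{s+1} &= \arg\max_{\beta} \sum_{n=1}^{N} \sum_{c=1}^{C} h_{cn}(\beta^{s}, \theta^{s}) \ln P_n(\beta_c) \\
\theta^{s+1} &= \arg\max_{\theta} \sum_{n=1}^{N} \sum_{c=1}^{C} h_{cn}(\beta^{s}, \theta^{s}) \ln \pi_{cn}(\theta)
\end{aligned} \tag{4}$$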
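In equation 4, the superscript s denotes the estimates obtained at the sth iteration, and hcn(βs,θs) is the posterior probability that agent n belongs to class c, evaluated at those estimates. The term being maximized with respect to β is the log likelihood function for a logit model with each choice situation of each agent treated as an observation. The posterior probability is given by: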
$$h_{cn}(\beta^{s}, \theta^{s}) = \frac{\pi_{cn}(\theta^{s}) P_n(\beta_c^{s})}{\sum_{l=1}^{C} \pi_{ln}(\theta^{s}) P_n(\beta_l^{s})} \tag{5}$$
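The posterior probability in equation 5 can be sketched in Python as below. This version works with log sequence probabilities, an assumption made here because products over many choice scenarios can underflow to zero if computed directly; the function name and inputs are hypothetical.

```python
import math

def posterior_probabilities(shares, log_seq_probs):
    """Posterior class membership probabilities for one agent (equation 5).

    shares        : prior class shares for the agent, summing to one
    log_seq_probs : log of the agent's sequence probability under each class
    """
    logs = [math.log(pi) + lp for pi, lp in zip(shares, log_seq_probs)]
    top = max(logs)  # rescale so the largest term becomes exp(0) = 1
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical agent: equal prior shares, and a sequence three times as likely
# under class 2 as under class 1. exp(-800) underflows to 0.0 in double
# precision, so the ratio could not be formed without the rescaling.
h = posterior_probabilities([0.5, 0.5], [-800.0, -800.0 + math.log(3.0)])
print(h)
```

Because only the ratios of the terms matter, subtracting the largest log value leaves the posterior unchanged while keeping every exponential within range.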
The updating procedure can be implemented easily in Stata, exploiting the clogit and fmlogit routines as follows. βs+1 is computed by fitting a conditional logit model (clogit) C times, each time using hcn(βs,θs) for a particular c to weight the observations on each agent n. θs+1 is obtained by fitting a fractional multinomial logit model (fmlogit) that takes h1n(βs,θs), h2n(βs,θs), …, hCn(βs,θs) as dependent variables. When zn includes only the constant term, so that each class share is the same for all agents, that is, when πcn(θ) = πc(θ), each class share can be updated directly with the following analytical solution, without fitting the fractional multinomial logit model:
$$\pi_c(\theta^{s+1}) = \frac{\sum_{n=1}^{N} h_{cn}(\beta^{s}, \theta^{s})}{\sum_{l=1}^{C} \sum_{n=1}^{N} h_{ln}(\beta^{s}, \theta^{s})} \tag{6}$$
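The share update in equation 6 can be illustrated with a short Python sketch. To isolate that update, the sketch holds the class-specific sequence probabilities fixed (in the full algorithm they change as β is re-estimated at each iteration) and assumes a constant-only membership model. All names and numbers are hypothetical.

```python
import math

def e_step(shares, seq_probs):
    """Posterior class probabilities for every agent, given common class shares."""
    posteriors = []
    for p_n in seq_probs:
        joint = [pi * p for pi, p in zip(shares, p_n)]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors

def update_shares(posteriors):
    """Equation 6: each posterior sums to one per agent, so the denominator is N."""
    n_agents = len(posteriors)
    n_classes = len(posteriors[0])
    return [sum(h[c] for h in posteriors) / n_agents for c in range(n_classes)]

def log_likelihood(shares, seq_probs):
    return sum(math.log(sum(pi * p for pi, p in zip(shares, p_n))) for p_n in seq_probs)

# Hypothetical sequence probabilities for 4 agents and 2 classes, held fixed.
seq_probs = [[0.02, 0.30], [0.25, 0.01], [0.10, 0.12], [0.40, 0.05]]
shares = [0.5, 0.5]
history = []
for _ in range(20):
    history.append(log_likelihood(shares, seq_probs))
    shares = update_shares(e_step(shares, seq_probs))
print(shares)
```

The recorded log likelihood never decreases from one iteration to the next, which is the property that makes the EM algorithm numerically stable.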
3 The lclogit command
The lclogit Stata command is a user-written program that offers a numerically stable, faster and more cost-effective method for the nonparametric estimation of mixing distributions. These characteristics allow the command to estimate models with a large number of latent classes in a short period of time. In addition, it works with log probabilities and uses Stata's generate command, which further reduces the estimation time. Because lclogit does not have its own maximum likelihood evaluator, it uses that of clogit.
The results are displayed in a table by using the estimates store and estimates table programs, with the columns labelled as the classes. If there are 20 or more latent classes, the results are displayed as a matrix rather than as a table.
Pacifico and Yoo (2013) stated that the lclogit command requires certain options. The group() and id() options take numeric variables that identify the choice occasions and the choice makers respectively. If cross-sectional data are being used, the same variable can be specified for both. Another option, nclasses(#), sets the number of latent classes, which can be selected by using the CAIC and BIC information criteria. The convergence(#) option sets the tolerance for the log likelihood: when the change in the log likelihood falls below this threshold, convergence is declared and the iterations stop. The iterate(#) option specifies the maximum number of iterations. The membership(varlist) option specifies the independent variables, which are constant within each agent, for the fractional multinomial logit model of class membership.
4 Post-estimation command: lclogitpr
Pacifico and Yoo (2012) stated that the probabilities of selecting each alternative in a choice occasion can be predicted with lclogitpr. The options for lclogitpr include:
• class(numlist) specifies the classes for which the probabilities are predicted;
• pr0 predicts the unconditional choice probability;
• up predicts the class shares, or prior probabilities that the agent is in particular classes.
5 Post-estimation command: lclogitcov
This command predicts the variances and covariances of the choice model coefficients.
"The default setting stores the predicted variances in a set of variables named var_1, var_2, …, where var_k is the predicted variance of the coefficient on the kth variable listed in varlist, and to store the predicted covariances in cov_12, cov_13, …, cov_23, …, where cov_kj is the predicted covariance between the coefficients on the kth variable and the jth variable in varlist." (Pacifico and Yoo, 2012, p. 631)
• nokeep reports the average covariance matrix and drops the predicted variances and covariances from the data;
• varname(stubname) requests that the predicted variances be saved as stubname1, stubname2, ….
• covname(stubname) requests that the predicted covariances be saved as stubname12, stubname13, ….
• matrix(name) stores the reported average covariance matrix in a Stata matrix called name.
6 Application
The lclogit command is applied here to the example that Pacifico and Yoo (2013) used to estimate a latent class logit model. The data in the example are used to determine households' preferences over electricity suppliers. The example consists of 100 customers who each face up to 12 choice situations with 4 suppliers, of which they can choose only one. The data contain the price of the contract (price); the length of the contract that the supplier offered, in years (contract); whether the supplier is a local company (local); whether the supplier is a well-known company (wknown); whether the supplier offers a time-of-day rate instead of a fixed rate (tod); and whether the supplier offers a seasonal rate instead of a fixed rate (seasonal). The data can be seen in Table 1, where y is the dummy variable for choice, and pid and gid are numeric variables that identify the agents and the choice situations respectively.
Table 1: Variables and Data
      y  price  contract  local  wknown  tod  seasonal  gid  pid  _x1
 1.   0      7         5      0       1    0         0    1    1   27
 2.   0      9         1      1       0    0         0    1    1   27
 3.   0      0         0      0       0    0         1    1    1   27
 4.   1      0         5      0       1    1         0    1    1   27
 5.   0      7         0      0       1    0         0    2    1   27
 6.   0      9         5      0       1    0         0    2    1   27
 7.   1      0         1      1       0    1         0    2    1   27
 8.   0      0         5      0       0    0         1    2    1   27
 9.   0      9         5      0       0    0         0    3    1   27
10.   0      7         1      0       1    0         0    3    1   27
11.   0      0         0      0       1    1         0    3    1   27
12.   1      0         0      1       0    0         1    3    1   27
The above table was derived by using the following commands in Stata:
• use http://fmwww.bc.edu/repec/bocode/t/traindata.dta
• set seed 1234567890
• by pid, sort: egen _x1=sum(round(rnormal(0.5),1))
• list in 1/12, sepby(gid)
The CAIC and BIC information criteria were used to select the optimal number of latent classes. The estimation results can be seen in Table 2. The CAIC decreases from 2337.273 to 2292.538 when the fifth class is added and increases to 2313.103 when the sixth class is added, so it is minimized with five classes. The BIC behaves in the same way, except that it is minimized with seven classes (2248.605). This example, however, uses five classes. The following commands were used in Stata:
• forvalues c = 2/10 {
• quietly lclogit y price contract local wknown tod seasonal, group(gid) id(pid) nclasses(`c') membership(_x1) seed(1234567890)
• matrix b = e(b)
• matrix ic = nullmat(ic) \ `e(nclasses)', `e(ll)', `=colsof(b)', `e(caic)', `e(bic)'
• }
(output omitted)
• matrix colnames ic = "Classes" "LLF" "Nparam" "CAIC" "BIC"
• matlist ic, name(columns)
Table 2: Number of Class Selection
Classes  LLF        Nparam  CAIC      BIC
2        -1211.232  14      2500.935  2486.935
3        -1117.521  22      2358.356  2336.356
4        -1084.559  30      2337.273  2307.273
5        -1039.771  38      2292.538  2254.538
6        -1027.633  46      2313.103  2267.103
7        -999.9628  54      2302.605  2248.605
8        -987.7199  62      2322.96   2260.96
9        -985.1933  70      2362.748  2292.748
10       -966.3487  78      2369.901  2291.901
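The criteria in Table 2 can be checked by hand. The Python sketch below assumes the standard definitions BIC = −2 ln L + m ln N and CAIC = −2 ln L + m(1 + ln N), where m is the number of parameters and N = 100 is the number of agents; these definitions are an assumption, not something the lclogit output states, although they agree with the tabulated BIC values up to rounding. The log likelihoods and parameter counts are taken from Table 2.

```python
import math

def bic(log_lik, n_params, n_agents):
    """Bayesian information criterion (assumed definition)."""
    return -2.0 * log_lik + n_params * math.log(n_agents)

def caic(log_lik, n_params, n_agents):
    """Consistent Akaike information criterion (assumed definition)."""
    return -2.0 * log_lik + n_params * (1.0 + math.log(n_agents))

# (classes, log likelihood, number of parameters) from Table 2.
fits = [
    (2, -1211.232, 14), (3, -1117.521, 22), (4, -1084.559, 30),
    (5, -1039.771, 38), (6, -1027.633, 46), (7, -999.9628, 54),
    (8, -987.7199, 62), (9, -985.1933, 70), (10, -966.3487, 78),
]
N_AGENTS = 100

best_caic = min(fits, key=lambda f: caic(f[1], f[2], N_AGENTS))[0]
best_bic = min(fits, key=lambda f: bic(f[1], f[2], N_AGENTS))[0]
print(best_caic, best_bic)  # 5 7
```

Both criteria penalize the log likelihood for each additional parameter; the CAIC penalty is larger by one unit per parameter, which is why it favours fewer classes than the BIC.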
Table 3 gives the estimated model with 5 classes. Class 1 is the largest class, with an average share of 30 percent. The reported class shares are averages over agents, because the class shares are now agent-specific. Agent-level probabilities are predicted by using the lclogitpr command.
• by `e(id)', sort: generate first = _n==1
• lclogitpr cp, cp
• egen double cpmax = rowmax(cp1-cp5)
Table 3: Choice Model parameters and average class share
Variable     Class1   Class2   Class3   Class4   Class5
price        -0.315   -0.562   -0.887   -1.497   -0.762
contract      0.025   -0.083   -0.470   -0.380   -0.538
local         3.072    4.512    0.400    0.803    0.526
wknown        2.256    3.405    0.424    1.075    0.317
tod          -2.183   -7.872   -8.245  -15.229   -5.356
seasonal     -2.484   -7.705   -6.225  -14.419   -7.760
Class Share   0.300    0.174    0.112    0.254    0.160

Variable     Class1   Class2   Class3   Class4   Class5
_x1          -0.011    0.024   -0.022   -0.027    0.000
_cons         0.902   -0.556    0.172    1.119    0.000
• summarize cpmax if first, sep(0)
The posterior class membership probabilities predicted by lclogitpr indicate how well the model differentiates among the classes. The mean highest posterior probability of about 0.96 shows that the model did a good job of differentiating the preferences of each class. This can be seen in Table 4. The model was fitted with the following Stata command: lclogit y price contract local wknown tod seasonal, group(gid) id(pid) nclasses(5) membership(_x1) seed(1234567890)
Table 4: Fitness of Model
Variable  Obs  Mean      Std. Dev.  Min       Max
cpmax     100  .9596674  .0860159   .5899004  1
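The quantity summarized in Table 4 is each agent's highest posterior class probability. A minimal Python sketch of that calculation follows; the posterior values are hypothetical and the function name is an assumption for this example.

```python
def highest_posterior(posteriors):
    """Return (class number, probability) of the most likely class for each agent."""
    result = []
    for h in posteriors:
        best = max(range(len(h)), key=lambda c: h[c])
        result.append((best + 1, h[best]))  # classes are numbered from 1
    return result

# Hypothetical posterior probabilities for 3 agents and 3 classes.
posteriors = [
    [0.90, 0.05, 0.05],
    [0.10, 0.60, 0.30],
    [0.02, 0.01, 0.97],
]
assigned = highest_posterior(posteriors)
mean_cpmax = sum(p for _, p in assigned) / len(assigned)
print(assigned, mean_cpmax)
```

A mean close to 1 indicates that most agents are placed in a single class with little ambiguity, whereas a mean near 1/C would indicate that the classes are hard to tell apart.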
Each respondent is assigned to the class for which that agent has the highest posterior probability (Pacifico and Yoo, 2013). This is done so that the choice outcomes within the model can be predicted. The unconditional probability of the chosen alternative, and its probability conditional on being in the assigned class, are then computed for each class.
• lclogitpr pr, pr
• generate byte class = .
(4780 missing values generated)
• forvalues c = 1/`e(nclasses)' {
• quietly replace class = `c' if cpmax==cp`c'
• }
• forvalues c = 1/`e(nclasses)' {
• quietly summarize pr if class == `c' & y==1
• local n=r(N)
• local a=r(mean)
• quietly summarize pr`c' if class == `c' & y==1
• local b=r(mean)
• matrix pr = nullmat(pr) \ `n', `c', `a', `b'
• }
• matrix colnames pr = "Obs" "Class" "Uncond_Pr" "Cond_PR"
• matlist pr, name(columns)
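Table 5 shows the conditional and unconditional probabilities of the model. Pacifico and Yoo (2013) give benchmark values of 0.5 for the average conditional probability and 0.25 for the unconditional probability, the latter being the probability of each alternative when the four suppliers are equally likely to be chosen. The unconditional probabilities in the table all exceed 0.25, and the conditional probabilities exceed 0.5 in every class except Class 2, which indicates that the model predicts the observed choices well. The following Stata code was used: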
• matrix list e(PB)
• e(PB)[1,6]
Table 5: Conditional and Unconditional Probabilities
Obs  Class  Uncond Pr  Cond Pr
129  1      .3364491   .5387555
336  2      .3344088   .4585939
191  3      .3407353   .5261553
300  4      .4562778   .7557497
239  5      .4321717   .6582177
7 Conclusion
lclogit is a Stata command that uses the EM algorithm to estimate discrete mixing distributions in choice models. The algorithm allows models with a large number of parameters to be estimated in a shorter period of time and at a lower computational cost, and it makes the estimation numerically more stable. The CAIC and BIC are used to select the number of latent classes, and the posterior class membership probabilities describe how well the model differentiates the preferences of each class.
References
Berndt, E. R., Hall, B. H., Hall, R. E., and Hausman, J. A. (1974). Estimation and inference in nonlinear structural models. In Annals of Economic and Social Measurement, Volume 3, number 4, pages 653–665. NBER.
Bhat, C. R. (1997). An endogenous segmentation mode choice model with an application to intercity travel. Transportation Science, 31(1):34–48.
Pacifico, D. and Yoo, H. I. (2012). A Stata module for estimating latent class conditional logit models via the expectation-maximization algorithm. Technical report, School of Economics, The University of New South Wales.
Pacifico, D. and Yoo, H. I. (2013). lclogit: A Stata command for fitting latent-class conditional logit models via the expectation-maximization algorithm. The Stata Journal, 13(3):625–639.
Train, K. E. (2008). EM algorithms for nonparametric estimation of mixing distributions. Journal of Choice Modelling, 1(1):40–69.