Provide a clear explanation of what is meant by “left censored” and “right censored” survival times, and illustrate your answer with some examples of how each may arise in a social science context.

Suppose that you have continuous time unemployment spell data. The data were derived using a stock sample with “follow-up” (i.e. interviews some time after the stock sampling date). You also know the date of the interview, at which time information about characteristics were collected, and whether or not the spell in progress at the stock sampling date was still in progress and, if not, the date the spell ended. By deduction, you can calculate the length of time between the stock sample date and the date at which each person was last observed to be unemployed (the interview date for those still unemployed; or some date between the stock sample date and interview date for those who got a job). However, you don’t know the date at which each person’s spell began, and nor therefore the length of each person’s unemployment spell in total from start until last observed. With reference to expressions for the sample log-likelihood function, show that it is possible to estimate the parameters of an Exponential hazard regression model in this case. Also discuss, giving reasons, whether you could estimate a Weibull model with the same data.

[adapted from Wooldridge (2002, Ex. 20.3)] Assume that you have a random sample from the inflow to the state, and all survival times are right-censored.

(i) Write down the sample log-likelihood function for this situation.

(ii) Derive the special case of the likelihood function given in (i) when survival times follow the Gompertz distribution. [Recall that the Gompertz model has hazard function θ(t, X) = λ exp(gt), where λ = exp(b0 + b1X1 + b2X2 + … + bkXk) and shape parameter g > 0.]

(iii) Consider the Gompertz model in which the covariate vector X only contains a constant. Show that the Gompertz log likelihood cannot be maximized for real numbers b0 and g.

(iv) From (iii), what do you conclude about estimating duration models from inflow sample data when all survival times are right censored?


## Left censored and Right censored

When we observe durations, the observation period runs from the time the study begins (time zero) to the time it terminates (time T0 in Figure 01). In many cases, however, the entities under consideration (people or devices) do not reach the event of interest within that period, and we then say that the observation has been suspended, truncated or censored. In many areas of social science and life testing, subjects may leave or enter after the study has started: a subject may leave the study before it is completed (for example, because of death) or may enter it late. Censoring also occurs when a subject is lost to follow-up. To analyse such data we need to distinguish left censoring from right censoring.

Censored data are observations that are only partially known, and the label ‘left’ or ‘right’ indicates on which side of the time axis the information is missing. Stephen P. Jenkins, in his ‘Survival Analysis’, writes:

“A survival time is censored if all that is known is that it began or ended within some particular interval of time, and thus the total spell length (from entry time until transition) is not known exactly.”

(Jenkins 2005, p. 4)

Censoring is a major problem in social science, yet it is very common, because the spells we study often do not end within the observation period.

## Left Censored

Left censoring refers to an event that occurs at some time before a left bound. In this case we do not know the time at which the spell started. (Samartzis 2005-06)

In such a situation we know that the datum lies below a certain value, but not by how much.

For example, a pathology report may confirm that a patient is suffering from cancer, but we have no idea when the disease began.

Figure 01 illustrates the censoring situations. Each “X” marks a point in time at which we actually start or stop monitoring a censored entity, other than the beginning of the entity’s life (time zero) and the end of the experimental observation period (time T0). Line “C” completes its spell; all the other entities are interrupted.

Here, “a” shows an entity that had already been “operating” for some unknown period of time before we started monitoring it. This case is called “left-censoring.” (Romeu n. d.)

## Figure 01: Left and Right Censoring

In short, left censoring means that the information is missing on the left-hand side of the spell. Ignoring this type of censoring gives rise to ‘selectivity bias’: the mean duration is overestimated, because longer spells are more likely to be observed than shorter ones. (Amemiya 1999)
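The over-representation of long spells can be illustrated with a small simulation. This is only a sketch under assumed values: spell lengths are drawn from an Exponential distribution with mean 10, start dates are uniform over a long window, and the function name and all numbers are hypothetical.

```python
import random


def simulate_stock_sample(n_spells=200_000, mean_duration=10.0,
                          window=1_000.0, sample_date=500.0, seed=0):
    """Compare the mean spell length of all spells with the mean among
    spells that are in progress at a stock sampling date."""
    rng = random.Random(seed)
    all_durations = []
    stock_durations = []
    for _ in range(n_spells):
        start = rng.uniform(0.0, window)                  # spell start date
        duration = rng.expovariate(1.0 / mean_duration)   # total spell length
        all_durations.append(duration)
        # A spell is picked up by the stock sample only if it is
        # in progress at the sampling date.
        if start <= sample_date < start + duration:
            stock_durations.append(duration)
    inflow_mean = sum(all_durations) / len(all_durations)
    stock_mean = sum(stock_durations) / len(stock_durations)
    return inflow_mean, stock_mean


inflow_mean, stock_mean = simulate_stock_sample()
print(f"mean duration, all spells:         {inflow_mean:.2f}")
print(f"mean duration, spells in progress: {stock_mean:.2f}")
```

Under these assumptions the second mean should come out well above the first, which is the bias that arises if the sampled spells are treated as if they were a random sample of all spells.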

## Right Censored

Right censoring refers to an event that occurs at some time after a right bound. In this case we do not know the time at which the spell ended. (Samartzis 2005-06)

Right censoring is very common in duration models and survival analysis, because in many cases a survival time is known only to be larger than some given value. The only information we then have is the bound itself.

For example, suppose we put 500 light bulbs on test and terminate the experiment after an assigned period of time. Censoring occurs on the right, because we know exactly when every bulb started but not when the bulbs still burning at termination will fail.

In Figure 01, line “b” shows an entity that has been monitored since the beginning of its life (i.e. from the start of the experiment) but which we cease to observe before it fails or the experiment ends (time T0). That is, we observe the entity for some time, after which we are no longer able to monitor it. This other type of truncation is known as “right censoring.” (Romeu n. d.)

## Comparison between left and right censoring with the help of an example

Suppose a social scientist is interested in analysing the adverse effects of illegal drug use in a particular area (say, Colchester). The researcher wishes to determine the distribution of the time until first Marijuana use among high school boys in that area. The question put to the boys is:

“When did you first use Marijuana?”

Let us consider two hypothetical replies:

## Respondent 01:

I have used it but cannot remember just when the first time was.

## Respondent 02:

I never used it.

In the case of the first respondent the event has occurred, but the date at which he first used Marijuana is unknown: all we know is that it was before the interview. This is an example of left censoring.

In the second case, by contrast, the event has not yet occurred, but the respondent may use Marijuana at some future date. Here the censoring occurs on the right, so this is an example of right censoring. (Klein and Moeschberger 2003, p. 70-71)
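One way to record the two replies in a data set is as bounds on the age at first use. The sketch below is illustrative only: the class name, the field names and the interview age of 16 are assumptions, not part of the original example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    lower: float            # event time is known to be at least this
    upper: Optional[float]  # event time is known to be at most this (None = no upper bound)

    def censoring(self) -> str:
        if self.upper is None:
            return "right censored"    # event not yet observed
        if self.lower == self.upper:
            return "exact"             # event time known exactly
        if self.lower == 0.0:
            return "left censored"     # event occurred, only an upper bound known
        return "interval censored"     # event known to lie between two bounds


age_at_interview = 16.0  # hypothetical age of both respondents

# Respondent 01: has used it, but cannot remember when.
respondent_01 = Observation(lower=0.0, upper=age_at_interview)
# Respondent 02: has never used it so far.
respondent_02 = Observation(lower=age_at_interview, upper=None)

print(respondent_01.censoring())
print(respondent_02.censoring())
```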

## (b) Stock Sample with “follow-up”

The important things to be considered in this example are:

The data are continuous time unemployment spell data.

The data were derived using a stock sample with follow-up, which is a form of left truncation (delayed entry) and is handled in the same way. This type of data is commonly used by economists. (Jenkins 2005, p. 5)

Some of the spells in progress at the stock sampling date are still in progress at the interview, which means that some observations are right censored.

Let us define,

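Ti = Total spell length (from spell start until last observed)

f(Ti) = Probability density function (slope of the failure function) at time Ti

S(Ti) = Survival function at time Ti

θ(Ti) = Hazard function at time Ti

Δti = Length of the spell at the date the stock sample was drawn, so that S(Δti) is the survival function at that date

Ci = Censoring indicator (1 if the spell is observed to end, 0 if it is right censored)

Xi = Vector of observed covariates

b = Vector of parameters to be estimated

N = Sample size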

There are two types of contributors to the likelihood:

Those who leave the state of interest (Ci = 1).

Those who stay in the state of interest (Ci = 0).

So the likelihood function will be,

$$
\mathcal{L} = \prod_{i=1}^{N} \left[ \frac{f(T_i)}{S(\Delta t_i)} \right]^{C_i} \prod_{i=1}^{N} \left[ \frac{S(T_i)}{S(\Delta t_i)} \right]^{1 - C_i}
$$

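Now, since f(t) = θ(t) S(t) by the definition of the hazard function, we have

$$
\mathcal{L} = \prod_{i=1}^{N} \left[ \theta(T_i) \right]^{C_i} \frac{S(T_i)}{S(\Delta t_i)}
$$

or, taking logs,

$$
\log \mathcal{L} = \sum_{i=1}^{N} \left\{ C_i \log \theta(T_i) + \log S(T_i) - \log S(\Delta t_i) \right\} \qquad \text{[Equation no – 01]}
$$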
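The log-likelihood in Equation no – 01 can be written as a short function that accepts any hazard and survival function. This is a minimal sketch: the function name, the argument names and the three example spells are hypothetical.

```python
import math
from typing import Callable, Sequence


def stock_sample_loglik(
    T: Sequence[float],        # total spell length until last observed
    delta_t: Sequence[float],  # spell length at the stock sample date
    C: Sequence[int],          # 1 = exit observed, 0 = right censored
    hazard: Callable[[float], float],
    survival: Callable[[float], float],
) -> float:
    """Sum of C_i log hazard(T_i) + log S(T_i) - log S(delta_t_i)."""
    total = 0.0
    for T_i, d_i, C_i in zip(T, delta_t, C):
        total += C_i * math.log(hazard(T_i))
        total += math.log(survival(T_i)) - math.log(survival(d_i))
    return total


# Usage with a constant hazard (an assumed rate of 0.1 per period).
rate = 0.1
ll = stock_sample_loglik(
    T=[12.0, 30.0, 7.5],
    delta_t=[4.0, 18.0, 1.5],
    C=[1, 0, 1],
    hazard=lambda t: rate,
    survival=lambda t: math.exp(-rate * t),
)
print(ll)
```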

Equation no – 01 is the sample log-likelihood function for this sampling scheme. We can now substitute the Exponential and Weibull models into it in turn.

## For Exponential Model case:

We know that the Exponential model has the following hazard function:

$$
\theta(T_i) = \lambda_i, \qquad \lambda_i = \exp(b'X_i)
$$

Now, by definition the survival function can be obtained from the hazard function by the equation below:

$$
S(t) = \exp\left( -\int_0^t \theta(u)\,du \right) \qquad \text{[Equation no – 02]}
$$

So the survival function of the Exponential model is S(t) = exp(–λt). Substituting the hazard and survival functions of the Exponential model into the log-likelihood function (Equation no – 01), we get the log-likelihood of the Exponential hazard regression model:

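$$
\log \mathcal{L} = \sum_{i=1}^{N} \left\{ C_i \log \lambda_i + \log\left[\exp(-\lambda_i T_i)\right] - \log\left[\exp(-\lambda_i \Delta t_i)\right] \right\}
$$

or

$$
\log \mathcal{L} = \sum_{i=1}^{N} \left\{ C_i\, b'X_i - \lambda_i T_i + \lambda_i \Delta t_i \right\} = \sum_{i=1}^{N} \left\{ C_i\, b'X_i - \lambda_i \left( T_i - \Delta t_i \right) \right\}
$$

The log-likelihood therefore depends on Ti and Δti only through the difference Ti – Δti, which is the time between the stock sample date and the date at which the person was last observed to be unemployed. That difference is known even though the start date of each spell is not, so the parameters of the Exponential hazard regression model can be estimated from these data.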
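A small simulation can illustrate estimation in the Exponential case. The sketch assumes no covariates, so that λ is a single number and maximising the log-likelihood gives the closed form λ̂ = ΣCi / Σ(Ti – Δti). The function names, the true rate of 0.1 and the 12-period follow-up are illustrative assumptions. The estimator only ever sees the time observed after the stock sample date and the exit indicator, never the spell start dates.

```python
import random


def simulate_follow_up(rate=0.1, follow_up=12.0, n_spells=200_000,
                       window=1_000.0, sample_date=500.0, seed=1):
    """Stock sample with follow-up: return the time observed after the
    stock sample date and an exit indicator for each sampled spell."""
    rng = random.Random(seed)
    observed, exited = [], []
    for _ in range(n_spells):
        start = rng.uniform(0.0, window)
        duration = rng.expovariate(rate)
        if not (start <= sample_date < start + duration):
            continue  # spell not in progress at the stock sample date
        remaining = start + duration - sample_date
        # Observed time is T_i - delta_t_i, capped at the interview date.
        observed.append(min(remaining, follow_up))
        exited.append(1 if remaining <= follow_up else 0)
    return observed, exited


def exponential_mle(observed, exited):
    """Constant-only Exponential MLE: number of exits / total time at risk."""
    return sum(exited) / sum(observed)


observed, exited = simulate_follow_up()
rate_hat = exponential_mle(observed, exited)
print(f"estimated rate: {rate_hat:.4f} (true rate assumed in simulation: 0.1)")
```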

## For Weibull Model case:

The Exponential model is a special case of the Weibull model, which has the following hazard function:

$$
\theta(T_i) = \lambda_i \alpha T_i^{\alpha - 1}, \qquad \lambda_i = \exp(b'X_i)
$$

When α = 1 the hazard reduces to that of the Exponential model. From Equation no – 02, the survival function of the Weibull model is

$$
S(t) = \exp\left( -\lambda t^{\alpha} \right)
$$

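Substituting these into the log-likelihood function (Equation no – 01), we get the Weibull log-likelihood:

$$
\log \mathcal{L} = \sum_{i=1}^{N} \left\{ C_i \log\left[ \lambda_i \alpha T_i^{\alpha - 1} \right] + \log\left[ \exp(-\lambda_i T_i^{\alpha}) \right] - \log\left[ \exp(-\lambda_i \Delta t_i^{\alpha}) \right] \right\}
$$

or

$$
\log \mathcal{L} = \sum_{i=1}^{N} \left\{ C_i\, b'X_i + C_i \log \alpha + C_i (\alpha - 1) \log T_i - \lambda_i T_i^{\alpha} + \lambda_i \Delta t_i^{\alpha} \right\}
$$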

Unlike the Exponential case, the Weibull log-likelihood does not reduce to a function of Ti – Δti alone: the terms in log Ti and in Ti^α and Δti^α require Ti and Δti separately whenever α ≠ 1. Since the date at which each spell began is unknown, Ti and Δti are not observed, so the Weibull model cannot be estimated with these data unless α is fixed at 1, which is the Exponential model again.

If the spell start dates were available, both models could be estimated, and the choice between them would depend on the estimate of α and its t-statistic (p-value). If α is significantly different from 1 (for example, greater than 1), it is wise to prefer the Weibull model to the Exponential model.
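How the Weibull contribution behaves when only the time since the stock sample date is known can be checked numerically. In this sketch the function names and all parameter values are hypothetical. The time since the stock sample date is held at 6 periods and the unobserved elapsed length Δt is varied.

```python
import math


def weibull_contribution(time_since_sample, delta_t, exited, lam, alpha):
    """One observation's term in the Weibull stock-sample log-likelihood."""
    T = delta_t + time_since_sample  # total spell length
    log_hazard = math.log(lam) + math.log(alpha) + (alpha - 1.0) * math.log(T)
    log_surv_ratio = -lam * T ** alpha + lam * delta_t ** alpha
    return exited * log_hazard + log_surv_ratio


def spread(alpha, lam=0.1, time_since_sample=6.0):
    """Range of the contribution across different unobserved delta_t."""
    values = [
        weibull_contribution(time_since_sample, d, 1, lam, alpha)
        for d in (1.0, 10.0, 50.0)
    ]
    return max(values) - min(values)


print(f"alpha = 1.0: spread across delta_t = {spread(1.0):.6f}")
print(f"alpha = 1.5: spread across delta_t = {spread(1.5):.6f}")
```

With α = 1 the contribution should be the same whatever Δt is, whereas with α = 1.5 it changes with Δt, so the likelihood cannot be evaluated without the spell start dates.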

## (c) [Adapted from Wooldridge (2002, Ex. 20.3)]

Models with censoring and time-varying covariates cannot be handled by the Ordinary Least Squares (OLS) method; they are estimated by the Maximum Likelihood (ML) method instead. Before estimating, however, we should identify the process that generated the data, i.e. the type of sampling scheme.

A random sample from the inflow to the state is one of the five sampling schemes analysed in social science. (Jenkins 2005, p. 61)

Given the random sample, let

Xi = Vector of observed covariates

θ = Vector of unknown parameters

N = Random sample size

ti = Observed spell length (survival time)

Ci= Censoring indicator

Ci = 1 if uncensored

Ci = 0 if censored

The contribution of observation i to the conditional likelihood can be written as

$$
f(t_i \mid X_i, \theta)^{C_i} \left[ 1 - F(t_i \mid X_i, \theta) \right]^{1 - C_i}
$$

where the contributions of uncensored and censored subjects enter in product form. (Cox and Oakes 1992, p. 33)

## (i)

If all observations are right censored, Ci = 0 for every i and hence the log-likelihood function is

$$
\sum_{i=1}^{N} \log\left[ 1 - F(t_i \mid X_i, \theta) \right] \qquad \text{[Equation no – 03]}
$$

## (ii)

The Gompertz model has hazard function θ(t, X) = λ exp(gt),

where λ = exp(b0 + b1X1 + b2X2 + … + bkXk) and the shape parameter g > 0.

By definition, survival function S(t) is

$$
S(t) = \exp\left( -\int_0^t \theta(u)\,du \right) \qquad \text{[recall Equation no – 02]}
$$

Now the survival function in Gompertz model is

$$
S(t) = \exp\left[ -\frac{\lambda}{g} \exp(gt) + \frac{\lambda}{g} \right] = \exp\left[ \frac{\lambda}{g} \left\{ 1 - \exp(gt) \right\} \right]
$$

And consequently the failure function is

$$
F(t) = 1 - \exp\left[ \frac{\lambda}{g} \left\{ 1 - \exp(gt) \right\} \right]
$$
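The closed form for the Gompertz survival function can be checked against a direct numerical integration of the hazard, as in Equation no – 02. This is a sketch only: the function names and the values λ = 0.05, g = 0.2 and t = 5 are assumptions chosen for illustration.

```python
import math


def gompertz_hazard(t, lam, g):
    return lam * math.exp(g * t)


def gompertz_survival(t, lam, g):
    """Closed form: S(t) = exp[(lam / g) * (1 - exp(g * t))]."""
    return math.exp((lam / g) * (1.0 - math.exp(g * t)))


def survival_by_integration(t, lam, g, steps=10_000):
    """S(t) = exp(-integral of the hazard from 0 to t), trapezoid rule."""
    h = t / steps
    total = 0.5 * (gompertz_hazard(0.0, lam, g) + gompertz_hazard(t, lam, g))
    for k in range(1, steps):
        total += gompertz_hazard(k * h, lam, g)
    return math.exp(-h * total)


closed_form = gompertz_survival(5.0, lam=0.05, g=0.2)
numerical = survival_by_integration(5.0, lam=0.05, g=0.2)
print(closed_form, numerical)
```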

So the log-likelihood function for Gompertz distribution (from Equation no – 03) is

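$$
\begin{aligned}
\sum_{i=1}^{N} \log\left[ 1 - 1 + \exp\left[ \frac{\lambda_i}{g} \left\{ 1 - \exp(g t_i) \right\} \right] \right]
&= \sum_{i=1}^{N} \log\left[ \exp\left[ \frac{\lambda_i}{g} \left\{ 1 - \exp(g t_i) \right\} \right] \right] \\
&= \sum_{i=1}^{N} \frac{\lambda_i}{g} \left\{ 1 - \exp(g t_i) \right\} \qquad \text{[Equation no – 04]}
\end{aligned}
$$

Here λi is λ evaluated at the covariates Xi, and ti is the censored survival time of observation i.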

## (iii)

When the covariate vector Xi contains only a constant, λ = exp(b0), whereas in general λ = exp(b0 + b1X1 + b2X2 + … + bkXk). In this case the hazard depends on the covariates only through the constant term b0.

Hence the log-likelihood function (from Equation no – 04) is

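$$
\begin{aligned}
\sum_{i=1}^{N} \frac{\lambda}{g} \left\{ 1 - \exp(g t_i) \right\}, \qquad \lambda = \exp(b_0) \\
= \sum_{i=1}^{N} \frac{\exp(b_0)}{g} \left\{ 1 - \exp(g t_i) \right\} \qquad \text{[Equation no – 05]}
\end{aligned}
$$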

For positive t and g, the term {1 – exp(gt)} is always negative, and consequently Equation no – 05 is negative for every real b0 and every g > 0. Because the log-likelihood is proportional to exp(b0), it can be pushed closer to its upper bound of zero only by making exp(b0) smaller, that is, by making b0 more negative.

But exp(b0) → 0 only as b0 → – ∞, and exp(b0) > 0 for every real number b0. So for any positive value of g (and t is also positive) the log-likelihood keeps increasing as b0 takes more and more negative values without bound, and the upper bound of zero is never attained.

Moving in the other direction does not help either: as b0 → ∞, exp(b0) → ∞ and the log-likelihood falls without bound.

Hence, the Gompertz log-likelihood cannot be maximized for real numbers b0 and g.
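The behaviour of Equation no – 05 as b0 varies can be checked numerically. The three censored durations and the value g = 0.3 below are hypothetical, and the function name is an assumption made for this sketch.

```python
import math


def gompertz_censored_loglik(b0, g, times):
    """Equation no - 05: all spells right censored, constant-only model."""
    lam = math.exp(b0)
    return sum((lam / g) * (1.0 - math.exp(g * t)) for t in times)


times = [2.0, 5.0, 9.0]  # hypothetical censored survival times
g = 0.3
b0_grid = [0.0, -2.0, -4.0, -8.0, -16.0]
values = [gompertz_censored_loglik(b0, g, times) for b0 in b0_grid]

for b0, value in zip(b0_grid, values):
    print(f"b0 = {b0:6.1f}  log-likelihood = {value:.8f}")
```

Every value should be negative and closer to zero than the one before it, so no finite b0 on the grid, or beyond it, attains a maximum.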

## (iv)

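From (iii) we observed that the Gompertz log-likelihood cannot be maximised for real numbers b0 and g. So it is not possible to estimate the Gompertz model from inflow sample data when all survival times are right censored. The same problem can be expected for other duration models: with no observed exits, the log-likelihood is a sum of log survival probabilities, which is pushed towards zero by making the hazard arbitrarily small, so the data cannot pin down the parameters. Strictly speaking, what (iii) shows is the special case in which all survival times are right censored and the covariate vector Xi contains only a constant.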

## (d) References

Amemiya T. (1999), “A note on left censoring”, Analysis of Panels and Limited Dependent Variables Models, Edited by Hsiao, C., Lahiri, K., Lee, Lung-Fei, and Pesaran, M. H., Cambridge: Cambridge University Press.

Cox, D. R. and Oakes, D. (1992), Analysis of Survival Data, 1st edition (Reprinted by University Press, Cambridge), London: Chapman & Hall.

Jenkins, Stephen P. (2005), Survival Analysis (unpublished),

Klein, J. P. and Moeschberger, M. L. (2003), Survival Analysis: Techniques for Censored and Truncated Data, 2nd Edition, New York: Springer-Verlag.

Romeu, Jorge L., (n. d.), Reliability and Advanced Information Technology Research with Alion Science and Technology, Online at

Samartzis, Lefteris (n. d.), “Survival and Censored Data”, Semester Project, Winter 2005-2006, Online at <http://ima.epfl.ch/~partovi/project/lafteris.pdf>, Accessed on 08 April 2010.