# C-Index from scratch in Python


David Deutsch has this tweet:

> If you can’t program it, you haven’t understood it.

If you work in medicine, statistics, or even biophysics, you are surely familiar with the C-Index, also called the C-statistic or Harrell’s index. Frank Harrell introduced it to measure a model’s ability to discriminate between patients with different prognoses. This means:

Consider two patients. Patient A lived longer than patient B.

If the predicted survival time for patient A is longer than the predicted survival time for patient B, the predictions for the pair A-B are concordant with the outcomes.
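To make the pair check concrete, here is a minimal sketch. The helper name `is_concordant` and all numbers are made up for illustration, and it assumes the model outputs predicted survival times, as in the example above.

```python
def is_concordant(time_a, time_b, pred_a, pred_b):
    """Return True if the predicted ordering of a pair matches the observed one.

    Hypothetical helper: time_* are observed survival times,
    pred_* are predicted survival times.
    Ties get no special treatment in this sketch.
    """
    return (time_a > time_b) == (pred_a > pred_b)


# Patient A lived 10 months, patient B lived 4 months.
pair_ok = is_concordant(10, 4, pred_a=8.5, pred_b=3.0)   # prediction agrees
pair_bad = is_concordant(10, 4, pred_a=2.0, pred_b=6.0)  # prediction disagrees
print(pair_ok, pair_bad)
```

The first pair counts as concordant, the second one does not.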

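The C-Index is defined as the fraction of comparable pairs that are concordant:

$$
C = \frac{\text{number of concordant pairs}}{\text{number of comparable pairs}}
$$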

Here you can find a wonderful explanation of the C-Index and its interpretation; I highly recommend it.

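If you go to the PySurvival website, you will find the following, more refined definition:

$$
\text{C-index} = \frac{\sum_{i,j} 1_{T_j < T_i} \cdot 1_{\eta_j > \eta_i} \cdot \delta_j}{\sum_{i,j} 1_{T_j < T_i} \cdot \delta_j}
$$

where $\eta_i$ is the risk score of patient $i$, $1_{T_j < T_i}$ equals 1 if $T_j < T_i$ and 0 otherwise, and $1_{\eta_j > \eta_i}$ equals 1 if $\eta_j > \eta_i$ and 0 otherwise.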

So, let’s program it to understand it!

(Here is how you can type Greek letters on a Linux system.)

**First**, this expression:

`δT = lambda Ti, Tj: 1. if Tj < Ti else 0.`

The function δT takes two arguments, the survival times Ti and Tj, and returns 1 if patient j lived for a shorter time than patient i, and 0 otherwise.

**Second**, this expression:

`δη = lambda ηi, ηj: 1. if ηj > ηi else 0.`

The function δη takes two arguments, the risk scores ηi and ηj, and returns 1 if patient j has the higher risk score, and 0 otherwise. A patient with a higher risk score should have a shorter predicted survival time.
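As a quick sanity check, the two indicators can be evaluated on a single pair. The numbers are invented for illustration, and the lambdas are repeated so that the snippet runs on its own.

```python
# Indicator functions as defined above.
δT = lambda Ti, Tj: 1. if Tj < Ti else 0.
δη = lambda ηi, ηj: 1. if ηj > ηi else 0.

# Hypothetical pair: patient i lived 10 months, patient j lived 4 months.
Ti, Tj = 10., 4.
# The model gave j the higher risk score, which agrees with j dying earlier.
ηi, ηj = 0.2, 0.9

comparable = δT(Ti, Tj)               # j died before i
concordant = δT(Ti, Tj) * δη(ηi, ηj)  # and j was also scored as riskier
print(comparable, concordant)

# Swapping the risk scores makes the same pair discordant.
discordant = δT(Ti, Tj) * δη(ηj, ηi)
print(discordant)
```

The product of the two indicators is 1 only when the pair is ordered by survival time and the risk scores point the same way, which is exactly the term that appears in the numerator.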

**Third**, what is δj?

Mathematically, it can be either 1 or 0. So, let’s assume for now that it is always 1:

δj = [1, 1, 1, 1, …]

Assuming the number of patients is *n* and NumPy has been imported as `np`:

`δj = np.array([1. for i in range(n)])`
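A small sketch of how this vector enters the sum: each pair (i, j) is multiplied by δj[j], so an entry of 1 keeps the pair and an entry of 0 would remove it. `np.ones` is used here as an equivalent shorthand for the list comprehension, and the zeroed position is an arbitrary choice made only for illustration.

```python
import numpy as np

n = 5  # number of patients (illustrative value)

# Same vector as above; np.ones(n) is an equivalent shorthand.
δj = np.ones(n)
same = np.array_equal(δj, np.array([1. for i in range(n)]))

# With all ones, every pair is kept. A 0 at position j would drop
# all pairs in which that patient appears as j.
δj_example = δj.copy()
δj_example[2] = 0.  # arbitrary choice, for illustration only

print(same, δj.sum(), δj_example.sum())
```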

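**Lastly**, we implement the sum over i and j:

    n = 5  # number of patients
    numerator = 0
    denominator = 0
    # T and η are assumed names for the arrays of survival times and risk scores
    for i in range(n):
        for j in range(n):
            numerator += δT(T[i], T[j]) * δη(η[i], η[j]) * δj[j]
            denominator += δT(T[i], T[j]) * δj[j]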

The expression `x += 1` is shorthand for `x = x + 1`.
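The sum over i and j can also be written without explicit loops by using NumPy broadcasting. This is only a sketch of an alternative, not the approach taken in this article. The arrays `T` and `η`, their values, and the names `δT_matrix` and `δη_matrix` are hypothetical.

```python
import numpy as np

# Hypothetical data for 5 patients (not from the article).
T = np.array([5., 3., 8., 1., 6.])       # observed survival times
η = np.array([0.4, 0.7, 0.1, 0.9, 0.5])  # predicted risk scores
δj = np.ones(len(T))                     # all ones, as assumed above

# Entry [i, j] of each matrix is the indicator for the pair (i, j).
δT_matrix = (T[None, :] < T[:, None]).astype(float)  # 1 if Tj < Ti
δη_matrix = (η[None, :] > η[:, None]).astype(float)  # 1 if ηj > ηi

# δj[None, :] broadcasts along rows, so column j is weighted by δj[j].
numerator = np.sum(δT_matrix * δη_matrix * δj[None, :])
denominator = np.sum(δT_matrix * δj[None, :])

c_index = numerator / denominator
print(c_index)
```

For larger cohorts this avoids the Python-level double loop, at the cost of building two n-by-n matrices in memory.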

# Putting it all together

This is now easy. But first, we have to define the actual survival times, predicted risk scores, and the vector…