Krishna Kankipati

Use of Machine Learning in detecting Skin Cancer.

In this article, we will look into the methodology of an ML application used to detect Skin Cancers.

Skin Cancer

The skin is the largest organ of the body. It contains cells called melanocytes, which produce melanin, the dark pigment that gives the skin its colour. Clusters of melanocytes are seen at some places on your skin as moles.

When melanocytes proliferate more than normal, the growth is benign as long as it stays below a certain limit. Once proliferation goes beyond that limit, the growth turns cancerous.

Changes arising from melanocytes are classified into two types -

  1. Benign (Naevus)

  2. Malignant (Melanoma)

A malignant (cancerous) change in the melanocytes is called Malignant Melanoma.

Skin cancer is the most common of all cancers.

Skin cancers are classified into two types:

  1. Melanotic Skin Cancers (MSC) - 5% of the total cases

  2. Non-Melanotic Skin Cancers (NMSC) - 95% of the total cases

Malignant Melanoma -

This cancer arises from melanocytes, usually in the skin, but it can form anywhere melanocytes exist, such as the bowel mucosa, the retina of the eye and the leptomeninges.

  • Here we will be discussing Cutaneous Malignant Melanoma (i.e. specifically the skin cancer).

Let us learn some facts regarding this cancer-

  1. It is the most aggressive cutaneous malignant tumour.

  2. Although it accounts for less than 5% of skin malignancies, it is responsible for more than 75% of deaths related to skin malignancy (skin cancer).

  3. Worldwide, Malignant Melanoma accounts for 3% of all cancers. It is a common cancer in young adults and the most likely cause of cancer-related death in this group.

  4. It is largely caused by exposure to UV radiation. It is seen more often in fair-skinned people, whose skin is poorly suited to sun exposure.

A condition called Xeroderma pigmentosum predisposes to Malignant Melanoma, raising the relative risk of developing it to 1000.

The risk factors for developing Malignant melanoma(MM) are:

  1. Xeroderma pigmentosum (relative risk = 1000).

  2. Past medical or family history of MM (Malignant Melanoma) with dysplastic naevi (relative risk = 22-1269).

  3. Previous Melanoma (relative risk = 84).

  4. High total number of naevi (relative risk = 3.4, if >20 naevi).

  5. Dysplastic naevi (10% lifetime risk).

  6. Red hair (relative risk = 3).

  7. Tendency to freckle (relative risk = 1.9)

  8. Immune compromised conditions: HIV infection, Hodgkin's disease, cyclosporin A therapy.

  9. Giant congenital pigmented naevus, GCPN (increased risk).

  10. History of sunburn - especially in childhood.

Sometimes a benign naevus may turn into MM (10-20% chance).

How can we identify this?

Observe the following changes in your naevus-

  • Change in size - any adult naevus > 6mm is suspect (for reference, a lead pencil is about 7mm in diameter), and anything growing to > 10mm is more likely to be malignant than benign.

  • Shape - irregular or any changes

  • Colour - any changes

  • Surface(nodularity or ulceration)

  • Satellite lesions(discrete pigmented areas spreading away from the primary)

  • Tingling/itching/serosanguinous discharge(usually late signs)

  • Blood supply: melanomas > 1mm thick have a blood supply that can be found with a hand-held Doppler monitor, so 'Doppler-positive' pigmented lesions should be excised.

Sites where you may observe this cancer-

  • Head & Neck - 25%

  • Trunk - 25% (most common in males)

  • Lower Limb - 25% (most common in females)

  • Upper Limb - 11%

  • Other sites - 14%

The other sites include the eye, bowel, genitalia, nasopharynx, etc.

Clinical Types of Melanoma:

1. Superficial Spreading Melanoma-

  1. The most common type of MM (70%).

  2. Most likely arises from a pre-existing naevus (benign).

  3. A better prognosis (recovery).

  4. Spreads superficially.

2. Nodular Melanoma-

  1. Most aggressive type.

  2. Poor prognosis.

  3. Grows into the skin vertically.

3. Lentigo Maligna Melanoma-

  1. Presence of slow-growing, variegated brown melanoma.

  2. Positively correlated with prolonged and intense sun exposure.

  3. Least Malignant.

4. Acral Lentiginous melanoma-

  1. Least common.

  2. Poor prognosis.

  3. Affects the soles of feet and palms of the hand.

Okay, how does this cancer spread?

This cancer can spread through...

  1. Lymphatics

  2. Blood

  3. To various organs like Brain, Lungs, Liver etc.

  4. Can also spread from mother to fetus.

Awesome, we discussed a lot! Let's dive into the application of Machine Learning to detect Skin Cancers.

Machine learning applications take images of skin lesions and classify them. At present, many state-of-the-art Deep Learning applications are being used to assist dermatologists.

An Australia-based AI software, Moleanalyzer, calculates and compares the size, diameter, and structure of moles. This tool helps in differentiating between benign and malignant lesions (source: FotoFinder).

Here we will look at a simple application to classify skin lesions...

The dataset is taken from the ISIC (International Skin Imaging Collaboration) Archive. The International Skin Imaging Collaboration: Melanoma Project is an academia and industry partnership designed to facilitate the application of digital skin imaging to help reduce melanoma mortality.

The dataset consists of 1800 pictures of benign moles and 1497 pictures of moles classified as malignant. All pictures have been resized to a low resolution of 224 x 224 pixels with 3 RGB colour channels (224x224x3). We will look at how an ML model classifies a dysplastic mole as benign or malignant.

So we have two classes: Benign and Malignant.

We use a Convolutional Neural Network (CNN) to classify the images and then analyse the model's performance.

CNN models use the training images to learn the features used for classification. The learned features/parameters are then applied to the test images to check how well the model works.

The workflow consists of the following steps:


  1. Data Collection

  2. Data Preprocessing

  3. Model Building 

  4. Training 

  5. Testing and Model metrics

  6. Prediction

1. Data Collection-

We are going to use the above mentioned dataset to build the model. The dataset directory has the following structure:

data
  - train
      - benign (1440 images)
      - malignant (1197 images)
  - test
      - benign (360 images)
      - malignant (300 images)
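As a quick sanity check on this layout, the images in each split and class can be counted before any training starts. This is a minimal sketch using only the Python standard library. The root folder name `data` comes from the structure above, while the helper name `count_images` and the set of image file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the files actually in the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {split: {class_name: image_count}} for a <root>/<split>/<class>/ layout."""
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue
        counts[split_dir.name] = {
            class_dir.name: sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


if __name__ == "__main__":
    root = Path("data")  # folder name taken from the structure described in the article
    if root.is_dir():
        for split, per_class in count_images(root).items():
            print(split, per_class)
```

With the numbers listed above, roughly 80% of each class ends up in the training split (1440 of 1800 benign images and 1197 of 1497 malignant images), and the remaining 20% is held back for testing.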

2. Data Preprocessing-

The human brain can identify images easily. But machines work with numbers, i.e. in their view everything is a number or an array of numbers. So, how can a machine view an image? Any guess?

Yes, an image is nothing but an array of pixel values. Each image in the dataset has a shape of 224 x 224 x 3 (height x width x colour channels), so it is stored as 150528 numbers.

Sample Images from the dataset

Let us look at an example how machines view the images:
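The original illustration is not reproduced here, so the following is a minimal sketch of the same idea using NumPy (an assumption, since the article does not say which library it uses). A tiny 2 x 2 RGB image is written out by hand so that every number is visible. A real dataset image has the same layout, only with shape (224, 224, 3).

```python
import numpy as np

# A hand-made 2 x 2 RGB image: each pixel holds three values (red, green, blue) in 0..255.
tiny_image = np.array(
    [
        [[255, 0, 0], [0, 255, 0]],      # top row: a red pixel, then a green pixel
        [[0, 0, 255], [255, 255, 255]],  # bottom row: a blue pixel, then a white pixel
    ],
    dtype=np.uint8,
)

print(tiny_image.shape)  # (2, 2, 3): height, width, colour channels
print(tiny_image.size)   # 12 numbers in total

# A placeholder array with the dataset's image shape (all zeros, i.e. a black image).
dataset_sized = np.zeros((224, 224, 3), dtype=np.uint8)
print(dataset_sized.size)  # 150528 = 224 * 224 * 3
```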


In the real world, images may have different orientations, which affects the model's performance. We therefore apply image augmentation techniques to the images, so that the model can extract the prominent features regardless of orientation.

Look at the following example -

Image Augmentation Code
Images when Augmentations are applied
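The article's own augmentation code is not reproduced here, so the following is only a sketch of the idea. It uses plain NumPy flips and 90-degree rotations rather than any particular augmentation library, and the helper name `augment` and the small stand-in image are assumptions.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly flipped and rotated copy of an (H, W, C) image array."""
    out = image
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)  # horizontal flip (mirror left-right)
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)  # vertical flip (mirror top-bottom)
    k = int(rng.integers(0, 4))     # number of 90-degree turns: 0, 1, 2 or 3
    out = np.rot90(out, k=k, axes=(0, 1))
    return out.copy()


rng = np.random.default_rng(seed=0)
# Stand-in for a lesion photo: a 4 x 4 RGB array with distinct values.
image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
variants = [augment(image, rng) for _ in range(5)]
for v in variants:
    print(v.shape)
```

These transforms only change the orientation of the picture, so the label (benign or malignant) stays the same. Augmentation of this kind is normally applied to the training images only, so that the test images still reflect unmodified data.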

3. Model Building-

A CNN is a type of neural network used mainly for Computer Vision tasks such as Image Classification, Image Recognition and many more.

The model has 4 Convolutional Layers. The 1st layer learns simple features such as vertical lines, horizontal lines and curved lines. Going deeper, the later layers ignore irrelevant features and learn the prominent ones.

Convolution - A method used to produce a feature map, i.e. an array of image features

MaxPooling - Used to select the prominent features from the feature map

Dropout - A regularisation technique which randomly drops nodes during training

At the end, the output neuron generates either 0 or 1, where 0 represents Benign and 1 represents Malignant.
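To make these building blocks concrete, here is a minimal NumPy sketch of a single-channel convolution, 2 x 2 max pooling, dropout and an output neuron. It is illustrative only and is not the article's model. The function names, the hand-picked vertical-edge kernel, the toy weights and the use of a sigmoid on the output neuron are all assumptions, and a real CNN learns its kernels and weights during training.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def dropout(x, rate, rng):
    """Training-time (inverted) dropout: zero a random subset of values, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Toy 6 x 6 grayscale image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked kernel that responds to dark-to-bright vertical edges.
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = conv2d(image, vertical_edge)    # 6 x 6 image -> 4 x 4 feature map
pooled = max_pool2d(feature_map)              # 4 x 4 -> 2 x 2
rng = np.random.default_rng(0)
dropped = dropout(pooled, rate=0.5, rng=rng)  # what a later layer would see during training

weights = np.full(pooled.size, 0.1)           # toy weights for the output neuron
probability = sigmoid(pooled.flatten() @ weights)
print(feature_map.shape, pooled.shape, probability)
```

Dropout is only active during training. At prediction time all nodes are kept, which is why the output here is computed from `pooled` rather than `dropped`.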

4. Training-

The model is trained on the training images.

5. Model Metrics - 

During training, the model learns the important features by reducing the loss and thereby increasing the accuracy.

We can see that the training accuracy increases from 0.55 to 0.83, whereas the testing (validation) accuracy increases from 0.5980 to 0.7920.

The training loss is reduced from 0.7031 to 0.3051, whereas the testing loss is reduced from 0.6624 to 0.4051.

In other words, the model is roughly 80% accurate on the test images.
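For reference, the two quantities tracked during training can be computed by hand. The sketch below assumes the loss is binary cross-entropy, a common choice for two-class problems. The article does not name the loss it used, and the labels and probabilities here are made-up numbers for illustration.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the true label."""
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up example: 0 = Benign, 1 = Malignant.
y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.6, 0.8, 0.4, 0.9]

print(accuracy(y_true, y_prob))              # 3 of the 5 predictions are correct
print(binary_cross_entropy(y_true, y_prob))  # lower is better
```

As a point of comparison, the test set holds 360 benign and 300 malignant images, so a model that always answers "benign" would already score 360/660, about 54.5%. An accuracy near 80% is therefore a real improvement over guessing the majority class.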

6. Predictions - 

Let us use a benign tumour image to check what the model predicts.

Well, the output 0 means it is benign, so the model predicted correctly.

Let us look at what the model predicts for an image containing a mole.

Great, it predicted benign.
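The step from the model's raw output to a label can be sketched as follows. This assumes the output neuron yields a probability of malignancy that is cut off at 0.5. Both the threshold and the helper name `to_label` are assumptions, since the article's prediction code is not shown.

```python
CLASS_NAMES = {0: "Benign", 1: "Malignant"}  # encoding stated in the article


def to_label(probability, threshold=0.5):
    """Map a predicted probability of malignancy to (class_index, class_name)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    index = 1 if probability >= threshold else 0
    return index, CLASS_NAMES[index]


print(to_label(0.12))  # (0, 'Benign')
print(to_label(0.87))  # (1, 'Malignant')
```

Lowering the threshold makes the model flag more lesions as malignant, which trades more false alarms for fewer missed melanomas.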

This is a basic model, meant to give an idea of how an ML model is built and used to predict Skin Cancers.

Use of these models:

- Melanoma, when recognised and treated in its earliest stages, is readily curable.

- Many research groups are working on building ML models to predict skin cancers accurately.

A high predicted score doesn't mean there's a 100% chance of cancer. The prediction is just a probability, so one shouldn't depend solely on these models.

Therefore the traditional diagnosis by a physician and a pathologist, based on macroscopic and microscopic features, remains final.

Investigations to confirm that it is a cancer, and to find out whether and how far it has spread:

  1. Excision biopsy of the tumour.

  2. FNAC of Lymph node.

  3. Sentinel lymph node Biopsy (SLNB).

  4. Dermoscopy - Early cases.

  5. MRI of the site.


The Breslow thickness of the primary tumour offers the best correlation with survival in stage I disease. The higher the mitotic index, the poorer the prognosis of the primary tumour; this has greater significance than the presence or absence of ulceration.

- Alexander Breslow, 1928-1980, American Pathologist.

The presence of lymph node metastases is the single most important prognostic index in melanoma, outweighing both tumour and host factors. The number of affected nodes and the presence of extra nodal extension are also significant outcome predictors. Once regional nodes are involved clinically, 70-85% of patients will have occult distant metastases.
