Monitoring immunohistochemical stain quality with artificial intelligence
Transcript
Hello everybody, welcome back to our masterclass series on the challenges of IHC staining. I am Bettina Winkler from Visiopharm.
Today, Paul van Diest from the University Medical Center Utrecht in the Netherlands will present the results of an extensive IHC stain and stainer evaluation performed using Qualitopix.
Paul is a board-certified pathologist and has been head of the pathology department in Utrecht since 2003.
He will answer your questions after the talk, so please add any questions to the chat.
Paul, please go ahead with the webinar.
Good afternoon. Thank you all for being here in this webinar. What I’d like to do today is share some experiences that we have had over the last couple of years with monitoring immunohistochemical stain quality with artificial intelligence.
And first, a few disclosures. I’m a member of the advisory board of a few companies, and we have some collaborations with different companies, but all the money goes to the department. So I have no personal benefit from any of these activities.
So basically, what this is all about is that more and more systemic therapeutic strategies become available, like hormonal therapy, chemotherapy, antibody therapy, small molecules, and also immunotherapy.
And to make sure that the right people get the right treatment, the right very expensive treatment, we’re using biomarkers to select or sometimes deselect patients for this type of therapy. Now if we do that, we get an optimal balance between effectiveness of the therapy on the one hand and cost and side effects on the other. So this plays a very important role in the daily practice of pathology.
There are molecular biomarkers that we use, like a lot of different gene mutations, just to name a few: estrogen receptor, HER2, PIK3CA, KIT, EGFR, and many more genes, but also tumor mutational burden and microsatellite instability, which play a role in selecting patients for immunotherapy.
And BRCA and homologous recombination deficiency also, for some types of chemotherapy and PARP inhibition.
And then there are a lot of immunohistochemical biomarkers that we use in daily pathology practice. There’s a long list. I’m not gonna mention all of them, but I’m quite sure that some of these will be very widely in use, like the breast cancer antibodies, HER2, ER, PR, AR, and the microsatellite instability markers. So these antigens can be detected with immunohistochemistry, and we do this a lot. And sometimes in practice, there are a few problems with that: stain quality is not always optimal across laboratories, and we have some reproducibility problems in assessing the amount of staining that we get in our immunohistochemistry.
Just a few examples to make that point. For HER2, these are our own results from comparing different laboratories in the Netherlands. And you can see that there are some labs with only five percent three plus HER2 cases, and then there are labs with twelve to thirteen percent positive cases. And this is a difference which is, of course, not acceptable. For ER, you also see quite some differences, like one lab that didn’t do a lot of staining, as you can see here, that had only seventy seven percent ER positive cases, and there were a few labs that went over ninety percent. So this points to systematic differences between labs when it comes to staining of these two antigens, and that, of course, is not acceptable.
This is another example where you can see some variation between different runs within the framework of external quality control. This is NordiQC, where you see that insufficient staining quality was present in the runs for thirty two percent of the labs. And this is PD-L1, where you also see that there is quite some variation in the EQA run.
And another example: you see here the pass rate within these external quality control rounds, and you can see that it varies a lot. In the beginning, in 2017, it was way below standard, with only one fifth of the labs passing the quality control threshold. And you see over time that it still varies a bit. It goes up quite nicely, but in the last year, there was again a drop. So this points to the fact that immunohistochemical staining is not just a reproducible and stable technique in clinical practice, but that there can be problems that need to be detected and solved.
One of the big advantages of AI could be that it can help to solve these types of problems. I’m quite sure that you’ve seen many examples of AI: the Go player that is pretty desperate after losing to the AI, the Google self-driving car in which AI plays a very important role, ChatGPT. I’m sure everybody has tried this. It is amazing what this produces.
And, also, this program, Midjourney, where you can produce amazing pictures by just putting in some text. So what about pathology? This is just, you know, breast pathology; for all the other tumors, there are similar or perhaps even different types of applications.
But you see the many, many areas of application that AI could find just for breast cancer. I’m not gonna mention them all. But you see that one of them also for breast is stain quality control.
So let’s focus on that a bit more. I think that we all know that the quality of immunohistochemical stains may vary, and that holds for the, you know, mentioned biomarkers and therapeutic targets.
So this can pose big problems. This is just a visual example. Right? This is PD-L1.
On the left and right, you see the same cell line, stained with the same stainer, with the same antibody in the same lab. And it’s quite visible that there are, you know, big variations in the PD-L1 intensity, and that’s, again, not acceptable. And this is not unique. All of us experience some stains that have to be redone because we can visually detect that there is a problem.
Now that’s probably not sensitive enough, so let’s look a bit more into detail and dig a little bit deeper.
We’ve been experimenting with the Qualitopix approach from Visiopharm. This is based on cell lines with a standardized range of expression of a certain protein. What you see on the top right here is an example for HER2, where four cell lines have been constructed into a tissue array that you mount on a slide, as you do with normal control tissue that is derived from your own lab. One core is zero, one is one plus, one is two plus, and one is three plus. You mount these control cells with the tissue of interest, then you do the staining, you measure the quality of the stain over time, and you depict the results graphically. Now let me show you what happened when we ran this for a while with the HER2 staining that we did.
So what you see here is a graph where, you know, the different days on which we did these measurements are depicted, and you see that this is the zero core for HER2, and you see there are some spikes. In some spikes, the zero even goes up to the one plus range.
You see the same problem is present for one plus. It’s most of the time quite stable, but in the beginning here, we had some spikes up to the two plus region, and that can prompt reflex testing with in situ hybridization, for instance. The three plus was most of the time quite stable, but you see the two plus was basically quite unstable all the time.
And we ran this through the NordiQC quality control assessment, and we scored poor, as you can see. So what we then did is we replaced the antibody. After that, you see that the results were much more stable, almost flat lines for the three plus, two plus, one plus, and zero. We did the NordiQC control again, and we scored optimal.
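As an editorial illustration of the monitoring idea described above, the sketch below flags days on which a control core's reading strays into a neighboring core's range. It is a minimal sketch under stated assumptions, not the Qualitopix algorithm: the readings are invented, the units are arbitrary, and the boundary rule (midpoint between the medians of adjacent cores) is a hypothetical choice made only for this example.

```python
from statistics import median

# Hypothetical daily intensity readings per control core (arbitrary units).
# Cores must be listed from weakest to strongest; dicts keep insertion order.
readings = {
    "0":  [2, 3, 2, 14, 2, 3],
    "1+": [20, 21, 19, 20, 45, 20],
    "2+": [50, 52, 49, 51, 50, 48],
    "3+": [90, 91, 89, 90, 92, 90],
}

def flag_excursions(readings):
    """Return (core, day_index, value) for readings outside the core's own band.

    Assumption: the band of a core is bounded by the midpoints between its
    median and the medians of the adjacent cores.
    """
    cores = list(readings)
    centers = [median(readings[core]) for core in cores]
    flags = []
    for i, core in enumerate(cores):
        low = (centers[i - 1] + centers[i]) / 2 if i > 0 else float("-inf")
        high = (centers[i] + centers[i + 1]) / 2 if i < len(cores) - 1 else float("inf")
        for day, value in enumerate(readings[core]):
            if value < low or value > high:
                flags.append((core, day, value))
    return flags

excursions = flag_excursions(readings)
```

With these invented numbers, the zero core is flagged on day index 3 and the one plus core on day index 4, which mirrors the kind of spikes into the next range described in the talk.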
So I think this shows that you can detect the problem, replace the antibody, and thereby fix the problem. We then decided to dig a little bit deeper and try some more experiments because we were not really happy with what we found, and we wanted to find out if perhaps in the stainers there were also some problems. So we did a single center study in our own lab here in Utrecht, and we used the five different BenchMark ULTRA stainers from Ventana. We did most of the experiments on HER2 and some on PD-L1.
First of all, we again quality checked the material that we had, the tissue array block with the four cores, by FISH. It had been validated by FISH by the company, but we decided to do it again. And you see a very nice range of spots in the zero, the one plus, the two plus, and the three plus cores. So this block was good, and we decided to use it further.
In the experiments, we cut three micron thick ribbons, and we made sure that for each experiment, the sections that we cut from the ribbons were very close together to reduce that type of variation as much as possible.
We did all the tests with the Ventana stainers, which were calibrated by the company before we put them into practice. And, also, we do regular quality control within the framework of our own quality system. So, basically, these are machines that we use every day, and that should be good. Also, in the different NordiQC quality control runs, we scored optimal ratings for different antibodies, so we had no indication that perhaps there was a problem with these machines.
The slides were scanned with a Hamamatsu scanner to reduce variation within the lab. So first of all, we decided to do manual staining to make sure that it’s not us. Right? That it’s perhaps something in the lab, that it’s the water or, you know, the people that are doing stuff.
So we did manual experiments for HER2 without the machines, and you see that the variation between the different cores is minimal. So I think that excludes that we have any inherent problem in the lab with this type of antibody.
And then we did some more variation experiments. So we repeated the HER2 experiments.
And for all the five machines that we have in use, we used the same slot in all the different stainers, slot number fifteen, again, to reduce variation.
When we depicted the results, as you can see on the bottom right, there was quite a minimal difference for the zero and the three plus cores between the different stainers, but you can see that the red bars, the two plus bars, show quite some variation, which goes from, like, twenty five perhaps up to, like, seventy five, and that was way too much. And for two of the stainers, they were even spiking towards the three plus range.
So we decided to dig a little bit deeper and did some experiments to see if there is variation between the different slots in an individual stainer. So we took one stable and one unstable stainer and positioned the slides in different positions within the stainer, in both stainers the same: position one, five, ten, fifteen, twenty, twenty five, and thirty.
The stainer that was previously unstable showed more variation between the slots for the one plus and the two plus cores, and the stable stainer showed stable results for all the different cores, as you can see depicted here. So here, the red bar, for instance, is much more stable in this intra-run variation, and here it’s way too variable, especially for the two plus. And we tested whether there would be a difference if we positioned the tissue array core on different positions on the glass slide: near the label, in the middle of the slide, and at the end of the slide. We again took one stable and one unstable stainer, and the previously stable stainer now showed quite some differences for the two plus core, up to thirty percent, and the unstable stainer showed less variation for the two plus core. So that was a bit surprising.
And the unstable stainer showed much higher two plus core intensity, which was actually close to the three plus region.
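To make the slot comparison above concrete, here is a small editorial sketch that summarizes how much one control core varies across slots within a stainer. The slot positions are the ones named in the talk; the intensity values, the stainer names, and the tolerance are hypothetical and chosen only for illustration, and the simple max-minus-min spread is an assumed metric, not necessarily the one used in the study.

```python
# Slot positions used in the experiment described in the talk.
SLOTS = [1, 5, 10, 15, 20, 25, 30]

# Hypothetical two plus core readings per slot (arbitrary units), one list per stainer.
two_plus_by_stainer = {
    "stainer_A": [50, 51, 49, 50, 52, 50, 51],
    "stainer_B": [30, 55, 42, 70, 35, 60, 48],
}

def spread(values):
    """Range of the readings: largest minus smallest."""
    return max(values) - min(values)

def unstable_stainers(per_stainer, tolerance):
    """Names of stainers whose slot-to-slot spread exceeds the tolerance."""
    return [name for name, values in per_stainer.items() if spread(values) > tolerance]

# Assumed tolerance of 10 units, purely for the example.
flagged = unstable_stainers(two_plus_by_stainer, tolerance=10)
```

In this made-up data, only `stainer_B` is flagged. The same function could be reused for the slide-position experiment by passing readings per position instead of per slot.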
We also visualized these results just to make it a little bit more, you know, comprehensible for the pathologist.
Strong day, normal day, weak day for HER2. And you can see that this is the zero control, that there is some variation. This is the three plus cell line, which is fairly stable even on a weak day, but look at the two plus cell line. You know, there are big differences. On a strong day, it’s really nicely in between one plus and two plus, but on the weak day, it’s way less than the three plus, and there’s hardly any difference with the one plus and the zero.
We concluded that we had a problem. Right? And we talked with the vendor, and this prompted a full scan of the different stainers and extra maintenance. And funny enough, there were some problems detected that we could solve, like fixing a clogged vortex mixer, aligning the piston that pushes the antibody and the DAB onto the tissue slides, and reducing the amount of fluid that we push over the slides to wash them.
And after maintenance, you see that the results were much more stable. Just look at the red bars again for the two plus, and you see that after maintenance, the results were much better than before. So that actually shows that you can detect some hidden problems within these machines by having this technology in place. Just to make sure that these results were not unique to HER2, we decided to repeat some of the experiments for PD-L1.
Essentially, we got the same results. On the left, you see the variation over time. You see that for the three plus, there is some variation. For the two plus, there is some variation.
And even for the zero and the one plus, you see these lines here, there’s a little variation. But this variation, where, you know, the two plus and the three plus are almost within the same range from this moment onwards, that’s probably unacceptable.
This is the experiment repeated for PD-L1 with the different slide positions. And, again, especially for the two plus core, you see there is way too much variation between the different positions within the same stainer. So that makes sure that it’s not just HER2. This also applies to PD-L1 and probably many other antibodies that you might use in your clinical practice.
Again, to visualize this, you see here strong day, normal day, and weak day. And, again, especially for the two plus, you see that there’s way too much variation between a normal day and a weak day.
So we have this in place, and we do this routinely now. This is part of a big roadmap of AI implementation that we are very diligently working on in the UMC Utrecht. And you see that we have different AI applications in place: Ki-67, et cetera, a mitosis counting algorithm that we made ourselves together with the technical university in Eindhoven.
TIL counting that we’re working on ourselves and lymphovascular invasion that we’re working on ourselves. And we have the whole suite from Visiopharm in place: Qualitopix and the ER, PR, HER2, and Ki-67 biomarker applications. We routinely do lymph node metastasis finding, and PD-L1 has recently been added to this arsenal. We also have applications for skin and for prostate cancer, and a lymph node metastasis finder. We work with space where we have some nice results.
And, also, with the DeepMed company, we work on implementing a lymph node metastasis finder. So it’s quite an extensive program that bit by bit we implement in clinical practice, and Qualitopix has now been part of that for almost two years.
So to come to a conclusion, I think I’ve shown that the Qualitopix AI algorithm enables monitoring stain quality by measuring expression of engineered control cells.
In our hands, this detected some worrisome variation over time, between stainers, between slots within the same stainer, and also with regard to slide positions of the control cells.
These were problems that we had not detected before. So I think this shows that AI can detect this malperformance better than the human eye. And in our hands, this prompted replacing antibodies and extra maintenance that truly helped to solve the problem. So I think this is an important development and an important area of application of AI in the pathology practice.
So where do we go from here?
One important question, of course, is how clinically relevant are all these variations?
Honestly, we’re not really sure.
Clearly, some of these variations may not be clinically relevant because they don’t classify a patient into a different category. But I think that I’ve shown at least some examples where the different classes spike into each other’s region, and this may lead to a false result for an individual patient. So I think we should try to avoid this type of variation. What we would like to do is similar experiments for other types of stainers, because I don’t think by far that this is unique to the Ventana stainers.
I’m not aware of any other published experiments on other types of stainers, but I know they are ongoing in different places of the world. And I’ve heard, not seen, I’ve heard that the variation detected was the same. So I think it’s gonna be quite clear in a while that this doesn’t hold only for the Ventana stainers, but probably for all the other stainers that are on the market.
I think we’re gonna need engineered control cells for all the therapeutic protein targets that we have. There are different ones around now, but not for the full arsenal that I showed you in the beginning. So I hope that the companies will work on that to also provide these types of engineered cells for the other therapeutic targets. And what we’re really thinking of is not to do this experiment just once a day with one slide of control cells, but to actually mount these engineered control cells on all the slides from individual patients where we do a staining.
Now the cost of that is about five euros per slide. We do about sixty thousand immunostains, not all therapeutic targets. Right? But suppose that we have, in the end, control material for all the antibodies that we use. I think for now, we will be limiting ourselves to the therapeutic targets. So then this is a clear overestimation.
But in the end, we could end up having control cells, and the cost, for all these different stainings, which would be in my lab about three hundred thousand euros. And somebody needs to start paying for that, because I think it’s probably worth it to make sure that we have optimal immunohistochemical stains for all the different antibodies that we use.
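The upper-bound estimate above is simple arithmetic, spelled out here as an editorial sketch. The two input figures are the approximate ones quoted in the talk; treating the sixty thousand stains as an annual volume is an assumption, and the function name is hypothetical.

```python
# Approximate figures quoted in the talk.
COST_PER_SLIDE_EUR = 5      # engineered control cells per slide
IMMUNOSTAINS = 60_000       # assumed here to be the lab's yearly volume

def control_cell_cost(n_slides, cost_per_slide=COST_PER_SLIDE_EUR):
    """Total cost of mounting control cells on every stained slide."""
    return n_slides * cost_per_slide

upper_bound_eur = control_cell_cost(IMMUNOSTAINS)  # 300000
```

As the speaker notes, this is an overestimate as long as control cells are only used for the therapeutic targets rather than for every antibody.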
So let’s see what happens.
So thanks especially to Sven van Kempen. He was the technician who did all the experiments, so I’m really grateful for the hard work that he put in. And, of course, we are grateful to the companies that have collaborated, not officially sponsored this study, because this was our study. But, of course, we used the machines. We used the Visiopharm algorithm, and Roche provided us with some of the extra kits that we needed to perform some of these experiments. So they are also thanked for helping us to complete the study.
So this is what I had in mind to tell you today. I think it shows that this is a very nice area of application of AI, and this could really help us to improve our daily business of performing immunohistochemical stains in our pathology labs. Thank you very much for your attention.
Okay. Thank you very much, Paul, for this talk.
Could you switch on your camera?
Hi.
You’re muted.
Yeah. Thanks.
Okay. So I would welcome everybody to put questions into the chat so that we can discuss with Paul.
So those were very extensive experiments that you performed in your lab.
Yes, it was quite some work. But, again, Sven van Kempen did all the hard work that I’m presenting here today. But, yes, altogether, he’s been busy with this for quite some time. But I think it yielded some nice results that have taught us a lot about the potential application of AI in monitoring this type of staining quality.
So it’s been very fruitful, and I think we’ve learned a lot.
Yeah. Here comes the first question.
What is the biggest source of IHC stain variability? Human error, reagent variability, stainer errors, or something else?
Oh, that’s a hard question.
I cannot just answer that, of course. That depends a lot on especially the type of antigen. Right?
You know, for some of the well known antigens that we’ve been staining for a while, for, like, you know, decades now, we have very good antibodies that we didn’t have in the beginning. Right? So I think that the problems are much less at this very moment for antibodies like ER, PR, and HER2, just to name a few.
So, you know, over the years, they have improved. So I think there, probably the biggest variation is between labs.
What I’ve seen for Ki-67, just to give another example, is pretty dramatic.
Once we did a study where we took a tissue array and we sent it to different labs, and we had them stain it locally. And then one observer counted the number of positive cells, and they could easily vary on the same tissue core from zero to thirty percent. So that’s partly the lab, at least.
For some things, the human eye is not very good, I think, or things are simply difficult.
I always have problems with heterogeneity. I’m sure you have them too.
You know, some parts stain well, other parts do not stain very well. So how do you come to an overall result?
PD-L1 is not so easy to interpret.
The CPS score is quite difficult because you’re supposed to count non-tumor cells which are positive, which is a lot of work and not so easy. So it varies from antigen to antigen, but I think I’ve mentioned a number of sources where it can go wrong.
Indeed, the human eye, sometimes the stainers that cause problems, sometimes the antibodies, sometimes the tissue processing in your own lab, and, of course, human error.
Okay. Yeah.
Next question we have. With your experience, do you have advice for labs that want to improve their quality? Where to start and how or what are the low hanging fruits?
I mean, this is not so easy to just implement, and it’s costing money. Right? So I can well imagine that not all of you are jumping on implementing Qualitopix tomorrow.
So probably having a quality system in place in your own lab is step number one. I think that’s quite normal these days. And participating in external quality control rounds like UK NEQAS and NordiQC is probably the second logical step.
This is maybe the third step for labs that are interested and see the importance of this application of AI.
So, yeah, probably, this is how you move forward in monitoring the quality of your stains. In our lab, all the stains are quality checked by the technicians before they are scanned (we work digitally) and sent to pathologists and residents. But I think that’s quite usual across labs, so I doubt that that’s a very special thing only in a few labs. So there are different ways that you can work on the quality.
Using ready-to-use antibodies is often a good idea because they don’t have the dilution problems that you may have and all the different quality control runs that you need to do in your own lab to make sure that these antibodies are working well.
To have antibodies with CE-IVD or FDA approval is a good idea because they’re supposed to be quite stable and work very well across labs. So these are a number of things that you can think of to monitor and improve as much as you can the quality of your IHC in your lab.
Mhmm. Yeah. Sounds good.
Next question we have from Michael. Thank you for a wonderful presentation. Can you perhaps comment a bit more on the business case for adopting continuous quality control with the current levels of budgets for pathology labs?
Yeah. That’s a very important and at the same time very difficult question.
You know, people like Sven, the technician that did the job, he’s now kind of pushing me to sort of stop with our own quality control for the antibodies for which these analytic control engineered cells are available. And so I’ve now challenged him to make me a calculation of how much more that’s gonna cost.
I’m very much in favor of continuously improving the quality of our IHC, but that doesn’t mean that we just have the money for that. So if we start implementing this more routinely, so not in batch mode, because the results that I’ve been telling you about are like a batch thing: on the days that we do this, we stain one slide.
When we switch to single slide mode and we mount these control cells on the slides of the patients where we do the actual staining, that’s gonna be five euros per slide approximately.
So if you calculate that the consumable cost for a single stain is about five euros as well, that means that the consumable costs per stain are almost being doubled if you start doing this in single slide mode. And that’s a serious thing that, for the different therapeutic targets, where of course this is most important, could easily cost up to perhaps ten or twenty thousand euros a year. I’m not sure because I’m awaiting the calculations, but it could be that much.
It’s not gonna be like we’re gonna get a lot of extra money to do this. I think the patients, the doctors that we question, have a clear interest in having an optimal result, especially for the therapeutic targets on which these very expensive new therapies are being based. But but are they gonna give us the money for that? Well, not in my place.
So somehow we have to try to realize this, and that’s a bit of a problem. Yeah. So I think the business case is not easy because when we do a better job, we make sure that the right patients get the best treatment and the right treatment, but that doesn’t mean that the money that we save elsewhere will just flow to us. It’s not like that in most institutions.
So the business case is a bit of a problem.
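A brief editorial sketch of the per-stain arithmetic behind this answer. The five euro figures and the ten to twenty thousand euro range are the speaker's rough estimates (he says he is still awaiting the calculations); the slide counts derived from them below are an inference for illustration, not numbers given in the talk, and the function names are hypothetical.

```python
# Rough figures from the answer above.
STAIN_CONSUMABLES_EUR = 5.0   # consumables for a single stain
CONTROL_CELLS_EUR = 5.0       # engineered control cells per slide

def cost_per_stain_with_controls(stain=STAIN_CONSUMABLES_EUR, control=CONTROL_CELLS_EUR):
    """Consumable cost per stain once control cells ride on every slide."""
    return stain + control

def implied_slides_per_year(annual_budget_eur, control=CONTROL_CELLS_EUR):
    """Number of slides a given yearly control-cell budget would cover."""
    return annual_budget_eur / control

doubled_cost = cost_per_stain_with_controls()   # 10.0, twice the 5.0 baseline
low_estimate = implied_slides_per_year(10_000)  # 2000.0 slides
high_estimate = implied_slides_per_year(20_000) # 4000.0 slides
```

Any savings from dropping the lab's own control tissue, which the speaker raises later in the Q&A, are not included in this sketch.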
Mhmm. Okay. Next question from Sonia. Thanks for great presentation. How much was the workflow affected by adding Qualitopix?
Well, Sven’s not here to answer that question for me. Not much in this stage, because we did it in batch mode.
Again, oh, Sven is actually... He is.
Yeah.
Available. So maybe, Sven, can can you turn on your camera and and your mic and answer this question for me, please?
In a minute.
I’m allowing him.
So then you should be able to Yes.
If it’s okay you can hear me?
Yes.
Yeah. So we have, one technician, from the diagnostic immunohistochemistry lab that’s doing weekly uploads now.
So it’s just cutting a few slides from all the markers that we have in Qualitopix, once a week at this point. After the study, we just continued with it.
But we are working on the automatic pipeline. And as soon as that’s working, we are gonna switch, hopefully, to the cell lines on every patient slide that we have. And then if there are automatic uploads, the workflow doesn’t change that much because you just see the results within an hour.
Yeah. So you can really check the quality of your stainings on the same day.
But the workflow didn’t change that much.
I think, essentially, when we switch to single slide mode, the workload is not gonna change that much either.
I mean, we mount a tissue array control on every single slide that we stain for an individual patient. Right? These are homemade controls that we make. So we produce these tissue arrays ourselves. We have to carefully select the right material for that.
We have to do quality control checks of the tissue arrays and so forth.
When we start mounting these engineered cells in single slide mode, we will basically do the same. It’s a tissue array, right, in this stage. Although companies are working on beads with the same principle, with standard amounts of protein glued to these beads, so then it might be a little bit different. But in this stage, the tissue array block with the engineered cells will have to be cut and mounted with the single slides from the patient. So, essentially, in that situation, the workload is not affected that much.
Oh, that’s good to hear. So the next question is from Sabrina. Thank you, Paul, for the nice presentation.
Have you... Maybe I can add something.
When we are discussing the business case. Right? If we start doing single slide mode with these engineered cells, we can skip our own controls.
Right? So it’s not just that we add things that are gonna cost money, but we will lose some work that’s also costing money. So we will save at least a bit of work and consumables for all the quality control. I’m not sure how much in this stage, but at least in single slide mode, we do something less and not just something extra. So, you know, I think it’s probably worth it, but we have to do some calculations to see where we end up.
Yeah. That’s a very good point. That’s also good to have less work.
Coming back to Sabrina. Have you implemented in your laboratory cell line controls as a replacement for tissue controls for antibodies like ER, PR, HER2, MMR?
For example, cell line controls from HistoCyte.
Thanks. We use our own home-brew tissue arrays as controls.
So that’s it. No cell line material.
Okay. So that’s only the ones that you use with Qualitopix then, I guess. Yeah. Which are also the ones from HistoCyte, actually.
Okay.
Any more questions?
Yeah. Mads has another question. What is the downstream effect of high stain variability?
I know it’s hard to answer, but do you get the impression that it affects patient care, or are the errors caught before it gets that far?
Well, it can. As you probably all do, the technicians check the stain quality of each individual slide. Right? And they compare it with the control tissue array. So I think that we catch most errors.
The variation that we see, to which extent is that clinically relevant? I’ve been a bit careful about this. Right? Because I’m not convinced that it has a lot of clinical implications.
Even though we saw some variations for HER2, which I showed, you hardly ever see it spiking into another region. That’s an exception.
So I think it sometimes may happen.
We’re not sure in this stage how much variation we will detect if we switch to single slide mode with these control cells. But I think the AI will help us to catch that variation really quickly because, as Sven pointed out, we will get results back in, like, an hour.
And, you know, maybe, and I’m not sure that Sven’s gonna like me saying this, but, maybe if we implement this, we can skip the quality control check, the visual quality control check by the technicians.
And that’s a lot of work.
So then we save technician time. So maybe there’s a bit of return on investment. So I think we should still do this, and we have to do further studies to really assess the clinical implications of the variations we find.
I think you’ve seen the pictures. Sometimes even the visual difference in staining is quite high, and these are problems that we would probably detect by the human eye as well. So clearly, further studies are necessary.
I think the companies like what we found.
At least Roche, they have been very open to discussing and interpreting these results with us and very open to doing extra maintenance. So I think the Roche company also sees this development as an opportunity to monitor the performance of their staining machines and to prompt extra maintenance if a problem is detected, as we did in this study. So I think that’s good, and I would like the other companies that produce these types of stainers to start doing these studies as well for their stainers, because I’m quite convinced all the stainers will have similar problems.
And I would say thank you to Paul, and have a nice evening. Bye.
Thanks for being here. Bye bye.
UMC Utrecht has established an advanced digital pathology set up in their pathology lab. In this webinar, Professor van Diest will present the results of an extensive IHC stain and stainer evaluation performed using Qualitopix. The quantification of stain consistency enabled the lab to significantly improve their stain quality for HER2 and unveil staining variations based on their stainer.
Paul J van Diest, Professor, UMC Utrecht, The Netherlands
Paul J van Diest studied Medicine and did his PhD and pathology residency at VU University Medical Center in Amsterdam. After obtaining his Board certification (1996) he became Consultant Pathologist, Associate Professor (1999) and full Professor (2001). Since 2003 he has been Head of the Department of Pathology at University Medical Center Utrecht. This department has gone fully digital and is currently involved in AI research and implementation. He is Adjunct Professor of Oncology at the Sidney Kimmel Oncology Center at Johns Hopkins, Baltimore, USA, serves on the editorial board of international journals, and has been active in several international societies.