A preliminary report, from the Joint PHE Porton Down & University of Oxford SARS-CoV-2 test development and validation cell, evaluates lateral flow viral antigen detection devices (LFDs) for mass community testing.
Prof Sebastian Johnston FMedSci, Professor of Respiratory Medicine & Allergy, Imperial College London, said:
“The headline of the press release is very misleading – it gives the impression that all lateral flow antigen tests have high sensitivity following extensive clinical testing by PHE/Oxford. This is very far from the truth.
“31/40 devices were not evaluated in any detail as they failed initial testing – the precise fail criteria are not provided in this report but are assumed to be very poor sensitivity, and/or positive results with seasonal CoVs (poor specificity) and/or a high test failure frequency (failure to give a valid result).
“Only one of these 40 test devices is reported in any detail.
“This means that people buying such tests from online providers will have an 80%+ chance that what they are buying performs very poorly indeed.
“One single manufacturer’s test (the Innova SARS-CoV-2 Antigen rapid qualitative test) has results reported following extensive clinical testing.
“This single test had moderate sensitivity when used by trained healthcare-workers (73%). It had only 57.5% sensitivity when used by members of the public.
“The high specificity of the single test reported (99.68%) is relatively good news as it means relatively few people who do not have SARS-CoV-2 infection will have to isolate unnecessarily. That is still 32 people for every 10,000 people tested having to isolate unnecessarily. I would not wish to be one of those 32.
“The reported 95% sensitivity of that single test is only when used by a trained healthcare professional and only above virus loads of 100,000 RNA copies/mL. That’s a lot of viral RNA.
“Between 1,000 and 100,000 RNA copies/mL, when used by a trained healthcare professional, it only picked up 78%. Below 1,000, but still positive by PCR, when used by a trained healthcare professional, only 30%.
“Only 43 tests on asymptomatic people were performed, so this result is very preliminary.
“Yes it will be nice to have a single rapid test that can quickly detect people with high virus loads and get them isolated rapidly. This single test will not be good enough to say you are almost certainly negative, as its sensitivity is not good enough, especially in the hands of the general public.
“Regarding the other 39 of the 40 tests, most (31) of the tests failed basic performance characteristics and were not assessed further, 8 tests are still being evaluated with results so far unknown.
“That is the main message for me.”
Prof Jon Deeks, Professor of Biostatistics and head of the Biostatistics, Evidence Synthesis and Test Evaluation Research Group, University of Birmingham, said:
“The report from the Joint PHE Porton Down and University of Oxford SARS-CoV-2 test development and validation cell summarises the studies which have been undertaken to evaluate the accuracy of the Innova lateral flow assay which has been implemented in mass screening in Liverpool, with planned roll-out to Universities and other locations across the UK. I have had the benefit of discussing the report with the Chief Investigator, Prof Tim Peto, from the University of Oxford, who has clarified its content, and indicated some data are not yet complete and further updates and a more detailed paper will be forthcoming. It is clear that the evaluations and reporting have been done under exceptional time pressure, with Government purchasing and implementation decisions happening before completion of the report.
“The Test Development and Validation Cell has outlined the first three Phases of their evaluation process in a protocol available on the DHSC website. This report includes data from studies in Phases 2 and 3, plus additional field studies in a previously undescribed Phase 4. Many of the studies listed have used the test on stored samples, samples in saliva, and with tests undertaken by technical experts in the Porton Down laboratory or have not verified the Covid status of the participants. Although such studies are important steps in the pathway of validating the test, they do not provide evidence that is generalizable to its application in the real world and are likely to overestimate the accuracy of the test.
“Two studies in the report provide more generalizable data, although neither has focused on recruiting asymptomatic participants. The FALCON-C19 study included patients with COVID-19 with tests either done at the bedside or a testing centre by a clinical research nurse or sent to laboratory experts at Porton Down. The second study was done at a Pillar 2 testing centre, with a person experienced at taking swabs following instructions in how to use the test – this setup is most comparable to how we expect to see the tests used in mass testing scenarios. Results from the lateral flow assay were compared with PCR results in both studies.
“In the Pillar 2 testing centre study 58% of the COVID-19 cases were detected by Innova, although the uncertainty due to sample size means the rate could be as low as 52% or as high as 63%. In FALCON, testing at the bedside by an experienced nurse picked up 73% of the COVID-19 cases with the uncertainty giving a possible range of 65% to 80% for the detection rate. Those in FALCON where samples were tested at Porton Down had a higher detection rate of 79%, illustrating the importance of evaluating the test in the setting where it will be used rather than in an expert laboratory. The overall headline figure in the report of 76.8% mixes in the accuracy in these different settings. In summary, the data presented in the Test Centre study and FALCON suggest the test appears to miss between one out of every two cases of COVID and one out of every four cases of COVID.
“The report does not evaluate anything about the infectiousness of the cases. Despite assertions in Government announcements that the test is able to identify those that are infectious, there is no assessment made to identify whether the cases detected are infectious, and whether those that are negative are not infectious. Scientifically it is exceptionally challenging to assess whether a case is able to transmit the virus to others – there is no evidence presented supporting these claims, and they appear misleading.
“The Press Release and other comments focus on the higher accuracy of the test in people with “low Ct values”. These are in people where the PCR test had to go through fewer amplification cycles to detect the virus, indicating that the swab contained more viral matter. Ct values are not precise, they are not standardised measures, and depend on how well the swab was done. They do not provide a reliable way of indicating whether people are infectious.
“The rate of false positives was similar across the studies: in the Test Centre study it was 0.4% (or between 0.2% and 0.9% accounting for uncertainty). Whilst this is a very low rate, the numbers of false positives can still outnumber the number of cases detected when used in mass screening where very few will have the virus. For example, if 100,000 people are tested in a city where the prevalence of Covid is 400 per 100,000, assuming the Innova test has a sensitivity of 58% and specificity of 99.6% (as per the testing centre performance), these figures predict the test will give 630 test positives: however only 230 of these will have Covid, 400 of them will be false positives.
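The arithmetic behind this mass-screening example can be reproduced with a short sketch. The inputs (100,000 people tested, prevalence of 400 per 100,000, sensitivity 58%, specificity 99.6%) are the ones quoted above; the function and variable names are illustrative only and do not come from the report.

```python
def screening_outcomes(n_tested, prevalence, sensitivity, specificity):
    """Expected numbers of true and false positives from one round of testing."""
    infected = n_tested * prevalence
    not_infected = n_tested - infected
    true_pos = infected * sensitivity             # infected people who test positive
    false_pos = not_infected * (1 - specificity)  # uninfected people who test positive
    return true_pos, false_pos


# Figures quoted in the text; nothing here is new data.
true_pos, false_pos = screening_outcomes(
    n_tested=100_000,
    prevalence=400 / 100_000,
    sensitivity=0.58,
    specificity=0.996,
)

print(f"true positives:  {true_pos:.0f}")
print(f"false positives: {false_pos:.0f}")
print(f"all positives:   {true_pos + false_pos:.0f}")
```

Under these assumptions the sketch gives about 232 true positives and about 398 false positives, which the quote rounds to 230 and 400, for roughly 630 positives in total.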
“The report also shows concern about test failure rates – instances where the control lines on the test device failed – which occurred in one setting in 16% of tests, indicating that there may be problems with some batches of the test devices that have been purchased failing.
“Likewise, the poor detection rate of the test makes it entirely unsuitable for the Government’s claim that it will allow safe “test and release” of people from lockdown and students from University. As the test may miss up to half of Covid cases, a negative test result indicates a reduced risk of Covid, but does not exclude Covid. Independent evaluations for WHO have shown other lateral flow antigen tests are likely to outperform Innova, but even those do not have high enough sensitivity to rule out Covid. The Innova test is certainly not fit for use for this purpose.
“Identification of any previously undetected cases may help reduce transmission, but only if those cases and their contacts can successfully isolate, and nothing untoward happens to increase transmission. Currently little information is given to those being tested in the pilot. It is particularly important to inform those with a negative result that Covid could easily have been missed – should they be falsely reassured and consider, for example, visiting elderly relatives, greater harm could easily occur.
“It is of immense concern that the Moonshot plans have not undergone any scientific scrutiny by experts such as our National Screening Committee. This is the first evidence that has been provided to allow any scrutiny of the technology acquired for Moonshot – and it raises serious concerns that the benefits are likely to be few with serious risks of harm from the public being misled from the unjustified claims of high performance of this test from Government.”
Prof Kevin McConway, Emeritus Professor of Applied Statistics, The Open University, said:
“So far, the great majority of testing in this country and elsewhere of whether people are currently infected with SARS-CoV-2, the virus that causes Covid-19, has used a procedure called RT-PCR. Those tests are certainly pretty accurate, but a snag is that generally the specimens (swabs) taken from people being tested need to go to a lab for processing, which adds logistic complication, and the processing takes quite a time. So the availability of a testing system that is much quicker and might possibly avoid specimens being sent to a lab is good news – as long, that is, as the test is accurate enough.
“These new results provide estimates of the performance of one particular test, a lateral flow device called the Innova SARS-CoV-2 Antigen Rapid Qualitative Test. Tests of this kind generally can provide results in under half an hour. The usual quantities, that measure how accurate the test is, do look high, but whether they are high enough does depend on exactly how and where the test is being used, I’d say. There are two of these quantities, called the sensitivity and specificity. There have to be two of them because there are two different ways in which any test, for diagnosing a disease or detecting an infection, can give the wrong answer.
“First consider people who do actually have the disease or condition in question – in this case, people who really are infected with the virus. Many of them – we’d hope the great majority – would test positive for the virus, and they are the ‘true positives’. But no test is perfect, so there will be some people who are infected but test negative for the infection. They are ‘false negatives’ – negatives because their test was negative, false because that negative result is in fact wrong. Now consider people who aren’t infected. Again, we’d hope that the majority would test negative, and they would be the ‘true negatives’, but again it will very likely happen that some of them would test positive, and they are ‘false positives’. So there are two different kinds of wrong test result – false positives and false negatives – and they both matter.
“The two ways of reporting on the chance of errors from a test are the ‘sensitivity’ and the ‘specificity’. Two ways because there are two kinds of wrong result. The sensitivity is the percentage of true positives, out of all the people who really are infected (that is, of the true positives together with the false negatives). The specificity is the percentage of true negatives, out of all the people who really aren’t infected (that is, of the true negatives together with the false positives). Both of these should be high.
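The two definitions can be written out directly, as a minimal sketch. The sensitivity counts are the testing-centre figures quoted elsewhere in this roundup (214 positives among 372 confirmed cases). The specificity counts are not from a study table: they are the "32 in every 10,000" illustration given earlier, used here as an assumed set of 10,000 uninfected people. Function names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly infected people who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truly uninfected people who test negative."""
    return true_neg / (true_neg + false_pos)


# 214 of 372 confirmed cases tested positive (figures quoted in this roundup).
sens = sensitivity(true_pos=214, false_neg=372 - 214)

# Illustrative only: 32 false positives per 10,000 uninfected people,
# which is what a specificity of 99.68% implies.
spec = specificity(true_neg=10_000 - 32, false_pos=32)

print(f"sensitivity: {sens:.1%}")
print(f"specificity: {spec:.2%}")
```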
“This study estimates the sensitivity for this Innova test as 76.8%, and the specificity as 99.68%. In a way, these are like the sensitivity and specificity for RT-PCR, in that both look reasonably high, but the specificity is much higher than the sensitivity. Though the sensitivity is lower than the specificity, actually for the issues I will describe, the specificity is more important – and anyway the researchers report that the sensitivity is much higher for people with a high viral load, who are most likely to be infective to others.
“So what might be an issue with the test, or to be more precise with some ways that it might be used, given these very high figures? It’s that the sensitivity and specificity are both percentages of people for whom it’s known whether they are infected or not. But in a real-life testing situation you don’t know whether someone is infected or not. The whole reason for doing the test is to try to find out whether they are infected. The specificity, for example, tells you what percentage, out of the people who are truly not infected, are true negatives, and what percentage are false positives. But those two groups have different test results (negative and positive). What you know, if someone has been tested, is that they are negative, or that they are positive, so a percentage that includes both positives and negatives isn’t going to help. What you want to know, for people who test positive, is how many of them are true positives and how many are false positives. And the specificity can’t tell you that, because it relates to false positives and true negatives, not to false positives and true positives. To work out what you want to know, you need to know three things – the specificity, yes, but the sensitivity too, and also the percentage of people, in the group being tested, that are actually infected. (That’s the ‘prevalence’, in the jargon.)
“The new research results give you estimates of the sensitivity and specificity, but they don’t give you estimates of the prevalence. The prevalence will depend on where the test is being used, and which type of people are being tested. The latest results from the ONS Infection Survey estimated the prevalence of infection in the English community population as 1.42% (taking into account possible false positives and false negatives in the people they tested) – that corresponds to about 1 in 70 people in the population being infected. If 100 people from that population test positive with the Innova test, and the sensitivity and specificity of that test are exactly as estimated (76.8% and 99.68%), then about 78 of them will really be infected and the other 22 will be false positives. The exact numbers would probably be a bit different from 78 and 22, but it’s likely that about three-quarters of the 100 people who tested positive would be true positives, and the other quarter would be false positives and would not really be infected at all. (However, almost everyone who tests negative – over 99.6% of them – will truly not be infected.)
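The calculation described here combines the three quantities (prevalence, sensitivity, specificity) into the share of positive results that are true positives and the share of negative results that are true negatives. A minimal sketch follows, using the figures quoted above and assuming, as the quote does, that they hold exactly; the function name is illustrative.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (share of positives truly infected, share of negatives truly uninfected)."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# ONS prevalence estimate and Innova figures as quoted in the text.
ppv, npv = predictive_values(prevalence=0.0142, sensitivity=0.768, specificity=0.9968)

print(f"of 100 people testing positive, about {ppv * 100:.0f} are truly infected")
print(f"share of negative results that are correct: {npv:.2%}")
```

With these inputs the first figure comes out at about 78 in 100 and the second at just over 99.6%, matching the numbers in the quote.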
“Now, picking up quite a lot of false positives may not matter much, for example if the new rapid test is being used generally to find areas where infection rates are high. But it could matter if the test results are going to be used to make decisions on individuals, say, asking them to self-isolate, if quite a high proportion of those testing positive are not actually infected. That issue could be improved perhaps by repeatedly testing people who test positive, which might be easier and quicker with a rapid test like this than with RT-PCR.
“How can it arise that the implication of a positive test result, in terms of the chance that the person really is infected, is so much lower than the sensitivity and the specificity? Intuitively, it’s because of the following. In the people being tested, a very large majority aren’t infected – in fact with these assumptions over 98% of them aren’t. So the people who test positive consist of a large percentage (the sensitivity) of the small number who are infected, who are the true positives, and a very small percentage (defined by the specificity) of the very much larger number who are not infected, who are the false positives. It’s not clear instantly how the large percentage of a small number and the small percentage of a large number will compare – you have to work out the numbers – and in this case it turns out that there are more true positives than false positives, but there are still rather a lot of false positives.
“Another important point here is that what look like rather small differences in the specificity can make quite large differences to the numbers and proportions of false positives. There’s some evidence that the specificity of the RT-PCR test is higher than the 99.68% reported for the Innova test – it could perhaps be 99.95% or even higher. That doesn’t really look much higher than 99.68%. But if it’s 99.95%, and I use the same sensitivity (76.8%) and prevalence (1.42%) as I used in the last calculation, then about 95 in every hundred people testing positive will be true positives (really infected), rather than about 77 in a hundred.
“As a final illustration of how the numbers can vary, let’s go back to the sensitivity and specificity reported for the Innova test (76.8% and 99.68%) but consider what would happen if the test is used in a group of people with a very high level of infection, say 4%, which is more than double the average rate for England in the ONS survey. Then, out of 100 people who test positive, about 90 would be true positives, so many more than in an area with an average level of infection. But in a group with a much lower level of infection, say 0.7%, which might be somewhere near the current position in South-West England for instance, only around 60 in every 100 people who test positive would really have the virus.
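The scenarios in the last two paragraphs can be compared side by side with a short sketch. It holds the sensitivity at the quoted 76.8% and varies the specificity and prevalence as described; the scenario labels and names in the code are illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of people testing positive who are truly infected."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.768  # figure quoted for the Innova test

# (label, prevalence, specificity), all taken from the quoted paragraphs.
scenarios = [
    ("average prevalence, specificity 99.68%", 0.0142, 0.9968),
    ("average prevalence, specificity 99.95%", 0.0142, 0.9995),
    ("high prevalence (4%), specificity 99.68%", 0.04, 0.9968),
    ("low prevalence (0.7%), specificity 99.68%", 0.007, 0.9968),
]

results = {}
for label, prevalence, specificity in scenarios:
    results[label] = positive_predictive_value(prevalence, SENSITIVITY, specificity)
    print(f"{label}: {results[label]:.1%} of positives are true positives")
```

Under these assumptions the four scenarios give roughly 78%, 96%, 91% and 63%, close to the rounded figures in the quote, and show how strongly the answer depends on both specificity and prevalence.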
“None of this is meant to imply in any way that the new rapid test is no good. A pretty accurate and more rapid test, like this, will have many important uses. All I intend to do is to urge some care in interpreting the test results, depending on exactly what the test is being used for and which groups of people are being tested.”
Prof Jonathan Ball, Professor of Molecular Virology, University of Nottingham, said:
“Whilst the lateral flow assay lacks the sensitivity of the PCR test, its rapidity and ease of use makes it a pragmatic test for community surveillance, where you want to quickly identify then isolate infected people. Even though it won’t detect as many infected individuals as the PCR test, it will identify those with the highest viral loads, and it’s those people who are most likely to go onto infect others. It won’t replace other tests like PCR, but it is a useful additional tool for coronavirus control.”
Prof Sheila Bird, Formerly Programme Leader, MRC Biostatistics Unit, University of Cambridge, said:
“Given the deployment of the INNOVA SARS-CoV-2 Antigen Rapid Qualitative Test for mass screening of asymptomatic persons in Liverpool, it is disappointing to read as a conclusion to the phase 3b evaluation in individuals with confirmed SARS-CoV-2 infection: “There were no discernible differences in viral antigen detection in asymptomatic vs. symptomatic individuals (33/43 76.7% vs. 100/127 78.7%, p = 0.78)”.
“First, notice that the denominator for estimating sensitivity in asymptomatic confirmed SARS-CoV-2 positives was a mere 43 subjects. The 95% confidence interval about 77% is wide, ranging from 65% to 90% (nearest 5%).
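The width of that interval can be checked with a simple sketch. The quote does not say which method was used; the code below assumes the normal-approximation (Wald) interval, which is only one of several possible choices, and other methods give somewhat different limits for a sample this small. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# 33 of 43 asymptomatic confirmed cases were detected (figures quoted in the text).
low, high = wald_interval(33, 43)

# Round to the nearest 5 percentage points, as the quote does.
low_5 = round(low * 20) * 5
high_5 = round(high * 20) * 5

print(f"95% interval: {low:.1%} to {high:.1%}")
print(f"to the nearest 5%: {low_5}% to {high_5}%")
```

Under the Wald assumption the interval runs from about 64% to about 89%, which rounds to the 65% and 90% quoted.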
“Second, suppose that I had wanted to design a randomized controlled trial (RCT) with 80% power to differentiate by the yard-stick of statistical significance at the 5% level between novel [75%] and control [65%] treatments’ success-rate when patients were randomized in the ratio 1:1. How large should my RCT be? The answer is that I should need to consent around 660 patients.
“By contrast, the above phase 3b’s 170 SARS-CoV-2 infected individuals with known symptom-status were many times fewer than 660 and were also unequally apportioned between symptomatic (127) and asymptomatic (43), which weakens power.
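The sample-size figure of around 660 can be approximated with a standard formula for comparing two proportions. The sketch below assumes a two-sided test at the 5% level, 80% power and 1:1 randomisation; the quote does not state which formula was used, so this is one plausible reconstruction rather than the author's method, and the function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_arm(p_control, p_novel, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_novel) / 2
    # Pooled variance under the null, separate variances under the alternative.
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p_control * (1 - p_control) + p_novel * (1 - p_novel))
    return ceil((null_term + alt_term) ** 2 / (p_novel - p_control) ** 2)


# Success rates from the quote: control 65%, novel 75%.
per_arm = patients_per_arm(p_control=0.65, p_novel=0.75)
total = 2 * per_arm

print(f"patients per arm: {per_arm}")
print(f"total patients:   {total}")
```

With these assumptions the total comes out in the region of 660, several times the 170 individuals available in the phase 3b comparison.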
“Third is a further alert. Among consecutive cases from COVID19 Testing centres, performance was lower when self-trained members of the public attempted to follow a protocol [214/372, 58% positive; 95% confidence interval: 52% to 63%] than when the test was used by laboratory scientists [156/197, 79% positive; 95% confidence interval: 73% to 85%] or by trained healthcare workers [92/126, 73% positive; 95% confidence interval: 64% to 80%]. However, as there is no mention that consecutive cases were randomized to format for test-deployment {self, laboratory scientist, trained healthcare worker}, the 3-way comparisons are not necessarily like-with-like.
“Let us all hope that Liverpool’s public health and academic teams have been able to deploy scientific method, including randomization and study size considerations, to good effect.”
All our previous output on this subject can be seen at this weblink:
www.sciencemediacentre.org/tag/covid-19
Declared interests
Prof Sebastian Johnston: “Prof Sebastian Johnston is Chairman & Chief Medical Officer of Virtus Respiratory Research Ltd. which sells an antibody test and is currently investigating PCR/LAMP/antigen testing.”
Prof Jon Deeks: “Jon Deeks is Professor of Biostatistics at the University of Birmingham and fully funded by the University of Birmingham. He leads the international Cochrane COVID-19 Diagnostic Test Accuracy Reviews team summarising the evidence of the accuracy of tests for Covid-19; he is a member of the Royal Statistical Society (RSS) Covid-19 taskforce steering group, and co-chair of the RSS Diagnostic Test Advisory Group; he is a consultant adviser to the WHO Essential Diagnostic List; and he receives payment from the BMJ as their Chief Statistical advisor.”
Prof Kevin McConway: “I am a Trustee of the SMC and a member of the Advisory Committee, but my quote above is in my capacity as a professional statistician.”
Prof Jonathan Ball: “No CoIs.”
Prof Sheila Bird: “SMB is a member of Royal Statistical Society’s COVID-19 Taskforce and has a long-standing interest in statistical reporting standards.”
None others received.