Public Health England (PHE) laboratories have analysed antibody tests produced by the companies Roche and Abbott.
Prof Jon Deeks, Professor of Biostatistics and Head of the Test Evaluation Research Group, University of Birmingham, said:
“COVID-19 serology tests from Roche and Abbott were reported by the manufacturers and government last week as being “100% accurate” and as game-changers in the identification of past infection with COVID-19. These claims were based on studies undertaken by Public Health England (PHE) but the release of these statements preceded any official study reports being made available, and thus their validity could not be scrutinised. These reports have now been released (1,2) and it is possible to check the veracity of the “100% accuracy” claim.
“The two studies have notable limitations: they are based on samples and not patients (almost certainly some patients will have contributed multiple samples, which will make the results look more precise than they actually are); the origin and severity of disease in the COVID-19 samples is not known, so we can’t check whether the samples are representative of typical patient groups; and non-COVID-19 patients with similar respiratory illnesses were not included. As the studies were undertaken in expert PHE laboratories, the tests’ performance may not be as high when they are used in practice.
“The accuracy of a test relates to whether it makes errors: whether there are individuals with the disease who wrongly get negative test results, and whether there are individuals without the disease who wrongly get positive test results. Saying a test is 100% accurate implies to the public that neither of these two types of error occurs. The reports from PHE make it clear that this statement is misleading.
“Whilst both tests make no or very few false positive errors (they only very rarely wrongly report a non-COVID-19 sample as showing antibodies to COVID-19), both tests sometimes miss detecting COVID-19 antibodies in samples from infected patients.
“For the Roche test, 93 samples with COVID-19 were tested, of which 78 (84%) gave positive test results. Thus 16% were missed. For the Abbott test, 96 samples with COVID-19 were tested, of which 90 (94%) gave positive test results. Thus 6% were missed. Whilst these error rates, particularly for Abbott, may still be low enough for these tests to have a useful role, they fall short of being game-changers, and certainly cannot be described as 100% accurate. Other tests exist which have similar performance. The two tests were evaluated on different samples in different laboratories, so caution is needed in making direct comparisons.
“It is also important to consider the margin of error in these estimates, which arises because of sample size. The 95% confidence interval for the Roche figure ranges from 75% to 91%, and for the Abbott figure from 87% to 98%. The values within these intervals encapsulate the plausible performance of each test.
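To make the arithmetic behind these figures concrete, here is a minimal Python sketch that recomputes the sensitivities and 95% confidence intervals from the counts quoted above (78/93 for Roche, 90/96 for Abbott). The use of exact Clopper-Pearson intervals is an assumption for illustration; the PHE reports' interval method is not restated here.

```python
# Recompute sensitivity and exact (Clopper-Pearson) 95% confidence intervals
# from the counts quoted above: Roche 78/93 positive, Abbott 90/96 positive.
# The choice of interval method is an assumption for illustration.
from scipy.stats import beta

def sensitivity_with_ci(true_positives, total_positives, alpha=0.05):
    """Return the point estimate and Clopper-Pearson interval for sensitivity."""
    sens = true_positives / total_positives
    lower = beta.ppf(alpha / 2, true_positives, total_positives - true_positives + 1)
    upper = beta.ppf(1 - alpha / 2, true_positives + 1, total_positives - true_positives)
    return sens, lower, upper

for name, hits, n in [("Roche", 78, 93), ("Abbott", 90, 96)]:
    sens, lo, hi = sensitivity_with_ci(hits, n)
    print(f"{name}: sensitivity {sens:.0%}, 95% CI {lo:.0%} to {hi:.0%}")
    # Roche prints roughly 84% (75% to 91%); Abbott roughly 94% (87% to 98%).
```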
“What would these error rates mean for people using this test? If a group of people chose to use these tests because they have had a COVID-19-like illness, could they trust the results? To answer this question it is necessary to make a guess at what proportion of them are likely to have had COVID-19. If we assume that half of them had had COVID-19 and the other half had had a different respiratory illness, a positive test result will almost certainly indicate that a person has had COVID-19 (between a 95% and 100% chance). However, 14% of those receiving a negative result from the Roche test would have had COVID-19, and 6% of those receiving a negative result from the Abbott test would have had COVID-19. Considering the margin of error in these estimates gives ranges from 8% to 20% for Roche and 2% to 12% for Abbott.
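The figures in this paragraph follow from Bayes' theorem. A minimal Python sketch, assuming a 50% pre-test probability and treating specificity as 100% (no false positives were observed), reproduces them from the sensitivities quoted above:

```python
# Probability of having had COVID-19 despite a negative result, at an assumed
# 50% pre-test probability, using the sensitivities quoted above and treating
# specificity as 100% (no false positives were observed in either study).
def prob_disease_given_negative(sensitivity, specificity, prevalence):
    """Bayes' theorem: P(past infection | negative test)."""
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    return false_neg / (false_neg + true_neg)

for name, sens in [("Roche", 78 / 93), ("Abbott", 90 / 96)]:
    missed = prob_disease_given_negative(sens, specificity=1.0, prevalence=0.5)
    print(f"{name}: {missed:.0%} of negative results would be in people who had COVID-19")
    # Roche: ~14%; Abbott: ~6%, matching the figures quoted above.
```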
“It was also very clear that the performance of the Roche test increases with time since the onset of symptoms. This is expected, as it takes some time after infection for antibody responses to build up and become detectable. The study showed that 25% of COVID-19 samples taken between 11 and 20 days after onset of symptoms wrongly tested negative, 21% between 21 and 30 days, and 5% after 30 days. These figures are based on even smaller samples and carry large uncertainty. The trend with time was not so evident in the samples tested by Abbott.
“It is of concern that there are now websites offering home sampling kits to send samples for evaluation using these tests, which make false claims about the performance of these tests, either directly repeating the 100% accuracy claim or quoting other figures much higher than those from these studies. They also rarely mention the importance of time since onset of symptoms in deciding when a test should be done. It is important that, when the public use these tests, they are made aware of the likely error rates and the certainty that they can attach to positive and negative test results, and are given clear instructions on when and how to use the tests. Several websites repeat the misleading statements made in the media that these tests have been approved by PHE and display unauthorised copies of the PHE logo – PHE have made it very clear that their role is to evaluate but not to approve tests, and that no commercial test can state that it has been approved by PHE or use the PHE logo [3].
“For the benefit of the public it is important that performance of these tests is correctly reported and disseminated, and that the public are given factual information to correctly choose whether, how and when to use these tests, and the certainty with which their results should be interpreted.”
Prof Sheila Bird, formerly Programme Leader at MRC Biostatistics Unit, University of Cambridge, Honorary Professor, Edinburgh University College of Medicine and Veterinary Medicine & member of RSS Covid-19 Taskforce, said:
“Public health and statistical science require robust experimental design and reporting standards.
Independent evaluations have been conducted under the auspices of Public Health England of two serological tests, developed respectively by Abbott Laboratories and by Roche, for detection of Immunoglobulin G (IgG) antibodies against SARS-CoV-2 [1] and for use in population surveillance. The test developers had, of course, conducted their own evaluations. For the criteria set by the UK’s Medicines and Healthcare products Regulatory Agency for accreditation as a point-of-care test, please see https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/883897/Target_Product_Profile_antibody_tests_to_help_determine_if_people_have_immunity_to_SARS-CoV-2_Version_2.pdf.
Specificity [Sp, true negative rate or percentage of those truly negative who actually test negative] and sensitivity [Sn, true positive rate or percentage of those truly positive who actually test positive] of at least 98% each are considered necessary to determine an individual’s antibody status; but lower values can be acceptable for population surveillance because appropriate adjustment can be made [2,3].
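One standard form of such an adjustment (often attributed to Rogan and Gladen, and discussed in the prevalence-estimation literature cited in reference 2) corrects an observed test-positive proportion for imperfect sensitivity and specificity. A minimal Python sketch, using purely illustrative survey figures rather than any values from the PHE reports, is:

```python
# Rogan-Gladen-style adjustment of an observed test-positive proportion for
# imperfect sensitivity (Sn) and specificity (Sp). The survey figures used
# below are illustrative assumptions only.
def adjusted_prevalence(observed_positive_rate, sensitivity, specificity):
    """Estimated true prevalence = (observed + Sp - 1) / (Sn + Sp - 1)."""
    estimate = (observed_positive_rate + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(estimate, 0.0), 1.0)  # clamp to the valid [0, 1] range

# Example: 7% of a survey sample test positive on a hypothetical assay with
# 84% sensitivity and 99% specificity.
print(adjusted_prevalence(0.07, sensitivity=0.84, specificity=0.99))  # ~0.072
```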
“Background considerations: Demographic factors (age-group; being male) are strong prognostic factors for severe versus mild manifestation of COVID-19 disease. Hence, were the tests to be deployed in population surveillance for IgG antibodies against SARS-CoV-2 (have I had it?), we need to know in advance whether the serological tests’ performances in terms of Sp and Sn vary by age-group and/or gender. If so, adjustments to population surveillance estimates would need to be made for age and/or gender.
“In addition, robust evaluation of IgG antibody tests needs to source sufficient sera from people who tested PCR-positive in swab-test (have I got it?) but experienced only mild symptoms as their prevalence (or persistence) of IgG antibodies could be different from patients whose symptoms led to their hospitalization.
“Serological testing for IgG antibodies to SARS-CoV-2 may be confounded by the presence of other viruses, and so a key aspect of any independent evaluation is to amass sera from patients with a range of confounding conditions (but not SARS-CoV-2) to check whether the COVID-19 IgG antibody test wrongly signals positive.
“Finally, persistence of IgG antibodies – measured alternatively from the date of symptom onset (subject to recall bias) or from swab-sample-date (subject to access to testing) – matters and so any rigorous independent evaluation needs to access repeat sera (pairs, triples, quartets) from recovered COVID-19-swab-positive patients and, for masking purposes, from patients who are not known to have encountered SARS-CoV-2.
“Acquisition and banking of sera: The above background considerations mean that sera must be acquired as shown in Table 1, which includes dating alternatively from symptom-onset-date and swab-positive-date. Ideally, each patient will have contributed sufficient blood that their research-donation or residual blood-samples can be used in at least 5 evaluations. If each cell in Table 1 were represented by at least 100 sera, then 1400 negative sera (700 from males; 700 from females; 100 per age-group per gender) would have to be acquired together with 800 confounder sera (400 from males; 400 from females). The goal in Table 1 would be to acquire sera from six times as many swab-positive patients (8400 in all; 1680 sera from patients who were hospitalized with COVID-19 disease).
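As a check on the arithmetic in this acquisition scheme, a minimal Python sketch is given below. The figure of seven age-groups per gender is inferred from the quoted 700 sera per gender at 100 per age-group per gender, and is an assumption rather than a stated design feature; the other multipliers are taken from the text above.

```python
# Arithmetic behind the acquisition targets described above.
SERA_PER_CELL = 100
AGE_GROUPS = 7          # inferred: 700 per gender / 100 per age-group per gender
GENDERS = 2

negative_sera = SERA_PER_CELL * AGE_GROUPS * GENDERS   # 1400, as quoted
confounder_sera = 800                                  # 400 per gender, as quoted
swab_positive_sera = 6 * negative_sera                 # "six times as many" = 8400
hospitalised_sera = 1680                               # as quoted

print(negative_sera, confounder_sera, swab_positive_sera,
      hospitalised_sera / swab_positive_sera)          # 1400 800 8400 0.2
```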
“A different acquisition scheme is needed to test robustly for persistence of IgG antibodies within an individual.
“Weaknesses in independent evaluation under the auspices of Public Health England (PHE): Ten reservations are listed.
1. Equally-powered evaluation of the two tests was not provided: PHE’s evaluation of the Roche test accommodated 85 confounder samples versus 364 for the Abbott test; each was evaluated on fewer than 100 convalescent sera.
2. Uncertain differentiation between repeat samples from the same patient and single samples from distinct patients. Throughout, there is a lack of transparency about the use of repeat sera per patient. Initial evaluation should be straightforwardly on the basis of one-sample per patient.
3. Hence, readers of the PHE reports cannot discern how many positive patients (versus sera) contributed to the evaluation of sensitivity (true positive rate). Neither evaluation included more than 100 convalescent patients (both genders, all ages, hospitalized or mildly symptomatic).
4. Age-group and gender of the patients whose sera were being analysed were not heeded. Demography matters for whether patients develop severe COVID-19 disease. Also, since immune responses differ intrinsically with age, the proportion of patients who develop IgG antibodies may differ by age, as may their persistence.
5. Level playing-field between the Roche and Abbott evaluations was not apparent in terms of whether the tested sera came from patients who had been hospitalized for COVID-19 disease or had been only mildly symptomatic. Both matter, especially if IgG antibody tests are to be used for population surveillance, and test-performance may be different by symptomatology.
6. Confounder samples, amongst which no positive was found in PHE’s evaluation of the Roche test, did double duty by being counted together with the “negative” samples, thereby increasing precision in an unprincipled manner.
7. Even playing field for both tests was lacking in PHE’s evaluation of confounders. Confounder samples came from a range of patient-conditions, totalling 85 sera for the Roche evaluation (35/85 were from Lyme disease patients) versus 364 for the Abbott evaluation (11/364 were seasonal coronavirus positives).
8. Lack of clarity about analysis plan in respect of whether test-evaluation was primarily in respect of a) time-since-swab-positive-sample-date; or b) time-since-symptom-onset-date. Both matter in public health terms. Test-developers’ focus may be preferential.
9. Lack of clarity about the analysis plan for the exclusion of outliers when fitting a half-normal distribution to the log10(test read-outs).
10. PHE’s evaluation was not designed to test the persistence of IgG antibodies within-person over time.
The above list is not exhaustive [4].
“Public Health England’s evaluation does not meet the standards that would be expected for confirmation that these tests meet the criteria set by the UK’s Medicines and Healthcare products Regulatory Agency for accreditation as point-of-care tests. Moreover, their suitability for age- and gender-specific population surveillance has not been addressed.
1. COVID-19: laboratory evaluations of serological assays, see https://www.gov.uk/government/publications/covid-19-laboratory-evaluations-of-serological-assays
2. Diggle, P.J. (2011). Estimating prevalence using an imperfect test. Epidemiology Research International, Article ID 608719. doi:10.1155/2011/608719.
3. Royal Statistical Society COVID-19 Taskforce Statement on COVID-19 Antibody Testing, issued on 14 April 2020. https://rss.org.uk/RSS/media/Policy-and-campaigns/Policy/Statement-Antibody-testing-14-04-2020.pdf.
4. Deeks J. Independent Evaluation of Immunoglobulin G antibodies against SARS-CoV-2: Statements of “100% accuracy” are not supported by the data now available. It is important that the public are properly informed of the accuracy of these tests. @deeksj.
Table 1. Data per single blood sample from an individual who tested PCR-positive for COVID-19 include: birth-year, gender, symptom-onset date, swab-date, swab-PCR-result, whether hospitalized with COVID-19 disease, blood-sample-date. Italicised data apply also per single blood sample from individual who could not have been infected by SARS-CoV-2, for example because blood-sample-date was in the first semester of 2019.
Prof Liam Smeeth, Professor of Clinical Epidemiology, and Dean of the Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, said:
“These evaluations and the tests themselves are very welcome. The high specificity is important as it makes it very unlikely people will wrongly be told they have been exposed when in fact they have not.
“However, what remains crucial is knowing the extent to which the results of these tests provide reliable information on infection risk, and on whether people are still able to carry the virus. Knowing these two things would allow us to start using serology results to guide individual behaviour. The hope is that serological tests can tell us someone is immune to infection, so that they are personally not at risk. There is also hope that serology results will correlate strongly with whether someone can still be a carrier and infect others. While the evaluations so far are promising, we do not yet have the data we need to use these tests to guide individual behaviours.”
Dr Alexander Edwards, Associate Professor in Biomedical Technology, Reading School of Pharmacy, University of Reading, said:
“There is a lot to celebrate here. Firstly, it’s brilliant to see PHE sharing the main findings of these evaluations of these important products. Secondly, having reliable laboratory antibody tests will help us to build a more detailed picture of viral spread, and will be essential for developing the next generation of diagnostic tests.
“What is becoming clear is that these laboratory antibody tests are very specific, giving great confidence that they won’t erroneously pick up “false positives”. Neither test is 100% sensitive, especially at early times after infection, but that is not a surprise, and both are still very useful. Both the Roche and Abbott immunoassay instruments are quite commonly found in UK diagnostic laboratories. Remember that these laboratory tests still require a blood sample to be taken – these are not home tests. Whilst many people who did not get swab tested when they were ill will want to know if they have been infected, it will take significant work to make these tests widely available. The tests appear to become more accurate at later timepoints after infection, giving best sensitivity more than 20 days after the start of infection.
“An important point to reinforce, however, is that antibody tests are not yet going to confirm protective immunity for every individual with a positive result. These tests are currently likely to be configured to detect a wide range of antibody levels – this can be a very strong antibody response or a relatively weak one. The reason to detect the weaker responses is to pick up as many people as possible who have been infected, i.e. to maximise the “sensitivity”. Antibody tests can try to measure the level of antibody, but we don’t yet have data to know whether the level of antibody is related to protection. Equally importantly, prior science and early studies hint that antibody responses against the “Spike protein” are most important for neutralising virus, which might be important for protection against infection. However, both the Abbott and Roche tests detect antibody against a different protein, termed “N”.
“Most people who have recovered would not be expected to become re-infected – but we don’t yet know how strong this protection is or how long it will last. Having this data on test performance is essential so that we can start answering these two critical scientific questions.”
https://www.gov.uk/government/publications/covid-19-laboratory-evaluations-of-serological-assays
Previous comments on the Roche antibody test from Thursday 14 May: https://www.sciencemediacentre.org/expert-reaction-to-news-reports-that-phe-have-said-roche-antibody-test-is-a-very-positive-development-and-that-assessments-at-porton-down-found-it-was-highly-specific/
All our previous output on this subject can be seen at this weblink:
www.sciencemediacentre.org/tag/covid-19
Declared interests
Prof Jon Deeks: No conflicts of interest
Prof Liam Smeeth: “No conflicts.”
None others received.