Journal of Hepatology
Volume 53, Issue 2, Pages 222-224, August 2010

Serum fibrosis markers: Death by validation or a leap of faith?

Université Pierre et Marie Curie, Assistance Publique – Hôpitaux de Paris, Hôpital Pitié Salpêtrière, INSERM UMRS 893, Paris, France

Received 27 April 2010; accepted 27 April 2010; published online 11 May 2010.


Before the year 2000, serum fibrosis markers were at an embryonic stage of development [1], [2], [3]. Then came an era of rapid therapeutic advances in viral hepatitis, creating a clinical need to diagnose liver fibrosis simply and safely in as many treatment candidates as possible. As a consequence, the past 10 years have witnessed a flurry of publications on newly identified non-invasive fibrosis markers. All of them have reported the area under the receiver operating characteristic curve (AUROC) for significant fibrosis and calculated sensitivity, specificity, and positive and negative predictive values in cross-sectional studies, and on this basis have claimed to be validated. Whereas in the past clinicians had no alternative to liver biopsy, they now face a bewildering situation each time a new fibrosis marker is published: what do we call validation, how much validation is needed, and which test can we trust in clinical practice?

The French National Authority for Health (Haute Autorité de Santé) has issued a list of validation criteria on which to base its decision whether to recommend a new non-invasive method for assessing liver fibrosis for use in clinical practice (Table 1). Initial reports of most tests usually fulfill the first three criteria and report an AUROC for differentiating advanced from no/minimal fibrosis of between 0.7 and 0.85. Unsurprisingly, the higher the AUROC, the louder the claim of "validation". The study by Cales et al., reported in this issue of the Journal, takes these criteria a step further. The authors present two new tests specifically developed for predicting advanced fibrosis in HCV–HIV coinfected patients. Both are simplified versions of the original FibroMeter [4], a proprietary non-invasive fibrosis marker with which they share many components. The HICV test uses only alpha-2-macroglobulin, AST, and the prothrombin index. The second test, FibroMeter HICV, additionally uses platelets, urea, and hyaluronic acid, adjusted for age and sex. Using impeccable methodology in both a derivation population (n = 183) and an independent multicentric validation population (n = 284), the authors demonstrate that these tests perform better than other available tests, including the original FibroMeter and the more popular FibroTest. Why these empirically chosen, slimmed-down versions perform better than their parent tests is unclear. Empiricism at its best, although already proven successful [4], [5], sometimes clashes with the legitimate demand for a scientific rationale. Inevitably, one is also left asking whether these second-generation tests would also have performed better than the first-generation ones had they been tested in the other common liver diseases (HCV, HBV, alcoholic liver disease, or NAFLD).
While these answers are awaited, and returning to the original concern about validation methodology, the report by Cales et al. is innovative in using a plethora of performance indices that are new to this field. These include, but are not limited to: overall diagnostic performance, test performance profile, diagnostic accuracy, reliable diagnosis, reliable diagnosis intervals (old and new), misclassification rate, diagnostic cut-offs, overall test reproducibility, likelihood ratios, robustness, adjusted AUROCs, and the Obuchowski method [6]. This is more validation than the average reader or clinician can comprehend. Moreover, by showing higher scores on all these indices, the authors strive to demonstrate that the new tests perform better than the previous ones because they provide a result closer to that of liver biopsy. This raises two simple concerns. First, how legitimate is the race for higher diagnostic indices against liver biopsy? Second, as the choice of tests grows and the validation presented for each becomes increasingly sophisticated, should we keep asking for more validation, or should we focus on the validation that will make a difference to our clinical practice?
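Most of the classical indices in this list derive from a single 2×2 cross-tabulation of a dichotomized marker against biopsy. The Python sketch below shows how sensitivity, specificity, predictive values, likelihood ratios and the misclassification rate follow from that table; the counts are hypothetical, and the more elaborate indices listed above (reliable diagnosis intervals, the Obuchowski measure) are not reproduced. Every figure it returns inherits biopsy's own errors, because biopsy defines the columns of the table.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cross-tabulation of a dichotomized marker against liver biopsy."""
    tp: int  # marker positive, biopsy shows advanced fibrosis
    fp: int  # marker positive, biopsy shows no/minimal fibrosis
    fn: int  # marker negative, biopsy shows advanced fibrosis
    tn: int  # marker negative, biopsy shows no/minimal fibrosis


def diagnostic_indices(t: TwoByTwo) -> dict:
    """Classical indices, all computed with biopsy as the reference.

    Assumes every denominator is non-zero (e.g. specificity below 1
    for the positive likelihood ratio).
    """
    n = t.tp + t.fp + t.fn + t.tn
    se = t.tp / (t.tp + t.fn)
    sp = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": se,
        "specificity": sp,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_positive": se / (1 - sp),
        "lr_negative": (1 - se) / sp,
        "misclassification_rate": (t.fp + t.fn) / n,
    }


# Hypothetical counts for 200 patients, for illustration only.
for name, value in diagnostic_indices(TwoByTwo(tp=60, fp=20, fn=20, tn=100)).items():
    print(f"{name}: {value:.2f}")
```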

Table 1. Minimal requirements for the validation of non-invasive fibrosis markers according to the French National Authority for Health (Haute Autorité de Santé).
Sufficiently large sample size; consecutive patients analyzed prospectively
Results reported separately for different etiologies of liver disease
Specification of liver biopsy characteristics (size; time interval between the test and biopsy)
Accessibility, reproducibility, including inter-laboratory variability, limits of use and interpretation
At least one independent validation

Adapted from: http://www.has-sante.fr/portail/jcms/c_476347/methodes-devaluation-de-la-fibrose-hepatique-au-cours-des-hepatopathies-chroniques, accessed on April 10th 2010.

There are now enough data showing that the race for the highest AUROC, or for that matter for any other diagnostic index, may instead lead to a dead end. Liver biopsy has significant sampling variability: even with a 25 mm sample, by all accounts a good size for a biopsy, there is a 25% misclassification rate between adjacent METAVIR fibrosis stages [7]. Trying to achieve perfect concordance with a diagnostic procedure that has such a high misclassification rate is, to put it nicely, paradoxical. This is intuitive reasoning, but in an elegant study [8] Mehta et al. demonstrated that the variability of liver biopsy results makes it impossible to achieve an AUROC close to 1, even for a marker that measures the disease perfectly. For instance, with conservative estimates of biopsy error, such as 80% sensitivity and specificity of biopsy, and an advanced fibrosis prevalence of 40%, a perfect test would have an expected AUROC vs. liver biopsy of only 0.76. In the best-case scenario (90% sensitivity and specificity of liver biopsy) and for the same prevalence, the same perfect test would reach an AUROC of only 0.9 [8]. The optimistic interpretation of these simulations is that some of the markers now available are already perfect. A more realistic interpretation is that striving for the highest possible AUROC vs. liver biopsy amounts to promoting a test that is as imperfect as liver biopsy itself. It also makes it difficult to compare two tests with different AUROCs, even in the same population: the one with the lower AUROC might well be the more accurate one!
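The ceiling described by Mehta et al. can be illustrated with a deliberately simple model. Assume a marker that ranks every patient with truly advanced fibrosis above every patient without it, and assume that biopsy errors are independent of the marker value within each true class. Against biopsy, the expected AUROC then reduces to the average of biopsy's own positive and negative predictive values, (PPV + NPV) / 2. The Python sketch below implements this simplified model; it is not Mehta et al.'s simulation, and the function name is hypothetical. It gives about 0.89 for the best-case scenario, close to the 0.9 quoted above, but about 0.79 rather than 0.76 for the 80% scenario, so the published figures should be taken from the original study. The qualitative point, a ceiling well below 1 for a perfect test, holds in both.

```python
def max_auroc_vs_biopsy(biopsy_se: float, biopsy_sp: float, prevalence: float) -> float:
    """Expected AUROC, measured against biopsy, of a marker that perfectly
    separates true advanced fibrosis from its absence.

    Simplifying assumption (not Mehta et al.'s model): biopsy errors are
    independent of the marker value within each true class.
    """
    p = prevalence
    # Probability that a biopsy-positive patient truly has advanced fibrosis.
    ppv = biopsy_se * p / (biopsy_se * p + (1 - biopsy_sp) * (1 - p))
    # Probability that a biopsy-negative patient truly does not.
    npv = biopsy_sp * (1 - p) / (biopsy_sp * (1 - p) + (1 - biopsy_se) * p)
    # Pair one biopsy-positive with one biopsy-negative patient:
    #   truly diseased vs truly not -> marker orders them correctly (1)
    #   same true class             -> order is a coin flip (0.5)
    #   truly not vs truly diseased -> marker orders them "wrongly" (0)
    return ppv * npv + 0.5 * (ppv * (1 - npv) + (1 - ppv) * npv)


for se_sp in (0.8, 0.9, 1.0):
    print(se_sp, round(max_auroc_vs_biopsy(se_sp, se_sp, 0.4), 2))
# 0.8 0.79
# 0.9 0.89
# 1.0 1.0
```

Only a perfect biopsy (100% sensitivity and specificity) lets the perfect marker reach an AUROC of 1 under this model.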

There is yet another conceptual hurdle when trying to validate a marker against liver biopsy: our uncertainty about what these markers really measure. Most studies report proportional, step-wise increases in test values with fibrosis stage. However, fibrosis stages, especially in the METAVIR classification, are defined primarily by architectural distortion rather than by the amount of fibrosis per se. Measurement of the fibrosis area by micromorphometry (an accepted surrogate of the amount of liver fibrosis) has shown that its relationship with histological stage is anything but linear: slight increases between F0 and F2 and dramatic increases beyond F2 [7]. While it is reasonable to accept that direct or even indirect fibrosis markers correlate with the amount of fibrosis, it takes a serious stretch of the imagination to find a rationale for their correlation with the architectural changes that define the histological stages.

These two conceptual hurdles do not mean that serum fibrosis markers are unacceptable surrogates of liver fibrosis or that they should not be used for this purpose. But they should remind us that our understanding of how they relate to the fibrotic process in the liver is seriously limited and, most importantly, that there is a clear limit to how much validation can be achieved in this type of cross-sectional study against liver biopsy, given the imperfect nature of the latter as a gold standard. Endless cross-sectional validation of newer and older markers would achieve only death by validation, rather than build the case for a more convincing diagnostic alternative to liver biopsy. What type of validation, then, is critical, given the limits of liver biopsy and the requirements for generalized use of these markers in clinical practice?

First, analytical validation of these markers, especially serum-based markers, is critical. As most of these markers simply calculate a score from locally obtained biological measurements, the standardization of these measurements and their transferability from one laboratory to another are essential. The variability of some biological parameters, such as aminotransferases, is of particular concern and needs to be minimized [9]. This applies to intra- and inter-laboratory variability, as well as to preanalytical conditions of sampling and storage [10]. Even more important is the assessment of intra-patient reproducibility. Regardless of how well a test correlates with liver biopsy, its short-term fluctuations in an individual patient must be minimal if longer-term variations are to be attributed to true progression or regression of the disease; only then can the test be used confidently for long-term patient monitoring. Unfortunately, for currently available biomarkers of liver fibrosis, extensive analytical validation is the exception rather than the rule.
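Clinical chemistry already has a standard tool for quantifying "minimal short-term fluctuation": the reference change value (RCV), which combines analytical imprecision (CV_A) and within-subject biological variation (CV_I) into the smallest difference between two serial results that is unlikely to be noise alone. The Python sketch below applies the textbook formula, RCV = √2 · z · √(CV_A² + CV_I²). The CV values are hypothetical, and applying the formula to a composite fibrosis score assumes that the score's own analytical and within-patient CVs have actually been measured, which is precisely the analytical validation argued for here.

```python
import math


def reference_change_value(cv_analytical: float, cv_within_subject: float,
                           z: float = 1.96) -> float:
    """Smallest % difference between two serial results in one patient that
    is unlikely (two-sided, z = 1.96 -> ~95%) to be explained by analytical
    imprecision plus short-term biological fluctuation alone.

    Both CVs are expressed in percent.
    """
    return math.sqrt(2) * z * math.sqrt(cv_analytical ** 2 + cv_within_subject ** 2)


# Hypothetical CVs for a fibrosis score, for illustration only.
rcv = reference_change_value(cv_analytical=5.0, cv_within_subject=10.0)
print(f"Change needed to suggest true progression/regression: {rcv:.0f}%")
# prints: Change needed to suggest true progression/regression: 31%
```

Halving the within-patient CV to 5% in this example lowers the required change to about 20%, which shows why intra-patient reproducibility matters as much as laboratory precision for monitoring.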

A second major consideration is the exhaustive description of precautions of use and limits of interpretation of any candidate non-invasive marker. Just as we require the best possible understanding of contraindications and precautions of use for pharmaceutical agents, we should know when a test is at risk of providing unreliable results, either falsely positive or falsely negative. This should start with an analysis of the reasons for discordant results between the test and liver biopsy (or any better standard of comparison), acknowledging that when biopsy is used, biopsy itself can be the reason for discordance [11]. The prevalence of risk factors for false results should be analyzed extensively in the general population and in patients with different liver diseases; whenever possible, measures to minimize the impact of such risk factors on the overall result should be tested in order to provide the clinician with a robust result. The bottom line is that clinicians should know when they can reasonably trust the test and when they should seek alternative diagnostic methods.

A third critical issue is circumventing the need for liver biopsy in the validation process. Because of the methodological issues related to biopsy discussed above, we clearly need alternative standards for validation. Ideally, prognostic analyses based on hard clinical end-points (liver-related death, complications of cirrhosis) would confirm the clinical relevance of these markers. A few such studies are already available in HCV [12], HBV [13], alcoholic liver disease [14], and primary biliary cirrhosis [15], but only for a handful of markers. For each marker, thresholds that place patients at risk of an adverse hepatic outcome during mid- or long-term follow-up should be defined. This will help tremendously in identifying individuals at risk of progression, in designing monitoring strategies, and in optimizing indications for treatment. Another important question is whether a non-invasive marker can capture changes in fibrosis on therapy, independently of improvements in inflammation or steatosis. Of note, markers that include aminotransferases could be confounded by necrosis and/or inflammation. Whenever applicable, future trials should ideally incorporate sequential assessment of non-invasive markers in addition to end-of-treatment biopsies [16]. Finally, I suggest that the ultimate proof that a non-invasive marker can replace liver biopsy for patient management would be to show that basing the decision to treat on the marker rather than on liver biopsy results in the same proportion of patients achieving viral eradication. It is well known that advanced fibrosis reduces the sustained virological response to the standard of care in chronic hepatitis C. A randomized trial in which all participants undergo both liver biopsy and the non-invasive test could be designed.
The decision to treat would be based in one arm on the results of liver biopsy, and in the other arm on those of the surrogate marker. If the eradication rate is the same with either strategy, without additional side effects, it can reasonably be concluded that non-invasive strategies could replace biopsy for uncomplicated, first-line therapeutic management. Despite all the studies on fibrosis markers published in the past decade, no such demonstration of the utility of any new marker for patient management is yet available.

The fourth and final critical validation is, of course, that provided by truly independent studies in a variety of patient populations, not restricted to tertiary care centers. This type of validation, confirming diagnostic performance equivalent to that originally described by the tests' promoters, is already available for both FibroTest and FibroMeter.

Finally, the article by Cales et al. raises challenging issues about the very design of a non-invasive marker. With the notable exception of FibroTest and FibroMeter, most serum fibrosis markers provide only binary information on the presence of advanced vs. no or mild fibrosis. Given the richness of the information provided by liver biopsy (a 5-point scale for METAVIR and a 7-point scale for Ishak fibrosis staging) and the way we have incorporated it into our clinical decision-making, such binary information appears an incongruous oversimplification. If a marker is truly to replace liver biopsy, it should provide not only a truly quantitative measure, useful for follow-up, but also a numerical equivalent of the commonly used histological stages. Ideally, the value should reflect liver fibrosis independently of inflammatory activity or steatosis. Moreover, were such a test to become a useful marker in primary care, akin to markers of cardiovascular risk, it should be simple to use and interpret. Therefore, contrary to the approach suggested by Cales et al., we should aim for a single, universal marker of liver fibrosis rather than different markers for different etiologies [4] or distinct algorithms for diagnosing different stages of the disease process [17].

Our understanding of the diagnostic validation of non-invasive markers has grown increasingly complex owing to the many studies on serum fibrosis markers published over the last decade. One of the main lessons learned is that cross-sectional validation against liver biopsy, although important initially, has intrinsic limitations that will persist no matter how much validation is provided. It is time to move beyond this initial assessment towards a more clinically relevant validation process encompassing analytical, longitudinal, prognostic, therapeutic, and independent validation. For some markers, most of this is already available. Whenever this is the case, the clinician should take the leap of faith and use these tests in his or her own clinical practice.


Conflict of interest 

The author has no conflict of interest to disclose over the past two years with any maker of proprietary non-invasive diagnostic methods, including BioLivescale or BioPredictive.


References 

  1. Poynard T, Aubert A, Bedossa P, Abella A, Naveau S, Paraf F, et al. A simple biological index for detection of alcoholic liver disease in drinkers. Gastroenterology. 1991;100:1397–1402
  2. Oberti F, Valsesia E, Pilette C, Rousselet MC, Bedossa P, Aube C, et al. Noninvasive diagnosis of hepatic fibrosis or cirrhosis. Gastroenterology. 1997;113:1609–1616
  3. Ratziu V, Giral P, Charlotte F, Bruckert E, Thibault V, Theodorou I, et al. Liver fibrosis in overweight patients. Gastroenterology. 2000;118:1117–1123
  4. Cales P, Oberti F, Michalak S, Hubert-Fouchard I, Rousselet MC, Konate A, et al. A novel panel of blood markers to assess the degree of liver fibrosis. Hepatology. 2005;42:1373–1381
  5. Imbert-Bismut F, Ratziu V, Pieroni L, Charlotte F, Benhamou Y, Poynard T. Biochemical markers of liver fibrosis in patients with hepatitis C virus infection: a prospective study. Lancet. 2001;357:1069–1075
  6. Cales P, Boursier J, Oberti F, Hubert I, Gallois Y, Rousselet MC, et al. FibroMeters: a family of blood tests for liver fibrosis. Gastroenterol Clin Biol. 2008;32:40–51
  7. Bedossa P, Dargere D, Paradis V. Sampling variability of liver fibrosis in chronic hepatitis C. Hepatology. 2003;38:1449–1457
  8. Mehta SH, Lau B, Afdhal NH, Thomas DL. Exceeding the limits of liver histology markers. J Hepatol. 2009;50:36–41
  9. Imbert-Bismut F, Messous D, Thibaut V, Myers RB, Piton A, Thabut D, et al. Intra-laboratory analytical variability of biochemical markers of fibrosis (Fibrotest) and activity (Actitest) and reference ranges in healthy blood donors. Clin Chem Lab Med. 2004;42:323–333
  10. Poynard T, Munteanu M, Morra R, Ngo Y, Imbert-Bismut F, Thabut D, et al. Methodological aspects of the interpretation of non-invasive biomarkers of liver fibrosis: a 2008 update. Gastroenterol Clin Biol. 2008;32:8–21
  11. Poynard T, Munteanu M, Imbert-Bismut F, Charlotte F, Thabut D, Le Calvez S, et al. Prospective analysis of discordant results between biochemical markers and biopsy in patients with chronic hepatitis C. Clin Chem. 2004;50:1344–1355
  12. Ngo Y, Munteanu M, Messous D, Charlotte F, Imbert-Bismut F, Thabut D, et al. A prospective analysis of the prognostic value of biomarkers (FibroTest) in patients with chronic hepatitis C. Clin Chem. 2006;52:1887–1896
  13. Ngo Y, Benhamou Y, Thibault V, Ingiliz P, Munteanu M, Lebray P, et al. An accurate definition of the status of inactive hepatitis B virus carrier by a combination of biomarkers (FibroTest–ActiTest) and viral load. PLoS One. 2008;3:e2573
  14. Naveau S, Gaude G, Asnacios A, Agostini H, Abella A, Barri-Ova N, et al. Diagnostic and prognostic values of noninvasive biomarkers of fibrosis in patients with alcoholic liver disease. Hepatology. 2009;49:97–105
  15. Mayo MJ, Parkes J, Adams-Huet B, Combes B, Mills AS, Markin RS, et al. Prediction of clinical outcomes in primary biliary cirrhosis by serum enhanced liver fibrosis assay. Hepatology. 2008;48:1549–1557
  16. Poynard T, Ngo Y, Marcellin P, Hadziyannis S, Ratziu V, Benhamou Y. Impact of adefovir dipivoxil on liver fibrosis and activity assessed with biochemical markers (FibroTest–ActiTest) in patients infected by hepatitis B virus. J Viral Hepat. 2009;16:203–213
  17. Sebastiani G, Vario A, Guido M, Noventa F, Plebani M, Pistis R, et al. Stepwise combination algorithms of non-invasive markers to diagnose significant fibrosis in chronic hepatitis C. J Hepatol. 2006;44:686–693

PII: S0168-8278(10)00388-0

doi:10.1016/j.jhep.2010.04.006

Refers to article:

  • Comparison of liver fibrosis blood tests developed for HCV with new specific tests in HIV/HCV co-infection, 26 April 2010

    Paul Calès, Philippe Halfon, Dominique Batisse, Fabrice Carrat, Philippe Perré, Guillaume Penaranda, Dominique Guyader, Louis d’Alteroche, Isabelle Fouchard-Hubert, Christian Michelet, Pascal Veillon, Jérôme Lambert, Laurence Weiss, Dominique Salmon, Patrice Cacoub
    Journal of Hepatology August 2010 (Vol. 53, Issue 2, Pages 238-244)
