Edith Sherwood very kindly left an interesting comment on my “Voynich Manuscript – the State of Play” post, which I thought was far too good to leave dangling in a mere margin. She wrote:-
If you read the 14C dating of the Vinland Map by the U of Arizona, you will find that they calculate the SD of individual results either from the scatter of the separate runs about their average, or from the counting statistical error, whichever was larger. They report their average fraction-of-modern (F) value together with an SD for each measurement:
- 0.9588 ± 0.014
- 0.9507 ± 0.0035
- 0.9353 ± 0.006
- 0.9412 ± 0.003
- 0.9310 ± 0.008
F (weighted average) = 0.9434 ± 0.0033, or a 2SD range of 0.9368 – 0.9500
Radiocarbon age = 467 ± 27 BP.
You will note that 4 of the 5 F values that were used to compute the mean, from which the final age of the parchment was calculated, lie outside this 2SD range!
The U of A states: “The error is a standard deviation deduced from the scatter of the five individual measurements from their mean value.”
According to the Wikipedia radiocarbon article:
‘Radiocarbon dating laboratories generally report an uncertainty for each date. Traditionally this included only the statistical counting uncertainty. However, some laboratories supplied an “error multiplier” that could be multiplied by the uncertainty to account for other sources of error in the measuring process.’
The U of A quotes this Wikipedia article on their web site.
It appears that the U of Arizona used only the statistical counting error to compute the SD for the Vinland Map. They may have treated their measurements on the Voynich Manuscript the same way. As their SD represents only their counting error and not the overall error associated with the totality of the data, a realistic SD could be substantially larger.
A SD for the Vinland map that is a reasonable fit to all their data is:
F (weighted average) = 0.9434 ± 0.011 (the SD computed from the 5 F values).
Or a radiocarbon age = 467 ± 90 BP instead of 467 ± 27 BP.
I appreciate that the U of A adjust their errors in processing the samples from their 13C/12C measurements, but this approach does not appear to be adequate. It would be nice if they had supplied their results with an “error multiplier”. They are performing a complex series of operations on minute samples that may be easily contaminated.
I suggest that this modified interpretation of the U of A’s results for the Vinland Map be confirmed because a similar analysis for the Voynich Manuscript might yield a SD significantly larger than they quote. I would also suggest that your bloggers read the results obtained for 14C dating by the U of A for samples of parchment of known age from Florence. These results are given at the very end of their article, after the references. You and your bloggers should have something concrete to discuss.
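Before getting to what I think, here is a quick sanity-check of the arithmetic above: a minimal Python sketch of my own (nothing to do with the U of A’s actual data reduction, which isn’t public), recombining the five published F values and converting them to radiocarbon years with the standard conventional-age formula t = -8033·ln(F):

```python
import math

# The five published F ("fraction of modern") values and their quoted SDs.
F     = [0.9588, 0.9507, 0.9353, 0.9412, 0.9310]
sigma = [0.014, 0.0035, 0.006, 0.003, 0.008]

# Inverse-variance weighted mean: the usual way to combine such measurements.
w   = [1.0 / s ** 2 for s in sigma]
F_w = sum(wi * fi for wi, fi in zip(w, F)) / sum(w)   # comes out ~0.9434

# Standard deviation of the five values about their simple mean (Edith's 0.011).
F_bar      = sum(F) / len(F)
sd_scatter = math.sqrt(sum((f - F_bar) ** 2 for f in F) / (len(F) - 1))

# Conventional radiocarbon age and its error: t = -8033*ln(F), sigma_t = 8033*sigma_F/F.
def to_bp(f, s):
    return -8033 * math.log(f), 8033 * s / f

print(to_bp(F_w, 0.0033))      # roughly (468, 28): close to the reported 467 +/- 27 BP
print(to_bp(F_w, sd_scatter))  # roughly (468, 97): the same ballpark as Edith's 467 +/- 90 BP

# The "4 of the 5 F values lie outside the 2SD range" claim:
lo, hi = F_w - 2 * 0.0033, F_w + 2 * 0.0033
print(sum(lo <= f <= hi for f in F))   # only 1 of the 5 falls inside
```

Depending on which error estimate you feed in, this roughly reproduces both the reported 467 ± 27 BP and something close to Edith’s 467 ± 90 BP: the argument really does come down to which spread you believe.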
So… what do I think?
The reason that this is provocative is that if Edith’s statistical reasoning is right, then there would be a substantial widening of the date range, far more (because of the turbulence in the calibration curve’s coverage of the late fifteenth and sixteenth centuries) than merely the (90/27) = 3.3333x widening suggested by the numbers.
All the same, I’d say that what the U of A researchers did with the Vinland Map wasn’t so much statistical sampling (for which the errors would indeed accumulate, if not actually multiply) as cross-procedural calibration – by which I mean they experimentally tried out different treatment/processing regimes on what was essentially the same sample. That is, they seem to have been using the test not only as a means to date the Vinland Map but also as an opportunity to validate that their own brand of processing and radiocarbon dating could ever be a pragmatically useful way to date similar objects.
However, pretty much as Edith points out with their calibrating-the-calibration appendix, the central problem with relying solely on radiocarbon results to date any one-off object remains: that it is subject to contamination or systematic uncertainties which may (as in Table 2’s sample #4) move it far out of the proposed date ranges, even when it falls (as the VM and the VMs apparently do) in one of the less wiggly ranges on the calibration curve. Had the Vinland Map actually been made 50 years later, it would have been a particularly problematic poster (session) child: luckily for them, though, the pin landed in a spot not too far from the date suggested by the history.
By comparison, the Voynich Manuscript presents a quite different sampling challenge. Its four samples were taken from a document which (a) was probably written in several phases over a period of time (as implied by the subtle evolution in the handwriting and cipher system), and (b) subsequently had its bifolios reordered, whether deliberately by the author (as Glen Claston believes) or by someone unable to make sense of it (as I believe). This provides an historical superstructure within which the statistical reasoning would need to be performed: even though Rene Zandbergen tends to disagree with me over this, my position is that unless you have demonstrably special sampling circumstances, the statistical reasoning involved in radiocarbon dating is not independent of the historical reasoning… the two logical structures interact. I’m a logician by training (many years ago), so I try to stay alert to the limits of any given logical system – and I think dating the VMs sits astride that fuzzy edge.
For the Vinland Map, I suspect that the real answer lies in between the two: that while 467 ± 27 BP may well be slightly too optimistic (relative to the amount of experience the U of A had with this kind of test at that time), 467 ± 90 BP is probably far too pessimistic – they used multiple processes specifically to try to reduce the overall error, not to increase it. For the Voynich Manuscript, though, I really can’t say: a lot of radiocarbon has flowed under their bridge since the Vinland Map test was carried out, so the U of A’s processual expertise has doubtless increased significantly – yet I suspect it isn’t as straightforward a sampling problem as some might think. We shall see (hopefully soon!)… =:-o
The problem with these results for the Vinland map is that the five samples are not consistent with each other. The spread of the samples is much larger than the sigmas provided for them. Since all samples are from the same piece of parchment, this means that the individual sigmas are not representative of the error of the measurements.
Rene: please correct me if I’ve got this wrong, but I read the Vinland Map report as saying that they took a single large sample from the bottom right of the map, which they then subdivided into smaller pieces for trying out different processes. The first test, however, gave an essentially modern date, which they attributed to some kind of (unknown) environmental contamination: and so their subsequent passes (over a six month period) used different types of chemical cleansing intended to remove that contamination. The five VM “samples”, then, represent five different processing paths on essentially the same core sample (the first path was basically “as-is”) – unlike the four VMs samples, which came from different bifolios (and hopefully from different animals).
All of which does rather beg the question of what that curious environmental contamination was: accident, forgery or botched restoration? But that’s another can of worms entirely! 😉
Nick, as you will have seen, the ‘outlier’ sample had a C-14 content of greater than one, meaning that it was contaminated with post-1950 material, and rather heavily too.
Speculation is always easy and fun. Who knows where the map (or the sample) was as a function of time? Was it ever near an atomic test or reactor incident? Was it ever flown around the world in a jet plane (not sure if that would have been sufficient)? Would it be possible to ‘wash away’ that sort of contamination???
On the numerical results, I had discussed these with Edith before, and agree that the standard deviation estimated from the spread of the samples should be 0.011.
On the other hand, the combined sigma from the five individual sigmas would be 0.002 (note that you have a typo for sample 4). The latter value is pretty meaningless, and since we may assume that all the samples have had different amounts of contamination, it is not clear how one can combine these reliably into a single date estimate.
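For anyone who wants to check Rene’s two figures, here is a minimal sketch (mine, not anything from the Vinland Map paper) of the combined sigma implied by the five quoted sigmas, together with a reduced chi-squared check of whether the scatter of the five F values is consistent with them:

```python
import math

F     = [0.9588, 0.9507, 0.9353, 0.9412, 0.9310]
sigma = [0.014, 0.0035, 0.006, 0.003, 0.008]

# Inverse-variance weights and weighted mean.
w   = [1.0 / s ** 2 for s in sigma]
F_w = sum(wi * fi for wi, fi in zip(w, F)) / sum(w)

# Combined sigma if the five quoted sigmas told the whole story: ~0.002.
sigma_combined = 1.0 / math.sqrt(sum(w))
print(sigma_combined)

# Reduced chi-squared of the five values about their weighted mean. A value
# well above 1 (here it comes out around 2.6) means the quoted sigmas
# understate the real scatter, which is the inconsistency Rene describes.
chi2_red = sum(((f - F_w) / s) ** 2 for f, s in zip(F, sigma)) / (len(F) - 1)
print(chi2_red)
```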
Rene: my position here is that I think the Vinland Map dating is not a sampling problem per se. The samples are all dependent (it was a single sample taken from one place, and then subdivided into individual pieces for processing), and though the four processes applied to the fragments of the same sample were not identical, they were all devised to try to remove the (presumed 20th century, but otherwise unspecified) contamination. The actual issue is that the U of A seems to have staked its claim to reliability in this field on the back of this test: even though I really don’t share Edith’s statistical pessimism, I don’t quite share the U of A’s statistical optimism either.
As to the source of the heavy “contamination”: yes, speculation is fun – but whatever it was, it ~seems~ as though the processes were indeed largely able to remove it from the surface to a large enough degree to allow radiocarbon dating of the support material to produce sensible results. Perhaps someone will now think to take a tiny piece, divide it in two, strip away the contamination from one and compare spectroscopic analyses of the two to work out what was removed. Perhaps the U of A still has a tiny fragment large enough to do such a small-scale test?
I would like to point out several additional things related to the U of Arizona’s analysis of the Vinland Map:
1. I think they did an excellent job on the analytical data, particularly considering the various runs were performed over a period of 6 years. Their overall quality control must be excellent. My only problem is with their interpretation of the data.
2. I am not sure whether it is appreciated that parchment consists of a protein, collagen, which contains nitrogen in addition to carbon, hydrogen and oxygen; this must complicate the 14C analysis. When collagen is treated with boiling water or acid it is converted to gelatin, which may cause additional problems.
3. I do not understand how the F values are modified using the delta 13C per mil measurements. The U of A state: “All values of F have been normalized to delta 13C = –25.0 per mil.” Corrections have been made from values like –21.9. Can anyone enlighten me?
4. When the U of A convert the Radiocarbon age = 467 ± 27 BP to calendar-age ranges, they report a 1 and 2 SD range, presumably because the 1998 atmospheric decadal tree-ring data set is non-linear. They first calculate the radiocarbon age ranges for 1 and 2 SD, and from those two sets of points read off the following from the 1998 atmospheric decadal tree-ring curve:
- One sigma: cal AD 1423–1445 (mean 1434, 1SD = 11)
- Two sigma: cal AD 1411–1468 (mean 1440, 2SD = 28, i.e. 1SD = 14)
Table 2 of the Vinland Map analysis shows that the non-linearity of the 1998 atmospheric decadal tree-ring curve can result in some odd looking 1SD and 2SD ranges.
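To illustrate the effect, here is a toy Python sketch. The calibration-curve segment below is hypothetical, hand-made data (not the real 1998 atmospheric decadal tree-ring set); the calculation simply keeps every calendar year whose curve value falls inside the radiocarbon-age band:

```python
# Hypothetical curve segment: calendar year AD -> radiocarbon age (BP).
# The flattening around AD 1440-1460 is invented purely to show the effect.
curve = {1400: 555, 1410: 535, 1420: 505, 1430: 480, 1440: 465,
         1450: 470, 1460: 455, 1470: 430, 1480: 405, 1490: 380}

def cal_range(c14_age, s, k=1):
    """Calendar years whose curve value lies within c14_age +/- k*s."""
    lo, hi = c14_age - k * s, c14_age + k * s
    years = [y for y, bp in curve.items() if lo <= bp <= hi]
    return (min(years), max(years)) if years else None

print(cal_range(467, 27, k=1))   # 1-sigma calendar range on this toy curve
print(cal_range(467, 27, k=2))   # 2-sigma range: wider, but not simply twice as wide
```

On a wiggly or flattened curve, the resulting calendar ranges need not be symmetric about the mean, and the 2-sigma range need not be simply twice the width of the 1-sigma range.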
Enjoy your blogging; unless someone gives me something concrete to think about, I am finished. Sorry about the typo in the previous post. Please change the SD for sample 4 from 0.006 to 0.003.
Hi Edith,
The point about the last 600 years is that “odd looking 1SD and 2SD ranges” make up the majority of the radiocarbon dating curve there, which means that if the labs aren’t able to elicit the underlying numbers going into the curve without large error bounds, the whole technique is practically useless (i.e. valueless) for things from 1400 onwards. Hence it is easy to suspect that their desire to make sure it is a useful technique might lead them to optimistically narrow the range of measurement uncertainty.
Rhetorically, this is the point where I’m supposed to defend the U of A but… having looked over the evidence and arguments in the Vinland Map paper very carefully, I don’t think they’ve yet proved their case. Though I’m not as pessimistic about the raw technique as you appear to be, I certainly don’t share their optimism. Perhaps we will see a whole new level of care and argumentative proof in the paper they are doubtless finalizing for their Voynich manuscript experiment… but we shall see! Interesting times…
Cheers, ….Nick Pelling….
PS: my preferred organic chemical experiment is mixing gelatin with sugar, flavourings, colourings, and a dusting of starch to make jelly babies: but they never quite seem to last long enough to get a radiocarbon date. 🙂
Dear Edith,
there are lots of explanations related to the C-13 treatment in the Vinland map paper, but as a non-expert, I also cannot fully understand the impact of that.
Of course they are aware of the nitrogen problem, and it was explained to me how this is taken care of, but I have to admit that I don’t remember 🙁
What is unfortunate is that they speak about 1 SD and 2 SD values for the calibrated results, because it would be more correct to call them 68% and 95% ranges. This is the terminology used for the Voynich MS results. As the results you describe in point 4 show, the curve is not symmetric here, and clearly not Gaussian. The 95% range always includes the 68% range (it would be a real problem if not 😉 ), but the resulting ‘mean’ is usually not the same.
Table 2 is very interesting in itself, as it compares C-14 dates of a number of MSs with their known creation dates. It is not clear when these tests were done, or how many samples each was based on, but they clearly show the problems with dating items of 1450 or later, and for early 15th C items, where the curve also overlaps with most of the 14th C. A close look at the Voynich MS documentary also shows this to be the case for one of the four samples.
Nick, there is only so much one can gain by ‘improving’ the measurement accuracy. The calibration curve has its own error distribution, and even a perfectly measured sample leads to a calibrated date with an error distribution which could be quite complex in shape.
What we’re left with is our uncertainty about how the combined SD of 0.0033 was computed for the Vinland map. To me it also looks optimistic, but without further details I would hesitate to make any more specific statements about it.
Fortunately, this problem does not exist with the Voynich MS analysis.
Rene: it doesn’t matter how much can be gained by ‘improving’ the measurement accuracy if the person taking the measurements thinks that the existing measurements are already good enough. 🙂
I now understand the reason for the delta 13C (per mil) correction applied to 14C data. It is as follows:
1. The carbon in atmospheric carbon dioxide is 98.9% 12C and 1.1% 13C. Both isotopes are stable. Only 14C is unstable, and it is present in minute amounts in atmospheric CO2.
2. Carbon-12, carbon-13 and carbon-14 have atomic weights of 12, 13 and 14 respectively.
3. Plants have a slight preference for the lightest carbon, 12C. As a result the measured concentration of 14C in a plant is slightly less than the concentration in the air. The preference plants and animals have for 12C over 14C is twice as big as the preference for 12C over 13C. This is known as isotopic fractionation.
4. Plants have different amounts of the three carbon isotopes, depending on species and climatic conditions. Animals likewise, depending on their diet and metabolism, will have different amounts of these isotopes in their bones and tissue.
5. If the sample is 1% depleted in 13C relative to 13C in carbon dioxide in the air, it will be 2% depleted in 14C.
6. If only a 14C measurement is made, the sample will appear to be older than it actually is. For example, typical herbivore bone gives dates about 80 ± 35 years too old unless a correction is made for isotopic fractionation. The delta 13C measurement is used to make this correction. http://bruceowen.com/introarch/32402f05.htm
7. The 14C error is corrected by measuring the amount of 13C/12C in the sample and comparing it to the amount of 13C/12C in a standard (equivalent to atmospheric carbon dioxide). The result is adjusted to correspond to average wood, on which all 14C dating is based. The correction is independent of the age or origin of the sample.
8. A delta 13C value of –20 per mil means that the sample contains 980 parts of 13C for every 1000 parts in the standard, i.e. the sample is 20 parts per thousand low in 13C.
An excellent discussion related to the above and also to AMS 14C measurements is available on the following web site: http://bruceowen.com/introarch/32402f05.htm.
Sorry Nick, I can see readers’ eyes glazing over. In order to understand the errors involved in 14C dating, both systematic and random, it is necessary to understand the analytical process.
The standard deviation estimate available from the 5 reported measurements of the F values for the Vinland Map is ± 0.011. This result is determined using standard statistics, not pessimism, and is supported by both Rene and my husband, a well-known geophysicist and mathematician. Additionally, the five measurements were made over a period of 6 years, with different methods being used to clean the samples each time, so it is even more unclear how this would affect the statistics.
To sum it up, I will end with the observation made at the end of Bruce Owen’s article:
‘Even with the best of dates, drawing the right conclusions depends on … understanding the uncertainty and statistical issues involved.’
Hi Edith,
Thank you very much for that – though I’ve seen all the same issues discussed at a fairly high level elsewhere, the Bruce Owen page manages to cover all of them in a usable way. The short version of it (for Voynich researchers) is that for any given sample, we would need to know the U of A’s values for 12C and 13C as well as for 14C in order that we can see how the subtle isotopic fractionation correction to 14C is being applied.
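For what it’s worth, here is how I understand the usual normalisation convention (it follows the logic of Edith’s points 3–7 above). This is a sketch of my reading of that convention, not the U of A’s actual data reduction, and the pairing of an F of 0.9507 with a delta 13C of –21.9 per mil below is purely illustrative:

```python
def normalise_F(F_measured, delta13C):
    """Normalise a fraction-of-modern value to delta 13C = -25 per mil
    ("average wood"), taking the 14C fractionation to be roughly twice
    the 13C fractionation (points 3-7 above)."""
    # Exact form of the convention as I understand it; the linear
    # approximation is F_measured * (1 - 2 * (delta13C + 25) / 1000).
    return F_measured * ((1 - 25 / 1000.0) / (1 + delta13C / 1000.0)) ** 2

# Illustrative pairing only: a sample at -21.9 per mil is scaled down by ~0.6%,
# i.e. after normalisation it looks slightly older than the raw measurement.
print(normalise_F(0.9507, -21.9))   # ~0.9447
```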
However, I still stand by my neither-pessimistic-nor-optimistic position vis-a-vis merging the VM statistics. If the five Vinland Map processes were truly independent, then I would of course agree that it would be possible to merge them into a single more confident statistic. But I don’t think we have even close to enough information to say whether (for example) two or three of the five processes just happened to have exactly the same net chemical effect. This is a matter less of statistics than of confirming the principle of independence upon which statistical sampling relies. If the authors had identified (say) 30 different processes and picked five at random, that would probably be stronger – but that didn’t happen.
My old statistics lecturer used to say that “statistics begins at 30” (i.e. 30 independent samples): and I agree, as long as you respect both the number and the independence criteria.
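As a tiny numerical illustration of why independence matters (toy numbers of my own, nothing more): for n measurements that share a common correlation rho, the standard deviation of their mean only shrinks by the full 1/sqrt(n) factor when rho is zero.

```python
import math

def sd_of_mean(s, n, rho):
    """SD of the mean of n equally correlated measurements, each with SD s."""
    return s * math.sqrt((1 + (n - 1) * rho) / n)

s, n = 0.011, 5
for rho in (0.0, 0.5, 0.9):
    print(rho, sd_of_mean(s, n, rho))
# rho = 0.0 -> ~0.0049 (the full 1/sqrt(5) improvement)
# rho = 0.9 -> ~0.0106 (hardly any improvement at all)
```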
Cheers, ….Nick Pelling….
Dear Nick,
Your comment about increasing the number of independent samples set my husband and myself thinking about how best to evaluate the U. of Arizona’s analysis of the VM. We have produced a technical paper which my daughter has posted up on my web site: http://www.edithsherwood.com/radiocarbon_dating_statistics/index.php
Your site is not suitable for tables, otherwise I would have used it. I will be interested in your and Rene’s comments. I feel that John and I have performed a careful and accurate investigation of the errors involved in carbon-14 dating.
The U. of Arizona should be encouraged to supply:
1. The uncalibrated results.
2. The procedure used to clean the parchment.
3. Any and all corrections made to the results.
4. The overall error of their measurements.
5. The statistics used to calculate their errors.
6. The calibration curve used to determine the age of the VM.
7. Their offset (systematic error) with respect to this calibration curve.
Best wishes,
Edith Sherwood
Edith: thanks for passing that on, it’s much appreciated. All very sensible stuff, but in the absence of the VMs’ raw radiocarbon data (and the methodology/-ies used to get them), it reminds me somewhat of building a submarine in the middle of the desert – so let’s all hope there’s a flood of data soon!
For me, one significant open question is whether the U of A did or did not really strip the samples back before testing them. What strikes me is that the VMs’ modern history is somewhat unusual insofar as it seems to have spent most of the 1940s, 50s and 60s deep inside a New York bank’s safe, which may have left it less exposed to atmospheric contamination than other comparable artefacts, altering the basic stats. Just a thought… so many things to take into consideration when assessing the reliability of radiocarbon, eh? 😮
I have added another article on radiocarbon dating to my web site.
Edith: thanks, it’s another good contribution – we now have so much critical machinery to hand, but the paper detailing the Voynich tests remains unpublished (or perhaps even unwritten). How long can they string this out?
What I find most curious is the startling coincidence that both the Vinland Map and the Voynich date to mostly the same time… in fact they overlap by 15 years. So I did a search to see if anyone suspected a further connection… and I found that in fact, yes: in 2005 it was mentioned (on the BBC special, Nick?) that Wilfrid was a possible forger of the Vinland Map. I would love to know more… on what basis was this claimed/rumored? I mean, this was four years before the Voynich date was known. I wrote a post on it, musing on some fanciful implications… but still, I wondered what your thoughts were on this dating overlap. Strictly coincidence? Rich.
http://proto57.wordpress.com/2011/02/26/something-sheepy-in-the-state-of-denmar/
Rich: there are strong, errrm, processual similarities between them, by which I mean that historical investigation into both seems to be following very similar paths. In time, I hope we’ll find out how similar they actually are…
Another similarity, if I’m not mistaken, is that both were found to have some level of zinc as a contaminant (probably original) in their ink.