Raising the bar on statistical significance

I was searching the early edition of PNAS for the abstract of yet another sloppy “science by press release” that didn’t bother to give the title of the paper or the DOI, and came across this paper, so it wasn’t a wasted effort.

Steve McIntyre recently mentioned:

Mann rose to prominence by supposedly being able to detect “faint” signals using “advanced” statistical methods. Lewandowsky has taken this to a new level: using lew-statistics, lew-scientists can deduce properties of population with no members.

Josh (N=0) humor aside, this new paper makes me wonder how many climate science findings would fail evidence thresholds under this new proposed standard?

Revised standards for statistical evidence

Valen E. Johnson

Significance

The lack of reproducibility of scientific research undermines public confidence in science and leads to the misuse of resources when researchers attempt to replicate and extend fallacious research findings. Using recent developments in Bayesian hypothesis testing, a root cause of nonreproducibility is traced to the conduct of significance tests at inappropriately high levels of significance. Modifications of common standards of evidence are proposed to reduce the rate of nonreproducibility of scientific research by a factor of 5 or greater.

Abstract

Recent advances in Bayesian hypothesis testing have led to the development of uniformly most powerful Bayesian tests, which represent an objective, default class of Bayesian hypothesis tests that have the same rejection regions as classical significance tests. Based on the correspondence between these two classes of tests, it is possible to equate the size of classical hypothesis tests with evidence thresholds in Bayesian tests, and to equate P values with Bayes factors. An examination of these connections suggest that recent concerns over the lack of reproducibility of scientific studies can be attributed largely to the conduct of significance tests at unjustifiably high levels of significance. To correct this problem, evidence thresholds required for the declaration of a significant finding should be increased to 25–50:1, and to 100–200:1 for the declaration of a highly significant finding. In terms of classical hypothesis tests, these evidence standards mandate the conduct of tests at the 0.005 or 0.001 level of significance.
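As a rough illustration of how the abstract's two scales line up, the sketch below converts a significance level into an evidence threshold for a one-sided z-test. It assumes the relationship γ = exp(z_α²/2) between the test size α and the Bayes-factor threshold γ; that formula is not stated in the excerpt above and is used here as an assumption, and the function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def evidence_threshold(alpha: float) -> float:
    """Evidence threshold gamma matched to a one-sided z-test of size alpha.

    Assumption (not taken from the excerpt): gamma = exp(z_alpha**2 / 2).
    """
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return exp(z_alpha * z_alpha / 2.0)


def size_for_threshold(gamma: float) -> float:
    """Inverse mapping: the test size alpha matched to a threshold gamma."""
    return 1.0 - STD_NORMAL.cdf(sqrt(2.0 * log(gamma)))


if __name__ == "__main__":
    for alpha in (0.05, 0.005, 0.001):
        print(f"alpha = {alpha}: threshold about {evidence_threshold(alpha):.1f}:1")
```

Under that assumption, the 0.005 level lands inside the 25–50:1 band and the 0.001 level inside the 100–200:1 band, while the conventional 0.05 level corresponds to odds of only about 4:1.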

From the discussion:

The correspondence between P values and Bayes factors based on UMPBTs suggest that commonly used thresholds for statistical significance represent only moderate evidence against null hypotheses. Although it is difficult to assess the proportion of all tested null hypotheses that are actually true, if one assumes that this proportion is approximately one-half, then these results suggest that between 17% and 25% of marginally significant scientific findings are false. This range of false positives is consistent with nonreproducibility rates reported by others (e.g., ref.5). If the proportion of true null hypotheses is greater than one-half, then the proportion of false positives reported in the scientific literature, and thus the proportion of scientific studies that would fail to replicate, is even higher.

In addition, this estimate of the nonreproducibility rate of scientific findings is based on the use of UMPBTs to establish the rejection regions of Bayesian tests. In general, the use of other default Bayesian methods to model effect sizes results in even higher assignments of posterior probability to rejected null hypotheses, and thus to even higher estimates of false-positive rates.

This phenomenon is discussed further in SI Text, where Bayes factors obtained using several other default Bayesian procedures are compared with UMPBTs (see Fig. S1). These analyses suggest that the range 17–25% underestimates the actual proportion of marginally significant scientific findings that are false.

Finally, it is important to note that this high rate of nonreproducibility is not the result of scientific misconduct, publication bias, file drawer biases, or flawed statistical designs; it is simply the consequence of using evidence thresholds that do not represent sufficiently strong evidence in favor of hypothesized effects.
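The arithmetic behind a figure like "17% to 25%" can be sketched with Bayes' rule. The example below is an illustration, not the paper's own calculation: it assumes that a marginally significant result carries a Bayes factor of roughly 3 to 5 in favor of the alternative (a range chosen here because it reproduces the quoted percentages; the excerpt does not state it), and the function name is hypothetical.

```python
def false_positive_share(bayes_factor: float, prior_null: float = 0.5) -> float:
    """Posterior probability that the null is true after a 'significant' result.

    bayes_factor: evidence for the alternative over the null (assumed input).
    prior_null:   assumed share of tested null hypotheses that are actually true.
    """
    prior_odds_alt = (1.0 - prior_null) / prior_null
    posterior_odds_alt = bayes_factor * prior_odds_alt
    return 1.0 / (1.0 + posterior_odds_alt)


if __name__ == "__main__":
    for bf in (3.0, 5.0):
        print(f"Bayes factor {bf}: null still true with probability "
              f"{false_positive_share(bf):.3f}")
    # A larger share of true nulls pushes the false-positive share higher.
    print(f"80% true nulls, Bayes factor 3: {false_positive_share(3.0, 0.8):.3f}")
```

With even prior odds, a Bayes factor of 3 leaves a 25% chance that the null is true and a factor of 5 leaves about 17%. Raising the assumed share of true nulls above one-half raises the false-positive share, which is the point of the last sentence of the first quoted paragraph.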

=================================================================

The full paper is here: http://www.pnas.org/content/early/2013/10/28/1313476110.full.pdf

The SI is here: Download Supporting Information (PDF)

For our layman readers who might be a bit behind on statistics, here is a primer on statistical significance and P-values as it relates to weight loss/nutrition, which is something that you can easily get your mind around.

Gross failure of scientific nutritional studies is another topic McIntyre recently discussed: A Scathing Indictment of Federally-Funded Nutrition Research

So, while some dicey science findings might simply be low threshold problems, there are real human conduct problems in science too.

219 Comments
milodonharlani
November 12, 2013 1:32 pm

TheLastDemocrat says:
November 12, 2013 at 11:23 am
In the long run, science depends upon making an a priori prediction, specifying a disprovable test of that prediction, then gathering that actual evidence to observe whether the prediction is accurate or not. We do not know what will happen in the future. A prediction can be based on some good science, but a prediction of what the future will be like can never be observable – until it happens, at which time it is no longer the future. So, patently, a prediction cannot quite ever be “confirmed,” or “scientific” in the way other things can be scientific facts.
Beyond micro-evolution, evolutionary theory suffers the same weakness. We will never see the cow-like animal again adapt itself back to underwater life as the whale-like animal. Evolution of the various species makes sense, and has a lot of evidence to support it, but it is not observed. If our species pays attention and keeps record long enough, say, a million years, sure, we might observe a new species emerge from recognized species. But we have not yet.
Jquip says:
November 12, 2013 at 12:51 pm
M Simon: ” Bacteria come to mind. From this the rest is inferred. Some do not care for the inference.”
You missed the point in that we haven’t observed bacteria turn into bivalves. And we will not reunobserve bivalves going backwards through time into bacteria. As we go through time in the other direction. It is not testable, and in a condition of passive observation that requires time machines, impossible. eg. When you can’t slap it up on a lab table, on demand, the best you get is passive observation. And in this, unless you have time machines, time goes one way and its pace; not ours.
People get a rather religious burr about it when you mention the ‘E’ word from the great Chuck D. But it remains that anything that is inherently stateful and chaotic, and that cannot be built on demand in a lab, has no valid inferences but an absolute crapload of observations. Simply look at the condition of astronomy and the ages of time it took to get from rather impressive Neolithic devices for measuring the heavens on to about Galileo. And that’s for something as simple as a first-order approximation of an ellipse. This remains true in every similar case, even for treemometers and IPCC models.
————————–
I don’t know how the myth persists that speciation has not been observed. It has repeatedly, both in the lab & in the field, as well of course as inferred from the fossil & genomic record, plus many other independent lines of evidence. The macro-evolution of new species, however defined, from other species is an observation, not just an inference, ie a scientific fact.
The instances are too many to recount all here, or the correct predictions made on the basis of evolutionary theory, which is of course itself still developing, just as is the theory of gravitation, for instance.
In the case of microbes, simple mutations can produce a new species. Good examples are the two not at all closely related bacteria which have acquired the ability to metabolize nylon by-products. It’s harder to say what constitutes a species with bacteria than with eukaryotes, but a switch from getting energy from sugar or other natural substances to nylon byproducts in my book counts as speciation, just as insects evolving from eating sugar to blood does.
New plant species arise essentially overnight due to hybridization & polyploidy, which is less common in animals, but has also been observed in the wild & recreated in the lab. “Darwinian” evolution, ie by natural selection, has also been observed in animals, & while gradual, can still be quite rapid (eg two decades), even in species with fairly long generation times. As luck would have it, Darwin’s Galapagos finches have recently been found to provide examples:
http://en.wikipedia.org/wiki/Peter_and_Rosemary_Grant
Gradual speciation taking place over longer periods can still be directly observed, as with the ongoing separation of polar bears from their brown bear kin.
As for whales evolving from terrestrial artiodactyls (much smaller than cows), the fossil, anatomical & genomic evidence is overwhelming. While an excellent & improving fossil record exists, some inference is still called for, but such was also the case for the earth orbiting the sun until direct observation was possible. The same applies to other major transitions, not just speciation but changes from one higher Linnaean or common classification to another, eg from “fish” to “amphibian” to “reptile” to “bird”.
Evolution is both an observed, scientific fact & a body of theory seeking to explain that fact.

November 12, 2013 1:38 pm

is this method similar to what the epa did with the meta study on second hand smoke? the studies found NO significant stat correlation so they altered what is significant to claim there was??

Steve D
November 12, 2013 1:41 pm

I disagree with the blanket conclusion in the article because:
1) The researcher sets the p-value. It determines the level of false positive error he is willing to allow into the study.
2) Selecting an appropriate p-value for hypothesis testing depends on several factors including the objective and conditions of the study. For example, we can and should run screens at much lower stringency since you need to eliminate false negatives and are willing to accept more false-positives.
3) A p-value of 0.0001 or 0.2 can be quite appropriate depending upon the objective of the study. In either case, it should be possible to reproduce the study under the right set of conditions.
4) The power and the level of false negative error rate is the critical factor once we have selected an appropriate p-value.
5) Understanding critical sources of error in the study, taking steps to control them, including appropriate sample size (reps etc), are as critical for reproducing results as the p value.
Statistics is a tool. It can be a very good one if we use it appropriately and a very bad one if we do not. Kind of like a wrench or backhoe or just about any other tool.
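Points 1 and 4 above describe a trade-off that can be put in numbers. The sketch below is a minimal illustration, assuming a one-sided z-test with known variance and a standardized effect size of 0.5 (both are assumptions made for the example, not figures from the comment). It treats the "p-value" the researcher sets as the significance level alpha, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_one_sided_z(effect_size: float, n: int, alpha: float) -> float:
    """Probability of rejecting the null when the true standardized effect is effect_size."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_crit - effect_size * sqrt(n))


def sample_size_for_power(effect_size: float, alpha: float, power: float = 0.8) -> int:
    """Smallest n that reaches the requested power at significance level alpha."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(((z_crit + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for alpha in (0.05, 0.005):
        print(f"alpha = {alpha}: power with n = 25 is "
              f"{power_one_sided_z(0.5, 25, alpha):.2f}, "
              f"n needed for 80% power is {sample_size_for_power(0.5, alpha)}")
```

In this setup, tightening alpha from 0.05 to 0.005 at a fixed sample size cuts the power from about 80% to under 50%, so the false-negative rate rises unless the sample is made roughly twice as large.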

Steve D
November 12, 2013 1:51 pm

‘Some do not care for the inference. They may (or may not) have a point.’
It is not an inference; it is just that in cases of slowly reproducing species we can observe only smaller changes over the same amount of time. The extent of evolution directly observable is less.
The question is not whether species evolve. By their very characteristics as living organisms, they must and they do. It is a mathematical certainty. The question is whether evolution explains all of the biological variation we see around us.

Gail Combs
November 12, 2013 1:51 pm

tumetuestumefaisdubien1 says:
November 12, 2013 at 11:40 am

Gail Combs says:
November 12, 2013 at 9:53 am
“Taking ferd berple’s example of drug testing.”

Drug testing of employees is not science and anyway a drug test which has 95% confidence is a screening cheap crap which anyway needs retest in case of positive in any case.
[Agreed, I was just using it as an example.]

“In science it is independent replication usually by at least two independent labs. If it can not be replicated it gets tossed into the dustbin of history.
I see no reason to change this but the confirmation testing by an independent lab or two or three is an absolute must and this is what is missing in Climate Science.”

Confirmation is matter of church, not science.
>>>>>>>>>>>>>>>>
I disagree, but then I worked in industrial chemistry. If a new reaction or method can not be duplicated by others it gets tossed. Think Fleischmann, Pons and Cold Fusion.
I should not however have used ‘confirmation’ when I meant verification and validation.

Jimbo
November 12, 2013 1:55 pm

Here is a fascinating article about genuine scientists producing results which are initially reproducible, then later are not!

The New Yorker – December 13, 2010
The Truth Wears Off
Is there something wrong with the scientific method?
by Jonah Lehrer
“Unfortunately, I couldn’t find the effect,” he said. “But the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” For Simmons, the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory….”
[Page 3]
“…The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong. “It feels good to validate a hypothesis,” Ioannidis said. “It feels even better when you’ve got a financial interest in the idea or your career depends upon it….”
[Page 4]
“…Even the law of gravity hasn’t always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-per-cent discrepancy between the theoretical predictions and the actual data.)….Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true.”
[Page 5]

Gail Combs
November 12, 2013 1:58 pm

tumetuestumefaisdubien1 says: November 12, 2013 at 11:40 am
…The point of scientific method is falsifiability (a hypothesis which is not falsifiable, is not scientific hypothesis by definition). If a claim, for example “man is responsible for global warming, because we made model XYZ123, tuned it and it shows he is and we predict temperature rise with it this and this” is a dangerous claim in science – because immediately the reality shows that the temperature is significantly not following the model, the model is falsified and belongs to dustbin of history. You actually don’t need anything like “confirmation” for it.
>>>>>>>>>>>>>>>>
Actually after the examples we have seen in climate science I think you do need independent checking . Actually after what I have seen go on in industry I KNOW you do.
However you are correct that falsification is the main point. I was just saying it badly.

Jquip
November 12, 2013 2:00 pm

milodon: “I don’t know how the myth persists that speciation has not been observed.”
Yeh, you know. I didn’t say ‘speciation’ and detailing all the sophist redefinition and name-changing about ‘evolution’ and ‘species’ is about as thoroughly voluminous as the same treatments for Anthropogenic Global Warming, I mean Climate Change, sorry, I mean Climate that isn’t changing. And is not changing due to Natural Causes er… Water vapor feedbacks, nuts. I mean it’s been proven that it’s humanity. So fast… I mean it’s obviously the Koch brothers. And not only is it as stupid and voluminous; it’s almost guaranteed to be religious (like Climastrology) but off topic. But hey, if you can’t make your error bars big enough, undefine your words.
Point is, inferences that go backwards in time are in every case invalid. It’s pure pro causa non causa. And the only validity to an inference that goes forward in time is that we have validated it enough — observationally — that we can get it some inductive assurety in future uses.

Duster
November 12, 2013 2:06 pm

polistra says:
November 12, 2013 at 4:28 am
Stats can’t be saved by tweaking.
The only proper standard for science is NO STATS. If a result has to be reached by statistics, it’s not a scientific result. Only well-calibrated measurements plus a well-balanced experiment can give scientific results.

SO, was the Higgs Boson identified or not?
There are no – as in “none” – scientific disciplines whose study is not entwined with statistics in some fashion. ANY time the word “quantifiable” emerges, statistics is lurking. Ideally, the issue is merely accuracy of multiple measurements, as in calculating the “real” value of G – the gravitational constant – an empirical constant, that is dependent upon repeated measures for an estimated value. Since the use of G is essential in critically important modern topics like orbital mechanics, statistics measuring accuracy or similarity of repeated measures is critical. In fact, “well calibrated” measurements are only determined to be “well calibrated” statistically. If a standard is used in the calibration, that standard was supplied with an “error.” Read up on the Michelson-Morley measurement of the speed of light. They used repeated measures, and then estimated the speed of light statistically, with an estimated error based upon the dispersion of measurements from multiple experiments. As described in Wikipedia, for the current value in use:
By combining many such measurements, a best fit value for the light time per unit distance is obtained. As of 2009, the best estimate, as approved by the International Astronomical Union (IAU), is:[88][89]…
light time for unit distance: 499.004783836(10) s
c = 0.00200398880410(4) AU/s = 173.144632674(3) AU/day.
The relative uncertainty in these measurements is 0.02 parts per billion (2×10^−11),
Note the use of “best estimate,” “best fit,” and “relative uncertainty” – which is quantified. Statistics again. That is purtedly “hard science.” In the real world, compared to the complexities of any field science including meteorology, measuring the speed of light is a comparatively simple effort.
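The quoted "0.02 parts per billion" can be checked directly from the figures in the quotation. The sketch below assumes the usual concise notation, in which the digits in parentheses give the standard uncertainty in the last quoted digits of the value; the function name is hypothetical.

```python
from decimal import Decimal


def relative_uncertainty(value: str, last_digits: int) -> float:
    """Relative uncertainty of a value written as 'value(last_digits)'.

    Assumes the parenthesized digits are the standard uncertainty
    expressed in units of the last quoted decimal place.
    """
    v = Decimal(value)
    exponent = v.as_tuple().exponent            # e.g. -9 for nine decimal places
    sigma = Decimal(last_digits).scaleb(exponent)
    return float(sigma / v)


if __name__ == "__main__":
    quoted = {
        "light time for unit distance (s)": ("499.004783836", 10),
        "c (AU/s)": ("0.00200398880410", 4),
        "c (AU/day)": ("173.144632674", 3),
    }
    for label, (value, digits) in quoted.items():
        ppb = relative_uncertainty(value, digits) * 1e9
        print(f"{label}: {ppb:.3f} parts per billion")
```

All three quoted values come out at about 0.02 parts per billion, in line with the 2×10^−11 stated in the quotation.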

Duster
November 12, 2013 2:09 pm

Second to last sentence should begin, “That is purportedly …”

DirkH
November 12, 2013 2:56 pm

M Simon says:
November 12, 2013 at 11:21 am
“And what are the odds that the body is “trained” by early consumption? How big is that effect? Is it an effect?
Are cyclists representative of the general population? ”
I don’t think there’s a training effect by early consumption. Maybe some epigenetic inheritance, gene methylation patterns determined by the mother during pregnancy; I guess this also influences body size, just a guess.
In Germany and the Netherlands cyclists are representative of the general population, even though the Dutch won’t believe me; they also believe that it is mandatory to wear a bicycle helmet in Germany, but it ain’t.
Well, I thought you might have THE conclusive study that shows the link between fat and heart attacks but I guess I’ll have to wait then…
What I recently found is this:
David Diamond, Ph.D., of the University of South Florida College of Arts and Sciences shares his personal story about his battle with obesity. Diamond shows how he lost weight and reduced his triglycerides by eating red meat, eggs and butter.

Zeke
November 12, 2013 2:57 pm

“Using recent developments in Bayesian hypothesis testing, a root cause of nonreproducibility is traced to the conduct of significance tests at inappropriately high levels of significance.”
Oopsie daisy!

milodonharlani
November 12, 2013 3:32 pm

Jquip says:
November 12, 2013 at 2:00 pm
What redefinition? Speciation means the origin of species, same as in the title of Darwin’s book, which dealt with the evolutionary process of natural selection, one means of speciation. Call it macro-evolution if you want. It’s not only based upon inferences from observations of the past, but, as I showed, upon direct observation in the present & in the lab. I’ve made new species myself. It’s fun & easy.
Both the fact & theory of evolution make testable hypotheses, which have not been falsified. That new species, genera, families, orders, classes & phyla arise from existing taxa is an observable fact as well as an inference that explains observations best. Comparing evolution with CACA is what Warmunistas do. Descent with modification, ie evolution is valid science, describing objective reality. CACA is anti-scientific.
I’d be interested in your better explanation for such observed phenomena as shared, derived pseudogenes in, let’s say, primate lineages such as humans, other great apes, lesser apes, Old World & New World monkeys & tarsiers. Or how about the fact that human chromosome 2 results from the fusion of two smaller chromosomes conserved in chimps & bonobos?

November 12, 2013 3:59 pm

JEM on November 12, 2013 at 11:25 am
Whitman on November 12, 2013 at 9:15 am

– – – – – – –
JEM,
Thanks for your comment.
I think ‘helicoptered in’ has some plausibility given that the nature of peer review does not prevent it per se. If peer review process documentation were publicly available after the peer review process completes, then science could correct itself more efficiently than it currently does.
John

jcarlton
November 12, 2013 4:15 pm
rgbatduke
November 12, 2013 4:19 pm

10 thousand Olympic athletes are tested for drug use with a test that is 99.99% accurate. what are the odds that an athlete identified as a drug user is a drug user?
50-50, even odds. One (very likely) true positive, one false positive. One reason that physicians should not give everybody that walks through the door a full spectrum of tests, especially for low prevalence conditions. If prevalence is one in 10,000, and you test everybody with a test that has a 1% false positive/negative rate, (99% accurate, as it were) you’ll get 1% of 10,000 or 100 false positives and one true positive 99% of the time (you’ll actually MISS that true positive 1% of the time, sigh).
Nearly all of the 100 people identified will be false positives, and you’ll have to do infinitely more tests, spending infinitely more money, to weed out the false positives and find the one lonely true positive. Not enough insurance money in the world… not to mention spending weeks thinking you might have brain cancer when you don’t. That’s why physicians have to have OTHER reasons to suspect a low-prevalence condition before they give you a test for it. The other reasons “promote” you from a Bayesian prior of randomly selected from a low prevalence base population to a much more refined population of people that actually exhibit symptoms for the disease AND have the disease, which ordinarily will have a much higher prevalence. Maybe enough to get to even odds.
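The two cases above can be reproduced with Bayes' rule. This is a minimal sketch, assuming a prevalence of one in 10,000 in both cases and reading "accuracy" as both the sensitivity and the specificity of the test; the function name is hypothetical.

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              specificity: float) -> float:
    """Probability that someone who tests positive actually has the condition."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    prevalence = 1.0 / 10_000  # assumed: one real case per 10,000 people tested
    for accuracy in (0.9999, 0.99):
        ppv = positive_predictive_value(prevalence, accuracy, accuracy)
        print(f"test accuracy {accuracy}: a positive result is correct "
              f"with probability {ppv:.4f}")
```

With a 99.99% accurate test the positive result is right half the time, the even odds described above. With a 99% accurate test it is right only about 1% of the time, because roughly 100 false positives surround the single true one.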
And good examples, BTW, especially the data dredging example. I didn’t do this (prevalence) particular example and probably should have. I also like the “Let’s make a deal” example to illustrate why additional information changes the prior odds.
On the Let’s Make a Deal show, one would often select (say) door number three in hopes of winning the Caribbean Vacation package. Before showing you what was behind door number three, Monty opens one of the remaining two doors (say, door number one) to show you that it contains the barbecue set worth a whopping $150. He then asks if you want to change doors to number two before he opens them.
If Monty either always does this, or randomly decides to do this without being influenced by whether or not door three contains the big prize, the answer is “yes”. You improve your chances of winning if you switch to door number two, because you can take advantage of the additional knowledge associated with knowing that door one didn’t contain the prize in the second pick. Proving this is an entertaining exercise. But kind of irrelevant to hypothesis testing…
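One way to do the exercise is to enumerate every case exactly. The sketch below assumes the standard version of the game: three doors, a prize placed uniformly at random, a contestant who picks uniformly at random, and a host who always opens a door that is neither the contestant's pick nor the prize, choosing at random when two such doors exist. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def monty_hall_win_probabilities() -> tuple[Fraction, Fraction]:
    """Exact win probabilities for (staying, switching) under the assumed rules."""
    doors = (1, 2, 3)
    stay = Fraction(0)
    switch = Fraction(0)
    for prize, pick in product(doors, repeat=2):
        weight = Fraction(1, 9)  # prize and pick are independent and uniform
        # Doors the host may open: not the pick and not the prize.
        openable = [d for d in doors if d != pick and d != prize]
        for opened in openable:
            w = weight / len(openable)
            # The single door left to switch to.
            other = next(d for d in doors if d not in (pick, opened))
            if pick == prize:
                stay += w
            if other == prize:
                switch += w
    return stay, switch


if __name__ == "__main__":
    stay, switch = monty_hall_win_probabilities()
    print(f"stay wins with probability {stay}, switch wins with probability {switch}")
```

Under these rules staying wins one time in three and switching wins two times in three, which is why the answer to the host's offer is "yes".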
rgb

Jquip
November 12, 2013 4:33 pm

milodon: “I’d be interested in your better explanation for …”

Stop there, Sparky. I am not required to give you any explanation at all; let alone one that is ‘worse’ or ‘better’ by your purely subjective judgement. Nor was the big E from Chuck D central to the point: It is, purely, a passive observational. And it is quite required to be, since if we monkied about with intelligent purpose that would be… the theory of its detractors.
If you wish to attack the salient points then:
a) Establish how one can make a valid inference about things we never observed and cannot.
b) Establish how one can make a valid inference when nothing is claimed, or no time is claimed for its culmination. (cf: The 17 217 years of Santer. Global Warming entails heating cooling and more less severe weather events.)
c) Establish how we can state what will occur, in a chaotic system with feedbacks. When those feedbacks are not known, not described, or not describable.
And I guarantee you this: If you succeed, in any measure, establishing the validity of these as unimpeachable, then so too is Climate Change. And just about every religion that has ever existed. As well as the totality of psychic and other esp claims.

November 12, 2013 4:37 pm

Gail Combs says:
November 12, 2013 at 1:51 pm
“I disagree, but then I worked in industrial chemistry. If a new reaction or method can not be duplicated by others it gets tossed. Think Fleischmann, Pons and Cold Fusion.
I should not however have used ‘confirmation’ when I meant verification and validation.”
I’m quite not sure Cold fusion has anything with chemistry or rather even whether it has to do with anything. Verification in science purely means to unequivocally set criteria for a hypothesis or theory to be true or false and then test whether the hypothesis or theory passes the criteria or not. Validation means assessment whether a scientific method is suitable and adequate for a defined purpose or not or in the case of mathematical modeling, whether the output of the model fits the observation or not and quantification of difference for deciding whether the model is or isn’t falsified. So I quite still don’t completely understand what you’re talking about.
Independent replication of some scientific results is of course good if somebody does it, but somebody tries something like that only if the original claim is plausible, well supported and the independent researcher has a motivation to look into the issue. Otherwise nobody of course bothers. Definitely not in basic research.
For example: some credible reputable solar scientists published a claim that according to recent data from certain satellite some bands of solar spectrum vary not in phase with solar cycle. And because it looked interesting and plausible when one briefly checked the data and could have interesting implications if true I decided to replicate their results. And after a month or so of work I indeed found that in some important bands indeed the solar activity consistently doesn’t variate in phase with solar cycle, moreover I’ve found additional facts, which have likely broad implications and aren’t in any sense part of the original claim – which paradoxically all relates to the fact, that in other band, where they also claimed the same peculiar variation I’ve found that the data contain clear outliers which spoil the slopes and when taken out the peculiar variation artifact disappears. So partially I replicated some of the original results and partially I also falsified other, but the combination of the successful replication of one claim and failure to replicate the other together imply something yet different which is worth of further examination, because it not only means the replication of one and falsification of other claim, but in complex could mean a potential falsification of whole theory of the solar variation and its relation to earth insolation. Science is often history of accidental finding of something when looking for something else, because simply what we don’t know we can’t anticipate to find. So often also the falsification of some previous claim means not only tossing it in the dustbin of history, but also finding something else, sometimes much more interesting that what was falsified and that’s why even it could look destructive for a layman the principle of falsification in science is the way to the truth not only by elimination of untrue, but also finding the new implications of it not be true.

Brian H
November 12, 2013 4:50 pm

I have been expressing my contempt for years at the use of .90 and .95 “certainties” in CS (or anywhere). The reasons for requiring high-sigma results to reject the Null are many, and demonstrate their relevance with great regularity. Confirmation bias seems to be the currently dominant one.

Jquip
November 12, 2013 4:50 pm

rgbatduke: ” But kind of irrelevant to hypothesis testing….”
Contrary, in fact. As you already know how many doors, how many goats, and how many sports cars there are. But in the false positive condition you’re asking for people to be properly analytical rather than improperly analytical. But this still doesn’t inform you of whether or not you had zero drug users in fact, when you found 2 or 3 by test. After all, if the suspecting people aren’t terribly good at it, then they’ll only test the drug-free in the first place.
Quite strictly, there’s nothing wrong with hypo testing by statistics. But you cannot be demonstrating it on edge cases, outliers, or miracles. Or you’re simply lost in every case.

Max
November 12, 2013 5:07 pm

Statistics is merely a tool that is used when you don’t know. It too often gives the perception that you do know. Most assumptions in statistics are false. They are convenient, but don’t match reality. Most distributions in the world are not Gaussian, heck, most often they are multi-modal. There is no such thing as “standard deviation”. It’s a false construct based on gross assumptions known to be false. Few things in this world are linear, or merely log functions, but statisticians presume this quite often. There are many uncontrolled variables, measurement errors, noise, non-linearity, multivariate effects and interactions, bias, feed-back loops, associated chaos, etc. When someone says something is statistically significant what they are really saying is “I don’t know, but I’ll pretend that I do, and try to convince you I’m right.”

Trey
November 12, 2013 5:42 pm

Deirdre McCloskey has been decrying the Cult of Statistical Significance for decades. Here is her book:
http://www.amazon.com/The-Cult-Statistical-Significance-Economics/dp/0472050079

November 12, 2013 5:49 pm

You missed the point in that we haven’t observed bacteria turn into bivalves.
That wasn’t my point. We have seen bacteria change into a different kind of bacteria with different environmental requirements. From that we infer the rest.

milodonharlani
November 12, 2013 5:49 pm

Jquip says:
November 12, 2013 at 4:33 pm
I didn’t require you to do anything. Nor do I find all your points salient. But I will nevertheless reply to the questions you put to me,
“a) Establish how one can make a valid inference about things we never observed and cannot.”
Evolution makes valid inferences about observations. Paleontologists & anatomists for example observe that Indohyus, an herbivorous, cat-sized, deer-like artiodactyl in the family Raoellidae from the early Eocene of Kashmir, shared derived traits with cetaceans. Among these is a bone growth pattern diagnostically characteristic of cetaceans, not found in any other group. In cladistic phylogeny, that makes the Raoellidae the closest sister group to Cetacea, descending from a common ancestral group. Indohyus also shows signs of aquatic adaptations, including a thick, heavy outer coating & dense limb bones that reduce buoyancy to facilitate staying underwater, adaptions found in the hippopotamus. This suggests a similar survival strategy to modern mammals which dive into water & hide submerged for some minutes when threatened by a bird of prey.
To these observations have been added those of Pakicetus, another latter early Eocene genus from northern Pakistan, found by a river then near the shore of the Tethys Sea, & also fluvial deposits from northwest India. Thus Pakicetids probably lived in an arid environment with ephemeral streams & moderately developed floodplains. Analysis of their stable oxygen isotopes shows they still drank fresh water. Although hoofed, they appear to have been predators, eating land animals which came down to the water to drink, or riverine aquatic organisms.
While certain anatomical features show them to be cetaceans, their elongated cervical vertebrae & four, fused sacral vertebrae are consistent with Artiodactyla, making the Pakicetidae one of the earliest fossils recovered following the Cetacean/Artiodactyla divergence event.
Features of both their skulls & post-cranial anatomy lead Pakicetids to be classified as cetaceans. An important trait is the structure of their auditory bulla, which is formed from the ectotympanic bone only. The shape of the ear region in pakicetids is highly unusual & the skull is cetacean-like, although a blowhole is still absent at this stage. Their dorsal orbits (eye sockets) face up), as in crocodilesm which eye placement helps submerged predators observe potential prey above the water. Pakicetid teeth also resemble those of fossil whales, being less like a dog’s incisors, with a serrated triangular shape, & more similar to a shark’s tooth. However, pakicetids were able to listen underwater, by using enhanced bone conduction, rather than depending on tympanic membrane like general land mammals.
Their bones are unusually thick (osteosclerotic), which is probably an adaptation to make the animal heavier to counteract the buoyancy of the water. Morphological analysis found that pakicetids display no aquatic skeletal adaptation; rather are adapted for running & jumping. Thus they were most likely an aquatic wader.
It is normal in science to make inferences based upon observations, as did Copernicus to develop his heliocentric hypothesis. Paleontologists study the anatomy of extinct creatures, then, based upon their observations, make predictions which can be tested. I wonder why you think that observations of fossils don’t count while observations of light from stars millions of light years away do.
The discoveries of paleontologists are reinforced by those of molecular biologists & geneticists studying the genomes of whales, hippos, artiodactyls (even-toed ungulates) & for comparison, other mammals. Paleontologists, based upon prior observations, predict where & in what rock strata to dig to find more early whale fossils of more recent date, & have been successful doing so.
The same is true for every class, order, family & genus put to comparable tests. For instance, paleontologists predicted that proto-mammals would be found with both the “reptilian” & mammalian jaw joint, & sure enough, they were. (Mammals are unique among vertebrates in having only a single lower jaw bone, the dentary. The other former jaw bones have evolved into the mammalian middle ear.) It’s the scientific method in action, unlike anti-scientific CACA.
Of course, as I pointed out previously, today we don’t need to rely on inferences to observe evolution in action. Macro-evolution is the same process as micro-evolution, just run for more time in the case of gradual evolution. There is no magic genetic barrier that keeps one species from evolving into a new one, given selective pressure to do so or simple reproductive isolation. The processes can be & have been observed all around us.
“b) Establish how one can make a valid inference when nothing is claimed, or no time is claimed for its culmination. (cf: The 17 217 years of Santer. Global Warming entails heating cooling and more less severe weather events.)”
Why would I want to establish such a thing? I think CACA advocates’ constant moving of goal posts is yet another indication that they don’t practice science. I don’t see what this has to do with evolution.
“c) Establish how we can state what will occur, in a chaotic system with feedbacks. When those feedbacks are not known, not described, or not describable.”
See above.

Jquip
November 12, 2013 5:57 pm

M Simon: “We have seen bacteria change into a different kind of bacteria with different environmental requirements.”
My eye color, hair color, and build are different than my parents. Obviously I’m a new species. And again: Not the point. But it does highlight that getting too loose or strict with a bounded range leads to absurdity.