On Experimental Methodology

Scientific experimental methodology often conflicts directly with an intuitive or holistic approach to understanding truth, because the scientist intentionally attempts to undermine the legitimacy of claims. The scientific method uses deductive logic to test individual variables and assess the accuracy of a given claim in an endlessly iterative attempt to reach an unobtainable truth. “We certainly can’t believe ingenuously that a scientific theory, I mean in the field of natural sciences, could be something like ‘true’. Not because of some radical skepticism toward the sciences, but rather by virtue of the very process of science. In the course of its history, this process showed an extraordinary inventiveness in ceaselessly destroying its own theories including the most fundamental ones, replacing them with paradigms whose novelty was so extreme that nobody could anticipate the beginning of their configuration” (Meillassoux 2014). The scientific method in this sense is autophagous, even cannibalistic at times, because it continually seeks a more precise approach, limiting variation in order to eliminate observations that cannot be reliably reproduced.

During optimal design of a scientific study, the experimenter searches for holes or inconsistencies in a theory or commonly held observation in order to better understand some phenomenon. Attempts to understand the nature of reality that rely entirely on intuition, without attempting to falsify any claims, often lack depth of understanding because they preclude hypothesis testing in favor of faith-based acceptance of claims. Again, Descartes writes, “As for the false sciences, I saw no need to learn more about them in intellectual self-defence: I thought I already knew their worth well enough not to be open to deception by the promises of an alchemist or the predictions of an astrologer, the tricks of a magician, or the frauds and boasts of those who profess to know more than they do” (Descartes 4). In this case, a “false science” would be an explanation of the natural world that ignores data deduced from scientific experiments. Take, for example, an individual who declares themselves an ideological supporter of homeopathic medicine and joins an anti-vaccination movement championing the belief that vaccines cause Autism Spectrum Disorder (ASD), a claim that they trust intuitively. The individual in this example reasons that the symptoms of ASD appear in infants after they receive a vaccine, so it makes intuitive sense that there is a correlation between vaccination and ASD onset. Note that this approach lacks skepticism, because it does not seek to test the hypothesized correlation, which is what makes it an intuitive approach lacking in depth; correlation does not mean causation. Scientific studies funded even by anti-vaccine groups have rigorously tested this hypothesis and found no strong correlation between vaccines and ASD (Gadad et al. 2015; Hasegawa et al. 2018; Curtis 2015). Persistent ideological belief based on intuition and emotion that is exempt from scrutiny is unscientific, and researchers ought to avoid such approaches when designing studies.

Rather than sidestepping personal biases, an optimal experimental design ought to include a way to directly test the presuppositions held by the experimenter conducting the study. An ideal researcher is capable of playing devil’s advocate: stepping into an impartial mindset, attacking their own claims, and then testing the validity of those attacks experimentally. For example, if a researcher has strong data suggesting that their new antibiotic is the most effective way to treat an ear infection, then it is incumbent upon the researcher to validate that claim through rigorous testing. Thus, the researcher would benefit from strategically testing alternative treatments that may show similar or improved efficacy, cheaper production costs, or potential for reduced side effects. In this sense, good experimental design is like establishing fair competition in sports. Ideally, you want to normalize the playing field so that competitors begin without a drastic advantage beyond participant skill level. Normalization is why there are weight classes in professional fighting as well as distinctions between professional and amateur sports leagues. At the experimental design level, a good researcher seeks to normalize the playing field so that their own hypothesis competes with other ideas, allowing the most likely explanation to be identified.

Typically, effective testing of a hypothesis requires an experimental design that focuses on a specific variable at the expense of others, and it therefore behooves the research scientist to simplify the experimental design as much as possible. René Descartes realized early on that personally testing every variable would be unrealistic within a scientist’s lifetime: “Every day increases my awareness of how my project of self-instruction is being held back by the need for innumerable experiments that I need and can’t possibly conduct without the help of others” (Bennett 29). For example, when examining the causal relationship of the gene Apoe4 to Alzheimer’s Disease, the best experiment attempts to examine the effect of Apoe4 directly by examining the consequences of a loss-of-function mutation (i.e., a gene knockout). It may be tempting to include additional variables, such as increasing the expression of the Apoe4 gene, knocking out Apoe4 and Apoe2 together, or knocking the gene out at different times, but these additional variables add a great deal of complexity. Each measurement carries a level of uncertainty, and the overall likelihood of experimental error increases with the number of variables tested. Thus, the most straightforward way to test a hypothesis effectively is to identify the key variable involved in the observations and isolate it to limit complexity.

Moreover, the larger the number of variables examined in a scientific experiment, the higher the degree of complexity involved in the study design. The simplest experiment tests a single independent variable, which avoids added complexity, and includes multiple internal controls. A commonplace illustration of how complexity grows is the password used to log into an encrypted webserver. A longer password is more secure because it admits a larger number of possible character combinations. A brute-force attack obtains a password by iteratively guessing every possible combination of characters, and it can be modeled by an exponential function in which n is the number of possible characters at each position and r is the length of the password string; thus n^r represents the complexity of the password, where every character position is a variable being tested. A single-character password drawn from the 94 printable ASCII characters would be identified by the attacker in at most 94 attempts. A 10-character password, however, would have 94^10 (roughly 5 × 10^19) possible combinations, which would take a great deal of processing time to exhaust. Consider a similar calculation applied to the variables within an experiment, where each added variable multiplies the experimental complexity. In such a situation, the research scientist attempting to crack a chemical or biological code benefits from reducing the number of variables, and thus the number of conditions that must be tested, to as few as possible.
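As a rough illustration of this combinatorial growth, the following minimal sketch (with hypothetical numbers, not drawn from any particular study) computes the size of a brute-force search space and the analogous number of experimental conditions when each added variable is assumed to be tested at a few levels.

```python
# Minimal sketch (hypothetical numbers): an exhaustive search space grows
# exponentially with the number of positions, whether those positions are
# password characters or experimental variables.

def search_space(options_per_position: int, positions: int) -> int:
    """Number of combinations a brute-force search may need to try."""
    return options_per_position ** positions

# Password analogy: 94 printable ASCII characters per position.
print(search_space(94, 1))    # at most 94 guesses for a single character
print(search_space(94, 10))   # ~5.4e19 combinations for a 10-character password

# Experimental analogy (assumed): each added variable is tested at 3 levels,
# multiplying the number of conditions that must be run.
for n_variables in range(1, 6):
    print(n_variables, "variables ->", search_space(3, n_variables), "conditions")
```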

The greater the degree of variable complexity, the greater the demands on processing time within the experimental design. Adding levels of abstraction or larger numbers of variables in real time has been shown to increase the amount of processing time required to complete a specific task, even for the human brain. For example, reading aloud a list of color words printed in ink that matches each word’s meaning is easier than reading aloud the same words printed in incongruent colors, because of the conflict between the perceived color and the meaning of each word in context. This phenomenon is known as the Stroop effect, and it causes a delay in reaction time; the same could be said of the added steps required to unpack visual cues for colorblind or dyslexic individuals. Although the emergence of quantum computers has allowed the solution of computational problems in cryptography that would otherwise be time-prohibitive by conventional computation, there remains a fundamental relationship between complexity and processing time. Scientists attempting to interpret higher degrees of complexity or abstraction within an experimental design are likely to incur a cost in the form of slower processing time, and therefore simplicity in experimental design is an ideal strategy.

As was briefly mentioned, an increase in experimental complexity also increases the likelihood of error. “In science, the word error does not carry the usual connotations of the terms mistake or blunder. Error in scientific measurement means the inevitable uncertainty that attends all measurements. As such, errors are not mistakes; you cannot eliminate them by being very careful. The best you can do is ensure that errors are as small as reasonably possible and to have a reliable estimate of how large they are” (Taylor 1997). Given enough measurements, the likelihood that an error occurs by chance increases, with the probability approaching 1. For example, a scientist may measure the length of a bone in millimeters using a ruler that has a standard error of 1 mm. That same error would be unacceptable if the scientist were measuring a cell of 10 micrometers, and thus the researcher must rely on a microscope with a different standard error. Many labs attempt to test multiple variables at a time by employing a very large staff using multiple different methods of measurement. However, this strategy contributes to the propagation of error, because each uncertainty is carried through to the interpretation of results. Individual sources of error build upon one another to increase the overall likelihood of a false positive or false negative, as described in the section on hypothesis formation. Increasing the number of variables increases the complexity, the time to completion, and ultimately the likelihood of error for the overall study.
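Both effects can be made concrete with a minimal sketch, using assumed numbers: independent measurement uncertainties combine in quadrature when quantities are added, and the chance of at least one false positive grows quickly with the number of independent hypothesis tests (assuming a significance threshold of 0.05 per test).

```python
# Minimal sketch (assumed numbers): two ways error grows as a study adds
# measurements and hypothesis tests.

import math

def combined_uncertainty(uncertainties):
    """Quadrature sum of independent absolute uncertainties added together."""
    return math.sqrt(sum(u ** 2 for u in uncertainties))

# Three independent 1 mm measurements summed -> ~1.7 mm combined uncertainty.
print(round(combined_uncertainty([1.0, 1.0, 1.0]), 2))

def family_wise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Assuming alpha = 0.05 per test, the chance of a spurious result rises quickly.
for n in (1, 5, 20):
    print(n, "tests ->", round(family_wise_error(0.05, n), 2))
```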

Experimental complexity can also lead to an increased probability of error in the interpretation of results. More complex results tend to obscure the causal relationship between independent and dependent variables due to the increased number of conditions necessary to test each variable. A clear example of this effect is epidemiological study design on human populations with respect to nutrition, where it is impossible to control for all of the variables involved. In such studies, an attempt to identify a simple causal relationship between a proposed independent variable, such as sugar intake (g), and a dependent variable measured as body mass (kg) becomes difficult to discern. While the experimental design seems simple, interpreting the result can become quite complicated because of the many unanticipated variables within the study design (age, sex, exercise level, fat intake, protein intake) which may partially explain the perceived causal relationship (or lack thereof) when, say, an individual eats 10 g more sugar and gains 10 kg of body mass. At best, many epidemiological studies can make strong correlative claims without a strong underlying causal basis; at worst, they run the risk of falsely identifying a correlation that is more strongly dependent on an underlying unseen variable the experimentalist has not yet recognized. Experimental complexity is by no means limited to epidemiological studies, and any study design that incorporates too many variables runs a higher risk of erroneous interpretation.
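To make that risk concrete, the following hypothetical simulation (all values invented for illustration) shows how an unobserved confounder such as exercise can produce a correlation between sugar intake and body mass even when, in the simulated model, body mass does not depend on sugar at all.

```python
# Hypothetical simulation (all values invented): a hidden confounder (exercise)
# makes sugar intake and body mass correlate even though, in this model,
# body mass is not affected by sugar at all.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
exercise = rng.normal(0.0, 1.0, n)                    # unobserved confounder
sugar = 50 - 10 * exercise + rng.normal(0.0, 5.0, n)  # assumed: less exercise -> more sugar
mass = 80 - 8 * exercise + rng.normal(0.0, 5.0, n)    # assumed: less exercise -> more mass

# Sugar and mass appear correlated purely through the shared confounder.
print(round(np.corrcoef(sugar, mass)[0, 1], 2))
```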

Both realized and unrealized experimental complexity can increase the likelihood of an erroneous conclusion, and it is necessary to address the apparent complexity of a problem prior to experimental design to avoid such pitfalls. While it is important to limit the number of variables being tested within an individual experiment, it is also important to recognize the limitations of the experimental paradigm being used and to anticipate criticisms of the expected results. Albert Einstein suggests, “Everything should be made as simple as possible, but not simpler” (Calaprice 2011). Many classic experiments involved very simple experimental variables. For example, Otto Loewi originally identified neurotransmitters by taking fluid from an electrically stimulated frog heart and using it to drive beating in a different frog heart without electrical stimulation. Fascinatingly, the chemically stimulated heart responded in proportion to the amount of neurotransmitter that was added to it. The experiment showed what was not obvious at the time, namely that the contractility of heart muscle was driven by secreted chemical components rather than only by electrical stimulation, but it left open which chemicals were driving the effect. “In my opinion, these observations prove the onset or offset of the various effects are a function of the concentration of the same vagus substance” [translated from German] (Loewi 1924). Loewi could have made grandiose claims about which components of the vagus substance drove contractility, but this would have added the need to specify additional variables for testing, and the somewhat broad conclusion is more accurate and less complex to explain overall. Simplicity of experimental design avoids the tendency toward errors in experimental execution and has the added benefit of simplicity in results, limiting the number of variables the experimenter needs to wrestle with.

Controls also reduce variability by contextualizing results and increasing the precision and accuracy of analysis. Although the inclusion of controls in a study would seem to add a layer of complexity to the experimentation, their presence often simplifies the overall study design by reducing the total number of variables at the analysis stage. Internal controls reduce the number of variables by providing an internal reference against which values can be compared and normalized, eliminating potential causative variables from consideration. In the example of the epidemiological sugar study, one way to implement an internal control would be to normalize for anticipated variables by comparing only subjects of similar age, sex, body weight, and so on, allowing only sugar intake to vary among subjects. Although this is often referred to as “controlling for the variables,” it is not a true internal control. In this same example, a true internal control would be a known subject that is positive or negative for sugar intake: a positive control would be a person who eats only sugar, and a negative control would be someone who eats no sugar at all. Verified controls are especially useful for interpreting complex results because they generate a range at either value extreme which contextualizes the rest of the dataset. Positive and negative controls can thus serve like Boolean values (0 or 1), against which the presence or absence of a potential causal variable can be observed with better scrutiny.

Another important function of internal controls is to ensure precision and accuracy of measurement. The use of certified reference materials as controls allows a scientist to examine the quantitative relationship between two variables with more precision than the Boolean 0 or 1. For example, a patient’s blood sample being tested for methylmercury is often extracted and analyzed against a standard curve, because the clinician wants to know how much methylmercury is in the blood, not merely whether it is present or absent. A typical approach includes a standard curve consisting of several different concentrations of a certified reference material of known concentration prepared by an impartial third-party source. The standard preparations are diluted to encompass the range of values predicted for a typical patient. For example, if most patient samples contain between 1 and 100 parts per billion (ppb), then the standard curve might contain 0.1, 1, 25, 50, 75, 100, and 1,000 ppb of methylmercury from a certified reference material. A calibration curve bracketing the sample values assists the analysis by providing a sense of the precision of measurement at different concentrations.
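As a minimal sketch of how such a curve is used (with entirely hypothetical instrument responses), one could fit a line to the standards and invert it to quantify an unknown sample:

```python
# Minimal sketch (hypothetical instrument responses): fit a linear calibration
# curve from certified standards, then invert it to quantify an unknown sample.

import numpy as np

standards_ppb = np.array([0.1, 1, 25, 50, 75, 100, 1000])            # certified concentrations
responses = np.array([0.5, 4.1, 99.8, 201.0, 298.7, 402.3, 3995.0])  # assumed readings

slope, intercept = np.polyfit(standards_ppb, responses, 1)

def quantify(sample_response: float) -> float:
    """Convert an instrument response back to a concentration via the curve."""
    return (sample_response - intercept) / slope

# A patient sample reading of 160 corresponds to roughly 40 ppb under this fit.
print(round(quantify(160.0), 1))
```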

The use of experimental controls often provides important context for the interpretation of experimental results. Controls establish a functional range that extends above and below the sample values for the experimental variables being tested, and they describe the background levels of analyte present in the sample matrix. For example, in a patient’s blood, where there is some amount of analyte to be assessed (i.e., between 0 and ∞), many different values may be identified. The range of control values ought to span orders of magnitude above and below the anticipated mean of the samples. With respect to the instrument’s dynamic range, controls define the lower and upper limits of detection (LOD) and thus provide context for what exactly a “0” or “non-detect” means (Buckingham 458). In the curve below, 0 is considered the LOD, but in some cases an instrument may not be able to distinguish between 10 units and 0 units, in which case the LOD would be 10. The minimum and maximum limits of detection and quantitation and the range of control values must be established experimentally for each analyte. Assume that the orange line represents the detection of a specific protein, while perfect linearity of detection is represented by the blue line, where the expected value is always measured. In the case of the orange protein, the linearity is predictable and consistent only between 0 and 600 units, and beyond that point the measurement becomes nonlinear. This result would likely encourage the researcher to dilute their samples into the dynamic range of the instrument for the analyte of interest. The use of experimental controls thus provides a relevant context for the researcher’s interpretation of measured values.

[Figure: Calibration curves. The blue line represents perfect linearity of detection; the orange line represents detection of the protein of interest, which is linear only between 0 and 600 units.]
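One common convention, shown in the brief sketch below with hypothetical blank readings and an assumed calibration slope, estimates the LOD as the mean blank signal plus three standard deviations of the blank; this is only one of several accepted approaches.

```python
# Minimal sketch (hypothetical blank readings): one common convention estimates
# the limit of detection (LOD) as the mean blank signal plus three standard
# deviations of the blank, converted to concentration with the calibration slope.

import statistics

blank_responses = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7]  # assumed repeated blank measurements
slope = 4.0                                        # assumed calibration slope (response per ppb)

lod_signal = statistics.mean(blank_responses) + 3 * statistics.stdev(blank_responses)
lod_ppb = lod_signal / slope

# Sample values below this concentration cannot be distinguished from zero.
print(round(lod_ppb, 2))
```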

For some variables tested, the experimental conditions may exhibit a great deal of variance, which can be tracked across different analyses using CCVs, duplicates, and replicates. The dynamic range of detection can vary for a variety of reasons, including instrument type, temperature, length of the experiment, reagent lot or batch purity, working solution preparation, and many more. The variance inherent to a given experimental analysis often merits the inclusion of a continuing calibration verification (CCV) standard to help avoid these pitfalls. A CCV is often prepared in a large amount to allow repeated measurements with each analytical run, which adds confidence for the researcher interpreting results. CCVs also provide the added benefit of giving a sense of the rate of sample degradation, of instrumental variance over time, and of unexpected contamination of a commercial reagent. An experienced researcher may also choose to include duplicate preparations and/or replicate sample analyses to assess variation introduced by the experimenter’s preparation or drift by the instrument over the length of a specific run. A duplicate is a single sample prepared two times for two separate analyses, to check repeatability in preparation and analysis. In contrast, a replicate sample is prepared only once but analyzed two times, to check the repeatability of the analytical instrumentation independently of the preparation. CCVs, duplicates, and replicates are useful controls and provide context for variability within and between samples.
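Two simple quality checks that follow from these controls are sketched below with hypothetical values: the relative percent difference (RPD) between a sample and its duplicate, and the percent recovery of a CCV read partway through a run.

```python
# Minimal sketch (hypothetical values): two routine quality checks. The relative
# percent difference (RPD) compares a sample with its duplicate preparation;
# CCV recovery tracks whether the instrument has drifted during a run.

def relative_percent_difference(a: float, b: float) -> float:
    return abs(a - b) / ((a + b) / 2) * 100

def ccv_recovery(measured: float, expected: float) -> float:
    return measured / expected * 100

print(round(relative_percent_difference(40.2, 38.9), 1))  # duplicate agreement (%)
print(round(ccv_recovery(48.5, 50.0), 1))                 # mid-run CCV recovery (%)
```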

Furthermore, the inclusion of controls allows an audience to assess the data in greater detail when interpreting results independently from the experimenter. Controls reveal an experimenter’s accuracy, precision, and bias. “Precision provides a measure of the random, or indeterminate, error of an analysis. Figures of merit for precision include absolute standard deviation, relative standard deviation, standard error of the mean, coefficient of variation, and variance” (Skoog et al. 2005). A researcher does not necessarily need to achieve low standard errors, low standard deviations, or high correlation coefficients to be a successful scientist. However, a successful scientist must recognize how these quantities affect the precision of an analysis and what the limitations are of their method of generating data. While precision addresses variability, accuracy describes how far a measurement deviates from the true underlying value. For example, a precise game of darts will have all the darts in a tight cluster somewhere on the board, whereas an accurate shot would be a single dart hitting the absolute center of the bullseye. A cluster of darts thrown by one player to the left side of the target reveals a leftward bias, which is important information for that person if they are trying to improve their dart game.
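The figures of merit named in that passage are straightforward to compute; the sketch below does so for a hypothetical set of replicate measurements.

```python
# Minimal sketch (assumed replicate values): the precision figures of merit
# named in the quoted passage, computed for one small set of measurements.

import statistics

measurements = [10.1, 9.8, 10.3, 10.0, 9.9]

mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)           # absolute standard deviation
rsd = sd / mean                               # relative standard deviation
cv_percent = rsd * 100                        # coefficient of variation (%)
sem = sd / len(measurements) ** 0.5           # standard error of the mean
variance = statistics.variance(measurements)

print(round(sd, 3), round(cv_percent, 2), round(sem, 3), round(variance, 4))
```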

In addition to calibration curves, CCVs, duplicates, and replicates, the use of spike-in controls can provide important information regarding sample matrix variability. To make a spike-in control, an experimenter adds a known amount of certified reference material to a sample preparation, typically while blinded to the original concentration of the sample. This is commonly known as a “spike” or “spike-in” and should return the original value of the sample plus the added reference amount. For instance, adding 10 ppb of methylmercury to a 40 ppb sample should return 50 ppb in the final analysis. If the analysis returns 48 ppb, then the spike recovery is 80%, because only 8/10 of the amount added was recovered upon detection and analysis. Incorporating certified reference materials from a third party lends reliability to the accuracy and precision of measurement, and spikes are a great way to test the precision and accuracy of detection.
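Using the figures from this example, the percent spike recovery can be computed as in the brief sketch below.

```python
# Minimal sketch using the figures from the example above: percent spike
# recovery compares the amount of spike recovered with the amount added.

def spike_recovery(spiked_result: float, unspiked_result: float, spike_added: float) -> float:
    """Percent of the added reference material recovered in the final analysis."""
    return (spiked_result - unspiked_result) / spike_added * 100

print(spike_recovery(50.0, 40.0, 10.0))  # 100.0 -> full recovery
print(spike_recovery(48.0, 40.0, 10.0))  # 80.0  -> matches the 80% example
```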

Common inductive logical fallacies are associated with a lack of empirical testing and a lack of deductive reasoning, and they often rely on premature interpretation of limited data. These include hasty generalizations, unrepresentative samples, false analogies, slothful inductions, and fallacies of exclusion, all of which lead to faulty conclusions. Take the example of hasty generalizations, which draw conclusions from insufficient observations or small sample sizes. Assuming that one misdiagnosis from a doctor means that all doctors are incompetent is a logical fallacy drawn from a sample size of N = 1, and it ignores the cases that were not observed. Similarly, this doctor may have a unique training deficiency compared with fellow doctors, and thus may be an unrepresentative sample; ignoring the unique training of this doctor could also be considered a fallacy of exclusion. Studies reliant upon unrepresentative samples that hastily generalize are often accused of “cherry picking,” because only the so-called “ripe” results are selected, and the others are cast aside as unwanted fruit.

In other cases, intuitive logical leaps are a product of emotional fervor, which tends to blur the lines of reason by relying heavily on observer bias to draw faulty conclusions from results. A false analogy occurs when one assumes that because object/event/observation A is similar to B, both A and B must exhibit the same property P. For example, uninformed medics in World War II reasoned that melanin differences indicated that white soldiers (A) and black soldiers (B) have a distinct property in their skin and thus must also have distinct blood types (P). This resulted in the unnecessary separation of donor blood for injured soldiers and arose from the racist segregationist beliefs running strong within the USA. Dr. Charles Drew, often called the father of the blood bank, remarked, “Whenever, however, one breaks out of this rather high-walled prison of the ‘Negro problem’ by virtue of some worthwhile contribution, not only is he himself allowed more freedom, but part of the wall crumbles. And so it should be the aim of every student in science to knock down at least one or two bricks of that wall by virtue of his own accomplishment” (Drew 1947). A proper experimental study not only avoids assumptions of shared properties in favor of empirically tested observations as a basis for hypothesis testing, but actively seeks to expose and overturn such assumptions in order to approach truth.

Similarly, slothful induction is a fallacy in which a conclusion is maintained despite data that contradict it. Slothful induction has become increasingly popular amongst ideologues who choose to believe a particular hypothesis without any empirical testing or data, owing to strong emotional, cultural, or social support for the proposition. Examples include modern conspiracy theories such as flat-earth cosmology, QAnon, and belief in reptilian humanoids. In each case, the propositions put forward directly contradict abundant evidence to the contrary. By choosing to ignore contradictory data, a scientist ultimately falls victim to slothful induction and fails to utilize the scientific method effectively.

Optimal experimental methods exhibit simplicity in design. Resolution of competing hypotheses and strong scientific conclusions depend on clarity throughout the experimental design. Reducing the number of variables requiring testing decreases the complexity of the experiment and the time to completion. In part this is because only the necessary positive and negative controls are included, while excessive numbers of variables requiring additional internal controls are excluded. Ultimately, simplicity of design also makes the conclusions regarding independent and dependent variables easier to understand and describe.

An optimal experimental methodology will test the researcher’s own presuppositions deductively. Testing one’s implicit presuppositions provides confidence in observations by revealing areas of relevant uncertainty. Typically, great progress in science follows from unearthing explanations for previously misunderstood phenomena in a way that changes the approach to a particular problem for an entire field. Classic examples include the shift from a geocentric to a heliocentric model of the solar system and the revelation that DNA, rather than protein, is the principal hereditary molecule. Scientific research methodology does well to incorporate the testing of inherent presuppositions as a way of approaching the ever-moving target of certainty.

Ideal methods strive to innovate novel ways of measurement and observation. Innovative methods of observation allow researchers to circumvent the limitations of less accurate means of measuring data. Any method of observation ultimately has limits of detection and quantification, and increasing the precision and accuracy of measurement through innovation increases the depth of observation. The classic example is Antonie van Leeuwenhoek’s microscope, which allowed unicellular organisms to be visualized for the first time and ultimately gave birth to the field of microbiology, where previous methods of observation lacked the ability to resolve such structures. Ultimately, the unknown may defy existing means of observation, and innovative techniques of measurement can be incorporated into study designs that remain simple and test presuppositions in tandem.

An optimal experimental methodology includes positive and negative controls. Experimental results are most trusted when they come from a source that can demonstrate accuracy and precision in its measurements, and this is accomplished through the inclusion of third-party standardized reference materials or, alternatively, a novel but appropriate control that has been independently observed in some way by other researchers. A method that fails to reproduce a positive or negative control value can hardly be trusted to produce accurate values for samples if it cannot reach consensus with the observations of others in the field. Similarly, the exclusion of positive and negative controls from a methodology indicates a massive oversight in the researcher’s analytical approach, and the results should be taken with a grain of salt or distrusted entirely.

An optimal experimental methodology attempts to normalize the variables on an equal playing field. Every method of analysis comes with some variability between samples, and an optimal experimental design seeks to remove unwanted variables in favor of a more specific focus on the main variable in question. Normalizing sample variability by creating eligibility criteria for which samples are included or excluded is one way the researcher can separate out confounding variables. Oftentimes standard deviation is used as a criterion for the removal of outliers to ensure a normal distribution; however, it is also possible to remove unwanted variability by normalizing values to an internal control within the sample matrix. This is the equivalent of a statement like “all things being equal,” in that any internal variability has been adjusted as a proportion of a constant internal value to ensure the data can be compared. Optimal methods interrogate the specific variables in question by normalizing the distribution of variability in a way that favors comparison of only the desired variables and not unwanted ones.

Works Cited

Bennett, J. (2017) Discourse on the Method of Rightly Conducting one’s Reason and Seeking Truth in the Sciences

Buckingham, L. (2012) Molecular Diagnostics: Fundamentals, Methods, and Clinical Applications, 2nd ed.

Calaprice, A. (2011) The Ultimate Quotable Einstein Princeton University Press

Curtis, B. (2015) Examination of the safety of pediatric vaccine schedules in a non-human primate model: assessments of neurodevelopment, learning, and social behavior. Environ Health Perspect 123(6):579-89.

Descartes, R. (2006) A Discourse on Method. Translated by Ian Maclean. Oxford University Press.

Drew, C. (1947) “Charles R. Drew to Mrs. J.F. Bates, a Fort Worth, Texas schoolteacher, January 27, 1947.” US National Library of Medicine, <https://profiles.nlm.nih.gov/spotlight/bg/feature/biographical-overview> accessed Jan 20, 2021.

Gadad, B.S. et al. (2015) Administration of thimerosal-containing vaccines to infant rhesus macaques does not result in autism-like behavior or neuropathology. PNAS.

Hasegawa, Y. et al. (2018) Microbial structure and function in infant and juvenile rhesus macaques are primarily affected by age, not vaccination status. Sci Rep 8(1):15867.

Loewi, O. (1921). "Über humorale Übertragbarkeit der Herznervenwirkung. I.". Pflügers Archiv. 189: 239–242.

Skoog, D., Holler, F.J., and Crouch, S.R. (2005) Principles of Instrumental Analysis, 6th ed. Thomson Brooks/Cole, p. 14.

Meillassoux, Q. (2014) Time Without Becoming  

Taylor, John R. (1997) Introduction to Error Analysis, 2nd ed. University Science Books, pp. 3, 7.