Sex Inclusive Research Framework
An evaluation framework to assess whether an in vivo research proposal follows the sex-inclusive research philosophy.
What is the Sex Inclusive Research Framework (SIRF)?
Our mission in Open Innovation is to share access to tools and technologies to advance innovation in science. The SIRF tool provides, for the first time, guidance to researchers in designing preclinical research aligned with a sex-inclusive research philosophy. The framework consists of a decision tree to evaluate a preclinical research proposal, with supporting guidance for each question. The evaluation leads to one or more “traffic light” outcome classifications, indicating whether a proposal is appropriate, carries some risks, or is insufficient with regard to sex inclusion. The classifications can be grouped into three types:
Green: Proposal is appropriate
Amber: Caution is required (i.e., the proposed design/analysis carries some risk)
Red: Justification for single sex study in the proposal is not sufficient
SIRF development
Research has shown that scientists are supportive and believe that sex matters in early research but there are barriers which prevent the implementation of sex inclusive designs. Many of the barriers are culturally embedded misconceptions. When a decision has been made to study only a single sex, the framework evaluates whether the justification is a scientifically appropriate, reflective assessment that is not based on common misconceptions. The framework therefore supports a transition from generic to considered justification, which will assist the community in identifying when sex inclusive research is possible. The clarity of the framework provides transparency in the assessment process for both researchers and those evaluating the proposals.
The framework was developed for research involving in vivo or ex vivo samples. Many of the questions are equally applicable to clinical or in vitro research.
An expert working group of in vivo scientists, statisticians and funders was assembled to develop and test the framework. The framework was tested on a published dataset, with a number of ethical review boards and by a reviewing panel during a grant review cycle.
Name | Institute |
---|---|
Natasha A. Karp (Chair) | Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK |
Manuel Berdoy | University of Oxford, UK |
Jon Gledhill (Tool developer) | Comparative Biology Centre, Newcastle University, UK |
Lilian Hunt | Wellcome Trust, London, UK |
Maggy Jennings | RSPCA, Animals in Science Dept, UK |
Angela Kerton | The Learning Curve (Development) Ltd, Ware, UK |
Matt Leach (Tool developer) | Comparative Biology Centre, Newcastle University, UK |
Esther J. Pearl | The NC3Rs, London, UK |
Nathalie Percie du Sert | The NC3Rs, London, UK |
Benjamin Phillips | Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK |
Penny S Reynolds | University of Florida, USA |
Kathy Ryder | Department of Health, Belfast, UK |
S. Clare Stanford | University College London, UK |
Jordi L. Tremoleda | Queen Mary University of London, UK |
Sara Wells | The Mary Lyon Centre at MRC Harwell, UK |
Lucy Whitfield | OWL Vets Ltd, UK |
Karp, N. A., Berdoy, M., Hunt, L. E., Jennings, M., Kerton, A., Leach, M., … Whitfield, L. (2023, October 6). Sex Inclusive Research Framework (SIRF): an evaluation tool to assess whether an in vivo research proposal follows the sex inclusive research philosophy. https://doi.org/10.31219/osf.io/mxg3e
Accessing SIRF
SIRF resources
FAQ
The objective of the sex inclusive research strategy is to ensure a generalisable estimate of an intervention effect across females and males. There is no requirement to prospectively power a study to detect a baseline difference between males and females or to detect whether sex explains variation in the intervention effect, but studies will detect large differences between females and males where they exist. The general advice has been to estimate the N needed and share it across males and females (MacCarthy M. Schizophrenia Bulletin 2015). Practically, this means the power calculation can be simplified to a comparison focused on the intervention effect to estimate the total N needed for an intervention, and the resulting recommended N per intervention group is then shared between the females and males in a balanced design. Alternatively, power calculations for factorial designs can be conducted using appropriate methods (e.g., the Superpower package (Lakens and Caldwell Advances in Methods and Practices in Psychological Science 2021)).
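As a rough illustration of the "estimate N, then share it" approach, the sketch below uses a normal-approximation formula for a two-group comparison and splits the resulting N per intervention group equally between females and males. The function names, the standardised effect size argument and the default alpha and power are assumptions made for this example and are not part of SIRF. The approximation gives a slightly smaller N than an exact t-test calculation, so a dedicated power tool should be used for a real study.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.8):
    """Approximate N per intervention group for a two-sided, two-group comparison.

    effect_size_d is the standardised difference (difference / SD); this is a
    normal approximation, not an exact t-test calculation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


def split_by_sex(n_group):
    """Share the N for one intervention group equally between females and males.

    An odd N is rounded up by one so that the design stays balanced.
    """
    n_even = n_group + (n_group % 2)
    return {"female": n_even // 2, "male": n_even // 2}


if __name__ == "__main__":
    n_group = n_per_group(effect_size_d=1.0)  # hypothetical effect size
    print(n_group, split_by_sex(n_group))
```

The split is applied within every intervention group, so each group ends up with the same number of females and males.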
In experimental research, we can define a factor as an explanatory variable to be studied in an investigation. Factors can be divided into experimental factors (levels of the factor are assigned at random to the experimental units) or observation factors (levels of the factor are characteristics of the experimental unit). Sex and genetic status (e.g., wildtype, homozygous or heterozygous for a gene of interest) are examples of observation factors where experimenter-driven randomisation cannot occur and instead, we argue that randomisation has been achieved by Mendelian inheritance. Interventions such as treatments, dosages and timings are experimental factors, as they are controlled by the experimenter.
In other research settings, such as clinical trials, the terminology is different. Typically, sex will be considered as a variable to be managed by stratified random sampling, and the design would be called a stratified design. Here, researchers divide a population into homogeneous subpopulations called strata based on a specific characteristic (e.g., race, gender), and samples are chosen at random from each stratum for an intervention group. This ensures equal allocation of subgroups to each experimental condition. This terminology has not to date been used within the in vivo research community and hence has been avoided here.
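The stratified allocation described above can be sketched as follows: animals are grouped by sex, shuffled within each stratum, and then dealt out to the intervention groups in turn. This is a minimal illustration only; the function name, the animal identifiers and the group labels are hypothetical, and the number of animals in each stratum is assumed to be a multiple of the number of groups.

```python
import random


def stratified_allocation(animals, groups, seed=None):
    """Randomly allocate animals to groups separately within each sex.

    animals: list of (animal_id, sex) tuples
    groups:  list of intervention group labels
    Returns a dict mapping animal_id to its group label.
    """
    rng = random.Random(seed)

    # Build one stratum per sex.
    strata = {}
    for animal_id, sex in animals:
        strata.setdefault(sex, []).append(animal_id)

    allocation = {}
    for ids in strata.values():
        rng.shuffle(ids)  # random order within the stratum
        for position, animal_id in enumerate(ids):
            # Deal the shuffled animals out to the groups in turn.
            allocation[animal_id] = groups[position % len(groups)]
    return allocation


if __name__ == "__main__":
    animals = [(f"F{i}", "female") for i in range(4)] + [(f"M{i}", "male") for i in range(4)]
    print(stratified_allocation(animals, ["control", "treated"], seed=1))
```

Because the shuffle happens inside each stratum, every group receives the same number of females and males regardless of the random order.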
If sex is treated as a blocking variable, rather than a factor of interest, an average intervention effect across males and females would be estimated after accounting for a baseline sex difference. This statistical approach, however, would not test whether the intervention effect depended on sex. This is because a blocking factor is considered a nuisance variable, which implies that there is no interest in whether the intervention effect depends on sex. Therefore, modelling sex as a blocking factor risks missing valuable biological information. For most experiments, the inclusion of the interaction term has minimal impact on the power for the main intervention effect (as it only uses one degree of freedom to statistically assess this) and is the recommended best practice, as it enables a direct assessment of whether sex explains variation in the intervention effect (Phillips B et al. PLoS Biol. 2023. 21(6): e3002129).
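To make the distinction concrete, the sketch below computes, from group means in a balanced two-by-two design, the three quantities involved: the baseline sex difference, the intervention effect averaged across the sexes, and the interaction (the difference between the female and male intervention effects). A blocking analysis reports only the first two; the factorial analysis also assesses the third. The function name and the data are invented for illustration, and the sketch gives point estimates only, with no significance testing.

```python
from statistics import mean


def factorial_effects(data):
    """Point estimates for a balanced 2x2 (sex x intervention) design.

    data maps (sex, arm) to a list of measurements, with sex in
    {"female", "male"} and arm in {"control", "treated"}.
    """
    effect_f = mean(data[("female", "treated")]) - mean(data[("female", "control")])
    effect_m = mean(data[("male", "treated")]) - mean(data[("male", "control")])
    baseline = mean(data[("female", "control")]) - mean(data[("male", "control")])
    return {
        "baseline_sex_difference": baseline,
        # Intervention effect averaged over females and males.
        "average_intervention_effect": (effect_f + effect_m) / 2,
        # Non-zero when the intervention effect differs between the sexes.
        "interaction": effect_f - effect_m,
    }


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    example = {
        ("female", "control"): [10.0, 12.0, 11.0, 11.0],
        ("female", "treated"): [15.0, 16.0, 14.0, 15.0],
        ("male", "control"): [20.0, 21.0, 19.0, 20.0],
        ("male", "treated"): [22.0, 23.0, 21.0, 22.0],
    }
    print(factorial_effects(example))
```

In a real analysis these estimates would come from a fitted linear model with sex, intervention and their interaction as terms, which also provides the uncertainty around each estimate.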
However, in situations where the sample size is small per intervention group per sex (e.g., 2), including the interaction term can reduce the power of the statistical analysis of any effect of the main intervention across the females and males. Fundamentally, these experiments have limited power and could be classed as exploratory. In these scenarios, the researchers could fit a model where baseline sex differences are accounted for as a blocking factor with a visual inspection of the data to determine if further research is needed to assess whether the effect of the intervention depended on sex.
For an X-linked recessive gene, a male animal needs only one copy of the mutant gene in each cell to show the phenotype. The comparable female group would therefore be females homozygous for the mutant gene of interest.
For null or recessive alleles of genes on the X chromosome, the appropriate male vs female comparison in terms of loss of gene product would be hemizygous males with homozygous females. However, there are some circumstances where more consideration is needed. These include dominant negative mutations and alterations in pseudoautosomal regions, where there are homologous sequences on the X and Y.
When the interaction between intervention and sex is statistically significant, this means the intervention effect depends on ‘sex’. The researcher will then need to look at the magnitude of this effect versus the overall effect to determine the biological relevance of this. It is also important to remember that ‘sex’ is used as a category, but multiple mechanisms underlie sex differences and therefore it is not sex itself that drives the sex-related variation but one or more of the underlying mechanisms that is associated with the sex category. Future research, to understand sex differences, will therefore need to consider the study objectives and select measurable, sex-related variables which provide plausible mechanisms to understand what is driving the sex-related differences.
Consider the example of 3 per intervention group: When the original N is an odd number, the N has to be increased by 1 to follow the recommendations on sex inclusive research. In a situation of low N, the increase is a high percentage of the total number of animals needed. If this increase in N is felt to be prohibitive, then a detailed justification centred on the financial cost is needed exploring the differences in cost versus the loss of generalisability by following the one sex research strategy.
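The arithmetic behind this example can be sketched as below: an odd N per intervention group is increased by one so that it can be split equally between females and males, and the relative increase shrinks quickly as N grows. The function name is hypothetical.

```python
def balanced_n(original_n):
    """Return (adjusted N, percentage increase) for a balanced split by sex.

    An odd N per intervention group is increased by one; an even N is unchanged.
    """
    adjusted = original_n + (original_n % 2)
    increase_pct = 100 * (adjusted - original_n) / original_n
    return adjusted, increase_pct


if __name__ == "__main__":
    for n in (3, 5, 15):
        adjusted, pct = balanced_n(n)
        print(f"N={n}: adjusted N={adjusted}, increase={pct:.1f}%")
```

For 3 per group the adjusted N is 4, an increase of one third, whereas for 15 per group the same single extra animal is an increase of under 7%.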
A qualitative variable, also called a categorical variable, is a variable that can take on a limited number of possible values. For example, rib shape can be normal or abnormal. Such variables are commonly represented as counts or frequencies and are often presented in contingency tables. Logistic regression is a modelling method for a categorical outcome variable and explores how factors influence the probability of an event occurring. Such modelling can assess the impact of the intervention and, by inclusion of an interaction term, whether the effect depends on sex. InVivoStat (a point and click freeware tool) includes a module that supports the appropriate analysis (Bate & Clark 2021). Alternative strategies rely on statistical coding. For example, Karp NA et al. Genetics. 2017 Feb;205(2):491-501 explored a variety of statistical methods for studying rare abnormality events following a genotype knockout intervention and whether the effect depended on sex. This work shared the R code developed for the study.
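One way to see what the interaction term captures for a binary outcome: in a logistic regression with intervention, sex and their interaction, the fitted model reproduces the observed cell proportions, so the interaction coefficient equals the log of the ratio of the female and male odds ratios. The sketch below computes that quantity directly from counts. The function names and counts are hypothetical, and the calculation assumes no empty cells; it gives a point estimate only and is not a substitute for the modelling tools cited above, particularly for rare events.

```python
import math


def odds_ratio(treated_events, treated_non_events, control_events, control_non_events):
    """Odds ratio for the intervention within one sex (all counts must be > 0)."""
    treated_odds = treated_events / treated_non_events
    control_odds = control_events / control_non_events
    return treated_odds / control_odds


def log_ratio_of_odds_ratios(female_counts, male_counts):
    """Log of (female odds ratio / male odds ratio).

    Each argument is a tuple in the order expected by odds_ratio().
    A value of 0 means the intervention effect is the same in both sexes
    on the odds scale.
    """
    return math.log(odds_ratio(*female_counts) / odds_ratio(*male_counts))


if __name__ == "__main__":
    # Hypothetical counts: (treated abnormal, treated normal, control abnormal, control normal)
    females = (8, 12, 2, 18)
    males = (4, 16, 2, 18)
    print(odds_ratio(*females), odds_ratio(*males))
    print(log_ratio_of_odds_ratios(females, males))
```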
The sex inclusive research philosophy is to incorporate sex as a factor within the analysis. Ideally, this will be via a factorial analysis which will allow an estimate of a generalisable effect from females and males simultaneously and then assess whether there is a large difference in the intervention effect between the males and females. If the intervention effect is very different, the statistical power passes from the main intervention term to the term which assesses whether the intervention effect depends on sex (interaction term) in the analysis (Phillips B et al. PLoS Biol. 2023. 21(6): e3002129). However, in situations where the sample size is low per intervention group per sex (e.g., 2), including the interaction term can reduce the power for the intervention effect being estimated across the females and males. Fundamentally, these low N experiments have limited power and could be classed as exploratory. In these scenarios, the researchers could fit a model where baseline sex differences are accounted for as a blocking factor with a visual inspection of the data to determine if further research is needed to assess whether the intervention effect depended on sex.
Baseline sex differences are common: a high throughput study of wildtype mice found that 60% of the time there is a statistically significant baseline difference between females and males (Karp, N et al. Nat Commun. 2017. 8, 15475). In the context of a disease model, assuming that both sexes enter the disease state following the induction process, a baseline difference between males and females is not surprising. In a factorial analysis, the baseline difference is accounted for by the inclusion of sex in the statistical model (Phillips B et al. PLoS Biol. 2023. 21(6): e3002129).
Baseline sex differences are common: a high throughput study of wildtype mice found that 60% of the time there is a statistically significant baseline difference between females and males (Karp, N et al. Nat Commun. 2017. 8, 15475). In the context of a disease model, assuming that males and females have entered the disease state by the induction process, a baseline difference between females and males is not surprising. Frequently, researchers then power experiments based on a % change that has been selected arbitrarily rather than by considering the biology of interest. This leads to a different effect size of interest depending on whether you are looking at the females or the males. In this case, the differences in the effect size arise from the strategy used to select the effect size. An alternative strategy would be to select an effect size which is a change in the outcome variable that is appropriate for both males and females (e.g. minimum change of interest for both sexes) and that would bring a biologically meaningful change in disease status.
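A small worked example shows how a fixed percentage change turns into different absolute effect sizes when the baselines differ. The baseline values, the 20% figure and the function name are hypothetical and chosen only to make the arithmetic visible.

```python
def absolute_effect_from_percent(baseline_mean, percent_change):
    """Absolute change in the outcome implied by a percentage change from baseline."""
    return baseline_mean * percent_change / 100


if __name__ == "__main__":
    # Hypothetical baseline means in the disease state, in the outcome's own units.
    baselines = {"female": 20.0, "male": 30.0}
    for sex, baseline in baselines.items():
        change = absolute_effect_from_percent(baseline, 20)
        print(f"{sex}: a 20% change equals an absolute change of {change}")
```

With these numbers the same 20% target corresponds to an absolute change of 4 in females and 6 in males, so the two sexes would be powered for different effect sizes. Choosing one absolute change that is meaningful for both sexes avoids this.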
In some instances, a model induction process might be more effective in one sex than the other leading to a higher proportion of animals of one sex entering the disease state than the other. Ideally, if the disease impacts both females and males, the study should represent the patient population with a balanced design such that the conclusions you draw have equal precision for females and males. If, however, this is perceived to be challenging, the researchers should consider the cost and reduction benefit of proceeding with only one sex versus the benefit of using males and females. This justification would then need evaluation by an expert in the field.
In this situation, the justification centres on financial cost. The males are available but will cost more as they have financial value to the industry. Whether this is acceptable depends on a more detailed exploration of the differences in cost versus the loss of generalisability when following this research strategy.
What if the information provided does not include information on how the analysis will be conducted?
In this situation, the experiment includes males and females, but the reviewer is unable to answer question 11 on whether the analysis plan adequately considers sex-related variation in the data. The classification outcome in the SIRF tool would be ‘caution’ as there is a potential analysis risk. A decision then needs to be made by anyone assessing the design (e.g. during review of a grant application) on whether they wish to proceed with this potential risk and give feedback regarding the analysis risk or request additional information.
Garcia-Sifuentes and Maney published a paper in 2021 examining publications in which females and males were included. They found that data analysis errors (such as pooling, disaggregation, comparison of p values) were common and that appropriate factorial analysis was rarely conducted. As a community, we need not only to work on including males and females where possible but also to improve the analysis of data from inclusive experiments to promote rigor and reproducibility in biomedical research.
In this situation, the experiment set includes females and males, but the reviewer is unable to answer question 12 on whether the design has a balanced representation. The classification outcome in the SIRF would be ‘caution’ as there is a potential generalisability/inference risk. A decision then needs to be made by anyone assessing the design (e.g. during review of a grant application) on whether they wish to proceed with this potential risk and give feedback or request additional information.
Having a balanced design is important to ensure that conclusions represent both males and females (i.e. are generalisable), and it enables a sex difference in the intervention effect to be observed. From a statistical perspective, balanced designs have higher power and more reliable test statistics.
A separate framework is being developed for in vitro research, as the questions are different and need engagement with a different set of stakeholders. Some of the questions will be common to both, however, and reviewers have found the SIRF useful when considering an in vitro proposal.