Participant Commentary - Driving Institutional Change for Research Assessment Reform Archives | DORA
https://sfdora.org/category/participant-commentary/

Rewarding robust and reproducible research: Experiences from the pilot phase of implementing research assessment reforms for hiring professors at BIH/Charité (MERIT-PROF)
https://sfdora.org/2019/10/20/rewarding-robust-and-reproducible-research-experiences-from-the-pilot-phase-of-implementing-research-assessment-reforms-for-hiring-professors-at-bih-charite-merit-prof/
Sun, 20 Oct 2019 13:44:48 +0000

By Miriam Kip and Ulrich Dirnagl (Berlin Institute of Health)

In the fall of 2017, the Charité Universitätsmedizin Berlin introduced five additional items to the online application form for professorships. These new items are: (a) a narrative on the candidate’s overall scientific contributions; (b) statements on the impact of the candidate’s self-selected top five publications; (c) a record of the candidate’s open science and reproducible research activities; (d) information on the candidate’s contribution to team science; and (e) academic age.

What sparked the change in the application criteria?

An increasing number of publishers and funding agencies are taking steps to encourage robustness and transparency in order to increase the probability that research results will be reproducible and have an impact on medical care. Academic institutions need to play their part in this international drive to increase value and reduce waste in research. To this end, the Berlin Institute of Health founded the QUEST Center for Transforming Biomedical Research to improve the trustworthiness, usefulness, and ethics of biomedical research through an innovative and comprehensive institutional initiative. Among other activities, this provided the basis for a reform of institutional research assessment processes at its university hospital and medical school, Charité Universitätsmedizin Berlin.

MERIT-PROF is an iterative policy implementation project led by the QUEST Center. The project aims to improve the assessment of research by implementing and evaluating the uptake of the new items into the hiring process for professors.

Facilitating patient-oriented translational research is a core mission of BIH and Charité. It is the institution’s task to provide a framework in which robust and reproducible research is incentivized and rewarded. We aim to transition from assessments of productivity and reputation based on metrics and narrow expert opinion to more transparent and structured assessments of the robustness and reproducibility of the research itself, based on its content.

The organizational context

The QUEST Center is one of the two innovation drivers at the BIH. The BIH is a public institution that obtains 90% of its funding from the federal government. The Charité Universitätsmedizin Berlin is a large academic medical center that employs more than 4,000 researchers and physicians and teaches 7,500 students. The Charité is one of BIH’s two corporate institutions (along with the Max Delbrück Center for Molecular Medicine (MDC)). Each year, Charité hires 25 to 35 professors through external calls; the number of applications varies from one to over 50 per call. The BIH and Charité share a common office (the recruitment office), led by the faculty management, that is responsible for managing and organizing all calls and the hiring commissions for professorships. The MERIT project team serves as a consulting party to both, and one QUEST member participates in hiring commissions and deliberations as an independent party without a vote.

Where are we now and what are the next steps?

Adding the new items is intended to change the whole dynamic of the process, not only for the hiring commission but also for the operational teams (e.g., the recruitment office) who provide the documents and information on which the commission’s assessment is based. Whereas commissions have so far mainly focused on and compared numbers, usually displayed in graphs, they are now asked to evaluate the additional text information for up to 50 applicants as well. To facilitate access to the new items, the MERIT project team, together with the BIH/Charité hiring office, is currently developing a format that brings the new items together within the overall narrative as the starting point of the evaluation process and uses selected metrics (such as the Relative Citation Ratio) as auxiliary information.

The new format produces new workflows, which need to be integrated into the existing, highly formalized procedures.

During the pilot phase, one QUEST member sat in on selected hiring commissions at different stages of the selection process, mainly to observe and gain an understanding of the institutional procedures and common practices. During interview rounds, for example, the QUEST representative suggested that the hiring commissions include candidates’ Open Science track records as an additional item in the selection procedure.

MERIT follows a bottom-up, participatory approach, with the target groups (e.g., the recruitment office and faculty) as the main actors implementing the policy. The stages of the implementation are needs and field assessment, agenda and strategy setting, introduction to the field, and adjustment. The participatory approach provides a framework for trust building and acceptance of the new measures, and it ensures the responsiveness of the policy to the actual needs, requirements, and capacities in the field. Both are essential pillars for effective uptake of the new approach and, ultimately, research assessment reform.

MERIT-PROF is a subproject of BIH MERIT and one of the four projects funded through the Wellcome Trust Translational Partnership with the Charité. The funding period will start in December 2019. Over the next two years, MERIT aims to participate as an (independent) advisor in at least 60% of all hiring commissions. In this process, the assessment formats will be further developed, and the uptake of the new items and their possible effects will be evaluated (process and outcome evaluation).

The Myth of the Three-Legged Stool
https://sfdora.org/2019/10/20/the-myth-of-the-three-legged-stool/
Sun, 20 Oct 2019 12:59:01 +0000

By Lee Ligon (Rensselaer Polytechnic Institute)

New faculty members at most (if not all!) research universities are given the same spiel: “your success here, and eventual promotion and tenure, is built on a three-legged stool, with one leg being research, one being teaching, and one being service.”  And shortly thereafter, well-meaning mentors and department heads will tell the newbie, sotto voce, “do as little service as you can get away with, preferably none, and the bar you have to meet for teaching is just, don’t suck!” And this advice is not wholly inaccurate. At most research universities, advancement is largely tied to research success, and in our current competitive scientific market, one must devote all one’s resources to achieving this success.  You can’t be a successful scientist by giving research only part of your attention. Hence the myth of the three-legged stool. A more accurate description would be a stationary unicycle that you have to balance on! But that doesn’t really roll off the tongue.

I’ve heard a lot of people arguing that we should reform the criteria for promotion and tenure to elevate service and teaching, if not to equal the expectations for research, at least to come closer. And while I don’t disagree that service and teaching are very important, it’s a complicated problem. Universities benefit from the strength of their faculty research programs, in both tangible and intangible ways. On the tangible side, research brings in money, through overhead costs and also research dollars spent on things like fee-for-service core facilities. Grants also pay graduate students, and can augment faculty salaries. While these tangible benefits are very important, the intangible benefits are priceless. Universities thrive on their reputation, and one of the major drivers of an institution’s reputation is the strength of the faculty scholarship. A well-respected research faculty gives an institution academic creds, and every press release and news story about a new finding or new big grant augments that reputation. Rankings rise, attracting more competitive students, more competitive faculty, more alumni donations, et cetera, et cetera, et cetera.

So what is the solution? Should we abandon the myth of the three-legged stool and just be honest up front? Should we try to tweak the formula to make it more balanced? I don’t have an answer, but I know that simplistic solutions won’t work!

How a working group began the process of DORA implementation at Imperial College London
https://sfdora.org/2019/10/10/how-a-working-group-began-the-process-of-dora-implementation-at-imperial-college-london/
Thu, 10 Oct 2019 13:30:16 +0000

By Stephen Curry (Imperial College London and DORA)

As declarations go, the San Francisco Declaration on Research Assessment (DORA) is relatively modest in its aspirations. For universities and research institutes, DORA’s main ask is to enhance research evaluation by focusing on the merits of individual papers and other valuable outputs (such as data, software, and training), thereby avoiding undue reliance on journal impact factors (JIFs).

Even so, it is much easier to sign DORA than to deliver on the commitment that signing entails. And while I would always recommend that universities sign as soon as they are ready to commit, because doing so sends such a positive message to their researchers, they should not put pen to paper without a clear idea of how signing will impact their approach to research assessment, or how they are going to develop any changes with their staff.

The precise path taken to implementing DORA will depend on the history and organisational idiosyncrasies of each institution. Nevertheless, it is likely that the establishment of an internal working group or committee to consider how best to infuse the spirit of the declaration within the institution will be a sensible move in most cases. This is the approach we took at my university, Imperial College London, after we signed DORA in January 2017.

Some of the groundwork for signing had been laid a couple of years earlier during an internal review of Imperial’s use of performance metrics. This review consulted widely – around 23% of academic staff responded with comments to a survey asking for their views on performance evaluation. Its recommendations, which took account of these views and drew on the principles of responsible metrics espoused by the Metric Tide report, established the important concept of ‘performance profiles.’ These profiles were explicitly constructed to recognise and reward academic contributions across a broad range of activities, including research, teaching and mentorship, departmental citizenship, and creativity.

While the development of performance profiles was very much aligned with the spirit of DORA, when Imperial signed the declaration a year or so later there was still work to do to ensure that our research evaluation procedures were compliant. Immediately after signing, we therefore set up a working group composed of experienced academics from each of our four faculties (Natural Sciences, Engineering, Medicine, and Business) and senior staff from Human Resources – nine people in all (see report for details). The group was kept fairly tight to help focus discussions, though it drew on the wide range of views collected as part of the earlier internal review. Its terms of reference were stated fairly simply:

  • To examine the implications of DORA for the College’s recruitment and promotion policies and procedures, and for its submission to the next REF (Research Excellence Framework).
  • To make recommendations on how the principles expressed in DORA can be embedded in the College’s culture and working practices.

In effect the group was charged with thinking through the details of what signing DORA would mean for all the key points when research or researchers are evaluated at the university. These include recruitment and promotion, annual appraisals, internal funding awards, and selection of outputs to be submitted to the UK’s national research assessment exercise, the REF. It met half-a-dozen times over a period of about nine months to complete its work and come up with recommendations, which were submitted to the Vice Provost’s Advisory Board for sign-off at the College level.

The main tasks involved were relatively technical and mostly involved reviewing processes and the phrasing used in internal documentation. Out went phrases such as “contributions to research papers that appear in high-impact journals” to be replaced by “contributions to high quality and impactful research.” The change is subtle but significant – the revised guidance makes it plain that ‘impactful research’ in this context is not a cypher for the JIF; rather it is work “that makes a significant contribution to the field and/or has impact beyond the immediate field of research.”

Other technical changes brought in a clear statement that is to be included in all advertisements for research or academic positions:

“The College is a proud signatory to the San Francisco Declaration on Research Assessment (DORA), which means that in hiring and promotion decisions we will evaluate applicants on the quality of their work, not the impact factor of the journal where it is published. More information is available at https://www.imperial.ac.uk/research-and-innovation/about-imperial-research/research-evaluation/.”

The working group also strongly recommended the adoption of a more narrative-based approach for promotion applications, which asks candidates to identify what they consider to be their four most important research papers and explain why they are so significant.

The report and recommendations of the DORA working group at Imperial, which were adopted in full in late 2017, were transmitted through staff briefings and published on the College’s dedicated Research Evaluation web-page.

Of course, it is one thing to come up with new policies and procedures, quite another to ensure that they take root within the university. The working group also had to address the challenge of shifting people away from embedded habits, a less technical but more difficult task. In part, that was done by placing renewed emphasis on narrative rather than quantitative approaches that focus evaluators’ attention on people’s work, not their venue of publication. But there was also a concerted effort to engage the community in dialogue about DORA. To that end, a key recommendation was to hold a workshop to which the entire research community was invited.

This workshop, titled “Mapping the future of research assessment at Imperial,” was held in September 2018 and brought together researchers at all levels from across the university. To maximise its reach and to send a clear signal that Imperial is serious about reforming its research evaluation practices, the event was advertised internally and externally. The proceedings were also live-streamed and recorded – you can see the video on Imperial’s YouTube channel with presentations by Nick Jennings, Vice-Provost (Research and Enterprise), Stephen Curry, Chair of the DORA Steering Committee, Lizzie Gadd, Research Policy Manager at Loughborough University, and the Panel discussion chaired by Chris Jackson, Professor of Basin Analysis.

Even then, that is not the end of the matter. Like all universities, Imperial is a large and complex organisation, and real change will take time and regular repetition of the message that the institution is committed to DORA. While the workshop engendered good engagement from attendees with the thorny issues of research evaluation, which go well beyond Imperial, it obviously did not capture the attention of everyone who works here.

The work of the DORA working group at Imperial has concluded, but the effort needed to fully embed our commitment to the declaration has to go on. The effort is ably assisted by the strong advocacy of the Vice Provost for Research and the Director of Library Services, among others, along with vigilance, perhaps surprisingly, from the university’s Bibliometrics and Indicators team.

I doubt that everyone at Imperial yet knows that we have signed DORA, or that in every committee room across campus, our practices are entirely compliant with the declaration. But we have made a good start. This is a journey that necessarily involves many stakeholders: academics, universities, funders, publishers, learned societies – and DORA. I look forward to learning more about how institutions can make practical progress on reforming research evaluation at the DORA/HHMI meeting in October.

What do preprints need to be more useful in evaluation?
https://sfdora.org/2019/10/07/what-do-preprints-need-to-be-more-useful-in-evaluation/
Mon, 07 Oct 2019 16:15:52 +0000

By Naomi C. Penfold & Jessica K. Polka (ASAPbio)

When scientists publish a journal article, they are doing more than just disseminating their work: they’re attaching it to a journal title that will, rightly or wrongly, telegraph signals about its quality. Preprints help to unbundle communication from the other functions of journal publishing, and they allow evaluators—funders, hiring committees, and potential mentors—to read a candidate’s most recent work. But there are situations in which this direct evaluation is difficult: for example, when there is too little time to read all papers for all applications, or when evaluators are not experts in an applicant’s immediate subject area. Therefore, it’s unsurprising that shortlisting for evaluation is often still based on journal names and/or impact factors. Without new ways to communicate summary indicators of quality from close expert reviewers to panel assessors, the utility of preprints in evaluation is limited.

How might indicators of quality be developed for preprints to make them more useful in research evaluation? We present three hypothetical processes to envision in 10 years’ time:

Transparency of reporting

To ensure that research can be scrutinized as necessary, adequate information needs to be transparently disclosed. A service could check to see that information (methods, data, code, disclosures) is complete and available (see the illustrative sketch at the end of this scenario).

Examples: CONSORT, STAR Methods, COS Open Science badges

Who might use it: Funders who want to ensure that their grantees are adhering to data sharing requirements (or even simply to find all outputs they support), or other evaluators (journal editors, etc.) could be more comfortable investing time in the evaluation with the knowledge that no information is missing.
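To make this first scenario a little more tangible, here is a minimal, hypothetical sketch of what an automated transparency-of-reporting check over a preprint record might look like. It is added for illustration only: the record structure, field names, and rules are our assumptions, not ASAPbio’s design or any real preprint server’s metadata.

```python
# Hypothetical sketch of a transparency-of-reporting check for a preprint.
# The record structure and rules are invented for illustration; no preprint
# server exposes exactly these fields.
from dataclasses import dataclass


@dataclass
class PreprintRecord:
    title: str
    has_methods_section: bool = False
    data_url: str = ""
    code_url: str = ""
    disclosures: str = ""


def transparency_report(record: PreprintRecord) -> dict:
    """Simple pass/fail checklist for the items named above:
    methods, data, code, and disclosures."""
    return {
        "methods reported": record.has_methods_section,
        "data available": bool(record.data_url),
        "code available": bool(record.code_url),
        "disclosures present": bool(record.disclosures.strip()),
    }


example = PreprintRecord(
    title="Example preprint",
    has_methods_section=True,
    data_url="https://doi.org/10.xxxx/placeholder",  # placeholder identifier
)
for item, ok in transparency_report(example).items():
    print(f"{'PASS' if ok else 'MISSING':>7}  {item}")
```

A funder, for example, could run such a check across the preprints it supports before any human evaluation begins.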

Methodological rigor checks

Services could evaluate preprints to determine their adherence to community best practices. These services could review code, data, or statistics; detect image manipulation; and ensure that experimental techniques are technically sound. Extending this, the newly announced mechanism (TRiP) for posting reviews on bioRxiv preprints could help provide expert reflections on the soundness and quality of the work, in the form of transparent review reports, earlier than (and perhaps in lieu of) any shared by journals.

Examples: Editorial checks offered by Research Square

Who might use it: PIs, when seeking to verify that the work going out of their lab meets their quality standards (especially helpful when collaborations result in interdisciplinary papers containing some work that is outside the lab’s regular scope of expertise), and also when hiring for their lab to see if the applicant’s previous work is rigorous and meets the community’s standards.

Overlay journals

Community interest in a paper could be recognized by its selection into a curated collection.

Examples: biOverlay, Discrete Analysis

Who might use it: General readers outside of a disciplinary niche, including evaluators looking for candidates who generate broad interest work, whether selecting new faculty candidates or funding grantees.

We present these scenarios to prompt exploratory discussion about the potential for preprints to help us move beyond journal-level indicators by nucleating evidence that assists article-level evaluation of science and scientists. Looking ahead, we question:

  • At which point in the publishing process do these scenarios naturally lie? Does it make sense to move them to the preprint stage?
  • Whom do they benefit? Who would pay for them?
  • What are the barriers (community-specific and general) toward establishing these scenarios?
  • What fraction of the community would need to use these models to effect widespread change? For example, assume that 5% of researchers voluntarily applied for methodological rigor badges. How would this action impact evaluators (funders, hiring and promotion committees, and potential mentors) and other researchers?
  • Finally, how might detailed preprint reviews and/or a combination of preprint quality indicators be accurately condensed into some indicator(s) that is concise and accurate enough to be useful when shortlisting? How might evaluators select the indicators that are most important to them?

Competing interests

We are employed by ASAPbio, a non-profit that promotes the productive use of preprints in the life sciences. ASAPbio is collaborating with EMBO on Review Commons, a new journal-independent platform that peer-reviews research papers before submission to a journal and uses bioRxiv’s TRiP mechanism to post review reports on preprints.

A public web-commenting version of this document is available at https://docs.google.com/document/d/1ztp33wl80HLp4Sd-zvcufHWZlkQRBUCY5QcIwnT6_n4/edit?usp=sharing

The impact of research assessment on diversity
https://sfdora.org/2019/10/07/the-impact-of-research-assessment-on-diversity/
Mon, 07 Oct 2019 16:07:27 +0000

By Olivia S. Rissland (University of Colorado School of Medicine)

Unfortunately, science is rife with examples where research assessment diminishes diversity. Hiring, promotion, and grant decisions are made with incomplete information that is also poorly predictive of success—the perfect conditions for bias to emerge. Metrics will naturally be weighted differently for different individuals, but what can be most telling are those individuals who are given a pass for specific metrics (e.g., they are hired for their potential) and those who are not (e.g., they are required to pass a threshold or show a track record). Combined with natural homophily and the make-up of committees making these decisions, bias then becomes a powerful force in opposition to diversity. To put it another way, often search committees don’t hire the person who fits the job, but rather they make the job fit the person.

Institutions and funding agencies can take immediate actions to mitigate the impact of bias. These include (but are not limited to):

  1. Remove institutional nominations for awards. University gatekeepers can bias the pool of applicants that the selection committee sees. Because institutions are often restricted to one or two nominees, it is easy for bias and unfair procedures to lurk in the small sample size.
  2. Track and publish the make-up of the applicant and interview pools as well as hires/awardees. Transparency about the fairness of selection procedures is critical not just for accountability, but also for garnering trust with underrepresented communities. For instance, awards that show an overwhelming male bias can lead to an underrepresentation of women among applicants, because applicants believe (possibly erroneously) that the selection committee is biased; this response, of course, further compounds the issue.
  3. Ensure that interview pools are diverse. The interview stage represents the critical step in giving women and minorities a fair shot at being hired and selected. A recent study showed that when only one interviewee was a woman, she had a <5% chance of being hired. In contrast, when two interviewees were women, there was a 50% chance that a woman would be hired [1].
  4. Make award eligibility criteria less restrictive. Age limits, time from Ph.D. limits, and nationality requirements all restrict the pool of people who can apply for awards, and these effects can then be accentuated by the Matthew effect (i.e., where the rich get richer). For instance, the NIH ESI eligibility window is 10 years from completion of the terminal degree, but an alternative criterion is that used by CIHR (5 years from starting a tenure-track position), which does not negatively affect faculty who had long postdoctoral training.

More broadly, any policy change in research assessment must be considered from the perspective of diversity. While the impact of any individual job search or grant decision can be relatively minor, the Matthew effect can further entrench the effect of bias in research assessment. A critical step is to talk to many different scientists during policy development and take their concerns seriously, especially when specific groups highlight potential pitfalls or unintended consequences. Many policies have the potential to be weaponized against the people who have been historically kept out of science, especially if implementation is poor, and so all policy decisions need to be explicitly considered for their impact on diversity.

[1] https://hbr.org/2016/04/if-theres-only-one-woman-in-your-candidate-pool-theres-statistically-no-chance-shell-be-hired

Role of societies in helping improve research assessment
https://sfdora.org/2019/10/07/role-of-societies-in-helping-improve-research-assessment/
Mon, 07 Oct 2019 16:02:53 +0000

By Brooks Hanson (American Geophysical Union)

Scientific societies have a key role to play in changing and improving the assessment of researchers. Many are key publishers of quality content, and many of their journals are recognized as such without the burden of journal impact factors. Societies also play key roles in shaping the scientific culture of disciplines, including around ethics, authorship, and outreach, through discussions at meetings and in career workshops. They are also primary conferrers of awards and honors, including early career awards, best paper awards, and fellowships that recognize outstanding scholarship. Many volunteer and community service efforts run through societies—as editors, leaders, and reviewers. Societies could therefore play an important role in helping shape and change the culture of research assessment by using these leverage points in productive ways—by expanding expectations and criteria beyond high-quality research alone to include and foster open science; community service; team science; data, methods, and software development; education; career development; and more. For example, fellowship in most societies is currently based solely on outstanding research. Many of the criteria do not also include or expect leadership around these other topics; instead, these are handled as additional, one-off awards. Consider the impact if multiple societies collectively also asked for leadership, or at least recognition, in these other areas. Societies are also engaging in efforts around equity, diversity, and inclusion, which are critically needed to level initial playing fields and address bias in career advancement. Collective action may be needed to overcome inherent inertia and inhibitions in improving long-standing guidelines and criteria.

Some discussions that AGU has been having at meetings around this topic are here:

Make it possible, make it rewarded, make it normal
https://sfdora.org/2019/10/02/make-it-possible-make-it-rewarded-make-it-normal-a-roadmap-for-culture-change-in-academic-research-practice/
Wed, 02 Oct 2019 18:22:58 +0000

By David Mellor (Center for Open Science)

The Problem

There is widespread recognition that the research culture in academia requires reform. Hypercompetitive vying for grant funding, prestigious publications, and job opportunities fosters a toxic environment. Furthermore, it distracts from the core value of the scientific community, which is a principled search for increasingly accurate explanations of how the world works. These values were espoused by Robert Merton in The Normative Structure of Science (1942) as four specific principles:

  • Communal ownership of scientific goods
  • Universal evaluation of evidence on its own merit
  • Disinterested search for truth
  • Organized skepticism that considers all new evidence

The opposing values to these can be described as secrecy of evidence, evaluation of evidence based on reputation or trust, a self-interested motivation for conducting work, and dogmatic adherence to existing theories. The counternorms can be thought of roughly as an unprincipled effort to advance one’s reputation or the quantity of output over its quality.

The cultural barriers to more widespread practice of these norms of scientific conduct are well demonstrated in a study conducted by Anderson and colleagues in which they asked researchers three questions:

  1. To what degree should scientists follow these norms?
  2. To what degree do you follow these norms?
  3. To what degree does a typical scientist follow these norms?

They found near-universal support for the first question, strong support with some ambivalence for the second, and, for the third, a strong belief that typical scientists do not follow the norms.

Think about the implication of these findings. How likely would someone be to act selflessly (in accordance with scientific norms) when they perceive that every other scientist will act selfishly? One could not design an experiment in game theory to more quickly lead to universal adoption of selfish actions.
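To illustrate the game-theoretic intuition in the preceding paragraph, here is a small sketch added for illustration only (it is not part of the original post). The payoff numbers are invented and arranged as a coordination game: following the norms only pays off if you believe most colleagues do the same.

```python
# Illustrative only: invented payoffs for a coordination-game reading of the
# findings described above. "norms" = transparent, communal behaviour;
# "selfish" = the counternorms (secrecy, quantity over quality).
PAYOFFS = {
    ("norms", "norms"): 4.0,      # everyone follows the norms: best for all
    ("norms", "selfish"): 0.0,    # I follow the norms while others do not
    ("selfish", "norms"): 3.0,    # I free-ride on an otherwise open community
    ("selfish", "selfish"): 2.0,  # everyone acts selfishly
}


def expected_payoff(my_choice: str, belief_others_follow_norms: float) -> float:
    """Expected payoff given my choice and the share of colleagues I *believe*
    follow the norms (this belief is what the survey described above probes)."""
    p = belief_others_follow_norms
    return p * PAYOFFS[(my_choice, "norms")] + (1 - p) * PAYOFFS[(my_choice, "selfish")]


for belief in (0.9, 0.5, 0.1):
    follow = expected_payoff("norms", belief)
    defect = expected_payoff("selfish", belief)
    better = "follow norms" if follow > defect else "act selfishly"
    print(f"I think {belief:.0%} of colleagues follow the norms -> best response: {better}")
# With these numbers, believing that few others follow the norms makes acting
# selfishly the rational choice, which is exactly the dynamic described in the text.
```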

This is the situation in which we find ourselves, and it demonstrates the barriers to reforming scientific culture. It also explains why efforts to improve scientific culture have so far failed—mere encouragement to reform practice, to evaluate evidence by its rigor instead of where it is published, or to expose one’s ideas to more transparency and scrutiny is frankly naive in the face of such deep impressions that doing so will be personally harmful to one’s career.

Getting to Better Culture

But in the dismal state in which we find ourselves, there is an effective strategy for reforming the conduct of basic, academic research. The end goal is a scientific ecosystem that practices the norms of scientific conduct: transparent and rigorous research.

We can take inspiration from the process by which cultural norms change in other parts of our society. In Diffusion of Innovations, Everett Rogers describes how the most innovative early adopters of new practices and technologies are a small minority of society that pave the way for early and late majorities of people to take them up. And of course there are always laggards who prefer to stick with previous versions for many years. The effect of this distribution of adoption is an initially gradual uptake of new techniques, which quickly accelerates as the phenomenon is picked up by the majority of people. This trend plateaus as the technique reaches saturation within the community.
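The adoption dynamic Rogers describes is often approximated by a logistic ("S") curve. The short sketch below is our addition, with arbitrary parameter values rather than figures from Rogers; it simply shows the slow start among innovators, the rapid uptake by the majorities, and the plateau at saturation.

```python
import math


def cumulative_adoption(t: float, steepness: float = 0.8, tipping_point: float = 10.0) -> float:
    """Logistic approximation of the share of a community (0..1) that has
    adopted a practice by time t. Parameter values are arbitrary illustrations."""
    return 1.0 / (1.0 + math.exp(-steepness * (t - tipping_point)))


for year in range(0, 21, 2):
    share = cumulative_adoption(year)
    print(f"year {year:2d}  {share:6.1%}  {'#' * int(share * 40)}")
# Early years: only innovators and early adopters (slow growth); around the
# tipping point the early and late majorities join (rapid growth); afterwards
# adoption plateaus as the practice saturates the community, leaving laggards.
```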

Rogers’ ideas can be transposed onto a strategy for culture change that is specifically relevant to a reform movement within the scientific community. Innovators will pick up a new, better technique as soon as it becomes possible. They represent the cutting edge of adoption but only a tiny minority of the community, which is mostly not yet ready to modify its workflows. As the new techniques become easier to use, early adopters will pick them up and integrate them into their workflows, seeing the benefits of the practice to their work. The great swell of adoption only begins as norms shift and these actions become visible enough for the majority of the community to identify them and begin to take them up. Because recognition for accomplishment is the basic currency of scientific rewards (see, for example, the role of citations in academic stature), more adopters will take up a practice as it becomes more recognized and therefore rewarded. The community is then able to quickly rally behind updated policies that codify such practices as being required.

A Roadmap for Culture Change

This strategy lends itself to a series of specific initiatives that are useful for making this new culture a reality. Think of these initiatives as a roadmap for changing culture in the scientific community. Or, instead of treating the scientific community as a monolithic entity, take it as it is: a group of silos defined by disciplines—silos that are of course permeable as knowledge production increasingly relies on interdisciplinary efforts, but silos that are nonetheless rather resilient and reinforced by departments and academic societies.

The first necessary step is to make transparent and reproducible research possible with infrastructure to support it. Scripted statistical analyses that make perfectly clear how an analysis was conducted, online tools such as GitHub for collaborating on and sharing this code, and hundreds of repositories for storing the resulting outputs prove that reproducible research is possible. Depending on the exact procedure, it can take time to develop the skills to take these up, but practitioners know that this time is well spent given the overall efficiency gained by a reproducible workflow when it comes time to build upon recent work or onboard new members of the lab.
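As a deliberately minimal illustration of what such a scripted analysis can look like, the sketch below fixes a random seed, records the software environment, and writes both the raw values and a summary to files that can be shared and version-controlled. The data, file names, and numbers are placeholders added here, not material from the original post.

```python
# Minimal sketch of a scripted, reproducible analysis step. The data, seed,
# and file names are placeholders, not details from the original post.
import csv
import json
import platform
import random
import statistics

random.seed(20191002)  # fixed seed: rerunning the script gives identical numbers

# Simulated measurements stand in for data pulled from a shared repository.
measurements = [random.gauss(10.0, 2.0) for _ in range(100)]

summary = {
    "n": len(measurements),
    "mean": round(statistics.mean(measurements), 3),
    "stdev": round(statistics.stdev(measurements), 3),
    "python_version": platform.python_version(),  # record the environment used
}

# Write both the raw values and the summary so others can re-derive every number.
with open("measurements.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["value"])
    writer.writerows([[round(m, 4)] for m in measurements])

with open("summary.json", "w") as f:
    json.dump(summary, f, indent=2)

print(summary)
```

Committed alongside a manuscript (for example, in a GitHub repository), a script like this lets anyone re-run the analysis and obtain the same numbers.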

Improving the user experience for researchers and creating workflow spaces that are themselves repositories, as opposed to using end-of-life repositories that require curation and data cleaning prior to sharing, can make transparent and reproducible methods easier to implement and allow participation by more early adopters. An example of this is the OSF Registry, which creates a project workspace where data and materials can be organized and, when the project is completed, shared as a persistent item in the repository.

Shifting norms allow researchers to see which practices are widespread, to gradually shift expectations about what is standard, and to learn from what colleagues are producing. These examples become the basis for one’s own work, as we see and cite previous examples to model our own next steps. Open science practices such as sharing data, sharing research materials, and preregistering studies are now widely adopted, and that activity is visible to even the most casual observer of the field thanks to the table of contents of one of its leading journals, Psychological Science.

Another example of shifting norms is the establishment of organizations in disciplines or geographic communities around these practices. The Society for the Improvement of Psychological Science and the UK Reproducibility Network are two such communities doing precisely that. They provide a venue for advocacy, for learning new methods, and for seeing the practices that the early adopters can share with the growing majority of practicing researchers.

Specific rewards can be accumulated by researchers for the activities that we’d like to see established. This will ensure that the ideal norms of scientific conduct are the precise activities that are rewarded through publication, grants, and hiring decisions. All three of these rewards can be directly integrated into existing academic cultures. Registered Reports is a publishing format in which the importance and rigor of the research question is the primary focus of scrutiny applied during the peer review process, as opposed to the surprisingness of the results, as is all too often the case.

The model can be applied to funding decisions as well, as is being done by progressive research funders such as the Children’s Tumor Foundation, the Flu Lab, and Cancer Research UK.

Universities have perhaps the most important role to play in furthering these conversations about rewarding ideal scientific norms. They can structure the job pathways so that transparent and rigorous methods are front and center for any evaluation process. Dozens of universities are doing this with their programs and job announcements. The essential element for these institutions is to ensure there is a clear signal that these activities are valued, and to back that up through decisions made based on adherence to these principles.

Finally, activities can be more easily required once community expectations make policy change widely desired. The Transparency and Openness Promotion (TOP) Guidelines provide a specific set of practices and a roadmap of expectations for implementation. They include standards that journals or funders can adopt to require disclosure of a given practice, to mandate the practice, or to verify that it has been completed and is reproducible.

If not the Impact Factor, then how should evidence be evaluated?

There remains a gap in how scientific contributions should be recognized and rewarded. The most common metric remains the Impact Factor of the journal in which a finding was published. As DORA lays out eloquently, this is an inappropriate way to evaluate research. However, the Impact Factor remains an alluring metric because it is simple and intuitively appealing. There is, however, a better, more honest signal of research credibility—and that is transparency. The reason that transparency is a long-term solution for evaluating research output is that it is necessary for evaluating the quality and rigor of the underlying evidence. And while transparency is not sufficient for evaluating rigor—it takes exceptional experience to fairly judge the details of a program, and even sloppily completed science can be reported with complete transparency—it is the only way in which underlying credibility can be feasibly evaluated by peers.

Finally, transparency is a universally applicable ideal to strive for in any empirical research study. While no single metric or practice can be adopted by all research methods or disciplines, it is always possible to ask, “How can the underlying evidence of this finding be more transparently provided?” If open data is not possible because of the sensitive nature of the work, protected-access repositories can share metadata about the study that describes how the data are preserved. If the work could not be preregistered because it was purely exploratory or could not have been planned, materials and code can be provided that clearly document the process by which data were collected.

The above roadmap for actions gives us an ideal way to expand transparent and reproducible research practices to a wider community of scholars. Research assessment is an essential component of the academic community, and transparency is the best viable path toward evaluating rigor and quality.

Include Untenured Faculty in Departmental Tenure Decisions
https://sfdora.org/2019/10/02/include-untenured-faculty-in-departmental-tenure-decisions/
Wed, 02 Oct 2019 17:29:39 +0000

By Needhi Bhalla (University of California, Santa Cruz)

Once a scientist begins their position as a tenure-track faculty member, acquiring tenure becomes a primary goal. This crucial professional benchmark is a formal assessment of a scientist’s standing in their department, institution, and scientific field. Achieving tenure is associated with relative job security and can be accompanied by increased social capital and institutional power, all of which may be particularly relevant for minoritized [1] scientists. During the tenure process, members of a department, institution, and scientific field evaluate a scientist’s ability to effectively manage multiple academic roles, including obtaining and maintaining funding, publishing papers, teaching, and providing service to the department, institution, and field. Because of the wide array of these skills, it can often seem unclear how each of these individual roles is weighted during the tenure process. Further, there are often hidden, or unwritten, rules about what types of funding, publications, teaching, and service are valued by a scientist’s department, institution, and field during the tenure process. The fact that untenured faculty in most departments are excluded from discussing and/or voting on the tenure files of their more senior colleagues exacerbates the perception that tenure can be a moving target.

This lack of transparency, and its potential effect on consistency, makes tenure an unnecessary exercise in gatekeeping. Having committed to the hiring and development of early career scientists, it is in the best interest of departments and institutions to make the tenure process as transparent and consistent as possible to ensure success. One mechanism to accomplish this is to allow untenured faculty to discuss and vote on the tenure files of more senior faculty members. Observing and participating in these discussions lets untenured faculty see directly how these decisions are made. This insight into the criteria by which they will eventually be evaluated demystifies the tenure process and clearly highlights what specifically they should focus on to ensure a successful tenure file. Having untenured faculty observe and participate in the tenure process can also provide accountability and trust, so that the process remains equitable and consistent and is less likely to be derailed by either overly powerful advocates or detractors.

This policy is currently in place in the Molecular, Cell and Developmental Biology Department at the University of California, Santa Cruz. My personal experience in both participating in these decisions and receiving tenure in this department is that being involved in the tenure decisions of my senior colleagues was simultaneously comforting and eye-opening, providing much-needed external input to my internal conversations about whether I was on track for tenure. This policy produces a unique combination of transparency, consistency, and accountability to a process that all too often can seem opaque and mysterious. Thus, it should be more widely practiced, particularly when the professional development and academic ascent of minoritized faculty is a priority.

[1] https://www.theodysseyonline.com/minority-vs-minoritize

Opportunities for review, promotion, and tenure reform
https://sfdora.org/2019/09/30/opportunities-for-review-promotion-and-tenure-reform/
Mon, 30 Sep 2019 14:37:20 +0000

By Erin C. McKiernan (Universidad Nacional Autónoma de México), Juan Pablo Alperin (Simon Fraser University), Meredith T. Niles (University of Vermont), and Lesley A. Schimanski (Simon Fraser University)

Faculty often cite concerns about promotion and tenure evaluations as important factors limiting their adoption of open access, open data, and other open scholarship practices. We began the review, promotion, and tenure (RPT) project in 2016 in an effort to better understand how faculty are being evaluated and where there might be opportunities for reform. We collected over 800 documents governing RPT processes from a representative sample of 129 universities in the U.S. and Canada. Here is what we found:

A lack of clarity on how to recognize the public aspects of faculty work

We were interested in analyzing how different public dimensions of faculty work (e.g., the public good, sharing of research, and outreach activities) are evaluated, and we searched the RPT documents for terms like ‘public’ and ‘community’. At first, the results were encouraging: 87% of institutions mentioned ‘community’, 75% mentioned ‘public’, and 64% mentioned ‘community/public engagement’. However, the context surrounding these mentions revealed a different picture. The terms ‘community’ and ‘public’ occurred most often in the context of academic service—activities that continue to be undervalued relative to teaching and research. There were few mentions discussing impact outside university walls, and no explicit incentives or support structure for rewarding the public aspects of faculty work. Furthermore, when examining an activity that could have obvious public benefit—open access publishing—it was either absent or misconstrued. Only 5% of institutions mentioned ‘open access’, including mentions that equated it with ‘predatory open access’ or suggested open access publications were of low quality and lacked peer review.

Fig. 1: Relative frequency of words surrounding the terms ‘community’ (left) and ‘public’ (right). Images from Alperin et al., 2019. eLife 2019;8:e42254 DOI: 10.7554/eLife.42254
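For readers curious about the mechanics behind this kind of analysis, the sketch below shows a generic keyword-in-context extraction. It is our illustration with a hypothetical corpus directory, not the authors’ actual code or data: it scans plain-text RPT documents for a term and prints the surrounding words so that each mention can be categorized.

```python
# Illustrative keyword-in-context (KWIC) extraction; not the study's own code.
import re
from pathlib import Path


def keyword_in_context(text: str, term: str, window: int = 8):
    """Yield (before, match, after) word windows around each occurrence of term."""
    words = text.split()
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    for i, word in enumerate(words):
        if pattern.search(word):
            before = " ".join(words[max(0, i - window):i])
            after = " ".join(words[i + 1:i + 1 + window])
            yield before, word, after


# Hypothetical layout: one plain-text RPT document per file in rpt_documents/.
for doc in sorted(Path("rpt_documents").glob("*.txt")):
    text = doc.read_text(encoding="utf-8", errors="ignore")
    for before, hit, after in keyword_in_context(text, "community"):
        print(f"{doc.name}: ...{before} [{hit}] {after}...")
```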

An emphasis on metrics 

The incentive structures we found in the RPT documents placed importance on metrics, such as citation counts, Journal Impact Factor (JIF), and journal acceptance/rejection rates. This is especially true for research-intensive (R-type) universities, with 75% of these institutions mentioning citation metrics in their RPT documents. Our analysis of the use of the JIF in particular showed that 40% of R-type institutions mention the JIF or closely related terms in their RPT documents, and that mentions are overwhelmingly supportive of the metric’s use. Of the institutions that mentioned the JIF, 87% supported its use, 13% expressed caution, and none heavily criticized or prohibited its use. Furthermore, over 60% of institutions that mentioned the JIF associated it with quality, despite evidence the metric is a poor measure of quality.

A focus on traditional scholarly outputs

Over 90% of institutions in our sample emphasized the importance of traditional outputs, such as journal articles, books, and conference proceedings. In contrast, far fewer institutions explicitly mentioned non-traditional outputs such as data (6-16%) and software (36-65%), or newer forms of scholarly communication like preprints (5-23%) and blogs, which might be particularly important for communicating with the public. Even when these products were mentioned, it was often made clear that they would count less than traditional forms of scholarship, providing a disincentive for faculty to invest time in these activities.

Fig. 2: Percentage of institutions of each type that mention at least one output in each category. Table from Alperin, J.P., Schimanski, L., La, M., Niles, M. & McKiernan, E. (in press). The value of data and other non-traditional scholarly outputs in academic review, promotion, and tenure. In Andrea Berez-Kroeker, Bradley McDonnell, Eve Koller, and Lauren Collister (Eds.) Open Handbook of Linguistic Data Management. MIT Press.

Faculty value readership, but think their peers value prestige

Analyzing RPT documents has given us a wealth of information, but it cannot tell us how faculty interpret the written criteria or how they view the RPT process. So, we surveyed faculty at 55 institutions in the U.S. and Canada, focusing on their publishing decisions and the relationship to RPT. Interestingly, we found a disconnect between what faculty say they value and what they think their peers value. Faculty reported that the most important factor for them in making publishing decisions was the readership of the journal. At the same time, they thought their peers were more concerned with journal prestige and metrics like JIF than they were. Faculty reported that the number of publications and journal name recognition were the most valued factors in RPT. However, older and tenured faculty (who are more likely to sit on RPT committees) placed less weight on factors like prestige and metrics.

Opportunities for evaluation reform

There are several take-home messages from this project we think are important when considering how to improve academic evaluations:

  1. More supportive language: We were discouraged by the low percentage of institutions mentioning open access and the negative nature of these mentions. There is a need for more supportive language in RPT documents surrounding the public dimensions of faculty work, like open access publishing, public outreach, and new forms of scholarly communication. Faculty should not be left guessing how these activities are viewed by their institution or how they will be evaluated.
  2. Words are not enough: Institutions in our sample mentioned ‘public’ or ‘community’, but had no clear incentives for rewarding public dimensions of faculty work. Simply inserting language into RPT documents without the supporting incentive structure will not be enough to change how faculty are evaluated. RPT committees should think about how these activities will be judged and measured, and make those assessments explicit.
  3. Valuing non-traditional outputs: Non-traditional outputs like data and software are often not mentioned, or relegated to lower status versus traditional outputs, providing clear disincentives. RPT committees should update evaluation guidelines to explicitly value a larger range of scholarly products across the disciplines. While some ranking of scholarly outputs is to be expected, some products clearly merit an elevation in status. For example, publicly available datasets and software may have equal if not higher value than journal articles, especially if wide community use of these products can be demonstrated.
  4. Deemphasizing traditional metrics: Traditional metrics like citation counts and JIF give just a glimpse (and sometimes a very biased one) into use by only a small, academic community. If we want committees to consider diverse products and impact outside university walls, we have to expand what metrics we consider. And we should realize that relying on metrics alone will provide an incomplete picture. RPT committees should allow faculty more opportunities to give written descriptions of their scholarly impact and move away from dependence on raw numbers.
  5. Discussing our values: Our finding that respondents generally perceive themselves in a more favorable light than their peers (e.g., as less driven by prestige indicators like the JIF) elicits multiple self-bias concepts prevalent in social psychology, including illusory superiority. This suggests that there could be benefits in encouraging honest conversations about what faculty presently value when communicating academic research. Fostering conversations and other activities that allow faculty to make their values known may be critical to faculty making publication decisions that are consistent with their own values. Open dialogues such as these may also prompt reevaluation of the approach taken by RPT policy and guidelines in evaluating scholarly works.

Overall, our findings suggest a mismatch between the language in RPT policy documents and what faculty value when publishing their research outputs. This is compounded by a parallel mismatch between the philosophical ideals communicated in RPT policy and other institutional documents (e.g., promoting the public nature of university scholarship) and the explicit valuation of individual forms of scholarly work (i.e., little credit is actually given for sharing one's work through venues easily accessible to the public). Both institutions and individual faculty espouse admirable values; it is time for RPT policy to reflect them accurately.

Leveraging values to improve and align research assessment policies and practices https://sfdora.org/2019/09/23/leveraging-values-to-improve-and-align-research-assessment-policies-and-practices/ Mon, 23 Sep 2019 17:13:06 +0000

By Anna Hatch and Stephen Curry (DORA)

Introduction

Universities cannot achieve their missions and visions if their stated values are out of line with research assessment policies and practices. Although most university mission statements specify research, teaching, and public service as their central commitments, contributions to research are often valued at the expense of teaching and public service. How serious is this misalignment and what can be done about it?

Mission and vision statements often also convey other valued aspects of scholarly work. For example, Cornell University explicitly mentions its collaborative culture.

“Cornell aspires to be the exemplary comprehensive research university for the 21st century. Faculty, staff and students thrive at Cornell because of its unparalleled combination of quality and breadth; its open, collaborative and innovative culture; its founding commitment to diversity and inclusion; its vibrant rural and urban campuses; and its land-grant legacy of public engagement.”

Cornell vision statement

The University of California, Los Angeles places an emphasis on open access, respect, and inclusion.

“UCLA’s primary purpose as a public research university is the creation, dissemination, preservation and application of knowledge for the betterment of our global society. To fulfill this mission, UCLA is committed to academic freedom in its fullest terms: We value open access to information, free and lively debate conducted with mutual respect for individuals, and freedom from intolerance. In all of our pursuits, we strive at once for excellence and diversity, recognizing that openness and inclusion produce true quality.”

Excerpt from the University of California, Los Angeles mission statement

The public dimensions of scholarly work directly relate to the public missions of many universities, but they are still commonly undervalued in review, promotion, and tenure policies. At DORA we believe that the clearest route for research institutes to enhance their research assessment policies and practices is to build them on the solid foundation of their institutional values.

Closing the gap

This is easier said than done, but there are excellent examples of practical steps that can be taken. For instance, working groups can help institutions co-create how their values are to be embodied in research assessment policies and practices. In particular, by bringing together a diverse and representative group of university members, standards and processes for evaluation can be developed that have buy-in from the staff who are most likely to be called upon to conduct assessments, for example in recruitment or promotion processes.

Working groups can operate in different ways. For example, the University Medical Center (UMC) Utrecht hosted a series of meetings to collect input on research assessment from the academic community on campus, and developed its policies based on the feedback it received. The Universitat Oberta de Catalunya (UOC) used a different strategy, assembling a formal task force to consider how to improve its research evaluation processes prior to signing DORA. This led to a multi-year action plan for the university to implement DORA, which includes recognizing the value of a broad set of outputs and outcomes from its research.

Building trust

Journal-based evaluation is so deeply rooted in academic culture that new policies alone are not guaranteed to bring about change in how researchers are assessed. Key to that change is building trust that new, values-based policies will be enacted.

Community engagement is essential for building that trust – and for aligning policies and practice. While workshops are an excellent tool for involving staff, they cannot involve everyone, so communication with the wider academic community at the institution is vital. UMC Utrecht made sure that reports from its workshop discussions, along with interviews with participants, were published on the university's internal website to engage the whole academic community on campus.

There are other ways to engage academics in research assessment reform. UOC is currently building support for its policy changes through presentations and training sessions on campus. Imperial College London hosted a half-day workshop in 2018 to discuss how the landscape of research assessment is changing.

Transparency is another key element for building trust in research assessment policies and practices. While there are many ways to increase transparency, rubrics (i.e., explicit assessment criteria) offer a versatile option. They can be shared with applicants at the beginning of the process, or used to provide individualized feedback once the assessment is complete. The University of California, Berkeley created a Rubric to Assess Candidate Contributions to Diversity, Equity, and Inclusion that departments can use. To increase consistency in its teaching evaluations, the Center for Teaching Excellence at the University of Kansas developed criteria spanning seven dimensions of teaching practice.

Departments and institutions may also choose to openly share information about the integrity of the assessment process with applicants. For example, are applicants de-identified at any of the steps? Do applicants have faculty advocates (as happens in the Cell Biology Department at the University of Texas Southwestern Medical Center)? Or is an independent observer present during the decision-making discussions (a practice followed at Berlin’s Charité University hospital)?  The more information that is shared about evaluation processes, the greater their credibility among those who are evaluated.

Getting started

Approaching research assessment reform can be daunting – a mountain to climb. There are many areas where we could and should do better, but where to start? We believe that leveraging institutional values to drive change is the most natural route forward, harnessing principles that are shared by most scholars and researchers.  And we hope that the examples given above show some of the steps that others have already taken. All it takes to climb a mountain is to proceed one step at a time.
