
Posts by Overly.Honest.Editor

Even better still, just asking exactly where the data is from, who takes responsibility for its integrity, and how it was validated, before you use it to validate something else, is a good start #ScientificPublishing #ResearchIntegrity #AImess 15/15

14 hours ago 3 1 0 0
TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods | EQUATOR Network

But if you're an editor and handle clinical AI, becoming friends with the TRIPOD+AI checklist may not be the worst idea 14/

www.equator-network.org/reporting-gu...

14 hours ago 0 1 1 0

What can we do in the meantime? Wholesale, probably very little, because the people who need to pay attention (the ML crowd) are (a) not paying attention and (b) too invested in proving ML works to realise it's all GIGO. 13/

14 hours ago 1 0 1 0

Probably started twitching somewhere around the 20th green chemistry paper citing his work.

So I think it is a fair question to ask: what will the database do about it, and how (because the problem is not trivial). 12/

14 hours ago 0 0 1 0

These are not small numbers. An informed guess suggests that most datasets are used little or not at all, so a bump of 100 academic citations should be very noticeable, and something I'd immediately worry (rather than boast) about. Just like I suspect Vickers (of the Vickers 2017 paper)... 11/

14 hours ago 0 0 1 0

The Piosenka autism diagnosis dataset probably polluted another 100 or so papers directly, though given all the clones of this dataset, it is hard to say. 10/

14 hours ago 0 0 1 0

Each bad dataset can have a substantial impact on the literature at best, and on people's lives and livelihoods at worst. The preprint cited in the Nature piece identified 120+ papers using either of those problematic datasets 9/

www.medrxiv.org/content/10.6...

14 hours ago 0 0 1 0

What I find interesting is that it's not clear to me if Kaggle realises this yet. It boasts 700k high-quality datasets. Apart from, you know, the fake ones, the invalid ones, the unethically sourced ones. (I mean, they should know now, since the preprint folks told them.) 8/

14 hours ago 0 0 1 0

This, in essence, is a matryoshka of disasters. Science has a bad ML problem. The ML community has a major Kaggle problem (and apparently doesn't realise it yet), and Kaggle? Well, Kaggle has a data problem. Which sucks if hosting data is your raison d'être. 7/

14 hours ago 1 0 1 0

Which I'm guessing some of these tools may have been doing OK at - it became some sort of autism voodoo, essentially new-age phrenology. 40+ papers, and not one of the authors stopped to ask if there is any scientific basis to the core premise that underlies their tool. 6/

14 hours ago 0 0 1 0

Once used, twice used, these quickly become ML-community benchmarks for specific algos, without anyone considering whether these algos even make sense. The autism diagnosis case was a perfect example of that, because once taken out of the image classification realm... 5/

14 hours ago 0 0 1 0

Kaggle is filled with datasets that are poorly described, of unclear provenance, and dubious validity. There is also a lot of legit data from elsewhere which was "cleaned" by enthusiasts and posted to Kaggle, stripping it of original metadata and with no clear description of changes. 4/

14 hours ago 0 0 1 0

But if you think that one autism, one stroke and one diabetes dataset is all there is, think again. More across other disciplines is percolating towards, I'm guessing, more retractions (Editor Cat-mondo is aware of at least one EV-related widely used dataset like this) 3/

14 hours ago 1 0 1 0
Springer Nature retracts, removes nearly 40 publications
The dataset contains images of children’s faces downloaded from websites about autism, which sparked concerns at Springer Nature about consent and reliability.

This is a second case, in quick succession, of a bad, possibly unethical, most likely invalid, potentially nonexistent Kaggle dataset(s). The first one earned some 40 #retractions already, and more are surely coming 2/

www.thetransmitter.org/retraction/e...

14 hours ago 0 0 1 0
Dozens of AI disease-prediction models were trained on dubious data
The models are designed to predict someone’s risk of diabetes or stroke. A few might already have been used on patients.

There is so much to unpack in this recent reporting that I'm not sure where to even start #ScientificPublishing #ResearchIntegrity 1/

www.nature.com/articles/d41...

14 hours ago 2 2 1 0
Dozens of AI disease-prediction models were trained on dubious data
The models are designed to predict someone’s risk of diabetes or stroke. A few might already have been used on patients.

This is why I keep banging on about the importance of trained humans in the loop for data analyses. You *cannot* just take a dataset and run, especially not unattributed data from Kaggle. You *must* do the due diligence to look for artifacts, outliers, and edge cases with your actual human eyes.

1 day ago 62 20 1 0

Quite. But what gets me every time with those is that there is always some author who says "but it doesn't matter that data may be all wrong or even not real, because we only used it to test the tool, not for clinical predictions".

And I just can't help but feel pretty gaslit at this point...

1 day ago 2 0 0 0

Though I've feelings about the whole "journal that is not selective publishes most of what it receives" claim, because current data suggests that even for some of the most inclusive journals, acceptance rates are within 30-40%. Much higher than your glam journals, but a far cry from everything.

1 week ago 0 0 1 0
‘Why is publishing so expensive?’
For many of us who work in scientific publishing, the title of this Editorial is a question we hear all the time when we're out talking to academics. And it's a perfectly reasonable one. After all, researchers produce the content, for free, and act as peer reviewers to assess whether it's ‘worthy’ of publication – also for free (in most cases; our sister journal Biology Open being a notable exception: https://journals.biologists.com/bio/pages/fast-fair). ‘All’ the journal does is put the final version online, so how can that possibly cost US$5400 [the current Article Processing Charge (APC) for Journal of Experimental Biology (JEB)]? In this Editorial, we (the Editor-in-Chief and Managing Editor of JEB and the Publishing Director of The Company of Biologists, our not-for-profit publisher) provide details and context on JEB's finances, and hope to dispel some of the myths around the economics of publishing.
For those of you who want the take-home message up-front, the two key points we want to make here are firstly that quality editorial assessment and publishing involves a lot of people (meaning our largest costs are staff salaries and, to a lesser extent, Academic Editor stipends), and secondly that publishers have had to invest heavily in technologies to support professional and trustworthy online publishing – there's a lot more to it than just posting an article on a website. But, before we get into the details, here's how a typical conversation with a researcher might go when talk turns to money:
You can also publish in JEB completely for free; it's just that your article then won't be available to non-subscribers until it is a year old. More broadly, though, while preprint servers are fantastic, they are not journals.
Preprints don't incur the same costs because they don't go through the kind of rigorous screening that happens at a journal such as JEB – both through peer review and through the technical and integrity checks we perform, which help to ensure that readers can and will trust your work. Not only that, but by publishing in a community-centric journal such as JEB, you can maximise the chances of your paper being seen and used by your colleagues. Our post-acceptance processes (outlined in more detail below) will ensure it is in the best possible shape to be read by the community, it will get indexed in Web of Science and other services, it will land in people's inboxes and on their social media feeds and, in some cases, it will get featured through a highlight, interview or press release. As discussed below, all these things take time and money and are key services that publishers can provide but preprint servers can't.
Professional online publishing isn't as cheap as it might initially seem. In the context of all our costs, the actual process of printing (which JEB only stopped at the end of 2024) was a very minor outgoing. And given that over 20% of full-text views to JEB articles are to the PDF version rather than the HTML, it's clear that many readers still want to read in PDF format, meaning we still need to typeset each article. Not only that, but we, in collaboration with our online hosting partners Silverchair, need to make sure that articles are preserved in perpetuity, meet modern accessibility requirements and that the online reading experience provides the kind of functionality that readers expect. It may not all be visible, but a lot of work goes into maintaining and improving the website.
Firstly, none of The Company of Biologists' journals have page charges, although some journals at other publishers still do.
But article length is still relevant – we are charged for typesetting by the page, and a longer paper typically takes longer (and therefore costs more) to copyedit and proofread. That said, the reason we believe length limits still have a place is not because of cost but rather because we want to encourage conciseness for the reader's benefit.
APCs at well-established journals like JEB can vary from a few hundred dollars to over $10,000. There are many reasons for this variability, ranging from the economic model of the publisher (for-profit versus not-for-profit) and the scale of the operation through to the staffing model of the journal (professional versus academic editors; in-house versus outsourced production processes). One factor that's often under-appreciated is the selectivity of the journal, i.e. what percentage of papers received are eventually published. An Open Access (OA) journal that publishes the majority of its submissions receives income for most of the time and money expended in assessing those manuscripts. By contrast, a highly selective journal spends a lot of time processing papers that won't get accepted and therefore bring in no money. Many publishers have tried to get around this problem by providing trickle-down cascades to keep papers (and the income they represent) within the same publishing house, but at an individual journal level, the more selective you are, the more you have to charge to recoup your costs. Of course, you can argue about the value of such selectivity, but that's a whole separate conversation…
In years gone by, the traditional subscription-based publishing model typically provided healthy revenues for publishers big and small.
For The Company of Biologists (which publishes four journals in addition to JEB: Development, Journal of Cell Science, Disease Models & Mechanisms and Biology Open), we ensured this surplus went back to the community in the form of charitable activities while also maintaining an investment pot to provide financial security to the organisation (see Box 1). But we are now in a very different world. Over the past two decades, the laudable push for OA publishing has led to an almost bewildering array of publishing options, business models and author-facing charges. Meanwhile, as discussed below, publishing has become a more complex (and costly) process. Given that revenues from OA streams are typically smaller than those from subscriptions, finances at small publishers like us are now significantly tighter: we now operate our journals at a close to break-even budget, and our investment pot is the primary source of our charitable funds.
Box 1. The Company of Biologists' finances and charitable activities
As a UK-registered charity, The Company of Biologists exists to benefit the scientific community and not to profit shareholders. Our charitable activities include the following:
- organising and hosting scientific Workshops and Meetings for the biological community
- providing scientific meeting grants to help defray the costs of organising conferences and support scientific interactions
- funding early career researchers undertaking collaborative visits to other labs
- providing travel grants for researchers attending conferences and training courses
- block grants to three UK-based societies (the British Society for Developmental Biology, the British Society for Cell Biology and the Society for Experimental Biology) to support their activities.
We know from community feedback how vital this support is to researchers across the globe.
In the past, funds for these activities came from the surplus made by our publishing programme.
However, our publishing costs have grown significantly in recent years (by 22% in real terms – after accounting for inflation – between 2015 and 2024). We have worked hard to avoid passing these costs on to our community and, as noted elsewhere, OA-based revenues are typically lower than those from subscriptions. Given these challenges, our income has decreased by 10% in real terms over the same period despite continued growth in R&P participation powered by successful relationships with libraries. Consequently, while in 2015 our surplus was able to fund all our charitable activities, in 2024 it represented less than 40% of our charitable spend, with the rest coming from the growth in our investments.
We remain committed to supporting the scientific community well into the future and are fortunate – thanks to prudent investments over the past 100 years – to be in a position to do so despite the challenging financial circumstances that small publishers are facing.
Academics also tend to be much more aware of the money side of things these days because they are often required to pay publishing fees from grant funds. In this context, publishers have a responsibility to be more transparent about where that money goes. So, let's start with the revenue coming in. JEB is a hybrid journal, offering both OA and subscription routes for publishing. Income comes from three main sources: library subscriptions, Read & Publish (R&P) agreements (which allow fee-free OA publishing for authors; see https://www.biologists.com/library-hub/read-publish/) and direct APC payments. As libraries increasingly move from a traditional subscription to an R&P agreement, this has become our main source of income, reducing the need for authors to pay APCs and enabling the majority of the journal's content to be made available OA at the point of publication.
It's important to note here that we have been careful in setting our subscription pricing to avoid so-called ‘double-dipping’ – charging libraries for access to OA content for which we have received income through other routes. Fig. 1 shows the proportion of papers published in 2025 through each of the options available to authors.
While income from APCs represents only a small proportion of our revenue, the APC feeds into our R&P pricing. Our aim is therefore to set the APC at a level that begins to cover our costs per article, allowing us to continue to function as traditional subscription income declines.
It is important to note that as a small not-for-profit publisher of only five journals, we do not benefit from any of the economies of scale that can be achieved by larger organisations. We publish our journals ourselves (rather than partnering with a larger publisher, as many society and non-profit journals do) because this gives us full control of how we operate and the quality of our product. These are significant benefits to going it alone, but the financial challenges are exacerbated by not being part of a large stable of journals with common resources and opportunities for collaboration.
Our biggest direct cost area is in salaries and stipends (Fig. 2). The JEB in-house team consists of eight people (7.8 full-time equivalents) – the Managing Editor, Senior Editor, Features and Reviews Editor, News and Views Editor, three Production Editors and three Editorial Administrators (who also work on other journals). Producing the journal also requires work from our Technical Operations department, including graphics and proofreading teams.
Without these committed staff members, manuscripts wouldn't move through the assessment and peer review process, accepted papers wouldn't get copyedited (checked and corrected for accuracy, clarity and compliance with key journal policies) or scrutinised in-house for integrity issues and there would be no review-type articles in the journal. In short, JEB just wouldn't get published. Equally important are our team of Academic Editors, who are modestly compensated for their time and dedication to the journal – reading submissions, selecting peer reviewers, making decisions on which articles to publish, supporting authors through the publication process and contributing to the overall strategic directions of the journal.
It's all very well producing a journal and papers to go in it, but we also need to make sure that people can find and read them, and this is where our Sales and Marketing department comes in. As mentioned earlier, The Company of Biologists has been very successful in growing its OA content through R&P agreements, but this has required significant investment in our Sales team. Meanwhile, we rely on our Marketing staff to help us promote the journal and its papers, to reach new readers and authors and to gather feedback on what we do well and where we can improve. Finally, of course, there are all the general expenses and overheads – like any organisation, we need to manage and develop the business and its people, we need finance and HR support, and we need to keep the lights on and the laptops running.
As well as our people and facilities, we are also reliant on a host of technology partners and platforms – from our manuscript submission system to our typesetting service provider to our online hosting platform.
And it's not enough just to put up a PDF: as noted above, we need to keep on top of emerging standards and evolving expectations from readers and librarians, providing, amongst other things, fully accessible XML, play-in-place movies, archiving, indexing and appropriate linking to relevant databases. Increasingly, we have also had to invest in software solutions and staff time to detect and address publication integrity issues, including plagiarism, inappropriate image manipulation and papermill submissions. The age of AI will only exacerbate these issues.
Publishing, done well, is an expensive business. For every paper you read in JEB, at least five different in-house staff – an Editorial Administrator, the Managing Editor, a member of our graphics team, a Production Editor and a member of our proofreading team have been involved in its journey from submission to publication, as have at least one Academic Editor, two or three peer reviewers and the (outsourced) typesetting team. And across the whole organisation, we employ 62 full- or part-time staff whose roles support our publishing operations (not counting those solely dedicated to our charitable activities). We are proud to be a people-centric organisation, and while we're actively assessing options to streamline processes and reduce costs, the quality you expect from JEB will continue to be at the forefront of our thinking.

It does if you do it right

journals.biologists.com/jeb/article/...

1 week ago 2 1 1 0

Funny this was my first immediate thought: why was it unsigned to start with? I'd love to hear Lancet's thoughts on that.

1 week ago 1 0 0 0

Meh #ScientificPublishing

2 weeks ago 4 1 1 0

Fair enough

1 month ago 0 0 0 0

And the outcome should not depend on that label - because what's in the paper doesn't depend on it either.

Just wanted to be clear about this, because ppl seeing fraud everywhere is a pet peeve of mine. 2/2

1 month ago 0 0 1 0

I mean, that's going a bit too far for me. There is a reason I didn't call it fraud but misconduct, and that's because it often isn't. It's laziness, carelessness, recklessness that seeps into the literature. And that's bad enough without us labelling it fraud. 1/

1 month ago 0 0 1 0

Timing is too precious bsky.app/profile/thew...

1 month ago 0 0 0 0

and yet somehow when this happens in research we just shrug and suggest things can simply be corrected. Let's just stop using corrections as a sweeping-under-the-carpet tool, shall we? #ScientificPublishing #ResearchIntegrity 5/5

1 month ago 2 0 0 0

But somehow it's never just one, it's never just a simple typo, and it's never the people who want to make sure what they publish is accurate.

It also boggles the cat's mind that courts can already penalise lawyers who do this - latest here:
storage.courtlistener.com/recap/gov.us... 4/

1 month ago 2 0 1 0

And you can of course argue over what counts as hallucination, or whether intentions matter (tho the point of using the recklessness criterion is exactly to say that weaponised incompetence cannot be used as a defence), and how many you need to find to cross the threshold... 3/

1 month ago 1 0 1 0

Editor Cat-Zhang is only surprised we actually need to debate this - though note this comes from me as a proponent of using recklessness as a #misconduct criterion
bsky.app/profile/edit... 2/

1 month ago 2 0 1 0
Hallucinated citations produced by generative artificial intelligence may constitute research misconduct when citations function as data in scholarly papers
In this article, we discuss the growing problem of hallucinated citations produced by Generative Artificial Intelligence (GenAI) in scholarly research and writing. We argue that GenAI hallucinated ...

Question you never asked but will get an answer to anyway: yes, use of hallucinated references should be considered misconduct #OverlyHonestEditor #ResearchIntegrity Resnik & Hosseini tackle this in their fresh paper
www.tandfonline.com/doi/full/10.... 1/

1 month ago 6 4 2 1