You have an assignment due tomorrow. Or rather, today. It’s 2 AM, and it’s due at 9 AM. You’re on your third cup of coffee, staring at what appears to be a very (very) rough draft. Your ideas are present, but the execution is clumsy, and you dislike the flow. You haven’t included any citations or built your bibliography. You haven’t run a spell check or proofread. And this essay is long, so you’re in for hours of work. You’ll be lucky to get any sleep. You stare at the screen, trying not to panic.
It is at this point that you consider doing something you would normally never do: using AI. You launch an LLM, paste your draft, and ask it to make it “more readable,” fact-check it, and correct typos and grammar. Next, you upload the sources you’ve analyzed and ask the LLM to add structured citations and an APA-formatted bibliography. Minutes later, it generates the final version. Your prose is clearer and more polished, everything is cited and verified, and your structure is up to par. All the i’s are dotted and the t’s are crossed.
Here are the questions:
Did you just cheat? Should you tell someone? Is what you did even allowed? And if so, where exactly do you draw the line?
If you have found yourself in this situation, you are not alone. And you are certainly not wrong to be uncertain about the answers. Why? Because academia is still defining the rules.
Today, we examine an article that attempts to bring some order to this mess. The authors are health researchers who found themselves in the exact same situation as everyone else: trying to understand how to use these powerful new tools without compromising academic integrity.
What makes this article interesting is that they didn’t just draft guidelines; they applied their own recommendations. They actually used ChatGPT to help write certain parts of the article, then documented precisely how they used it and what they learned from the process. And yes, it’s very meta: they used an LLM to help write an article about how to use LLMs to help you write articles. Inception.
Let’s delve into it.
They begin by acknowledging that these tools (LLMs) are fundamentally different from the software that preceded them. Researchers have long used technology to accomplish tasks more efficiently; that is the status quo. Think of academic search engines that save you from sifting through library shelves, statistical software that accelerates complex analyses, proofreaders that highlight errors, reference managers that organize citations, and transcription software that converts old interviews, broadcasts, and speeches into text. The consensus is that all of this is acceptable. It is not cheating. But generative AI tools do not merely help you accomplish tasks: they produce original written content themselves, for you. And for many, that feels like something else entirely. These capabilities pose a challenge that traditional notions of academic integrity were not designed to handle. The line between “assistance” and “substitution” quickly becomes blurred.
So, how do we navigate this? Is there a structured way to think about it? This is precisely the objective of this article: to propose a framework (and safeguards) for an amorphous, grey, and rapidly evolving subject. To begin, the authors reviewed the literature and compared what different articles had to say on the matter. From this, they derived a framework that conceptually organizes AI usage into three ethical levels, each with its own degree of acceptability and its own precautions.
Level 1 encompasses the most ethically acceptable uses: you use AI primarily to restructure existing text rather than to generate new content. Grammar and spell-checking tools fall into this category, including tools like Grammarly and the AI-powered checkers integrated into your word processor. Readability enhancement also belongs to Level 1, as long as authors ensure that the modifications preserve their original voice and reasoning; at this level, the model should refine expression, not alter meaning. Translation tools round out the level, with a caveat: in many cases, an AI translator provides a starting point, not a final product.
Level 2 corresponds to “ethically contingent” uses that require careful handling of auto-generated content. This includes generating outlines from existing content, synthesizing materials, improving the clarity of existing text, or brainstorming ideas. The distinction lies in asking the AI to work from substantial input versus asking it to create content from almost nothing. In other words: do you provide it with content so it can organize concepts already present? Or do you ask it to create an outline from minimal elements and formulate the basic ideas itself? The former relies on the LLM’s organizational capabilities; the latter risks introducing concepts that are not your own and may be inaccurate.
Level 3 encompasses “ethically questionable” uses. Namely: asking the AI to write original text without providing substantial original content. Why? Because this approach bypasses the intellectual engagement essential for good research. You don’t think; you let it think for you. Data interpretation also falls into this category. Using AI for primary analysis short-circuits the deep engagement with data that leads to true understanding and insights. The authors argue that if you first analyze the data yourself, you gain a more complete understanding that then allows you to critique the AI’s interpretations. Without this grounding, you entrust your analysis to the machine and miss out on an essential part of the experience. Literature reviews pose similar problems. LLMs are notoriously unreliable for citing references, so for now, they are not suitable for this task.
To operationalize this framework, they propose a four-question checklist for evaluating the use of generative AI.
· Have I used generative AI in a way that ensures the main ideas, insights, interpretations, and critical analyses are my own? This question addresses intellectual ownership. AI should augment your thinking, not replace it.
· Have I used generative AI in a way that ensures humans retain their core research and writing skills? This addresses the authors’ concern about skill atrophy. Over-reliance on AI for fundamental tasks such as ideation, writing, and analysis could prevent researchers, especially novices, from developing essential capabilities.
· Have I verified that all content and references in my manuscript are accurate, reliable, and free from bias? This acknowledges that, regardless of AI usage, authors bear full responsibility for the accuracy and integrity of the manuscript.
· Have I disclosed exactly how generative AI tools were used to write the manuscript, and which parts involved AI assistance? This ensures transparency and allows readers to evaluate the work appropriately.
The authors recommend that even if the first three questions receive an affirmative answer, the fourth remains mandatory. Transparency (in their view) is not optional. And this applies even to ethically sound AI usage.
But how should you disclose it, and where? While some journals allow disclosure in an “acknowledgments” section, the authors believe the “methods” section is the most transparent place for it. They suggest specifying which AI tools were used, for what tasks, how the AI’s output was processed (reviewed, edited, verified), and how the content was integrated into the final version. For translation uses, they further recommend that native speakers of the target language proofread the final manuscript.
They also address the issue of academic development. Their concern is not only about immediate productivity but about professional growth and the evolution of researchers in their field. It is about ensuring that researchers can think deeply and creatively about research problems and the resulting data. While novice researchers most need these skills, experienced researchers also benefit from continuously adapting their capabilities to an evolving landscape. The fear is that over-reliance on LLMs (for ideation, primary content generation, and data interpretation) creates a dependency that hinders development. This is why their framework emphasizes, above all, the maintenance of human intellectual contribution.
That said, this field is evolving very quickly. Some problems (like hallucinations) may diminish over time; others (like bias) may worsen before they improve. No framework will last very long, because conditions are changing too rapidly. The authors are firm on one point that, in their opinion, cannot and should not change: ultimately, the authors of an article bear full responsibility for the manuscript’s originality, the accuracy of its content, and the appropriateness of its references, whether they used AI or not. The widespread adoption of these tools must never serve as an excuse to evade responsibility for errors or to escape legal liability. It’s your article and it’s your work, whether you wrote every word of it yourself or not.
If you wish to delve deeper into their analysis, explore the examples they studied, or obtain more details on their framework, I highly recommend downloading the article. It is a useful guide, and their recommendations extend far beyond what we could cover here.
Source: https://advancesinsimulation.biomedcentral.com/articles/10.1186/s41077-025-00350-6
