Stewardship Over Automation: FDA Redefines AI Accountability in Pharma

In response to the April 2, 2026 landmark FDA warning letter issued to Purolea Cosmetics Lab (MARCS-CMS 722591)[1], The Regulatory Mix[2] argues that a trap lies in the shift from production to stewardship when using AI in pharma. The trap, as the warning letter illustrates, is passively observing the outputs of generative AI rather than verifying them.

By using artificial intelligence (AI) agents to “produce” drug product specifications, procedures, and master production or control records without reviewing the AI-generated documents for accuracy and cGMP compliance, the manufacturer demonstrated an overreliance on the tool. The Regulatory Mix attributes this lack of stewardship to an “autopilot” mindset.

An immature quality management system (QMS) and its quality assurance (QA) processes are the likely cause of this mindset. Notably, the core QA task of reviewing and approving all GMP documents was either absent or not fully developed and maintained in a controlled procedure, let alone supported by employee training and competency assessments.

Drilling down, the manufacturer “…replied that [they] were not aware of the legal requirement, as the AI agent [they] used (b)(4), never told [them] it was required.” A 2025 Microsoft Research survey, also discussed in The Regulatory Mix article, defines “cognitive offloading” as accepting outputs without scrutiny because of high confidence in the AI tool.

A similar complacency occurred in the US District Court for the Southern District of New York, where a lawyer who used ChatGPT to help write court filings that cited six nonexistent cases admitted that he “…didn’t previously consider the possibility that an artificial intelligence tool like ChatGPT could provide false information.”[3] This prompted a federal judge in the US District Court for the Northern District of Texas to ban “…submissions written by artificial intelligence unless the AI’s output is checked by a human. US District Judge Brantley Starr also ordered lawyers to file certificates attesting that their filings are either written or reviewed by humans.”[4]

The FDA warning letter drives action to prevent recurrence of these failures with the following stated requirement:

If you plan to resume drug production, and use AI to help with CGMP activities, such as development of procedures and specifications, any output or recommendations from an AI agent must be reviewed and cleared by an authorized human representative of your firm’s QU in accordance with section 501(a)(2)(B) of the FD&C Act. See also 21 CFR 211.22; 21 CFR 211.100.

In the AI domain, the review and clearance of an AI output by an authorized human representative is called human-in-the-loop (HITL). This design approach places humans at critical control points to provide feedback, approve or reject outputs, and correct errors, thereby building the credibility of the AI tool.
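As a minimal sketch of what such a control point could look like in software, consider the Python fragment below. The data structure, function names, and status values are illustrative assumptions, not the firm’s actual system; the point is that release is mechanically impossible without a recorded human approval.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIDocument:
    title: str
    body: str
    status: str = "draft"          # draft -> in_review -> approved/rejected -> released
    review_log: list = field(default_factory=list)

def submit_for_review(doc: AIDocument) -> None:
    doc.status = "in_review"

def human_review(doc: AIDocument, reviewer: str, approved: bool, comments: str) -> None:
    """Record the authorized reviewer's decision; nothing is approved by default."""
    doc.review_log.append({
        "reviewer": reviewer,
        "approved": approved,
        "comments": comments,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    doc.status = "approved" if approved else "rejected"

def release(doc: AIDocument) -> AIDocument:
    # The control point: no human approval record, no release.
    if doc.status != "approved" or not doc.review_log:
        raise PermissionError("AI output must be approved by an authorized human reviewer")
    doc.status = "released"
    return doc
```

The rejected path matters as much as the approved one: a rejection feeds the error back to whoever operates the AI agent, which is what distinguishes stewardship from autopilot.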

To comply with the requirements of 21 CFR 211.22, the manufacturer must first change its mindset to prevent recurrence of this failure by documenting new requirements in standard operating procedures (SOPs) and fully implementing those SOPs in development processes (e.g., specification creation) to mitigate adverse impacts on the product. The challenge for these SOPs is to ensure that all AI outputs are reviewed, to account for their unique review requirements, and to give reviewers effective guidance for mitigating problems.

As the mindset changes with the maturation of the QMS, the “HITL workflow” becomes equivalent to the change control process, in which every newly created or revised document passes through QA review and approval before release and use. While the AI output in this case is a written document, the workflow should give the reviewer sufficient context and present a finished product so that the human-AI “interface” supports a meaningful review.
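One way to provide that context, sketched below under the assumption that the firm tracks prompts and sources (the field names are hypothetical), is to bundle the AI output with everything the reviewer needs to judge it, rather than presenting the text alone.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReviewPacket:
    document_id: str
    ai_output: str                   # the finished document as it would be released
    prompt: str                      # what the AI agent was asked to produce
    context_of_use: str              # e.g., "drug product specification"
    cited_sources: tuple[str, ...]   # references the reviewer must be able to trace
    model_version: str               # which AI tool and version produced the output

def render_for_reviewer(packet: ReviewPacket) -> str:
    """Assemble the output and its context into one reviewable package."""
    sources = "\n".join(f"  - {s}" for s in packet.cited_sources)
    return (
        f"Document: {packet.document_id}\n"
        f"Context of use: {packet.context_of_use}\n"
        f"Produced by: {packet.model_version}\n"
        f"Prompt: {packet.prompt}\n"
        f"Sources to verify:\n{sources}\n"
        f"--- OUTPUT UNDER REVIEW ---\n{packet.ai_output}"
    )
```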

The manufacturer should define an effective procedure for the detection and mitigation of problems or errors, including examples of problems and the necessary tools. Multiple tools exist for this purpose[5], but this case requires the manufacturer to implement HITL as the tool to evaluate all information, attest to the validity of the final content, and provide the approval.

When AI fails to produce the expected results within the context of use (COU), the employee assigned review and approval responsibility must be able to identify problems such as hallucinations, or “…the phenomenon of AI presenting fabricated information as fact.”[6] The reviewer must screen out these instances of “…plausible-sounding but factually incorrect or misleading information”[7] and confirm that the output is factually accurate, logically consistent, and compliant with regulatory requirements.
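One narrow, automatable screen that can support (never replace) the human reviewer is flagging regulatory citations that cannot be traced to a controlled reference list, the same failure mode as the fabricated court cases above. The list and pattern in this sketch are assumptions, not a prescribed method.

```python
import re

# Assumption for this sketch: the quality unit maintains a verified,
# controlled list of citations permitted to appear in GMP documents.
REGISTERED_CITATIONS = {
    "21 CFR 211.22",
    "21 CFR 211.25",
    "21 CFR 211.100",
}

CITATION_PATTERN = re.compile(r"21 CFR \d+\.\d+")

def flag_untraceable_citations(ai_output: str) -> list[str]:
    """Return cited sections not found in the controlled list.

    Passing this check does not make the content accurate; it only
    routes possible fabrications to the human reviewer's attention.
    """
    found = set(CITATION_PATTERN.findall(ai_output))
    return sorted(found - REGISTERED_CITATIONS)

draft = "Equipment cleaning shall comply with 21 CFR 211.67 and 21 CFR 211.22."
print(flag_untraceable_citations(draft))   # ['21 CFR 211.67'] -> reviewer must verify
```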

To comply with the requirements of 21 CFR 211.25, the manufacturer must ensure that these reviewers have the education, training, and experience, or any combination thereof, to enable them to mitigate the risks associated with the use of AI tools. This is a new challenge for the industry, since QA/RA professionals with education and/or experience in detecting AI errors are likely to be as scarce as those who could detect and investigate digital data integrity problems in years past.

Therefore, an appropriate training and competency assessment program is needed to improve the capability of the review process. This program should build on the reviewers’ QA/RA domain knowledge with an understanding of AI system capabilities and limitations, such as edge cases or ambiguous situations that were not adequately represented in the AI model’s training data.

Reviewers should not only be trained to perform the task but also be empowered to identify problems. The manufacturer must treat error detection holistically: each finding is an opportunity not only to improve the AI outputs through a feedback loop, but also to improve the SOPs and training materials for both “writers” and reviewers.
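A simple sketch of that feedback loop, assuming the firm categorizes review findings (the category taxonomy and records here are hypothetical): trending findings by cause shows whether corrections belong in the AI prompt, the SOP, or the training material.

```python
from collections import Counter

# Hypothetical finding categories; a real taxonomy would be defined in the SOP.
FINDING_CATEGORIES = {"hallucination", "omission", "noncompliance", "formatting"}

review_findings = [
    {"doc": "SPEC-014", "category": "hallucination", "action": "add citation pre-screen"},
    {"doc": "SOP-102",  "category": "omission",      "action": "revise SOP template"},
    {"doc": "SPEC-015", "category": "hallucination", "action": "correct and adjust prompt"},
]

def trend_findings(findings: list[dict]) -> Counter:
    """Count review findings by category to surface systemic problems."""
    for f in findings:
        assert f["category"] in FINDING_CATEGORIES, f"unknown category: {f['category']}"
    return Counter(f["category"] for f in findings)

print(trend_findings(review_findings))   # Counter({'hallucination': 2, 'omission': 1})
```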

To close the loop on this process, the effectiveness of HITL should be assessed in the overall monitoring of processes and procedures during management review. Humans bring irreplaceable qualities such as contextual understanding and creative problem-solving, but they are subject to error themselves. Therefore, an analysis of the process capability should lead to regular training and skill development opportunities.
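As one illustration of what such an analysis might compute (the metric names and inputs are assumptions, not an FDA-prescribed measure), review coverage and error detection rates pulled from the review log give management review something concrete to trend:

```python
# Hypothetical effectiveness metrics for the HITL control, computed from
# counts pulled out of the review log.

def hitl_metrics(total_outputs: int, outputs_reviewed: int,
                 errors_caught: int, errors_escaped: int) -> dict[str, float]:
    total_errors = errors_caught + errors_escaped
    return {
        # Target is 1.0: every AI output must pass through human review.
        "review_coverage": outputs_reviewed / total_outputs,
        # Share of known errors caught at review rather than escaping downstream.
        "detection_rate": errors_caught / total_errors if total_errors else 1.0,
        "escape_rate": errors_escaped / total_errors if total_errors else 0.0,
    }

# A falling detection rate signals a need for refresher training or SOP revision.
print(hitl_metrics(total_outputs=42, outputs_reviewed=42, errors_caught=9, errors_escaped=1))
```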

This warning letter is likely only the first of many agency actions for AI application failures; a manufacturer with a commitment to quality and a mature QMS will not repeat these violations. Increasing regulatory focus on AI transparency and accountability is likely to drive greater adoption of HITL approaches, and with that adoption comes the need to ensure that this control measure effectively mitigates the associated risks.


[1] https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/warning-letters/purolea-cosmetics-lab-722591-04022026

[2] https://www.linkedin.com/pulse/stewardship-trap-why-your-ai-shortcut-might-regulatory-8y66c/

[3] https://arstechnica.com/tech-policy/2023/05/lawyer-cited-6-fake-cases-made-up-by-chatgpt-judge-calls-it-unprecedented/

[4] https://arstechnica.com/tech-policy/2023/05/federal-judge-no-ai-in-my-courtroom-unless-a-human-verifies-its-accuracy/

[5] https://dev.to/kamya_shah_e69d5dd78f831c/5-ways-to-detect-ai-agent-hallucinations-3hb8

[6] https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)

[7] https://www.getmaxim.ai/articles/top-5-tools-to-detect-hallucinations-in-ai-applications-a-comprehensive-guide/