Artificial intelligence (AI) is increasingly being leveraged to tackle important scientific challenges. Given their ability to process and synthesize vast amounts of information, large language models (LLMs) may hold promise for predicting experimental outcomes, informed by the extensive body of scientific literature; this would help scientists to formulate hypotheses and design experiments more effectively. Luo and colleagues explore this potential in a recent publication in Nature Human Behaviour. They developed a forward-looking benchmark, BrainBench, on which LLMs outperformed 171 human experts at predicting the true outcomes of 200 published neuroscience studies when provided with only the background and methods sections of the abstracts. Interestingly, the LLMs and the human experts did not struggle with the same items, whereas the four LLMs were more closely aligned with one another. The authors then finetuned one of the LLMs on a dataset of 1.3 billion tokens from neuroscience publications across 100 journals, spanning 2002 to 2022, which further improved predictive accuracy. It could be argued that overreliance on such methods might suppress unexpected but disruptive findings; however, these tools may also enhance multidisciplinary communication by aligning hypotheses with broader scientific insights, potentially bringing us closer to fundamental truths.
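In the underlying study, each benchmark item pairs a published abstract with a version in which the results have been altered, and a model is credited with a correct prediction when it assigns lower perplexity to the published version. The snippet below is a minimal sketch of that scoring idea, not the authors' released code; the model name ("gpt2") and the two abstract strings are placeholders standing in for the larger LLMs and real BrainBench items used in the paper.

```python
# Minimal sketch of scoring one BrainBench-style forced-choice item:
# the model "chooses" whichever abstract version it finds less surprising.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the study evaluated larger LLMs

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the model (lower = less surprising)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def prefers_published(published: str, altered: str) -> bool:
    """True if the model assigns lower perplexity to the published abstract."""
    return perplexity(published) < perplexity(altered)

# Hypothetical item: identical background/methods, diverging results sentence.
real = "Background and methods ... Results: hippocampal activity increased."
fake = "Background and methods ... Results: hippocampal activity decreased."
print("Model picked the published outcome:", prefers_published(real, fake))
```

Accuracy over the full benchmark would simply be the fraction of the 200 items on which this comparison favours the published version.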
Original reference: Luo, X. et al. Nat. Hum. Behav. https://doi.org/10.1038/s41562-024-02046-9 (2024)