AI Search Engines for Science: the Good, the Bad, and the Ugly

Rowan Copley

One of the promises, arguably the main promise, of Google in the early years was that now all the world's information was at your fingertips. It was a huge step change at the time. And though the search engine has continued to change (in recent years for the worse), another product they created way back when has changed a lot less: Google Scholar.

Google Scholar works well for what it is: I can throw in a few keywords, maybe filter by year, and find a bunch of articles that are in the general vicinity of what I'm looking for. But we have come so far since the days of PageRank and keyword search plus a bit of semantic dusting on top. There's a whole new crop of LLM and AI-infused search engines for academic articles. What's out there? And does it tell us anything about where knowledge retrieval is heading?

Surveying the landscape

I reached out to various academic communities I'm part of and did some searching, ending up with a list of a few dozen search engines–some I'd tried, some I hadn't. For this article I'm not interested in variations on Google Scholar or plain indexes: RefSeek, Science.gov, and BASE have their uses, but they don't appear to leverage modern language models. There are also a few domain-specific engines I encountered, like Luva and Emergent Mind, but I want all of science, or as close as the current realities of journal licensing allow. Then I eliminated projects that mostly ignore the paper itself in favor of its metadata (Research Rabbit, Litmaps, and Connected Papers), neat tech demos (HasAnyone), and a few platforms that seemed to just be worse variations on article retrieval and RAG. It's never a good sign when my first search on your platform yields only irrelevant results, and an LLM is then forced to write a paragraph regretfully explaining how each result is irrelevant to my query. It feels like a waste of time for me and, frankly, for the model.

Eventually I whittled the list down to this:

- Exa: an SF-based VC-funded startup that looks like it wants to build a better version of Google

- Elicit: another new VC-funded Bay Area startup that is focused on academic search, as well as consulting services

- Consensus: a Boston-based startup founded by people from the sports world

- Scite: a Brooklyn startup acquired by Research Solutions last year as part of that company's pivot into AI

- Semantic Scholar: a tool created by the Allen Institute for Artificial Intelligence, a non-profit

And as my baseline comparisons:

- Google Scholar

- Claude and/or Perplexity

I unfortunately wasn't able to have one of my baselines be "go to an academic library and ask a librarian" but that's definitely a viable research pathway for academics and something these startups know they're competing with.

I tried a range of questions on them, from "What's the ellipticity of Earth?" to "What are the measurable cognitive effects of caffeine?" to fishing for my own published papers from back in the day with searches like "What are the effects of embargoes in simulated agent-based markets?" Eventually three major questions emerged:

1. How much generated text is the right amount for a research assistant?

2. Are these searches significantly more precise than Google Scholar et al.?

3. Does any of them clearly beat the combination of Google Scholar and chatting with an LLM? In other words, does the product have legs?

A few facepalm moments eliminate a couple more

Consensus gives me consistently bad search results, even on their paid Pro plan. I honestly couldn't run a test on this platform without getting frustrated: either bafflingly irrelevant search results or overconfident generated text undermined at least half of my tests. I just don't understand how it can be this bad: how does a search for papers on agent-based models lead to nursing-school training research? How does a query about the environmental impacts of LLMs bring up papers about lean manufacturing? These were the first papers in the list of search results.

I also wasn't a fan of Epsilon AI. It didn't seem to be anything more than a RAG over all of science, and there are better alternatives for that. It's also slow, taking tens of seconds to show retrieved papers. I get that generated text is expensive, but making me wait over a minute to see the generated text that is the entire point of your product kills it for me.

How much generated text is ideal, if any?

Sometimes a language model will give you exactly the answer that you need. Claude is better than any of the services I tried at answering questions like "what's the ellipticity of Earth?" or "Who in Seattle built a video game where you make robots out of DNA?" (The answer to the first is 0.0033, and the answer to the second is my team at a University of Washington lab in 2013.) But language models hallucinate, so I can never fully trust generated text. One model of a research platform, then, is to give language models papers to reference while they write a short essay answering my question. The platforms that take this tack are Elicit, Epsilon, and Consensus.
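That "essay with references" approach is basically retrieval-augmented generation: retrieve candidate papers, put the relevant passages into the prompt, and ask the model to answer with citations. Here's a toy sketch of just the prompt-assembly step–the function name and format are my own illustration, not any platform's actual implementation:

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt: number the retrieved passages as
    citable sources, then ask the model to answer using only them."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using ONLY the sources below, "
        "citing them inline as [n].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved passages for one of my test queries.
prompt = build_rag_prompt(
    "What are the measurable cognitive effects of caffeine?",
    ["Caffeine improved sustained attention in a randomized trial.",
     "A meta-analysis found small working-memory gains at moderate doses."],
)
print(prompt)
```

The generated essay is only as trustworthy as this grounding step: if retrieval surfaces irrelevant passages, the model is left summarizing noise–which is exactly the failure mode I kept hitting.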

But another model, and the one I ended up preferring, is to use generated text as either an on-ramp into the literature or a way to interrogate results after you've already found them. These search engines prioritize getting the most relevant passages of human-written text in front of you fast, possibly with some intermediate generated summaries along the way. The platforms I tried in this category were Scite, Exa, and Semantic Scholar.
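Under the hood, getting relevant passages in front of you fast usually means embedding-based search: the query and every paper get a vector from an embedding model, and ranking is by similarity. A minimal sketch, with made-up 3-d vectors standing in for a real model's output:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec: list[float], corpus: dict, top_k: int = 2) -> list[str]:
    """Rank papers by similarity of their embeddings to the query vector."""
    ranked = sorted(corpus.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [title for title, _ in ranked[:top_k]]

# Toy 3-d embeddings; a real system would use a model's high-dimensional ones.
corpus = {
    "Caffeine and working memory": [0.9, 0.1, 0.0],
    "Agent-based market simulation": [0.1, 0.9, 0.1],
    "Earth's gravitational field": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # embedding of "cognitive effects of caffeine"
print(search(query, corpus, top_k=1))  # → ['Caffeine and working memory']
```

Production systems swap the sort for an approximate nearest-neighbor index, but the ranking principle is the same–which is why these platforms can surface relevant passages without generating a word of text.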

I have two main problems with the services that heavily emphasize chatting with an LLM: 1) I can't fully trust what they tell me, and 2) an essay is usually not my preferred user interface. The more a service emphasizes being in dialogue with the chatbot, the more it feels like homework to actually check its sources–and today's language models just aren't good enough to take everything they summarize from the retrieved papers as gospel. The essays are sometimes nice when I don't know much about the domain I'm poking around in, but they're less information-dense than the actual paper results, so I usually want to see those pretty quickly.

Exa's product decision here is simple: put the chatbot behind a button press, after the user gets their results. You can throw a few papers into a single chat, which makes sense: it lets you choose the papers that are actually relevant and opt in to the slower generated text.

Scite also doesn't return any generated text by default, instead giving you a list of passages from papers that should be most relevant to your query, along with some context and metadata about where those passages came from. It has a chatbot that relies on the "smart citation" result set, but, puzzlingly, you have to perform the search all over again if you want to switch from search mode to assistant mode.

That said, there are definitely times when a generated essay beats search results–like when your question itself is wrong. When I searched for "what's the eccentricity of the Earth?", only Claude mentioned that it was giving me the numerical value for the eccentricity of Earth's orbit–the term for the deformation of a sphere is ellipticity.

Are these platforms significantly more precise?

It depends on what kind of search you're doing, but overall the results do seem better. Sometimes Google Scholar would actually beat Elicit, Scite, or Semantic Scholar; sometimes even Consensus would return decent results. This probably also depends on the subset of scientific papers each service indexes–I know Consensus, Elicit, and Epsilon all pull from Semantic Scholar, but I couldn't find substantive information on where Exa or Scite get their papers. Sometimes Semantic Scholar's results are great, and sometimes it returns zero results for queries other platforms handle just fine.

Overall, after the week of testing I gave these platforms, they're definitely better than Google Scholar et al.–just not miles ahead.

Do any clearly beat Google Scholar?

Going through this whole process has kind of reinforced how great Google Scholar is. There are some compelling features in these new search engines–Scite's "smart citations" and Elicit's tabulated generation are nice–but Google Scholar is free and, possibly more importantly, has been stable for 20 years. If I were using these platforms every day I'm sure I'd want lots more power-user features, but I'm just gonna say it: Google Scholar is still holding its own for now.

If Scite's search were as good as Exa's (or even Google Scholar's, depending on your query), I would probably use it frequently, maybe even daily. But because each search result takes up more space, each miss's effect is magnified–and I've had enough misses that I can't enthusiastically recommend the service in its current form. It's just not (yet) the killer feature it could be.

Parting thoughts

I'm not an academic–I'm just a humble hacker who has spent years as a research engineer and at startups spinning research out into products. So my interest in these tools skews heavily toward the power-user side, and I can't speak much to the tools that someone writing papers as part of their job is going to need.

There's also a huge caveat to all these platforms, especially the young startups. Change is very likely–not only might we see more step changes in model capabilities, but we'll almost certainly see business models change as startups try to pivot into profitability. Plenty of them will be acquired by larger companies, which might mean the end of the product as it exists–and some of these search engines feel less like viable products and more like advertisements for the team.

Maybe one of these platforms will figure out how to make the generated essay format work well, but embeddings-based search is just better right now as the main search modality. Keep the chatbot off to the side so the user can opt-in when they want it. As for what this all says about the future of knowledge retrieval? All the pieces of a great pro search experience are there. Someone just needs to put them together and make it a sustainable business.

Rowan Copley
Rowan Copley is a research engineer and writer who lives in the City of Roses, aka Portland, Oregon.