Google Research just dropped two new AI agents aimed at the academic workflow, and honestly, they’re more interesting than I expected. We’ve all seen the flood of “AI for X” announcements, but these two actually target real pain points that anyone who’s ever submitted to a top conference will recognize instantly.
The first is PaperVizAgent (formerly called PaperBanana, which I kind of prefer), a system that generates publication-ready figures from your manuscript text. The second is ScholarPeer, an automated reviewer that evaluates papers, including their figures. Both come with papers and code that are already available.
Let’s start with PaperVizAgent, because this is the one that made me sit up. If you’ve ever spent a weekend wrestling with matplotlib or Inkscape to produce a methodology diagram that actually communicates what your model does, you know the pain. The agent takes two inputs: your method section (source context) and a figure caption (communicative intent). Then it orchestrates five specialized sub-agents: a retriever, a planner, a stylist, a visualizer, and a critic.
The retriever and planner gather reference figures from existing literature and organize the content. The stylist ensures the output matches academic aesthetics — no Comic Sans, no garish color schemes. The visualizer renders the actual image or generates executable Python code for statistical plots. Then the critic checks the output against the original text and loops back for refinement if something’s off.
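To make the control flow concrete, here is a minimal sketch of that retrieve, plan, style, render, critique loop. To be clear, this is not PaperVizAgent's code: every function name, the word-overlap retrieval, the string "rendering", and the `required_terms` check are stand-ins I made up so the example runs on its own. In the real system each step is presumably a model call.

```python
from typing import List, Tuple


def retrieve(method_text: str, corpus: List[str]) -> List[str]:
    """Hypothetical retriever: keep reference figures sharing a word with the method text."""
    words = set(method_text.lower().split())
    return [ref for ref in corpus if words & set(ref.lower().split())]


def make_plan(caption: str, references: List[str]) -> str:
    """Hypothetical planner: turn the caption and references into a short plan."""
    return f"{caption} (informed by {len(references)} reference figure(s))"


def stylize() -> str:
    """Hypothetical stylist: return a fixed academic style description."""
    return "sans-serif labels, muted palette"


def visualize(plan: str, style: str, labels: List[str]) -> str:
    """Hypothetical visualizer: 'render' the figure as a plain string."""
    return f"[{style}] {plan}: " + " -> ".join(labels)


def critique(rendering: str, required_terms: List[str]) -> List[str]:
    """Hypothetical critic: list the required terms missing from the rendering."""
    return [term for term in required_terms if term not in rendering]


def run_pipeline(
    method_text: str,
    caption: str,
    corpus: List[str],
    required_terms: List[str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    references = retrieve(method_text, corpus)
    plan = make_plan(caption, references)
    style = stylize()

    # Start from a deliberately incomplete draft so the critic has work to do.
    labels = required_terms[:1]
    rendering = visualize(plan, style, labels)

    rounds = 0
    while rounds < max_rounds:
        missing = critique(rendering, required_terms)
        if not missing:
            break  # the critic is satisfied, stop refining
        labels = labels + missing  # feed the critic's feedback into the next draft
        rendering = visualize(plan, style, labels)
        rounds += 1
    return rendering, rounds


if __name__ == "__main__":
    figure, rounds = run_pipeline(
        method_text="An encoder feeds a decoder",
        caption="Overview of the model",
        corpus=["encoder block diagram", "bar chart of accuracy"],
        required_terms=["encoder", "decoder"],
    )
    print(f"accepted after {rounds} refinement round(s): {figure}")
```

The part worth noticing is the loop at the end: the critic's output is fed back into the next render, and `max_rounds` caps the refinement so a critic that is never satisfied can't spin forever.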
According to the authors’ evaluations, it beats GPT-Image-1.5, Nano-Banana-Pro (cute name), and Paper2Any on quality metrics. The examples in the paper look genuinely good: clean architecture diagrams, properly labeled axes, consistent color palettes. This isn’t just “AI slop” that looks okay at first glance but falls apart under scrutiny.
ScholarPeer is the more controversial one. It’s an automated reviewer that reads a paper, evaluates its contributions, checks the figures, and produces a structured review with scores and actionable feedback. They claim it outperforms existing automated reviewers, which isn’t a high bar, but the key innovation is that it can actually interpret figures, not just text.
The system retrieves relevant literature to ground its evaluation, which means it can flag claims that don’t match established results. It also checks for common issues like missing baselines, insufficient statistical reporting, or figures that don’t support the claims in the text. The output includes a summary, a list of strengths and weaknesses, and a recommendation.
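For a sense of what that structured output could look like, here is a rough sketch of a review record and a few rule-based checks. This is my own illustration, not ScholarPeer's schema or logic: the `Review` fields mirror the summary, strengths, weaknesses, and recommendation described above, but the paper dictionary keys (`baselines`, `reports_variance`, `supports_claim`) and the accept-or-revise rule are assumptions invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Review:
    summary: str
    strengths: List[str] = field(default_factory=list)
    weaknesses: List[str] = field(default_factory=list)
    recommendation: str = "undecided"


def review_paper(paper: Dict[str, Any]) -> Review:
    """Toy reviewer: apply three hand-written checks to a paper description."""
    strengths: List[str] = []
    weaknesses: List[str] = []

    # Check 1: are there any baselines at all?
    baselines = paper.get("baselines", [])
    if baselines:
        strengths.append(f"Compares against {len(baselines)} baseline(s)")
    else:
        weaknesses.append("Missing baselines")

    # Check 2: statistical reporting (hypothetical boolean flag).
    if paper.get("reports_variance", False):
        strengths.append("Reports variance across runs")
    else:
        weaknesses.append("Insufficient statistical reporting")

    # Check 3: figures flagged as not supporting the claim. The flag is assumed
    # to come from some upstream figure-understanding step.
    for fig in paper.get("figures", []):
        if not fig.get("supports_claim", False):
            weaknesses.append(f"Figure '{fig['name']}' does not support the stated claim")

    # Made-up decision rule: any weakness means "revise".
    recommendation = "accept" if not weaknesses else "revise"
    return Review(
        summary=f"Review of '{paper['title']}'",
        strengths=strengths,
        weaknesses=weaknesses,
        recommendation=recommendation,
    )


if __name__ == "__main__":
    draft = {
        "title": "Toy Paper",
        "baselines": [],
        "reports_variance": True,
        "figures": [{"name": "Fig. 2", "supports_claim": False}],
    }
    print(review_paper(draft))
```

Hand-written rules like these are exactly the kind of thing authors could learn to satisfy without improving the paper, which is part of why I'm wary of the whole idea.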
Now, I have mixed feelings about automated peer review. On one hand, the current system is clearly broken. Reviewers are overworked, reviews are inconsistent, and the sheer volume of submissions means quality suffers. I’ve had reviews from people who clearly didn’t read past the abstract. Something is better than nothing.
But there’s a real danger here. If ScholarPeer becomes widely used, authors will optimize for it. They’ll write papers that score well with the automated reviewer, even if that means gaming the system. We’ve seen this with every automated evaluation metric in NLP — BLEU, ROUGE, you name it. People optimize for the metric, not the underlying quality.
The other concern is that ScholarPeer might reinforce existing biases in the literature. It retrieves and references existing papers, which means it’s more likely to accept work that aligns with established paradigms and reject genuinely novel approaches that don’t fit the template. That’s already a problem with human reviewers, but at least humans can be convinced by a compelling argument.
Still, I think Google is on the right track here. These agents are designed to assist, not replace, human researchers and reviewers. PaperVizAgent can save hours of figure-drawing time. ScholarPeer can catch obvious issues before submission, acting as a sanity check. Used properly, they could actually improve the quality of submissions and reduce the burden on reviewers.
Both projects are open source, which is the right move. The code is available, and the papers are detailed enough that the community can replicate and extend the work. I’d love to see what happens when people start building on these ideas, maybe adding domain-specific knowledge or integrating with existing tools like Overleaf or Jupyter.
The bottom line: these are practical tools that solve real problems. They’re not perfect, and they come with risks, but they’re a genuine attempt to use AI to improve the academic process rather than just generate more papers. That’s refreshing.