Strawberry & AlphaProof: GOFAI* to the rescue of Generative AI!

Claude COULOMBE
5 min read · Sep 11, 2024

Sources: clipart.com & Wikipedia

In case you missed it… To move forward, AI research is acknowledging the limitations of Large Language Models (LLMs) and is hybridizing with good old-fashioned symbolic reasoning.

Note added Friday, Sept 13, 2024, after the release of Strawberry, a.k.a. ChatGPT o1 or OpenAI o1 😱:

We don’t have much information about the training with synthetic data, but the hypothesis of training the GPT LLM through Low-Rank Adaptation (LoRA) still holds. We even know that hordes of contributors, with or without PhDs, have helped enrich the data with new problems in mathematics, physics, or puzzles… Let’s be honest, this is what we call “plugging holes” through sheer manpower. For me, the true test is ARC-AGI, for which OpenAI o1 only manages a meager 21% [9].

The longer response time is easily explained: the model tries several solution approaches, then selects the best one by majority vote or another similar mechanism. To mystify, or out of anthropomorphism, OpenAI talks about “Chain of Thought,” but in fact it is a decomposition prompt (designated as such, it loses its magic). Decomposition prompts come in several strategies:

- the simple list of steps, i.e., “proceed step-by-step”;
- the tree-like decomposition prompt (Tree-of-Thought, ToT);
- the Skeleton-of-Thought (SoT) prompt, which asks the chatbot to start by generating a skeleton of the solution;
- the panel prompt (PanelGPT), which involves a group of experts with different roles to obtain different scenarios for completing a task;
- and finally the explicit request for different approaches, i.e., “propose/try different methods,” which corresponds quite well to the concept of “Chain of Thought.”

This is a well-known “advanced prompt engineering” technique 🤣. Proceeding step by step, or along several different approaches, enriches the context with the previous steps before the processing continues. That is the very principle of an autoregressive algorithm (the next word is predicted from all the past values). OpenAI is no clearer on this point… 😉 in their post, which is more marketing than technical and was probably generated with ChatGPT. 🙂
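To make the idea concrete, here is a minimal sketch of the “try several approaches, then vote” mechanism described above, in the spirit of self-consistency sampling. The `ask_llm` function is a hypothetical placeholder for a call to any chat-completion API, and the decomposition prompt and voting rule are deliberately simplified; this illustrates the principle, not what OpenAI actually does.

```python
from collections import Counter

def ask_llm(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical placeholder for a call to a chat-completion API."""
    raise NotImplementedError("plug in your favorite LLM client here")

def solve_with_majority_vote(problem: str, n_samples: int = 5) -> str:
    """Sample several step-by-step solutions, keep the most frequent final answer."""
    decomposition_prompt = (
        "Proceed step by step, then give the final answer on a line "
        f"starting with 'ANSWER:'.\n\nProblem: {problem}"
    )
    final_answers = []
    for _ in range(n_samples):
        completion = ask_llm(decomposition_prompt, temperature=0.8)
        # Keep only the final answer line for voting.
        for line in completion.splitlines():
            if line.startswith("ANSWER:"):
                final_answers.append(line.removeprefix("ANSWER:").strip())
                break
    # Majority vote over the sampled answers.
    best_answer, _ = Counter(final_answers).most_common(1)[0]
    return best_answer
```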

Where there might be something genuinely new is in the reinforcement learning used to generate, choose, and merge the best solutions. The exact formulation of this prompt is an OpenAI “trade secret.” OpenAI is only ‘Open’ in name.

Sam Altman’s tantalizing hints about OpenAI’s Project Strawberry [1], likely to become GPT-5, show that cutting-edge AI research is moving towards symbolic reasoning. Let’s hope Strawberry can count the exact number of Rs in its name [2].

For those still in doubt, blinded by anthropomorphism and the ELIZA effect, this is an admission of the limitations of the Large Language Model (LLM) approach in terms of reasoning. This is something many people, like Yann LeCun [3], François Chollet [4], and myself [5], have been tirelessly repeating since the release of ChatGPT, the first generative chatbot.

Previously known by the codename Q* (Q-star) [6], project Strawberry is expected to offer autonomous internet research (already included in SearchGPT [7]) and “considerably” improve the GPT model’s mathematical reasoning capabilities. It is presented as OpenAI’s effort to create Artificial General Intelligence (AGI) — an AI with capabilities similar to those of the human brain.

I expect an extension of ChatGPT’s reasoning capabilities, especially in mastering mathematical language, resulting from the generation of examples (synthetic data) to mitigate the lack of training data in mathematics.

In this respect, it would be a combination of a Large Language Model (LLM) and a symbolic reasoning system of the theorem-prover type, following the same approach as DeepMind’s AlphaProof and AlphaGeometry [8].

In AlphaProof, DeepMind uses a Large Language Model (LLM) and a theorem prover, reproducing “generate-and-test”, a classic meta-algorithm of good old-fashioned AI, but adding a “train” step that uses the solutions validated by the theorem prover to fine-tune the LLM (see illustration above).

Data generation, in the form of candidate solutions, is performed by an LLM. Since the generated solutions may be incorrect or incomplete, an external tester (or verifier), in fact a theorem prover based on symbolic logic, checks each solution proposed by the LLM. The verifier can also correct or complete the solution if necessary. Finally, the corrected solution is used to train the LLM through Low-Rank Adaptation (LoRA), which avoids having to retrain the entire pre-trained model.
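The LoRA step can be sketched with the Hugging Face PEFT library. The base model, rank, and target modules below are placeholders chosen for illustration; DeepMind’s actual models and hyperparameters are not public.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Placeholder base model; AlphaProof's actual generator model is not public.
base_model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Low-Rank Adaptation: only small adapter matrices are trained,
# while the pre-trained weights stay frozen.
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the adapters
    target_modules=["c_attn"],  # attention projections of GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model

# The solutions validated (and possibly corrected) by the theorem prover
# would then be tokenized and used as an ordinary causal-LM fine-tuning set.
```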

In AlphaProof, two LLMs are used. First, a formalization LLM converts the natural language problem description into a formal mathematical language. Then a second LLM serves as a solution generator. These solutions are then validated by a theorem prover based on symbolic logic. The corrected and completed proof is used as data to train the solution generator LLM.
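Putting the pieces together, this two-LLM pipeline boils down to a generate-and-test loop whose verified outputs become training data. All function names below are hypothetical stubs; the formal language could be Lean, which DeepMind reports using, and the checker stands in for the symbolic theorem prover.

```python
from dataclasses import dataclass

@dataclass
class VerifiedProof:
    formal_statement: str
    proof: str

def formalize(problem_in_natural_language: str) -> str:
    """Hypothetical formalizer LLM: natural language -> formal statement (e.g., Lean)."""
    raise NotImplementedError

def generate_candidate_proofs(formal_statement: str, n: int = 16) -> list[str]:
    """Hypothetical solver LLM: propose n candidate proofs."""
    raise NotImplementedError

def check_proof(formal_statement: str, proof: str) -> bool:
    """Hypothetical wrapper around a symbolic proof checker / theorem prover."""
    raise NotImplementedError

def generate_and_test(problems: list[str]) -> list[VerifiedProof]:
    """Classic generate-and-test loop that keeps only prover-validated proofs."""
    training_data: list[VerifiedProof] = []
    for problem in problems:
        statement = formalize(problem)
        for candidate in generate_candidate_proofs(statement):
            if check_proof(statement, candidate):
                training_data.append(VerifiedProof(statement, candidate))
                break  # one verified proof per problem is enough for this sketch
    return training_data

# The VerifiedProof records would then feed the LoRA fine-tuning step sketched
# above, closing AlphaProof's generate-test-train loop.
```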

In the same vein, other promising approaches are neurosymbolic AI and Generative Flow Networks (GFlowNets).

In conclusion, to progress, AI research is acknowledging the limitations of large language models (LLMs) based on deep learning and is advocating hybridization with good old-fashioned symbolic reasoning.

Notes and references:

*GOFAI: Good Old-Fashioned AI

  1. « OpenAI’s Project Strawberry will become ChatGPT-5, launch soon, and be better at math than any chatbot, insiders say » — TechRadar — August 28, 2024.
  2. Until recently, the simple query “How many Rs are in the word strawberry?” returned an incorrect result.
  3. « Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI » — Lex Fridman Podcast #416 — March 7, 2024.
  4. « Francois Chollet — LLMs won’t lead to AGI — $1,000,000 Prize to find true solution » — Dwarkesh Patel’s YouTube channel — June 11, 2024.
  5. « ChatGPT, un enthousiasme prudent s’impose… » — (In French) LinkedIn post by Claude Coulombe — December 7, 2022.
  6. « These Clues Hint at the True Nature of OpenAI’s Shadowy Q* Project » — Wired — November 30, 2023.
  7. Probably just by adding search results to the inference context of the generative chatbot.
  8. « AI achieves silver-medal standard solving International Mathematical Olympiad problems » — DeepMind Blog — July 25, 2024.
  9. OpenAI o1 Results on ARC-AGI-Pub — ARC Prize blog post — September 13, 2024.


Claude COULOMBE

Claude is a father, scientist, environmental activist, inventor and serial entrepreneur. Ph.D. in cognitive computer science with extensive experience in NLP.