Lean Draft Sketch Prove (DSP)

Based on Sean Welleck's DSP for Isabelle: https://github.com/wellecks/ntptutorial/tree/main/partII_dsp

Execution

First, build the Lean project used by the experiment:

# experiments/dsp
cd lean_src_proj
lake build

Then run main.py to see the available options:

python3 main.py -h

The main command for running DSP is eval. Because datasets come in many formats, use the --format flag to specify the format of the input data. For example, to run DSP on MiniF2F:

python3 main.py eval \
    --dataset ../minif2f/valid.jsonl \
    --format minif2f \
    --output results-minif2f-valid
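
If you are unsure whether a dataset matches what a given --format value expects, it can help to peek at a few records first. Below is a minimal Python sketch; the path mirrors the example above, and the field names printed depend entirely on your local copy of the dataset:

# Inspect the first few records of a JSONL dataset before running `eval`.
import json

with open("../minif2f/valid.jsonl") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(sorted(record.keys()))  # which fields does each record carry?
        if i >= 2:
            break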

Then use plot.py to generate the plots:

python3 plot.py \
    --result results-minif2f-{valid,test} \
    --names valid test \
    --plot-output output-plot

Tony's AF

Tony's original AF paper (Yuhuai et al.): https://arxiv.org/abs/2205.12615. The paper improves the MiniF2F pass rate from 29.6% to 35.2%, a gain of 5.6 percentage points.

Expert Iteration:

  • AF used: "We explore if one can improve neural theorem provers by training the neural models on proofs of automatically translated theorems".
    • They only translate the problem theorems (nl_thm := "problem + answer"), then use a prover to get the formal proof.
  • ExpIt algorithm (sketched in code after this list):
    • M_0 := Isabelle_Thor()
    • Search/Prover := Best_First_Search() # TODO recall best first search
    • ExpIt.fine_tune := "train the model to predict the next proof_step/tactic given the current proof_state and previous proof_step, on successful proofs."
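
A rough Python sketch of the expert-iteration loop described above; best_first_search and fine_tune are hypothetical placeholders for the paper's prover and training step, not code from this repo:

# Hypothetical placeholders standing in for the paper's components.
def best_first_search(model, theorem):
    """Search for a proof of `theorem` guided by `model`; return a proof or None."""
    ...

def fine_tune(model, successful_proofs):
    """Fine-tune `model` on (proof_state, previous_step) -> next_step pairs."""
    ...

def expert_iteration(model, autoformalized_theorems, rounds: int):
    for _ in range(rounds):
        successful_proofs = []
        for theorem in autoformalized_theorems:
            proof = best_first_search(model, theorem)
            if proof is not None:
                successful_proofs.append(proof)
        # Train only on the proofs that the search actually closed.
        model = fine_tune(model, successful_proofs)
    return model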

Base Model for Neural Theorem Prover (NTP):

  • Thor_GPT2 := "We use a pre-trained and fine-tuned Thor based on a GPT-2 with 700M non-embedding parameters." Note: ReProver used a 299M-parameter encoder-decoder.
  • fine-tuned on The Pile (arXiv + GitHub subsets)

Neural Theorem Prover (NTP) for M_0:

  • Thor :=
    • The Thor agent is fine-tuned on the PISA dataset, which consists of 2.49 million proof steps from the Isabelle/HOL library.
    • The model is trained with the objective of predicting the next token in a proof step, given the proof state and the last proof step (see the sketch after this list).
    • proof step := "tactic in Isabelle" #TODO confirm with Albert https://twitter.com/messages/1253358235-1267913180153548800
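
A minimal sketch of how one such training example could be assembled; the exact prompt layout Thor uses is not reproduced here, so the separator and field names below are assumptions:

# Hypothetical formatting of a single training example: given the current
# proof state and the previous proof step, the target is the next proof step,
# which the model learns to predict token by token.
def make_training_example(proof_state: str, last_step: str, next_step: str) -> dict:
    prompt = f"{proof_state}\n{last_step}\n"  # assumed separator, not Thor's actual format
    return {"input": prompt, "target": next_step}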

Questions:

  • Q1: What does "we perform deduplication by problem statements" mean, and when does it matter? All MATH training problems are unique, so why would I care about this?

Idea:

  • Idea 1: use the formal ground-truth solution string in MATH, implement Draft Sketch Prove (DSP) for Lean 4, and use a symbolic/NTP solver (hammer/tidy/ReProver); a rough sketch follows below.
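
A high-level sketch of what Idea 1 could look like; draft, sketch, and prove are hypothetical stand-ins for an LLM drafter, an LLM sketcher, and a symbolic/NTP gap-closer (hammer/tidy/ReProver), not the code in main.py:

# draft, sketch, prove are hypothetical stand-ins (see text above).
def dsp_attempt(informal_statement, ground_truth_solution, formal_statement):
    # Draft: write an informal proof, optionally guided by the MATH ground-truth solution string.
    informal_proof = draft(informal_statement, ground_truth_solution)
    # Sketch: translate the informal proof into a Lean 4 proof skeleton with `sorry` gaps.
    proof_sketch = sketch(formal_statement, informal_proof)
    # Prove: try to close each remaining gap with a symbolic/NTP solver.
    return prove(proof_sketch)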