Pantograph/docs/data.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fe7a3037-5c49-4097-9a5d-575b958cc7f8",
   "metadata": {},
   "source": [
    "# Data Extraction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "fc68ad1d-e64c-48b7-9461-50d872d30473",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from pathlib import Path\n",
    "from pantograph.server import Server"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd13c644-d731-4f81-964e-584bbd43e51c",
   "metadata": {},
   "source": [
    "## Tactic Invocation\n",
    "\n",
    "Pantograph can extract tactic invocation data from a Lean file. A **tactic\n",
    "invocation** is a tuple containing the before and after goal states, and the\n",
    "tactic which converts the \"before\" state to the \"after\" state.\n",
    "\n",
    "To extract tactic invocation data, use `server.tactic_invocations(file_name)`\n",
    "and supply the file name of the input Lean file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6282dc6f-4eac-4263-8277-9d54d19ad1a5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "$PWD: /home/aniva/Projects/atp/PyPantograph/examples/Example\n"
     ]
    }
   ],
   "source": [
    "project_path = Path(os.getcwd()).parent.resolve() / 'examples/Example'\n",
    "print(f\"$PWD: {project_path}\")\n",
    "server = Server(imports=['Example'], project_path=project_path)\n",
    "units = server.tactic_invocations(project_path / \"Example.lean\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3c1be91-27a5-4481-b09d-a32dbb94b058",
   "metadata": {},
   "source": [
    "The function returns a list of `CompilationUnit` objects, corresponding to each compilation unit in the input Lean file. For performance reasons only the text boundaries are loaded into `CompilationUnit`s."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e994aa2b-5d5e-4f86-af6c-40e0b3a032d2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#0: [14,85]\n",
      "/-- Ensure that Aesop is running -/\n",
      "example : α → α :=\n",
      "  by aesop\n",
      "\n",
      "\n",
      "#1: [85,254]\n",
      "example : ∀ (p q: Prop), p ∨ q → q ∨ p := by\n",
      "  intro p q h\n",
      "  -- Here are some comments\n",
      "  cases h\n",
      "  . apply Or.inr\n",
      "    assumption\n",
      "  . apply Or.inl\n",
      "    assumption\n",
      "\n"
     ]
    }
   ],
   "source": [
    "with open(project_path / \"Example.lean\", 'rb') as f:\n",
    "    content = f.read()\n",
    "    for i, unit in enumerate(units):\n",
    "        print(f\"#{i}: [{unit.i_begin},{unit.i_end}]\")\n",
    "        unit_text = content[unit.i_begin:unit.i_end].decode('utf-8')\n",
    "        print(unit_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52e650fc-4a87-445f-8aa8-707ed9e36c03",
   "metadata": {},
   "source": [
    "Each `CompilationUnit` includes a list of `TacticInvocation`s, which contains the `.before` (corresponding to the state before the tactic), `.after` (corresponding to the state after the tactic), and `.tactic` (tactic executed) fields. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8e0f0def-dd3c-4550-8a7c-b4aec6c7fd7f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Before]\n",
      "α : Sort ?u.7\n",
      "⊢ α → α\n",
      "[Tactic]\n",
      "aesop (using [])\n",
      "[After]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for i in units[0].invocations:\n",
    "    print(f\"[Before]\\n{i.before}\")\n",
    "    print(f\"[Tactic]\\n{i.tactic} (using {i.used_constants})\")\n",
    "    print(f\"[After]\\n{i.after}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "51f5398b-5416-4dc1-81cd-6d2514758232",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Before]\n",
      "⊢ ∀ (p q : Prop), p ∨ q → q ∨ p\n",
      "[Tactic]\n",
      "intro p q h (using [])\n",
      "[After]\n",
      "p q : Prop\n",
      "h : p ∨ q\n",
      "⊢ q ∨ p\n",
      "[Before]\n",
      "p q : Prop\n",
      "h : p ∨ q\n",
      "⊢ q ∨ p\n",
      "[Tactic]\n",
      "cases h (using ['Eq.refl', 'Or'])\n",
      "[After]\n",
      "case inl\n",
      "p q : Prop\n",
      "h✝ : p\n",
      "⊢ q ∨ p\n",
      "case inr p q : Prop h✝ : q ⊢ q ∨ p\n",
      "[Before]\n",
      "case inl\n",
      "p q : Prop\n",
      "h✝ : p\n",
      "⊢ q ∨ p\n",
      "[Tactic]\n",
      "apply Or.inr (using ['Or.inr'])\n",
      "[After]\n",
      "case inl.h\n",
      "p q : Prop\n",
      "h✝ : p\n",
      "⊢ p\n",
      "[Before]\n",
      "case inl.h\n",
      "p q : Prop\n",
      "h✝ : p\n",
      "⊢ p\n",
      "[Tactic]\n",
      "assumption (using [])\n",
      "[After]\n",
      "\n",
      "[Before]\n",
      "case inr\n",
      "p q : Prop\n",
      "h✝ : q\n",
      "⊢ q ∨ p\n",
      "[Tactic]\n",
      "apply Or.inl (using ['Or.inl'])\n",
      "[After]\n",
      "case inr.h\n",
      "p q : Prop\n",
      "h✝ : q\n",
      "⊢ q\n",
      "[Before]\n",
      "case inr.h\n",
      "p q : Prop\n",
      "h✝ : q\n",
      "⊢ q\n",
      "[Tactic]\n",
      "assumption (using [])\n",
      "[After]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for i in units[1].invocations:\n",
    "    print(f\"[Before]\\n{i.before}\")\n",
    "    print(f\"[Tactic]\\n{i.tactic} (using {i.used_constants})\")\n",
    "    print(f\"[After]\\n{i.after}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}