JAY ZENITH

I enjoy training, evaluating, and serving LLM agents - right now that's glyph, a Rust tool-use agent taken from SFT through verifier RL against strict whole-trace evals.

  • glyph

    Rust tool-use agent on a 4B-parameter model, taken from SFT through verifier RL against strict whole-trace evals. I own the whole stack, from synthetic data to pass@k analysis. [blog]

  • llama.cpp - CUDA upstream

    Added GGML_OP_FILL on the CUDA backend for the Qwen3-Next path; removed a CPU fallback. Merged upstream.

  • llama.cpp - sampling hot path

    Cut a redundant O(vocab) allocation from token sampling. 1.9-2.2x speedup on the sampling microbenchmark.

  • mini-sglang - Mistral support

    Sliding-window attention for Mistral-7B, validated against Hugging Face past 6k tokens.

  • pd disaggregation benchmark

    A/B harness comparing split prefill/decode serving against colocated serving.