Posts

Fine-Tuning Instability for Large Language Models

This blog post investigates the current state of instability in fine-tuning large language models (LLMs) and some of the improvements made since the advent of BERT. In particular, it finds that the most serious forms of instability, failed fine-tuning runs and catastrophic forgetting, remain an issue. It also takes an in-depth look at \(L^2\)-SP regularization, a technique claimed to mitigate them.

August 31, 2022

Investigating Reinforcement Learning for Extremal Combinatorics

Reinforcement learning is used to construct counterexamples to a conjecture relating the index and matching number of a graph. The possibility of applying RL to Sperner families is also investigated. This is posted as a short online book.

March 27, 2022

Transformer Implementation with the High-Level Keras API

The purpose of this post is to gain an understanding of the transformer architecture by constructing a transformer from scratch. What differentiates our construction from the numerous other tutorials is the use of the high-level Keras API. An in-depth explanation of the transformer model is also given, which may be especially helpful for readers coming from a mathematical background. Model Construction and Explanation: this content is posted as an online book or, alternatively, as a PDF book.

June 22, 2021

Text Classification with the High-Level TensorFlow API

This post was originally published as a Medium article of the same title, Text Classification with the High-Level TensorFlow API. It is reproduced here so that a Medium membership is not required for access. Update: the TensorFlow API has changed somewhat since this article was written. Notably, the article predates TensorFlow version 2 and uses the now-outdated version 1. In this blog post we share our experience, in considerable detail, with using some of the high-level TensorFlow frameworks for a client’s text classification project.

April 3, 2018