Developing and fine-tuning machine learning models on a MacBook Pro with the M1 Max chip can be an exhilarating experience, thanks to the chip's impressive capabilities and performance optimizations. For developers ready to leverage this hardware, here is a practical guide to improving your machine learning workflows and getting the most out of your device. Based on my experience fine-tuning on the Mac M1 Max, this post provides actionable advice for optimizing your setup.
First and foremost, make sure TensorFlow and the Metal plugin are installed and running. Instructions can be found here. That guide also includes a nice small GPU project you can run to validate GPU efficiency.
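Assuming a recent Python on Apple silicon, the setup typically looks like the following sketch; the virtual-environment path is just an example, while `tensorflow` and `tensorflow-metal` are the actual package names for TensorFlow and Apple's Metal plugin:

```shell
# Create an isolated environment for the Metal-accelerated TensorFlow stack
python3 -m venv ~/venvs/tf-metal
source ~/venvs/tf-metal/bin/activate

# TensorFlow plus Apple's Metal plugin for GPU acceleration
pip install tensorflow tensorflow-metal

# Quick sanity check: the M1 Max GPU should appear in the device list
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

If the last command prints an empty list, the Metal plugin is not active and training will silently fall back to the CPU.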
The first training and testing project is an image classification algorithm. The CIFAR-100 dataset consists of 60,000 32x32 colour images in 100 classes, with 600 images per class: 500 training images and 100 testing images per class, for 50,000 training images and 10,000 test images in total. The 100 classes are grouped into 20 superclasses, so each image carries two labels - a fine label (the actual class) and a coarse label (the superclass).
More details about this can be found here.
Here is what a sample training record looks like:
{
'img': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32 at 0x2767F58E080>, 'fine_label': 19,
'coarse_label': 11
}
The training data set is about 50k images and the testing data about 10k. Training usually runs for 5 epochs of 782 steps each, consuming a little under 1 GB per step. By the end, your GPU will have processed roughly 0.75 GB x 782 steps x 5 epochs ≈ 2.9 TB of data, taking 4 hours minimum on a Mac M1 Max with 64 GB RAM and a max cache of 24 GB.
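The back-of-the-envelope numbers above are easy to check; the ~0.75 GB-per-step figure is an observed estimate from my runs, not a fixed constant:

```python
# Rough estimate of total data moved through the GPU during training
gb_per_step = 0.75      # approximate memory processed per training step
steps_per_epoch = 782   # ceil(50_000 images / batch size of 64)
epochs = 5

total_gb = gb_per_step * steps_per_epoch * epochs
total_tb = total_gb / 1024  # binary terabytes

print(f"{total_gb:.0f} GB, roughly {total_tb:.1f} TB")
```

Scaling `gb_per_step` up or down to match what you observe in a GPU monitor gives a quick feel for how long a full run will tie up the machine.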
Things to watch: pay attention to the loss function, accuracy, per-epoch execution time, and the corresponding processor load. Keep overall device utilization low enough that renderer utilization stays under 25%.
Congratulations!! You are now a Mac-hardware-capable AI fine-tuner. Remember, LLM fine-tuning is a whole different ball game - the example above is only a warm-up of the ML execution capabilities of your Mac. The high-intensity workloads are yet to begin.
What next?
Here are some tips to consider:
1. Leverage the Power of Core ML
Take full advantage of Apple’s Core ML tools, which are optimized for the M1 Max architecture. Core ML can dramatically accelerate machine learning tasks by utilizing the full capability of the hardware. Convert your models to Core ML format to see reduced inference latency and increased processing speed.
Start with Ollama. I have created a small Jupyter notebook that can be used as a starter.
Use the GPU Pro app from the App Store to monitor GPU load, utilization, and memory in use. Watch the spike when you actually run LLM or fine-tuning loads to profile memory and GPU usage.
2. Optimize Training Parameters
Begin by configuring your training parameters specifically for the M1 Max’s capabilities. Adjust num_train_epochs, per_device_train_batch_size, and gradient_accumulation_steps to find the right balance that maximizes the throughput and efficiency of your GPU and CPU.
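These three knobs interact: the product of batch size and accumulation steps is the effective batch size the optimizer sees. A minimal sketch using a plain dict (the parameter names match Hugging Face's TrainingArguments, but the values are illustrative assumptions, not tuned recommendations):

```python
# Illustrative starting point for a Trainer-style configuration on an M1 Max.
# Raise per_device_train_batch_size until memory pressure appears, then use
# gradient accumulation to reach the effective batch size you actually want.
train_config = {
    "num_train_epochs": 3,             # more epochs = more GPU hours
    "per_device_train_batch_size": 4,  # limited by unified memory
    "gradient_accumulation_steps": 8,  # simulates a larger batch
}

# The effective batch size is what drives gradient quality
effective_batch = (train_config["per_device_train_batch_size"]
                   * train_config["gradient_accumulation_steps"])
print(effective_batch)  # 32
```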
3. Use Optimizers Smartly
Opt for modern optimizers like AdamW or paged AdamW, which are more efficient and better suited for the kinds of parallel computations possible on M1 Max. This choice helps in reducing memory overhead and enhances the speed of convergence.
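To see why AdamW behaves differently from classic Adam-with-L2, here is a pure-Python sketch of one AdamW update for a single scalar parameter, following Loshchilov and Hutter's decoupled weight decay; the hyperparameter values are the common defaults, chosen here purely for illustration:

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    The weight decay is decoupled: it is applied directly to the
    parameter, not mixed into the gradient as L2 regularization.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * theta)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adamw_step(theta, grad=0.5, m=m, v=v, t=1)
print(theta)  # slightly below 1.0: gradient step plus decay pull toward zero
```

The "paged" variants add memory paging on top of this same update rule, which is what keeps optimizer state from blowing past the unified-memory budget.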
4. Implement Learning Rate Schedulers
Employ learning rate schedulers to dynamically adjust the learning rate during training. The M1 Max’s fast computation speeds allow you to experiment with more aggressive schedules, potentially speeding up training times without sacrificing accuracy.
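One common "aggressive" schedule is linear warmup followed by cosine decay. A minimal self-contained sketch (the step counts and peak rate are assumed values for illustration):

```python
import math

def lr_at_step(step, total_steps, warmup_steps, peak_lr=2e-4):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(lr_at_step(0, 1000, 100))     # 0.0 at the start of warmup
print(lr_at_step(100, 1000, 100))   # peak (2e-4) when warmup ends
print(lr_at_step(1000, 1000, 100))  # back to 0.0 at the final step
```

Shortening `warmup_steps` or decaying faster is exactly the kind of aggressiveness the M1 Max's speed lets you iterate on cheaply.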
5. Enable Mixed Precision Training
Utilize the mixed precision training capabilities of the M1 Max by setting fp16 to True. This will allow you to double the batch size or speed up training, making efficient use of the M1 Max’s GPU architecture.
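The reason fp16 roughly doubles your headroom is simple arithmetic: half the bytes per weight. A quick sketch for the model weights alone (the 7B parameter count is an arbitrary example, and optimizer state and activations are excluded):

```python
# Memory for model weights alone, fp32 vs fp16
params = 7_000_000_000            # e.g. a 7B-parameter model
fp32_gb = params * 4 / 1024**3    # 4 bytes per float32
fp16_gb = params * 2 / 1024**3    # 2 bytes per float16

print(f"fp32: {fp32_gb:.1f} GB, fp16: {fp16_gb:.1f} GB")
```

On a 64 GB unified-memory machine, that difference is often what decides whether a model fits at all.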
6. Monitor with TensorBoard or GPU Pro
Integrate TensorBoard to track and visualize model training. This tool is invaluable for monitoring model performance in real-time, helping you make informed adjustments to training parameters based on live data.
7. Regularize to Prevent Overfitting
With powerful hardware, there’s a temptation to build larger models that might overfit. Employ regularization techniques like weight decay (weight_decay=0.001) to prevent this. Regularization ensures that your model generalizes well to unseen data.
8. Implement Gradient Clipping
To avoid issues with exploding gradients, especially in large models or complex datasets, implement gradient clipping (max_grad_norm=0.3). This practice ensures stable and reliable model training under various data conditions.
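What max_grad_norm actually does is scale every gradient down uniformly whenever their combined L2 norm exceeds the threshold. A pure-Python sketch with made-up gradient values:

```python
import math

def clip_by_global_norm(grads, max_norm=0.3):
    """Scale all gradients uniformly if their global L2 norm exceeds max_norm."""
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= max_norm:
        return grads  # already within bounds, leave untouched
    scale = max_norm / global_norm
    return [g * scale for g in grads]

grads = [0.4, -0.3, 0.2]  # global norm ~0.54, above the 0.3 threshold
clipped = clip_by_global_norm(grads)
print(math.sqrt(sum(g * g for g in clipped)))  # norm clipped to 0.3
```

Because the scaling is uniform, the gradient's direction is preserved; only its magnitude is capped.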
9. Experiment with Batch Sizes and Group Data
Experiment with different per_device_train_batch_size settings to optimize memory usage and processing speed. Also, enable group_by_length to minimize padding in datasets, which is particularly effective when processing long sequences of data.
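The padding savings from grouping by length are easy to demonstrate with a toy example; the sequence lengths below are invented for illustration, and each batch is assumed to pad to its longest member:

```python
def padded_tokens(batches):
    """Total tokens, including padding, when each batch pads to its longest sequence."""
    return sum(len(b) * max(b) for b in batches)

lengths = [10, 500, 12, 480, 9, 510, 11, 490]  # hypothetical sequence lengths
batch_size = 2

# Naive order: short and long sequences mixed, so every batch pads heavily
naive = [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]

# Grouped by length: similar-length sequences land in the same batch
by_len = sorted(lengths)
grouped = [by_len[i:i + batch_size] for i in range(0, len(by_len), batch_size)]

print(padded_tokens(naive), padded_tokens(grouped))  # grouped is far smaller
```

Every padding token still costs compute, so less padding translates directly into faster steps.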
10. Utilize Efficient Data Handling
Make use of the M1 Max’s fast SSD by optimizing data loading and preprocessing. Efficient data handling reduces I/O bottlenecks, enabling faster iterations during model training.
11. Adapt to the Ecosystem
The macOS and the M1 Max architecture support a unique ecosystem. Ensure that all your tools, libraries, and dependencies are updated and optimized for ARM architecture to avoid compatibility issues.
12. Use Virtualization Sparingly
While tools like Docker and virtual machines are invaluable, they might not yet be fully optimized for M1 chips. Use native tools whenever possible to avoid the overhead that comes with emulation or virtualization.
13. Stay Updated
Keep your system and all machine learning libraries up to date. Apple frequently releases updates that enhance the performance and capabilities of their chips, which can directly improve the efficiency of your machine learning workflows.
14. Engage with the Community
Participate in forums and communities that focus on machine learning with Apple hardware. Exchanging tips and solutions with other developers can help you solve unique challenges and stay ahead of the curve.
Conclusion
Fine-tuning machine learning models on the MacBook Pro M1 Max opens up a realm of possibilities for developers. By optimizing your setup and workflow as suggested, you can truly capitalize on this powerful hardware, pushing the boundaries of what your machine learning models can achieve.