Download a file from Hugging Face: This means: If you want to run this model with more context, you have to set the KV cache to a smaller size: This way it still runs on a MacBook Pro M1 with 32 GB of RAM. Note that if you later want to run the same model, just use […]
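
To make the trade-off between context length and KV cache size more concrete, here is a rough back-of-the-envelope sketch. It assumes a standard transformer that stores one key and one value vector per layer, per KV head, per token. The model dimensions (`N_LAYERS`, `N_KV_HEADS`, `HEAD_DIM`) are hypothetical placeholders, not the values of the model discussed here, and the 1-byte figure for a quantized cache ignores any per-block overhead a real quantized format may add.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_element):
    """Approximate KV cache size in bytes.

    The leading factor 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_element


# Hypothetical model dimensions, chosen only for illustration.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128

# 16-bit cache (2 bytes per element) at 8k tokens of context.
fp16_8k = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, 8192, 2)

# Roughly 8-bit cache (assumed 1 byte per element) at 16k tokens of context.
q8_16k = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, 16384, 1)

print(f"16-bit cache,  8k context: {fp16_8k / 2**30:.2f} GiB")
print(f" 8-bit cache, 16k context: {q8_16k / 2**30:.2f} GiB")
```

Under these assumptions, halving the bytes per cache element leaves room for about twice the context in the same amount of memory, which is why a smaller KV cache type helps on a machine with fixed RAM.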