Which generative AI chatbot should I use?
If you need to work with confidential, personal, or proprietary information:
Microsoft Copilot is included in Western’s institutional license with Microsoft. When logged in with your Western credentials, any data you input into Copilot receives the same protections as data in Outlook, OneDrive, or other enterprise Microsoft products.
You may use Copilot for confidential, personal, or proprietary information. However, remember that AI chatbots are tools, and sharing data with them requires careful risk management on a case-by-case basis. No blanket policy can declare a particular product completely “safe.”
To reiterate, these protections apply only when you are logged in with your Western credentials, and you are accountable for verifying this before sending confidential, personal, or proprietary information to Microsoft Copilot.
(Note: You can verify that you are logged into Copilot by checking for your UWO email address in the upper right-hand corner of the Copilot page.)
If your work does NOT involve any confidential, personal, or proprietary information and you want the most capable generative AI model available:
- ChatGPT from OpenAI now offers its most capable Large Language Model (LLM), GPT-4o, for free. Subscribing to ChatGPT Plus lets you send more queries before reaching the usage cap. GPT-4o is an advanced LLM suitable for many tasks.
- OpenAI has also released the first productized "reasoning-enhanced" LLMs in the o1 series. These models are currently available only to paying subscribers but are very good at tasks that require rigorous reasoning (e.g., science, math, coding, legal analysis). They are not, however, universally the best at everything.
- Gemini Advanced from Google offers a model of similar quality to GPT-4o. The default, free Gemini product uses a smaller model that is not competitive with the other top models. Access to Gemini Advanced requires a subscription. An especially strong use case for Gemini Advanced is queries involving books or other substantial amounts of text, thanks to its very long context window.
- Anthropic's Claude 3 and Claude 3.5. The Claude 3 series offers three models: Haiku, Sonnet, and Opus. The top model right now is Claude 3.5 Sonnet (Claude 3.5 Opus has not yet been released, and Claude 3.5 Sonnet outperforms the older Claude 3 Opus). Claude 3.5 Sonnet outperforms other frontier models on a range of benchmarks and appears to be particularly strong at coding.
Which one is the best?
There is no single “best” model at the moment. Personal preference and use case play a significant role in what will suit you best. As a starting point, you can’t go wrong with any of the models mentioned above. If you want to see a ranking based on consensus human preference, check out the LMSYS Chatbot Arena and click on the Leaderboard tab.
Bonus: If you want open LLM weights that you can run on your own machine:
A wide array of options is available, many of which you can explore on the Hugging Face Open LLM Leaderboard. If you are asking this question, you probably already know this.
Our recommendations for open weights, as of this writing, are:
- Meta’s Llama 3.1 models: The 70B parameter model is exceptionally capable for its size, surpassing even earlier GPT-4 models on some benchmarks. The 8B parameter model is much less capable, but still surprisingly good for its size; it can shine when fine-tuned for narrow tasks. The 405B model is competitive with the original GPT-4 release (though the resources required to run it locally are not to be underestimated!).
- Cohere’s Command R+: an excellent model, especially for integration into retrieval-augmented generation (RAG) and tool-use pipelines.
- Google's Gemma 2 models (9B and 27B) are extremely capable for their sizes.
- Alibaba's Qwen 2 72B outperforms others in its size class on some tasks.
- Mistral’s Mistral Large 2 (123B parameters) is very capable.
- Nvidia's Nemotron-4: a 340B parameter model that outperforms Llama 3 70B on some benchmarks, but the performance increase is small relative to the model's very large size. Running this model, even quantized, requires a significant amount of VRAM.
Be sure to read the licenses that come with these models. Open weights are not open source, and your use case may be constrained by these agreements, especially if it has commercial elements.
We recommend running quantized models, as the gains from running with full 16-bit floating-point weights are negligible (especially when weighed against the cost of VRAM). 8-bit quantized models are nearly as good as fp16, and even 5-bit quantization does relatively little damage to a model’s capabilities.
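To make this concrete, here is a minimal sketch of running a quantized (GGUF) model locally with the llama-cpp-python bindings. The model path is a placeholder for whatever quantized file you have downloaded (e.g., a 5-bit quantization of Llama 3.1 8B Instruct), and the specific parameters shown are assumptions you should adjust to your hardware.

```python
# Minimal sketch: local inference on a quantized (GGUF) model via llama-cpp-python.
# Install with `pip install llama-cpp-python`. The model path below is a
# placeholder -- point it at any quantized GGUF file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.1-8b-instruct-q5_k_m.gguf",  # placeholder path
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers to the GPU; set to 0 for CPU-only inference
)

# Chat-style completion against the quantized weights
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain retrieval-augmented generation in two sentences."}
    ]
)
print(response["choices"][0]["message"]["content"])
```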
High-end Apple silicon Macs (e.g., the M2 Ultra) with unified memory and high memory bandwidth (800 GB/s) can be a surprisingly affordable (and less energy-intensive) alternative to multiple GPUs for running inference on very large models. If you’re just getting started and want to experiment, cloud GPU services are also a good option.
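If you want a feel for why model size and quantization level dominate the hardware decision, the rough estimate below shows how much memory the weights alone occupy at a given bit width. The function name is ours, and the figures ignore the KV cache and runtime overhead, so treat them as lower bounds rather than exact requirements.

```python
# Back-of-the-envelope estimate of memory needed just to hold a model's weights
# at a given quantization level. Real usage is higher once the KV cache and
# activations are included.

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes: parameters x bits per weight / 8."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for params, bits in [(8, 5), (70, 5), (70, 8), (405, 5)]:
    print(f"{params}B model at {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
```

For example, a 70B model at 5-bit quantization needs roughly 44 GB for the weights alone, which is why a single consumer GPU is not enough but a high-memory Mac or a multi-GPU rig can work.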