Which generative AI chatbot should I use?
If you need to work with confidential or sensitive information:
Microsoft Copilot Chat and M365 Copilot are included in Western's institutional license with Microsoft. When logged in with your Western credentials, your data receives enterprise-level protections comparable to those applied to Outlook, OneDrive, and other enterprise Microsoft products.
This means Copilot is generally appropriate for confidential or sensitive information, though personally identifiable information (PII) such as student and staff numbers, addresses, academic records, and health records should still be avoided. AI tools require thoughtful, case-by-case judgment, and no blanket policy can declare any product completely "safe."
To reiterate, these protections apply only if you are logged in with your Western credentials, and you are accountable for verifying that you are logged in before entering sensitive data. You can confirm you are logged into Copilot by looking for the green shield icon in the upper right-hand corner of the Copilot page.
If you are working with personal information, a Privacy Impact Assessment (PIA) is mandatory as of July 1, 2025. Please see the privacy website for details.
For more information about Western's Data Classification Standard, please visit the Cybersmart website.
If your work does NOT involve any confidential or sensitive information and you want the most capable generative AI model available:
- ChatGPT from OpenAI offers a free tier that lets you interact with powerful language models, subject to usage limits and stricter rate caps on advanced features such as file uploads and some forms of reasoning. Paid subscription plans (Plus, Pro, and Business/Enterprise) unlock higher usage limits and access to more capable models and features.
- Gemini from Google offers both free and paid tiers. The free tier provides a highly capable model optimized for fast, general assistance with common tasks such as summarization and drafting. The paid tier, Gemini Advanced, uses a significantly more powerful, state-of-the-art model that excels at complex analysis and advanced reasoning and can process much larger documents and longer passages of text.
- Claude from Anthropic offers both free and paid tiers, with a range of models at different capability and price levels (Haiku, Sonnet, and Opus). Claude is known for strong performance in writing, analysis, coding, and handling nuanced or complex instructions. Anthropic emphasizes safety and helpfulness in Claude's design, and the models tend to be particularly capable at following detailed guidance and producing thoughtful, well-structured responses.
Which one is the best?
There is no “best” model at the moment. Personal preference and use case play a significant role in what will suit you best. As a starting point, you can’t go wrong with any of the models mentioned above. If you want to see a ranking based on consensus human preference, check out the LMSYS Chatbot Arena and click on the Leaderboard tab.
Bonus: If you want open LLM weights that you can run on your own machine:
A wide array of options is available, many of which you can explore on the Hugging Face Open LLM Leaderboard. If you are asking this question, you probably already know this.
Our recommendations for open weights as of July 2024 are:
- Meta’s Llama 3.1 models: The 70B parameter model is exceptionally capable for its size, surpassing even earlier GPT-4 models on some benchmarks. The 8B parameter model is much less capable, but still surprisingly good for its size; it can shine when finetuned for narrow tasks. The 405B model is competitive with the original GPT-4 release (though the resources required to run it locally are not to be underestimated!).
- Cohere’s Command R+: an excellent model especially for integration into a Retrieval-augmented generation (RAG) and tool use pipeline.
- Google's Gemma 2 models (9B and 27B) are extremely capable for their sizes.
- Alibaba's Qwen2 72B outperforms others in its size class on some tasks.
- Mistral’s Mistral Large 2 (123B parameters) is very capable.
- Nvidia's Nemotron-4: a 340B-parameter model that outperforms Llama 3 70B on some benchmarks, but the improvement is small relative to the model's very large size. Running this model, even quantized, requires a significant amount of VRAM.
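To make the RAG mention above concrete, here is a minimal sketch of the retrieval step only: a toy bag-of-words cosine similarity stands in for real embeddings, the documents and query are invented for illustration, and the generation step (passing retrieved text to a model) is omitted entirely.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query.

    Uses naive whitespace tokenization; a real pipeline would use
    an embedding model and a vector store instead.
    """
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

docs = [
    "Copilot is covered by Western's institutional Microsoft license",
    "Quantized models trade a little accuracy for much less VRAM",
]
print(retrieve("how much VRAM do quantized models need", docs))
# → ['Quantized models trade a little accuracy for much less VRAM']
```

In a full RAG pipeline, the retrieved passages would then be inserted into the model's prompt so its answer is grounded in your own documents rather than in its training data alone.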
Be sure to read the licenses that come with these models. Open weights are not open source, and your use case may be constrained by these agreements, especially if it has commercial elements.
We recommend running quantized models, as the gains from running full 16-bit floating-point weights are negligible (especially when compared to the cost of VRAM!). 8-bit quantized models are nearly as good as fp16, and even 5-bit quantization does relatively little damage to a model’s capabilities.
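As a rough rule of thumb, the memory needed for the weights alone is parameters × bits-per-weight ÷ 8 bytes; the KV cache, activations, and runtime overhead add several more gigabytes on top. A quick back-of-the-envelope sketch (the function is our own illustration, not from any library):

```python
def weight_memory_gb(n_params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory footprint of model weights alone, in GB.

    Ignores KV cache, activations, and runtime overhead, which add
    several more gigabytes in practice.
    """
    bytes_total = n_params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # using 1 GB = 10^9 bytes

# A 70B-parameter model at various precisions:
for bits in (16, 8, 5):
    print(f"{bits}-bit: ~{weight_memory_gb(70, bits):.0f} GB")
# 16-bit: ~140 GB
# 8-bit: ~70 GB
# 5-bit: ~44 GB
```

This is why 5-bit quantization is attractive: it brings a 70B model from roughly 140 GB down to around 44 GB of weight memory for a modest loss in quality.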
High-end Apple silicon Macs (e.g., the M2 Ultra), with unified memory and decent memory bandwidth (800 GB/s), can be a surprisingly affordable (and less energy-intensive) alternative to multiple GPUs for running inference on very large models. If you’re just getting started and want to experiment, cloud GPU services are a good alternative, too.