Why AI PCs Are Not For Developers

Do developers need AI PCs?

Right now, there is no compelling reason for developers to buy an AI PC to run local AI models on its new AI processors.

AI PCs still have too many problems: insufficient hardware capability, models that simply aren't available, and development tools that are a headache to deploy.

For several months, I have been trying to run LLMs locally, offline, on so-called AI PCs running Windows 11 with dedicated AI processors. These laptops carry Intel and Qualcomm chips with neural processing units (NPUs) designed specifically for AI.

Microsoft has touted AI PC support for lightweight AI models such as Meta's Llama 2 and its own Phi Silica.

My attempts to load these models onto my PC were extremely frustrating, and each step was a struggle. First came finding lightweight models compatible with the neural processors in Qualcomm and Intel chips; then came loading the Jupyter notebooks and neural networks needed to run these small language models (SLMs).

When I did get a model running, I discovered that the SLM did not use the dedicated AI processor at all; it relied on the GPU or CPU instead.
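You can check where inference actually lands yourself. Here is a minimal sketch using ONNX Runtime, assuming a model file at a placeholder path; the providers a session binds to tell you which silicon will do the work:

```python
# Minimal sketch: list which ONNX Runtime execution providers exist in this
# build, and which ones a session actually binds to. "model.onnx" is a
# placeholder path, not a specific model from this article.
import onnxruntime as ort

# NPU-backed providers (QNN, OpenVINO) only appear here if the matching
# package and drivers are installed; otherwise you see CPU/GPU providers.
print(ort.get_available_providers())

session = ort.InferenceSession("model.onnx", providers=ort.get_available_providers())
print(session.get_providers())  # what the session actually selected
```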

1. PC vendors’ AI hype

Microsoft announced Copilot+ PCs at this year's Build conference. The first Copilot+ PCs are equipped with hardware that can run inference on the device, avoiding trips to the cloud.

Copilot+ PCs have minimum hardware requirements, including an NPU delivering at least 40 TOPS (tera operations per second) of AI performance. The first batch of AI PCs equipped with Qualcomm Snapdragon chips meets this requirement.

Microsoft CEO Satya Nadella said the company has prepared more than 40 models that can run locally on Copilot+ PCs. One of them is Phi Silica, a 3.8-billion-parameter SLM.

The DirectML and ONNX runtimes allow users to run Phi-3 models on Windows devices, but they were not ready when the Qualcomm chips were released. Qualcomm provides a list of supported AI models through its AI Development Center.
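For what it's worth, the DirectML path itself is straightforward once the runtime is available. A hedged sketch, assuming the onnxruntime-directml package and an exported Phi-3 ONNX file (the filename is a placeholder); note that DirectML targets the GPU, not the NPU:

```python
# Sketch: opening an ONNX model with ONNX Runtime's DirectML provider on
# Windows (pip install onnxruntime-directml). "phi3-mini.onnx" is a
# placeholder for an exported Phi-3 model file.
import onnxruntime as ort

session = ort.InferenceSession(
    "phi3-mini.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # confirms whether DirectML was picked up
```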

My early attempts to load Llama v2 simply did not work. I sought Qualcomm's help loading the model, with no clear resolution.

Creating Jupyter notebooks using Qualcomm's recommended tools was confusing, and I was unable to load any AI model manually. Qualcomm's recommendation to download the ONNX runtime to take advantage of the NPU was also puzzling.
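For reference, the NPU route Qualcomm points to is ONNX Runtime's QNN execution provider. A sketch under stated assumptions: the onnxruntime-qnn package is installed on Windows on Arm, and the model is quantized, since QNN's NPU backend generally expects quantized models; the filename is a placeholder:

```python
# Sketch: targeting the Snapdragon NPU through ONNX Runtime's QNN execution
# provider. "QnnHtp.dll" selects the Hexagon (NPU) backend; "model_quant.onnx"
# is a placeholder for a quantized ONNX model.
import onnxruntime as ort

session = ort.InferenceSession(
    "model_quant.onnx",
    providers=[("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})],
)
print(session.get_providers())
```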

Recently, LM Studio released a version of its AI software for Qualcomm chips.

I loaded the 8-billion-parameter Llama v3.1 model in LM Studio, but it used only the Snapdragon CPU, not the GPU or NPU. It generated 17.34 tokens per second, and memory usage hit 87% after just a few queries.
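LM Studio does at least make local models easy to script against: it can expose an OpenAI-compatible server on localhost, port 1234 by default. A sketch of querying it, assuming the server is running with a Llama v3.1 model loaded (the model identifier is a placeholder; use whatever LM Studio displays):

```python
# Sketch: querying LM Studio's local OpenAI-compatible endpoint using only
# the standard library. Assumes LM Studio's server is running on its
# default port with a model loaded.
import json
import urllib.request

payload = {
    "model": "llama-3.1-8b-instruct",  # placeholder identifier
    "messages": [{"role": "user", "content": "Summarize what an NPU does."}],
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```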

There are no truly meaningful models that can take advantage of Qualcomm's NPUs, processors that, like GPUs, are designed to accelerate AI. Even if the NPUs did work, Copilot+ PCs don't have enough memory for long queries, and battery life would drain quickly.

Microsoft is giving developers tools to integrate AI capabilities into desktop applications. For those developers, loading Llama v3.1 isn't necessary, because Copilot capabilities are already on the PC.

Microsoft's Phi Silica support is more about letting developers bring large language model-style query capabilities to Windows applications through the Windows App SDK.

2. Meteor Lake’s failure

Late last year, Intel unveiled an AI PC chip called Meteor Lake that's equipped with a neural processing unit.

Now, the chip is a piece of junk, and people who bought a Meteor Lake laptop for on-device AI are out of luck. There are no useful applications, and the NPU handles only basic AI models like TinyLlama.

To be sure, Intel's Meteor Lake chips don't meet Microsoft's minimum specifications for AI PCs. Intel claims Meteor Lake reaches 34 TOPS of AI performance, below the 40 TOPS Microsoft requires for Copilot+ PCs.

Meteor Lake has received poor reviews. It's slower than the previous generation of laptop chips and offers no improvements in battery life.

About six months after releasing Meteor Lake, Intel launched its next-generation AI PC chip, Lunar Lake, which is already shipping in PCs and delivers 120 TOPS of AI performance.

I tried running a local AI model manually on my Meteor Lake PC.

Loading a neural network to take advantage of the NPU involves installing OpenVINO 2024.2 and following the instructions on the OpenVINO website.

The installation provides the NPU plugin that is supposed to kick in when you load a model in a Jupyter notebook. Intel says you also need the correct NPU driver and firmware.
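When everything is in place, the NPU shows up as an ordinary OpenVINO device. A minimal sketch, assuming an OpenVINO IR model at a placeholder path:

```python
# Sketch: check whether OpenVINO can see the NPU, then compile a model to it.
# "model.xml" is a placeholder for an OpenVINO IR file.
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'] once the driver is right

model = core.read_model("model.xml")
compiled = core.compile_model(model, "NPU")  # fails if the NPU plugin is missing
```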

Installing the new NPU driver was a challenge in itself: I had to uninstall the old driver in Windows Device Manager and then get Windows to detect the new one. In the end, I could only update the driver through Windows' driver search.

I ran models like TinyLlama from a Jupyter notebook; they ran smoothly but gave poor answers. And, as on the Qualcomm machine, they didn't take advantage of the NPU.
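For reference, a notebook cell for this kind of test looks roughly like the following, using optimum-intel's OpenVINO backend. Note that the device="NPU" argument is only a request; unsupported operations can quietly push inference back to the CPU, which matches the behavior described above:

```python
# Sketch: running TinyLlama through optimum-intel's OpenVINO backend
# (pip install optimum[openvino]). device="NPU" is an assumption about
# where you WANT inference to run; OpenVINO may still fall back to CPU.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True, device="NPU")

inputs = tokenizer("What is a neural processing unit?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```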

A few models, such as Stable Diffusion 1.4, do take advantage of the NPU, but they do so directly from within the GIMP interface.

Intel's AI software development is mainly focused on its server CPUs.

3. Back to Nvidia

For any meaningful AI work, developers should still rely on Nvidia GPUs to run Jupyter notebooks on their PCs.

AI PCs are bought for productivity, not for AI-related coding or experimentation. The chip makers' NPUs are not developer-friendly: the problems start with simply launching a neural network, and each chip maker has its own issues. Still, on-device AI is an emerging field that gives developers plenty of opportunities, such as optimizing models for PCs through quantization, as sketched below.
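Quantization is the most approachable of those opportunities. A minimal sketch using ONNX Runtime's dynamic quantization (the file names are placeholders): storing weights in int8 roughly quarters a model's footprint, which matters on machines that hit 87% memory use after a few queries.

```python
# Sketch: dynamic int8 quantization with ONNX Runtime. Weights are stored
# in int8 and dequantized on the fly at inference time. File names are
# placeholders, not files referenced elsewhere in this article.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    "model_fp32.onnx",   # input: full-precision ONNX model
    "model_int8.onnx",   # output: weight-quantized model
    weight_type=QuantType.QInt8,
)
```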

For the more adventurous developer, the typical Windows challenges will arise: making sure you have the right drivers and development kits. Qualcomm and Intel each have their own preferred tools for compiling and loading models.

Fortunately, the Windows command line and PowerShell make command line adventures fun.

Expect AI features that take advantage of NPUs to arrive pre-packaged in applications. Intel is working with software companies to exploit its NPUs, much as software has always been tuned for specific chip architectures.

AI hardware is advancing rapidly, and Intel is hyping its latest Lunar Lake chip. Recent reviews praise its excellent battery life. But don't buy it for development purposes: it doesn't have enough memory or bandwidth to run language models locally.