Is there any actual standalone AI software?

Ad4mWayn3@lemmy.world · 4 months ago

Is there any actual standalone AI software?

Ziggurat@sh.itjust.works · 4 months ago

There are tons of “standalone” programs that you can run on your own PC

For text generation, the easiest way is to get the GPT4All package, which lets you run text generation models on the CPU of your own PC
For image generation, you can try the Easy Diffusion package, which is an easy-to-use Stable Diffusion package; then, if you like it, it's time to try ComfyUI

You can check !localllama@sh.itjust.works and !imageai@sh.itjust.works for some more information

deranger@sh.itjust.works · 4 months ago

I’ve wanted to try these out for shits and giggles - what would I expect with a 3090, is it going to take a long time to make some shitposts?

chicken@lemmy.dbzer0.com · 4 months ago

3090s are ideal because the most important factor is VRAM, and those are at the top of the plateau for VRAM until you get into absurdly expensive server hardware. Expect around 3 seconds for generating a 512x512 image, or 4 words per second generating text at around GPT-3.5 quality.

tyler@programming.dev · 4 months ago

I did a bunch of image generation on my 3080 and it felt extremely fast. Enough that I was able to set it up as a shared node in one of those image generation nets and it outperformed most other people in the net.

yo_scottie_oh@lemmy.ml · edit-2 3 months ago

shared node in one of those image generation nets

You mean like AI Horde?

tyler@programming.dev · 3 months ago

Yeah I couldn’t remember what it was called lol

KoboldCoterie@pawb.social · 4 months ago

Stable Diffusion (AI image generation) runs fully locally. The models (the datasets you’re referring to) are generally around 3GB in size. It’s more about the processing power needed for it to run (it’s very GPU-intensive) than the storage size on disk.

MacN'Cheezus@lemmy.today · 4 months ago

For LLMs, the already mentioned LM Studio does a good job as far as beginner friendliness goes.

For text-to-image, I like Fooocus, a custom Stable Diffusion setup with automatic prompt enhancement that can comfortably compete with Midjourney.

Here’s a setup guide for first time users. There’s also an online version to try it out.

iturnedintoanewt@lemm.ee · 4 months ago

Just wanted to thank you, as I hadn’t had any luck running any other SD software on my AMD setup with Nobara. But after a couple of fixes to get ROCm running, this one runs, and runs pretty fast. Thanks!

JackGreenEarth@lemm.ee · 4 months ago

I use Krita with the AI Diffusion plugin for image generation, which is working great, and Jan for text generation, using the Llama 3 8B Q4 model. I have an NVIDIA GTX 1660 Ti with 6GB of VRAM and both are reasonably fast.

Björn Tantau@swg-empire.de · 4 months ago

Krita has an AI plugin that’s pretty painless to set up if you’ve got an NVIDIA card. With AMD, setup has to be done manually, or you can fall back to slow CPU generation. It uses ComfyUI in the background.

Wahots@pawb.social · 4 months ago

https://lmstudio.ai/

You can load up your own datasets, and it has some of its own, too. Most of these are pretty good, but run on synthetic data. Storing and processing something the size of ChatGPT would bankrupt most people.

This program can use significant amounts of computer resources if you let her eat. I recommend closing other programs and games.

astrsk@kbin.run · 4 months ago

GPT4All for chat and Automatic1111 for image generation with downloaded models work great. The former does not require a GPU, but the latter generally does.

sunzu@kbin.run · 4 months ago

Do you have a 24GB GPU?

If so… Then you can get decent results from running local models

FaceDeer@fedia.io · 4 months ago

You can get decent results with much less these days, actually. I don’t have personal experience (I do have a 24GB GPU) but the open source community has put a lot of work into getting models to run on lower-spec machines. Aim for smaller models (8B parameters is common) and low quantization (the values of the parameters get squished into smaller numbers of bits). It’s slower and the results can be of noticeably lower quality but I’ve seen people talk about usable LLMs running CPU-only.
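To make the "squished into smaller numbers of bits" part concrete, here's a minimal sketch of the idea in plain Python. It is an illustration only, assuming a single scale for the whole list; real model formats quantize in blocks and keep extra scaling data, and the function names here are made up for the example.

```python
def quantize(weights, bits=4):
    """Map floats onto signed integers that fit in `bits` bits.

    Toy symmetric scheme with one scale for the whole list (an assumption
    for illustration; real formats work block by block).
    """
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest else 1.0
    # Round to the nearest step and clamp into the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Turn the small integers back into approximate floats."""
    return [v * scale for v in q]


weights = [0.12, -0.53, 0.98, -0.05, 0.33]
q, scale = quantize(weights, bits=4)
restored = dequantize(q, scale)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q)
print("restored: ", [round(v, 2) for v in restored])
print("worst error:", round(worst_error, 3))
```

The restored values are close to the originals but not identical, and that rounding error is where the quality loss comes from. Fewer bits means bigger steps and bigger errors.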

CaptDust@sh.itjust.works · edit-2 4 months ago

Local LLMs can be compressed to fit on consumer hardware. Model formats like GGUF and EXL2 can be loaded with an offline, locally hosted API like KoboldCpp or Oobabooga. These formats lose resolution compared to the full floating point model and become “dumber”, but it’s good enough for many uses.

Also worth noting: these models are around 7, 11, or 20 billion parameters, while hosted models like ChatGPT reportedly run closer to 8x220 billion
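For a rough feel of why parameter count and bit width decide what fits on a card, here's a back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight), so treat the numbers as a lower bound: context, activations, and runtime overhead come on top, and real 4-bit formats usually store a little more than 4 bits per weight. The helper name is made up for the example.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# A few combinations in the range discussed in this thread.
for label, params, bits in [
    ("8B at 16-bit", 8, 16),
    ("8B at 4-bit", 8, 4),
    ("20B at 4-bit", 20, 4),
]:
    print(f"{label}: ~{weight_memory_gb(params, bits):.0f} GB")
```

By this estimate an 8B model at 4 bits needs about 4 GB for weights, which is why it can squeeze onto a 6GB card, while the same model at 16 bits would need about 16 GB.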

FaceDeer@fedia.io · 4 months ago

Though bear in mind that parameter count alone is not the only measure of a model’s quality. There’s been a lot of work done over the past year or two on getting better results from the same or smaller parameter counts, and lots of discoveries have been made on how to train better and run inference better. The old ChatGPT3 from back at the dawn of all this was really big and was trained on a huge number of tokens, but nowadays the small downloadable models fine-tuned by hobbyists would compete with it handily.

CaptDust@sh.itjust.works · 4 months ago

Agreed, especially with Llama 3; their 8B model is extremely competitive.

FaceDeer@fedia.io · 4 months ago

Makes it all the more amusing how OpenAI staff were fretting about how GPT-2 was “too dangerous to release” back in the day. Nowadays that class of LLM is a mere toy.

tyler@programming.dev · 4 months ago

They were fretting about it until their morals went out the door for money.

Rhaedas@fedia.io · 4 months ago

The AI, image, and audio models that can run on a typical PC have all been broken down from originally larger models. How this is done affects what the models can do and the quality, but the open source community has come a long way in making impressive stuff. The first question is more about hardware: do you have an Nvidia GPU that can support these types of generations? They can be done on CPU alone, but it’s painfully slow.

If so, then I would highly recommend looking into Ollama for running AI models (using WSL if you’re on Windows) and ComfyUI for image generation. Don’t let ComfyUI’s complicated workflow scare you; if you start from the basics, with the plenty of YouTube help out there, it will make sense. As for TTS, there’s a lot of constant “new stuff” out there, but for actual local processing in “real time” (it still takes a bit) I have yet to find anything to replace my Coqui TTS copy with Jenny as the model voice. It may take some digging and work to get that together, since it’s older and not supported anymore.
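Once Ollama is running, it can also be driven from a script through its local HTTP API. Here's a minimal sketch using only the Python standard library. The assumptions: Ollama is listening on its default address (localhost, port 11434), a model has already been pulled, and "llama3" is just a placeholder for whatever model name you actually have.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Build a POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama3"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, something like `print(ask("Explain VRAM in one sentence."))` should print the model's reply. Nothing leaves your machine, since the request only goes to localhost.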

hendrik@palaver.p3x.de · edit-2 4 months ago

I don’t think they break them down. For most models, the math requires starting from scratch and training each model individually from the ground up.

But sure, a smaller model generally isn’t as capable as a bigger one. And you can’t train them indefinitely. So for a model series you’ll maybe use the same dataset, but feed more of it into the super big variant and not so much into the tiny one.

And there is something where you use a big model to generate questions and answers and use them to train a different, small model. And that model will learn to respond like the big one.

Rhaedas@fedia.io · 4 months ago

The breaking down I mentioned is the quantization that forms a smaller model from the larger one. I didn’t want to get technical because I don’t understand the math details myself past how to use them. :)