

Add “site:reddit.com” to your Google query.
Sad thing is that search engines have gotten so bad, and usually return so much garbage blog spam, that searching directly on Reddit is more likely to give useful results. I hope a similar amount of knowledge will build up on Lemmy over time.
Assuming they already own a PC, anyone buying two 3090s will probably also have to upgrade their PSU, so that might be worth including in the budget. But it’s definitely a relatively low-cost way to get more VRAM; there are people who run three or four RTX 3090s too.
For LLMs it entirely depends on what size models you want to use and how fast you want them to run. Since there are diminishing returns to increasing model size, i.e. a 14B model isn’t twice as good as a 7B model, the best bang for the buck will come from the smallest model you think has acceptable quality. And if you think generation speeds of around 1 token/second are acceptable, you’ll probably get more value for money with partial offloading.
If your answer is “I don’t know what models I want to run”, then a second-hand RTX 3090 is probably your best bet. If you want to run larger models, building a rig with multiple (used) RTX 3090s is probably still the cheapest way to do it.
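As a very rough rule of thumb for what fits where (just a back-of-the-envelope sketch; the bits-per-weight and overhead numbers are assumptions, and real usage varies with quantization, context length, and backend):

```python
# Back-of-the-envelope VRAM estimate for a quantized model (very approximate;
# bits-per-weight and overhead are assumptions, not measured values).
def rough_vram_gb(params_billions: float, bits_per_weight: float = 4.5,
                  overhead_gb: float = 1.5) -> float:
    """Weights plus a flat allowance for KV cache and buffers."""
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb

for size in (7, 13, 34, 70):
    print(f"{size}B -> roughly {rough_vram_gb(size):.0f} GB")
# A single 24 GB RTX 3090 fits models up to roughly the 30B class at ~4-5 bpw;
# 70B-class models need a second card or partial offloading to system RAM.
```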
Is max tokens different from context size?
Might be worth keeping in mind that the generated tokens go into the context, so if you set it to 1k with a 4k context you only get 3k left for the character card and chat history. I think I usually have it set to 400 tokens or so, and use TGW’s continue button in case a long response gets cut off.
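The arithmetic is just subtraction; a tiny sketch with example numbers (nothing here is specific to TGW or any particular backend):

```python
# Generated tokens are reserved out of the same context window as the prompt.
context_size = 4096      # model context window (example value)
max_new_tokens = 1024    # "max tokens" / response length setting (example value)

prompt_budget = context_size - max_new_tokens
print(f"Left for character card + chat history: {prompt_budget} tokens")  # 3072
```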
llama.cpp uses the GPU if you compile it with GPU support and you tell it to use the GPU…
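For example, via the llama-cpp-python bindings (just an illustration; the native CLI has an equivalent flag, and the exact build flags for GPU support have changed between versions, so check the project README for your setup):

```python
# Sketch using the llama-cpp-python bindings; assumes the package was built
# with GPU support (set via CMAKE_ARGS at install time -- the exact flags
# depend on your llama.cpp version and GPU vendor, so check the README).
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # placeholder path to a GGUF model
    n_gpu_layers=-1,            # -1 offloads every layer to the GPU
)
out = llm("Q: Why is the sky blue? A:", max_tokens=64)
print(out["choices"][0]["text"])
```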
Never used koboldcpp, so I don’t know why it would give you shorter responses if both the model and the prompt are the same (also assuming you’ve generated multiple times and it’s always the same). If you don’t want to use Discord to visit the official koboldcpp server, you might get more answers from a more LLM-focused community such as !localllama@sh.itjust.works
A static website and Immich
So it’s supposed to be 15 hours/month included with your premium subscription? Since I’m not familiar with how Spotify audiobooks work, I thought you meant that you had a free account and were allowed to listen to 15 hours of books that would be included/unlimited with a premium subscription. Contact support if it ate through your monthly credits faster than it should. If you’re a paying customer, support is usually quite helpful.
I’ve been using Intel NUCs, even though they have a lot of issues and start failing after about three years of heavy use. Previously I used Kodi on Arch, but with the latest NUC I decided to go with Xubuntu, and for some reason video playback doesn’t work in Kodi now. So instead I just use VLC media player for TV/movies and a web browser for everything else. I got a Logitech K400 Plus wireless keyboard, which makes it easy to control the computer from the couch.
Less than a year after that mail, Swedish laws were rewritten to make copying music and movies illegal.
There are tons of options for running LLMs locally nowadays, though none come close to GPT-4 or Claude 2 etc. One place to start is /c/localllama@sh.itjust.works
Static html+css page generated with this: https://github.com/maximtrp/tab
Do you mean that you want to build the Docker image on one computer, export it to a different computer where it’s going to run, and there shouldn’t be any traces of the build process on the first computer? Perhaps it’s possible with the --output option… Otherwise you could write a small script which combines the commands for docker build, export to file, delete the local image, and clean up the system.
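Something along these lines, if you go the script route (a rough sketch; the image name and tarball path are placeholders, and you may want a gentler cleanup than a full prune):

```python
#!/usr/bin/env python3
# Rough sketch of the "build, export, clean up" script idea.
# Image name and output path are placeholders -- adjust to your setup.
import subprocess

IMAGE = "myapp:latest"
TARBALL = "myapp.tar"

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("docker", "build", "-t", IMAGE, ".")      # build the image locally
run("docker", "save", "-o", TARBALL, IMAGE)   # export it to a tar file
run("docker", "rmi", IMAGE)                   # delete the local image
run("docker", "builder", "prune", "-f")       # drop the build cache
# Copy the tarball to the target machine and load it there with:
#   docker load -i myapp.tar
```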
Intel NUC running Linux. Not the cheapest solution, but it can play anything and I have full control over it. At first I tried to find some kind of programmable remote, but now we have a wireless keyboard with a built-in touchpad.
Biggest downside is that the hardware quality is kind of questionable; the first two broke after three years plus a few months, so we’re on our third now.