From d0838f36dd596369c1e8ddd826b6f9a818565fdc Mon Sep 17 00:00:00 2001 From: randihuie63727 Date: Sun, 9 Feb 2025 15:13:45 +0000 Subject: [PATCH] Add Simon Willison's Weblog --- Simon-Willison%27s-Weblog.md | 42 ++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) create mode 100644 Simon-Willison%27s-Weblog.md diff --git a/Simon-Willison%27s-Weblog.md b/Simon-Willison%27s-Weblog.md new file mode 100644 index 0000000..581fe33 --- /dev/null +++ b/Simon-Willison%27s-Weblog.md @@ -0,0 +1,42 @@
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed, so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
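A sketch of that command; Ollama can pull GGUF files straight from Hugging Face via the hf.co/ prefix, and the Q8_0 quantization tag is an assumption on my part that matches the roughly 8.5GB download mentioned below:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```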
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
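If you want to poke around in those logs afterwards, LLM has subcommands for that; two quick examples:

```bash
llm logs path  # print the location of the logs.db SQLite file
llm logs -n 1  # show the most recent logged prompt and response
```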
I installed the plugin and ran it like this:
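A minimal sketch of those two steps, reusing the model ID that Ollama assigned above (llm-ollama exposes local Ollama models under their Ollama names) and the pelican-and-walrus prompt quoted below:

```bash
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```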
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:
The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is terrible. But the process of getting there was such a fascinating insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine), I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
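The same pattern as the 8B run; the Q3_K_M quantization tag here is my assumption, chosen because it lines up with the roughly 34GB download described above:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```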
Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:
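For reference, here's how such a prompt would be run through LLM against the 8B distill used earlier (the model ID is the same assumed one as above):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```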
It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
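A minimal sketch of that configuration; the api_base follows DeepSeek's OpenAI-compatible API, the model_id matches the -m flag used below, and the api_key_name value just has to match the key name used with llm keys set:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```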
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
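Putting that together, a hypothetical end-to-end run would look like this:

```bash
llm keys set deepseek
# paste your DeepSeek API key when prompted
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```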
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.
\ No newline at end of file