That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

## Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

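Ollama can pull GGUF quantizations directly from Hugging Face, so a command along these lines should do it; the Q8_0 tag is an assumption, chosen because it matches the roughly 8.5GB download described next:

```bash
# Fetch the Q8_0 quantization of the distilled 8B model from
# Hugging Face, then start an interactive chat session with it
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
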
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:

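A minimal sketch of those steps, assuming the model keeps the same Hugging Face reference it was pulled with above:

```bash
# Install the plugin that exposes local Ollama models to LLM
llm install llm-ollama

# Run a prompt; -m selects the model by its Ollama name
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
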
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

> The Joke:

> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

> After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

> Explanation:

> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update, 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

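Assuming Unsloth published a matching 70B GGUF repository, the same Ollama pattern would apply; the repository name and the Q3_K_M tag (a quantization that comes to roughly 34GB for a 70B model) are assumptions here:

```bash
# Hypothetical tag: a ~34GB quantization of the 70B distill
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
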
## Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

Like the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

## Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

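A sketch of that entry, using the documented extra-openai-models.yaml keys; the deepseek-reasoner model ID and deepseek key name line up with the commands in the next step:

```yaml
# Goes in extra-openai-models.yaml inside LLM's configuration directory
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
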
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.