• 1 Post
  • 19 Comments
Joined 1 year ago
Cake day: June 15th, 2023



  • My point is that using “grokking” in ML is not a Musk/Twitter/whatever-his-AI-company-is-named invention; it predates their use.

    Yes, the original researchers reused a pre-existing meaning, which had been around on the internet for a while before. I did not know it came from Heinlein and I did not know its full meaning. I remember first seeing it, more than a decade ago, in a text that stated without any explanation that an isolated unknown word can easily be grokked from context, demonstrating it immediately. To me (and I guess to those researchers) “grok” means “understanding from context”, which is particularly appropriate here.

    BTW, Elon was not the only one to reuse this word. Another company, named Groq, totally unrelated to Musk as far as I know, designs AI acceleration chips.











  • Understandable but sad. They are now at the bleeding edge and not playing catch-up anymore, so they can cash out hard. That means that, for now, this advanced model is a dead end for open source, as we won’t be able to improve it the way we did with the last iterations.

    A “non-commercial license” is not open source, but I would not mind if this became the standard for the cutting edge of the industry while we figure out a business model that makes true open source work.





  • Ah, I should have written a more detailed message explaining the road I already went through, I guess :-)

    I know that RAG gets recommended more for adding information. It is the fastest way to retrieve information, but it allows only a shallow understanding of it, and the LLM will have problems combining information from several different files. You can’t, for example, give it 1000 emails and ask it to list the problems encountered in project A and how they were solved.

    Fine-tuning can add facts. This person added the documentation for Unreal Engine 5 to Llama 7B, and this company added financial knowledge to Llama 13B. These are my inspirations. When using LoRA, it requires higher ranks and, crucially, doing the fine-tuning on a foundation model first; only after your own fine-tuning do you do the instruction fine-tune.

    I am wondering if there is a way to make that last step easier by reapplying the same LoRA.

    I guess I am also wondering why we can’t directly fine-tune facts into an instruction-tuned model. I tried: it does tend to remember how to interact with instruct prompts, but the format gets somewhat corrupted by the new dataset. I find the speed at which such models forget past behavior as they are fed new tokens a bit weird.
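
    For anyone following along, the rank argument above comes down to the shape of the LoRA update itself. A minimal numpy sketch (the dimensions and hyperparameters here are made up for illustration, not taken from any specific model): the adapter adds a rank-r delta B @ A on top of a frozen weight W, so the amount of new information it can encode is bounded by r, which is why fact-heavy fine-tunes tend to need higher ranks than style-only ones.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d_out, d_in, r, alpha = 64, 64, 8, 16  # hypothetical layer sizes and LoRA rank

    W = rng.standard_normal((d_out, d_in))  # frozen base weight
    A = rng.standard_normal((r, d_in))      # trainable down-projection
    B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

    delta = (alpha / r) * (B @ A)  # effective weight change contributed by the adapter
    W_merged = W + delta           # "merging" the adapter back into the base weight

    # The update B @ A can never exceed rank r, however large the layer is.
    assert np.linalg.matrix_rank(B @ A) <= r
    ```

    Reapplying the same LoRA after further training would amount to adding the same low-rank delta to the new merged weights; whether that restores the instruct behavior in practice is exactly the open question.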