• 1 Post
  • 19 Comments
Joined 1 year ago
Cake day: June 15th, 2023



  • My point is that using “grokking” in ML is not a Musk/Twitter/whatever-his-AI-company-is-named invention; it predates their use.

    Yes, the original researchers reused a pre-existing meaning, which had been around on the internet for a while before. I did not know it came from Heinlein and I did not know its full meaning. I remember first seeing it, more than a decade ago, in a text that stated without any explanation that an isolated unknown word can easily be grokked from context, demonstrating it immediately. To me (and I guess to those researchers) “grok” means “understanding from context”, which is particularly appropriate here.

    BTW, Elon was not the only one to reuse this word. Another company, named Groq, totally unrelated to Musk as far as I know, designs AI acceleration chips.











  • Understandable but sad. They are now at the bleeding edge and not playing catch-up anymore, so they can cash out hard. That means that, for now, this advanced model is a dead end for open source, as we won’t be able to improve it the way we did with the last iterations.

    A “non-commercial license” is not open source, but I would not mind if this became the standard for the cutting edge of the industry while we figure out a business model that makes true open source work.





  • Ah, I should have written a more detailed message explaining the road I already went through, I guess :-)

    I know that RAG gets recommended more for adding information. It is the fastest way to retrieve information, but it allows only a shallow understanding of it, and the LLM will have problems combining information from several different files. You can’t, for example, give it 1000 emails and ask it to list the problems encountered in project A and how they were solved.

    Fine-tuning can add facts. This person added the documentation for Unreal Engine 5 to Llama 7B, and this company added financial knowledge to Llama 13B. These are my inspirations. When using LoRA, it requires higher ranks and, crucially, doing the fine-tuning on a foundation model first; only after your own fine-tuning do you do the instruction fine-tune.

    I am wondering if there is a way to make that last step easier by reapplying the same LoRA.

    I guess I am also wondering why we can’t directly fine-tune facts into an instruction-tuned model. I tried: it does tend to remember how to interact with instruct prompts, but the format gets somewhat corrupted by the new dataset. I find the speed at which such models forget past behavior as they are fed new tokens a bit weird.
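
    For anyone following along, the rank argument above comes down to the shape of the LoRA update itself. A minimal numpy sketch (the dimensions and hyperparameters here are made up for illustration, not taken from any specific model): the adapter adds a rank-r delta B @ A on top of a frozen weight W, so the amount of new information it can encode is bounded by r, which is why fact-heavy fine-tunes tend to need higher ranks than style-only ones.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d_out, d_in, r, alpha = 64, 64, 8, 16  # hypothetical layer sizes and LoRA rank

    W = rng.standard_normal((d_out, d_in))  # frozen base weight
    A = rng.standard_normal((r, d_in))      # trainable down-projection
    B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

    delta = (alpha / r) * (B @ A)  # effective weight change contributed by the adapter
    W_merged = W + delta           # "merging" the adapter back into the base weight

    # The update B @ A can never exceed rank r, however large the layer is.
    assert np.linalg.matrix_rank(B @ A) <= r
    ```

    Reapplying the same LoRA after further training would amount to adding the same low-rank delta to the new merged weights; whether that restores the instruct behavior in practice is exactly the open question.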