Posts by Alexander Shlyapin
This is why, for now, I'll stick with ChatGPT and Copilot. When Claude 3 Opus acquires internet search capabilities, I will try it.
6/6
Two different models (you don't know which ones) give you answers, and you choose which answer is best. Claude 3 Opus is right behind GPT-4.
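Under the hood, the arena leaderboard ranks models from these pairwise battles with an Elo-style rating (LMSYS's exact statistical model has evolved over time, so take this as an illustration of the idea, not their implementation):

```python
def elo_update(r_winner, r_loser, k=32):
    """One Elo update after a single arena battle.

    The winner gains more points when it was expected to lose,
    so an upset by a weaker model moves the ratings more.
    """
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    r_winner += k * (1 - expected_win)
    r_loser -= k * (1 - expected_win)
    return r_winner, r_loser

# The lower-rated model wins one battle: it gains what the other loses.
a, b = elo_update(1000, 1100)
```

Aggregated over hundreds of thousands of human votes, these ratings are what produce the ordering where Claude 3 Opus sits right behind GPT-4.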
5/n
2) I think that arena.lmsys.org (click "Leaderboard" in the menu at the top) shows fairer results. The leaderboard is built as follows: in the arena ("Arena (battle)" in the top menu, try it), you give a prompt.
4/n
If you ask an LLM without internet access about recent events, it will simply refuse to answer. I should note, however, that it is possible to add internet search on top of Claude 3. Perplexity.ai, for example, managed to do it, just as they did with Claude 2.1.
3/n
1) GPT-4 has access to the internet. I use chatbots every day (ChatGPT Plus and Microsoft Copilot Pro, which are based on GPT-4), and the ability to search the internet is absolutely essential for me.
2/n
Recently, Claude 3 was released. Although it scores higher on many benchmarks than GPT-4, I argue that GPT-4 is probably better overall. I have two reasons for this:
1/n
But in the end, if the model really works as well as the authors claim, this could well be a seminal work.
8/8
- Also, some people claim the authors failed to cite two important prior works: arxiv.org/abs/1602.02830 (binarized NNs) and arxiv.org/abs/1609.00222 (ternary NNs).
- And most importantly, the code is not available yet, so we can't be sure it's really that good.
7/n
- As far as I understand, they still store gradients and the optimizer in high precision, so the difference in size during training is not that big (not 2.71 times at least). I took this information from "BitNet: Scaling 1-bit Transformers for Large Language Models."
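A rough back-of-the-envelope calculation shows why the training-time savings are modest. The precisions below are my assumptions (fp32 gradients plus two fp32 Adam moments; the paper does not spell these out), purely for illustration:

```python
def training_memory_bytes(n_params, weight_bits):
    """Approximate training memory: weights at the given precision,
    plus fp32 gradients and two fp32 Adam moment buffers (assumed)."""
    weights = n_params * weight_bits / 8
    grads = n_params * 4           # fp32 gradients
    adam_states = n_params * 8     # two fp32 moments per parameter
    return weights + grads + adam_states

n = 3_000_000_000  # a 3B-parameter model
fp16 = training_memory_bytes(n, 16)
ternary = training_memory_bytes(n, 1.58)
ratio = fp16 / ternary  # roughly 1.15x, nowhere near 2.71x
```

The gradient and optimizer buffers dominate, so shrinking only the weights barely moves the total; the 2.71x figure applies to inference, where those buffers don't exist.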
6/n
- The paper does not provide detailed comparisons of hyperparameters, which could impact the performance evaluation between BitNet and LLaMA.
5/n
- The datasets used for training both models (BitNet and LLaMA) are reported to be the same, which helps ensure a fair comparison.
- The architectures are different, but I didn't delve into the details, so I can't say whether it's significant.
4/n
They compare it with LLaMA. Also, given Microsoft's involvement, the research likely adheres to high standards of quality and rigor. I quickly checked the paper for anything suggesting the results were embellished but found nothing suspicious:
3/n
They managed to reduce every weight in the model to 1.58 bits (activations are kept at 8 bits), whereas normally weights take 16 bits. They claim their model matches the 16-bit models in both perplexity and end-task performance while being faster.
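The "1.58 bits" comes from restricting each weight to {-1, 0, +1}: log2(3) ≈ 1.58. My reading of the paper is that quantization is absmean scaling followed by rounding, roughly like this NumPy sketch (illustrative, not the authors' code):

```python
import numpy as np

def quantize_ternary(w, eps=1e-8):
    """Absmean quantization to {-1, 0, +1}: divide by the mean
    absolute value, then round and clip to the ternary set."""
    scale = np.mean(np.abs(w)) + eps
    w_q = np.clip(np.round(w / scale), -1, 1)
    return w_q, scale

w = np.array([0.4, -0.05, 1.2, -0.7])
w_q, scale = quantize_ternary(w)
# w is then approximated by w_q * scale
```

Small weights collapse to 0 (effectively built-in sparsity), which is one reason the authors can replace multiplications with additions at inference time.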
2/n
Recently, "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" (arxiv.org/abs/2402.17764) paper was published. The results are amazing, although I have a bit of skepticism because the results seem too good to be true.
1/n
I have written instructions on my wiki detailing how to install and use Docker for training and running inference with LLMs using Hugging Face transformers: wiki.shlyapin.com/docker_train...
Moreover, you receive no shares in this company, and the total sum donated by various people amounts to $130 million. I can’t even comprehend how this is legal.
3/3
Imagine you donated money to a charity for children with cancer, and then the organization shifted from non-profit to for-profit, essentially abandoning their mission to help these children.
2/n
Elon Musk is suing Sam Altman and OpenAI because they transitioned from a non-profit to a for-profit organization, despite Elon’s donation to the foundation under the premise that it was non-profit and committed to developing open-source AI. I tend to agree with Elon.
1/n
Just a reminder: you should not trust any chatbot screenshots without a link to the conversation.
In the end, everything remains as it is, and I stay as an LLM engineer.
5/5
I've been thinking about this lately but haven't yet figured out how to change my life given this new information. Should I go into coal mining? Become a cleaner? Work in a factory? Become a waiter? None of these seem like a logical solution.
4/n
It turns out that a programmer's job is easier for AI than a taxi driver's job. This got me thinking. I always believed that a programmer's job (and other intellectual work) would be replaced after non-intellectual work, but it turns out to be the opposite.
3/n
For example, ChatGPT can already code at a junior level and write texts at a professional level, whereas autonomous vehicles are not yet fully developed (although Waymo has already launched a taxi service in San Francisco).
2/n
I recently came across this: en.m.wikipedia.org/wiki/Moravec.... It claims that, contrary to popular belief, intellectual labor will be replaced by AI first, and manual labor only afterward. And the reasoning really makes sense.
1/n
The backlash against Google continues. Earlier, Google disabled image generation on Gemini, but users continued to check for biases and inaccuracies in Gemini (using text) and other Google services.
What's interesting about the recent release of Sora is that it revealed how anti-AI society is. Here are three posts against AI (and against Sora in particular) that each received more likes (205K, 155K, and 150K) than the official OpenAI video, which received 141K likes.
Microsoft introduced LongRoPE, a method to increase the context window of LLMs to 2M tokens. They tested this method on the Mistral and LLaMA2 models, demonstrating that the models do not lose performance on short-context benchmarks.
arxiv.org/abs/2402.13753
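For background: context-extension methods like this rescale rotary position embeddings (RoPE). The sketch below shows plain RoPE with naive uniform position interpolation; LongRoPE itself searches for non-uniform per-dimension scales, which I'm not reproducing here:

```python
import numpy as np

def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position. scale < 1 compresses
    positions so long inputs stay inside the trained position range."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return (position * scale) * inv_freq

def apply_rope(x, position, scale=1.0):
    """Rotate consecutive feature pairs by position-dependent angles."""
    theta = rope_angles(position, x.shape[-1], scale=scale)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = np.empty_like(x)
    rotated[..., 0::2] = x1 * np.cos(theta) - x2 * np.sin(theta)
    rotated[..., 1::2] = x1 * np.sin(theta) + x2 * np.cos(theta)
    return rotated

# With scale=0.25, position 20000 is rotated exactly like position 5000,
# so a model trained on shorter contexts sees familiar angles.
q = np.random.randn(64)
q_long = apply_rope(q, position=20000, scale=0.25)
```

Naive uniform scaling like this degrades short-context quality; LongRoPE's contribution is finding scalings that avoid exactly that degradation.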
Gemini has been observed exhibiting biases when generating images related to history. This issue arises from the application of Reinforcement Learning from Human Feedback (RLHF).
(The screenshots are not mine).
Google has released the Gemma models in 2B and 7B sizes. The 7B model surpasses the performance of Llama-2 13B. However, there is no comparison with Mistral 7B on their page. blog.google/technology/d...
LoraLand was introduced: 25 fine-tuned Mistral-7B models that outperform GPT-4, all served on a single A100. The training cost is approximately $200. The downside is that you need to manually select a model for each prompt. predibase.com/blog/lora-la...
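For context, LoRA keeps the base weights frozen and adds a small low-rank update per adapter, which is why 25 "models" fit on one GPU: they share one base model and differ only in tiny matrices. A sketch of the forward pass (illustrative, not Predibase's serving code):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """y = W x + (alpha / r) * B (A x).
    Only A (r x d_in) and B (d_out x r) belong to the adapter;
    W is frozen and shared across all adapters."""
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))  # shared base weight
x = rng.normal(size=d_in)

# One adapter: 2 * 512 * 8 params instead of 512 * 512.
A1 = rng.normal(size=(r, d_in))
B1 = np.zeros((d_out, r))  # zero-init: adapter starts as an exact no-op
y = lora_forward(x, W, A1, B1)
```

Swapping adapters is just swapping two small matrices per layer, which is what makes routing each prompt to one of 25 adapters on a single A100 feasible.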