Veselin Raychev (@veselinr) Bsky

INSAIT creates leading Bulgarian-first LLM with Gemma 2 | Google AI for Developers INSAIT sets a new standard for AI models in Eastern Europe by building BgGPT with Gemma 2.

A Google blog about BgGPT

ai.google.dev/gemma/gemmav...

1 year ago

We are also calling for community contributions to BaxBench on our GitHub!

🏆 Website & Leaderboard: baxbench.com
💾 Code Repository: github.com/logic-star-ai/…
📄 Paper: arxiv.org/abs/2502.11844

1 year ago

We evaluate Claude 3.7 with 64k thinking tokens on BaxBench and find that it now tops our leaderboard, with 38% of its generations both correct and secure. But when the models are given security specifications, OpenAI o1 is again the best model.

1 year ago

BaxBench: Can LLMs Generate Secure and Correct Backends? We introduce a novel benchmark to evaluate LLMs on secure and correct code generation, showing that even flagship LLMs are not ready for coding automation, frequently generating insecure or incorrect ...

LLMs are great at generating code, but the real test is creating production-ready applications. With BaxBench we tried to answer two questions: how often are the generated app backends functionally correct, and how often do they contain security vulnerabilities?
BaxBench.com - led by @markvero.bsky.social

1 year ago

LogicStar is building AI agents for app maintenance Swiss startup LogicStar is bent on joining the AI agent game. The summer 2024-founded startup has bagged $3 million in pre-seed funding to bring tools to the developer market that can do autonomous maintenance of software applications, rather than the…

LogicStar is building AI agents for app maintenance

1 year ago

How does Snyk DCAIF Work under the hood? | Snyk Read our technical deep-dive into how Snyk's DCAIF works. To start, with Snyk's Deep Code AI Fix, simply register for a Snyk account here, enable DeepCode AI Fix in your Snyk settings, and start relia...

How to effectively fix vulnerabilities in code:

1. Have the scanner confirm that the issue is fixed, so the fix is not just an LLM hallucination.
2. Use a fast scanner that can drive delta debugging to check which lines affect the result.
3. Make it all work at IDE speed.
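
To illustrate the first two points, here is a minimal sketch of delta debugging driven by a scanner. It is not Snyk's implementation: `toy_scanner`, `ddmin`, and the sample lines are hypothetical names and data made up for this example, and the "scanner" is only a string check standing in for a real, fast static analyzer.

```python
from typing import Callable, List, Sequence


def toy_scanner(lines: Sequence[str]) -> bool:
    """Hypothetical stand-in for a fast scanner: flags string-built SQL that gets executed."""
    builds_query = any("+ user_input" in line for line in lines)
    runs_query = any("execute(query)" in line for line in lines)
    return builds_query and runs_query


def ddmin(lines: Sequence[str], still_flagged: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `lines` to a small subset for which `still_flagged` stays True."""
    current = list(lines)
    assert still_flagged(current), "the full input must trigger the finding"
    n = 2  # current granularity: number of chunks to split into
    while len(current) >= 2:
        chunk = max(1, len(current) // n)
        subsets = [current[i:i + chunk] for i in range(0, len(current), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            # Everything except chunk i, in original order.
            complement = [line for j, s in enumerate(subsets) if j != i for line in s]
            if still_flagged(subset):
                current, n, reduced = subset, 2, True
                break
            if complement and still_flagged(complement):
                current, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(current):
                break  # already at single-line granularity, nothing more to drop
            n = min(len(current), n * 2)
    return current


lines = [
    "import sqlite3",
    "conn = sqlite3.connect('app.db')",
    "user_input = input()",
    "query = 'SELECT * FROM users WHERE name = ' + user_input",
    "print('running')",
    "conn.execute(query)",
]

# Lines that the (toy) scanner needs in order to report the finding.
minimal = ddmin(lines, toy_scanner)

# A candidate fix is accepted only if the scanner no longer reports the finding.
fixed = lines[:3] + [
    "query = 'SELECT * FROM users WHERE name = ?'",
    "print('running')",
    "conn.execute(query, (user_input,))",
]
fix_confirmed = not toy_scanner(fixed)
```

Each round calls the scanner many times, which is why the speed of the scanner matters for running this kind of loop inside an IDE.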

snyk.co/uhJ48

1 year ago

State-of-the-art Bulgarian LLMs State-of-the-art generative AI created for the Bulgarian government, users, public and private organizations

The new BgGPT is here, based on Gemma 2. The large 27B model is on par with GPT-4o, with GPT-4o used as the judge.

models.bggpt.ai/blog/

1 year ago

I think there are people here, but not much content yet. So let's post as much good content as we can.

1 year ago

Our continuous pretraining method for LLMs that reduces forgetting from the base model was presented last week at EMNLP. Soon, some really strong models are coming.

arxiv.org/abs/2407.08699

1 year ago
