#Parsing

I do not see why a recursive-descent parser cannot have numeric precedences like a Pratt parser. You just move the order of productions freely, and let any production call any other.

Am I missing something?
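One way to picture the idea in the question is precedence climbing: a single recursive-descent function that takes a minimum precedence as a parameter and looks operators up in a numeric table. The sketch below is only an illustration under assumptions; the `PRECEDENCE` table, `applyOp`, and `evaluate` are hypothetical names, and it evaluates arithmetic directly instead of building a tree.

```typescript
// Hypothetical numeric precedence table (higher binds tighter).
const PRECEDENCE = new Map<string, number>([
  ["+", 1],
  ["-", 1],
  ["*", 2],
  ["/", 2],
]);

// Apply a binary operator to two already-evaluated operands.
function applyOp(op: string, a: number, b: number): number {
  switch (op) {
    case "+": return a + b;
    case "-": return a - b;
    case "*": return a * b;
    case "/": return a / b;
    default: throw new Error(`unknown operator: ${op}`);
  }
}

// Evaluate an arithmetic expression with a recursive-descent parser
// whose expression rule is parameterized by a minimum precedence.
function evaluate(src: string): number {
  const tokens: string[] = src.match(/\d+(?:\.\d+)?|[-+*/()]/g) ?? [];
  let pos = 0;

  // primary := number | "(" expr ")"
  function parsePrimary(): number {
    const tok = tokens[pos++];
    if (tok === undefined) throw new Error("unexpected end of input");
    if (tok === "(") {
      const inner = parseExpr(0);
      if (tokens[pos++] !== ")") throw new Error("expected ')'");
      return inner;
    }
    const n = Number(tok);
    if (Number.isNaN(n)) throw new Error(`unexpected token: ${tok}`);
    return n;
  }

  // expr(minPrec) consumes only operators with precedence >= minPrec.
  function parseExpr(minPrec: number): number {
    let left = parsePrimary();
    for (;;) {
      const op = tokens[pos];
      if (op === undefined) return left;
      const prec = PRECEDENCE.get(op);
      if (prec === undefined || prec < minPrec) return left;
      pos++;
      // prec + 1 makes every operator left-associative.
      const right = parseExpr(prec + 1);
      left = applyOp(op, left, right);
    }
  }

  const result = parseExpr(0);
  if (pos !== tokens.length) throw new Error("trailing input");
  return result;
}
```

Calling `evaluate("1 + 2 * 3")` parses the multiplication first because `*` has the higher number in the table. Adding an operator in this sketch means adding a table entry and a case in `applyOp`, not a new grammar production.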

#parsing #programming

OpenUI - The Open Standard for Generative UI

OpenUI is a full-stack Generative UI framework with a compact streaming-first language, a React runtime with built-in components, and ready-to-use chat interfaces - using up to 67% fewer tokens than JSON.

Rewriting Our Rust Wasm Parser in TypeScript, by (not on Mastodon or Bluesky):

https://www.openui.com/blog/rust-wasm-parser

#migrating #parsing #rust #typescript


Detecting generalization deficits in large language and reasoning models by using natural variati...

Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, Jenia Jitsev

Action editor: Elahe Arani

https://openreview.net/forum?id=frA7uYn2um

#generalization #parsing #ai


"what is actually happening is that your brain starts parsing the sentence, and it makes a statistical prediction of what most likely comes next."
A bit like LLMs but more intelligently.
#Language #Parsing


New #J2C Certification:

$\texttt{SEM-CTRL}$: Semantically Controlled Decoding

Mohammad Albinhassan, Pranava Madhyastha, Alessandra Russo

https://openreview.net/forum?id=ICUHKhOISN

#parsing #semantics #semantic

No Semicolons Needed | Terts Diepraam

A nice survey of #ProgrammingLanguages that allow you to omit semicolons:

“No Semicolons Needed”, Terts Diepraam (terts.dev/blog/no-semi...).

Via HN: news.ycombinator.com/item?id=4747...

On Lobsters: lobste.rs/s/09wmcz/no_...

#Programming #PLDI #Parsing #Lexing #Syntax #Grammar #Compilers


Reposting for the syntactic ambiguity #psycholiguistics #parsing

Intro to Compiler Theory - Part 1

This is a comprehensive introduction to compiler theory and the systematic process of translating high-level programming languages into machine-executable code. We outline the modular architecture of a compiler, divided into a frontend for source analysis and a backend for target code synthesis. Key phases described include lexical analysis, where text is converted into a token stream, and syntactic analysis, which generates an abstract syntax tree. Then, we further explore semantic analysis, intermediate code generation, and various optimization techniques designed to improve program efficiency. Additionally, we help define the mathematical foundations of language processing, such as regular expressions, finite automata, and the use of symbol tables to manage program identifiers.

📣 New Podcast! "Intro to Compiler Theory - Part 1" on @Spreaker #compiler #compilers #computers #connectedcomponents #cs #engineering #parsing #regex #science #stem

research!rsc: Floating-Point Printing and Parsing Can Be Simple And Fast (Floating Point Formatting, Part 3)

Another excellent post 👌🏽 from Russ Cox 👇🏽🫡:

“Floating-Point Printing And Parsing Can Be Simple And Fast” (research.swtch.com/fp).

On HN: news.ycombinator.com/item?id=4668...

On Lobsters: lobste.rs/s/nbsclr/flo...

#Programming #Math #FloatingPoint #Numbers #PLDI #Parsing #Printing


Maybe I'm just parsing it the wrong way, but could you get away with this restaurant name anywhere other than Glasgow? And it's in the fashionable area of Finnieston, too!

#glasgow #keepglasgowweird #parsing #finnieston #scotland


Physics of Language Models: Part 1, Learning Hierarchical Language Structures

Zeyuan Allen-Zhu, Yuanzhi Li

Action editor: Jonathan Berant

https://openreview.net/forum?id=mPQKyzkA1K

#parsing #parse #grammars

Rewriting pycparser with the help of an LLM - Eli Bendersky's website

pycparser is my most widely used open source project (with ~20M daily downloads from PyPI [1]). It's a pure-Python parser for the C programming lang...

#misc #Python #Machine #Learning #Compilation #Recursive #descent #parsing

A Percise Parser

Thomas worked for a company based in Germany which was looking to expand internationally. Once they started servicing other locales, things started to break. It didn't take long to track the problem down to a very "percise" numeric parser.

`handleInput( value ){ let value_ = value; if( value…`

A Percise Parser, by @remyporter.bsky.social:

https://thedailywtf.com/articles/a-percise-parser

#javascript #parsing

Explainer: Tree-sitter vs. LSP

I got asked a good question today: what is the difference between Tree-sitter and a language server? I don’t understand how either of these tools work in depth, so I’m just going to explain from an ob...

“Explainer: Tree-sitter Vs. LSP”, Ashton Wiersdorf (lambdaland.org/posts/2026-0...).

Via HN: news.ycombinator.com/item?id=4671...

On Lobsters: lobste.rs/s/qhickw/exp...

#LSP #TreeSitter #LanguageServerProtocol #Editors #SyntaxHighlighting #Parsers #Parsing

New #TMLR-Paper-with-Video:

Physics of Language Models: Part 1, Learning Hierarchical Language Structures

Zeyuan Allen-Zhu, Yuanzhi Li

https://tmlr.infinite-conf.org/paper_pages/mPQKyzkA1K

#parsing #parse #grammars


Why Parsing Is the Real Foundation of Document AI

Whenever we talk about building systems around PDFs, resume parsers, invoice extractors, RAG pipelines, or document-based chatbots, most…

#rags #llm #machine-learning #data-engineering #parsing


[Translation] How Modern Browsers Work. Part 2

Web developers often think of the browser as a "bla...

#browser #chrome #chromium #parsing #timeweb_статьи_перевод #парсинг #браузер #internals #внутреннее #устройство #compilation


Recursive descent parsing is highlighted as a simple, effective technique for building parsers. Modern tools and LLMs can even assist in generating code, balancing simplicity with the need for more complex language features. #Parsing 3/6

Revisiting "Let's Build a Compiler"

There's an old compiler-building tutorial that has become part of the field's lore: the Let's Build a Compiler series by Jack Crenshaw (p...

#misc #Compilation #Python #Recursive #descent #parsing

Stop writing if statements for your CLI flags

If you've built CLI tools, you've written code like this:

    if (opts.reporter === "junit" && !opts.outputFile) {
      throw new Error("--output-file is required for junit reporter");
    }
    if (opts.reporter === "html" && !opts.outputFile) {
      throw new Error("--output-file is required for html reporter");
    }
    if (opts.reporter === "console" && opts.outputFile) {
      console.warn("--output-file is ignored for console reporter");
    }

A few months ago, I wrote _Stop writing CLI validation. Parse it right the first time._ about parsing individual option values correctly. But it didn't cover the _relationships_ between options.

In the code above, `--output-file` only makes sense when `--reporter` is `junit` or `html`. When it's `console`, the option shouldn't exist at all.

We're using TypeScript. We have a powerful type system. And yet, here we are, writing runtime checks that the compiler can't help with. Every time we add a new reporter type, we need to remember to update these checks. Every time we refactor, we hope we didn't miss one.

## The state of TypeScript CLI parsers

The old guard (Commander, yargs, minimist) were built before TypeScript became mainstream. They give you bags of strings and leave type safety as an exercise for the reader.

But we've made progress. Modern TypeScript-first libraries like cmd-ts and Clipanion (the library powering Yarn Berry) take types seriously:

    // cmd-ts
    const app = command({
      args: {
        reporter: option({ type: string, long: 'reporter' }),
        outputFile: option({ type: string, long: 'output-file' }),
      },
      handler: (args) => {
        // args.reporter: string
        // args.outputFile: string
      },
    });

    // Clipanion
    class TestCommand extends Command {
      reporter = Option.String('--reporter');
      outputFile = Option.String('--output-file');
    }

These libraries infer types for individual options. `--port` is a `number`. `--verbose` is a `boolean`. That's real progress.
But here's what they can't do: express that `--output-file` is required _when_ `--reporter` is `junit`, and forbidden _when_ `--reporter` is `console`. The relationship between options isn't captured in the type system.

So you end up writing validation code anyway:

    handler: (args) => {
      // Both cmd-ts and Clipanion need this
      if (args.reporter === "junit" && !args.outputFile) {
        throw new Error("--output-file required for junit");
      }
      // args.outputFile is still string | undefined
      // TypeScript doesn't know it's definitely string when reporter is "junit"
    }

Rust's clap and Python's Click have `requires` and `conflicts_with` attributes, but those are runtime checks too. They don't change the result type.

If the parser configuration knows about option relationships, why doesn't that knowledge show up in the result type?

## Modeling relationships with `conditional()`

Optique treats option relationships as a first-class concept. Here's the test reporter scenario:

    import { conditional, object } from "@optique/core/constructs";
    import { option } from "@optique/core/primitives";
    import { choice, string } from "@optique/core/valueparser";
    import { run } from "@optique/run";

    const parser = conditional(
      option("--reporter", choice(["console", "junit", "html"])),
      {
        console: object({}),
        junit: object({
          outputFile: option("--output-file", string()),
        }),
        html: object({
          outputFile: option("--output-file", string()),
          openBrowser: option("--open-browser"),
        }),
      }
    );

    const [reporter, config] = run(parser);

The `conditional()` combinator takes a discriminator option (`--reporter`) and a map of branches. Each branch defines what other options are valid for that discriminator value.

TypeScript infers the result type automatically:

    type Result =
      | ["console", {}]
      | ["junit", { outputFile: string }]
      | ["html", { outputFile: string; openBrowser: boolean }];

When `reporter` is `"junit"`, `outputFile` is `string`, not `string | undefined`. The relationship is encoded in the type.
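To make the shape of that `Result` type concrete, here is a minimal sketch in plain TypeScript, with no Optique import. It assumes a hand-written `Branches` map and a mapped type `TupleUnion`; both names, along with `summarize` and `assertNever`, are hypothetical, and this is not how Optique derives its types internally.

```typescript
// Hypothetical branch map: discriminator value -> options valid for it.
type Branches = {
  console: {};
  junit: { outputFile: string };
  html: { outputFile: string; openBrowser: boolean };
};

// Turn a branch map into a union of [discriminator, options] tuples.
type TupleUnion<B> = { [K in keyof B]: [K, B[K]] }[keyof B];

// Equivalent to the three-member tuple union described above.
type Result = TupleUnion<Branches>;

// Compile-time exhaustiveness helper: only callable with `never`.
function assertNever(value: never): never {
  throw new Error(`unhandled case: ${JSON.stringify(value)}`);
}

// Checking element 0 narrows element 1 to the matching branch.
function summarize(result: Result): string {
  switch (result[0]) {
    case "console":
      return "console";
    case "junit":
      // result[1] is { outputFile: string } here
      return `junit -> ${result[1].outputFile}`;
    case "html": {
      const suffix = result[1].openBrowser ? " (open)" : "";
      return `html -> ${result[1].outputFile}${suffix}`;
    }
    default:
      // Adding a branch without a matching case makes this line fail to compile.
      return assertNever(result);
  }
}
```

In this sketch, adding a fourth key to `Branches` without a matching `case` turns the `assertNever(result)` call into a compile error.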
Now your business logic gets real type safety:

    const [reporter, config] = run(parser);

    switch (reporter) {
      case "console":
        runWithConsoleOutput();
        break;
      case "junit":
        // TypeScript knows config.outputFile is string
        writeJUnitReport(config.outputFile);
        break;
      case "html":
        // TypeScript knows config.outputFile and config.openBrowser exist
        writeHtmlReport(config.outputFile);
        if (config.openBrowser) openInBrowser(config.outputFile);
        break;
    }

No validation code. No runtime checks. If you add a new reporter type and forget to handle it in the switch, the compiler tells you.

## A more complex example: database connections

Test reporters are a nice example, but let's try something with more variation. Database connection strings:

    myapp --db=sqlite --file=./data.db
    myapp --db=postgres --host=localhost --port=5432 --user=admin
    myapp --db=mysql --host=localhost --port=3306 --user=root --ssl

Each database type needs completely different options:

* SQLite just needs a file path
* PostgreSQL needs host, port, user, and optionally password
* MySQL needs host, port, user, and has an SSL flag

Here's how you model this:

    import { conditional, object } from "@optique/core/constructs";
    import { withDefault, optional } from "@optique/core/modifiers";
    import { option } from "@optique/core/primitives";
    import { choice, string, integer } from "@optique/core/valueparser";

    const dbParser = conditional(
      option("--db", choice(["sqlite", "postgres", "mysql"])),
      {
        sqlite: object({
          file: option("--file", string()),
        }),
        postgres: object({
          host: option("--host", string()),
          port: withDefault(option("--port", integer()), 5432),
          user: option("--user", string()),
          password: optional(option("--password", string())),
        }),
        mysql: object({
          host: option("--host", string()),
          port: withDefault(option("--port", integer()), 3306),
          user: option("--user", string()),
          ssl: option("--ssl"),
        }),
      }
    );

The inferred type:

    type DbConfig =
      | ["sqlite", { file: string }]
      | ["postgres", { host: string; port: number; user: string; password?: string }]
      | ["mysql", { host: string; port: number; user: string; ssl: boolean }];

Notice the details: PostgreSQL defaults to port 5432, MySQL to 3306. PostgreSQL has an optional password, MySQL has an SSL flag. Each database type has exactly the options it needs, no more, no less.

With this structure, writing `dbConfig.ssl` when the mode is `sqlite` isn't a runtime error; it's a compile-time impossibility.

Try expressing this with `requires_if` attributes. You can't. The relationships are too rich.

## The pattern is everywhere

Once you see it, you find this pattern in many CLI tools:

**Authentication modes:**

    const authParser = conditional(
      option("--auth", choice(["none", "basic", "token", "oauth"])),
      {
        none: object({}),
        basic: object({
          username: option("--username", string()),
          password: option("--password", string()),
        }),
        token: object({
          token: option("--token", string()),
        }),
        oauth: object({
          clientId: option("--client-id", string()),
          clientSecret: option("--client-secret", string()),
          tokenUrl: option("--token-url", url()),
        }),
      }
    );

**Deployment targets**, **output formats**, **connection protocols**: anywhere you have a mode selector that determines what other options are valid.

## Why `conditional()` exists

Optique already has an `or()` combinator for mutually exclusive alternatives. Why do we need `conditional()`?

The `or()` combinator distinguishes branches based on **structure**: which options are present. It works well for subcommands like `git commit` vs `git push`, where the arguments differ completely.

But in the reporter example, the structure is identical: every branch has a `--reporter` flag. The difference lies in the flag's _value_, not its presence.
    // This won't work as intended
    const parser = or(
      object({ reporter: option("--reporter", choice(["console"])) }),
      object({
        reporter: option("--reporter", choice(["junit", "html"])),
        outputFile: option("--output-file", string())
      }),
    );

When you pass `--reporter junit`, `or()` tries to pick a branch based on what options are present. Both branches have `--reporter`, so it can't distinguish them structurally.

`conditional()` solves this by reading the discriminator's value first, then selecting the appropriate branch. It bridges the gap between structural parsing and value-based decisions.

## _The structure is the constraint_

Instead of parsing options into a loose type and then validating relationships, define a parser whose structure _is_ the constraint.

Traditional approach | Optique approach
---|---
Parse → Validate → Use | Parse (with constraints) → Use
Types and validation logic maintained separately | Types reflect the constraints
Mismatches found at runtime | Mismatches found at compile time

The parser definition becomes the single source of truth. Add a new reporter type? The parser definition changes, the inferred type changes, and the compiler shows you everywhere that needs updating.

## Try it

If this resonates with a CLI you're building:

* Documentation
* Tutorial
* `conditional()` reference
* GitHub

Next time you're about to write an `if` statement checking option relationships, ask: could the parser express this constraint instead?

_The structure of your parser is the constraint._ You might not need that validation code at all.
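The "read the discriminator's value first, then select a branch" idea can be sketched at runtime without any library. The following is a toy illustration only, not Optique's implementation: `REQUIRED_BY_REPORTER` and `parseReporterArgs` are hypothetical names, and it assumes every argument comes as a `--name value` pair.

```typescript
// Hypothetical branch table: reporter value -> options that branch requires.
const REQUIRED_BY_REPORTER = new Map<string, readonly string[]>([
  ["console", []],
  ["junit", ["--output-file"]],
  ["html", ["--output-file"]],
]);

interface ParsedArgs {
  reporter: string;
  options: Record<string, string>;
}

function parseReporterArgs(argv: readonly string[]): ParsedArgs {
  // Collect "--name value" pairs (assumed input shape for this sketch).
  const given = new Map<string, string>();
  for (let i = 0; i < argv.length; i += 2) {
    const name = argv[i];
    const value = argv[i + 1];
    if (name === undefined || !name.startsWith("--") || value === undefined) {
      throw new Error(`malformed argument near: ${String(name)}`);
    }
    given.set(name, value);
  }

  // Step 1: read the discriminator's value before looking at anything else.
  const reporter = given.get("--reporter");
  if (reporter === undefined) throw new Error("--reporter is required");
  const required = REQUIRED_BY_REPORTER.get(reporter);
  if (required === undefined) throw new Error(`unknown reporter: ${reporter}`);
  given.delete("--reporter");

  // Step 2: parse the remaining options against the selected branch only.
  const options: Record<string, string> = {};
  for (const name of required) {
    const value = given.get(name);
    if (value === undefined) {
      throw new Error(`${name} is required for ${reporter} reporter`);
    }
    options[name] = value;
    given.delete(name);
  }

  // Anything left over does not belong to this branch.
  const extra = Array.from(given.keys());
  if (extra.length > 0) {
    throw new Error(`${extra.join(", ")} not allowed for ${reporter} reporter`);
  }
  return { reporter, options };
}
```

This sketch covers only the value-based branch selection; it returns loosely typed strings and does not reproduce the compile-time result type the article describes.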
1 1 0 1
How Modern Browsers Work

How Modern Browsers Work, by @addyosmani@indieweb.social:

https://archive.ph/Kj62o

#browsers #chromium #network #parsing #painting

Userbot + AI: Beyond parsing, how a Telegram userbot and a neural network help find trends and pain points

The task is not just to scrape messages from Telegram channels. The task is to learn to pick out current trends, customer pain points, and working life hacks from the stream of discussions. This is a gold mine for...

#telegram #telegram #bot #parsing #ai #bot


[Translation] How Modern Browsers Work. Part 1

Web developers often think of the browser as a "b...

#browser #chrome #chromium #parsing #браузер #парсинг #timeweb_статьи_перевод #internals #внутреннее #устройство #compilation

AI Tool to Extract References From PDFs and Format Citations

Read More - www.bestfreewebsitetools.com/ai-tool-to-extract-refer...

#citation #formatting #pdf #parsing #reference #extraction

Release Pyparsing 3.3.0b1 · pyparsing/pyparsing

(added in 3.3.0b1) Implemented a TINY language parser/interpreter using pyparsing, in the examples/tiny directory. This is a little tutorial language that I used to demonstrate how to use pyparsi...

I just published pyparsing version 3.3.0b1, with some significant additions:
- example implementation of the TINY language
- performance tests with scripts to run and tabulate results using pyparsing 3.1-3.3 and Python 3.9-3.14

Github link: github.com/pyparsing/py...

#pyparsing #python #parsing


Anyone happen to have got ALL the comments out of a source/header file with libclang (in C or via bindings, not using the C++ tooling)? I'm only getting some of the comments and I'd just love to hear the rest are in there and it IS just me! Grateful for boosts. #c #clang #parsing

URL parser tester

bsky URL parsing #TEST
#parse #parsing #parser

timothygu.me/urltester/#i...

List of Internet top-level domains - Wikipedia

bsky URL parsing #TEST
#parse #parsing #parser

en.wikipedia.org/wiki/List_of...

Example Domain

bsky URL parsing #TEST
#parse #parsing #parser

example.com
https://example<.>com
example”.”com > xn--example-b46c.xn--com-6o0a
https://example%.%com
example#.#com
example{.}com
https://example|.|com
example\.\com > example/com
https://example^.^com
example~.~com
https://example[.]com
example`.`com
