CommunityNews

Understanding HTML with Large Language Models

Large language models (LLMs) have shown exceptional performance on a variety
of natural language tasks. Yet, their capabilities for HTML understanding –
i.e., parsing the raw HTML of a webpage, with applications to automation of
web-based tasks, crawling, and browser-assisted retrieval – have not been
fully explored. We contribute HTML understanding models (fine-tuned LLMs) and
an in-depth analysis of their capabilities on three tasks: (i) Semantic
Classification of HTML elements, (ii) Description Generation for HTML inputs,
and (iii) Autonomous Web Navigation of HTML pages. While previous work has
developed dedicated architectures and training procedures for HTML
understanding, we show that LLMs pretrained on standard natural language
corpora transfer remarkably well to HTML understanding tasks. For instance,
fine-tuned LLMs are 12% more accurate at semantic classification compared to
models trained exclusively on the task dataset. Moreover, when fine-tuned on
data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks
using 192x less data compared to the previous best supervised model. Out of the
LLMs we evaluate, we show evidence that T5-based models are ideal due to their
bidirectional encoder-decoder architecture. To promote further research on LLMs
for HTML understanding, we create and open-source a large-scale HTML dataset
distilled and auto-labeled from CommonCrawl.
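If you haven't seen HTML understanding framed as a text-to-text problem before, the sketch below shows one way an input for the semantic classification task might be prepared for an encoder-decoder model such as T5. It makes several assumptions. The element to classify is picked by the position of its start tag. It is marked with an extra bare `target` attribute. The serialised snippet gets the prefix `classify: `. The marker attribute, the prefix and the names `TargetMarker` and `build_classification_input` are all hypothetical and are not the input format used in the paper. The code uses only Python's standard-library `html.parser` and `html.escape`.

```python
from html import escape
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical input format: the element to classify gets an extra bare
# `target` attribute and the snippet is prefixed with "classify: ".
# Neither detail is taken from the paper.
TARGET_ATTR = "target"
TASK_PREFIX = "classify: "


class TargetMarker(HTMLParser):
    """Re-serialises an HTML snippet, marking the start tag at `target_index`
    (0-based, counted in document order) with the TARGET_ATTR attribute."""

    def __init__(self, target_index: int) -> None:
        super().__init__(convert_charrefs=True)
        self.target_index = target_index
        self.tag_count = 0
        self.parts: List[str] = []

    def _render_tag(self, tag: str, attrs: List[Tuple[str, Optional[str]]],
                    self_closing: bool) -> str:
        attrs = list(attrs)
        if self.tag_count == self.target_index:
            attrs.append((TARGET_ATTR, None))
        self.tag_count += 1
        # Re-escape attribute values, which HTMLParser hands us unescaped.
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        return f"<{tag}{rendered}{' /' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._render_tag(tag, attrs, self_closing=False))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._render_tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Strip and collapse whitespace so the model sees compact text, then
        # re-escape the & and < that convert_charrefs turned back into text.
        text = " ".join(data.split())
        if text:
            self.parts.append(escape(text, quote=False))


def build_classification_input(html: str, target_index: int) -> str:
    """Returns the (hypothetical) text-to-text input for one element."""
    marker = TargetMarker(target_index)
    marker.feed(html)
    marker.close()
    return TASK_PREFIX + "".join(marker.parts)


if __name__ == "__main__":
    snippet = '<div><label for="email">Email</label><input type="text" id="email"></div>'
    # Index 2 is the <input>: <div> is 0 and <label> is 1.
    print(build_classification_input(snippet, 2))
    # → classify: <div><label for="email">Email</label><input type="text" id="email" target></div>
```

In this setup, the resulting string would be tokenised and passed to the encoder, and the decoder would be trained to emit the element's category. The label set and the exact marking scheme are assumptions here. A bidirectional encoder can attend to the marked element and its surrounding markup together, which fits the abstract's point about encoder-decoder models.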
