CommunityNews

Understanding HTML with Large Language Models

Large language models (LLMs) have shown exceptional performance on a variety
of natural language tasks. Yet, their capabilities for HTML understanding –
i.e., parsing the raw HTML of a webpage, with applications to automation of
web-based tasks, crawling, and browser-assisted retrieval – have not been
fully explored. We contribute HTML understanding models (fine-tuned LLMs) and
an in-depth analysis of their capabilities under three tasks: (i) Semantic
Classification of HTML elements, (ii) Description Generation for HTML inputs,
and (iii) Autonomous Web Navigation of HTML pages. While previous work has
developed dedicated architectures and training procedures for HTML
understanding, we show that LLMs pretrained on standard natural language
corpora transfer remarkably well to HTML understanding tasks. For instance,
fine-tuned LLMs are 12% more accurate at semantic classification compared to
models trained exclusively on the task dataset. Moreover, when fine-tuned on
data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks
using 192x less data compared to the previous best supervised model. Out of the
LLMs we evaluate, we show evidence that T5-based models are ideal due to their
bidirectional encoder-decoder architecture. To promote further research on LLMs
for HTML understanding, we create and open-source a large-scale HTML dataset
distilled and auto-labeled from CommonCrawl.
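
To make task (i) a bit more concrete, here is a minimal sketch of how raw HTML might be turned into per-element text snippets that a fine-tuned model could then classify. This is not the paper's pipeline: the `InputCollector` class, the `extract_input_snippets` helper, and the choice to look only at `<input>` elements are all assumptions made for illustration. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Hypothetical helper: collect each <input> element as a text snippet."""

    def __init__(self):
        super().__init__()
        self.snippets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and passes attributes as
        # (name, value) pairs; value is None for attributes like `required`.
        if tag != "input":
            return
        parts = ["input"]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{value}"')
        self.snippets.append("<" + " ".join(parts) + ">")


def extract_input_snippets(html):
    """Return one serialized snippet per <input> element, in document order."""
    parser = InputCollector()
    parser.feed(html)
    parser.close()
    return parser.snippets


if __name__ == "__main__":
    page = (
        '<form><label>Email</label>'
        '<input type="email" name="user_email" required>'
        '<input type="password" id="pw"/></form>'
    )
    for snippet in extract_input_snippets(page):
        # Each snippet would be the text handed to a classifier (assumed step).
        print(snippet)
```

Each snippet is plain text, which is what would let a model pretrained on natural language consume it without a dedicated HTML architecture. How much surrounding context to include per element is a design choice this sketch does not address.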

This thread was posted by one of our members via one of our news source trackers.
