CommunityNews

CommunityNews

Understanding HTML with Large Language Models

Understanding HTML with Large Language Models.
Large language models (LLMs) have shown exceptional performance on a variety
of natural language tasks. Yet, their capabilities for HTML understanding –
i.e., parsing the raw HTML of a webpage, with applications to automation of
web-based tasks, crawling, and browser-assisted retrieval – have not been
fully explored. We contribute HTML understanding models (fine-tuned LLMs) and
an in-depth analysis of their capabilities under three tasks: (i) Semantic
Classification of HTML elements, (ii) Description Generation for HTML inputs,
and (iii) Autonomous Web Navigation of HTML pages. While previous work has
developed dedicated architectures and training procedures for HTML
understanding, we show that LLMs pretrained on standard natural language
corpora transfer remarkably well to HTML understanding tasks. For instance,
fine-tuned LLMs are 12% more accurate at semantic classification compared to
models trained exclusively on the task dataset. Moreover, when fine-tuned on
data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks
using 192x less data compared to the previous best supervised model. Out of the
LLMs we evaluate, we show evidence that T5-based models are ideal due to their
bidirectional encoder-decoder architecture. To promote further research on LLMs
for HTML understanding, we create and open-source a large-scale HTML dataset
distilled and auto-labeled from CommonCrawl.

Read in full here:

This thread was posted by one of our members via one of our news source trackers.

Where Next?

Popular Frontend topics Top

New
First poster: bot
10 bad TypeScript habits to break this year. TypeScript and JavaScript have steadily evolved over the last years, and some of the habits...
New
First poster: bot
I suspect you’ve already have Emscripten (https://emscripten.org/) installed. Make sure, you’ve included the proper paths, so you can iss...
New
First poster: bot
How the TypeScript Compiler Compiles - understanding the compiler internal. A systems-level look at the TypeScript compiler. How it conv...
New
First poster: OvermindDL1
Rust Is The Future of JavaScript Infrastructure – Lee Robinson. Why is Rust being used to replace parts of the JavaScript web ecosystem ...
New
First poster: bot
How Prime Video updates its app for more than 8,000 device types. The switch to WebAssembly increases stability, speed.
New
CommunityNews
The proposal “Pipe operator (|>) for JavaScript” (by J. S. Choi, James DiGioia, Ron Buckton and Tab Atkins) introduces a new operator....
/js
New
First poster: bot
Assertion Functions in TypeScript. TypeScript 3.7 implemented support for assertion functions in the type system. An assertion function ...
New
First poster: bot
GitHub - Qovery/Replibyte: Seed your development database with real data :zap:. Seed your development database with real data :zap:. Con...
New
First poster: joeb
Exploring how React’s dominance by default stifles frontend innovation, and why deliberate framework choices lead to better tools for per...
New

Other popular topics Top

AstonJ
If it’s a mechanical keyboard, which switches do you have? Would you recommend it? Why? What will your next keyboard be? Pics always w...
New
PragmaticBookshelf
Write Elixir tests that you can be proud of. Dive into Elixir’s test philosophy and gain mastery over the terminology and concepts that u...
New
dasdom
No chair. I have a standing desk. This post was split into a dedicated thread from our thread about chairs :slight_smile:
New
dimitarvp
Small essay with thoughts on macOS vs. Linux: I know @Exadra37 is just waiting around the corner to scream at me “I TOLD YOU SO!!!” but I...
New
Maartz
Hi folks, I don’t know if I saw this here but, here’s a new programming language, called Roc Reminds me a bit of Elm and thus Haskell. ...
New
mafinar
This is going to be a long an frequently posted thread. While talking to a friend of mine who has taken data structure and algorithm cou...
New
DevotionGeo
I have always used antique keyboards like Cherry MX 1800 or Cherry MX 8100 and almost always have modified the switches in some way, like...
New
First poster: bot
zig/http.zig at 7cf2cbb33ef34c1d211135f56d30fe23b6cacd42 · ziglang/zig. General-purpose programming language and toolchain for maintaini...
New
AstonJ
Curious what kind of results others are getting, I think actually prefer the 7B model to the 32B model, not only is it faster but the qua...
New
mindriot
Ok, well here are some thoughts and opinions on some of the ergonomic keyboards I have, I guess like mini review of each that I use enoug...
New