CommunityNews

Understanding HTML with Large Language Models

Large language models (LLMs) have shown exceptional performance on a variety
of natural language tasks. Yet, their capabilities for HTML understanding –
i.e., parsing the raw HTML of a webpage, with applications to automation of
web-based tasks, crawling, and browser-assisted retrieval – have not been
fully explored. We contribute HTML understanding models (fine-tuned LLMs) and
an in-depth analysis of their capabilities under three tasks: (i) Semantic
Classification of HTML elements, (ii) Description Generation for HTML inputs,
and (iii) Autonomous Web Navigation of HTML pages. While previous work has
developed dedicated architectures and training procedures for HTML
understanding, we show that LLMs pretrained on standard natural language
corpora transfer remarkably well to HTML understanding tasks. For instance,
fine-tuned LLMs are 12% more accurate at semantic classification compared to
models trained exclusively on the task dataset. Moreover, when fine-tuned on
data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks
using 192x less data compared to the previous best supervised model. Out of the
LLMs we evaluate, we show evidence that T5-based models are ideal due to their
bidirectional encoder-decoder architecture. To promote further research on LLMs
for HTML understanding, we create and open-source a large-scale HTML dataset
distilled and auto-labeled from CommonCrawl.
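
To make the semantic classification task a little more concrete, here is a minimal sketch of how candidate elements might be pulled out of raw HTML and serialized as text before being handed to a fine-tuned model. It uses only Python's standard `html.parser`. The names `InputCollector`, `collect_inputs`, and `to_model_text`, the sample page, and the serialization format are all assumptions made for illustration; the abstract does not describe the authors' actual preprocessing.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collects <input> elements and their attributes from raw HTML."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # attrs is a list of (name, value) pairs; value is None for bare attributes
            self.inputs.append({name: value or "" for name, value in attrs})


def collect_inputs(raw_html):
    """Return one attribute dict per <input> element found in raw_html."""
    parser = InputCollector()
    parser.feed(raw_html)
    parser.close()
    return parser.inputs


def to_model_text(attrs):
    """Serialize one element as a compact HTML string (hypothetical model input format)."""
    parts = " ".join(f'{key}="{value}"' for key, value in sorted(attrs.items()))
    return f"<input {parts}>"


# Hypothetical sample page, not taken from the paper's dataset.
PAGE = """
<form>
  <label for="e">Email</label><input id="e" type="email" name="user_email">
  <input type="password" name="pw">
</form>
"""

elements = collect_inputs(PAGE)
texts = [to_model_text(attrs) for attrs in elements]
for text in texts:
    print(text)
```

Each printed string is the kind of short HTML snippet a classifier could map to a semantic category such as "email" or "password". A real pipeline would likely also include surrounding context, for example the nearby `<label>` text.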


This thread was posted by one of our members via one of our news source trackers.
