CommunityNews

CommunityNews

Understanding HTML with Large Language Models

Understanding HTML with Large Language Models.
Large language models (LLMs) have shown exceptional performance on a variety
of natural language tasks. Yet, their capabilities for HTML understanding –
i.e., parsing the raw HTML of a webpage, with applications to automation of
web-based tasks, crawling, and browser-assisted retrieval – have not been
fully explored. We contribute HTML understanding models (fine-tuned LLMs) and
an in-depth analysis of their capabilities under three tasks: (i) Semantic
Classification of HTML elements, (ii) Description Generation for HTML inputs,
and (iii) Autonomous Web Navigation of HTML pages. While previous work has
developed dedicated architectures and training procedures for HTML
understanding, we show that LLMs pretrained on standard natural language
corpora transfer remarkably well to HTML understanding tasks. For instance,
fine-tuned LLMs are 12% more accurate at semantic classification compared to
models trained exclusively on the task dataset. Moreover, when fine-tuned on
data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks
using 192x less data compared to the previous best supervised model. Out of the
LLMs we evaluate, we show evidence that T5-based models are ideal due to their
bidirectional encoder-decoder architecture. To promote further research on LLMs
for HTML understanding, we create and open-source a large-scale HTML dataset
distilled and auto-labeled from CommonCrawl.

Read in full here:

This thread was posted by one of our members via one of our news source trackers.

Popular General Dev topics Top

Devtalk
Our jobs section is for full-time paid positions and for companies to post their own jobs (no consultancies or agencies, sorry!) - simply...
New
herminiotorres
Tell us what kind of machine/hardware do you have? and what kind of reason for? describe your hardware… :computer: :nerd_face:
New
AstonJ
Just been adding some more portals, currently have the following languages: Apache Groovy C C# C++ Clojure CoffeeScript Crystal ...
New
AstonJ
I’ve been hearing quite a lot of comments relating to the sound of a keyboard, with one of the most desirable of these called ‘thock’, he...
New
AstonJ
Inspired by some of the comments in our https://forum.devtalk.com/t/your-vim-tips/4748 (in particular those by Mafinar and Hallski) …what...
New
ohm
I just switched jobs to tech lead with a small team of about 6 other developer. This is my first tech lead job. What do I need to know? A...
New
hskohli
We are looking to hire a freelancer to design the complete pipeline for the following problem statement: Hardware to be used: Jetson Na...
New
First poster: malloryerik
GitHub - hlissner/doom-emacs: An Emacs framework for the stubborn martian hacker. An Emacs framework for the stubborn martian hacker - G...
New
AstonJ
This was interesting: He’s definitely more of an Emacs fan (which is fine) and the thing I found interesting is how you wo...
New
DevotionGeo
I have always used antique keyboards like Cherry MX 1800 or Cherry MX 8100 and almost always have modified the switches in some way, like...
New

Other popular topics Top

AstonJ
Curious to know which languages and frameworks you’re all thinking about learning next :upside_down_face: Perhaps if there’s enough peop...
New
AstonJ
poll poll Be sure to check out @Dusty’s article posted here: An Introduction to Alternative Keyboard Layouts It’s one of the best write-...
New
AstonJ
Thanks to @foxtrottwist’s and @Tomas’s posts in this thread: Poll: Which code editor do you use? I bought Onivim! :nerd_face: https://on...
New
AstonJ
I have seen the keycaps I want - they are due for a group-buy this week but won’t be delivered until October next year!!! :rofl: The Ser...
New
Exadra37
I am asking for any distro that only has the bare-bones to be able to get a shell in the server and then just install the packages as we ...
New
AstonJ
In case anyone else is wondering why Ruby 3 doesn’t show when you do asdf list-all ruby :man_facepalming: do this first: asdf plugin-upd...
New
wmnnd
Here’s the story how one of the world’s first production deployments of LiveView came to be - and how trying to improve it almost caused ...
New
foxtrottwist
A few weeks ago I started using Warp a terminal written in rust. Though in it’s current state of development there are a few caveats (tab...
New
Help
I am trying to crate a game for the Nintendo switch, I wanted to use Java as I am comfortable with that programming language. Can you use...
New
AstonJ
This is a very quick guide, you just need to: Download LM Studio: https://lmstudio.ai/ Click on search Type DeepSeek, then select the o...
New