CommunityNews

Understanding HTML with Large Language Models

Large language models (LLMs) have shown exceptional performance on a variety
of natural language tasks. Yet, their capabilities for HTML understanding –
i.e., parsing the raw HTML of a webpage, with applications to automation of
web-based tasks, crawling, and browser-assisted retrieval – have not been
fully explored. We contribute HTML understanding models (fine-tuned LLMs) and
an in-depth analysis of their capabilities under three tasks: (i) Semantic
Classification of HTML elements, (ii) Description Generation for HTML inputs,
and (iii) Autonomous Web Navigation of HTML pages. While previous work has
developed dedicated architectures and training procedures for HTML
understanding, we show that LLMs pretrained on standard natural language
corpora transfer remarkably well to HTML understanding tasks. For instance,
fine-tuned LLMs are 12% more accurate at semantic classification compared to
models trained exclusively on the task dataset. Moreover, when fine-tuned on
data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks
using 192x less data compared to the previous best supervised model. Out of the
LLMs we evaluate, we show evidence that T5-based models are ideal due to their
bidirectional encoder-decoder architecture. To promote further research on LLMs
for HTML understanding, we create and open-source a large-scale HTML dataset
distilled and auto-labeled from CommonCrawl.

This thread was posted by one of our members via one of our news source trackers.
