CommunityNews

Understanding HTML with Large Language Models

Large language models (LLMs) have shown exceptional performance on a variety
of natural language tasks. Yet, their capabilities for HTML understanding –
i.e., parsing the raw HTML of a webpage, with applications to automation of
web-based tasks, crawling, and browser-assisted retrieval – have not been
fully explored. We contribute HTML understanding models (fine-tuned LLMs) and
an in-depth analysis of their capabilities under three tasks: (i) Semantic
Classification of HTML elements, (ii) Description Generation for HTML inputs,
and (iii) Autonomous Web Navigation of HTML pages. While previous work has
developed dedicated architectures and training procedures for HTML
understanding, we show that LLMs pretrained on standard natural language
corpora transfer remarkably well to HTML understanding tasks. For instance,
fine-tuned LLMs are 12% more accurate at semantic classification compared to
models trained exclusively on the task dataset. Moreover, when fine-tuned on
data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks
using 192x less data compared to the previous best supervised model. Out of the
LLMs we evaluate, we show evidence that T5-based models are ideal due to their
bidirectional encoder-decoder architecture. To promote further research on LLMs
for HTML understanding, we create and open-source a large-scale HTML dataset
distilled and auto-labeled from CommonCrawl.
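
To give a rough feel for what task (i) involves, here is a minimal sketch of turning the `<input>` elements of a page into flat text strings that could be handed to a text classifier. This is not the paper's pipeline: the class `InputCollector`, the function `build_examples`, the sample page, and the choice to serialize only the element's own attributes are all assumptions made for illustration. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Hypothetical helper: collects the attributes of every <input> element."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # attrs is a list of (name, value) pairs; value is None for
            # bare attributes such as `required`, so fall back to "".
            self.inputs.append({name: value or "" for name, value in attrs})


def build_examples(html):
    """Serialize each <input> into one text string (an assumed input format)."""
    collector = InputCollector()
    collector.feed(html)
    examples = []
    for attrs in collector.inputs:
        # Sort attributes so the same element always yields the same string.
        parts = [f'{name}="{value}"' for name, value in sorted(attrs.items())]
        examples.append("<input " + " ".join(parts) + ">")
    return examples


# Made-up sample page for illustration only.
page = (
    '<form><label for="e">Email</label>'
    '<input id="e" type="email" name="user_email">'
    '<input type="password" name="pw"></form>'
)
examples = build_examples(page)
for text in examples:
    print(text)
```

Each resulting string would then be paired with a semantic label (for example "email" or "password") to form a training or evaluation example. A real setup would likely also include surrounding context such as the nearby `<label>` text, which this sketch leaves out.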

This thread was posted by one of our members via one of our news source trackers.
