CommunityNews

Understanding HTML with Large Language Models

Large language models (LLMs) have shown exceptional performance on a variety
of natural language tasks. Yet, their capabilities for HTML understanding –
i.e., parsing the raw HTML of a webpage, with applications to automation of
web-based tasks, crawling, and browser-assisted retrieval – have not been
fully explored. We contribute HTML understanding models (fine-tuned LLMs) and
an in-depth analysis of their capabilities under three tasks: (i) Semantic
Classification of HTML elements, (ii) Description Generation for HTML inputs,
and (iii) Autonomous Web Navigation of HTML pages. While previous work has
developed dedicated architectures and training procedures for HTML
understanding, we show that LLMs pretrained on standard natural language
corpora transfer remarkably well to HTML understanding tasks. For instance,
fine-tuned LLMs are 12% more accurate at semantic classification compared to
models trained exclusively on the task dataset. Moreover, when fine-tuned on
data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks
using 192x less data compared to the previous best supervised model. Out of the
LLMs we evaluate, we show evidence that T5-based models are ideal due to their
bidirectional encoder-decoder architecture. To promote further research on LLMs
for HTML understanding, we create and open-source a large-scale HTML dataset
distilled and auto-labeled from CommonCrawl.

This thread was posted by one of our members via one of our news source trackers.
