August 27, 2007

Screen scraping with jQuery

A test case in my work requires a complete list of HTML elements and a list of self-closing elements (e.g. <br/>).

The W3C Index of HTML 4 Elements lists all defined elements in a table. For each row with an "E" in the Empty column, the corresponding element doesn't need a closing tag (and thus is self-closing).

With two lines of jQuery code in the Firebug console, I got the lists I wanted. Here is how:

To get all elements

$.map($('table tr:gt(0) a'), function(e) {return $.trim($(e).text());})

To get all self-closing elements (formatted for readability)

$.map(
$('table tr:gt(0)').filter(function() {
return $(this).find('td:nth-child(4)').text() == 'E';
}),
function(e){return $.trim($(e).find('td:first-child').text());});
$.trim() is needed because the HTML source contains \n in the Name column.
This demonstrates a handy usage of jQuery as a hacking tool. Another excellent demonstration can be found here.
You can add jQuery to the current page using the jQuerify bookmarklet.
Happy jQuerifying!