August 27, 2007

Screen scraping with jQuery

A test case in my work requires a complete list of HTML elements and a list of self-closing elements (e.g. <br/>).

The W3C Index of HTML 4 Elements lists all defined elements in a table. For each row with an "E" in the Empty column, the corresponding element doesn't need a closing tag (and thus is self-closing).

With two lines of jQuery code in the Firebug console, I got the lists I wanted. Here is how:

To get all elements

$.map($('table tr:gt(0) a'), function(e) {return $.trim($(e).text());})

To get all self-closing elements (formatted for readability)

$.map(
  $('table tr:gt(0)').filter(function() {
    return $(this).find('td:nth-child(4)').text() == 'E';
  }),
  function(e) {return $.trim($(e).find('td:first-child').text());});

$.trim() is needed because the HTML source contains \n in the Name column.

This demonstrates a handy usage of jQuery as a hacking tool. Another excellent demonstration can be found here.

You can add jQuery to the current page using the jQuerify bookmarklet.

Happy jQuerifying!

July 26, 2007

Ruby code: Finding the closest point

We want to fetch an avatar of a specific size. How do we find the closest size among all available sizes? Suppose the available sizes are 16, 48 and 96.

An obvious version is

def closest_size(size)
  case
  when size < 32
    16
  when size < 72
    48
  else
    96
  end
end

The value used in comparison tests is the average of two neighboring candidate sizes. The problem is that when our available sizes change, we need to add or remove when clauses and recalculate average values.


Here is a smarter way

def closest_size(size)
  points = [16, 48, 96]
  distances = points.map {|a| (size - a).abs}
  points[distances.index(distances.min)]
end

It finds the point with shortest distance to the given size. Now if we want to change candidate sizes, we only need to change the array literal. Further, we can even pass the candidate array as an argument.

Rails: Error calling Dispatcher.dispatch...

I ran into a strange issue with one of my Rails projects today - the browser displayed HTML source code as plain text instead of rendering it. Nothing was listed in Firebug's Net tab. Strange! Later I found that the server was always sending back HTTP 404 with Content-type: text/plain. In the log, there were many lines of "Error calling Dispatcher.dispatch #<NameError: cannot remove Object::Handler>".

Finally, I found out that in a controller, someone had put include xxx at file level, and in that module he had defined class Handler. So that was it. After moving include xxx into the controller class definition, everything went well.

A file-level include (an include outside any class/module definition) actually affects Object - the mother of all Ruby elves. So it's something we should definitely avoid in real-world projects.

Never include outside class/module definition.

BTW, Rails is notoriously good at giving error messages unrelated to the cause of the problem. This is largely due to the dynamic nature of Ruby.

May 6, 2007

Ubuntu 7.04/8.04 and Mouse Wheel

I've just installed Ubuntu 7.04 in VMWare. Everything is cool except that the mouse wheel doesn't work. To fix it, open the X configuration file:

sudo gedit /etc/X11/xorg.conf

Find the mouse section which may look like

Section "InputDevice"
Identifier "Configured Mouse"
Driver "mouse"
Option "CorePointer"
Option "Device" "/dev/input/mice"
Option "Protocol" "ps/2"
Option "ZAxisMapping" "4 5"
Option "Emulate3Buttons" "true"
EndSection

Change Option "Protocol" "ps/2" to Option "Protocol" "IMPS/2"

Save the file and restart X (ctrl + alt + backspace).

It works for me. Happy scrolling!

UPDATE: 2008-8-18

Now with Ubuntu 8.04 running in VMWare Workstation 6.04 on Windows Server 2008...

The working config for me is as below:

Section "InputDevice"
Identifier "Configured Mouse"
Driver "vmmouse"
Option "CorePointer"
Option "Device" "/dev/input/mice"
Option "Protocol" "ImPS/2"
Option "ZAxisMapping" "4 5"
EndSection

The mouse cursor moves very smoothly and crosses between the host and the virtual machine seamlessly.

April 30, 2007

eval() bug in IE

eval('function(){}') evaluates to undefined in IE, the same for eval('(function(){})').

How to get the Function?

eval('[function(){}][0]')

IE, you always let me down!

February 10, 2007

Tricky alternatives to toString() and parseInt()

JavaScript is Loosely typed

JavaScript is dynamically typed, making things extremely flexible. Suppose you have a text field for entering a birth year and want to greet people born after 1984. Simply write

var birthYear = document.getElementById('txtBirthYear').value;
if (birthYear > 1984) {
alert('Greetings!');
}


When JavaScript sees you compare a String with a Number, it automatically converts the String to a Number and then performs the comparison.



But sometimes the ambiguity of types causes problems. 1 + '1' evaluates to '11' instead of 2. This may cause hard-to-find bugs. Douglas Crockford categorizes "the overloading of + to mean both addition and concatenation with type coercion" as a design error of JavaScript in his famous article on The World's Most Misunderstood Programming Language.



Still, we often need to convert data between String and Number. To convert a variable i to a String, simply call i.toString(). To convert s to a Number, we use Number(s). This is nice and clear.



The empty string ('') concatenation trick, the plus sign (+) trick and the minus zero (- 0) trick



But for those who want to squeeze every byte, there are tricky alternatives.



  • To convert x to String: x + ''
  • To convert x to Number: +x
  • To convert x to Number: x - 0




For example,

1 + 2 + 3 //produces 6
//while
'1' + 2 + 3 //produces '123'
'1' - 0 + 2 + 3 //produces 6
'1' + '2' //produces '12'
//while
+'1' + +'2' //produces 3

Notice that +x and x - 0 are not equivalent to parseInt(x) or parseFloat(x): they convert the whole string and do not stop at the first non-numeric character.

parseInt('2007 is promising') //produces 2007
//while
+'2007 is promising' //produces NaN
Let's call them the empty string concatenation trick, the plus sign trick and the minus zero trick. All three tricks sacrifice clarity and make code harder to understand.
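
To see how the tricks differ from parseInt(), it helps to run them side by side on a few inputs. The sketch below is only an illustration; the helper name describe and the sample inputs are made up for this comparison.

```javascript
// Hypothetical helper: applies each conversion to the same input.
function describe(x) {
  return {
    viaNumber: Number(x),          // the explicit, readable form
    viaPlus: +x,                   // the plus sign trick
    viaMinusZero: x - 0,           // the minus zero trick
    viaParseInt: parseInt(x, 10),  // parses a leading integer only
    viaConcat: x + ''              // the empty string concatenation trick
  };
}

// A clean numeric string: every numeric conversion gives 2007.
var year = describe('2007');

// Trailing text: Number(), + and - 0 give NaN, parseInt() gives 2007.
var sloppy = describe('2007 is promising');

// The empty string: Number(), + and - 0 give 0, parseInt() gives NaN.
var empty = describe('');

// A number converted to a string: viaConcat is '42'.
var answer = describe(42);
```

The empty string case is the one most likely to surprise: the shortcuts silently turn a blank text field into 0.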

February 6, 2007

Closure, eval and Function

eval() evaluates a string of JavaScript code. The Function constructor can be used to create a function from a string. Some say that the Function constructor is another form of eval(). However, one significant difference between the two is that while eval() keeps the lexical scope, the Function constructor always creates top-level functions.

function f1() {
var bbb = 2;
eval('alert(bbb);');
}
f1(); //alerts 2

function f2() {
var bbb = 2;
new Function('alert(bbb)')();
}
f2(); //bbb undefined error

function f3() {
var bbb = 2;
eval('(function() {alert(bbb);})')(); //parentheses make it a function expression
}
f3(); //alerts 2

eval() inside a function body creates a closure while new Function() doesn't. This difference may never bother you. However, it happened to bother me once. It's about jQuery, a new type of JavaScript library. I'm using jQuery in my bookmarklet application. In order to make the code as unobtrusive as possible, I decided to put all my code, including the jQuery code, inside an anonymous function. It looks like this:

(function() {
//jQuery code
//my code
})();

In this way, even the jQuery object is just a local variable. The outside environment is completely unaffected. But $().parents(), $().children(), $().prev(), $().next() and $().siblings() always fail in my code. These functions are created by the Function constructor in $.grep() and $.map().

// If a string is passed in for the function, make a function
// for it (a handy shortcut)
if ( typeof fn == "string" )
fn = new Function("a","i","return " + fn);

So they are all top-level and the identifier "jQuery" inside is resolved as window.jQuery which is undefined and the code fails.


We can implement an alternative to the Function constructor and use it within the lexical scope:

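//The source is wrapped in parentheses so that eval() treats it as a function expression
var createFunc = '(' + (function () {
var args = [].slice.call(arguments);
var body = args.pop();
return eval('(function(' + args.join(',') + ') {' + body + '})');
}).toString() + ')';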

function f4() {
var bbb = 2;
eval(createFunc)('alert(bbb);')();
}
f4(); //alerts 2

You can use eval(createFunc) just like new Function(), but you get lexical scope binding as a bonus.

function f6() {
var add = function(a, b) {return a + b;};
return eval(createFunc)('x', 'y', 'return add(x, y);');
}
f6()(3, 5); //8

At last, I quote Douglas Crockford's words on eval() and the Function constructor


"eval is Evil


The eval function is the most misused feature of JavaScript. Avoid it.


eval has aliases. Do not use the Function constructor. Do not pass strings to setTimeout or setInterval. "

January 19, 2007

Several Controversial Points in Pro JavaScript Techniques

I'm previewing the ultimate JavaScript book for the modern web developer. It's a great book. I strongly recommend you read it and I'm sure that you'll thank me. To make it even better, I'd like to point out and discuss several controversial points.

A side effect of the anonymous function scope induction trick

At the end of Chapter 2 >> Privileged Methods

Listing 2-25. Example of Dynamically Generated Methods That Are Created When a New Object Is Instantiated

// Create a new user object that accepts an object of properties
function User( properties ) {
// Iterate through the properties of the object, and make sure
// that it's properly scoped (as discussed previously)
for ( var i in properties ) { (function(){
// Create a new getter for the property
this[ "get" + i ] = function() {
return properties[i];
};
// Create a new setter for the property
this[ "set" + i ] = function(val) {
properties[i] = val;
};
})(); }
}
// Create a new user object instance and pass in an object of
// properties to seed it with
var user = new User({
name: "Bob",
age: 44
});
// Just note that the name property does not exist, as it's private
// within the properties object
alert( user.name == null );
// However, we're able to access its value using the new getname()
// method, that was dynamically generated
alert( user.getname() == "Bob" );
// Finally, we can see that it's possible to set and get the age using
// the newly generated functions
user.setage( 22 );
alert( user.getage() == 22 );

The example code won't work as expected. My test with Firefox 2.0.0.1 shows that user.getname and user.getage are actually undefined, but window.getname and window.getage are there! The error is caused by the scope induction trick, (function(){})(). Inside the anonymous function, this points to the window object rather than to the new User instance. In the simplest case:

var o = {f: function() {(function(){alert(this === window);})();}}; o.f(); 
//alerts true (but false if you evaluate the whole line in Firebug 1.0b8)

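This is not specific to anonymous functions. Whenever a function is called as a plain function rather than as a method of an object, this refers to the global object, which is window in a browser.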
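
One way to repair the listing is to capture the new object in a local variable before entering the anonymous function, and to pass the loop variable in as an argument. The second change matters because, as written, every accessor shares the same i and would see only the last property name once the loop has finished. This is only a sketch of a possible fix, not the book's own correction; the names self and key are mine.

```javascript
function User(properties) {
  // Keep a reference to the new object; inside the inner function
  // `this` no longer refers to it.
  var self = this;
  for (var i in properties) {
    // Pass the current key in, so each pair of accessors gets its own copy.
    (function (key) {
      // Getter for the property
      self['get' + key] = function () {
        return properties[key];
      };
      // Setter for the property
      self['set' + key] = function (val) {
        properties[key] = val;
      };
    })(i);
  }
}

var user = new User({ name: 'Bob', age: 44 });
var nameIsPrivate = (user.name == null); // true, name lives only in properties
var originalName = user.getname();       // 'Bob'
user.setage(22);
var newAge = user.getage();              // 22
```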


null, 0, ‘’, false, and undefined are NOT all equal (==) to each other


In Chapter 3 >> != and == vs. !== and ===


"...In JavaScript, null, 0, ‘’, false, and undefined are all equal (==) to each other, since they all evaluate to false... "

Listing 3-12. Examples of How != and == Differ from !== and ===
// Both of these are true
null == false
0 == undefined
// You should use !== or === instead
null !== false
false === false


Actually 0, '' and false are all equal (==) to each other, and null is equal (==) to undefined, but both null == false and undefined == false evaluate to false. This is reasonable because both null and undefined indicate "no value" while false is a valid value.
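
The grouping described above can be checked directly in a console. The snippet below just collects the comparisons into two objects; the property names are made up for readability.

```javascript
// Loose equality (==) among the "falsy" values.
var looseResults = {
  zeroVsEmptyString: 0 == '',            // true
  zeroVsFalse: 0 == false,               // true
  emptyStringVsFalse: '' == false,       // true
  nullVsUndefined: null == undefined,    // true
  nullVsFalse: null == false,            // false
  undefinedVsFalse: undefined == false,  // false
  zeroVsUndefined: 0 == undefined,       // false
  nullVsZero: null == 0                  // false
};

// Strict equality (===) never converts types.
var strictResults = {
  zeroVsEmptyString: 0 === '',           // false
  nullVsUndefined: null === undefined,   // false
  falseVsFalse: false === false          // true
};
```

Note that 0 == undefined, which the book lists as true, comes out false.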


domReady Race Conditions


In Chapter 5 >> Figuring Out When the DOM Is Loaded
Listing 5-12. A Function for Watching the DOM Until It’s Ready

function domReady( f ) {
// If the DOM is already loaded, execute the function right away
if ( domReady.done ) return f();
// If we've already added a function
if ( domReady.timer ) {
// Add it to the list of functions to execute
domReady.ready.push( f );
} else {
// Attach an event for when the page finishes loading,
// just in case it finishes first. Uses addEvent.
addEvent( window, "load", isDOMReady );
// Initialize the array of functions to execute
domReady.ready = [ f ];
// Check to see if the DOM is ready as quickly as possible
domReady.timer = setInterval( isDOMReady, 13);
}
}

// Checks to see if the DOM is ready for navigation
function isDOMReady() {
// If we already figured out that the page is ready, ignore
if ( domReady.done ) return false;
// Check to see if a number of functions and elements are
// able to be accessed
if ( document && document.getElementsByTagName &&
document.getElementById && document.body ) {
// If they're ready, we can stop checking
clearInterval( domReady.timer );
domReady.timer = null;
// Execute all the functions that were waiting
for ( var i = 0; i < domReady.ready.length; i++ )
domReady.ready[i]();
// Remember that we're now done
domReady.ready = null;
domReady.done = true;
}
}

Notice that in the domReady function, isDOMReady is added as a handler of the window's "load" event. The purpose is to take advantage of the browser's caching capability to gain extra speed. However, the extra gain causes trouble. When I tried to use this domReady implementation in a Greasemonkey user script, the onDOMReady handler sometimes got triggered twice. It isn't always reproducible, but if you refresh the page 15 times, you can see the double triggering once or twice. The only possible cause is the addEvent line. So I commented out the line and tested again; as expected, everything went OK.


I looked carefully at the code to find a possible race condition in function isDOMReady. The function


  1. Checks domReady.done
  2. Calls clearInterval and the handlers if the DOM is ready
  3. Marks domReady.done true

When a page is cached by the browser, the window "load" event and an interval event occur at almost the same time, resulting in two invocations of isDOMReady that overlap. If one invocation is in step 2 but has not reached step 3 while the other reaches step 1, the latter reads domReady.done as false and proceeds to step 2, causing every handler to be triggered a second time.

There are two ways to work around it


  1. Remove the addEvent line and be happy without the extra speed gain
  2. Move the domReady.done = true; line as early as possible (may reduce but can't eliminate race conditions)
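
The second workaround can be sketched without a browser. The code below is a simplified stand-in for domReady/isDOMReady, not the book's code: createReadyQueue, add and check are hypothetical names, and the overlap is simulated by a handler that triggers a second check while the first is still running. It shows the ordering only and says nothing about what a real browser does.

```javascript
// Hypothetical stand-in for domReady/isDOMReady that needs no browser.
// isReady is a function reporting whether the "DOM" can be used yet.
function createReadyQueue(isReady) {
  var done = false;
  var queue = [];

  function check() {
    // Already fired (or firing): ignore further notifications.
    if (done || !isReady()) return false;
    // Mark as done BEFORE running any handler, so a second
    // notification arriving in the meantime is ignored.
    done = true;
    var handlers = queue;
    queue = [];
    for (var i = 0; i < handlers.length; i++) handlers[i]();
    return true;
  }

  function add(f) {
    // Once done, run new handlers right away.
    if (done) return f();
    queue.push(f);
  }

  return { add: add, check: check };
}

// Usage: the first handler triggers another check while the first
// check is still running (standing in for "load" firing mid-way).
var calls = 0;
var ready = createReadyQueue(function () { return true; });
ready.add(function () { calls++; ready.check(); });
ready.add(function () { calls++; });
ready.check(); // the "interval" notification
ready.check(); // the "load" notification
// calls is 2: each handler ran exactly once
```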

Update Tue, 06 Feb 2007 09:16:03 GMT window.onload reopened


The domReady() function above will prematurely execute the handler if document.write() is used. The document ready solution in jQuery is so far the most robust. But in IE, premature execution will occur if innerHTML is modified before the document finishes loading. So the window.onload problem is now reopened. Great effort has been made to solve the problems.


JavaScript closure and IE memory leak

2007-06-27 08:26:49 UTC Update: Microsoft has fixed IE memory leaks problem. KB929874

2006-10-24 +8 Update: According to IE 7 vs IE 6, IE 7 seems to have solved the memory leaks. Cheers!

2006-09-24 +8 Update: See the follow up

"Betty: Your umbrella leaks, Professor Boffin!"    ---- Look, Listen and Learn

IE leaks memory like a sieve and my web page is getting slower and slo...ower. But memory usage keeps climbing...

I've tried everything including banging my head on the desk. It just doesn't change anything.

I read the following articles and fell asleep.

The fact is that IE has separate garbage collecting mechanisms for COM interfaces and JavaScript objects and is unable to resolve circular references between DOM (or ActiveX or any kind of COM) objects and JavaScript objects. When objects from the two worlds have circular references between them, the GC cannot detect them and the memory cannot be reclaimed until IE exits. There are several patterns that cause circular references. Unfortunately, assigning nested functions as event handlers falls into this category. IE is rejecting the use of closures, one of the most powerful and flexible features of JavaScript.

It is frustrating to know that experts at Microsoft like Eric Lippert suggest "Don't use closures unless you really need closure semantics. In most cases, non-nested functions are the right way to go." It sounds as if programmers are abusing closures, completely ignoring the fact that IE has a big problem with closures. It would be a relief, and reasonable, to expect that IE reclaims leaked memory after a page has been unloaded. But that is not the case. And here I found some explanation: "...the application compatibility lab discovered that there were actually web pages that broke when those semantics were implemented.  (No, I don't know the details.) The IE team considers breaking existing web pages that used to work to be way, way worse than leaking a little memory here and there, so they've decided to take the hit and leak the memory in this case." I don't understand. Maintaining backward compatibility with rare web pages, which even a scripting engine writer does not know much about, at the cost of leaking memory possibly for all web pages? What kind of decision is that? They don't admit their own faults but instead suggest poor coding practices.

But we have to code for IE, however buggy it is. I've run the test from Mihai Bazon(see IE: WHERE'S MY MEMORY?).

function createEl(i) {     
var span = document.createElement("span");
span.className = "muci";
span.innerHTML = "&nbsp;foobar #"+i+"&nbsp;";
span.onclick = function() {
alert(this.innerHTML + "\n" + i);
};
document.body.appendChild(span);
}

function start() {
var T1 = (new Date()).getTime(); // DEBUG.PROFILE
for (var i = 0; i < 3000; ++i) createEl(i);
alert(((new Date()).getTime() - T1) / 1000); // DEBUG.PROFILE
}


The first request in IE took 1.797s, the tenth 8.063s. Memory usage kept growing. Firefox reports a typical value of 1.6xs with no memory leak.

Prototype.js avoids the IE memory leak by hooking the window's unload event, unobserving all events and clearing its event handler cache. I replaced the line span.onclick = function() { alert(this.innerHTML + "\n" + i); }; with Event.observe(span, 'click', function() { alert(this.innerHTML + "\n" + i); }); and reran the test. The good news is that the leaked memory in IE is reclaimed when the page unloads. The bad news is that each request takes approximately 17s in IE while Firefox only needs 2.1xs! The Prototype event system makes it possible to free memory when the page unloads, but it is extremely slow and uses more memory in a single request. The speed degradation is explainable: Event._observeAndCache saves extra references and thus uses more memory, and IE gets slower as it leaks memory. Event.observe also does more than a simple assignment and is therefore much slower. However, the memory leak is under control...

I admire Dean Edwards's addEvent, though it doesn't solve the memory leak problem with closures. (Dean insisted in the comments that his script does not leak memory. Maybe he was not talking about the closure case.)

The Leak Free JavaScript Closures solution can really prevent the memory leak. It breaks the circular reference by inserting a new closure between the nested function and its enclosing scope: it creates a new function which holds no references to the enclosing scope. The fancy part is that when another level of closure is added, the inner closure can still access variables in its initial enclosing scope indirectly via the scope chain, without causing circular references between DOM objects and JavaScript objects.



In the simplest case:

<script type="text/javascript">
//Holds references to functions
__funcs = [];
//The code fragment is for demonstrative purpose only and lacks of optimization.
//Do not use it in productive environment!
Function.prototype.closure = function() {
//Remember the slot of this particular function so that each wrapper calls its own function
var index = __funcs.push(this) - 1;
return function () {
return __funcs[index].apply(null, arguments);
};
};
function setup() {
var span = $('span1');
span.bigProperty = new Array(1000).join('-_-');
span.onclick = function() {
alert(span.bigProperty.length);
}.closure();
}
setup();
</script>


This will not leak memory in IE. The nested function in setup() forms a closure, so it's able to access span. span.onclick does not refer to the nested function but to a newly created function returned by the closure() method of the nested function. The newly created function invokes the nested function via the global array __funcs and has no references to the scope of setup(). So there is no circular reference. You may argue that since the newly created function is able to call the nested function, it must have some kind of reference to it, while the nested function has a reference to span via the closure, so there will be a circular reference. However, ECMA-262 treats this as a keyword rather than an identifier; it is resolved from the execution context in which it occurs, without reference to the scope chain (see Javascript Closures). A global array is used to hold references to closures without modifying the internal [[scope]] object. This is a hack indeed. IE always needs hacks to patch its holes.

Converting the arguments object to an Array

The arguments object is an Array-like object providing access to the arguments passed to a function. When hacking with functions, we frequently need to manipulate and pass around arguments as an Array. The Prototype library uses its $A() function to accomplish the conversion, which is intuitive and beautiful.

Function.prototype.bind = function() {
var __method = this, args = $A(arguments), object = args.shift();
return function() {
return __method.apply(object, args.concat($A(arguments)));
}
}

Yet there is another way.

//... inside a function definition
var args = [].slice.call(arguments);

// or var args = Array.prototype.slice.call(arguments);
//...

The slice(start, end) method of an Array object returns a copy of a specified portion of the array. Here we omit the start and end arguments and call slice() upon the arguments object. An array containing all arguments is returned. Then we can modify the array as we need and pass it to the apply() method of functions.
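
As an illustration of the "modify, then apply()" step, here is a small sketch of a helper that pre-fills some leading arguments. The names partial, volume and slab are hypothetical and only serve the example.

```javascript
// Hypothetical helper: returns a function that calls fn with some
// leading arguments already filled in.
function partial(fn) {
  // Copy everything after fn into a real array.
  var preset = [].slice.call(arguments, 1);
  return function () {
    // Real arrays have concat(); the arguments object does not.
    var rest = [].slice.call(arguments);
    return fn.apply(this, preset.concat(rest));
  };
}

function volume(w, h, d) { return w * h * d; }

var slab = partial(volume, 2, 3); // width and height fixed
var result = slab(4);             // 2 * 3 * 4 = 24
```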


Notice that slice() only does a shallow copy, and the return value is just an array without any of the magical behavior of the arguments object. There is no args.callee property. Moreover, if a parameter is listed at the nth position in the parameter list of the function definition, arguments[n] is a synonym for the local variable corresponding to that parameter. Any change made to arguments[n] affects the local variable because it actually modifies the named property of the call object. However, since args is just a shallow copy of arguments, assigning to args[n] does not affect the local variable. To demonstrate this,


function f (a) {
    alert('a: ' + a);
    var args = [].slice.call(arguments);
    arguments[0] = 'Assigning arguments[0] affects a';
    alert('a: ' + a);
    args[0] = 'Assigning args[0] does not change a';
    alert('a: ' + a);
}
f('JavaScript rocks!');


The three alerts will display "a: JavaScript rocks!", "a: Assigning arguments[0] affects a" and "a: Assigning arguments[0] affects a" in order.


Tip: [].slice.call() can be used for any Array-like object, not only for the arguments object.
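
For instance, a hand-made object with indexed properties and a length is enough. The object below is invented purely for demonstration.

```javascript
// A hand-made array-like object: indexed properties plus length.
var arrayLike = { 0: 'a', 1: 'b', 2: 'c', length: 3 };

var copy = [].slice.call(arrayLike);      // ['a', 'b', 'c']
var tail = [].slice.call(arrayLike, 1);   // ['b', 'c']

// The copy is a real array, so array methods are available.
var joined = copy.join('-');              // 'a-b-c'
var isRealArray = copy instanceof Array;  // true
```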