javascript – What Does 'Then' Really Mean in CasperJS

Exception or error:

I’m using CasperJS to automate a series of clicks, form submissions, data parsing, etc. on a website.

Casper seems to be organized into a list of preset steps in the form of then statements (see their example here: http://casperjs.org/quickstart.html) but it’s unclear what triggers the next statement to actually run.

For example, does then wait for all pending requests to complete? Does injectJS count as a pending request? What happens if I have a then statement nested – chained to the end of an open statement?

casper.thenOpen('http://example.com/list', function(){
    casper.page.injectJs('/libs/jquery.js');
    casper.evaluate(function(){
        var id = jQuery("span:contains('"+itemName+"')").closest("tr").find("input:first").val();
        casper.open("http://example.com/show/"+id); //what if 'then' was added here?
    });
});

casper.then(function(){
    //parse the 'show' page
});

I’m looking for a technical explanation of how the flow works in CasperJS. My specific problem is that my last then statement (above) runs before my casper.open statement, and I don’t know why.

How to solve:

then() basically adds a new navigation step to a stack. A step is a JavaScript function which can do two different things:

  1. wait for the previous step, if any, to be executed
  2. wait for a requested URL and the related page to load

Let’s take a simple navigation scenario:

var casper = require('casper').create();

casper.start();

casper.then(function step1() {
    this.echo('this is step one');
});

casper.then(function step2() {
    this.echo('this is step two');
});

casper.thenOpen('http://google.com/', function step3() {
    this.echo('this is step 3 (google.com is loaded)');
});

You can print out all the created steps within the stack like this:

require('utils').dump(casper.steps.map(function(step) {
    return step.toString();
}));

That gives:

$ casperjs test-steps.js
[
    "function step1() { this.echo('this is step one'); }",
    "function step2() { this.echo('this is step two'); }",
    "function _step() { this.open(location, settings); }",
    "function step3() { this.echo('this is step 3 (google.com is loaded)'); }"
]

Notice the _step() function which has been added automatically by CasperJS to load the url for us; when the url is loaded, the next step available in the stack — which is step3() — is called.

When you have defined your navigation steps, run() executes them one by one sequentially:

casper.run();

Footnote: the callback/listener stuff is an implementation of the Promise pattern.
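
Applied to the code in the question, one likely reason the last then() runs early is that casper.evaluate() executes its function inside the page, where the casper object does not exist, so casper.open() is never actually called from there. Below is a minimal sketch of a restructured version, assuming a CasperJS version whose evaluate() accepts extra positional arguments. registerSteps is a hypothetical wrapper, introduced here only so that the casper instance and itemName are passed in explicitly.

```javascript
// Hypothetical helper: registers the navigation steps on a started Casper instance.
function registerSteps(casper, itemName) {
    casper.thenOpen('http://example.com/list', function () {
        // injectJs() and evaluate() both return synchronously.
        this.page.injectJs('/libs/jquery.js');

        // evaluate() runs its function inside the page, where neither `casper`
        // nor outer variables exist: pass itemName in and return the id out.
        var id = this.evaluate(function (name) {
            return jQuery("span:contains('" + name + "')")
                .closest('tr').find('input:first').val();
        }, itemName);

        // Called from inside a running step, so this is queued as a substep
        // and runs before any step registered later at the top level.
        this.thenOpen('http://example.com/show/' + id, function () {
            // parse the 'show' page
        });
    });
}
```

With a real instance this would be called as registerSteps(casper, 'some item') after casper.start() and before casper.run().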

###

then() merely registers a series of steps.

run() and its family of runner functions, callbacks, and listeners are what actually execute each step.

Whenever a step completes, CasperJS checks three flags: pendingWait, loadInProgress, and navigationRequested. If any of them is true, it does nothing and checks again later (setInterval style). If none of them is true, the next step is executed.
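
The check can be pictured with a toy model in plain JavaScript. This is only a sketch of the idea, not CasperJS's actual code: MiniRunner and tick() are made-up names, and each call to tick() stands in for one polling interval.

```javascript
// Toy model of the runner's check (MiniRunner is hypothetical, not a CasperJS API).
function MiniRunner(steps) {
    this.steps = steps;
    this.step = 0;                    // index of the next step to execute
    this.pendingWait = false;
    this.loadInProgress = false;
    this.navigationRequested = false;
    this.log = [];
}

// One polling tick: returns true if a step was executed.
MiniRunner.prototype.tick = function () {
    if (this.pendingWait || this.loadInProgress || this.navigationRequested) {
        return false;                 // something is pending: stay idle until the next tick
    }
    if (this.step >= this.steps.length) {
        return false;                 // nothing left to run
    }
    this.steps[this.step++].call(this);
    return true;
};

var runner = new MiniRunner([
    // Pretend this step started a page load.
    function one() { this.log.push('one'); this.loadInProgress = true; },
    function two() { this.log.push('two'); }
]);

runner.tick();                        // runs "one", which raises loadInProgress
runner.tick();                        // idle: the flag is still raised
runner.loadInProgress = false;        // pretend the page finished loading
runner.tick();                        // runs "two"
console.log(runner.log.join(', '));   // prints: one, two
```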

As of CasperJS 1.0.0-RC4, a flaw exists: under certain timing conditions, the “try to do next step” method is triggered before CasperJS has had time to raise either the loadInProgress or the navigationRequested flag. The solution is to raise one of those flags before leaving any step in which they are expected to be raised (e.g. raise a flag either before or after calling casper.click()), maybe like so:

(Note: This is only illustrative, more like pseudocode than proper CasperJS form…)

var step_one = function () {
    casper.click(/* something */);
    do_whatever_you_want();
    casper.click(/* something else */); // Click something else, why not?
    more_magic_that_you_like();
    here_be_dragons();
    // Raise a flag before exiting this "step"
    profit();
};
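
To make the timing problem concrete, here is a small simulation in plain JavaScript. It is a sketch under loud assumptions: simulate(), the tick counter, and the two-tick and four-tick delays are all invented for illustration and do not reflect CasperJS internals. It only shows why raising the flag inside the step changes the order.

```javascript
// Toy simulation: the "browser" reports navigation late, the runner checks every tick.
function simulate(raiseFlagInStep) {
    var navigationRequested = false;
    var log = [];
    var events = {};                  // tick number -> callback delivered at that tick
    var tick = 0;

    var steps = [
        function clickStep() {
            log.push('click');
            // Assumed delays: the flag is raised two ticks later, the load ends after four.
            events[tick + 2] = function () { navigationRequested = true; };
            events[tick + 4] = function () { navigationRequested = false; log.push('loaded'); };
            if (raiseFlagInStep) {
                navigationRequested = true;   // workaround: raise the flag before leaving the step
            }
        },
        function parseStep() { log.push('parse'); }
    ];

    var next = 0;
    for (tick = 0; tick < 10; tick++) {
        if (events[tick]) { events[tick](); }             // deliver any event due at this tick
        if (!navigationRequested && next < steps.length) {
            steps[next++]();                              // no flag raised: run the next step
        }
    }
    return log;
}

console.log(simulate(false).join(', '));  // prints: click, parse, loaded
console.log(simulate(true).join(', '));   // prints: click, loaded, parse
```

In the first run the parse step fires before the page has loaded, which matches the symptom described in the question. In the second run the flag raised inside the step keeps the runner idle until the load completes.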

To wrap that solution up in a single line of code, I introduced blockStep() in this GitHub pull request, extending click() and clickLabel() to help guarantee the expected behaviour when using then(). Check out the request for more info, usage patterns, and minimal test files.

###

According to the CasperJS Documentation:

then()

Signature: then(Function then)

This method is the standard way to add a new navigation step to the stack, by providing a simple function:

casper.start('http://google.fr/');

casper.then(function() {
  this.echo('I\'m in your google.');
});

casper.then(function() {
  this.echo('Now, let me write something');
});

casper.then(function() {
  this.echo('Oh well.');
});

casper.run();

You can add as many steps as you need. Note that the current Casper instance automatically binds the this keyword for you within step functions.

To run all the steps you defined, call the run() method, and voila.

Note: You must start() the casper instance in order to use the then() method.

Warning: Step functions added to then() are processed in two different cases:

  1. when the previous step function has been executed,
  2. when the previous main HTTP request has been executed and the page loaded;

Note that there’s no single definition of page loaded; is it when the DOMReady event has been triggered? Is it “all requests being finished”? Is it “all application logic being performed”? Or “all elements being rendered”? The answer always depends on the context. Hence why you’re encouraged to always use the waitFor() family methods to keep explicit control on what you actually expect.

A common trick is to use waitForSelector():

casper.start('http://my.website.com/');

casper.waitForSelector('#plop', function() {
  this.echo('I\'m sure #plop is available in the DOM');
});

casper.run();

Behind the scenes, the source code for Casper.prototype.then is shown below:

/**
 * Schedules the next step in the navigation process.
 *
 * @param  function  step  A function to be called as a step
 * @return Casper
 */
Casper.prototype.then = function then(step) {
    "use strict";
    this.checkStarted();
    if (!utils.isFunction(step)) {
        throw new CasperError("You can only define a step as a function");
    }
    // check if casper is running
    if (this.checker === null) {
        // append step to the end of the queue
        step.level = 0;
        this.steps.push(step);
    } else {
        // insert substep a level deeper
        try {
            step.level = this.steps[this.step - 1].level + 1;
        } catch (e) {
            step.level = 0;
        }
        var insertIndex = this.step;
        while (this.steps[insertIndex] && step.level === this.steps[insertIndex].level) {
            insertIndex++;
        }
        this.steps.splice(insertIndex, 0, step);
    }
    this.emit('step.added', step);
    return this;
};

Explanation:

In other words, then() schedules the next step in the navigation process.

When then() is called, it is passed a function as a parameter which is to be called as a step.

It first calls checkStarted() to verify that the Casper instance has been started; if it has not, the following error is raised:

CasperError: Casper is not started, can't execute `then()`.

Next, it checks if the page object is null.

If the condition is true, Casper creates a new page object.

After that, then() checks that the step parameter is a function.

If the parameter is not a function, it throws the following error:

CasperError: You can only define a step as a function

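Then, the function checks whether Casper is running, that is, whether this.checker is set.

If Casper is not running (this.checker is null), then() gives the step level 0 and appends it to the end of the queue.

Otherwise, if Casper is running, it inserts the step as a substep one level deeper than the step that is currently running (this.steps[this.step - 1]). The substep is placed after any substeps of the same level already queued at that position, so nested steps run in the order they were added and before the next outer step.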

Finally, the then() function concludes by emitting a step.added event, and returns the Casper object.
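
The effect of the substep insertion is easiest to see by running it. The sketch below copies only the insertion logic from the source above into a toy MiniCasper object (a hypothetical name). Its running flag stands in for `this.checker !== null`, and its run() loop is a simplification that ignores page loads and waits.

```javascript
// Toy model reusing the insertion logic of Casper.prototype.then shown above.
function MiniCasper() {
    this.steps = [];
    this.step = 0;          // number of steps started so far
    this.running = false;   // stands in for `this.checker !== null`
    this.log = [];
}

MiniCasper.prototype.then = function (fn) {
    if (!this.running) {
        fn.level = 0;                       // top-level step: append to the queue
        this.steps.push(fn);
    } else {
        // substep: one level deeper than the step currently running
        fn.level = this.steps[this.step - 1].level + 1;
        var insertIndex = this.step;
        while (this.steps[insertIndex] && fn.level === this.steps[insertIndex].level) {
            insertIndex++;                  // skip substeps of the same level already queued
        }
        this.steps.splice(insertIndex, 0, fn);
    }
    return this;
};

// Simplified runner: executes the steps one by one, with no waiting.
MiniCasper.prototype.run = function () {
    this.running = true;
    while (this.step < this.steps.length) {
        this.steps[this.step++].call(this);
    }
    this.running = false;
};

var c = new MiniCasper();
c.then(function a() {
    this.log.push('A');
    this.then(function a1() { this.log.push('A.1'); });
    this.then(function a2() { this.log.push('A.2'); });
});
c.then(function b() { this.log.push('B'); });
c.run();
console.log(c.log.join(' -> '));            // prints: A -> A.1 -> A.2 -> B
```

So a then() chained inside a running step is executed right after that step, ahead of the steps that were registered at the top level before run() was called.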
