
Re: [hobbit] [Hobbit] URLPlus interest - looking for feedback



On Fri, Jul 31, 2009 at 11:34 AM, Gary Baluha <gumby3203 (at) gmail.com> wrote:

> On Fri, Jul 31, 2009 at 4:57 AM, Ralph Mitchell <ralphmitchell (at) gmail.com> wrote:
>
>> I could really have used something like your feature request about 6 years
>> ago.  Instead I spent a lot of time handcrafting bash scripts to login to
>> web pages.
>>
>
> Yep, that's kind of how URLPlus got started in the first place ;-)
>
>
>> Don't get me started on the sites that hit you with 5 different types of
>> redirects before reaching the front page, or the sites where each input
>> field is held in its own personal form, and the submit button executes
>> javascript to copy the values into a form full of hidden fields for the
>> actual submittal.
>>
>
> The redirect issue actually isn't too difficult to work around.  I have
> been working on a perl program that is capable of more in-depth session
> management than URLPlus is currently capable of, and the solution I'm using
> now seems to work pretty well.  My goal is to eventually convert URLPlus
> from using a command-line curl solution, to my current one.  This new method
> deals with multi-page redirects better.
>

It's not so much the multi-page redirects using the standard "302: page is
now elsewhere" format, as the other weird ways redirects are sometimes done.
The one that irritated me the most did all of these, in no particular
order:

   1) meta-refresh with zero time delay and a new url

   2) self-submitting form - i.e. a preloaded form with "form.submit();" at
the end of the html, between script tags

   3) self-submitting form - another preloaded form, but with
"onLoad=form.submit();" in the html BODY tag

   4) in script tags, change the page location via:   top.location="newurl"

   5) as above, but assign to "top.location.href", "window.location.href"
or something similar.

I'm not knocking your efforts - you've already done more than I ever did
towards a generic webpage check.  I just think that the above are going to
be tricky to handle in an automated way without replicating a large fraction
of a web browser.  But, now at least they're documented in the mailing list
for anyone interested in doing their own web checks...  :)
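
For anyone who wants a starting point, here is a rough Python sketch of how
the five patterns above might be spotted in a downloaded page without running
any javascript.  It is not what URLPlus or my old bash scripts do; the names
(RedirectSniffer, find_redirects) and the regular expressions are made up for
illustration, and it only catches the simple literal cases.  Anything that
builds the url at run time would still need a real javascript engine.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed patterns; real pages will have variants these do not match.
# Cases 4 and 5: top.location="newurl", window.location.href='newurl', ...
JS_LOCATION = re.compile(
    r"""\b(?:top|window|self|parent|document)\.location(?:\.href)?\s*=\s*["']([^"']+)["']""")
# Cases 2 and 3: any "....submit()" call in a script or in BODY onload
SUBMIT_CALL = re.compile(r"\.submit\s*\(\s*\)")
# Case 1: content="0; url=newurl"
META_URL = re.compile(r"""^\s*\d+\s*;\s*url\s*=\s*["']?([^"']+)""", re.I)


class RedirectSniffer(HTMLParser):
    """Collects (kind, absolute_url) pairs for non-HTTP redirects in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []           # meta-refresh and js-location hits
        self.form_action = None   # action of the first form on the page
        self.auto_submit = False  # True once a submit() call has been seen
        self._script = None       # buffer for text inside <script>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser lowercases attribute names
        if tag == "meta" and (a.get("http-equiv") or "").lower() == "refresh":
            m = META_URL.match(a.get("content") or "")
            if m:
                url = urljoin(self.base_url, m.group(1).strip())
                self.found.append(("meta-refresh", url))
        elif tag == "form" and self.form_action is None:
            self.form_action = a.get("action") or ""
        elif tag == "body" and SUBMIT_CALL.search(a.get("onload") or ""):
            self.auto_submit = True
        elif tag == "script":
            self._script = []

    def handle_data(self, data):
        if self._script is not None:
            self._script.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or self._script is None:
            return
        text = "".join(self._script)
        self._script = None
        for m in JS_LOCATION.finditer(text):
            url = urljoin(self.base_url, m.group(1))
            self.found.append(("js-location", url))
        if SUBMIT_CALL.search(text):
            self.auto_submit = True

    def results(self):
        out = list(self.found)
        # A submit() call only matters if there is a form to submit.
        if self.auto_submit and self.form_action is not None:
            url = urljoin(self.base_url, self.form_action)
            out.append(("auto-submit-form", url))
        return out


def find_redirects(html, base_url):
    sniffer = RedirectSniffer(base_url)
    sniffer.feed(html)
    sniffer.close()
    return sniffer.results()
```

A check script could call find_redirects() on each response body and, when it
returns anything, fetch (or POST the form to) the reported url and repeat,
with a cap on the number of hops so two pages bouncing between each other
don't loop forever.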


> As for the javascript part, that is a bit more difficult.
>

Especially when the page you just downloaded creates the form POST url
on-the-fly from some of the form elements filled in by the user.  Yep, saw
that happen too...  Another weird page ran a javascript function to generate a
random character string to include in the url - luckily the function wasn't
too hard to extract and shove through the spidermonkey javascript
interpreter...  :)

Ralph Mitchell