So, I have a Python script here at work that needs to use urllib to grab some pages from a site that uses HTTP basic access authentication. After reading the urllib source, I had to work through some issues on my own. Since I didn't find many references on Google, I decided to document it here in case someone else needs it. There were two basic problems I had to figure out.
First, if you use urllib.urlopen() to request a page that is protected by HTTP basic auth from IDLE or some other interactive prompt, you are prompted to enter your user name and password by the prompt_user_passwd() method defined in the urllib.FancyURLopener class. If you want to automate your login to the web server, you have to override this method so that it returns the user name and password instead of asking for them.
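The FancyURLopener override is a Python 2 technique; in Python 3 the class only survives as the deprecated urllib.request.FancyURLopener. For comparison, here is a minimal Python 3 sketch of a different, commonly used technique for supplying fixed credentials: a password manager plus urllib.request.HTTPBasicAuthHandler. The helper name build_basic_auth_opener, the URL, and the credentials are hypothetical placeholders, not part of the original script.

```python
import urllib.request


def build_basic_auth_opener(base_url, username, password):
    """Hypothetical helper: build an opener that answers Basic auth
    challenges for URLs under base_url with fixed credentials."""
    # realm=None is the "default realm": the credentials are used for
    # whatever realm the server names, as long as the URL matches.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler), password_mgr


# Placeholder URL and credentials, for illustration only.
opener, mgr = build_basic_auth_opener(
    "http://example.com/protected/", "our_username", "our_password")

# No network access here: ask the password manager what it would send.
print(mgr.find_user_password("Any Realm",
                             "http://example.com/protected/page.html"))
# ('our_username', 'our_password')
```

To actually fetch a page you would call opener.open(url).read(), or pass the opener to urllib.request.install_opener() so that plain urllib.request.urlopen() uses it.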
Second, because the overridden method always returns the same credentials, a wrong user name or password makes the opener retry the login indefinitely, so you have to override the http_error_401() method as well to make it time out. Fortunately, we can reuse the maxtries attribute, which urllib.FancyURLopener sets on instantiation, as a limit on the number of authentication tries in the case of an incorrect password. This attribute is originally used by the http_error_302() method to prevent infinite looping due to redirect recursion. Because the same tries counter is shared with redirects, we could end up with slightly fewer authentication attempts if we go through a few redirects before getting the 401 error that requires us to authenticate. Since we only need one successful attempt, it shouldn't be a big deal.
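To see how a tries/maxtries counter bounds the retries, here is a network-free Python 3 simulation. FakeServer and CappedAuthClient are hypothetical classes made up for illustration; they only mimic the shape of FancyURLopener's recursion (each 401 triggers another request with the same credentials) and are not part of urllib.

```python
class FakeServer:
    """Hypothetical stand-in for a web server that demands Basic auth."""

    def __init__(self, username, password):
        self.expected = (username, password)
        self.requests = 0

    def get(self, credentials=None):
        self.requests += 1
        return 200 if credentials == self.expected else 401


class CappedAuthClient:
    """Illustrative client that mimics the tries/maxtries bookkeeping."""

    def __init__(self, server, credentials, maxtries=10):
        self.server = server
        self.credentials = credentials  # fixed, like prompt_user_passwd()
        self.tries = 0
        self.maxtries = maxtries  # 0 disables the cap

    def open(self, credentials=None):
        status = self.server.get(credentials)
        if status == 401:
            return self.http_error_401()
        return status

    def http_error_401(self):
        self.tries += 1
        if self.maxtries and self.tries >= self.maxtries:
            self.tries = 0
            return 500  # give up instead of recursing forever
        # Retry with the same credentials; another 401 recurses back here.
        result = self.open(self.credentials)
        self.tries = 0  # reset once the retry chain has finished
        return result


server = FakeServer("our_username", "our_password")
bad = CappedAuthClient(server, ("our_username", "wrong"), maxtries=10)
print(bad.open(), server.requests)  # 500 10
```

With correct credentials the first 401 is followed by one authenticated request that succeeds, and the counter is reset to zero for the next page.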
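So basically, we'll create our own class, inheriting from urllib.FancyURLopener, and override those two methods. The key parts are the fixed credentials returned by prompt_user_passwd() and the tries/maxtries check at the top of http_error_401(); the rest of http_error_401() follows the original FancyURLopener method. The code: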
```python
import re
import urllib


class basicAuth(urllib.FancyURLopener):
    def prompt_user_passwd(self, host, realm):
        # Return fixed credentials instead of prompting on the terminal.
        return "our_username", "our_password"

    def http_error_401(self, url, fp, errcode, errmsg, headers, data=None):
        """Error 401 -- authentication required.
        This function supports Basic authentication only."""
        # Count authentication attempts; the counter is shared with
        # http_error_302(), which uses it to cap redirects.
        self.tries += 1
        if self.maxtries and self.tries >= self.maxtries:
            self.tries = 0
            return self.http_error_default(
                url, fp, 500,
                "HTTP Basic Auth timed out after %d attempts." % self.maxtries,
                headers)
        # Only a Basic challenge with a realm is handled; the base
        # URLopener.http_error_default() raises IOError otherwise.
        if 'www-authenticate' not in headers:
            urllib.URLopener.http_error_default(
                self, url, fp, errcode, errmsg, headers)
        stuff = headers['www-authenticate']
        match = re.match('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', stuff)
        if not match:
            urllib.URLopener.http_error_default(
                self, url, fp, errcode, errmsg, headers)
        scheme, realm = match.groups()
        if scheme.lower() != 'basic':
            urllib.URLopener.http_error_default(
                self, url, fp, errcode, errmsg, headers)
        # Retry with the credentials from prompt_user_passwd(); a further
        # 401 recurses back into this method.
        name = 'retry_' + self.type + '_basic_auth'
        if data is None:
            result = getattr(self, name)(url, realm)
        else:
            result = getattr(self, name)(url, realm, data)
        # Reset the counter only after the whole retry chain has returned.
        self.tries = 0
        return result
```
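The header check in http_error_401() rests on a single regular expression. This standalone Python 3 sketch applies the same pattern to a few sample WWW-Authenticate values; parse_challenge is a hypothetical helper name, not a urllib function. It also exposes one limitation of the pattern: the realm parameter has to come immediately after the scheme name.

```python
import re

# Same pattern as in http_error_401(): optional leading whitespace,
# the auth scheme, whitespace, then realm="...".
CHALLENGE = re.compile('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"')


def parse_challenge(header_value):
    """Return (scheme, realm) with the scheme lowercased, or None."""
    match = CHALLENGE.match(header_value)
    if not match:
        return None
    scheme, realm = match.groups()
    return scheme.lower(), realm


print(parse_challenge('Basic realm="Intranet"'))  # ('basic', 'Intranet')
# A Digest challenge parses, but http_error_401() then rejects it.
print(parse_challenge('Digest realm="x", nonce="abc"'))  # ('digest', 'x')
# realm is not the first parameter, so the pattern does not match.
print(parse_challenge('Basic charset="UTF-8", realm="x"'))  # None
```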