Half an IP address and the fix for the Yahoo Search API 999 error
The Yahoo Search API has been driving me nuts. I started getting 999 errors on every call from a new app I was developing (the response said “999 Rate Limit Exceeded”). What made it really maddening was that I was only making a handful of calls per day - I knew I wasn’t exceeding the limit of 5,000 queries per day that Yahoo advertises for the Search API.
FYI - skip to the end for the fix…
Since the rate limit is applied per IP address, I convinced myself that my problem was due to somebody else’s app running on the same server. Not ideal, but that’s a downside of shared hosting. So I immediately signed up for a unique IP, then spent the next day figuring out why it made no difference.
Turns out a unique IP on shared hosting isn’t actually a complete IP - it’s only half an IP. Unique/dedicated IPs on shared hosting are generally provided for SSL, which only requires that incoming requests for your site be directed to a unique IP address. Outgoing requests (e.g. from a PHP script) do *not* originate from the dedicated IP but from the IP address of the shared Apache instance. Ugh. Understandable in hindsight, but it totally didn’t meet my expectations. Upgrading to a private server or virtual private server would get me a “whole” dedicated IP… maybe when this app starts making cash I’ll spring for it.
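A quick way to see the “half IP” effect is to ask an external service which address your outgoing requests arrive from and compare it with the dedicated IP. The sketch below is only illustrative: the echo service URL (api.ipify.org) and both function names are assumptions, not part of the original setup, and any service that returns the caller’s IP as plain text would do.

```php
<?php
// Sketch only. IP_ECHO_URL is an assumption: any service that returns the
// caller's public IP address as plain text could be substituted.
const IP_ECHO_URL = 'https://api.ipify.org';

/**
 * Ask an external echo service which IP our outgoing request came from.
 * Returns null if the request fails or the reply is not a valid IP address.
 */
function outbound_ip(string $echoUrl = IP_ECHO_URL): ?string
{
    $body = @file_get_contents($echoUrl);
    if ($body === false) {
        return null;
    }
    $ip = trim($body);
    return filter_var($ip, FILTER_VALIDATE_IP) !== false ? $ip : null;
}

/**
 * True when outgoing requests leave from an address other than the
 * dedicated (incoming) one. An unknown outbound IP is reported as false.
 */
function is_half_ip(string $dedicatedIp, ?string $outboundIp): bool
{
    return $outboundIp !== null && $outboundIp !== $dedicatedIp;
}
```

Run from a script on the shared host, something like is_half_ip($_SERVER['SERVER_ADDR'], outbound_ip()) would be expected to come back true in the situation described above, although what SERVER_ADDR holds depends on how the host is configured.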
Anyway, after figuring out the above, I spent an hour setting up routes through my various firewalls and installing a proxy (CCProxy worked perfectly). That let me pass the request through my laptop, so it originated from a different IP address which I knew for sure wouldn’t be rate limited.
Guess what. Still getting 999 errors. Yet pasting the URL into my browser returned the entire set of results immediately. Same IP, same request, no? Ugh! Something crucial had to be different between the request coming from my browser and the one coming from my script. After experimenting with every cURL option in the known universe I finally figured out the undocumented API requirement.
The Yahoo Search API now requires the HTTP User Agent to be set. E.g.:
curl_setopt($session,CURLOPT_USERAGENT,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
This appears to be a new undocumented requirement, but it does match the User Agent requirement listed for the Shopping API. I tried various User Agent strings and just about anything seems to be accepted, but Yahoo suggests faking a commonly used User Agent string for the Shopping API, so it’s probably best to stick with a browser UA.
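For context, here is a minimal sketch of a complete request around that one curl_setopt line. The helper names build_search_url and fetch_with_user_agent are hypothetical, and the endpoint and appid in the usage note are simply the ones that appear in the code quoted in the comments below, so treat it as an illustration rather than the exact script described above.

```php
<?php
// Sketch only: helper names are hypothetical, not part of any Yahoo library.

/** Build the request URL, encoding each parameter value (not the whole URL). */
function build_search_url(string $endpoint, array $params): string
{
    return $endpoint . '?' . http_build_query($params);
}

/**
 * Fetch a URL with an explicit User Agent string.
 * Returns [HTTP status code, response body]; the body is false if the
 * transfer itself failed.
 */
function fetch_with_user_agent(string $url, string $userAgent): array
{
    $session = curl_init($url);
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
    curl_setopt($session, CURLOPT_USERAGENT, $userAgent); // the setting this post is about
    curl_setopt($session, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($session);
    $status = (int) curl_getinfo($session, CURLINFO_HTTP_CODE);
    return [$status, $body];
}
```

A call might look like fetch_with_user_agent(build_search_url('http://api.search.yahoo.com/WebSearchService/V1/webSearch', array('appid' => 'mryahoodemo', 'query' => 'example.com')), 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'), followed by a check on whether the returned status is 999 before trying to parse the body.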
Anyway, worked like a treat. The stupid little things always suck up the most time.
January 8th, 2009 at 8:58 am
I have the same problem fetching backlinks with the Yahoo API. It does not return any results through PHP code, but when I try the Yahoo API URL in a browser it works and returns the default 50 URLs.
Please have a look at my code and suggest corrections. Thank you.
Link Text Explorer
#sortby {
display:none;
}
The title, link text and number of external links is retrieved from each page, together with the page PageRank and domain PageRank (if required).
<form action="" method="get">
URL<input type="text" name="query" size="60" value="" />
Results
<option value="" >
PageRank
<input type="checkbox" name="pagepr" id="page" />Page
<input type="checkbox" name="domainpr" id="domain" />Domain
<?php
function get_html ($url) {
    $html = "";
    $timeout = 10;
    ini_set('user_agent','Mozilla: (compatible; Windows XP)');
    $old = ini_set('default_socket_timeout', $timeout);
    $fh = fopen($url, 'r');
    if ($fh) {
        ini_set('default_socket_timeout', $old);
        stream_set_timeout($fh, $timeout);
        stream_set_blocking($fh, 0);
        while (! feof($fh)) {
            $html .= fread($fh, 4096);
        }
        fclose($fh);
        return $html;
    } else {
        return 0;
    }
}
function get_pageinfo($url, $query) {
    $html = get_html ($url);
    if ($html) {
        $pageinfo['success'] = 1;
        $html = preg_replace('/\n/', ' ', $html);
        $pattern = '##im';
        preg_match_all($pattern, $html, $matches);
        $pageinfo['ExternalLinks'] = 0;
        $linktext = "";
        if ($matches) {
            foreach ($matches[0] as $match) {
                if ( preg_match("#$query#i", $match) ) {
                    $text = "";
                    if ( preg_match('#<img#im', $match) ) {
                        $text .= "[IMG]";
                        if ( preg_match('#alt\s*=\s*"(.*?)"#im', $match, $alt) ) {
                            $text .= " $alt[1]";
                        }
                    } else {
                        $text = strip_tags($match);
                    }
                    $pageinfo['LinkText'] = $text;
                }
                if ( preg_match("#http://#i", $match) ) {
                    $pageinfo['ExternalLinks']++;
                }
            }
        }
    } else {
        $pageinfo['success'] = 0;
    }
    return $pageinfo;
}
if ( isset($_GET['query']) ) {
require_once ('pagerank2.php');
?>
Results
Notes
[IMG] denotes link is an image. The text following is the ALT text.
[X] denotes the page could not be loaded.
<?php
flush();
echo "$query has a PageRank of ".trim(getrank($query)).".\n";
flush();
#mryahoodemo
$params = array( "appid" => "mryahoodemo",
    "query" => $query,
    "results" => $num,
    "start" => $start,
    "omit_inlinks" => "domain"
);
$request = "";
foreach ($params as $param => $value) {
    $request .= "$param=$value&";
}
$yahoo_api ="http://api.search.yahoo.com/WebSearchService/V1/webSearch?";
#$yahoo_api="http://search.yahooapis.com/SiteExplorerService/V1/inlinkData?";
#$yahoo_api = "http://api.search.yahoo.com/SiteExplorerService/V1/inlinkData?";
$ResultSet = simplexml_load_file ( urlencode($yahoo_api.$request) );
if ($ResultSet) {
    $totalResultsAvailable = $ResultSet['totalResultsAvailable'];
    $totalResultsReturned = $ResultSet['totalResultsReturned'];
    $firstResultPosition = $ResultSet['firstResultPosition'];
    $lastResultPosition = $firstResultPosition + $params['results'] - 1;
    if ($totalResultsReturned > $totalResultsAvailable) {
        $lastResultPosition = $totalResultsAvailable;
    }
    echo "Results $firstResultPosition - $lastResultPosition of about ".number_format($totalResultsAvailable)." for $query.\n";
?>
Sort by:
#
External Links
Page PageRank
Domain PageRank
#
URL
Visit
Title
Link Text
External Links
Page PageRank
Domain PageRank
Result as $Result) {
    // InLinks
    $Title = $Result->Title;
    $Url = $Result->Url;
    $ClickUrl = $Result->ClickUrl;
    // PageInfo
    $PageInfo = get_pageinfo($Url, $query);
    if ($PageInfo['success']) {
        $success = "";
    } else {
        $success = "[X]";
    }
    // Link Text
    $LinkText = "";
    if ($PageInfo['success']) {
        if (isset($PageInfo['LinkText'])) {
            $LinkText = $PageInfo['LinkText'];
        } else {
            $LinkText = "[NULL]";
        }
    }
    if ($LinkText != "[NULL]") {
        if (++$count%2) { $class = "even"; } else { $class = "odd"; }
        echo "";
        echo "".$count."\n";
        echo "".preg_replace('#^http://#', '', wordwrap($Url, 25, " ", 1))."";
        echo "".$success."";
        echo "[visit]";
        echo "".$Title."";
        // External Links
        $ExternalLinks = $PageInfo['ExternalLinks'];
        echo "".$LinkText."";
        echo "".$ExternalLinks."";
        // Page PageRank
        if ($pagepr) {
            $PageRank = getrank($Url);
            if (!isset($PageRank)) {$PageRank = 0;}
            echo "".$PageRank."";
        }
        // Domain PageRank
        if ($domainpr) {
            $domain_array = explode("/", preg_replace('#^http[s]*://#', '', $Url));
            $domain = $domain_array[0];
            $PageRank2 = getrank($domain);
            if (!isset($PageRank2)) {$PageRank2 = 0;}
            echo "".$PageRank2."";
        }
        flush();
    }
}
?>
var results = new SortableTable(document.getElementById("results_table"),
    ["Number", "CaseInsensitiveString", "CaseInsensitiveString", "CaseInsensitiveString","CaseInsensitiveString", "CaseInsensitiveString", "Number", "Number", "Number"]);
document.getElementById("sortby").style.display = "block";
document.getElementById("progress").style.display = "none";
function addClassName(el, sClassName) {
    var s = el.className;
    var p = s.split(" ");
    var l = p.length;
    for (var i = 0; i < l; i++) {
        if (p[i] == sClassName)
            return;
    }
    p[p.length] = sClassName;
    el.className = p.join(" ").replace( /(^\s+)|(\s+$)/g, "" );
}
function removeClassName(el, sClassName) {
    var s = el.className;
    var p = s.split(" ");
    var np = [];
    var l = p.length;
    var j = 0;
    for (var i = 0; i < l; i++) {
        if (p[i] != sClassName)
            np[j++] = p[i];
    }
    el.className = np.join(" ").replace( /(^\s+)|(\s+$)/g, "" );
}
results.onsort = function () {
    var rows = this.tBody.rows;
    var l = rows.length;
    for (var i = 0; i < l; i++) {
        removeClassName(rows[i], i % 2 ? "even" : "odd");
        addClassName(rows[i], i % 2 ? "odd" : "even");
    }
};
© Intelligent Positioning
January 10th, 2009 at 7:07 am
Thanks, this saved me a ton of time troubleshooting this bug!
January 18th, 2009 at 3:55 pm
Hemachander, you have a lot of functionality in this one file. I suggest you break it down and output the raw results you’re getting back from the call - including the HTTP code to see what’s going on.
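As a rough illustration of that kind of stripped-down debugging, the sketch below prints a one-line summary of a raw response and shows what urlencode() does to a complete URL. Two things in the posted code look worth checking along the way: the whole request URL is passed through urlencode() before simplexml_load_file() sees it, and the user_agent setting is only applied inside get_html(), which runs after the API call. The function name describe_response and the sample query are made up for the example.

```php
<?php
// Sketch only: describe_response is a hypothetical helper name.

/**
 * Summarise a raw HTTP response in one line for debugging output.
 * $body is the response text, or false if the transfer itself failed.
 */
function describe_response(int $status, $body): string
{
    if ($body === false) {
        return 'transport error (no HTTP response)';
    }
    // Show only the first 200 bytes so large result sets stay readable.
    return "HTTP $status, " . strlen($body) . ' bytes: ' . substr($body, 0, 200);
}

$endpoint = 'http://api.search.yahoo.com/WebSearchService/V1/webSearch?query=';

// urlencode() on a complete URL also encodes ':', '/', '?' and '=', so the
// result no longer starts with "http://" and cannot be opened as a URL.
$whole = urlencode($endpoint . 'a b');

// Encoding only the parameter value keeps the URL structure intact.
$safe = $endpoint . urlencode('a b');
```

Echoing $whole and $safe side by side, and then describe_response() for whatever comes back from the call, should make it obvious whether the request ever reached Yahoo and what status it returned.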
January 18th, 2009 at 3:55 pm
Ryan - no problem - glad it helped!
February 24th, 2009 at 4:47 am
Hi Jonathan,
Well done for finding the solution. I had a feeling it could be the User Agent but I had it set at Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0) which, after a few calls, gave the dreaded 999 response. Your user agent fixed it. Cheers!
February 24th, 2009 at 9:42 am
Great bug tracking (Y)
Thanks a lot for the post.
March 21st, 2009 at 9:26 am
James and mmfoscar - thanks for leaving a comment - glad it helped!!