Friday, September 28, 2012

jQuery cross-domain AJAX requests and parsing a list of image src values with jQuery map

Recently I have been working with Wikipedia and Wikimedia, and had to make AJAX requests and parse the returned HTML.

JavaScript and jQuery run into the browser's same-origin policy when you send an AJAX request to a site on another domain. I read the article Making Cross Domain jQuery AJAX Calls by Micahs and followed the method it suggests, which routes the request through YQL; it seems to work fine. I have repeated the code from the article below for reference.

//feel free to add querystring vars to this
var myurl = "http://www.example.com/get-post.ashx?var1=var1value&callback=?";
//make the call to YQL
$.getJSON("http://query.yahooapis.com/v1/public/yql?" +
    "q=select%20*%20from%20html%20where%20url%3D%22" +
    encodeURIComponent(myurl) +
    "%22&format=xml&callback=?",
    function (data) {
        if (data.results && data.results[0]) {
            //this data.results[0] is the return object you work with,
            //if you actually want to do something with the returned json
            alert(data.results[0]);
        } else {
            var errormsg = 'Error: could not load the page.';
            //output to firebug's console
            //use alert() for other browsers/setups
            console.log(errormsg);
        }
    }
);
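
The hand-encoded query string above is easy to get wrong, so here is a minimal sketch that builds the same YQL URL from a readable query. The helper name buildYqlUrl is my own and is not part of the original article; it only assembles the string and does not send any request.

```javascript
// Hypothetical helper (not from the original article): builds the YQL
// request URL for a target page, letting encodeURIComponent do the escaping.
function buildYqlUrl(targetUrl) {
  // The readable YQL statement; the target URL is wrapped in double quotes.
  var yqlQuery = 'select * from html where url="' + targetUrl + '"';
  return "http://query.yahooapis.com/v1/public/yql" +
    "?q=" + encodeURIComponent(yqlQuery) +
    "&format=xml&callback=?"; // callback=? makes $.getJSON use JSONP
}
```

The result can be passed straight to $.getJSON in place of the concatenated string.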



Below is an example that collects the image sources from a list of images.



The markup for one list item looks like this:



    <div class="thumb" style="width: 150px;"><div style="margin:15px auto;"><a href="//commons.wikimedia.org/wiki/File:Adrien_Ysenbrandt_-_Virgin_and_Child_with_Two_Angels_in_a_Landscape_-_Walters_37266.jpg" class="image"><img alt="Adrien Ysenbrandt - Virgin and Child with Two Angels in a Landscape - Walters 37266.jpg" src="//upload.wikimedia.org/wikipedia/commons/thumb/4/49/Adrien_Ysenbrandt_-_Virgin_and_Child_with_Two_Angels_in_a_Landscape_-_Walters_37266.jpg/102px-Adrien_Ysenbrandt_-_Virgin_and_Child_with_Two_Angels_in_a_Landscape_-_Walters_37266.jpg" width="102" height="120" /></a></div></div>


After alert(data.results[0]); you can do something like the following to select the thumbnail img elements in the list:



 var result = data.results[0];
 var thumb_array_img = $('.thumb img', result);


To get all the src links in an array, use the jQuery map function. Note that map returns a jQuery object, so call get() to turn it into a plain array:



var image_thumb_array = $(thumb_array_img).map(function () {
    return $(this).attr("src");
}).get();
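
The src values in the Wikimedia markup above are protocol-relative (they start with //), which can be a problem if you use them outside a web page. The following is a small sketch, with a helper name of my own choosing (toAbsoluteUrls), that prefixes a scheme onto such values; the default of https: is an assumption.

```javascript
// Hypothetical helper: turns protocol-relative src values ("//host/path")
// into absolute URLs. Values that already carry a scheme are left untouched.
function toAbsoluteUrls(srcList, scheme) {
  scheme = scheme || "https:"; // assumed default, pass "http:" if you prefer
  return srcList.map(function (src) {
    return src.indexOf("//") === 0 ? scheme + src : src;
  });
}
```

It expects a plain array of src strings, for example the result of calling get() on the jQuery map result.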

Useful links for parsing Wikimedia and Wikipedia data and machine-readable code

The Wikipedia and Wikimedia API offers a lot of information but is not well documented. I have been using it to fetch and parse some data, and I found the following links very helpful.

  1. API:Main page
  2. MediaWiki API documentation page
  3. API sandbox
  4. Commons:Machine-readable data
  5. Category:Infobox templates
  6. What is the Full URL of the images returned by a wikipedia query...
  7. API:Query
  8. Call Wikipedia API using jQuery
  9. Manual:Parameters to index.php
  10. Template:Information
  11. API:Properties
  12. API:Lists

Wikipedia Wikimedia query returns 403 forbidden

I had the following WebClient request:
WebClient serviceRequest = new WebClient();
var JsonResponse = serviceRequest.DownloadString(new Uri(urselected));

Here urselected is the REST query string. The query worked fine in my browser, but from code it returned a 403 Forbidden error. I found the following post, which addresses the problem:

Wikipedia query returns error 403

I modified my code as follows to send a user-agent header, and it worked fine:

WebClient serviceRequest = new WebClient();
serviceRequest.Headers.Add("user-agent", "Silver Azure");
var JsonResponse = serviceRequest.DownloadString(new Uri(urselected));

Friday, September 7, 2012

Wikipedia and Wikimedia Commons HTML content

Wikipedia and Wikimedia Commons have excellent information, and I am building a few applications using the information from these sites. I wanted to get some portions of the HTML markup, for example the image source, title and thumbnail. I tried the MediaWiki API and was able to get the file names with a query like this:

http://commons.wikimedia.org/w/api.php?format=xml&action=query&list=categorymembers&cmtitle=Category:Coins_in_the_Walters_Art_Museum

This produced the following output

<?xml version="1.0"?>
<api>
  <query>
    <categorymembers>
      <cm title="File:Ethiopian - Coin Depicting an Anonymous King - Walters 59793 - Obverse.jpg" ns="6" pageid="18842729"/>
      <cm title="File:Ethiopian - Coin Depicting an Anonymous King - Walters 59793 - Reverse.jpg" ns="6" pageid="18842732"/>
      <cm title="File:Ethiopian - One of Two Coins Depicting Ousanas and an Anonymous King - Walters 59794.jpg" ns="6" pageid="18809697"/>
      <cm title="File:Greek - Apollo - Walters 59533.jpg" ns="6" pageid="18787772"/>
      <cm title="File:Greek - Athena - Walters 59519 - Obverse.jpg" ns="6" pageid="18801612"/>
      <cm title="File:Greek - Athena - Walters 59519 - Reverse.jpg" ns="6" pageid="18801616"/>
      <cm title="File:Greek - Athena - Walters 59702 - Back.jpg" ns="6" pageid="18787788"/>
      <cm title="File:Greek - Persephone - Walters 59693.jpg" ns="6" pageid="18787786"/>
      <cm title="File:Greek - Tetradrachme with King Nicodemus II - Walters 59723 - Back.jpg" ns="6" pageid="18787795"/>
      <cm title="File:Matthes Gebel - Medal of Arnold and Nicholas Wenck - Walters 59480 - Obverse.jpg" ns="6" pageid="18839416"/>
    </categorymembers>
  </query>
  <query-continue>
    <categorymembers cmcontinue="file|7e524f4d414e202d20434f494e2057495448204120484950504f504f54414d555320414e4420504f525452414954204f46204f544143494c494120534556455241202d2057414c54455253203539373531202d204241434b2e4a50470a524f4d414e202d20434f494e2057495448204120484950504f504f54414d555320414e4420504f525452414954204f46204f544143494c494120534556455241202d2057414c54455253203539373531202d204241434b2e4a5047|18787799"/>
  </query-continue>
</api>


You can use the file name to build a url like the following to get the image info



http://commons.wikimedia.org/wiki/File:Egyptian_-_Pectoral_-_Walters_42199.jpg
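
As a rough sketch of that step, the function below turns a title attribute from the XML output into a file page URL. The name filePageUrl is hypothetical, and replacing spaces with underscores is an assumption based on the example URL above rather than something the API output states.

```javascript
// Hypothetical helper: builds a Commons file page URL from a <cm title="...">
// value such as "File:Greek - Apollo - Walters 59533.jpg".
function filePageUrl(title) {
  // Assumption: spaces become underscores, as in the example URL above.
  var name = title.replace(/ /g, "_");
  // Escape everything else, but keep the "File:" colon readable.
  var encoded = encodeURIComponent(name).replace(/%3A/g, ":");
  return "http://commons.wikimedia.org/wiki/" + encoded;
}
```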



However, I wanted to get the title, thumbnail URL and so on. After some research I came across the discussion How to get HTML content text of a Wikipedia Page (via Wikipedia API)? and then the page Manual:Parameters to index.php.



This led to the following query which gives all the relevant html markup: http://commons.wikimedia.org/wiki/Category:Coins_in_the_Walters_Art_Museum?action=render

Sunday, September 2, 2012

Wikimedia Commons queries to get image data

I am developing an application to get images from Wikimedia Commons, which is a big repository of excellent images. The query API is not well documented. I researched and came up with the following queries to get the file locations of images.

Suppose I want to get the images in the category Benthos. I will use the following query:

http://commons.wikimedia.org/w/api.php?format=xml&action=query&list=categorymembers&cmlimit=10&cmtitle=Category:benthos

which produces the following XML output for the first 10 items.

<?xml version="1.0"?>
<api>
  <query>
    <categorymembers>
      <cm title="Category:Porifera" ns="14" pageid="13371"/>
      <cm title="Category:Riftia pachyptila" ns="14" pageid="8805563"/>
      <cm title="File:1991 benthos-achtern02 hg.jpg" ns="6" pageid="3321991"/>
      <cm title="File:Abra alba.jpg" ns="6" pageid="333239"/>
      <cm title="File:Ammonia tepida.jpg" ns="6" pageid="706636"/>
      <cm title="File:Aonides paucibranchiata.jpg" ns="6" pageid="396315"/>
      <cm title="File:Aphrodita aculeata (Sea mouse).jpg" ns="6" pageid="3610614"/>
      <cm title="File:Aphrodita aculeata.jpg" ns="6" pageid="1525478"/>
      <cm title="File:Arenicole système.jpg" ns="6" pageid="4258545"/>
      <cm title="File:Asterias rubens.jpg" ns="6" pageid="364648"/>
    </categorymembers>
  </query>
  <query-continue>
    <categorymembers cmcontinue="file|42454e5448494320464f52414d494e4946455241532e4a5047|2080936"/>
  </query-continue>
</api>


To get an image, append its title to http://commons.wikimedia.org/wiki/. For example,



http://commons.wikimedia.org/wiki/File:Aonides%20paucibranchiata.jpg will take you to the Aonides paucibranchiata image!



If you repeat the query with the cmcontinue value from the previous response,

http://commons.wikimedia.org/w/api.php?format=xml&action=query&list=categorymembers&cmlimit=10&cmtitle=Category:benthos&cmcontinue=file|42454e5448494320464f52414d494e4946455241532e4a5047|2080936



you get the following output, which contains the next 10 items, and so on.



<?xml version="1.0"?>
<api>
  <query>
    <categorymembers>
      <cm title="File:Benthic foraminiferas.jpg" ns="6" pageid="2080936"/>
      <cm title="File:Benthic GLERL 1.jpg" ns="6" pageid="2143998"/>
      <cm title="File:Bentos kasarek.jpg" ns="6" pageid="16992357"/>
      <cm title="File:Borstenwurmer des Meeres.png" ns="6" pageid="5229875"/>
      <cm title="File:Branchiostoma lanceolatum.jpg" ns="6" pageid="5712836"/>
      <cm title="File:Capitella capitata.jpg" ns="6" pageid="398359"/>
      <cm title="File:Cirratulus cirratus.jpg" ns="6" pageid="398775"/>
      <cm title="File:Crabmuss 600.jpg" ns="6" pageid="292191"/>
      <cm title="File:Crangon crangon (dorsal).jpg" ns="6" pageid="10320338"/>
      <cm title="File:Crangon crangon.jpg" ns="6" pageid="470098"/>
    </categorymembers>
  </query>
  <query-continue>
    <categorymembers cmcontinue="file|4352494252494e4f50534953204645524e414c44492e4a5047|3307271"/>
  </query-continue>
</api>
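
The paging described above can be sketched as a loop that keeps feeding the cmcontinue value back into the query. This is only an illustration: the function names are mine, it requests format=json, and it assumes the JSON response mirrors the XML shown here (query.categorymembers and query-continue.categorymembers.cmcontinue). The fetchPage argument is a placeholder for whatever function you use to request a URL and return the parsed response; it is synchronous here purely to keep the sketch short.

```javascript
// Hypothetical helper: builds the categorymembers query URL, adding
// cmcontinue only when a continuation value is available.
function buildCategoryMembersUrl(category, limit, cmcontinue) {
  var url = "http://commons.wikimedia.org/w/api.php" +
    "?format=json&action=query&list=categorymembers" +
    "&cmlimit=" + limit +
    "&cmtitle=" + encodeURIComponent("Category:" + category);
  if (cmcontinue) {
    url += "&cmcontinue=" + encodeURIComponent(cmcontinue);
  }
  return url;
}

// Hypothetical helper: collects titles page by page until the response has
// no query-continue element or maxPages requests have been made.
// fetchPage(url) is assumed to return the parsed JSON response.
function collectTitles(fetchPage, category, limit, maxPages) {
  var titles = [];
  var cmcontinue = null;
  for (var page = 0; page < maxPages; page++) {
    var response = fetchPage(buildCategoryMembersUrl(category, limit, cmcontinue));
    response.query.categorymembers.forEach(function (cm) {
      titles.push(cm.title);
    });
    var cont = response["query-continue"];
    if (!cont) {
      break; // no more pages
    }
    cmcontinue = cont.categorymembers.cmcontinue;
  }
  return titles;
}
```

The maxPages cap is there so that a large category does not trigger an unbounded number of requests.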


Other useful formats



format=json



format=jsonfm produces the JSON output pretty-printed as an HTML page, which is handy for reading it in a browser.



Reference

  1. API reference
  2. API:Main page
  3. API:Tutorial