Friday, September 28, 2012

jQuery cross-domain AJAX requests and parsing a list of image src values with jQuery map

Recently I have been working with Wikipedia and Wikimedia, and had to make AJAX requests and parse the returned HTML.

JavaScript and jQuery run into problems when you try an AJAX request to a site on another domain, because browsers enforce the same-origin policy. I read the article Making Cross Domain jQuery AJAX Calls by Micahs and followed the method it suggests, which uses YQL; it seems to work fine. I have repeated the code from the article below for reference.

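    //feel free to add querystring vars to this
    var myurl = "";
    //make the call to YQL
    $.getJSON("http://query.yahooapis.com/v1/public/yql?" +
        "q=" + encodeURIComponent('select * from html where url="' + myurl + '"') +
        "&format=xml&callback=?",
      function (data) {
        if (data.results[0]) {
          //this data.results[0] is the return object you work with,
          //if you actually want to do something with the returned json
          alert(data.results[0]);
        } else {
          var errormsg = '<p>Error: could not load the page.</p>';
          //output to firebug's console
          //use alert() for other browsers/setups
          console.log(errormsg);
        }
      }
    );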

Below is an example of extracting all the image sources from a list of images.

The list item snippet is given below:

    <div class="thumb" style="width: 150px;"><div style="margin:15px auto;"><a href="//" class="image"><img alt="Adrien Ysenbrandt - Virgin and Child with Two Angels in a Landscape - Walters 37266.jpg" src="//" width="102" height="120" /></a></div></div>

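After alert(data.results[0]); you can do something like the following to get the thumbnail img elements in the list. Here result is assumed to hold the returned markup, data.results[0]:

    var result = data.results[0];
    var thumb_array_img = $('.thumb img', result);

To get all the src links in an array, use the jQuery map function as in the following code. Note that .map() returns a jQuery object, so .get() is called to obtain a plain array:

    var image_thumb_array = $(thumb_array_img).map(function () {
        return $(this).attr("src");
    }).get();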
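Outside the browser, or where jQuery is not loaded, the same idea can be approximated in plain JavaScript. The sketch below is only an illustration: the function name extractImageSrcs and the sample markup are made up for this example, and a regular expression is a rough stand-in for a real HTML parser.

```javascript
// Hypothetical helper: collect the src of every <img> in an HTML string.
// A regular expression is a rough shortcut, not a full HTML parser.
function extractImageSrcs(html) {
  var pattern = /<img\b[^>]*?\ssrc\s*=\s*"([^"]*)"/gi;
  // matchAll yields one match per <img>; map keeps only the captured src
  return Array.from(html.matchAll(pattern)).map(function (match) {
    return match[1];
  });
}

// Made-up sample markup shaped like the thumbnail snippet above
var sampleHtml =
  '<div class="thumb"><a class="image"><img alt="One" src="//example.org/a.jpg" width="102" /></a></div>' +
  '<div class="thumb"><a class="image"><img alt="Two" src="//example.org/b.jpg" width="102" /></a></div>';

var srcList = extractImageSrcs(sampleHtml);
console.log(srcList);
```

For markup fetched from a real page, a proper DOM parser is the safer choice; the regular expression only handles double-quoted src attributes.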

Wikimedia and Wikipedia: useful links for parsing data and machine-readable code

The Wikipedia/Wikimedia API exposes a lot of information but is not well documented. I have been working with it to get some data and parse it, and I found the following links very helpful.

  1. API:Main page
  2. MediaWiki API documentation page
  3. API sandbox
  4. Commons:Machine-readable data
  5. Category:Infobox templates
  6. What is the Full URL of the images returned by a wikipedia query...
  7. API:Query
  8. Call Wikipedia API using jQuery
  9. Manual:Parameters to index.php
  10. Template:Information
  11. API:Properties
  12. API:Lists

Wikipedia Wikimedia query returns 403 Forbidden

I had the following WebClient request:

    WebClient serviceRequest = new WebClient();
    var JsonResponse = serviceRequest.DownloadString(new Uri(urselected));

where urselected was the REST query string. The query worked fine in my browser but returned a 403 Forbidden error when run from code. I found the following post, which addresses the problem:

Wikipedia query returns error 403

I modified my code as follows to set a User-Agent header, and it worked fine:

    WebClient serviceRequest = new WebClient();
    serviceRequest.Headers.Add("user-agent", "Silver Azure");

    var JsonResponse = serviceRequest.DownloadString(new Uri(urselected));
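The same fix can be sketched in JavaScript, for example under Node.js 18 or later, where fetch is built in. The helper name, application name and contact address below are placeholders invented for this illustration; the only point carried over from the C# code is that the request identifies itself with a User-Agent header.

```javascript
// Hypothetical helper: build fetch options that identify the client
// with a descriptive User-Agent string.
function buildWikiRequestOptions(appName, contact) {
  return {
    headers: {
      "User-Agent": appName + " (" + contact + ")"
    }
  };
}

// Placeholder values, not a real application or address
var wikiOptions = buildWikiRequestOptions("ExampleApp/0.1", "someone@example.com");
console.log(wikiOptions.headers["User-Agent"]);

// Usage (needs network access, so it is left commented out):
// fetch(queryUrl, wikiOptions).then(function (res) { return res.json(); });
```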

Friday, September 7, 2012

Wikipedia and Wikimedia commons html content

Wikipedia and Wikimedia Commons have excellent information. I am building a few applications using the information from these sites. I wanted to get some portions of the HTML markup, for example the image source, title, thumbnail, etc. I tried the MediaWiki API and was able to get the file names with a query like this.

This produced the following output:

    <?xml version="1.0"?>
    <api>
      <query>
        <categorymembers>
          <cm title="File:Ethiopian - Coin Depicting an Anonymous King - Walters 59793 - Obverse.jpg" ns="6" pageid="18842729"/>
          <cm title="File:Ethiopian - Coin Depicting an Anonymous King - Walters 59793 - Reverse.jpg" ns="6" pageid="18842732"/>
          <cm title="File:Ethiopian - One of Two Coins Depicting Ousanas and an Anonymous King - Walters 59794.jpg" ns="6" pageid="18809697"/>
          <cm title="File:Greek - Apollo - Walters 59533.jpg" ns="6" pageid="18787772"/>
          <cm title="File:Greek - Athena - Walters 59519 - Obverse.jpg" ns="6" pageid="18801612"/>
          <cm title="File:Greek - Athena - Walters 59519 - Reverse.jpg" ns="6" pageid="18801616"/>
          <cm title="File:Greek - Athena - Walters 59702 - Back.jpg" ns="6" pageid="18787788"/>
          <cm title="File:Greek - Persephone - Walters 59693.jpg" ns="6" pageid="18787786"/>
          <cm title="File:Greek - Tetradrachme with King Nicodemus II - Walters 59723 - Back.jpg" ns="6" pageid="18787795"/>
          <cm title="File:Matthes Gebel - Medal of Arnold and Nicholas Wenck - Walters 59480 - Obverse.jpg" ns="6" pageid="18839416"/>
        </categorymembers>
      </query>
      <query-continue>
        <categorymembers cmcontinue="file|7e524f4d414e202d20434f494e2057495448204120484950504f504f54414d555320414e4420504f525452414954204f46204f544143494c494120534556455241202d2057414c54455253203539373531202d204241434b2e4a50470a524f4d414e202d20434f494e2057495448204120484950504f504f54414d555320414e4420504f525452414954204f46204f544143494c494120534556455241202d2057414c54455253203539373531202d204241434b2e4a5047|18787799"/>
      </query-continue>
    </api>

You can use the file name to build a URL like the following to get the image info.
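As a rough sketch of how such a URL can be put together, the code below assumes the Commons endpoint https://commons.wikimedia.org/w/api.php and the prop=imageinfo module with iiprop=url and iiurlwidth. The helper name buildImageInfoUrl and the 150-pixel thumbnail width are made up for illustration; the file title is one of those listed in the output above.

```javascript
// Sketch: build an imageinfo query for one file title.
// Endpoint and thumbnail width are assumptions for this example.
function buildImageInfoUrl(fileTitle, thumbWidth) {
  var params = new URLSearchParams({
    action: "query",
    titles: fileTitle,
    prop: "imageinfo",
    iiprop: "url",                   // ask for the file URL
    iiurlwidth: String(thumbWidth),  // also ask for a scaled thumbnail URL
    format: "xml"
  });
  return "https://commons.wikimedia.org/w/api.php?" + params.toString();
}

var infoUrl = buildImageInfoUrl("File:Greek - Apollo - Walters 59533.jpg", 150);
console.log(infoUrl);
```

URLSearchParams takes care of encoding the spaces and the colon in the title, which is easy to get wrong when concatenating strings by hand.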

However, I also wanted the title, thumbnail URL, etc. After some research I came across the discussion How to get HTML content text of a Wikipedia Page (via Wikipedia API)? and then the page Manual:Parameters to index.php.

This led to the following query, which gives all the relevant HTML markup:
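One parameter described on Manual:Parameters to index.php is action=render, which returns only the rendered HTML of a page without the surrounding site skin. A request of that kind could be built as in the sketch below; the helper name and the choice of page title are illustrative, and whether this matches the query used here is an assumption.

```javascript
// Sketch: build an index.php URL that asks for the rendered HTML of a page.
// Assumes the Commons index.php entry point and action=render.
function buildRenderUrl(pageTitle) {
  var params = new URLSearchParams({
    title: pageTitle,
    action: "render"
  });
  return "https://commons.wikimedia.org/w/index.php?" + params.toString();
}

var renderUrl = buildRenderUrl("File:Greek - Apollo - Walters 59533.jpg");
console.log(renderUrl);
```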

Sunday, September 2, 2012

Wikimedia Commons queries to get image data

I am developing an application to get images from Wikimedia Commons, which is a big repository of excellent images. The query API is not well documented. I researched it and came up with the following queries to get the file locations of images.

Suppose I want to get the images in the category Flags of United States. I would do the following:

which produces the following XML output for the first 10 items.

    <?xml version="1.0"?>
    <api>
      <query>
        <categorymembers>
          <cm title="Category:Porifera" ns="14" pageid="13371"/>
          <cm title="Category:Riftia pachyptila" ns="14" pageid="8805563"/>
          <cm title="File:1991 benthos-achtern02 hg.jpg" ns="6" pageid="3321991"/>
          <cm title="File:Abra alba.jpg" ns="6" pageid="333239"/>
          <cm title="File:Ammonia tepida.jpg" ns="6" pageid="706636"/>
          <cm title="File:Aonides paucibranchiata.jpg" ns="6" pageid="396315"/>
          <cm title="File:Aphrodita aculeata (Sea mouse).jpg" ns="6" pageid="3610614"/>
          <cm title="File:Aphrodita aculeata.jpg" ns="6" pageid="1525478"/>
          <cm title="File:Arenicole système.jpg" ns="6" pageid="4258545"/>
          <cm title="File:Asterias rubens.jpg" ns="6" pageid="364648"/>
        </categorymembers>
      </query>
      <query-continue>
        <categorymembers cmcontinue="file|42454e5448494320464f52414d494e4946455241532e4a5047|2080936"/>
      </query-continue>
    </api>
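A category listing like this comes from the list=categorymembers module of the API. The sketch below shows one way to build such a query; the endpoint https://commons.wikimedia.org/w/api.php, the helper name and the category title "Category:Example" are assumptions made for illustration, not the exact query used here.

```javascript
// Sketch: build a categorymembers query that returns `limit` items as XML.
// The category title passed in below is a placeholder.
function buildCategoryMembersUrl(categoryTitle, limit) {
  var params = new URLSearchParams({
    action: "query",
    list: "categorymembers",
    cmtitle: categoryTitle,   // must include the "Category:" prefix
    cmlimit: String(limit),   // number of items per response
    format: "xml"
  });
  return "https://commons.wikimedia.org/w/api.php?" + params.toString();
}

var firstPageUrl = buildCategoryMembersUrl("Category:Example", 10);
console.log(firstPageUrl);
```

In the response, ns="14" marks subcategories and ns="6" marks files, so an application that only wants images can skip every entry whose ns is not 6.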

To get an image, you have to use its file title in the URL; for example, the title File:Aonides paucibranchiata.jpg will take you to the Aonides paucibranchiata image!

If you repeat the query with cmcontinue=file|42454e5448494320464f52414d494e4946455241532e4a5047|2080936 (the value from the query-continue element above), you get the following output, which is the next 10 items, and so on:

    <?xml version="1.0"?>
    <api>
      <query>
        <categorymembers>
          <cm title="File:Benthic foraminiferas.jpg" ns="6" pageid="2080936"/>
          <cm title="File:Benthic GLERL 1.jpg" ns="6" pageid="2143998"/>
          <cm title="File:Bentos kasarek.jpg" ns="6" pageid="16992357"/>
          <cm title="File:Borstenwurmer des Meeres.png" ns="6" pageid="5229875"/>
          <cm title="File:Branchiostoma lanceolatum.jpg" ns="6" pageid="5712836"/>
          <cm title="File:Capitella capitata.jpg" ns="6" pageid="398359"/>
          <cm title="File:Cirratulus cirratus.jpg" ns="6" pageid="398775"/>
          <cm title="File:Crabmuss 600.jpg" ns="6" pageid="292191"/>
          <cm title="File:Crangon crangon (dorsal).jpg" ns="6" pageid="10320338"/>
          <cm title="File:Crangon crangon.jpg" ns="6" pageid="470098"/>
        </categorymembers>
      </query>
      <query-continue>
        <categorymembers cmcontinue="file|4352494252494e4f50534953204645524e414c44492e4a5047|3307271"/>
      </query-continue>
    </api>
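Paging can be sketched as taking the cmcontinue value from one response and adding it to the next request. The middle part of the token shown earlier appears to be a hex-encoded sort key: decoded, it reads like the upper-cased name of the first file on the following page, and the trailing number matches that file's pageid. The helper names and the base URL with "Category:Example" below are made up for this illustration.

```javascript
// Sketch: add a cmcontinue token to an existing categorymembers query URL.
function withContinue(queryUrl, cmcontinue) {
  var url = new URL(queryUrl);
  url.searchParams.set("cmcontinue", cmcontinue);
  return url.toString();
}

// Decode the hex-encoded middle part of a token shaped like "type|hex|pageid".
function decodeSortKey(cmcontinue) {
  var hex = cmcontinue.split("|")[1];
  var text = "";
  for (var i = 0; i < hex.length; i += 2) {
    text += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return text;
}

// Token copied from the first response; the base URL is a placeholder query
var token = "file|42454e5448494320464f52414d494e4946455241532e4a5047|2080936";
var baseUrl = "https://commons.wikimedia.org/w/api.php" +
  "?action=query&list=categorymembers&cmtitle=Category%3AExample&cmlimit=10&format=xml";

var nextPageUrl = withContinue(baseUrl, token);
console.log(nextPageUrl);
console.log(decodeSortKey(token)); // BENTHIC FORAMINIFERAS.JPG
```

An application does not need to decode the token; it only has to pass it back unchanged until a response arrives without a query-continue element.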

Other useful formats


format=jsonfm produces an HTML page showing the JSON output in a readable, pretty-printed form.


  1. API reference

  2. API:Main page

  3. API:Tutorial