Second Life Forums Archive - php scraping of market data - need advice

Sol Columbia

Ding! Level up

Join date: 24 Sep 2005

Posts: 91

05-30-2006 14:19

Hey all,

I've been trying for a few days to create a php script which will go to https://secondlife.com/currency/market.php and scrape the daily summary data for a project I'm working on (since that info isn't available via download). My goal is to automate this process and track it in a database. I have everything working except this scraping element and I'm frustrated after trying several tactics and spending a lot of hours trying to figure out a workable method. I'm hoping you all can help me out with some suggestions for a new direction or possibly some code since I'm at my wit's end.

My latest tactic has been to try to use php's cURL functionality. I get it to work on other pages, but I'm getting nothing when trying to get the one page I want, namely that market data. The following code is what I think would work, but does not.

CODE



<?php

$url = "https://secondlife.com/account/login.php";

$post_request = "form[type]=second-life-member&form[nextpage]=/currency/market.php&";
$post_request .= "form[persistent]=Y&";
$post_request .= "form[username]=Sol&form[lastname]=Columbia&form[password]=mypasswordhere";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"$url");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_request);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

$data = curl_exec($ch);
curl_close($ch);

print($data);

?>

Anyhow, if anyone knows anything about what I'm doing wrong, or has any suggestions on a different track, I'd really appreciate it, and thank you very much in advance!

_____________________

-Sol Columbia
Luminosity
Luminosity blog

Geuis Dassin

Filming Path creator

Join date: 3 May 2006

Posts: 565

05-30-2006 14:25

see if this helps

/15/d4/99525/1.html

Sol Columbia

Ding! Level up

Join date: 24 Sep 2005

Posts: 91

05-30-2006 14:31

Bleh! How the hell did I miss that? =/

Thank you much for the link, jumping into it now.

_____________________

-Sol Columbia
Luminosity
Luminosity blog

Eddy Stryker

libsecondlife Developer

Join date: 6 Jun 2004

Posts: 353

05-30-2006 20:06

For reference, MC Seattle is a dead account I was using before I managed to recover the password to my original account Eddy Stryker. Some important bits you were missing:

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);

HTTPS sites are SSL-encrypted, and unless you want to jump through the hoops of having a CA file on hand and pointing curl to it, it's easier to just skip the peer verification completely. The trade-off is that curl then no longer checks the server's certificate.
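If you would rather keep verification switched on, the CA-file route is only a few lines. This is a minimal sketch, assuming you have a PEM bundle of CA certificates saved next to the script; the file name `cacert.pem` and the helper name `verified_ssl_options` are made up for illustration.

```php
<?php
// Hypothetical helper: returns cURL options that keep certificate checks enabled.
// $caBundle is an assumed path to a PEM file of trusted CA certificates.
function verified_ssl_options(string $caBundle): array
{
    return [
        CURLOPT_SSL_VERIFYPEER => true,      // check the server certificate against the bundle
        CURLOPT_SSL_VERIFYHOST => 2,         // check that the certificate matches the host name
        CURLOPT_CAINFO         => $caBundle, // where curl should look for trusted CAs
    ];
}

// Apply the options to a handle; no request is sent until curl_exec() is called.
$ch = curl_init('https://secondlife.com/account/login.php');
curl_setopt_array($ch, verified_ssl_options(__DIR__ . '/cacert.pem'));
```

The remaining POST options are set on the same handle exactly as in the original script.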

curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookies.txt');

You got the follow location part right, since the site does a redirect after you log in. But to maintain state between the login and the redirect, the site sets a login cookie, which curl needs to save. The line above will work fine (on a UNIX-based system at least).
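Putting the two extra options together with the script from the first post, the whole login request might be sketched like this. It is untested against the live site; the function names `login_curl_options` and `fetch_market_page` are hypothetical, and CURLOPT_HEADER is left out here on the assumption that you only want the page body for scraping.

```php
<?php
// Sketch only: option values mirror the thread; the function names are hypothetical.
function login_curl_options(string $postBody, string $cookieFile): array
{
    return [
        CURLOPT_URL            => 'https://secondlife.com/account/login.php',
        CURLOPT_RETURNTRANSFER => true,        // return the page instead of printing it
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $postBody,
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect after login
        CURLOPT_SSL_VERIFYPEER => false,       // skips certificate checks, as described above
        CURLOPT_COOKIEJAR      => $cookieFile, // turns on cookie handling and saves cookies to this file
    ];
}

function fetch_market_page(string $postBody): string
{
    $ch = curl_init();
    curl_setopt_array($ch, login_curl_options($postBody, __DIR__ . '/cookies.txt'));

    $data = curl_exec($ch);
    if ($data === false) {
        // Surface the cURL error instead of silently returning nothing.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $data;
}
```

Calling `fetch_market_page($post_request)` with the POST string from the first post would then return the HTML of whatever page `form[nextpage]` points at, provided the login succeeds.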

I also included the nextpage variable in the POSTFIELDS to redirect straight to the market page so you can do the scraping without any special tricks. Now if you instead wanted to scrape the LindeX market info, you might want to redirect with

nextpage%5D=%2Fcurrency%2Fsell.php
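That fragment is just the URL-encoded tail of `form[nextpage]=/currency/sell.php`. If hand-encoding gets tedious, PHP's `http_build_query` can produce the whole POST body from a nested array. A sketch, assuming the field names from the first post (with the last-name field spelled `lastname`, which is a guess) and placeholder credentials; `build_login_body` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds the URL-encoded login POST body for a given landing page.
function build_login_body(string $first, string $last, string $password, string $nextPage): string
{
    $fields = [
        'form' => [
            'type'       => 'second-life-member',
            'nextpage'   => $nextPage,
            'persistent' => 'Y',
            'username'   => $first,
            'lastname'   => $last,     // assumed field name
            'password'   => $password,
        ],
    ];
    // Nested keys become form[...] and are percent-encoded; '&' is forced as the separator.
    return http_build_query($fields, '', '&');
}

// Placeholder credentials, for illustration only.
echo build_login_body('First', 'Last', 'secret', '/currency/sell.php'), "\n";
```

The resulting string can be passed straight to CURLOPT_POSTFIELDS.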

If you've gotten to the actual scraping of the LindeX data, you'll notice it's two big tables with structure and content all mashed together in 1998-style HTML. Here's some of the code out of my C# app, inside of a function called getBuyOrders():

CODE


    // Excerpt: data, i and openOrders are declared earlier in getBuyOrders().
    while (data.IndexOf("bg_dashes_w_ltblue") > 0) {
        // Cut everything up to the next row, then isolate that row.
        data = data.Remove(0, data.IndexOf("\t<tr>") + 5);
        string lineItem = data.Substring(0, data.IndexOf("</tr>"));

        // First cell: exchange rate.
        i = lineItem.IndexOf(">") + 2;
        string exchangeRate = lineItem.Substring(i, lineItem.IndexOf("<", i) - i);

        // Not a rate row: skip ahead to the end of this table.
        if (exchangeRate.IndexOf("$") < 0) {
            data = data.Remove(0, data.IndexOf("</table>"));
            continue;
        }

        // Drop the first character and keep up to the first space
        // (double.Parse ignores the trailing space).
        exchangeRate = exchangeRate.Substring(1, exchangeRate.IndexOf(" "));
        lineItem = lineItem.Remove(0, lineItem.IndexOf("</td>") + 5);

        // Second cell: number of buyers (unused, so it is skipped).
        //i = lineItem.IndexOf(">") + 1;
        //string buyers = lineItem.Substring(i, lineItem.IndexOf("<", i) - i);
        lineItem = lineItem.Remove(0, lineItem.IndexOf("</td>") + 5);

        // Third cell: volume, minus its two-character prefix and thousands separators.
        i = lineItem.IndexOf(">") + 1;
        string volume = lineItem.Substring(i, lineItem.IndexOf("<", i) - i);
        volume = volume.Remove(0, 2);
        volume = volume.Replace(",", "");

        // Move past this table before looking for the next order.
        data = data.Remove(0, data.IndexOf("</table>"));

        Linden linden = new Linden();
        linden.rate = double.Parse(exchangeRate);
        linden.volume = int.Parse(volume);
        openOrders.Add(linden);
    }
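For anyone doing the parsing step in PHP rather than C#, the same row-by-row extraction can be sketched with the DOM extension instead of string offsets. The sample markup below is invented for illustration (the real LindeX tables are not reproduced in this thread), the three-cell column order (rate, buyers, volume) is an assumption read off the C# code above, and `parse_orders` is a hypothetical name.

```php
<?php
// Hypothetical parser: pulls (rate, volume) pairs out of table rows.
// Assumes each order row has three cells: rate, number of buyers, volume.
function parse_orders(string $html): array
{
    // Old-style markup is rarely valid, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($doc);
    $orders = [];
    foreach ($xpath->query('//tr') as $row) {
        // Only direct child cells, so nested tables do not leak into this row.
        $cells = [];
        foreach ($xpath->query('./td', $row) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Skip header rows and anything whose first cell holds no number.
        if (count($cells) < 3 || !preg_match('/\d+(?:\.\d+)?/', $cells[0], $rate)) {
            continue;
        }
        $orders[] = [
            'rate'   => (float) $rate[0],
            // Strip the currency prefix and thousands separators.
            'volume' => (int) preg_replace('/\D/', '', $cells[2]),
        ];
    }
    return $orders;
}

// Invented sample markup, not copied from the real page.
$sample = '<table><tr><td>Rate</td><td>Buyers</td><td>Volume</td></tr>'
        . '<tr><td>L$295 / US$1.00</td><td>4</td><td>L$1,250,000</td></tr></table>';
print_r(parse_orders($sample));
```

A DOM parser copes with whitespace and attribute changes that would shift the string offsets, but the cell positions still have to be checked against the page as it is actually served.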
