Second Life Forums Archive - PHP Parser

Prometheus Deckard

Registered User

Join date: 24 May 2005

Posts: 23

01-24-2009 15:36

I'm trying to write a PHP script that parses some data from a website, keeps only the relevant parts, and then sends them back into SL.

I'm sure many have seen these around before, but I'm having trouble on the PHP side of things. If somebody could link me to a site, or quote some script, that'd be great!

I'm trying to grab the syntax, versions, and description of PHP functions from pages of the form http://us.php.net/manual/en/function.(function).php, using a PHP script that is called from an HTTP request in-world.

For example, http://us.php.net/manual/en/function.echo.php could return:
Syntax:
Versions: (PHP 4, PHP 5)
Description: echo() is not actually a function (it is a language construct), so you are not required to use parentheses with it. echo() (unlike some other language constructs) does not behave like a function, so it cannot always be used in the context of a function. Additionally, if you want to pass more than one parameter to echo(), the parameters must not be enclosed within parentheses.

Thanks in advance
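For reference, here is a minimal sketch of the last step, assuming the three fields have already been pulled out of the page: it lays them out as in the example above so the in-world script can show the reply as-is. The function name `formatForSL` is hypothetical, and the 2048-byte cap is an assumption based on the default body length LSL delivers to the `http_response` event.

```php
<?php
// Hypothetical helper: builds the plain-text reply for the in-world script.
// Assumption: the LSL side only receives the first 2048 bytes of the body
// (the default llHTTPRequest limit), so longer text is trimmed to fit.
function formatForSL(string $syntax, string $versions, string $description, int $maxBytes = 2048): string
{
    $text = "Syntax: " . trim($syntax) . "\n"
          . "Versions: " . trim($versions) . "\n"
          . "Description: " . trim($description);

    if (strlen($text) <= $maxBytes) {
        return $text;
    }

    // Cut at the byte limit, then drop trailing bytes until the string is valid
    // UTF-8 again, so a multi-byte character is never split in half.
    $cut = substr($text, 0, $maxBytes);
    while ($cut !== '' && preg_match('//u', $cut) !== 1) {
        $cut = substr($cut, 0, -1);
    }
    return $cut;
}
```

In the PHP page itself you would send `header('Content-Type: text/plain; charset=utf-8');` and then `echo formatForSL($syntax, $versions, $description);`.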

Osgeld Barmy

Registered User

Join date: 22 Mar 2005

Posts: 3,336

01-24-2009 15:52

I'm not really sure I understand the question (although I'm told I can be pretty thick).

If you're asking for some basic example of LSL <> PHP, maybe my old, crappy, insecure guestbook thing would be useful to start playing around with:

/54/0c/205116/1.html#post1641083

Also, there's a PHP users group you can join in-world; they're full of PHP uberness.
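The in-world llHTTPRequest call reaches PHP as an ordinary HTTP request, so the function name can travel as a query parameter. Since anything can hit that URL, it's worth validating the name before building a manual URL from it. A minimal sketch; the `?func=` parameter and the `sanitizeFunctionName` helper are hypothetical, and the allowed character set is an assumption about what PHP function names look like.

```php
<?php
// Returns a safe, lower-cased function name, or null if the input is unusable.
// Assumption: PHP function names consist of letters, digits and underscores
// and do not start with a digit.
function sanitizeFunctionName(?string $raw): ?string
{
    if ($raw === null) {
        return null;
    }
    $name = strtolower(trim($raw));
    if (preg_match('/^[a-z_][a-z0-9_]*$/', $name) !== 1) {
        return null; // reject anything that could alter the target URL
    }
    return $name;
}
```

In the request handler you might call `$func = sanitizeFunctionName($_GET['func'] ?? null);` and reply with an error (HTTP 400, say) when it returns null.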

RobbyRacoon Olmstead

Red warrior is hungry!

Join date: 20 Sep 2006

Posts: 1,821

01-24-2009 16:54

I'm very sick and don't know if I can write anything helpful or not, but I'll try...

To scrape another page, first get something like HTMLPurifier (http://htmlpurifier.org/) to filter and 'correct' the retrieved HTML. It really does make a big difference, in my experience.

For retrieving the html data and parsing it, I just use the class_http code from here: http://www.troywolf.com/articles/php/class_http/

Hopefully that's enough to get the gears going; if not, I'll try to post some working code when I feel a little better.
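If you'd rather not pull in a third-party class just for the download, PHP's built-in `file_get_contents()` with a stream context can do the fetch. A minimal sketch, assuming `allow_url_fopen` is enabled on your host; the `fetchPage` name, the 10-second timeout, and the User-Agent string are illustrative choices, not part of class_http.

```php
<?php
// Fetches a URL and returns the body, or null on any failure.
// Assumption: allow_url_fopen is enabled in php.ini.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds,
            // Illustrative User-Agent; some sites refuse requests without one.
            'header'  => "User-Agent: SL-PHP-doc-lookup\r\n",
        ],
    ]);

    // @ suppresses the warning file_get_contents() emits on failure;
    // the false return value is checked instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```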


RobbyRacoon Olmstead

Red warrior is hungry!

Join date: 20 Sep 2006

Posts: 1,821

01-24-2009 16:59

Here's a snippet I found that uses the two components mentioned above to retrieve a region's URL by name from the Second Life Search pages.

It's not a great example, but it's all I could find quickly in my addled condition:

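From: someone

function getRegionUrl( $regionName )
{
    $regionURL = str_replace( " ", "%20", $regionName );
    $url = "http://secondlife.com/app/search/search_proxy.php?q=%22$regionURL%20Region%22+inurl:region";
    $h = new http();
    $h->dir = "http_request_cache/";
    if( !$h->fetch( $url, 6000, "getRegionUrl-" . $regionName ) )
        return $url;

    $html = $h->body;

    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core', 'Encoding', 'UTF-8'); // replace with your encoding
    $config->set('Core', 'DefinitionCache', null);
    $config->set('HTML', 'EnableAttrID', true);
    $purifier = new HTMLPurifier($config);

    $html = '<body>' . $purifier->purify( str_replace( "&nbsp;", " ", $html ) ) . "</body>";

    $xml = simplexml_load_string( $html );
    if( $xml === false )
        return $url;

    $result = $xml->xpath( "//p[img[@src='http://s3.amazonaws.com/world.secondlife.com/images/icn_search-location_16.png']]/a[span[@class='l']]" );
    if( !$result || count($result) == 0 )
        return $url;

    foreach ($result as $item)
    {
        if( strcasecmp( (string)$item->span->b, $regionName." Region" ) == 0 )
        {
            return (string)$item["href"];
        }
    }
    return $url;
}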
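When a search URL is built from a name typed in-world, escaping only spaces leaves characters such as `&` or `#` free to break the query string. A minimal sketch using PHP's built-in `rawurlencode()`, which escapes all reserved characters (spaces become `%20`); the `buildRegionSearchUrl` helper is hypothetical, and the query layout (`"NAME Region"` plus `inurl:region`) is an assumption modelled on a 2009-era Second Life Search query that may no longer be served.

```php
<?php
// Hypothetical helper: builds the query used to look up a region by name.
// rawurlencode() escapes every reserved character, not just spaces,
// so names containing '&' or '#' cannot break the URL.
function buildRegionSearchUrl(string $baseUrl, string $regionName): string
{
    $query = '"' . $regionName . ' Region"';
    return $baseUrl . '?q=' . rawurlencode($query) . '+inurl:region';
}
```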

Prometheus Deckard

Registered User

Join date: 24 May 2005

Posts: 23

01-24-2009 17:02

I'm afraid my PHP knowledge is extremely limited. I've only really delved into using my MySQL DB to store data.

So this is very much a learning experience for me to get to know just that much more.

RobbyRacoon Olmstead

Red warrior is hungry!

Join date: 20 Sep 2006

Posts: 1,821

01-24-2009 17:43

This sorta kinda works. It's not robust or error-free, it's just a quickie example:

From: someone

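function getFunctionDescription( $functionName )
{
    // Assumes class_http and HTMLPurifier are already loaded, e.g.
    // require_once 'class_http.php'; require_once 'HTMLPurifier.auto.php'; (adjust paths)

    // Manual page URLs use hyphens where function names use underscores.
    $functionName = str_replace( "_", "-", $functionName );

    $url = "http://us.php.net/manual/en/function." . $functionName . ".php";
    $h = new http();
    if( !$h->fetch( $url ) )
        return $url;   // fetch failed: hand back the URL instead

    $html = $h->body;

    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core', 'Encoding', 'UTF-8'); // replace with your encoding
    $config->set('Core', 'DefinitionCache', null);
    $config->set('HTML', 'EnableAttrID', true); // keep id attributes
    $purifier = new HTMLPurifier($config);

    // SimpleXML only knows XML's five predefined entities, so &nbsp; goes before parsing.
    $html = '<body>' . $purifier->purify( str_replace( "&nbsp;", " ", $html ) ) . "</body>";

    $xml = simplexml_load_string( $html );
    if( $xml === false )
        return $url;

    $result = $xml->xpath( "//div[@class='refsect1 description']/p" );
    if( empty( $result ) )
        return $url;   // page layout didn't match

    $description = (string)$result[0];
    return $description;
}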

Hopefully that's enough to get started, or enough for someone who's not sick to jump in
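SimpleXML is a strict XML parser, and XML predefines only five entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`). An HTML entity such as `&nbsp;` therefore makes `simplexml_load_string()` reject the document, so it has to be replaced before parsing. A self-contained sketch of that step plus the description XPath, run against a made-up fragment rather than a real manual page; `firstDescriptionParagraph` is a hypothetical name.

```php
<?php
// Parses an HTML fragment with SimpleXML and returns the text of the first <p>
// inside <div class="refsect1 description">, or null if it can't be found.
function firstDescriptionParagraph(string $html): ?string
{
    // XML defines only &lt; &gt; &amp; &quot; &apos;, so an HTML entity such as
    // &nbsp; would make simplexml_load_string() reject the whole document.
    $html = str_replace('&nbsp;', ' ', $html);

    $prev = libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $xml = simplexml_load_string($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    if ($xml === false) {
        return null;
    }

    $result = $xml->xpath("//div[@class='refsect1 description']/p");
    if (empty($result)) {
        return null;
    }
    // (string) yields only the element's own text, not text nested in child tags.
    return trim((string) $result[0]);
}
```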


Prometheus Deckard

Registered User

Join date: 24 May 2005

Posts: 23

01-30-2009 19:32

I played around with some code, although I still can't seem to get it quite right. I'm afraid my PHP knowledge really is -very- limited.

Hewee Zetkin

Registered User

Join date: 20 Jul 2006

Posts: 2,702

01-30-2009 21:08

I'd probably try to stick to the standard PHP libraries myself. The DOM API is pretty easy to use and has some good examples in the documentation. The major pieces I'd use are DOMDocument::loadHTML (http://us.php.net/manual/en/domdocument.loadhtml.php) to parse the content, the DOMDocument (http://us.php.net/manual/en/class.domdocument.php) and DOMNode (http://us.php.net/manual/en/class.domnode.php) properties and methods to examine the names, attributes, and values of the nodes, and possibly DOMXPath (http://us.php.net/manual/en/class.domxpath.php) to find the elements you are interested in (though that one would take a bit of learning if you're not familiar with XPaths).
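A minimal sketch of the standard-library route: `DOMDocument::loadHTML()` uses libxml's forgiving HTML parser, so HTML entities like `&nbsp;` and most sloppy markup load without a purifier, and `DOMXPath::query()` finds nodes by expression. The `queryText` helper and the sample markup are made up for illustration.

```php
<?php
// Loads an HTML string and returns the text content of the first node matching
// $expr, or null when nothing matches.
function queryText(string $html, string $expr): ?string
{
    if ($html === '') {
        return null; // loadHTML() refuses empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real pages trigger many harmless HTML warnings
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($expr);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // textContent includes the text of all descendants, unlike SimpleXML's (string) cast.
    return trim($nodes->item(0)->textContent);
}
```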

Looking at the HTML source of the echo() documentation referenced above, you'd be looking first for the <div> element with ID "function.echo". Its descendant <p> with class "verinfo" holds the versions.
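The versions lookup described above, sketched with DOMXPath. The id `function.NAME` and the class `verinfo` come from the 2009 echo() page and are assumptions about the markup of other pages (or of the manual today); `getVersions` is a hypothetical helper, and the class test also matches elements that carry more than one class.

```php
<?php
// Finds <p class="verinfo"> anywhere inside <div id="function.NAME"> and returns its text.
// Assumes $functionName was validated already (no quotes), since it is placed in the XPath.
function getVersions(DOMDocument $doc, string $functionName): ?string
{
    $xpath = new DOMXPath($doc);
    $id = 'function.' . $functionName;
    $nodes = $xpath->query(
        "//div[@id='{$id}']//p[contains(concat(' ', normalize-space(@class), ' '), ' verinfo ')]"
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```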

Then the <div> (also a descendant of "function.echo") just after the <a> with ID "function.echo.description" has the rest. It looks like the first <div> child of that element holds your syntax info, and all the <p> children put together make up your description.
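The sibling walk described above, sketched with the DOMNode properties from the links below (`nextSibling`, `childNodes`, `textContent`). The `function.NAME.description` id scheme is an assumption generalised from the echo() page, and `getSyntaxAndDescription` is hypothetical. Whitespace between tags can show up as text nodes, which is why the walk skips anything that isn't a `<div>` element.

```php
<?php
// Starts at <a id="function.NAME.description">, steps to the next <div> sibling,
// then reads its first <div> child as the syntax and joins its <p> children
// as the description. Returns null if the expected structure is missing.
function getSyntaxAndDescription(DOMDocument $doc, string $functionName): ?array
{
    $xpath = new DOMXPath($doc);
    $anchors = $xpath->query("//a[@id='function.{$functionName}.description']");
    if ($anchors === false || $anchors->length === 0) {
        return null;
    }

    // Skip text nodes (and anything else) until the next <div> element.
    $node = $anchors->item(0)->nextSibling;
    while ($node !== null && !($node instanceof DOMElement && $node->nodeName === 'div')) {
        $node = $node->nextSibling;
    }
    if ($node === null) {
        return null;
    }

    $syntax = '';
    $seenDiv = false;
    $paragraphs = [];
    foreach ($node->childNodes as $child) {
        if (!($child instanceof DOMElement)) {
            continue; // skip text and comment nodes
        }
        if ($child->nodeName === 'div' && !$seenDiv) {
            $syntax = trim($child->textContent); // first <div> child = syntax
            $seenDiv = true;
        } elseif ($child->nodeName === 'p') {
            $paragraphs[] = trim($child->textContent);
        }
    }

    return ['syntax' => $syntax, 'description' => implode(' ', $paragraphs)];
}
```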

Some important highlights:

http://us.php.net/manual/en/domdocument.loadhtml.php
http://us.php.net/manual/en/domdocument.getelementbyid.php
http://us.php.net/manual/en/domelement.getattribute.php
http://us.php.net/manual/en/class.domnode.php#domnode.props.nodevalue
http://us.php.net/manual/en/class.domnode.php#domnode.props.childnodes
http://us.php.net/manual/en/class.domnode.php#domnode.props.nextsibling

You'd have to do some research yourself on whether the other functions' HTML pages are formatted in a similar fashion. Probably the simple ones are, but if you're going to parse class documentation as well (such as the pages referenced above for DOM) you might have some more interesting structure to consider.
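Since other pages may be laid out differently, one defensive option is to try several XPath expressions in order and take the first non-empty match. A minimal sketch; `firstMatchingText` is hypothetical, and the expressions in the usage are ones discussed in this thread, neither guaranteed to match every page.

```php
<?php
// Tries each XPath expression in order and returns the first non-empty text found,
// or null if no expression matches anything.
function firstMatchingText(DOMXPath $xpath, array $expressions): ?string
{
    foreach ($expressions as $expr) {
        $nodes = $xpath->query($expr);
        if ($nodes === false) {
            continue; // malformed expression
        }
        foreach ($nodes as $node) {
            $text = trim($node->textContent);
            if ($text !== '') {
                return $text;
            }
        }
    }
    return null;
}
```

For a description you might pass `["//div[@class='refsect1 description']/p", ...]`, with the most specific expression first so looser fallbacks only kick in when it fails.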
