Welcome to the Second Life Forums Archive

PHP Parser - Help!

Prometheus Deckard
Registered User
Join date: 24 May 2005
Posts: 23
01-24-2009 15:36
I'm trying to write a PHP script that parses some data from a website, keeps only the relevant parts, and then sends it back into SL.

I'm sure many of you have seen these around before, but I'm having trouble on the PHP side of things. If somebody could point me to a site or post some code, that'd be great!

I'm trying to grab the syntax, versions, and description of PHP functions from pages like http://us.php.net/manual/en/function.(function).php, using a PHP script that is called from an HTTP request in-world.
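
A minimal sketch of the PHP side of that URL scheme, assuming the in-world script sends the function name as a query parameter. `manualUrlFor` is a hypothetical helper name. The dash-for-underscore rule matches the manual's URLs (str_replace lives at function.str-replace.php); lower-casing the name is an assumption based on PHP function names being case-insensitive.

```php
<?php
// Hypothetical helper: turn a PHP function name sent from in-world into
// the manual page URL, following the function.<name>.php pattern.
function manualUrlFor(string $functionName): ?string
{
    // Only accept plain identifiers (\z also rejects a trailing newline),
    // so a request can't smuggle other paths or hosts into the URL.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*\z/', $functionName)) {
        return null;
    }
    // Manual page names use dashes where function names use underscores.
    $slug = strtolower(str_replace('_', '-', $functionName));
    return 'http://us.php.net/manual/en/function.' . $slug . '.php';
}

echo manualUrlFor('str_replace'), "\n";
// http://us.php.net/manual/en/function.str-replace.php
```

In the script that llHTTPRequest calls, something like `$url = manualUrlFor($_GET['func'] ?? '');` would give either the page to fetch or null for a bad request; the parameter name `func` is made up.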

For example, for http://us.php.net/manual/en/function.echo.php the script could return:
Syntax:
Versions: (PHP 4, PHP 5)
Description: echo() is not actually a function (it is a language construct), so you are not required to use parentheses with it. echo() (unlike some other language constructs) does not behave like a function, so it cannot always be used in the context of a function. Additionally, if you want to pass more than one parameter to echo(), the parameters must not be enclosed within parentheses.
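
One way the PHP script could hand those three fields back, sketched with a hypothetical `formatReply` helper: plain text, one field per line, so the LSL side can split the body on newlines. The field order just mirrors the example above.

```php
<?php
// Hypothetical helper: pack the scraped fields into the plain-text
// reply format shown above, one field per line.
function formatReply(string $syntax, string $versions, string $description): string
{
    // Collapse runs of whitespace (including newlines from the HTML
    // source) so each field stays on a single line.
    $clean = function (string $s): string {
        return trim(preg_replace('/\s+/', ' ', $s));
    };
    return 'Syntax: ' . $clean($syntax) . "\n"
        . 'Versions: ' . $clean($versions) . "\n"
        . 'Description: ' . $clean($description);
}

echo formatReply('', '(PHP 4, PHP 5)', "echo() is not actually\n  a function ..."), "\n";
```

Sending `header('Content-Type: text/plain; charset=utf-8')` before the `echo` keeps things simple for LSL. Keep in mind that llHTTPRequest truncates the response body (2048 bytes by default), so a long description may arrive cut off.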

Thanks in advance
Osgeld Barmy
Registered User
Join date: 22 Mar 2005
Posts: 3,336
01-24-2009 15:52
I'm not really sure I understand the question (although I'm told I can be pretty thick).

Or if you're asking for a basic example of LSL <> PHP, maybe my old crappy, insecure guestbook thing would be useful to start playing around with:

/54/0c/205116/1.html#post1641083

ALSO, there's a PHP users group you can join in-world, full of PHP uberness.
RobbyRacoon Olmstead
Red warrior is hungry!
Join date: 20 Sep 2006
Posts: 1,821
01-24-2009 16:54
I'm very sick and don't know if I can write anything helpful or not, but I'll try...

To scrape another page, first get something like HTMLPurifier (http://htmlpurifier.org/) to filter and 'correct' the retrieved HTML. In my experience it really does make a big difference.
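
If installing HTMLPurifier isn't an option, PHP's own DOM extension can do a similar cleanup pass. This is a swapped-in technique, not what the post describes: `DOMDocument::loadHTML()` uses libxml's forgiving HTML parser, and `simplexml_import_dom()` turns the repaired tree back into SimpleXML so `xpath()` calls keep working. `htmlToSimpleXml` is a hypothetical name.

```php
<?php
// Sketch: an alternative to HTMLPurifier (not what this thread uses).
// libxml's HTML parser repairs the markup; SimpleXML then wraps the result.
function htmlToSimpleXml(string $html): ?SimpleXMLElement
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real pages trigger many parser warnings (unknown tags, stray
    // entities); collect them quietly instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $ok = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return null;
    }
    return simplexml_import_dom($doc);
}

// &nbsp; and the unclosed <br>/<b> would make simplexml_load_string() fail.
$xml = htmlToSimpleXml('<p>Hello&nbsp;world<br><b>bold</p>');
echo count($xml->xpath('//p')), "\n"; // 1
```

One caveat: loadHTML() assumes ISO-8859-1 unless the page declares its charset in a `<meta>` tag.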

For retrieving the HTML data and parsing it, I just use the class_http code from here: http://www.troywolf.com/articles/php/class_http/
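
For comparison, a bare-bones fetch using only PHP's built-in stream functions instead of class_http. It's a sketch: `fetchPage` is a hypothetical name, the timeout and User-Agent values are arbitrary, http:// URLs need `allow_url_fopen` enabled, and unlike class_http's cache directory it does no caching.

```php
<?php
// Sketch: fetch a page with file_get_contents() and a stream context.
// Returns the body as a string, or null on any failure.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'sl-manual-scraper/0.1', // made-up UA string
        ],
    ]);
    // @ hides the warning PHP emits on failure; the caller checks for null.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

A caller would do `$html = fetchPage($url); if ($html === null) { ... }` and hand `$html` to whichever parser it uses.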

Hopefully that's enough to get the gears going; if not, I'll try to post some working code when I feel a little better.

RobbyRacoon Olmstead
Red warrior is hungry!
Join date: 20 Sep 2006
Posts: 1,821
01-24-2009 16:59
Here's a snippet I found that uses the two components mentioned above to retrieve a region's URL by name from the Second Life Search pages.

It's not a great example, but it's all I could find quickly in my addled condition:

function getRegionUrl( $regionName )
{
    // Percent-encode the name for the query string (spaces become %20).
    $regionURL = rawurlencode( $regionName );

    $url = "http://secondlife.com/app/search/search_proxy.php?q=%22$regionURL%20Region%22+inurl:region";

    // class_http fetch, caching responses under http_request_cache/
    $h = new http();
    $h->dir = "http_request_cache/";
    if( !$h->fetch( $url, 6000, "getRegionUrl-" . $regionName ) )
        return $url; // fall back to the search URL itself

    $html = $h->body;

    // Let HTMLPurifier repair the markup so it can be parsed as XML.
    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core', 'Encoding', 'UTF-8'); // replace with your encoding
    $config->set('Core', 'DefinitionCache', null);
    $config->set('HTML', 'EnableAttrID', true);
    $purifier = new HTMLPurifier($config);

    // &nbsp; is not a predefined XML entity, so swap it for a plain space first.
    $html = '<body>' . $purifier->purify( str_replace( "&nbsp;", " ", $html ) ) . "</body>";

    $xml = simplexml_load_string( $html );
    if( $xml === false )
        return $url;

    // Result links: <a> inside a <p> that holds the location icon.
    $result = $xml->xpath( "//p[img[@src='http://s3.amazonaws.com/world.secondlife.com/images/icn_search-location_16.png']]/a[span[@class='l']]" );

    if( !$result )
        return $url;

    foreach ($result as $item) {
        // Case-insensitive match on the "<Name> Region" label.
        if( strcasecmp( (string)$item->span->b, $regionName." Region" ) == 0 )
            return (string)$item["href"];
    }

    return $url;
}
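
To see what that XPath actually selects, here it is run offline against a tiny made-up snippet. The markup is only a guess at the shape the expression expects (a `<p>` holding the location icon, then `<a><span class="l"><b>Name Region</b></span></a>`), not a copy of the real search page; `pickRegionLink` is a hypothetical helper mirroring the foreach loop above.

```php
<?php
// Made-up markup shaped the way the XPath in getRegionUrl() expects;
// it is NOT copied from the real search page.
$icon = 'http://s3.amazonaws.com/world.secondlife.com/images/icn_search-location_16.png';
$html = '<body>'
    . '<p><img src="' . $icon . '"/><a href="http://example.com/foo"><span class="l"><b>Foo Region</b></span></a></p>'
    . '<p><img src="' . $icon . '"/><a href="http://example.com/bar"><span class="l"><b>Bar Region</b></span></a></p>'
    // No icon in this <p>, so the XPath should skip it.
    . '<p><a href="http://example.com/no-icon"><span class="l"><b>Bar Region</b></span></a></p>'
    . '</body>';

// Hypothetical helper mirroring the foreach loop in getRegionUrl().
function pickRegionLink(array $links, string $regionName): ?string
{
    foreach ($links as $a) {
        // Case-insensitive match against the bold "<Name> Region" label.
        if (strcasecmp((string)$a->span->b, $regionName . ' Region') === 0) {
            return (string)$a['href'];
        }
    }
    return null;
}

$xml = simplexml_load_string($html);
// <a> children of a <p> containing the icon, where the <a> wraps a <span class="l">.
$links = $xml->xpath("//p[img[@src='$icon']]/a[span[@class='l']]");
echo count($links), "\n";                 // 2
echo pickRegionLink($links, 'bar'), "\n"; // http://example.com/bar
```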

Prometheus Deckard
Registered User
Join date: 24 May 2005
Posts: 23
01-24-2009 17:02
I'm afraid my PHP knowledge is extremely limited. I've only really delved into using my MySQL DB to store data.

So this is very much a learning experience for me to get to know just that much more. :)
RobbyRacoon Olmstead
Red warrior is hungry!
Join date: 20 Sep 2006
Posts: 1,821
01-24-2009 17:43
This sorta kinda works. It's not robust or error-free; it's just a quickie example:

function getFunctionDescription( $functionName )
{
    // Manual page names use dashes where function names use underscores.
    $functionName = str_replace( "_", "-", $functionName );

    $url = "http://us.php.net/manual/en/function.".$functionName.".php";
    $h = new http();
    if( !$h->fetch( $url ) )
        return $url;

    $html = $h->body;

    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core', 'Encoding', 'UTF-8'); // replace with your encoding
    $config->set('Core', 'DefinitionCache', null);
    $config->set('HTML', 'EnableAttrID', true);
    $purifier = new HTMLPurifier($config);

    // &nbsp; is not a predefined XML entity; replace it before parsing.
    $html = '<body>' . $purifier->purify( str_replace( "&nbsp;", " ", $html ) ) . "</body>";

    $xml = simplexml_load_string( $html );
    if( $xml === false )
        return $url;

    $result = $xml->xpath( "//div[@class='refsect1 description']/p" );
    if( !$result )
        return $url; // page layout didn't match

    // (string) would drop text inside child tags like <b>; textContent keeps it.
    $description = trim( dom_import_simplexml( $result[0] )->textContent );

    return $description;
}
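
To try the parsing half of getFunctionDescription() without fetching anything, the same XPath can be run on a hand-made snippet. `extractDescription` is a hypothetical name, and the one-paragraph markup is a simplified stand-in for the manual page. What it shows: `(string)$element` only returns the element's own text, while the DOM `textContent` property also includes text inside child tags such as `<b>echo()</b>`.

```php
<?php
// Parsing step only: find the first description paragraph and return
// its full text, or null when the page layout doesn't match.
function extractDescription(SimpleXMLElement $xml): ?string
{
    $result = $xml->xpath("//div[@class='refsect1 description']/p");
    if (!$result) {
        return null;
    }
    // textContent includes text inside nested tags; (string) would not.
    $text = dom_import_simplexml($result[0])->textContent;
    return trim(preg_replace('/\s+/', ' ', $text));
}

$sample = simplexml_load_string(
    '<body><div class="refsect1 description"><p><b>echo()</b> is not actually a function.</p></div></body>'
);
echo extractDescription($sample), "\n";      // echo() is not actually a function.
$xp = $sample->xpath("//div[@class='refsect1 description']/p");
echo (string)$xp[0], "\n";                   //  is not actually a function.
```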


Hopefully that's enough to get started, or enough for someone who's not sick to jump in :)
Prometheus Deckard
Registered User
Join date: 24 May 2005
Posts: 23
01-30-2009 19:32
I played around with some code, although I still can't seem to get it quite right. I'm afraid my PHP knowledge really is -very- limited :(
Hewee Zetkin
Registered User
Join date: 20 Jul 2006
Posts: 2,702
01-30-2009 21:08
I'd probably try to stick to the standard PHP libraries myself. The DOM API is pretty easy to use and has some good examples in the documentation. The major pieces I'd use are DOMDocument::loadHTML (http://us.php.net/manual/en/domdocument.loadhtml.php) to parse the content, the DOMDocument (http://us.php.net/manual/en/class.domdocument.php) and DOMNode (http://us.php.net/manual/en/class.domnode.php) properties and methods to examine the names, attributes, and values of the nodes, and possibly DOMXPath (http://us.php.net/manual/en/class.domxpath.php) to find the elements you are interested in (though that one would take a bit of learning if you're not familiar with XPath).
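
A minimal sketch of those pieces working together: `loadHTML()` to parse, `DOMXPath::query()` to select, then `nodeName`, `getAttribute()` and `nodeValue` on each match. The HTML string and the `loadHtmlDocument` helper name are made up for illustration.

```php
<?php
// Parse an HTML string into a DOMDocument, quietly tolerating tag soup.
function loadHtmlDocument(string $html): DOMDocument
{
    $doc = new DOMDocument();
    if (trim($html) === '') {
        return $doc; // loadHTML() rejects empty input; return an empty document
    }
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

$doc = loadHtmlDocument('<div id="a"><p class="x">One</p><p>Two</p></div>');
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="a"]/p');
foreach ($nodes as $p) {
    // getAttribute() returns '' when the attribute is absent.
    echo $p->nodeName, ' class="', $p->getAttribute('class'), '": ', $p->nodeValue, "\n";
}
// p class="x": One
// p class="": Two
```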

Looking at the HTML source of the echo() documentation referenced above, you'd first look for the <div> element with ID "function.echo". Its descendant <p> with class "verinfo" will be the versions.

Then the <div> (also a descendant of "function.echo") just after the <a> with ID "function.echo.description" has the rest. It looks like the first <div> child of that element will have a value that is your syntax info, and all the <p> children put together will be your description.
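
The steps described above can be sketched with the same DOM calls. The IDs and the "verinfo" class come from the post; the sample markup is a simplified stand-in rather than the real php.net page, and `parseManualPage` is a hypothetical name, so every lookup is written to cope with a missing node.

```php
<?php
// Sketch: versions from p.verinfo, syntax from the first <div> child of
// the <div> after the description anchor, description from its <p>s.
function parseManualPage(string $html, string $slug): ?array
{
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xp = new DOMXPath($doc);

    // $slug is assumed to be validated already (no quotes in it).
    $root = $xp->query("//*[@id='function.$slug']")->item(0);
    if ($root === null) {
        return null;
    }
    $ver = $xp->query(".//p[@class='verinfo']", $root)->item(0);
    $anchor = $xp->query(".//a[@id='function.$slug.description']", $root)->item(0);

    // Step from the anchor to the next <div>, skipping whitespace text nodes.
    $block = $anchor ? $anchor->nextSibling : null;
    while ($block !== null && !($block instanceof DOMElement && $block->nodeName === 'div')) {
        $block = $block->nextSibling;
    }

    $syntax = null;
    $paras = [];
    if ($block !== null) {
        foreach ($block->childNodes as $child) {
            if (!($child instanceof DOMElement)) {
                continue;
            }
            if ($child->nodeName === 'div' && $syntax === null) {
                $syntax = trim($child->nodeValue); // first <div> child
            } elseif ($child->nodeName === 'p') {
                $paras[] = trim($child->nodeValue); // every <p> child
            }
        }
    }
    return [
        'syntax'      => $syntax ?? '',
        'versions'    => $ver ? trim($ver->nodeValue) : '',
        'description' => implode(' ', $paras),
    ];
}

$sample = '<div id="function.echo"><p class="verinfo">(PHP 4, PHP 5)</p>'
    . "<a id=\"function.echo.description\"></a>\n"
    . '<div><div>syntax goes here</div><p>First part.</p><p>Second part.</p></div></div>';
print_r(parseManualPage($sample, 'echo'));
```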

Some important highlights:

http://us.php.net/manual/en/domdocument.loadhtml.php
http://us.php.net/manual/en/domdocument.getelementbyid.php
http://us.php.net/manual/en/domelement.getattribute.php
http://us.php.net/manual/en/class.domnode.php#domnode.props.nodevalue
http://us.php.net/manual/en/class.domnode.php#domnode.props.childnodes
http://us.php.net/manual/en/class.domnode.php#domnode.props.nextsibling

You'd have to do some research yourself on whether the other functions' HTML pages are formatted in a similar fashion. Probably the simple ones are, but if you're going to parse class documentation as well (such as the pages referenced above for DOM) you might have some more interesting structure to consider.