Welcome to the Second Life Forums Archive

Change Charset

PedroRenaut Mariner
Registered User
Join date: 6 Nov 2006
Posts: 17
01-16-2009 15:53
How can I see the correct characters in SL and/or PHP for this string?

&#...

If I put the string here I see this: Надявам, but in SL I receive the codes.

TYVM
Escort DeFarge
Together
Join date: 18 Nov 2004
Posts: 681
01-16-2009 17:40
Not sure I know the answer here, but pitching in anyhow. I believe the only format received in SL is text/plain, so HTML entities won't be recognized. You could try sending UTF-8 and setting that as the charset on the raw text you're sending.
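
To illustrate the point about entities arriving as plain text, here is a minimal PHP sketch. The sample letter is an assumption (the first letter of the word quoted above), not the poster's actual data. It compares the entity form, which is just seven ASCII characters, with the UTF-8 form, which is the two bytes of the letter itself:

```php
<?php
// Assumed sample: U+041D, the first letter of the word quoted in the thread.
$entity = '&#1053;';    // numeric HTML entity: seven plain ASCII characters
$utf8   = "\xD0\x9D";   // the same letter encoded as raw UTF-8 (two bytes)

echo strlen($entity), "\n";   // 7
echo strlen($utf8), "\n";     // 2
echo bin2hex($utf8), "\n";    // d09d
```

A client that treats the body as text/plain would have no reason to interpret the first form, which would explain why the codes show up as-is.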

/esc
_____________________
http://slurl.com/secondlife/Together
PedroRenaut Mariner
Registered User
Join date: 6 Nov 2006
Posts: 17
01-17-2009 04:06
My PHP returns, for example, this: see http://www.slooger.com/ejemplo.txt

and I want to receive ...

Надявам се, че накрая нещо, което да послужи на всички промени, които имам, кой знае?

Please ... any ideas???

TYVM
PedroRenaut Mariner
Registered User
Join date: 6 Nov 2006
Posts: 17
01-17-2009 04:09
From: PedroRenaut Mariner
My PHP returns, for example...
Надявам се, че накрая нещо, което да послужи на всички промени, които имам, кой знае?

and I want to receive ...

Надявам се, че накрая нещо, което да послужи на всички промени, които имам, кой знае?

Any ideas???
Hewee Zetkin
Registered User
Join date: 20 Jul 2006
Posts: 2,702
01-17-2009 05:03
Working on it....
PedroRenaut Mariner
Registered User
Join date: 6 Nov 2006
Posts: 17
01-17-2009 06:12
TYVM Hewee ... I'll wait for any solution in PHP (best) or in LSL. I'll keep working on it!!
Hewee Zetkin
Registered User
Join date: 20 Jul 2006
Posts: 2,702
01-17-2009 07:27
Here are some functions that will do it. Note that this is quite slow in LSL, even when compiled to Mono. I didn't realize that a PHP solution was acceptable to you. I believe there's a single function that'll do what you want. I'll try to find that too.

CODE

///// Constants

// Initialized below
list HTML_ENTITY_NAMES;
list HTML_ENTITY_CODES;

string BASE_64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";


//// Functions

integer entitiesInitialized = FALSE;
initEntities()
{
    if (entitiesInitialized)
    {
        return;
    }
    entitiesInitialized = TRUE;

    list htmlEntities =
        [ "Aacute", 193, "aacute", 225, "Acirc", 194, "acirc", 226,
          "acute", 180, "AElig", 198, "aelig", 230, "Agrave", 192,
          "agrave", 224, "alefsym", 8501, "Alpha", 913, "alpha", 945,
          "amp", 38, "and", 8743, "ang", 8736, "Aring", 197,
          "aring", 229, "asymp", 8776, "Atilde", 195, "atilde", 227,
          "Auml", 196, "auml", 228, "bdquo", 8222, "Beta", 914,
          "beta", 946, "brvbar", 166, "bull", 8226, "cap", 8745,
          "Ccedil", 199, "ccedil", 231, "cedil", 184, "cent", 162,
          "Chi", 935, "chi", 967, "circ", 710, "clubs", 9827,
          "cong", 8773, "copy", 169, "crarr", 8629, "cup", 8746,
          "curren", 164, "dagger", 8224, "Dagger", 8225, "darr", 8595,
          "dArr", 8659, "deg", 176, "Delta", 916, "delta", 948,
          "diams", 9830, "divide", 247, "Eacute", 201, "eacute", 233,
          "Ecirc", 202, "ecirc", 234, "Egrave", 200, "egrave", 232,
          "empty", 8709, "emsp", 8195, "ensp", 8194, "Epsilon", 917,
          "epsilon", 949, "equiv", 8801, "Eta", 919, "eta", 951,
          "ETH", 208, "eth", 240, "Euml", 203, "euml", 235,
          "euro", 8364, "exist", 8707, "fnof", 402, "forall", 8704,
          "frac12", 189, "frac14", 188, "frac34", 190, "frasl", 8260,
          "Gamma", 915, "gamma", 947, "ge", 8805, "gt", 62,
          "harr", 8596, "hArr", 8660, "hearts", 9829, "hellip", 8230,
          "Iacute", 205, "iacute", 237, "Icirc", 206, "icirc", 238,
          "iexcl", 161, "Igrave", 204, "igrave", 236, "image", 8465,
          "infin", 8734, "int", 8747, "Iota", 921, "iota", 953,
          "iquest", 191, "isin", 8712, "Iuml", 207, "iuml", 239,
          "Kappa", 922, "kappa", 954, "Lambda", 923, "lambda", 955,
          "lang", 9001, "laquo", 171, "larr", 8592, "lArr", 8656,
          "lceil", 8968, "ldquo", 8220, "le", 8804, "lfloor", 8970,
          "lowast", 8727, "loz", 9674, "lrm", 8206, "lsaquo", 8249,
          "lsquo", 8216, "lt", 60, "macr", 175, "mdash", 8212,
          "micro", 181, "middot", 183, "minus", 8722, "Mu", 924,
          "mu", 956, "nabla", 8711, "nbsp", 160, "ndash", 8211,
          "ne", 8800, "ni", 8715, "not", 172, "notin", 8713,
          "nsub", 8836, "Ntilde", 209, "ntilde", 241, "Nu", 925,
          "nu", 957, "Oacute", 211, "oacute", 243, "Ocirc", 212,
          "ocirc", 244, "OElig", 338, "oelig", 339, "Ograve", 210,
          "ograve", 242, "oline", 8254, "Omega", 937, "omega", 969,
          "Omicron", 927, "omicron", 959, "oplus", 8853, "or", 8744,
          "ordf", 170, "ordm", 186, "Oslash", 216, "oslash", 248,
          "Otilde", 213, "otilde", 245, "otimes", 8855, "Ouml", 214,
          "ouml", 246, "para", 182, "part", 8706, "permil", 8240,
          "perp", 8869, "Phi", 934, "phi", 966, "Pi", 928,
          "pi", 960, "piv", 982, "plusmn", 177, "pound", 163,
          "prime", 8242, "Prime", 8243, "prod", 8719, "prop", 8733,
          "Psi", 936, "psi", 968, "quot", 34, "radic", 8730,
          "rang", 9002, "raquo", 187, "rarr", 8594, "rArr", 8658,
          "rceil", 8969, "rdquo", 8221, "real", 8476, "reg", 174,
          "rfloor", 8971, "Rho", 929, "rho", 961, "rlm", 8207,
          "rsaquo", 8250, "rsquo", 8217, "sbquo", 8218, "Scaron", 352,
          "scaron", 353, "sdot", 8901, "sect", 167, "shy", 173,
          "Sigma", 931, "sigma", 963, "sigmaf", 962, "sim", 8764,
          "spades", 9824, "sub", 8834, "sube", 8838, "sum", 8721,
          "sup", 8835, "sup1", 185, "sup2", 178, "sup3", 179,
          "supe", 8839, "szlig", 223, "Tau", 932, "tau", 964,
          "there4", 8756, "Theta", 920, "theta", 952, "thetasym", 977,
          "thinsp", 8201, "THORN", 222, "thorn", 254, "tilde", 732,
          "times", 215, "trade", 8482, "Uacute", 218, "uacute", 250,
          "uarr", 8593, "uArr", 8657, "Ucirc", 219, "ucirc", 251,
          "Ugrave", 217, "ugrave", 249, "uml", 168, "upsih", 978,
          "Upsilon", 933, "upsilon", 965, "Uuml", 220, "uuml", 252,
          "weierp", 8472, "Xi", 926, "xi", 958, "Yacute", 221,
          "yacute", 253, "yen", 165, "yuml", 255, "Yuml", 376,
          "Zeta", 918, "zeta", 950, "zwj", 8205, "zwnj", 8204 ];

    HTML_ENTITY_NAMES =
        llList2ListStrided(htmlEntities, 0, -1, 2);
    HTML_ENTITY_CODES =
        llList2ListStrided(llList2List(htmlEntities, 1, -1), 0, -1, 2);
}

string codePointToString(integer codePoint)
{
    // Build the UTF-8 bytes for codePoint, base64-encode them by hand,
    // and let llBase64ToString() turn the result into a string.
    string base64;
    if ((codePoint & ~0x7f) == 0)
    {
        // 1 byte (8 bits): 2 base64 characters plus "==" padding
        integer c1 = codePoint >> 2;
        integer c2 = 0x3f & (codePoint << 4);

        base64 = llGetSubString(BASE_64_CHARS, c1, c1)+
            llGetSubString(BASE_64_CHARS, c2, c2)+
            "==";
    } else if ((codePoint & ~0x7ff) == 0)
    {
        // 2 bytes: 110xxxxx 10xxxxxx
        integer utf8 =
            0xc080 |
            (0x1f00 & (codePoint << 2)) |
            (0x003f & codePoint);

        integer c1 = utf8 >> 10;
        integer c2 = 0x3f & (utf8 >> 4);
        integer c3 = 0x3f & (utf8 << 2);

        base64 = llGetSubString(BASE_64_CHARS, c1, c1)+
            llGetSubString(BASE_64_CHARS, c2, c2)+
            llGetSubString(BASE_64_CHARS, c3, c3)+
            "=";
    } else if ((codePoint & ~0xffff) == 0)
    {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        integer utf8 =
            0xe08080 |
            (0x0f0000 & (codePoint << 4)) |
            (0x003f00 & (codePoint << 2)) |
            (0x00003f & codePoint);

        integer c1 = utf8 >> 18;
        integer c2 = 0x3f & (utf8 >> 12);
        integer c3 = 0x3f & (utf8 >> 6);
        integer c4 = 0x3f & utf8;

        base64 =
            llGetSubString(BASE_64_CHARS, c1, c1)+
            llGetSubString(BASE_64_CHARS, c2, c2)+
            llGetSubString(BASE_64_CHARS, c3, c3)+
            llGetSubString(BASE_64_CHARS, c4, c4);
    } else
    {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        // (the last term must be OR-ed in too, or the low six bits are lost)
        integer utf8 =
            0xf0808080 |
            (0x07000000 & (codePoint << 6)) |
            (0x003f0000 & (codePoint << 4)) |
            (0x00003f00 & (codePoint << 2)) |
            (0x0000003f & codePoint);

        integer c1 = 0x3f & (utf8 >> 26); // Sign bit is on, so masking needed
        integer c2 = 0x3f & (utf8 >> 20);
        integer c3 = 0x3f & (utf8 >> 14);
        integer c4 = 0x3f & (utf8 >> 8);
        integer c5 = 0x3f & (utf8 >> 2);
        integer c6 = 0x3f & (utf8 << 4);

        base64 =
            llGetSubString(BASE_64_CHARS, c1, c1)+
            llGetSubString(BASE_64_CHARS, c2, c2)+
            llGetSubString(BASE_64_CHARS, c3, c3)+
            llGetSubString(BASE_64_CHARS, c4, c4)+
            llGetSubString(BASE_64_CHARS, c5, c5)+
            llGetSubString(BASE_64_CHARS, c6, c6)+
            "==";
    }

    return llBase64ToString(base64);
}

// Decodes one entity, passed with its leading "&" and trailing ";".
// Returns "" if the entity is too short or its name is unknown.
string decodeHtmlEntity(string htmlEntity)
{
    integer codePoint;

    integer entityLen = llStringLength(htmlEntity);
    if (entityLen < 3)
    {
        return "";
    }
    if (llGetSubString(htmlEntity, 1, 1) == "#")
    {
        // Numeric entity: &#NNNN; or &#xHHHH;
        if (entityLen < 4)
        {
            return "";
        }

        if (llToLower(llGetSubString(htmlEntity, 2, 2)) == "x")
        {
            if (entityLen < 5)
            {
                return "";
            }

            codePoint = (integer)("0x"+llGetSubString(htmlEntity, 3, -2));
        } else
        {
            codePoint = (integer)llGetSubString(htmlEntity, 2, -2);
        }
    } else
    {
        // Named entity: look the name up in the table (case-sensitive)
        initEntities();

        string entityName = llGetSubString(htmlEntity, 1, -2);
        integer entityIndex = llListFindList(HTML_ENTITY_NAMES, [ entityName ]);
        if (entityIndex < 0)
        {
            return "";
        }
        codePoint = llList2Integer(HTML_ENTITY_CODES, entityIndex);
    }

    return codePointToString(codePoint);
}

string decodeAllHtmlEntities(string encodedStr)
{
    list subStrings = llParseStringKeepNulls(encodedStr, [ "&" ], []);
    integer nSubStrings = llGetListLength(subStrings);

    // The first piece is the text before the first "&" (empty if the string
    // starts with one), so it is copied through without adding an "&".
    string accum = llList2String(subStrings, 0);

    integer i;
    for (i = 1; i < nSubStrings; ++i)
    {
        string substr = llList2String(subStrings, i);
        integer scIndex = llSubStringIndex(substr, ";");

        if (scIndex < 1)
        {
            // No entity body before a ";": keep the "&" as a literal
            accum += "&"+substr;
        } else
        {
            string decoded = decodeHtmlEntity("&"+llGetSubString(substr, 0, scIndex));
            if (decoded == "")
            {
                // Unknown or malformed entity
                decoded = "?";
            }
            accum += decoded+llDeleteSubString(substr, 0, scIndex);
        }
    }

    return accum;
}


I get what look like the results you want (after a good few seconds of execution) using decodeAllHtmlEntities() on the contents of the document to which you linked.

EDIT: Replaced some of the string processing with llParseStringKeepNulls() for faster performance.
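
For anyone who finds the bit-shifting above hard to follow, the same code point to UTF-8 layout can be sketched in PHP, where bytes can be emitted directly with chr() and no base64 step is needed. This is only an illustration of the byte layout, not a replacement for the LSL functions; the function name codePointToUtf8 is made up for this example, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the thread): encode one Unicode code point as UTF-8.
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(codePointToUtf8(1053)), "\n";  // d09d   (U+041D, Cyrillic "Н")
echo bin2hex(codePointToUtf8(8364)), "\n";  // e282ac (U+20AC, euro sign)
```

The LSL version has to pack these same bytes into one integer and split them into 6-bit groups because llBase64ToString() is the route it uses to get raw bytes into a string.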
Pedro McMillan
SLOODLE Developer
Join date: 28 Jul 2007
Posts: 231
01-17-2009 07:36
We had major headaches for a while on SLOODLE when we started localizing stuff and enabling UTF8. It *is* possible to make PHP convert all your HTML entities to Unicode characters though. You need to start with this in your PHP script (before you output anything):

CODE

header('Content-Type: text/plain; charset=UTF-8');


Without that, most webservers seem to default to ASCII, but note that particularly old webserver software (e.g. Apache 1) sometimes needs other special configuration (or just a reinstallation) to make it work with UTF8.

Anyhow, when you get your text in a PHP script, you then need to decode the entities back to ordinary characters, like this:

CODE

$mystring = "....";
echo html_entity_decode($mystring, ENT_QUOTES, "UTF-8");
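
Putting the two pieces together, a complete script might look like the sketch below. The input string is an assumption: it is the numeric-entity form of the word quoted earlier in the thread, plus two named entities, and not the actual contents of the linked file.

```php
<?php
// Assumed sample input: numeric entities for "Надявам", then named entities.
$encoded = '&#1053;&#1072;&#1076;&#1103;&#1074;&#1072;&#1084; &amp; &quot;OK&quot;';

// Send the header before any output so the body is labelled as UTF-8 plain text.
header('Content-Type: text/plain; charset=UTF-8');

// ENT_QUOTES also converts quote entities; "UTF-8" selects the output encoding.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";   // Надявам & "OK"
```

With this, the script on the SL side should receive the characters themselves and would not need to decode anything.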
PedroRenaut Mariner
Registered User
Join date: 6 Nov 2006
Posts: 17
01-17-2009 07:58
TYVM ... Now everything runs OK!!!

Perfect.