UTF-8 Strings

UTF-8 Strings

Michael Pitoniak

team,

I am having difficulty getting BeanShell to work with UTF-8 strings. Does BeanShell preserve non-ASCII characters in a string literal if the string does not use the \u notation?

String string = "abc\u5639\u563b";
String Chinese = "abc嘹嘻";


It appears that the second string loses its non-ASCII characters when I use it.

Any sample code is appreciated. I am using jEdit as my editor.

thx,

mp



Re: UTF-8 Strings

Michael Pitoniak

More specifically:

It appears that the source() command reads the file as ASCII, so any non-ASCII characters in string literals are lost, even though the editor displays them correctly. Is there a way to preserve the UTF-8 data when using the source() command?

thx,

mike
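
The symptom described above can be reproduced outside BeanShell. The following is only a sketch, assuming the platform default encoding is a single-byte charset such as ISO-8859-1 (the thread does not say which encoding was in effect). The class and method names are hypothetical. It decodes the UTF-8 bytes of the example string twice: once as UTF-8 and once as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // UTF-8 bytes for "abc" followed by U+5639 and U+563B,
    // i.e. what a UTF-8 editor writes to disk for the example literal.
    static final byte[] UTF8_BYTES = {
        'a', 'b', 'c',
        (byte) 0xE5, (byte) 0x98, (byte) 0xB9,
        (byte) 0xE5, (byte) 0x98, (byte) 0xBB
    };

    // Correct decoding: each three-byte sequence becomes one character.
    static String decodeAsUtf8() {
        return new String(UTF8_BYTES, StandardCharsets.UTF_8);
    }

    // Assumed mismatch: a single-byte charset turns every byte into its own character.
    static String decodeAsLatin1() {
        return new String(UTF8_BYTES, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decodeAsUtf8().equals("abc\u5639\u563b")); // true
        System.out.println(decodeAsUtf8().length());                  // 5
        System.out.println(decodeAsLatin1().length());                // 9
    }
}
```

With the matching decoder the literal is identical to its escaped form; with the wrong decoder the two characters become six unrelated ones, which matches the description of the string being damaged on load.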


Re: UTF-8 Strings

Daniel Wunsch
hi

> It appears the source() command reads the file as ASCII, and hence any high
> byte data is lost on strings, even if the editor being used displays it.
> Is there a way to preserve the UTF-8 info while using the source command?

the source command uses the platform default encoding,
so you can run into nice little problems if you try to use
it in a script you want to use on different platforms.

use something like this instead:

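/** Source UTF-8 encoded URL content into the caller's namespace. */
public sourceUTF8(URL url) {
    // Decode explicitly as UTF-8 rather than with the platform default encoding.
    var in = new InputStreamReader(url.openStream(), "UTF-8");
    try {
        this.interpreter.eval(in, this.caller.namespace, "URL: " + url.toString());
    } finally {
        // Close the reader even if evaluation throws.
        in.close();
    }
}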

daniel
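
For anyone embedding the interpreter from Java, the decoding step can be sketched with the standard library alone. This is a minimal sketch, not BeanShell's own code: `Utf8ScriptReader`, `open`, `readAll` and `demo` are hypothetical names, and the interpreter call is left out so the example runs without the BeanShell jar. The `Reader` returned by `open` is the kind of object that would be passed to `eval` in the snippet above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8ScriptReader {
    /** Opens the script with an explicit UTF-8 decoder, independent of the platform default. */
    static Reader open(Path script) throws IOException {
        return Files.newBufferedReader(script, StandardCharsets.UTF_8);
    }

    /** Reads the whole script into a String; used here only to inspect the decoded text. */
    static String readAll(Path script) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader in = open(script)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString();
    }

    /** Writes a small UTF-8 file to a temporary location and reads it back. */
    static String demo() {
        try {
            Path script = Files.createTempFile("utf8-demo", ".bsh");
            try {
                Files.write(script, "abc\u5639\u563b".getBytes(StandardCharsets.UTF_8));
                return readAll(script);
            } finally {
                Files.delete(script);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo().equals("abc\u5639\u563b")); // true
    }
}
```

Naming the charset at the point where the stream is opened is what makes the script portable across machines with different default encodings.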


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server.
Download it for free - -and be entered to win a 42" plasma tv or your very
own Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Beanshell-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/beanshell-users

Re: UTF-8 Strings

Michael Pitoniak

Dan,

Many thanks for the timely assistance. It solved my problem.

best regards,

mp



RE: UTF-8 Strings

Daniel Leuck
In reply to this post by Michael Pitoniak
Hi Mike,

The source command will use the default encoding for your machine.  If you
want to read a UTF8 file use:

var url = new File("/pathToMyCommand/myUTF8Command.bsh").toURL();
this.interpreter.eval(new InputStreamReader(url.openStream(), "UTF8"),
    this.namespace, "URL: "+url.toString());

Note: Your native console probably doesn't support UTF8, so I would use
BeanShell's Swing console to view the output.

Cheers,
Dan
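
If the console cannot render the characters, one way to check that a string survived loading is to print its code points in hex instead of the characters themselves. This is a sketch only; `CodePointDump` and `dump` are hypothetical names and not part of BeanShell.

```java
class CodePointDump {
    /** Formats each code point of the string as U+XXXX, separated by spaces. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: U+0061 U+0062 U+0063 U+5639 U+563B
        System.out.println(dump("abc\u5639\u563b"));
    }
}
```

The output is plain ASCII, so it reads the same on any console; a string decoded with the wrong charset would show a longer run of unrelated code points in place of the last two.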
