pieroxy

nodejs error ?

You are conflating two very different things: compression and encoding.

Compression is the act of taking bits as input and outputting fewer bits. This is where the LZ part of LZString does its work.

Encoding is the act of taking bits as input and outputting characters. This is where the String part of LZString does its work. It is needed because JavaScript doesn't know (consistently, at least) how to handle binary data; it can only handle numbers and strings.

So no, you cannot "get more compression" using compressToUTF8(). compress returns a string packed to the maximum of 16 bits per character, but not all 65536 values of a 16-bit integer are valid UTF-16 code units, so the result is an invalid UTF-16 string. This is why I created compressToUTF16, which stores only 15 bits per character (hence a slightly less optimal encoding) but produces a valid string. compressToBase64 gives you a string where only 6 bits are used per character. You might think that's complete bullshit, as the string will be more than twice as big as one produced with compressToUTF16, and yet it is the most efficient way to upload your data to your server, because everything will pass through the "url encoding" encoder, which turns every character outside the Base64 range into at least three bytes.
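
To make this concrete, here is a minimal Node sketch (assuming the lz-string npm package; the payload is made up) comparing the three outputs for the same compressed bits:

    var LZString = require('lz-string');

    var data = JSON.stringify({ hello: 'world', list: [1, 2, 3, 4, 5] });

    var raw    = LZString.compress(data);         // 16 bits per character, invalid UTF-16
    var utf16  = LZString.compressToUTF16(data);  // 15 bits per character, valid UTF-16
    var base64 = LZString.compressToBase64(data); // 6 bits per character, survives url encoding

    // Same compressed bits, different character counts:
    console.log(raw.length, utf16.length, base64.length);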

Now, to exchange data, computers usually use bytes, not strings. UTF-8, ISO-8859-1 and UTF-16 are methods (encodings) to represent a string as a byte stream. A 1024-character string may be represented as 1024 bytes in ISO-8859-1 but 2048 bytes in UTF-8. Similarly, another string may be represented as 1000 bytes in UTF-8 but 2000 bytes in UTF-16, and yet another as 1000 bytes in UTF-16 but 1500 bytes in UTF-8. So you have to know what you're doing in order to optimize your stream of data. Just calling length on your strings gives you the number of characters, not the number of bytes needed to transfer those characters to your browser.
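
In Node you can see this directly: Buffer.byteLength counts the bytes a given encoding needs, whereas length only counts characters.

    var s = 'héllo';

    console.log(s.length);                        // 5 characters
    console.log(Buffer.byteLength(s, 'latin1'));  // 5 bytes in ISO-8859-1
    console.log(Buffer.byteLength(s, 'utf8'));    // 6 bytes ('é' takes two)
    console.log(Buffer.byteLength(s, 'utf16le')); // 10 bytes (two per character)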

So, yes, anyone can write a compressToUTF8() method (it is actually very simple). But in UTF-8 the first bit of each byte is reserved to mark multi-byte characters, so you can only store 7 bits per character, wasting a full 12.5% of bandwidth, whereas with UTF-16 I waste only half of that to stay UTF-compliant. I am sure you can set the content-type of your requests with your library, so why not just set it to UTF-16 and be done with it? You save bandwidth and time, and get a more elegant solution.
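
For illustration only, such a method could be sketched on top of LZString's internal _compress helper, the same routine compressToUTF16 and compressToBase64 are built on. It is not part of the public API, so treat this as a sketch, not library code:

    var LZString = require('lz-string');

    // Hypothetical, not part of LZString: pack only 7 bits per character
    // so every output character is ASCII and encodes to one UTF-8 byte.
    function compressToUTF8(input) {
      return LZString._compress(input, 7, function (code) {
        return String.fromCharCode(code);
      });
    }

A matching decompressFromUTF8 would have to be built the same way on the internal _decompress.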

But if you use compressToUTF16 and send it out encoded in UTF-8, your stream may very well end up being twice as big as it needs to be. Again, only testing will tell.
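
A quick way to run that test in Node, again assuming the lz-string package and substituting your real payload:

    var LZString = require('lz-string');

    var data = JSON.stringify({ message: 'your real payload goes here' });

    var utf16  = LZString.compressToUTF16(data);
    var base64 = LZString.compressToBase64(data);

    // What goes over the wire depends on the transfer encoding, not on length:
    console.log('compressToUTF16 sent as UTF-8 :', Buffer.byteLength(utf16, 'utf8'));
    console.log('compressToUTF16 sent as UTF-16:', Buffer.byteLength(utf16, 'utf16le'));
    console.log('compressToBase64 sent as UTF-8:', Buffer.byteLength(base64, 'utf8'));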

