Tuesday, January 31, 2012
Monday, January 30, 2012
Sunday, January 29, 2012
Saturday, January 28, 2012
movie: Fateless - German title: "Roman eines Schicksallosen" (2005)
Thursday, January 26, 2012
Tuesday, January 24, 2012
perl: XML::LibXML::XPathContext - registerNs
XML::LibXML::XPathContext - XPath Evaluation - metacpan.org
If your XML has a namespace without prefix (xmlns="…" instead of xmlns:aaa="…", where 'aaa' would be called the prefix), you still have to provide registerNs with a proper prefix and use that prefix from then on; no undef or zero-length string allowed as prefix.
Learning this lesson cost me some sleep last night.
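The same underlying rule can be tried out without Perl. The following is only a minimal sketch in Python's standard xml.etree.ElementTree, not XML::LibXML; the sample document and the prefix "a" are made up for illustration. It shows that an unprefixed name does not match elements living in a default namespace, while a path using an explicitly bound prefix does (the binding plays the role that registerNs plays in Perl).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with a default (unprefixed) namespace.
XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>"""

root = ET.fromstring(XML)

# An unprefixed name only matches elements in no namespace: nothing is found.
unprefixed = root.findall("entry")

# Bind an arbitrary, non-empty prefix to the namespace URI and use it in the path.
ns = {"a": "http://www.w3.org/2005/Atom"}
prefixed = root.findall("a:entry", ns)
titles = [e.findtext("a:title", namespaces=ns) for e in prefixed]

print(len(unprefixed), titles)
```

The prefix used in the path does not have to match anything in the document; only the namespace URI counts.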
Labels:
The Perl Programming Language,
XML,
XML::LibXML
syndication feeds for blogs on Blogger.com take CGI parameters
You can find the URLs of the Atom and RSS feeds of such a blog in its HTML:
/html/head/link[@rel="alternate" and @type="application/atom+xml"]
/html/head/link[@rel="alternate" and @type="application/rss+xml"]
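As a rough sketch, the same lookup can be done with Python's standard html.parser instead of XPath. The HTML snippet and its example.blogspot.com URLs are made up for illustration, and unlike the XPath expressions the parser below does not check that the link elements sit inside /html/head.

```python
from html.parser import HTMLParser

FEED_TYPES = ("application/atom+xml", "application/rss+xml")


class FeedLinkFinder(HTMLParser):
    """Collect the href of <link rel="alternate"> elements, keyed by MIME type."""

    def __init__(self):
        super().__init__()
        self.feeds = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if a.get("rel") == "alternate" and a.get("type") in FEED_TYPES:
            # Keep only the first link found for each feed type.
            self.feeds.setdefault(a["type"], a.get("href"))


# Hypothetical page head; the URLs are placeholders.
HTML = """<html><head>
<link rel="alternate" type="application/atom+xml" href="http://example.blogspot.com/feeds/posts/default" />
<link rel="alternate" type="application/rss+xml" href="http://example.blogspot.com/feeds/posts/default?alt=rss" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head><body></body></html>"""

finder = FeedLinkFinder()
finder.feed(HTML)
for mime_type, href in finder.feeds.items():
    print(mime_type, href)
```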
Within the XML of an Atom feed you can find a node like this:
/feed/link[@rel="next"]
/feed/link[@rel="next" and @type="application/atom+xml"]
Found 1 nodes:
-- NODE --
The href attribute looks rather interesting; it uses these CGI parameters:
- start-index
- max-results
Actually, the href used a somewhat internal URL of that blog, but the "public" URL and the internal URL appear to be interchangeable; trial and error "proved" that. Furthermore, these CGI parameters also work on the URL of the RSS feed, not just on that of the Atom feed.
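A minimal sketch of pulling the two parameters out of such a "next" link, using only Python's standard library. The feed excerpt, its URL and the numbers 26 and 25 are invented for illustration; a real feed will carry its own values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Hypothetical Atom excerpt; the href values are placeholders.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" type="application/atom+xml"
        href="http://example.blogspot.com/feeds/posts/default"/>
  <link rel="next" type="application/atom+xml"
        href="http://example.blogspot.com/feeds/posts/default?start-index=26&amp;max-results=25"/>
</feed>"""

# The Atom namespace has no prefix in the document, so bind one for the lookup.
ns = {"atom": "http://www.w3.org/2005/Atom"}

root = ET.fromstring(FEED)
next_link = root.find("atom:link[@rel='next']", ns)

# Split the query string of the href into its CGI parameters.
params = parse_qs(urlsplit(next_link.get("href")).query)
start_index = int(params["start-index"][0])
max_results = int(params["max-results"][0])

print(start_index, max_results)
```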
BTW: Both feeds carry an element openSearch:totalResults; find it like this:
/feed/openSearch:totalResult… (Atom)
/rss/channel/openSearch:totalResult (RSS)
It tells you exactly what its name says, i.e. the total number of results.
So you can very well read a blog's feed in small chunks until you reach its end. I assume this is what Google Reader does, for example: it initially shows you the most recent blog articles (or their stubs) and extends the list when it notices that you are scrolling "beyond the end".
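Reading in chunks could look roughly like the sketch below, which only computes the chunk URLs and does not download anything. Several things are assumptions here: the openSearch namespace URI (check the xmlns:openSearch declaration of the real feed), start-index being 1-based, the example.blogspot.com URL, and the helper name chunk_urls, which is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    # Assumed namespace URI; verify it against the real feed.
    "openSearch": "http://a9.com/-/spec/opensearchrss/1.0/",
}

# Hypothetical feed excerpt with a made-up total of 60 posts.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">
  <openSearch:totalResults>60</openSearch:totalResults>
</feed>"""


def chunk_urls(base_url, total, chunk_size):
    """Yield one feed URL per chunk (start-index is assumed to be 1-based)."""
    for start in range(1, total + 1, chunk_size):
        query = urlencode({"start-index": start, "max-results": chunk_size})
        yield f"{base_url}?{query}"


root = ET.fromstring(FEED)
total = int(root.findtext("openSearch:totalResults", namespaces=NS))
urls = list(chunk_urls("http://example.blogspot.com/feeds/posts/default", total, 25))

for url in urls:
    print(url)
```

Fetching each URL in turn and stopping once the computed list is exhausted would then cover the whole feed.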
book: MySQL Troubleshooting
MySQL Troubleshooting: 
Sometimes applications can go mad: tables contain wrong data, users get random replies, servers stop working, and so on. Several easy methods often allow users to find the problems quickly. This book, based on successful conference presentations by the author, covers SQL problems, memory and other server problems, replication, and problems related to particular storage engines.
Monday, January 23, 2012
book: Getting Started with Fluidinfo
Getting Started with Fluidinfo: 
For developers, content providers and sophisticated power users, Fluidinfo is an online information storage and search platform that supports shared openly writable metadata of any type and about anything. Fluidinfo helps content owners publish product information via a modern writable API, with flexible permissions and their domain name on their data. Developers can create lightweight applications that make data social while letting users personalize and search on anything.
the Firefox setting "browser.display.use_document_colors"
Up until now, I had never heard of this setting; nevertheless, it had gotten toggled a while ago. That had made me a little nervous during the last couple of weeks.
How I solved the issue:
- searched the web for "Firefox ignores stylesheets",
- too many hits of course …,
- found a hint pointing to PrefBar,
- installed the Firefox Add-on by the name of PrefBar,
- toggled the Colors check box on that tool bar,
- which, to my surprise, solved the problem.
But that did not reveal to me which Firefox setting was involved.
- So I launched about:config,
- filtered by "color",
- toggled the Colors check box another couple of times on PrefBar,
- until I finally noticed browser.display.use_document_colors changing its value at the same time.
What a wicked little thing!!
I am really relieved that this finally got solved.
Friday, January 20, 2012
how can Google Reader go further back in time on a Feedburner feed than the XML shows me?
If you use Google Reader for reading a feed on Google Feedburner, you can go back and back and back and … in time, but if you simply download the feed (as XML), the file is pretty finite and short.
- Q: How do they do that?
- Q: How do I get such a longer feed XML file myself? (I am more interested in this.)
Google Chrome extension "Table Capture"
Chrome Web Store - Table Capture
Using this tool, you can browse all HTML tables on a web page (even nested ones), which is great fun.
George Mike's HTML table capture test suite
Table Capture Test
George Mike is the author of the Google Chrome extension Table Capture, which I find very, very useful.
Firefox Add-on "Dafizilla Table2Clipboard"
Dafizilla Table2Clipboard :: Add-ons for Firefox
sources on Sourceforge.net
If you want to paste data into Microsoft Excel or OpenOffice Calc with the correct layout, simply use Table2Clipboard.
"A brief survey of web data extraction tools" (ACM SIGMOD Record, Volume 31 Issue 2, June 2002)
- A brief survey of web data extraction tools (citation at the ACM digital library)
- the article as PDF at Berthier A. Ribeiro-Neto's home page at his university
Labels:
page scraping,
web harvesting,
web scraping
Thursday, January 19, 2012
Wednesday, January 18, 2012
OpenStreetMap claims map vandalism traced to Google IP range - Update
OpenStreetMap claims map vandalism traced to Google IP range - Update: Bogus changes to the OpenStreetMap maps in London and New York have, according to members of the OpenStreetMap board, been traced to a Google IP range and they want an explanation from Google
Tuesday, January 17, 2012
article: "Stepping up from XML::Simple to XML::LibXML"
Stepping up from XML::Simple to XML::LibXML
I applied this (and a little more) in order to migrate my table_pdf2csv.pl, and of course I like it much better now. From there I will easily create more nice utilities.
Monday, January 16, 2012
pstree - Wikipedia, the free encyclopedia
pstree - Wikipedia, the free encyclopedia
The article tells you where to get the sources.
The README tells you how to compile it.
Easy!!!
It looks like pstree got abandoned on Fink, but compiling it yourself is honestly easy.
Perl-XML FAQ on XML::XPathScript
Perl-XML Frequently Asked Questions
XPathScript is a stylesheet language similar in many ways to XSLT (in concept, not in appearance), for transforming XML from one format to another …
Labels:
The Perl Programming Language,
XML,
XSLT