Add
clickthecity.com Metro Manila Movie Guide; note: huge site
barrygonzaga
site_samples/palmsized/inq7-mobile.site
2005-08-08
3 level inq7.net site
barrygonzaga
site_samples/regional_philippines/
2005-08-08
inq7.site, pdi.site: replace
pdi.site with inq7.site
barrygonzaga
site_samples/linux/gwn.site
2005-08-08
add logo imageurl; update author
email
barrygonzaga
site_samples/business/businessweek.site
2005-08-08
Reflect web site title,
update author email
barrygonzaga
site_samples/palmsized/
2005-08-08
ny_times.site, salon.site: remove
nonworking site
akkana
lib/Sitescooper/Main.pm
2005-07-06
Add << ^^ >> links at end of story as
well as beginning
akkana
site_samples/
2005-07-06
lib/layouts.site, humor/jon_carroll.site,
news/wired_news/wired_news_politics.site, opinion/salon.site,
science/new_scientist_news.site, tech/newsforge.site: Some updates
for sites that have changed.
akkana
site_samples/regional_boston/bostonglobe.site
2005-07-06
New site: Boston
Globe City & Region sections. From Bruce Zohn
akkana
site_samples/
2005-07-06
news/USNews.site, news/newsweek_intl.site,
tech/pcmag_images.site: Updates from BoonNam Goh
akkana
site_samples/science/new_scientist_news.site
2005-01-26
Changes to track
the recent site changes
akkana
site_samples/regional_israel/
2005-01-17
haaretz.site, jpost-columns.site,
jpost-international.site, jpost-israel.site, jpost-me.site,
jpost-opinion.site: David Resnick: Jerusalem
Post and Haaretz site files
B. M. Sleight: minor changes to pick up ask.slashdot.org and
it.slashdot.org
akkana
site_samples/weblog/kevin_sites.site
2005-01-05
New site from Delmer Wells:
Kevin's War Blog
akkana
site_samples/tech/pcmag_images.site
2005-01-05
Goh Boon Nam: Update to
track site changes and grab images better
akkana
site_samples/business/the_economist.site
2005-01-05
Goh Boon Nam: Remove
subscription-only pages, which cause problems for Plucker
akkana
site_samples/
2005-01-05
humor/dave_barry.site,
linux/debian_weekly_news.site,
news/wired_news/wired_news_tech.site, tech/newsforge.site,
tech/the_register.site, weblog/riverbend.site: Updates to track
changes in the web sites
akkana
site_samples/weblog/riverbend.site
2004-06-22
Fixed StoryStart
akkana
site_samples/linux/
2004-06-03
kc_debian_hurd.site, kc_gimp.site: Remove no
longer extant debian, hurd and gimp kernel cousins
science/archaeology_org.site,
science/grahamhancock.site, tech/slyck.site: New sites from Ken
Russell
akkana
site_samples/palmsized/the_register_rss.site
2004-05-14
New palmsized
register from Ken Russell
akkana
site_samples/palmsized/
2004-05-14
the_register.site,
the_register_rss.site: Rename palmsized The Register to The
Register RSS, so as not to conflict with the non-palmsized Register
akkana
site_samples/
2004-05-14
news/atlantic.site, tech/slashdot_top.site: New
sites
akkana
site_samples/opinion/salon.site
2004-05-14
Comment out StoryToPrintableSub
-- it was causing errors
akkana
site_samples/
2004-04-27
linux/desktoplinux.site, science/smithsonian.site,
tech/joelonsoftware.site, tech/newsforge.site,
weblog/riverbend.site, weblog/where_is_raed.site: New sites, from
me
akkana
site_samples/lib/layouts.site
2004-04-27
Fix BBC news information
akkana
site_samples/
2004-04-27
linux/kernel_traffic.site,
opinion/i_cringely.site, tech/the_register.site: Update URL,
content start, and other minor fixes
akkana
site_samples/news/yahoo/
2004-04-26
yahoo_business.site,
yahoo_entertainment.site, yahoo_politics.site, yahoo_tech.site,
yahoo_top_stories.site: Re-adding yahoo sites, fixed thanks to
Jonathan Becker
akkana
site_samples/comics/
2004-04-26
boondocks.site, doonesbury.site,
tedrall.site: New comics from Ignatz Sol
akkana
site_samples/
2004-04-25
news/newsweek_intl.site, tech/pcmag_images.site:
Updates from Goh Boon Nam
akkana
site_samples/humor/dave_barry.site
2004-04-25
Update from Alan Hoyle:
fix story start, end, headline
cwerner
site_samples/opinion/pulpit.site
2004-04-23
New site for Bob Cringely's
weekly column: The Pulpit. This is the same site scooped by
i_cringely.site, except that the old i_cringely site did a 2 level
scoop that attempted to get a set of columns, whereas the new one
gets a single column and only on Fridays. The old one can probably
be removed, but I didn't want to mess with it in case someone is
relying on it.
Improved support
for isiloXC:
1. Added a new param to sitescooper.cf "ISiloDefaultIxlFile" that
points to an .ixl file in the file system. This means that users
can change the iSiloX options by using the iSiloX GUI tool to
create a new document, change all the options, then save as a .ixl
file. The and tags of the document are
stripped and replaced by sitescooper but the rest is used for
generating the isilox pdb.
More details are given in the comments in sitescooper.cf.
The most common likely use for this is to allow the users of
-isilox to specify global settings for things like image depth,
color, inclusion, dithering etc, and perhaps for category too.
2. Added a new site param called "ExtraISiloIxlTags", to allow ixl
settings specific to a site. Updated doc/site_params.html, so see
this for more details.
This is a little different in that the user has to specify a set of
top-level tags for the .ixl file. These get appended to the
generated file thus overriding the defaults (or overriding the
global options if the new config param is used). This takes
advantage of the fact that isilox tolerates the tags appearing more
than once by simply taking the last tag and ignoring earlier copies
(or at least its xml parser does).
So you can set general options in your .ixl file and override
specific options in the .site files. The fact that you have to
override the whole tag such as means that you can't
override, say, bitdepth separately from dithering, but it's still
pretty powerful. And simpler and more durable (i.e. resistant to
changes in isilox) than adding a bunch of new site params.
Modified Files: sitescooper/sitescooper.cf,
sitescooper/doc/site_params.html,
sitescooper/lib/Sitescooper/Main.pm,
sitescooper/lib/Sitescooper/SCF.pm
Added Files: sitescooper/default_isilox.ixl
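The "last tag wins" override described in this entry can be illustrated with a minimal Python sketch. This is not the sitescooper or iSiloX implementation; it only mimics the behaviour the entry describes. The tag names (ImageOptions, BitDepth, Dither, Category, Name) and the function name effective_settings are hypothetical and chosen purely for illustration.

```python
import xml.etree.ElementTree as ET


def effective_settings(default_tags: str, extra_tags: str) -> dict:
    """Append site-specific top-level tags after the defaults and
    resolve duplicates by keeping only the last copy of each tag."""
    # Wrap both fragments in a synthetic root so they parse as one document.
    root = ET.fromstring(f"<doc>{default_tags}{extra_tags}</doc>")
    settings = {}
    for child in root:
        # A later sibling replaces an earlier one wholesale (no per-option merge).
        settings[child.tag] = {opt.tag: (opt.text or "").strip() for opt in child}
    return settings


# Hypothetical global defaults, standing in for the contents of a default .ixl file.
defaults = (
    "<ImageOptions><BitDepth>4</BitDepth><Dither>yes</Dither></ImageOptions>"
    "<Category><Name>News</Name></Category>"
)
# Hypothetical per-site override, standing in for an ExtraISiloIxlTags value.
site_extra = "<ImageOptions><BitDepth>8</BitDepth></ImageOptions>"

merged = effective_settings(defaults, site_extra)
print(merged)
```

Because the whole ImageOptions tag is replaced, the Dither setting from the defaults is dropped in this sketch. That is the limitation the entry mentions: individual options inside a tag cannot be overridden separately.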
jmason
lib/Sitescooper/
2004-02-19
Robot.pm, StoryURLProcessor.pm: some glitches
in RSS output fixed; now does not search for sub-stories after
html_to_text conversion
jmason
site_samples/science/new_scientist_news.site
2004-02-18
New Scientist News
site updated
akkana
site_samples/
2004-02-16
cinema/ebert_1min.site, cinema/roger_ebert.site,
humor/dave_barry.site: Contributions from Alan Hoyle, alanh at
email.unc.edu
jmason
lib/Sitescooper/
2004-02-13
Main.pm, SCF.pm: added patch from Robert Fuhge,
robert.fuhge.at.epost.de, assign categories to Plucker documents
using the Category: line in the site file
jmason
site_samples/tech/risks.site
2004-02-13
updated risks.site to use new
'mobile device' rendering
akkana
site_samples/business/the_economist.site
2004-02-11
The Economist, from
BoonNam Goh
akkana
site_samples/news/
2004-02-11
newsweek.site, newsweek_intl.site: Newsweek
updates from BoonNam Goh
jmason
site_samples/security/
2004-02-07
crypto_gram.site, crypto_gram.site:
cryptogram site fixed
jmason
lib/Sitescooper/Robot.pm
2004-01-31
handle undef headlines
jmason
lib/Sitescooper/Robot.pm
2004-01-31
oops; RSS output headline was not being
HTML-encoded correctly
akkana
site_samples/
2003-11-15
tech/computer_world.site, news/newsweek_intl.site:
Contributions from BoonNam Goh
barrygonzaga
site_samples/linux/gwn.site
2003-11-04
add Gentoo Weekly News
akkana
site_samples/
2003-10-31
news/Newsweek.site, news/NewsweekIntl.site,
regional_israel/jpost.site: Remove inconsistently named files
akkana
site_samples/news/
2003-10-31
newsweek.site, newsweek_intl.site: Newsweek,
from Goh Boon Nam
akkana
site_samples/regional_israel/jerusalem_post.site
2003-10-31
Jerusalem Post,
from David Resnick
akkana
site_samples/tech/wiredmag.site
2003-10-31
Previous commit only got one
specific date. So I've substituted my own Wired site file, which
doesn't get entire stories yet, but it does get Wired every day.
akkana
site_samples/tech/wiredmag.site
2003-10-31
One issue of Wired Magazine,
from richard_html2pdb at yahoo dot com
akkana
site_samples/tech/pcmag_images.site
2003-10-31
Update from Goh Boon Nam:
Get full-sized images
akkana
site_samples/news/
2003-10-31
Newsweek.site, NewsweekIntl.site: Newsweek
updates (US and Intl) from BoonNam Goh
akkana
site_samples/regional_israel/jpost.site
2003-10-31
Jerusalem Post, from
David Resnick
akkana
site_samples/news/
2003-10-29
Newsweek.site, USNews.site: New sites
contributed by BoonNam Goh
cinema/ebert_answer_man.site,
cinema/ebert_features.site, cinema/ebert_great_movies.site,
cinema/roger_ebert.site, opinion/nro.site: updated sites from John
Straw
jmason
site_samples/regional_germany/
2003-06-10
de_cert.site, de_cyberkino.site,
de_gazette.site, de_heise_mobil.site, de_heise_tp.site,
de_heute.site, de_pdassi_news.site, de_pdassi_software.site,
de_spiegel.site, de_stern.site, de_tagesschau.site,
de_teltarif.site, de_tvspielfilm.site, mobile2day.site,
palmfaq_de.site, pda_debitel_net.site, windows2000faq.site,
zdnet_news.site, bundesregierung.site: a whole lot of new
regional_germany sites from Stefan Schwingeler
business/hottips.site, linux/linuxplaza.site,
opinion/feed.site, regional_germany/de_spiegel.site,
regional_north_carolina/weather24_raleigh.site: more dead sites
pruned
jmason
site_samples/
2002-01-18
languages/aspwire.site,
languages/news_perl_org.site, languages/perlmonth.site,
languages/sqlwire.site, languages/vbwire.site,
opinion/simson_garfinkel.site, tech/sendmail_net.site: removed lots
of dead sites
bsd/openbsd_journal.site,
palm/palminfocenter.site, palmsized/cnn.site,
palmsized/ny_times_handheld.site, palmsized/the_register.site: site
files from Barry Dexter A. Gonzaga
Guardian site
updated by Stewart C. Russell (stewart /at/ ref.collins.co.uk)
jmason
site_samples/business/businessweek.site
2001-09-06
oops, forgot busweek
jmason
site_samples/
2001-09-05
palm/pdalive.site, palmsized/ny_times.site,
palmsized/salon.site, news/gallup_poll.site,
palm/palminfocenter.site: added sites from Barry Dexter A. Gonzaga
jmason
site_samples/regional_denmark/politiken.site
2001-08-27
added Politiken
site from Claus Hindsgaul
jmason
lib/Sitescooper/UserAgent.pm
2001-08-20
fixed http auth support
jmason
site_samples/regional_toronto/
2001-08-18
globe_and_mail_columnists.site,
globe_and_mail_national.site, globe_and_mail_thearts.site,
globe_and_mail_toronto.site: globe+mail sites updated by Michael
Graham (magog@the-wire.com)
jmason
site_samples/regional_california/
2001-08-17
la_times.site,
latimes_nat.site, latimes_oc.site,
la_times/la_times_frontpage.site, la_times/latimes_local.site,
la_times/latimes_nat.site, la_times/latimes_oc.site,
la_times/latimes_science.site, la_times/latimes_tech.site,
la_times/latimes_world.site: added new LA Times sites from Mark
Beckman (mbeckman at jps.net), and reorged them into a directory
business/cnn_financial.site, news/cnn_mobile.site,
science/sciam.site, sport/cnn_sports.site: added SciAm site from
Marko, and some CNN sites from David's PODS system translated by
Marko
jmason
lib/Sitescooper/Main.pm
2001-06-28
added support for escaped-hashes in site
files from Jeff Hecker
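As a rough illustration of what escaped-hash handling can look like, here is a small Python sketch. It assumes that "#" starts a comment in a site file and that a backslash-escaped "\#" stands for a literal hash. Both the escape syntax and the function name strip_comment are assumptions made for this example, not taken from Main.pm.

```python
import re


def strip_comment(line: str) -> str:
    """Drop a trailing '#' comment, but keep hashes escaped as '\\#'."""
    # Split at the first '#' that is NOT preceded by a backslash.
    code = re.split(r"(?<!\\)#", line, maxsplit=1)[0]
    # Turn the escaped form back into a plain '#'.
    return code.replace("\\#", "#").rstrip()


example = strip_comment("URL: http://example.com/page\\#top  # trailing comment")
print(example)
```

A line consisting only of a comment comes back as an empty string under these assumptions.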