
WWW::Mechanize

From Reporting Cookbook: www.forjournalists.com/cookbook

Perl's WWW::Mechanize module is a handy way to automate Web data downloads and screen-scraping tasks. Here's an example that uses it to delete spam pages from the Reporting Cookbook. In this case, the URLs of the offending pages are simply pasted into the data section at the end of the script (everything after __DATA__). The module can also look for the text of a particular link and follow it, as the example also shows, but much of the recent spam has been in Chinese, and the author couldn't figure out how to get the module to recognize Chinese characters. The script can be improved, but it already saves a few minutes a day compared with opening each offending page and deleting it by hand.


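use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 1 makes the script die on a failed request instead of carrying on silently
my $mech = WWW::Mechanize->new( autocheck => 1 );

# Log in to the wiki
$mech->get("http://forjournalists.com/cookbook");
$mech->follow_link( text => 'Log in / create account' );
$mech->submit_form(
    form_name => "userlogin",
    fields    => {
        wpName     => "your_username",
        wpPassword => "your_password",
    },
);

# Each line of the DATA section is the URL of one page to delete
while ( my $link = <DATA> ) {
    chomp $link;
    next unless $link =~ /\S/;    # skip blank lines
    $mech->get($link);
    # $mech->follow_link( text_regex => qr/$link/i ); # You can search for links using regular expressions
    $mech->follow_link( text => 'Delete' );
    $mech->field( 'wpReason', 'Spam' );
    $mech->click('wpConfirmB');
}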


__DATA__
http://forjournalists.com/cookbook/index.php?title=6%E5%90%88%E5%BC%80%E5%A5%96%E7%BB%93%E6%9E%9C%E8%AE%B0%E5%BD%95_%E6%9B%BE%E9%81%93%E4%BA%BA_%E7%99%BD%E5%B0%8F%E5%A7%90&rcid=1601
http://forjournalists.com/cookbook/index.php?title=6%E5%90%88%E5%BC%80%E5%A5%96%E7%BB%93%E6%9E%9C%E8%AE%B0%E5%BD%95_%E5%BC%80%E5%A5%96%E7%BB%93%E6%9E%9C%E8%AE%B0%E5%BD%95_%E6%9B%BE%E9%81%93%E4%BA%BA_%E7%99%BD%E5%B0%8F%E5%A7%90&rcid=1602
http://forjournalists.com/cookbook/index.php?title=%E9%A6%99%E6%B8%AF%E5%BD%A9%E7%A5%A8%E5%BC%80%E5%A5%96%E7%9B%B4%E6%92%AD_%E9%A6%99%E6%B8%AF%E5%BD%A9%E7%A5%A8%E5%BC%80%E5%A5%96%E7%9B%B4%E6%92%AD%E4%B8%80%E5%8F%B7%E4%B8%AD&rcid=1607
http://forjournalists.com/cookbook/index.php?title=Jkuudbsjbhduh&rcid=1608
http://forjournalists.com/cookbook/index.php?title=%E4%B8%AD%E5%9B%BD%E6%80%A7%E7%97%85%E6%B2%BB%E7%96%97%E4%B8%AD%E5%BF%83-%E5%B0%96%E9%94%90%E6%B9%BF%E7%96%A3%E4%B8%93%E5%8C%BA&rcid=1609
http://forjournalists.com/cookbook/index.php?title=%E7%96%B1%E7%96%B9_%E7%94%9F%E6%AE%96%E5%99%A8%E7%96%B1%E7%96%B9-%E4%B8%AD%E5%9B%BD%E5%8C%BB%E5%AD%A6%E7%BD%91&rcid=1610
http://forjournalists.com/cookbook/index.php?title=%24%E9%9B%85%E6%80%9D%E4%BB%A3%E8%80%83QQ%EF%BC%9A27939721&rcid=1611
http://forjournalists.com/cookbook/index.php?title=%E7%96%B1%E7%96%B9_%E4%B8%AD%E5%9B%BD%E6%80%A7%E7%97%85%E6%B2%BB%E7%96%97%E7%BD%91&rcid=1612
http://forjournalists.com/cookbook/index.php?title=%E4%BB%A3%E8%80%83%E9%9B%85%E6%80%9DQQ%EF%BC%9A27939721%E4%B8%8D%E6%94%B6%E8%B7%AF%E8%B4%B9%E3%80%81%E4%B8%8D%E5%81%9A%E8%AF%81%E4%BB%B6&rcid=1613
http://forjournalists.com/cookbook/index.php?title=Liposuction&rcid=1614
http://forjournalists.com/cookbook/index.php?title=%E4%B8%AD%E5%9B%BD%E5%B0%96%E9%94%90%E6%B9%BF%E7%96%A3%E3%80%81%E7%96%B1%E7%96%B9%E4%B8%93%E4%B8%9A%E6%B2%BB%E7%96%97%E7%BD%91&rcid=1615
http://forjournalists.com/cookbook/index.php?title=Hao123chinaren&rcid=1616
http://forjournalists.com/cookbook/index.php?title=Plastic_surgerys&rcid=1617
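
One possible way around the Chinese-character problem mentioned above is to recover the page title from the percent-encoded URL and turn it into a regular expression for follow_link. The sketch below is untested against the live wiki. It assumes the titles are UTF-8 encoded (as the URLs in the data section appear to be) and that WWW::Mechanize hands back link text as decoded characters, which may depend on the module version and on the charset the server reports. The helpers title_from_url and link_pattern are hypothetical names made up for this illustration, not part of WWW::Mechanize.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of WWW::Mechanize): pull the page title out of
# a MediaWiki index.php URL and return it as a Perl character string.
sub title_from_url {
    my ($url) = @_;
    my ($title) = $url =~ /[?&]title=([^&#]*)/
        or return;                                    # no title parameter found
    $title =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # undo percent-encoding (gives raw bytes)
    $title = decode( 'UTF-8', $title );               # bytes -> characters (assumes UTF-8)
    $title =~ tr/_/ /;                                # MediaWiki displays "_" as a space
    return $title;
}

# Hypothetical helper: a literal, case-insensitive pattern built from the title.
sub link_pattern {
    my ($title) = @_;
    return qr/\Q$title\E/i;
}

binmode( STDOUT, ':encoding(UTF-8)' );    # avoid "Wide character" warnings when printing
my $url = 'http://forjournalists.com/cookbook/index.php?title=%E7%96%B1%E7%96%B9_%E4%B8%AD%E5%9B%BD%E6%80%A7%E7%97%85%E6%B2%BB%E7%96%97%E7%BD%91&rcid=1612';
my $title = title_from_url($url);
print "$title\n";

# With a logged-in $mech on a page that lists the spam (for example, recent changes):
# $mech->follow_link( text_regex => link_pattern($title) );
```

The \Q...\E in link_pattern keeps punctuation in a title from being read as regex syntax. If the match still fails, printing the text of each link returned by $mech->links and comparing it with the decoded title should show whether the page content is arriving as bytes rather than characters.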

Attribution-Noncommercial-Share Alike 3.0 Unported
This page was last modified 14:30, 19 October 2007.