Content Scraping and Your Nonprofit—Useful Tools to Protect Your Content

By: Sophia Guevara | In: Media and Technology | Philanthropy

1 Oct 2012

I recently had an eye-opening experience that made me realize how many organizations promote their content but do nothing to ensure that it is used appropriately. While some nonprofits allow others to reuse their content with proper attribution, some users take content without permission or credit. Specifically, I am referring to content scraping: copying content from one site and republishing it on another. Sometimes site owners who host scraped content are looking to make money by serving ads to visitors who unknowingly land on the copycat page. The site owner can then profit through ad partner programs that pay for ad impressions or clicks. In addition, site owners may be trying to improve their ranking in search engine results by copying content from a popular site that is considered authoritative.

How can you protect your organization’s online content? Here are some tools that your organization can use:

1. Has your foundation posted photos of public events, gatherings, or conferences? It is a good idea to check periodically, using tools like TinEye or Google Images, that your logo and other images are being used appropriately. You can search by uploading an image or by entering the link to the hosted image. TinEye currently has more than two billion indexed images that it checks for matches.

2. Copyscape is a great tool for seeing if there are copies of your content on the Web. There is a free option and a couple of fee-based products available. In the search box, you can enter a website address and search to see if there are any close matches. If there is a hit, Copyscape will provide the address of the copied page and let you know how many words match the original page.

If you find a site that is making use of your organization’s content inappropriately, you may want to contact the webmaster. If that information isn’t visible on the site, try conducting a WHOIS search to see if you can track down the contact information for the person responsible for the domain/site. If the registrant hasn’t elected to keep their information private, you should be able to find what you need.
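For readers comfortable with a little scripting, the WHOIS lookup described above can be sketched in Python. This is only an illustration under stated assumptions: the function names `whois_query` and `extract_emails` are made up for this example, and the default server `whois.iana.org` usually returns only a referral to the registry responsible for the domain, so you may need to repeat the query against the server it names. Privacy-protected registrations will not reveal a usable contact.

```python
import re
import socket

# Loose pattern for spotting email addresses in a WHOIS reply (illustrative only).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def whois_query(domain, server="whois.iana.org", port=43, timeout=10):
    """Send a plain WHOIS query (domain plus CRLF over TCP port 43) and return the reply text.

    The default server is an assumption for this sketch; it typically answers
    with a referral to the registry that holds the actual record.
    """
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_emails(whois_text):
    """Return the unique email addresses found in a WHOIS reply, in order of appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(whois_text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen
```

Calling `extract_emails(whois_query("example.org"))` would list any contact addresses in the reply. Many web-based WHOIS search pages do the same thing without any code.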

3. If your organization uses Google’s Webmaster Tools, you can see who is linking to your content, including how often a domain linked to your content and which page it linked to. By hovering over the domain name, you can see a snapshot of the site linking to your content.
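To give a rough feel for the kind of word matching mentioned in item 2, here is a small Python sketch that estimates how much of an original text reappears in a suspect page. It is not how Copyscape works internally (that is not described here); it is a simple comparison of overlapping word sequences, and the function names and the default sequence length of five words are assumptions chosen for illustration.

```python
import re


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, size=5):
    """Return the set of consecutive word sequences of the given length."""
    tokens = words(text)
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def overlap_ratio(original, suspect, size=5):
    """Fraction of the original's word sequences that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:  # original is shorter than one sequence
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)
```

A ratio near 1.0 suggests that most of the original text appears word for word in the suspect page, while a ratio near 0.0 suggests little direct copying. Short sequences produce more accidental matches, so longer sequences are a safer signal.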

In conclusion, it is important to protect your organization’s content from those who copy it to drive visitors to their own sites and generate ad revenue. If you would like to research this topic further, contact your foundation’s librarian.

Sophia Guevara is the chair of the Consortium of Foundation Libraries affinity group.

1 Response to Content Scraping and Your Nonprofit—Useful Tools to Protect Your Content

Sean Harmer

October 1st, 2012 at 4:52 pm

Hi Sophia, thank you for the post.

Here are two more points that I thought your readers might find helpful.

I work for a company (Distil Inc.) that helps websites protect and accelerate their content. If you proactively stop web scraping and content theft before it happens, you don’t have to spend extra time counteracting the damage the scraping could have caused.

Also, regarding Google Images, you might consider telling Google not to index your images. This will prevent your images from showing up at all in Google Image search. Here’s a link to Google’s instructions for this: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35308

Thanks again for the post.

Sean Harmer
Distil Inc. (http://www.distil.it)

Welcome to RE: Philanthropy! In this blog, guest and Council bloggers share ideas and insights on the most pressing issues in philanthropy. If you want to contribute, please contact webteam@cof.org.

The opinions expressed in this blog are those of the authors and do not necessarily reflect those of the Council on Foundations.
