
Summary

PhpDig is a web spider and search engine written in PHP that uses a MySQL database and supports flat files. PhpDig builds a glossary from the words found in indexed pages. For a search query, it displays a results page listing the pages that contain the search keys, ranked by occurrence.
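To make the glossary idea concrete, here is a minimal Python sketch of a word index ranked by occurrence. It is an illustration only, not PhpDig's PHP code or its MySQL schema; the function names `build_glossary` and `search` and the sample URLs are hypothetical.

```python
from collections import Counter, defaultdict
import re


def build_glossary(pages):
    """Map each word to a Counter of {url: number of occurrences}."""
    glossary = defaultdict(Counter)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            glossary[word][url] += 1
    return glossary


def search(glossary, query):
    """Return (url, score) pairs, highest total occurrence count first."""
    scores = Counter()
    for key in re.findall(r"\w+", query.lower()):
        # Add the per-page counts for this search key, if it is indexed.
        scores.update(glossary.get(key, {}))
    # Sort by score descending, then by URL for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample pages, used only for illustration.
pages = {
    "http://example.com/a": "PHP spider and PHP search engine",
    "http://example.com/b": "A search engine written in Perl",
}
glossary = build_glossary(pages)
results = search(glossary, "php search")
```

With these sample pages, the first page ranks above the second because the search keys occur in it three times rather than once.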

HTTP spidering

PhpDig finds the pages to index by following HREF links, as any web browser would show them. Links can also appear in image maps (AreaMap), frames, or simple JavaScript such as window.open() or window.location. PhpDig supports redirections. Because it indexes only by following links, it does not traverse directories or database tables to index content.
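The link-following step can be sketched with Python's standard `html.parser`. This is a simplified illustration under stated assumptions, not PhpDig's parser: the class name `LinkCollector` is hypothetical, and the sketch covers anchors, image-map areas, and frames but does not attempt the JavaScript cases.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from anchors, image-map areas, and frames."""

    # Tag name -> attribute that carries the link target.
    LINK_ATTRS = {"a": "href", "area": "href", "frame": "src", "iframe": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


html = (
    '<a href="/docs/">Docs</a>'
    '<map><area href="about.html"></map>'
    '<frame src="menu.html">'
)
collector = LinkCollector("http://example.com/index.html")
collector.feed(html)
```

After `feed()`, `collector.links` holds the three resolved URLs in document order, ready to be queued for the next crawl level.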

By default, PhpDig does not go outside the domain you define for indexing. The user chooses various index options, including a parameter to extend indexing to subdomains and a parameter to limit indexing to a specific directory.
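A scope check of this kind might look like the following Python sketch. The function `in_scope` and its parameters `allow_subdomains` and `limit_to_directory` are hypothetical names chosen for illustration; they are not PhpDig's actual configuration options.

```python
from urllib.parse import urlsplit


def in_scope(url, root_url, allow_subdomains=False, limit_to_directory=False):
    """Decide whether `url` may be indexed when crawling from `root_url`."""
    target, root = urlsplit(url), urlsplit(root_url)
    host = target.hostname or ""
    root_host = root.hostname or ""

    if host != root_host:
        # A different host is only acceptable as a subdomain, and only on request.
        if not (allow_subdomains and host.endswith("." + root_host)):
            return False

    if limit_to_directory:
        # Directory of the root URL, e.g. "/docs/" for "/docs/index.html".
        directory = root.path.rsplit("/", 1)[0] + "/"
        return (target.path or "/").startswith(directory)

    return True
```

For example, `in_scope("http://blog.example.com/", "http://example.com/")` is rejected by default and accepted only when `allow_subdomains=True`.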

You can limit indexing so that the maximum number of links found is ((X * Y) + 1), where X is the links setting and Y is the depth setting. Alternatively, you can index just one page, or you can set options to index a greater number of pages.
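As a quick worked example of the formula, the Python sketch below evaluates it for sample values. The function name `max_links` is hypothetical, and reading the extra 1 as the starting page is an assumption rather than something the text states.

```python
def max_links(links, depth):
    """Upper bound on links found: (X * Y) + 1, with X = links and Y = depth."""
    return links * depth + 1


# Sample values chosen for illustration only.
limit = max_links(links=5, depth=3)  # (5 * 3) + 1
```

With 5 links and a depth of 3, the bound works out to 16.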

Any HTML content is indexed, from static HTML pages to dynamic HTML pages produced by, say, PHP scripts. PhpDig checks the MIME type of each document, and indexing can be run automatically via a cron job.

Full-text indexing

PhpDig indexes all words of a document, but you can exclude common words by listing them in a text file. Underscores and other characters can be part of a word. Words in the title can be given greater weight when ranking results.
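The following Python sketch shows one way common-word filtering and title weighting could fit together. It is an assumption-laden illustration, not PhpDig's implementation: the names `load_stop_words` and `weigh_words`, the weight of 3 for title words, and the word pattern are all hypothetical.

```python
import re
from collections import Counter

# Letters, digits, and underscores count as word characters in this sketch.
WORD_RE = re.compile(r"[A-Za-z0-9_]+")


def load_stop_words(text):
    """Parse a common-words file with one word per line."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def weigh_words(title, body, stop_words, title_weight=3):
    """Count words, giving each title occurrence a higher weight."""
    weights = Counter()
    for source, weight in ((title, title_weight), (body, 1)):
        for word in WORD_RE.findall(source.lower()):
            if word not in stop_words:
                weights[word] += weight
    return weights


stop_words = load_stop_words("the\nand\nof\n")
weights = weigh_words(
    "The search_engine",
    "Setup of the search_engine and the spider",
    stop_words,
)
```

Here `search_engine` is kept as a single word and scores 4 (3 from the title plus 1 from the body), while `the`, `and`, and `of` are dropped.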

Note that the MySQL FULLTEXT index is different from the PhpDig full-text indexing. The MySQL FULLTEXT index is a table index used with MyISAM tables. PhpDig does full-text indexing of page content but does not use the MySQL FULLTEXT index for searches.

Indexed file types

PhpDig indexes HTML and text files by itself. PhpDig can also index PDF, MS-Word, MS-Excel, and MS-PowerPoint files if you install external binaries on the server for this purpose. PhpDig is configured to use the catdoc, xls2csv, pstotext or pdftotext, and ppt2text programs.
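Before relying on the external programs, it can help to check which of them are installed. The Python sketch below does this with `shutil.which`. The pairing of file extensions with programs is an assumption made for illustration, as is the use of extensions at all (PhpDig itself checks the MIME type); the names `CONVERTERS`, `NATIVE`, and `find_converter` are hypothetical.

```python
import shutil
from pathlib import Path

# Assumed pairing of file types with the programs named above.
CONVERTERS = {
    ".doc": ["catdoc"],
    ".xls": ["xls2csv"],
    ".pdf": ["pstotext", "pdftotext"],
    ".ppt": ["ppt2text"],
}
# Types that need no external program.
NATIVE = {".html", ".htm", ".txt"}


def find_converter(filename):
    """Return "native", the first installed converter, or None."""
    suffix = Path(filename).suffix.lower()
    if suffix in NATIVE:
        return "native"
    for program in CONVERTERS.get(suffix, []):
        # shutil.which returns the program's path if it is on PATH, else None.
        if shutil.which(program):
            return program
    return None
```

A result of `None` for a `.pdf` file, for instance, would mean that neither pstotext nor pdftotext was found on the server's PATH.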

