A tutorial on how to install and configure ht://Dig search for your web site. ht://Dig is WWW search engine software: htdig retrieves HTML documents using the HTTP protocol and gathers information from them, which can later be used to search those documents.

Author: Mut Vujind
Country: Liberia
Language: English (Spanish)
Genre: Software
Published (Last): 13 December 2018
Pages: 496
ISBN: 472-8-76324-836-1

In this way, you can maintain separate directories of config files for the public and secure sites, so that the secure config files are not accessible from the public htsearch. You can’t, and you shouldn’t.
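To illustrate why separate config directories keep the secure files out of reach, here is a minimal Python sketch of a lookup that only resolves config names inside one directory and rejects anything that could escape it. The function `resolve_config` and the directory paths are hypothetical; this is not htsearch's actual code, only the idea behind the separation.

```python
import os


def resolve_config(config_dir: str, name: str) -> str:
    """Map a user-supplied config name to a file inside config_dir only.

    Hypothetical helper: a public search front end built this way can only
    read files from its own directory, never from the secure one.
    """
    # Reject names containing path separators, "." or "..", or empty names.
    if not name or os.path.basename(name) != name or name in (".", ".."):
        raise ValueError(f"invalid config name: {name!r}")
    return os.path.join(config_dir, name + ".conf")
```

With the public directory set to `/etc/htdig/public`, a request for `htdig` resolves to `/etc/htdig/public/htdig.conf`, while `../secure/htdig` is refused.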

ht://Dig Frequently Asked Questions

This change may cause some PHP or CGI wrapper scripts to stop working, but these scripts should be similarly changed to recognize both separator characters. It is the opinion of the developers that this is the preferred method.
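A wrapper script can recognize both separators by splitting on either character before decoding. The sketch below assumes the two separators are `&` and `;` (the HTML 4.0 recommendation for URLs in HTML); `parse_query` is a hypothetical helper, not part of ht://Dig. Note that in current Python versions `urllib.parse.parse_qs` splits on a single separator (`&` by default), so a wrapper relying on it alone would not handle mixed separators.

```python
import re
from typing import Dict, List
from urllib.parse import unquote_plus


def parse_query(query: str) -> Dict[str, List[str]]:
    """Split a query string on both '&' and ';' and decode each pair."""
    params: Dict[str, List[str]] = {}
    for pair in re.split(r"[&;]", query):
        if not pair:
            continue  # skip empty segments, e.g. from a trailing separator
        key, _, value = pair.partition("=")
        params.setdefault(unquote_plus(key), []).append(unquote_plus(value))
    return params
```

For example, `parse_query("words=linux+kernel;method=and&format=long")` returns the same mapping whichever separator each pair used.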

Some operating systems limit files to 2 GB in size, which can become a problem with a large database. This program uses the -T option as a record separator rather than an alternate temporary directory. The latest version is 3. Either edit your “rundig” script, if you run htmerge through that, or, before you run htmerge, set the variable TMPDIR to a temp directory with lots of space. This may give you enough information to find and fix the problem yourself, or at least it may help others on the htdig mailing list to point out what to do next.
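If you drive htmerge from a script rather than from your shell, the TMPDIR advice can be applied by passing a modified environment to the child process. This is a minimal Python sketch assuming `htmerge` is on your `PATH`; the helper names are hypothetical, and `extra_args` stands for whatever options you normally pass.

```python
import os
import shutil
import subprocess


def htmerge_env(tmpdir: str) -> dict:
    """Return a copy of the current environment with TMPDIR set to tmpdir."""
    if not os.path.isdir(tmpdir):
        raise FileNotFoundError(tmpdir)
    env = dict(os.environ)
    env["TMPDIR"] = tmpdir
    return env


def free_gib(path: str) -> float:
    """Free space at path in GiB, to check there is room for temporary files."""
    return shutil.disk_usage(path).free / 2**30


def run_htmerge(tmpdir: str, extra_args=()) -> int:
    # check=False so the caller can inspect the exit code instead of an exception.
    result = subprocess.run(["htmerge", *extra_args], env=htmerge_env(tmpdir), check=False)
    return result.returncode
```

Calling `free_gib` on the candidate directory first is a cheap way to confirm it really has "lots of space" before starting a long merge.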


This seems to be a frequent cause of confusion. This is a known bug in 3. This must be done with an external parser or converter. First of all, if you don’t have any luck with the settings of the locale attribute that you try, make sure you use a locale that is defined on your system.
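One quick way to check whether a locale name is defined on your system is to ask the C library to activate it. The Python sketch below tries the name for `LC_CTYPE` and restores the previous setting afterwards; candidate names such as `fr_FR.ISO-8859-1` are only examples, and on most Unix systems `locale -a` lists the names that are actually installed.

```python
import locale


def locale_is_available(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the original locale
```

If this returns False for the value you put in the locale attribute, htdig cannot use that locale either, whatever else is configured.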

Setting the locale correctly seems to be a frequent source of frustration for ht: You’d need to work out an equivalent configuration for your server if you’re not running Apache.


The scores are calculated mostly by htdig at indexing time, with some tweaking done by htsearch at search time. However, rundig builds the database from scratch each time you run it. For instance, on libc5-based Linux systems, the bundled regex code works fine by default, but using libc5’s regex code causes core dumps. It so happens that the ht: Additional features include support for robot exclusion, Boolean expression and fuzzy searching, configurable search results, the ability to search both text and HTML files, searches on subsections of the database, the ability to index a protected server, limits on the depth of the search, and the ability to add keywords to HTML documents.
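Robot exclusion means the indexer skips URLs that a site's `robots.txt` disallows for its user agent. The sketch below shows that logic with Python's standard `urllib.robotparser`, which is not what htdig uses internally; the user agent `htdig` and the example URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, agent: str, urls):
    """Return the URLs that robots_txt permits `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]


ROBOTS = "User-agent: *\nDisallow: /private/\n"
```

With `ROBOTS` above, a page under `/private/` is dropped while the rest of the site is still eligible for indexing.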

There is a bug in Adobe Acrobat Reader version 4, in its handling of the -pairs option, which causes a segmentation violation when using it with htdig 3.

However, the xpdf package is a reliable, free software package for indexing and viewing PDF files. More information on what these variables mean can be found in the ht: Check your web server’s error log for any information related to htsearch’s failure.

It first generates a database by “indexing” the web content. Didier Lebrun has written a guide for configuring htdig to support French, entitled. Be sure to do a “make clean” before a “make”, to remove any object files compiled with the old compiler and headers.


Despite a great deal of debugging of these programs, we haven’t been able to completely eliminate all such problems on all platforms. Also, pdftotext still has some difficulty handling text in landscape orientation, even with its new -raw option in 0.
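Since PDF indexing has to go through an external parser or converter, a small wrapper around pdftotext is one option. This Python sketch assumes the wrapper receives the PDF file as its first argument and that whatever it prints to standard output is taken as the converted text; check your htdig version's external parser/converter documentation for the exact calling convention. pdftotext writes to standard output when the output file name is `-`, and `-raw` keeps text in content-stream order.

```python
import subprocess
import sys


def pdftotext_command(pdf_path: str, raw: bool = False) -> list:
    """Build a pdftotext command line that writes plain text to stdout ("-")."""
    cmd = ["pdftotext"]
    if raw:
        cmd.append("-raw")  # keep text in content-stream order
    cmd += [pdf_path, "-"]
    return cmd


def main(argv) -> int:
    # Hypothetical calling convention: argv[1] is the PDF file to convert.
    result = subprocess.run(pdftotext_command(argv[1], raw=True), check=False)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Whether `-raw` helps depends on the document; as noted above, landscape pages may still come out poorly.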

The copyright and license notices on this page only apply to the text on this page. Other techniques include removing the db. Please try to include as much information as possible, including the version of ht: The default values for these factors, as well as information about whether they’re used by htdig or htsearch, are all listed in the configuration attributes documentation. In addition, the location of words within the document affects the score, as word scores are also multiplied by a varying location factor somewhere in between for words near the start and 1 for words near the end of the document.
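The location weighting can be pictured as a factor that falls from some starting value for the first word to 1 for the last. The text does not give the starting value, so `start_factor` below is a hypothetical parameter, and the straight-line interpolation is an assumption about the shape of the curve rather than htdig's exact formula.

```python
def location_factor(position: int, length: int, start_factor: float) -> float:
    """Scale linearly from start_factor (first word) down to 1.0 (last word).

    start_factor is hypothetical: the value htdig uses for words near the
    start of a document is not given here.
    """
    if not 0 <= position < length:
        raise ValueError("position must lie within the document")
    if length == 1:
        return start_factor
    fraction = position / (length - 1)  # 0.0 at the start, 1.0 at the end
    return start_factor + (1.0 - start_factor) * fraction


def word_score(base: float, position: int, length: int, start_factor: float) -> float:
    """Multiply a word's base score by its location factor."""
    return base * location_factor(position, length, start_factor)
```

With `start_factor=2.0` in an 11-word document, the first word's score is doubled, the middle word's is multiplied by 1.5, and the last word's is unchanged.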

In PHP, you can simply set the following in your php.

You’ll likely need to rebuild your database from scratch if it’s corrupted. Thus far, the examples have assumed a Web site consisting of static HTML pages as the base for ht: The htdig search results wrapper file contains the header and footer together in one file.
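A combined wrapper file works by marking the spot where the search results go; everything before the marker is the header and everything after it is the footer. This Python sketch assumes the marker is `$(HTSEARCH_RESULTS)`, as used by htsearch's search results wrapper; verify the name against the attribute documentation for your version. The helper names are hypothetical.

```python
MARKER = "$(HTSEARCH_RESULTS)"


def split_wrapper(wrapper_text: str, marker: str = MARKER):
    """Split a combined wrapper into (header, footer) around the results marker."""
    header, found, footer = wrapper_text.partition(marker)
    if not found:
        raise ValueError(f"wrapper has no {marker} placeholder")
    return header, footer


def render(wrapper_text: str, results_html: str) -> str:
    """Place the results between the wrapper's header and footer."""
    header, footer = split_wrapper(wrapper_text)
    return header + results_html + footer
```

Keeping header and footer in one file this way means an HTML element opened in the header and closed in the footer stays visible as a single, well-formed page template.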