How to search the Web...
There are millions and millions of webpages out there. However, as most of us have trouble finding an old letter on our own computer, how can we find relevant information on this "global hard drive"? After all, this is the closest thing we get to a World Wide Anarchy. Well, there are people out there trying to catalogue the Web for us. Furthermore, virtual robots are scurrying around, trying to map the vast expanses of Cyberspace. Although most of them can cover only a small part of the Net, the task of finding anything among some two billion pages is still daunting. However, the main problem is not that the search engines and the search directories find too little, but that they find too much. It is hard to uncover the needle in a list of 40,000 hits. That's why Pandia brings you this short and easy search engine tutorial. To get the right answer, you must ask the right question. This Web search tutorial will tell you exactly how to do that! It will take you approximately 30 minutes to read this search engine tutorial through, and you will learn the essentials of Web searching in less than an hour. By improving your searching skills you will be able to find what you are looking for faster and more efficiently. How is that for an investment?
Most people are primarily interested in tools for finding information on the World Wide Web. Originally there were two kinds of search services on the Web: directories and engines.
Search directories
Search directories are hierarchical databases with references to websites. The websites that are included are hand picked by living human beings and classified according to the rules of that particular search service. Yahoo! is the mother of all search directories. For obvious reasons we are very fond of the Pandia Plus Directory, which is based on the Open Directory, a catalogue compiled by enthusiasts from all over the world. Directories are very useful when you have no more than a general notion of what you are looking for.
The first page normally gives you the most general categories (like "Computers and Internet" or "Education"). Click your way down the hierarchy to the right category, select the website you find the most interesting and start reading. If you use the search form when exploring a directory, remember that you are not searching the text of the actual webpages of a particular site. Instead you are searching the text contained in the site title and the description of the site. These are composed by the directory editors, and are often based on suggestions from the site owners themselves. In addition most directories will also search the words contained in the category titles or descriptions. Note, however, that some search directories may add data from regular search engines if they cannot find matches to your query. We will tell you more about this below.
Search engines
Search engines are -- well -- "engines" or "robots" that crawl the Web looking for new webpages. These robots read the webpages and put the text (or parts of the text) into a large database or index that you may access. None of them cover the whole Net, but some of them are quite large. The major players in this field are Alta Vista, AlltheWeb, Google, Ask Jeeves and Inktomi. Inktomi is not a search site in its own right, but feeds data to Hotbot, MSN and others. AlltheWeb, which is now owned by the American company Overture, is powering most versions of the Lycos portal. Search engines should be your first choice when you know exactly what you are looking for. They also cover a much larger part of the Web than the directories. However, the distinction between engines and directories is not as clear cut as it used to be. All the major search directories will feed you results from a search engine if they cannot find what you are looking for in their own directory. Yahoo is using the search engine Google for this purpose.
On the other hand, some of the search engines will serve information from search directories before giving you data from the search engine's database.
Lycos, Google and AOL will give you access to data from the Open Directory. Alta Vista and MSN have hierarchical directories similar to Yahoo and Pandia Plus. These are based on the LookSmart directory. If you cannot find what you are looking for in the Pandia Plus Directory, you may transfer your search to the Pandia Metasearch Engine. You do not have to write your query again, just click! You may find similar features at other search sites.
Metasearch engines
There are also "metasearch" services like Search.com, GO2NET's Metacrawler and our own Pandia Metasearch engine. They search several search engines and directories at the same time, trying to extract the most relevant hits from all of them. You might find it useful to start your searching with one of these, just to get a general feeling for what is out there. The search syntax is problematic, however. It may vary from search engine to search engine, which means that the metasearch engine has to try to "translate" your query into a language that each search engine will understand. More often than not, they will not try to do so. For more complex searches, you should go directly to the relevant search engine. Also note that the metasearch engines will give you but a small part of the results from each individual search engine.
Search utilities
Related to the metasearch sites are so-called search utilities, programs that you run on your own computer, and that can search the Internet. Most of them work like metasearch services, querying several online search engines when retrieving hits from the Net. This applies to Copernic for the PC, and Sherlock for the Macintosh. These programs may be very useful for simple searching, but if you want to use more advanced query terms you should limit your search to one search service at a time. Search Engine Watch has more information on search programs like these.
The best search services
There are more than 7 billion documents on the Web.
AlltheWeb, Google and Inktomi each now index more than 2 billion webpages. One thing remains true, however: The search engines do not all cover the same parts of the Internet Universe, which gives you every reason to use more than one of them. At the moment we find Google and AlltheWeb to be the best search engines, while the Pandia Plus/Open Directory and Yahoo seem to be the best directories. For metasearching we recommend Vivisimo and Ixquick. However, do try the other search services as well! Some of them may be perfect for your needs.
Fortunately, waiters are an understanding and patient lot. "Certainly, sir. What kind of food did you have in mind? May I recommend the salmon?" Your average search engine is not that understanding. A search for food in Alta Vista brings up 21,652,651 webpages. 21 million pages are just too many to stomach. And, no, the search engine does not try to find out what you're really looking for. Still, a lot of Internet searchers actually ask questions like these: "sport", "books", "news". So, what do you do? You refine your question: "I would like a pizza with pepperoni and ham, but with no olives and no garlic." Here's the good news: If you are able to order a pizza like that, you are able to use advanced "Boolean" searching on the Internet. It's actually that easy!
You have asked for pizza with pepperoni and ham, but without olives and garlic. Here's how your order will look using Boolean operators: pizza AND pepperoni AND ham AND NOT olives AND NOT garlic. A search engine would interpret this Boolean expression in the following way: "The user wants me to show him or her links to all the pages that include the word pizza as well as the word pepperoni and the word ham, but he or she wants me to subtract pages that include the word olives or the word garlic." It isn't poetry, but it is logical and it works. The operator AND means that the word that follows has to be in the text of the pages that are to be listed. Pages including the words following AND NOT will not be listed.
If you suspect that the restaurant is out of pepperoni, you may be a little more open-minded about this, and say: "I would like pepperoni or chicken". In Boolean terms that is: pepperoni OR chicken. On the Net an order like this one will give you all the pages that include the word pepperoni, all the pages that include the word chicken and all the pages that include both of these words. What happens if you take out the operators AND, AND NOT and OR and write the following line instead? pizza pepperoni ham olives garlic. Most search engines interpret the space between the words as AND. That is, they will give you all the pages that include all these words. But that was not what you were looking for, was it? You are interested in pages that do not include the word olives or garlic, not in pages that have to include these words.
Then again, some engines may interpret the space between the words as OR. This means that they will even give you pages that include only one of these words. You will, for instance, end up with a lot of irrelevant information about the garlic industry. At the moment true Boolean searching is supported by AltaVista and AlltheWeb. It is important to note that in AlltheWeb it will only work from the Advanced Web Search page with "Boolean Query" selected from the "Query Type" pulldown menu.
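To see why the operators behave this way, it may help to think of each keyword as the set of pages that contain it. The following is only a minimal sketch in Python, not how any real search engine is built: the four "pages" and the helper names (pages_with, all_of, any_of) are invented for illustration. AND becomes set intersection, OR becomes union and AND NOT becomes subtraction.

```python
# A toy "index": page name -> page text. These pages are made up for this example.
PAGES = {
    "a.html": "pizza with pepperoni and ham",
    "b.html": "pizza with pepperoni ham olives and garlic",
    "c.html": "chicken pizza",
    "d.html": "the garlic industry",
}

def pages_with(word):
    """Return the set of pages whose text contains the given word."""
    return {name for name, text in PAGES.items() if word in text.split()}

def all_of(words):
    """AND: pages that contain every one of the words."""
    return set.intersection(*(pages_with(w) for w in words))

def any_of(words):
    """OR: pages that contain at least one of the words."""
    return set.union(*(pages_with(w) for w in words))

# pizza AND pepperoni AND ham AND NOT olives AND NOT garlic
strict = all_of(["pizza", "pepperoni", "ham"]) - any_of(["olives", "garlic"])

# pepperoni OR chicken
either = any_of(["pepperoni", "chicken"])

# "pizza pepperoni ham olives garlic" with the space read as AND, then as OR
five_words = ["pizza", "pepperoni", "ham", "olives", "garlic"]
implied_and = all_of(five_words)
implied_or = any_of(five_words)

print(sorted(strict))
print(sorted(either))
print(sorted(implied_and))
print(sorted(implied_or))
```

In this toy collection the strict Boolean order returns only the page without olives and garlic, the space-as-AND reading returns only the page that has them, and the space-as-OR reading drags in the garlic industry page as well.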
Search engines are useful, but they are extremely stupid. If you ask them for a pan pizza they may not only give you pages on pizza and pan pizza, but also information about the god Pan, Pan flutes, frying pans, Peter Pan, Pan Arabian co-operation and more. You need a way of telling the search engine that pan pizza is an expression or a phrase. For this you use double quotation marks: "...", like this: "pan pizza" AND "Italian pepperoni" AND "black olives" This will tell the search engine to look for pages that include the text string pan pizza, not the word pan in general. Please note that Alta Vista has a database with commonly used expressions that it will interpret as phrases even if you omit the quotation marks.
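The difference between a keyword match and a phrase match can be sketched in a few lines of Python. This is an illustration under simple assumptions (lower-cased text, words split on anything that is not a letter or digit), and the function names are our own, not part of any search engine.

```python
import re

def words(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_word(text, word):
    """Keyword match: the word appears anywhere on the page."""
    return word.lower() in words(text)

def contains_phrase(text, phrase):
    """Phrase match: the words of the phrase appear side by side, in order."""
    page = words(text)
    target = words(phrase)
    if not target:
        return False
    n = len(target)
    return any(page[i:i + n] == target for i in range(len(page) - n + 1))

menu = "Our pan pizza is famous"
fairy_tale = "Peter Pan ate a pizza"

# Both pages contain the word "pan" ...
print(contains_word(menu, "pan"), contains_word(fairy_tale, "pan"))
# ... but only the menu contains the phrase "pan pizza".
print(contains_phrase(menu, "pan pizza"), contains_phrase(fairy_tale, "pan pizza"))
```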
"Thomas Alva Edison" But this search would not bring you pages where the name is given as Thomas A. Edison or Thomas Edison. You could solve this problem by entering "Thomas Alva Edison" OR "Thomas A. Edison" OR "Thomas Edison" or you could use the NEAR search operator. NEAR means "show me pages where these words are near each other". Thomas NEAR Edison. How near is NEAR? That depends. In Alta Vista the words are less than 10 words apart. dogs near/3 cats finds documents in which dog and cat occur within three words of each other, in either order. By altering the number, you can decide how far apart the keywords can be in order to be included in the results. Only Alta Vista Advanced Search allows use of this operator. After it started using the AlltheWeb search engine for its results (June 2000), Lycos no longer supports NEAR (or any true Boolean operator for that matter). AOL also dropped support for true Boolean when it switched to using Google data in 2002.
Nesting (Brackets)
"pan pizza" AND pepperoni OR ham AND olives. The use of parentheses -- nesting -- will clear things up: "pan pizza" AND (pepperoni OR ham) AND olives. This means that you want a pizza with olives, but are uncertain whether you want pepperoni or ham on that pizza. On the other hand: ("pan pizza" AND pepperoni) OR (ham AND olives) means that you have to choose between a pepperoni pan pizza and a dish based on ham and olives. Now you know the basics. Some engines use the expression NOT instead of AND NOT, but if you stick to AND NOT it should work anyway. Write AND NOT in two words. The only exception is Pandia Plus and the Open Directory, actually. Here you have to write ANDNOT in one word. And no, don't ask us why! Most search engines want you to write the Boolean operators in CAPITAL letters. The rest will ignore the difference between upper and lower case. If you use capital letters you are on the safe side.
Wildcards (truncation)
In most search engines and directories, a search for dog* will give you pages with all words starting with the three letters dog, including dog, dogs, dogged, doggy and dogma. As you can see, if you were looking for dog and dogs, you will be picking up some unwanted hits. Truncation or wildcards work best when the stem is longer and when the stem is not a root of many other common words. Please note that a lot of search engines "stem" keywords, i.e. they will automatically search for dog if you enter the keyword "dogs" and vice versa.
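The effect of a trailing wildcard is easy to try out for yourself. Here is a small Python sketch; the word list and the function name matches_wildcard are made up for the example, and real search services may treat wildcards differently.

```python
def matches_wildcard(pattern, word):
    """Match a word against a pattern that may end in a * wildcard."""
    if pattern.endswith("*"):
        # Truncation: anything that starts with the stem is a hit.
        return word.startswith(pattern[:-1])
    return word == pattern

vocabulary = ["dog", "dogs", "dogged", "doggy", "dogma", "cat", "do"]

short_stem = [w for w in vocabulary if matches_wildcard("dog*", w)]
longer_stem = [w for w in vocabulary if matches_wildcard("dogg*", w)]

print(short_stem)   # the short stem also picks up "dogma"
print(longer_stem)  # a longer stem gives a narrower list
```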
Now, if you find Boolean operators too intimidating, there is an easier way. This is called simplified search syntax, pseudo-Boolean searching, implied Boolean or (according to Danny Sullivan of Search Engine Watch) "search engine math". It goes like this: +pizza +pepperoni +ham -olives -garlic. Put a plus sign in front of words that must be present on the webpage. A minus sign in front of a word will tell the search engine to subtract pages that contain that particular word. Hence + equals the Boolean search term AND, and - the term AND NOT. In most search engines you can combine the pluses and the minuses with quotation marks, as explained above. However, you cannot use brackets or the OR-operator. Here is one example: +"pan pizza" -olives pepperoni. This means that the pages the search engine shows you must include the phrase pan pizza, they must not include the word olives, and they should preferably include the word pepperoni. If there is no sign in front of a word, most search engines will nevertheless read a + sign. The engine reckons that the word should be present. In other words: it will default to AND if it finds no "mathematical signs". Some search engines will nevertheless give priority to keywords that you give an explicit + sign. The main exception is AltaVista, which will interpret the lack of a sign as an OR operator. This will not be the case, however, if AltaVista recognizes your query as a common phrase. The use of the minus sign may have some unforeseen consequences. Imagine that you are looking for webpages that contain information about the Star Wars movie, The Phantom Menace. You would like to avoid pages on earlier movies in order to reduce the number of hits: +"Star Wars" +"The Phantom Menace" -"A New Hope" -"Return of the Jedi" -"The Empire Strikes Back". All the earlier movies in the series are marked with a minus, meaning that pages that include these phrases should not be included in the "hit list".
This means, however, that the search engine will subtract all the pages that include these phrases, including pages that have information on all the movies -- A New Hope as well as The Phantom Menace. The information you are looking for may obviously be on one of those pages. Hence you should use the minus sign (or the AND NOT term for that matter) with great care. Please note that there must not be any space between the relevant sign and the word! Write +"Star Wars", not + " Star Wars ". If you want to use search engine math in AltaVista, you must use the simple search form. Avoid using a "-" term as the first one in your query. Write dog -cat, not -cat dog
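Here is a rough Python sketch of how "search engine math" could be read by a program, and of the Star Wars pitfall just described. It is an illustration only: the function names and sample pages are invented, and we assume that a term without a sign is merely preferred, as in the +"pan pizza" -olives pepperoni example, whereas many engines treat it as required.

```python
import re

def parse_query(query):
    """Sort query terms into required (+), excluded (-) and unsigned terms."""
    required, excluded, optional = [], [], []
    # Each term is an optional sign followed by a "quoted phrase" or a single word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        term = phrase or word
        if not term:
            continue
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def has(text, term):
    """True if the word or phrase occurs in the text, ignoring case."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None

def matches(text, query):
    """A page is a hit if it has every + term and none of the - terms."""
    required, excluded, _ = parse_query(query)
    return (all(has(text, t) for t in required)
            and not any(has(text, t) for t in excluded))

print(parse_query('+"pan pizza" -olives pepperoni'))

query = '+"Star Wars" +"The Phantom Menace" -"A New Hope"'
review = "Star Wars: The Phantom Menace review"
overview = "Star Wars guide: A New Hope, The Empire Strikes Back and The Phantom Menace"

print(matches(review, query))    # kept
print(matches(overview, query))  # thrown out, although it covers The Phantom Menace
```

The overview page is lost because a single excluded phrase is enough to remove it, which is exactly why the minus sign deserves care.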
* In some search services, like Hotbot, Lycos, AltaVista, Yahoo, Google and Pandia Plus, the default is AND. In this case you will have to use the OR operator or the relevant option on a pull down menu.
When the search engine robots retrieve information from webpages around the world, they sort the information into various categories or "fields". The main fields that can be accessed in field searching are:
Title: This is the text you can read in the bar at the top of the browser window (not the main headline on the webpage itself). The title normally contains important keywords referring to the content of the page. If you restrict your search to the page titles, you will get fewer -- but more focused -- hits. You could for instance search for petunias AND title:gardening.
URL: This is the address (the Uniform Resource Locator) of a page, e.g. http://www.pandia.com/. You may restrict your search to pages with addresses that contain a certain word. If you want to restrict your search to the Pandia tutorial, you can do a search like this: "field searching" AND url:pandia.com/goalgetter.
Domains: The domain is the unique name that identifies an Internet site. Domain names have two or more parts, separated by dots. The part on the left is the most specific, and the part on the right is the most general. Cf. yahoo.com and pandia.com. The domain name is normally part of the Web and email address. Some search engines allow you to restrict your search to a specific domain. By doing a domain-search you may for instance restrict your search to pages in a specific country. British pages normally end in the letters .uk. A search for Jaguar AND car AND domain:.uk should give you British pages containing information on the Jaguar car. There are also some top level domains (com, org, net etc.) that are not restricted to specific countries, although they are predominantly American. You can use these endings to restrict your search to commercial (.com), US educational (.edu), US governmental (.gov) or US military (.mil) sites.
Dealing with Error Codes... "404 not found"
OK. You find an interesting site in your favorite directory. You click on the relevant link, and -- alas -- get an error code! If you get the message "Document not found" when trying to open a webpage, do not despair. The message confirms that the site exists, and the webpage may still be there. If you look at a Web address like this one: http://www.pandia.com/search/faq.html, you will see that it looks very much like a file address on a PC or a Mac (cf. C:\documents\letter.doc or harddrive:documents:letter.doc). As a matter of fact, an HTTP-address is a file address. http:// tells your browser to look for a webpage; www.pandia.com tells it to look for a server or computer called www.pandia.com; /search/ tells it to look for the directory (or folder) called "search"; and the last part tells it to open a file called faq.html that should be in that directory. However, there is no directory called "search" on this server. You have been given an incorrect or outdated address. There may be a file with information about faq.html higher up in the file hierarchy, though. So here's what you do: Delete the last part of the address until you come to the next "/". Then you are left with http://www.pandia.com/. Then hit "enter" and see what you get. If an address ends with a slash (/), you are not specifying what file the browser should look for. Following the rules of the Internet, however, the browser will then look for a file that is defined as "default" by the server (normally called index.html or default.html). The main webpage or index in any directory is most often named -- you guessed it -- index.html. And there it is, http://www.pandia.com/index.html has a link to the Pandia FAQ.
Addresses on the Internet:
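The way a Web address breaks down into its parts, and the trick of cutting it back slash by slash, can be sketched with Python's standard urllib.parse module. The function name parent_addresses is our own invention for this example; the sketch only builds the shorter addresses and does not try to open them.

```python
from urllib.parse import urlsplit, urlunsplit

def parent_addresses(url):
    """List shorter addresses to try, from the nearest directory up to the site root."""
    parts = urlsplit(url)
    path = parts.path
    candidates = []
    while path not in ("", "/"):
        # Drop everything after the last slash that is not at the very end.
        path = path.rstrip("/")
        path = path[: path.rfind("/") + 1]
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates

address = "http://www.pandia.com/search/faq.html"
parts = urlsplit(address)

print(parts.scheme)   # how to fetch the page
print(parts.netloc)   # which server to ask
print(parts.path)     # which directory and file on that server

for shorter in parent_addresses(address):
    print(shorter)
```

For the address above the sketch suggests the "search" directory first and then the front page of the site, which is where the tutorial's example ends up.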
The server does not have a DNS entry
If your browser is unable to locate the server (the computer containing the webpage) this could mean that the server does not exist any more. However, it could also be that the server is down for maintenance or that the network is busy. In any case: Try again later. If you have typed the address (URL), do check the spelling! If everything fails, and you get the same error message the next day, you could visit Google at http://www.google.com/, a search engine that keeps copies of the indexed webpages on its servers. You may find an old version of the file you are looking for there. Yahoo! has more information on error codes: http://docs.yahoo.com/docs/writeus/error.html
Menu Based Searching...
We find menu based search facilities to be more confusing than Boolean searching, and they are not as flexible when it comes to building more complex queries. That being said, menu based pages for advanced searching may be quite efficient, as soon as you get a grip on how they work. (If you do not know a search form from a web address field or a pull down menu from a radio button, please read the absolute beginners text box below first.) A menu based search page will include one (or more) search forms where you enter your search query. The simplest versions will give you one form to enter all your keywords, and a pull down menu that gives various options regarding how these keywords are to be treated by the search engine. Normally these options are:
Pull down menus of this type do not give you the opportunity to exclude words (Boolean AND NOT). However, there are some search engines that let you distribute your search terms over several search fields, where each of them has its own pull down menu with options signifying whether this word or these words
See for instance Google's advanced search page. By filling in all the fields you can actually build quite complex queries. It helps to picture each of these forms as separate filters or sieves, one put beneath the other, and each of them filtering out and discarding a certain number of web pages. The search engine pours in all the web pages available and sorts out the pages you do not need on the basis of these filters. For instance, if you tell the search engine that the pages that are to be fetched have to include the word “agriculture”, it will normally filter out all pages that do not include this word (Google makes exceptions to this rule, but that should not concern us here). Most menu based pages for advanced searching also provide other types of filters, predominantly for various forms of “field searching”. For instance, you may limit the search to Web pages that have been made within a certain time period, i.e. you ask the search engine to filter out pages that do not belong to this period. You may also select pages written in a particular language, thus excluding all other languages, or you may look for pages belonging to a certain site (pandia.com) or a certain type of domain (for instance .edu for American educational sites or .no for Norwegian sites), thus sorting out all pages that do not belong to this site or domain. Each “filter” you apply will narrow down your search and return fewer results. You will normally have to experiment to get the optimal results – too many filters and you end up sorting out useful and relevant pages, too few and you end up with too many hits.
An example of a menu search form.
Here is a version of the Pandia Powersearch Metasearch Engine that may serve as an example of a menu based search form. It combines the use of pull down menus with radio buttons.
The first pull down menu (marked The Entire Web) can be used to select what kind of files you are looking for ("Entire Web" for regular Web pages, "MP3" for music files etc.). There are also options for limiting the search to pages from a particular country. This applies only to regular searches for Web pages and will limit results to URLs that end with that nation's national domain (e.g. .no for Norway). Please note, however, that many sites have domain names ending in .com, .org, .net, even if they are not American. The language option will pick out pages written in the language you are looking for, regardless of domain. Using this form you click on radio buttons (the small circles) to select search terms ("All Words" = AND, "Any Word" = OR, "Exact phrase" = ah, well, "Exact phrase") and the length of the site description given on the result pages. Then there is a separate pull down menu you can use to choose how many results the search engine is to present per page.
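The picture of filters stacked one beneath the other translates quite directly into code. The Python sketch below is only an illustration: the three sample pages, their addresses and the filter names (must_include, in_language, in_domain) are invented, and a real search service would work against an index rather than a short list.

```python
from urllib.parse import urlsplit

# Made-up pages standing in for "all the web pages available".
PAGES = [
    {"url": "http://www.example.edu/farming.html", "language": "en",
     "text": "agriculture in the Midwest"},
    {"url": "http://www.example.no/landbruk.html", "language": "no",
     "text": "norsk agriculture og landbruk"},
    {"url": "http://www.example.com/pizza.html", "language": "en",
     "text": "pan pizza recipes"},
]

def must_include(word):
    """Filter: keep pages whose text contains the word."""
    return lambda page: word.lower() in page["text"].lower().split()

def in_language(code):
    """Filter: keep pages written in the given language."""
    return lambda page: page["language"] == code

def in_domain(suffix):
    """Filter: keep pages whose server name ends with the suffix, e.g. ".edu"."""
    return lambda page: urlsplit(page["url"]).hostname.endswith(suffix)

def apply_filters(pages, filters):
    """Pour the pages through each sieve in turn; every sieve can only shrink the list."""
    for keep in filters:
        pages = [page for page in pages if keep(page)]
    return pages

one_filter = apply_filters(PAGES, [must_include("agriculture")])
two_filters = apply_filters(PAGES, [must_include("agriculture"), in_domain(".edu")])

print([page["url"] for page in one_filter])
print([page["url"] for page in two_filters])
```

Adding the second filter halves the list here, which is the trade-off described above: each extra sieve narrows the search, and one sieve too many can throw away a page you wanted.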
2. Use nouns and objects as query words. So-called "stop words" -- common verbs, adjectives, adverbs, pronouns, prepositions like "and, in, or, of" are often ignored by search engines or too variable to be useful (unless they are part of a phrase). Some search engines will let you search stop-words if you put them in quotes or enter a +-sign immediately before them.
3. Be as specific as possible. If you are looking for information on Golden Retrievers, do not search for dogs in general. Avoid common terms like Internet or people, unless they are a part of a phrase.
4. If you do not find what you are looking for, search for synonyms. Use the OR-operator: (dogs OR canines).
5. Check your spelling! Then check it again...
6. Be aware of alternate spellings or alternative words in various forms of English: (colour OR color), (luggage OR baggage)
7. Use at least two keywords in a query. More keywords will give you a smaller and more focused list of hits.
8. Use phrases enclosed by quotation marks in order to reduce the number of results: "may the force be with you".
9. Use the AND or plus operator in order to reduce the number of hits: "may the force be with you" AND "Star Wars" or alternatively +"may the force be with you" +"Star Wars"
10. Normally use quotation marks and capitals when searching for names: "Bill Clinton". There may be several variations of the same name, though: "Bill Clinton" OR "William Clinton" OR "William J. Clinton" OR "William Jefferson Clinton". In cases like these consider using the NEAR-operator (without quotation marks) in Alta Vista and AOL (Bill OR William) NEAR Clinton.
11. Consider truncating words in order to find both singular and plural versions of nouns: watch*
12. Put the main subject first. Search engines often list the pages that match the first keyword at the top of their list of findings. If you want to make certain that the phrases to the left are given priority, you can try putting them in parentheses: ("searching the web") AND (tutorial* OR manual*)
13. State to yourself what you want to find. You might find it useful to write it down on a piece of paper in normal language. Pick out the keywords and use them (and relevant synonyms) in your search query. The question "I want to find information about Canadians taking part in the invasion of Normandy on the D-day of World War II" may give the following query: D-day AND (Canadian* OR Canada) AND Normandy AND ("world war II" OR "second world war")
14. Do not make your queries too complicated. Avoid complex nesting with too many brackets.
15. Consider using field searching to get more relevant hits. Search for instance for words in the titles of webpages: title:"gardening".
16. Use several search services. Not one of them covers more than a part of the Net.
That's it! You are an expert! You now know how to use the tools on the MLFCCA Search the Web page.
The entire site and all contents: ©2001-2008 Minnesota Licensed Family Child Care Association, All Rights Reserved