Introduction to Google
Intro
I'm focusing on Google not because it has the largest database of
downloaded Web pages and other documents (it may or may not; who
knows any more?). Rather, I'm focusing on this search tool because
the company has generally been the most proactive of any search
engine vendor in the world in expanding its range of information
retrieval tools and resources. Its book digitization and Google
Earth projects are two of an ever-growing list of examples.
It really is amazing what Google has accomplished, given that the
company has only been around since 1998, though Sergey Brin and
Larry Page actually began development of the technology in 1995.
See the Google History
(http://www.google.com/corporate/history.html)
Web page for a full discussion of the evolution of the company and
its technology.
PageRank ranking technology
What was particularly important about the search engine technology
was its way of ranking the results of a search. (Google is,
apparently, moving away from it because of the endless attempts by
marketing companies practicing "search engine optimization"
strategies to manipulate it.) Specifically, Brin and Page developed
a "link analysis" algorithm ("PageRank") that ranks the pages
resulting from a search according to how many, and which, other Web
pages and sites link to a particular Web page. Part of the ranking
involves an assessment of the repute of those linking sites and
pages, on the assumption that more respected Web pages will link to
other quality Web pages.
Example: You do a search on "medical ethics." PageRank usually
brings up the following Web sites in the first five results:
- Journal of Medical Ethics
- BioMed Central | BMC Medical Ethics
- AMA (Ethics) Principles of Medical Ethics, June 2001
A general introduction to
Google's search engine technology is at: http://www.google.com/corporate/tech.html
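The link-analysis idea described above can be sketched in a few lines of Python. This is only a minimal illustration, using the publicly described "power iteration" form of PageRank; it is not Google's production algorithm. The tiny link graph, the page names, and the damping factor of 0.85 are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages in a small link graph by repeatedly sharing scores.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "journal", one links to "blog".
example_links = {
    "journal": ["society"],
    "society": ["journal"],
    "library": ["journal", "society"],
    "blog": ["journal"],
    "forum": ["blog"],
}

ranks = pagerank(example_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:8s} {score:.3f}")
```

In this toy graph the "journal" page comes out on top because it has the most incoming links, and "blog" outranks "forum" and "library" because at least one page links to it. A link from a highly ranked page is also worth more than a link from a low-ranked one, which is the "repute" part of the idea.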
Combined with conventional ranking
- Part of Google's ranking also involves traditional statistical
ranking, which evaluates the number and location of occurrences of
the search terms that you entered. Google calls its version
"Hypertext-Matching Analysis."
- Factors that are evaluated include:
- The frequency of occurrence of the entered terms on Web pages. The
more often the terms occur, the higher a candidate Web page is
ranked in the returned records.
- The location of the search terms in the Web pages. Higher-ranked
pages include ones where the search terms occur in the title and
towards the top of the text.
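The two factors listed above, frequency and location, can be illustrated with a toy scoring function. Google has not published the details of its Hypertext-Matching Analysis, so the weights, the 50-word cutoff for "towards the top," the function names, and the sample pages below are all assumptions made purely for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_page(title, body, query, title_weight=5.0, top_weight=2.0, top_words=50):
    """Score a page by how often and where the query terms occur.

    The weights and the top_words cutoff are illustrative assumptions.
    """
    terms = set(tokenize(query))
    title_tokens = tokenize(title)
    body_tokens = tokenize(body)

    # Occurrences in the title count the most.
    score = title_weight * sum(1 for word in title_tokens if word in terms)
    for position, word in enumerate(body_tokens):
        if word in terms:
            # Occurrences near the top of the text count more than later ones.
            score += top_weight if position < top_words else 1.0
    return score


# Hypothetical pages: (title, body text).
pages = [
    ("Hospital News", "Staff updates and parking changes. " * 20 + "A talk on medical ethics."),
    ("Medical Ethics Overview", "Medical ethics guides clinical decisions. " * 3),
]

ranked = sorted(
    pages,
    key=lambda page: score_page(page[0], page[1], "medical ethics"),
    reverse=True,
)
for title, body in ranked:
    print(title, score_page(title, body, "medical ethics"))
```

The page with the search terms in its title and opening sentences is listed first, while the page that mentions them only once at the very end of a long text scores far lower.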
Major Google Tools
The table below is part of my home page, which consists of a
collection of tools and Web sites that I use frequently, not just
Google sites. Another section of my home page has a variety of
Yahoo tools plus several other search engines, indexes, and links
to databases and other Web sites that I often use. I wanted you to
see it to get an idea of the range of tools that Google offers.
I'm not going to discuss all of the Google tools above, but I want
to focus on a few below. The ones in RED are ones that you might
not be aware of but which I think you should try, as they offer
either important capabilities or access to information from sources
other than pages intended specifically for the Web. And Google
keeps developing new tools, most of them currently in beta. Take a
look at Google's "Services and Tools" aka "More, More, More" page
(http://www.google.com/intl/en/options/index.html)
and Google Labs (http://labs.google.com/)
to get an idea of what is coming down the pike.
- Basic Web Search
- This is the basic,
one-entry-field interface that you see when you go to
http://www.google.com.
- You can do a lot with it, as
you can see in the separate page that I have created on
it.
- Advanced Search
- This is the multi-entry-field interface, whose fields give you
the equivalent of "ANDing," "ORing," and "NOT," along with a
variety of useful limits. It is also discussed in a separate
page.
- http://www.google.com/advanced_search?hl=en
- Web Suggest
- This takes the Basic search page and adds suggestions for terms
and phrases that incorporate the term or terms that you entered.
It is very useful, particularly when you are not sure what
additional terms you might want to add. Be sure to try it.
- http://www.google.com/webhp?complete=1
- Proximity Search
- Google released its "Application Programming Interface" or "API"
to allow other parties to build tools that interface with and
utilize Google.
- One of these is StaggerNation's "Google API Proximity Search" or
"GAPS," which adds a very limited (within 3 words) but sometimes
useful proximity search capability to Google.
- http://www.staggernation.com/cgi-bin/gaps.cgi
- Soople
- Soople is a third-party tool that primarily takes a lot of
Google's interface functionality and puts it on one page.
- http://www.soople.com/
- Google Scholar
- Google Scholar is basically a limiting search, as it focuses on
finding scholarly literature that contains your search terms. This
literature includes peer-reviewed papers, dissertations and theses,
books, and other reports from all types of research. Typically,
these are located on the Web at sites created by universities,
research institutions (academic, independent, commercial, and
government), academic publishers, professional societies, and
other organizations.
- This is one of the most
extensive gateways to scholarly literature on the Web and is
one of the leaders in a move to free access to quality academic
literature.
- http://scholar.google.com/advanced_scholar_search
- Google Print
- Google's controversial digitization project is intended to put a
phenomenal range of book content on the Web. To that end, the
company is working with both publishers and major academic and
public libraries to digitize their entire collections.
- When you search Google Print, you will typically be able to access
only a few pages or even just the bibliographic data for a book,
as copyright and income (to publishers and authors) issues are
being hotly debated.
- Nonetheless, it is the beginning of a major trend toward providing
almost any kind of information on the Web.
- It is also a very useful
tool for finding materials that have not yet appeared in
journal form or on the Web in any other format.
- http://print.google.com/advanced_print_search
- Google Groups
- Hundreds of millions of
messages from the UseNet archive of newsgroups or discussion
forums are in this database. Sometimes such messages can be
very good sources of information, but you have to carefully
evaluate their content because they are mostly personal
messages from discussion groups.
- These messages date back to the early 1980s because the Google
"groups" database is actually composed of the old DejaNews database
of newsgroup messages plus all the newsgroup content gathered by
Google since its acquisition of DejaNews. For example, there is an
August 1991 message from Tim Berners-Lee discussing the HTTP
protocol that he and his team developed, the technology that is
the basis of the Web.
- You can find some very good information here, though you have to
take it with a grain of salt.
- http://groups.google.com/advanced_search?hl=en
- Google Alerts
- Why go to the mountain, when the mountain can come to you? Google
Alerts is a configurable alerting service that enables you to tell
Google what information you want to be sent the next time the
Google webcrawlers pick up new resources on your subject of
interest on other Web sites (your choice or both). I have created
dozens of Alerts at different times, and I delete them when I no
longer need or want them.
- http://www.google.com/alerts
- Google Local Search
- This is an interesting
search. Most people will use it for finding restaurants and
other retail businesses in a local area (you can enter city and
state or zipcode), but you can do some interesting research
with it. For example, if I am looking for manufacturers of
graphics cards in Santa Clara, I could search on:
manufacturer graphics-cards in the "What" field
and 95051 in the "Where" field. I would turn up
such companies as Nvidia, manufacturer of some of the best
graphics cards.
- http://local.google.com/
- News search
- For up-to-the-minute (well, maybe 15 minutes) news, this is one
of the best tools available online.
- The News search accesses and indexes several thousand different
online news sources.
- More important news sources (such as the news network sites and
the news syndicates) are downloaded as often as every 15 minutes.
- The Advanced News search offers Boolean and phrase-searching
capabilities through specialized entry fields.
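The "ANDing," "ORing," "NOT," and phrase options offered by the Advanced Search and Advanced News search forms can be pictured as simple tests on the words of a document. The sketch below is an illustration only and does not show how Google evaluates queries; the function name, its parameters, and the sample headlines are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(text, all_words=(), any_words=(), none_words=(), phrase=""):
    """Return True if the text satisfies every filled-in search field."""
    tokens = tokenize(text)
    words = set(tokens)

    # "AND": every one of these words must appear.
    if not all(word.lower() in words for word in all_words):
        return False
    # "OR": at least one of these words must appear (if any were given).
    if any_words and not any(word.lower() in words for word in any_words):
        return False
    # "NOT": none of these words may appear.
    if any(word.lower() in words for word in none_words):
        return False
    # Exact phrase: its words must appear next to each other, in order.
    phrase_tokens = tokenize(phrase)
    if phrase_tokens:
        size = len(phrase_tokens)
        starts = range(len(tokens) - size + 1)
        if not any(tokens[i:i + size] == phrase_tokens for i in starts):
            return False
    return True


# Hypothetical headlines to filter.
headlines = [
    "Senate debates medical ethics bill",
    "Ethics panel reviews medical school funding",
    "Medical ethics course added at local college",
]

hits = [
    headline
    for headline in headlines
    if matches(headline, all_words=["medical"], none_words=["senate"],
               phrase="medical ethics")
]
print(hits)
```

Only the third headline survives: the first is excluded by the "NOT" word, and the second contains both words but not as an adjacent phrase.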
Some of Google's Other Tools & Operations
- Froogle search
- This search functions like a "shopping bot" in that you put in
the name of a product (the more precise the name, the better), and
the search engine returns a list of records showing product images,
prices, and brief descriptive excerpts from the Web pages.
- The problem is that Froogle is often confused by multiple products
and prices on the same Web page. Nonetheless, it is a very useful
tool for finding low prices on a variety of products.
- Alternatives to Froogle include shopper.com, mysimon.com,
bizrate.com, and pricegrabber.com.
- http://froogle.google.com/froogle_advanced_search
- Google Labs
- The "Skunkworks" of Google (Who knows the reference?), Google Labs
is worth looking at to see what new information products are
emerging from the creative bastion in Mountain View, CA.
- http://Labs.google.com
- Image search