Natural Language Interaction with a
Construction Estimating
Virtual Reality Environment
By
Blake Howe
Submitted to:
Dr. Sulbaran
Table of Contents
0- Abstract
1- Problem Statement
2- Project Objective
3- Methodology
3.1 Approach
3.1.1 Architecture
3.1.2 Documentation
3.1.3 Testing Plan
3.1.4 Storage of data
3.1.5 Templates
3.1.6 ATN parser
3.1.7 Decision trees
3.1.8 Decision making process
3.1.9 Speech
3.1.10 Customizing for the web
3.1.11 Operating System
3.1.12 Language
3.1.13 Object-oriented
3.2 Milestones
4- Cost and Materials
5- Anticipated Results
6-Conclusion
7-References
0 - Abstract
Users of Virtual Reality (VR) environments interact
with them through pointing and clicking. However,
VR environments lack the capability to let users
interact using natural language (conversation). The
objective of this project is to create a piece of
middleware that will allow users to communicate with
VR environments through natural language (conversation).
The software developed will have a degree of intelligence
and will accept queries that are as close to natural
language as possible. This will be achieved by applying
decision tree logic and a series of natural language
processing algorithms to an avatar within a VR environment.
This software will be linked with a VR environment focusing
on construction estimating. It is anticipated that
construction management students will be able to
communicate with the VR environment using natural
language (conversation) to gain a better understanding
of quantity estimating. In turn, students and faculty
will have a new tool that could enhance the educational
experience.
1 - Problem Statement
Virtual Reality environments provide a high level of
flexibility. However, most user interaction is done
through mouse and keyboard. The problem is that there
is a need for an interface that will allow the user to
communicate in a natural manner (conversation). According
to Warren C. Couvillion Jr., a senior research engineer
in the Advanced Interactive Technologies Department of
the Training Systems and Simulators Division, as VR
environments become more immersive, realistic interfaces
will become increasingly important [Couvillion 2001].
The author of this proposal is convinced that speech is
the most realistic interface available, because conversation
is an integral part of daily human life. Couvillion
indicates that projects that make interfaces more
realistic are worthwhile.
There are similar projects underway that are trying
to solve educational issues by applying intelligence
and VR. The CRAIG project and the ISIS-Tutor will be
discussed further to provide the background for this
work.
The CRAIG project is a knowledge-based system that
provides degree requirement information at Youngstown
State University. It was undertaken by a team of five
information systems undergraduates. The system attempts
an intuitive interface but falls short because it does
not include any type of environment that will immerse
the user or allow them to use the most natural form
of communication. What it did accomplish was some work
on natural language processing based on template
and frame grammar (Hughes 2002). The virtual tutor
will distinguish itself by residing within a VR environment
and allowing users to make their requests using
speech.
Secondly, the ISIS-Tutor (Brusilovsky 1989) is an intelligent
tutoring system that originated at the State University
of Moscow. This system was so successful that it has
become a generic term for intelligent tutoring systems.
The tutor is broken into several modules: the domain
and student modules, the intelligence model, and the
hypermedia model. It incorporates several good ideas,
such as modularity and a form of intelligence.
Its intelligence scheme is similar to that of the Virtual
Estimator in that it allows students to explore existing
areas of knowledge as well as train the system.
The problem with this application is that the interface
is menu driven and the "hyper-media" seems
to be more of a chat room with navigation than a VR
environment. The application proposed here could be
considered the next generation of ISIS. My research has
led me to the opinion that a menu-driven interface limits
the learning capability.
2 - Project Objective
The objective of this project is to develop a middleware
application to enhance the educational experience of
construction engineering students. The project will
develop an avatar that will be embedded in a Virtual
Reality environment for construction estimating. This
avatar will be capable of understanding natural language
and answering students' questions. These questions
will focus on construction materials and methods. NILCE
will provide a venue for real-time, interactive, 24-hour
help to students and ease the burden on the professors'
time. Eventually NILCE will evolve into a useful resource.
3- Methodology
The methodology for this project is presented in two
main components. The first component is the approach
to accomplishing the objective of the project. The second
component is the set of milestones that will be achieved
during the development of NILCE.
3.1 Approach
The approach consists of the considerations that will
be taken into account while developing the avatar.
This list of considerations is by no means all-inclusive
and is subject to change as the development of NILCE
and further research proceed.
3.1.1 Architecture
This application will be developed using a client-server
architecture. This type of architecture was chosen for
the low cost and flexibility it provides. With today's
low-cost processing power and cheap network connections,
a client-server application is by far the most cost-efficient
and scalable option (Patton 1999).
3.1.2 Documentation
Meticulous documentation will be kept throughout the
development of this project. An online project notebook
will record all decisions and bugs, along with the
reasons for them. These notes will be kept on my domain,
accessible 24 hours a day at www.codequest.net. A
checklist of milestones and the percentage completed
will also be publicly accessible, along with any available
source code. At the end of the project, Javadoc will
be used to generate HTML pages with instructions on
how to use the different packages in NILCE.
3.1.3 Testing Plan
Testing will begin on this project as soon as the first
module is completed. All preconditions and postconditions
for each module will be recorded and incorporated into the
battery of tests. Each component (module) of this project
will be tested extensively for functionality and correctness
with a series of "black box tests". The tests will
consist of both good data and malicious input to try
to uncover flaws before NILCE enters beta testing.
After each milestone is reached, a system test will be
performed on all of the integrated components to ensure
they are working properly. These system tests will be a
predetermined, growing set of tests that will be run
through all system components upon completion. They will
check for common mistakes using cases such as good values
of different types (i.e. correctly spelled), boundary
conditions, values outside of the maximum, delimiter
problems (different forms of delimiters included), mixed
case (hello, Hello, HeLlo), input that is too long for
a string, input that has white space or another delimiter,
first element added/removed, middle element added/removed,
last element added/removed, and misspelled words.
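A few of the black box cases listed above could be written as a small table-driven test. The sketch below is only an illustration: the InputNormalizer class, its MAX_LENGTH limit, and the rule that input is trimmed and lower-cased are assumptions made for the example and are not part of the NILCE design.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical input normalizer, used only to illustrate black box testing. */
class InputNormalizer {
    static final int MAX_LENGTH = 200; // assumed limit, not taken from the proposal

    /** Trims, collapses white space, and lower-cases; rejects null, empty, or over-long input. */
    static String normalize(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("input is null");
        }
        String cleaned = raw.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        if (cleaned.isEmpty() || cleaned.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("input is empty or too long");
        }
        return cleaned;
    }
}

class NormalizerBlackBoxTests {
    /** Runs every input/expected pair and returns the number of failures. */
    static int runAll() {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("hello", "hello");                  // good value
        cases.put("Hello", "hello");                  // mixed case
        cases.put("HeLlo", "hello");                  // mixed case
        cases.put("  hello   world ", "hello world"); // extra white space
        int failures = 0;
        for (Map.Entry<String, String> testCase : cases.entrySet()) {
            String actual = InputNormalizer.normalize(testCase.getKey());
            if (!actual.equals(testCase.getValue())) {
                failures++;
            }
        }
        // Boundary condition: input one character past the limit must be rejected.
        StringBuilder tooLong = new StringBuilder();
        for (int i = 0; i <= InputNormalizer.MAX_LENGTH; i++) {
            tooLong.append('x');
        }
        try {
            InputNormalizer.normalize(tooLong.toString());
            failures++; // reaching this line means the over-long input was accepted
        } catch (IllegalArgumentException expected) {
            // rejected as required
        }
        return failures;
    }
}
```

Because the cases live in a single table, new inputs found during beta testing could be added without changing the test logic.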
After the simple website for NILCE is completed, it
will be opened for beta testing by students. This will
uncover a myriad of flaws as no other type of testing
can. While the logic of NILCE is being tested, work on
incorporating it into a VR environment and adding the
speech aspects will continue.
3.1.4 Storage of data
The data will be stored in a MySQL database within
a series of nested tables. The decision trees will be
serialized and stored in the database. Tree objects in
Java can be easily stored in and retrieved from a relational
database by using the BLOB type, which is an array
of bytes (Nerjova 2000).
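The serialization step described above can be sketched with standard Java object streams. This is a minimal illustration, not the final design: the TreeNode class and its field names are hypothetical, and the database write itself is left out so that the example runs without a MySQL server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical tree node; the field names are illustrative only. */
class TreeNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final String text;
    TreeNode yes;
    TreeNode no;

    TreeNode(String text) {
        this.text = text;
    }
}

class TreeBlobCodec {
    /** Serializes a whole tree into a byte array suitable for a BLOB column. */
    static byte[] toBytes(TreeNode root) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(root); // child nodes are written along with the root
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds a tree from bytes previously produced by toBytes. */
    static TreeNode fromBytes(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (TreeNode) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With JDBC, the resulting byte array could be bound to a BLOB column through PreparedStatement.setBytes and read back through ResultSet.getBytes; the table and column names would depend on the final schema.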
The tables will be set up in the following fashion:
EX.
The above boxes represent tables in the database. There
will be a broad grouping of "HOW" tables.
Within the HOW table there will be a table for each OBJECT
(STUD in the above example) that needs representation,
such as walls, floor plans, roofs, etc. Within the STUD
table will be the various trees holding information
on that subject, such as length, width, and type.
MySQL was chosen as the database because it is portable
from Windows to Linux with little modification and a
fully functional version is available at no cost
(MySql Database Server 2003).
3.1.5 Templates
Templates will be used to catch common phrases. They
will be implemented in the following fashion.
(The X represents a variable that can be anything)
How do I get the X of a X?
This template could represent many questions.
How do I get the length of a stud?
How do I get the width of a wall?
How do I get the peak of the roof?
(Please note that words such as this, a, of, and the will
be ignored by the parser, which will concentrate on the
question (HOW), the qualifier (LENGTH), and the noun (STUD).
The technique used is described further below.)
These common templates will be loaded into memory in
the form of a hash table. If a common question is asked,
such as "How do I get the length of a stud?", then
the key will be the question and it will point
to the tree containing the information.
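The hash table lookup described above might be sketched as follows. Several details are assumptions made for illustration: the question is reduced to its content words before being used as the key, the extra ignored words "do", "I", and "get" are added so the sample template collapses to HOW, LENGTH, STUD, and a plain string identifier stands in for the decision tree.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class TemplateMatcher {
    // "this", "a", "of", "the" come from the text; "do", "i", "get" are an added assumption.
    private static final Set<String> IGNORED =
            new HashSet<>(Arrays.asList("this", "a", "of", "the", "do", "i", "get"));

    // Maps a reduced question to a hypothetical tree identifier.
    private final Map<String, String> treeIdByKey = new HashMap<>();

    /** Reduces a question to its content words, for example "how length stud". */
    static String toKey(String question) {
        StringBuilder key = new StringBuilder();
        for (String word : question.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (word.isEmpty() || IGNORED.contains(word)) {
                continue;
            }
            if (key.length() > 0) {
                key.append(' ');
            }
            key.append(word);
        }
        return key.toString();
    }

    /** Stores a template question together with the identifier of its tree. */
    void register(String question, String treeId) {
        treeIdByKey.put(toKey(question), treeId);
    }

    /** Returns the tree identifier for a matching template, or null when none matches. */
    String lookup(String question) {
        return treeIdByKey.get(toKey(question));
    }
}
```

Reducing the question before hashing means that small variations in wording or capitalization still reach the same entry, at the cost of treating questions that differ only in ignored words as identical.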
Templates were chosen as one method of processing
because of their ease of implementation and their proven
track record with the C.R.A.I.G. project at Youngstown
State University (Hughes 2002).
3.1.6 ATN Parser
ATN (augmented transition network) parsers are a technique
developed by W. A. Woods in the 1960s. They are used to
recognize the individual parts of a sentence: nouns,
verbs, adjectives, and adverbs (Watson 2002). There are
still some difficulties with recognizing the deep structure
of input, but these can be overcome by filtering the
input text using some hard-coded rules.
The design of this parser will be based upon the WordNet
database established by Princeton University (Miller
2002). This type of parser was chosen because of its
ease of implementation and the amount of open source
work that has already been done (Deitel & Deitel
1999). This project will use only a subset of the WordNet
database, customized for engineering problems.
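To make the idea concrete, the following is a heavily simplified, non-recursive transition network with registers, written in the spirit of an ATN rather than as a full implementation. The states, the word categories, and the small hand-written lexicon (which stands in for a WordNet subset) are all assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class TinyAtn {
    enum Category { QUESTION, QUALIFIER, NOUN, FILLER }
    enum State { START, HAVE_QUESTION, HAVE_QUALIFIER, ACCEPT }

    // Hypothetical hand-written lexicon standing in for a WordNet subset.
    private static final Map<String, Category> LEXICON = new HashMap<>();
    static {
        LEXICON.put("how", Category.QUESTION);
        LEXICON.put("what", Category.QUESTION);
        LEXICON.put("length", Category.QUALIFIER);
        LEXICON.put("width", Category.QUALIFIER);
        LEXICON.put("peak", Category.QUALIFIER);
        LEXICON.put("stud", Category.NOUN);
        LEXICON.put("wall", Category.NOUN);
        LEXICON.put("roof", Category.NOUN);
    }

    /** Returns the filled registers, or null if the sentence is not accepted. */
    static Map<String, String> parse(String sentence) {
        Map<String, String> registers = new HashMap<>();
        State state = State.START;
        for (String word : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (word.isEmpty()) {
                continue;
            }
            Category category = LEXICON.getOrDefault(word, Category.FILLER);
            if (category == Category.FILLER) {
                continue; // self-loop arc: unknown and filler words are skipped
            }
            if (state == State.START && category == Category.QUESTION) {
                registers.put("question", word);
                state = State.HAVE_QUESTION;
            } else if (state == State.HAVE_QUESTION && category == Category.QUALIFIER) {
                registers.put("qualifier", word);
                state = State.HAVE_QUALIFIER;
            } else if (state == State.HAVE_QUALIFIER && category == Category.NOUN) {
                registers.put("noun", word);
                state = State.ACCEPT;
            } else {
                return null; // no arc leaves the current state for this category
            }
        }
        return state == State.ACCEPT ? registers : null;
    }
}
```

For the sentence "How do I get the length of a stud?" the registers end up holding how, length, and stud. A full ATN would add recursive sub-networks and tests on the arcs, which this sketch leaves out.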
3.1.7 Decision trees
The Virtual Estimator will use the simplest type of
logic to begin with. Decision trees were chosen because
of the wealth of information available on the subject
and ease of implementation. (Savitch 1990 pg. 354) (Tanimoto
1995 pg. 387)
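A decision tree of the simplest kind, where every inner node is a yes or no question and every leaf is an answer, could look like the sketch below. The class name DecisionNode and its methods are hypothetical and are shown only to illustrate the traversal.

```java
class DecisionNode {
    final String text;      // a yes/no question on inner nodes, an answer on leaves
    final DecisionNode yes; // followed when the reply is yes
    final DecisionNode no;  // followed when the reply is no

    private DecisionNode(String text, DecisionNode yes, DecisionNode no) {
        this.text = text;
        this.yes = yes;
        this.no = no;
    }

    /** Creates a leaf that holds a final answer. */
    static DecisionNode leaf(String answer) {
        return new DecisionNode(answer, null, null);
    }

    /** Creates an inner node; both branches are required. */
    static DecisionNode question(String question, DecisionNode yes, DecisionNode no) {
        if (yes == null || no == null) {
            throw new IllegalArgumentException("a question needs both branches");
        }
        return new DecisionNode(question, yes, no);
    }

    boolean isLeaf() {
        return yes == null;
    }

    /** Walks from this node to a leaf, consuming one yes/no reply per question. */
    String resolve(boolean... replies) {
        DecisionNode current = this;
        int used = 0;
        while (!current.isLeaf()) {
            if (used >= replies.length) {
                throw new IllegalArgumentException("not enough replies to reach an answer");
            }
            current = replies[used++] ? current.yes : current.no;
        }
        return current.text;
    }
}
```

Each reply moves one level down the tree, so the number of questions a user must answer is bounded by the depth of the tree.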
3.1.8 Decision Making Process
The decision making process will be driven primarily by
the user indicating whether or not the question was answered
to their satisfaction. If further definition is needed,
the user will be prompted to enter a series
of yes or no questions to clarify the question, which
will then be logged for further review by the administrator.
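The logging step could be sketched as a small in-memory review queue. The class names FeedbackLog and ReviewEntry are hypothetical, and keeping the entries in memory rather than in the database is an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One unsatisfied exchange awaiting review by the administrator. */
class ReviewEntry {
    final String question;
    final List<String> clarifications; // the yes/no questions entered by the user

    ReviewEntry(String question, List<String> clarifications) {
        this.question = question;
        this.clarifications = Collections.unmodifiableList(new ArrayList<>(clarifications));
    }
}

class FeedbackLog {
    private final List<ReviewEntry> pending = new ArrayList<>();

    /** Logs the exchange only when the user was not satisfied; returns true if it was logged. */
    boolean record(String question, boolean satisfied, List<String> clarifications) {
        if (satisfied) {
            return false;
        }
        pending.add(new ReviewEntry(question, clarifications));
        return true;
    }

    /** Read-only view of the entries the administrator still has to review. */
    List<ReviewEntry> pendingReview() {
        return Collections.unmodifiableList(pending);
    }
}
```

Satisfied exchanges leave the log untouched, so the administrator sees only the questions that need attention.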
3.1.9 Speech
This application will use IBM's ViaVoice to handle
speech recognition. ViaVoice was chosen for several
reasons. First and foremost is the immense amount
of work that will be saved by allowing third-party software
to handle the speech recognition aspects of NILCE.
In addition, an extensive API is provided with ViaVoice
for interfacing with Java (IBM 2003). This will be an
invaluable asset in getting the application to run over
the web.
3.1.10 Customizing for the web
The application will use a VR avatar named NILCE (pronounced
Nil-cee) that will be accessible over the Internet.
For testing purposes a pre-made avatar will be used.
Students will be able to log onto the website and ask
questions regarding construction projects in real time
using voice communication through any browser with the
correct plug-ins installed. Help files and information
on installing the plug-ins will be provided on the site.
3.1.11 Operating system
This system will run on any platform with little modification
but will be developed on the Linux platform. The foremost
reason for choosing the Linux platform was the cost
and the availability of free tools to aid in development.
3.1.12 Language
This application will be developed in Java. Java was
chosen because of its object-oriented nature and the
fact that it will run on any platform with little modification
(Deitel & Deitel 1999, pg. 18). This will be an application,
not a Java applet. The reason for this decision is that
applets limit access to files on the machine they are
running on. There are ways around this, such as digital
signing, but the cost is prohibitive
(Code Signing Digital Id's 2003).
3.1.13 Object-Oriented
NILCE will be developed using an object-oriented methodology.
This methodology has many advantages over its counterparts.
First and foremost is the promotion of code reuse. According
to Chuck McManus, the power of object-oriented programming
lies in the fact that code is designed for reuse (Patton
1999, pg. 1). Any individual class of this system will
be importable into other applications.
Another reason for choosing an object-oriented design
is the strong encapsulation of objects and information
hiding. This means that components of the application
can be interchanged with ease. This is supported by
the Wikipedia definition of encapsulation, which
states, "In computer science
and object-oriented programming, encapsulation or modularity
refers to how objects contain data. Encapsulated code
can generally be rewritten without any need to rewrite
the encapsulating code" (Wikipedia 2003).
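The claim that encapsulated components can be interchanged with ease can be illustrated with a small sketch. The AnswerSource interface, its two implementations, and the Avatar class are hypothetical names chosen for this example and do not describe the actual NILCE class design.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical contract; callers depend only on this interface. */
interface AnswerSource {
    /** Returns an answer, or null when the source has none. */
    String answer(String question);
}

/** One interchangeable implementation: answers held in a private map. */
class InMemoryAnswerSource implements AnswerSource {
    private final Map<String, String> answers = new HashMap<>(); // hidden state

    void learn(String question, String answer) {
        answers.put(key(question), answer);
    }

    @Override
    public String answer(String question) {
        return answers.get(key(question));
    }

    private static String key(String question) {
        return question.trim().toLowerCase(Locale.ROOT);
    }
}

/** Another implementation: always refers the student to a person. */
class DeferringAnswerSource implements AnswerSource {
    @Override
    public String answer(String question) {
        return "Please ask your instructor.";
    }
}

class Avatar {
    private final AnswerSource source; // the avatar never sees how answers are stored

    Avatar(AnswerSource source) {
        this.source = source;
    }

    String reply(String question) {
        String answer = source.answer(question);
        return answer != null ? answer : "I do not know yet.";
    }
}
```

Because Avatar is written against the interface only, one answer source can be replaced by another without any change to the Avatar code.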
3.2 Milestones
Phase 1 - Research
Determine architecture (March 1 - March 5)
Determine decision logic (March 5 - March 10)
Methods of NLP (March 10 - March 15)
Methods of implementing speech (March 15 - March 20)
Options for documentation (March 25 - March 30)
Phase 2 - Proposal
Abstract (March 31 - April 5)
Problem Statement (April 5 - April 10 )
Objective (April 10 - April 15)
Methodology section (April 15 - April 20)
Milestones section (April 20 - April 25)
Cost and Materials (April 25 - April 30)
Conclusion (April 30 - May 5)
References (May 5 - June 1)
Phase 3 - Acceptance of proposal (June 1, 2003)
Phase 4 - Foundation classes
Parsing engine (June 1 - June 10)
Template Grammar (June 10 - June 20)
ATN Parser (June 20 - June 30)
Monitor class (July 1 - July 10)
Driver for NLP (July 10 - July 21)
Phase 5 - Installation of MySQL and driver (July 21 - July 25)
Phase 6 - Database Interface
Create database (July 26)
Open connection (July 27)
Close connection (July 28)
Access Node (July 29 - July 31)
Update Node (August 1 - August 5)
Store Node (August 5 - August 10)
Phase 7 - Decision trees
Creation of tree (August 10 - August 20)
Update tree (August 20 - August 25)
Add tree (August 25 - August 27)
Delete tree (August 27 - August 31)
Phase 8 - NILCE
Web page for NILCE (September 1 - September 10)
Integration of foundation and database classes (September 10 - September 20)
Testing of NILCE DB and logic (September 20 - September 30)
Design of Virtual World (October 1 - October 10)
Avatar added (October 10 - October 15)
Via Voice installation (October 15 - October 20)
Speech sent to server (October 20 - October 25)
Speech played back to user (October 25 - October 31)
4 - Cost and Materials
The cost of this project will be minimal. All of the
source code will be either written or modified from
open source projects. The only initial costs will be
acquiring a copy of Via Voice and the setting up of
a test server. Depending on the type of server and the
method of acquisition of Via Voice the budget for the
project should fall under $100.
5- Anticipated Results
By the end of this project, my contribution to the
next generation of VR applications will exist. It
will be part of a transformation from what is being
done today in the realm of education with VR to what
could possibly be done tomorrow. This middleware application
will allow teachers and students to work together to
enhance the educational experience. Teachers will
have an alternative that gives students help outside
of the classroom and can be closely monitored by an
expert in the field (themselves), and students will
have a new medium for computer-based help that will
not require an intricate knowledge of computers.
6 - Conclusion
Virtual reality presents tremendous possibilities in
the realm of education. The environment as presented
here offers an ideal solution to the problems with traditional
virtual environments by performing a dual role as both
tutor and student and by allowing the user to communicate
in a way that is familiar to them (conversation).
This project should be considered nothing more than
an open door into the endless possibilities of Virtual
Reality. Hopefully the solid development practices and
meticulous documentation will provide a good foundation
for its continued development by any developers who
wish to pursue this 21st century application.
References
Deitel & Deitel (1999). Java How to Program. Upper
Saddle River, New Jersey.
Miller, George A. (2002). WordNet: A lexical database
for the English language. Retrieved May 1, 2003 from
Cognitive Science Project, Princeton University, 221 Nassau
Street, Princeton, New Jersey. http://www.cogsci.princeton.edu/~wn/
Watson, Mark (December 9, 2002). Practical Artificial
Intelligence in Java. Retrieved April 1, 2003 from http://www.markwatson.com/opencontent/
Savitch, Walter and Main, Michael (1990). Data Structures
and Other Objects Using C++. Addison Wesley Longman.
Hughes, Cameron A. (December 2002). Frame and Template
Grammar. Retrieved April 1, 2003 from C.R.A.I.G. Project
Website, Youngstown State University. http://www.cc.ysu.edu/~cahughes/craig_overview.html#frame_template_grammar
Bigus, Joseph P. (2003). Constructing Intelligent Agents
with Java. Wiley Computer Publishing.
Tanimoto, Steven L. (1995). The Elements of Artificial
Intelligence Using Common Lisp. W. H. Freeman and Company,
41 Madison Ave, New York.
Couvillion, Warren C., Jr. (2001). Navigating Virtual
Worlds. Technology Today, published by Southwest Research
Institute. http://www.swri.edu/3pubs/ttoday/fall01/navigate.htm
Patton, Peter C. (1999). Recombinations: Client/server
computing in the 1990s. Retrieved May 1, 2003 from Penn
Printout, University of Pennsylvania. http://www.upenn.edu/computing/printout/archive/v08/5/clinserv.html
Patton, Peter C. Code reuse and object-oriented systems.
Java World. http://www.javaworld.com/javaworld/jw-12-1996/jw-12-indepth.html
Nerjova (March 25, 2000, 9:59 PM). Serialization to a database.
Message posted to Java Forums, archived at http://forum.java.sun.com/thread.jsp?forum=62&thread=131609
MySql Database Server (2003). Retrieved May 10, 2003
from the MySql homepage. http://www.mysql.com/products/mysql/index.html
IBM (2003). Speech for Java API. Retrieved April 1, 2003
from IBM homepage. http://www.alphaworks.ibm.com/tech/speech
Brusilovsky, Peter (1989). ISIS-Tutor: An Intelligent
Learning Environment for CDS/ISIS Users. Dept. of Cognitive
Psychology, University of Trier. http://cs.joensuu.fi/~mtuki/www_clce.270296/Brusilov.html
Code Signing Digital Id's. Retrieved April 11, 2003
from the Verisign - Security Center website. http://verisign.netscape.com/developer/
Encapsulation (Object-Oriented Programming). Retrieved
May 1, 2003 from Wikipedia. http://www.wikipedia.org/wiki/Encapsulation_in_object-oriented_programming
Middleware (definition). Retrieved April 2, 2003 from
www.Whatis.com. http://searchwebservices.techtarget.com/sDefinition/0,,sid26_gci212571,00.html