Design of mobile augmented reality game based on image recognition

Zhang, Boping

doi:10.1186/s13640-017-0238-6

Research
Open access
Published: 20 December 2017


Boping Zhang ORCID: orcid.org/0000-0001-7835-7622¹

EURASIP Journal on Image and Video Processing volume 2017, Article number: 90 (2017)


Abstract

This work presents a complete development system for mobile augmented reality games that combines AR technology with real-time strategy (RTS) gameplay. An effective image recognition strategy is proposed based on the SIFT feature matching algorithm. The algorithm is integrated with a cloud image recognition module, and the response speed of image recognition is improved by eliminating mismatched points. By optimizing the scheme and existing technologies, the improved image recognition algorithm is applied to the game system, realizing an RTS game system on mobile intelligent terminals. The experimental results show that the proposed method meets the efficiency and accuracy requirements of an augmented reality system. In addition, the method can satisfy users' demand for information expansion to some extent, which makes it more applicable than traditional augmented reality.

1 Introduction

Augmented reality (AR) presents a three-dimensional scene in which virtual objects are superimposed on the real scene. In such a scene, virtual objects can be quickly generated, manipulated, and rotated to enhance users’ understanding of the real environment [1, 2]. As an extension of virtual reality technology, AR integrates emerging technologies from computer graphics, computer vision, image processing, sensor technology, human-computer interaction, and photoelectric display. With the development of wireless mobile network technology and increasing bandwidth, the intelligence and information processing capabilities of mobile terminals are gradually improving, and users frequently access the Internet through mobile intelligent terminals. The multi-sensor equipment on mobile phones provides the hardware foundation for AR applications. In terms of software support, companies such as Microsoft and Qualcomm continually introduce proprietary SDKs through innovative research and development. These hardware and software developments underpin tiered, personalized mobile information services (https://tieba.baidu.com/p/4143666840).

Mobile augmented reality is becoming a hotspot in the field of game development. In 1997, Feiner et al. [3] developed MARS, the world’s first mobile augmented reality system, which was mainly used for navigation. In 2000, Thomas et al. [4] released the AR-Quake game, an extension of the PC game Quake. It was very popular at the time and could be played both indoors and outdoors. Cheok et al. [5] released HumanPacman in 2003, a mobile interactive entertainment system equipped with GPS and inertial sensors for positioning and visual sensing, as well as a touch-controlled human-computer interface. Schrier of the Massachusetts Institute of Technology designed a mobile augmented reality educational game called “relive the war of independence” [6]. The game is set in Lexington, where the American Revolutionary War began. Navigating with a GPS-equipped PDA, participants explore the public green and other buildings related to the battle of Lexington. When they arrive at a target location, the PDA overlays virtual historical characters and artifacts, together with audiovisual materials.

Ingress is a mobile augmented reality game released by Google in 2012 [7]. In this game, a group of European scientists finds some mysterious energy with unknown sources and usage, and some researchers think that the energy should be controlled or it will enslave the human beings after being affected mentally. The camp is divided into two groups named “Enlightened” and “Resistance”—the former trying to accept the energy as a gift to humanity; the latter striving to resist and protect the rest of our resources and wealth. The game players, called “Agent” in the game, fight each other to control the strongholds such as landmarks or sculptures in the real world.

Pokemon Go, an augmented reality (AR) role-playing mobile game built around pet nurturing (https://www.nintendo.com/), was jointly developed by Nintendo, the Pokemon Company, and Niantic Labs of Google, and has been welcomed globally. It received five Guinness World Records certificates in August 2016. However, Pokemon Go has not been released in mainland China. This indicates that AR gaming has not yet grown into a mature industry and that both opportunities and challenges exist.

In this work, AR technology and the real-time strategy (RTS) game genre were combined to study a complete development system for mobile augmented reality games. A mobile AR game named LastStandStan was developed, with the aim of improving the speed of instant image recognition for clients. An improved SIFT feature matching algorithm is proposed and integrated with a cloud image recognition module, and the response speed of the image recognition module is improved by eliminating mismatched points. By optimizing the scheme and existing technologies, we applied the improved image recognition algorithm to the game system, realizing an RTS game system on mobile intelligent terminals. The experimental results show that the proposed method meets the efficiency and accuracy requirements of an augmented reality system. In addition, the method can satisfy users' demand for information expansion to some extent, which makes it more applicable than traditional augmented reality.

The work included the following three aspects:

1.
To implement RTS game system on mobile intelligent terminal, the system architecture based on C/S was designed, divided into the cloud subsystem and the mobile terminal subsystem.
2.
To improve image matching speed, the optimal parameter values of the traditional SIFT (scale-invariant feature transform) algorithm were determined through extensive experiments, and users are allowed to set these parameters manually. In the Easy AR environment, the improved SIFT algorithm is exported as a package that runs on the cloud server when games are designed. Its main function is SIFT-based image recognition, which improves the matching speed for arbitrary images.
3.
In the work, a game named LastStandStan has been developed as an application of the improved SIFT algorithm, which can run on Android phones currently.

2 Overall structure of game system

At present, the image recognition methods commonly used in augmented reality games are based either on special markers or on natural image feature points [8]. Marker-based methods are of limited use because they require additional special markers on the target image, which affects the user experience to some extent. Moreover, the target image is difficult to identify when the markers are partially occluded. Methods based on natural image feature points need no additional markers, so they are used more often in augmented reality because they are concise and flexible. There are two main approaches: methods based on local image features and methods based on machine learning [9]. The traditional SIFT algorithm, which is based on local image features, has high matching precision and good robustness to image reversal, illumination change, and perspective change. However, it produces a large amount of calculation data that must be stored on the mobile terminal, which is a heavy burden given the limits of hardware such as the mobile processor and memory. As a result, it fails to meet the real-time requirement of augmented reality systems [10].

In this work, an RTS system with a C/S architecture is built on cloud image recognition. With the C/S architecture, the SIFT classifier data can be stored on the server, which effectively reduces the memory load on the mobile terminal and improves the response speed of the system. Figure 1 shows the framework of the augmented reality game system [11].

The RTS game system on the user’s intelligent terminal establishes socket communication with the cloud server over a wireless network such as 3G, 4G, or WIFI, which allows the application and the cloud server to exchange data. After the RTS game system uploads the current image to the cloud, the recognition system retrieves and identifies the image from the database. If the current image exists in the database, the RTS game system downloads the enhanced information associated with the image to local storage via the cache mechanism. Finally, the smart phone interface shows the enhancement information for the recognized scene, and the Unity3D rendering engine draws the 3D model in real time.

The system is divided into a cloud management subsystem and an intelligent terminal management subsystem. In the cloud management subsystem, the first step is to acquire an image through the input device. After pre-processing, the scene is checked for a target that requires augmented display. If the target exists, 3D registration is conducted. If not, the automatic tracking program is started; after tracking, 3D registration is performed and the result is stored in the information database. Meanwhile, an augmentation information database is established to store 3D models as well as text, video, and audio information, and an information index is built to improve retrieval speed. As the basis of image recognition, this information supports sample training and quick identification.

The intelligent terminal management subsystem obtains the augmented 3D model information from the cloud through image recognition. The virtual information is then displayed at the corresponding coordinate positions through image fusion.

3 Methods of SIFT image recognition

3.1 Basic principles of SIFT image recognition algorithm

SIFT features are invariant to scale and rotation. When the features are constructed, many details must be handled carefully to ensure fast operation and accurate localization [12]. Figure 2 shows the flow diagram of the SIFT algorithm.

Generation processes of local descriptive features include [13]:

1.
Detecting extreme points: A difference-of-Gaussian function is used to search the image at all scales and identify potential key points.
2.
Positioning key points: The location and scale of each candidate are determined by fitting a model. Key points are selected according to their stability.
3.
Determining orientation of key points: Using a gradient-orientation histogram, each key point is assigned the orientation with the strongest gradient response, which determines its main orientation.
4.
Describing key points: The local gradients of the image around each key point are calculated and encoded as a descriptor.

3.2 Key point detection

1.
Scale-space theory

The scale-space concept, proposed in the middle of the twentieth century, has developed into the following definition: a scale parameter is introduced into the image processing model, and a sequence of scale-space representations is obtained by varying this parameter continuously. The principal contours in scale space are then used as feature vectors to extract features such as edges [14].

In the scale-space method, the image becomes increasingly blurred as the scale grows, which simulates how a target forms on the human retina.

2.
Expression of scale space

Scale space of the image is expressed as Formula (1).

$$ L\left(x,y,\upsigma \right)=G\left(x,y,\upsigma \right)\ast I\left(x,y\right) $$

(1)

where G(x, y, σ) is the Gaussian function, I(x, y) is the original image, and * denotes the convolution operation.

$$ G\left(x,y,\upsigma \right)=\frac{1}{2\uppi {\upsigma}^2}{e}^{-\left({\left(x-d/2\right)}^2+{\left(y-b/2\right)}^2\right)/2{\upsigma}^2} $$

(2)

where d and b are the dimensions of Gaussian template; (x, y) is the pixel position; σ is the scale space factor.
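As an illustration of Formula (2), the following is a minimal sketch of how a d × b Gaussian template might be generated. The function name `gaussianTemplate` is hypothetical, and two details are assumptions rather than part of the paper: each pixel is sampled at its centre so that the template is symmetric, and the weights are normalised to sum to 1.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Builds a d-by-b Gaussian template following Formula (2).
// Assumptions (not from the paper): each pixel is sampled at its centre
// (x + 0.5, y + 0.5) so the template is symmetric, and the weights are
// normalised to sum to 1 so that blurring preserves overall brightness.
std::vector<std::vector<double>> gaussianTemplate(int d, int b, double sigma) {
    assert(d > 0 && b > 0 && sigma > 0.0);
    const double pi = std::acos(-1.0);
    std::vector<std::vector<double>> kernel(b, std::vector<double>(d, 0.0));
    double sum = 0.0;
    for (int y = 0; y < b; ++y) {
        for (int x = 0; x < d; ++x) {
            const double dx = (x + 0.5) - d / 2.0;  // offset from template centre
            const double dy = (y + 0.5) - b / 2.0;
            const double value =
                std::exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)) /
                (2.0 * pi * sigma * sigma);
            kernel[y][x] = value;
            sum += value;
        }
    }
    for (auto& row : kernel)
        for (double& w : row) w /= sum;  // normalise the weights
    return kernel;
}
```

Convolving the image I(x, y) with such a template would give one layer L(x, y, σ) of Formula (1).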

3.
Constructing Gaussian pyramid

Operations of this process include Gaussian blur and down sampling (See Fig. 3).

The pyramid is characterized by different sizes and tower-like model with increasingly smaller sizes from bottom to the top.

The tower is built as follows: the original image forms the first layer, and each subsequent layer is obtained by down-sampling the layer below it. Each tower has n layers, calculated as follows.

$$ n={\log}_2\left\{\min \left(p,q\right)\right\}-d,\kern0.75em d\in \left[0,{\log}_2\left\{\min \left(p,q\right)\right\}\right] $$

(3)

where p and q are the sizes of original image; d is the logarithmic value of minimum dimension of the image on top of the tower.
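Formula (3) can be evaluated with a few lines of integer arithmetic. The sketch below is illustrative only: the function names are hypothetical, and rounding the logarithm down to an integer is an assumption, since the paper does not say how non-power-of-two image sizes are handled.

```cpp
#include <algorithm>
#include <cassert>

// Integer floor of log2(v) for v >= 1.
int floorLog2(int v) {
    assert(v >= 1);
    int result = 0;
    while (v > 1) {
        v >>= 1;
        ++result;
    }
    return result;
}

// Number of layers n from Formula (3) for a p-by-q image, where d is the
// log2 of the smallest dimension allowed at the top of the tower.
int pyramidLayers(int p, int q, int d) {
    const int maxLog = floorLog2(std::min(p, q));
    assert(d >= 0 && d <= maxLog);  // d must lie in [0, log2(min(p, q))]
    return maxLog - d;
}
```

For example, a 512 × 512 image with d = 3 (an 8-pixel top layer) gives n = 9 − 3 = 6.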

4.
Gaussian difference pyramid

According to previous studies, the maxima and minima of the scale-normalized Laplacian of Gaussian, σ² ∇ ² G, produce more stable image features than other feature extraction functions such as the gradient. The difference-of-Gaussian (DoG) function closely approximates the scale-normalized Laplacian of Gaussian. The relationship follows from the heat diffusion equation:

$$ \frac{\partial G}{\partial \upsigma}=\upsigma {\nabla}^2G $$

(4)

Approximating the derivative by a finite difference gives

$$ \upsigma {\nabla}^2G=\frac{\partial G}{\partial \upsigma}\approx \frac{G\left(x,y,k\upsigma \right)-G\left(x,y,\upsigma \right)}{k\upsigma -\upsigma} $$

(5)

then

$$ G\left(x,y,k\upsigma \right)-G\left(x,y,\upsigma \right)\approx \left(k-1\right){\upsigma}^2{\nabla}^2G $$

(6)

where k − 1 is a constant that does not affect the location of the extrema.

As shown in Fig. 4, the red is DoG operator curve, while the blue is Gauss-Laplacian curve. Extreme value is detected by replacing Laplacian with DoG operator [15].

$$ D\left(x,y,\upsigma \right)=\left(G\left(x,y,k\upsigma \right)-G\left(x,y,\upsigma \right)\right)\ast I\left(x,y\right)=L\left(x,y,k\upsigma \right)-L\left(x,y,\upsigma \right) $$

(7)

In practice, the Gaussian difference images are obtained by subtracting adjacent layers of the Gaussian pyramid within each group (see Fig. 5).
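The subtraction in Formula (7) can be sketched as follows. This is a minimal illustration, assuming the Gaussian layers of one group have already been computed and are stored as flat arrays of equal size; the type alias `Image` and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<double>;  // row-major pixel values of one layer

// Formula (7): D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma).
// Input: Gaussian-blurred layers of one group, ordered by increasing scale.
// Output: one DoG layer per adjacent pair, so n layers yield n - 1 differences.
std::vector<Image> differenceOfGaussians(const std::vector<Image>& layers) {
    assert(layers.size() >= 2);
    std::vector<Image> dog;
    for (std::size_t i = 0; i + 1 < layers.size(); ++i) {
        assert(layers[i].size() == layers[i + 1].size());
        Image diff(layers[i].size());
        for (std::size_t p = 0; p < diff.size(); ++p)
            diff[p] = layers[i + 1][p] - layers[i][p];  // upper layer minus lower layer
        dog.push_back(diff);
    }
    return dog;
}
```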

5.
Space extreme point detection

Local extreme points in the Gaussian difference space constitute the key points. To find them, each pixel is compared with all of its neighbors in the same layer and in the two adjacent layers of the same group to determine whether it is larger or smaller than all of them. As Fig. 6 shows, the red detection point in the middle is compared with 26 points: its 8 neighbors at the same scale and the 9 corresponding points in each of the scales above and below. This ensures that extreme points are detected in both image space and scale space.
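The 26-neighbor comparison can be written compactly. The sketch below is an illustrative assumption of how such a test might look: the names are hypothetical, the three DoG layers are assumed to have equal size, and a strict comparison is used so that flat regions are not reported as extrema.

```cpp
#include <cassert>
#include <vector>

using Layer = std::vector<std::vector<double>>;  // layer[y][x]

// Returns true when pixel (x, y) of the middle DoG layer is strictly greater
// or strictly smaller than all 26 neighbours: 8 in its own layer and 9 in
// each of the layers directly below and above it in scale.
bool isScaleSpaceExtremum(const Layer& below, const Layer& mid,
                          const Layer& above, int x, int y) {
    assert(below.size() == mid.size() && above.size() == mid.size());
    assert(y >= 1 && y + 1 < static_cast<int>(mid.size()));
    assert(x >= 1 && x + 1 < static_cast<int>(mid[0].size()));
    const Layer* stack[3] = {&below, &mid, &above};
    const double centre = mid[y][x];
    bool isMax = true;
    bool isMin = true;
    for (int s = 0; s < 3; ++s) {
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                if (s == 1 && dx == 0 && dy == 0) continue;  // skip the centre itself
                const double v = (*stack[s])[y + dy][x + dx];
                if (v >= centre) isMax = false;
                if (v <= centre) isMin = false;
            }
        }
    }
    return isMax || isMin;
}
```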

As the right image of Fig. 5 shows, detecting extreme points at N scales in each group requires N + 2 layers in the DoG pyramid and N + 3 layers in the Gaussian pyramid. Because of factors such as edge response, not all extreme points generated in this way are stable.

Key points include location, scale, and orientation. To maintain invariance of view angle and illumination, the key points need to be described by a set of vectors. The descriptors need to include key points and pixels that contribute to the key points. Meanwhile, independent characteristics of the descriptors should be ensured to improve the probability of correct matching of feature points.

Key point matching is divided as follows.

1.
Gradient calculation

The modulus and orientation are determined by Formula (8).

$$ m\left(x,y\right)=\sqrt{{\left(N\left(x+1,y\right)-N\left(x-1,y\right)\right)}^2+{\left(N\left(x,y+1\right)-N\left(x,y-1\right)\right)}^2} $$

$$ \uptheta \left(x,y\right)=\mathrm{atan}2\left(N\left(x,y+1\right)-N\left(x,y-1\right),\ N\left(x+1,y\right)-N\left(x-1,y\right)\right) $$

(8)

where N represents scale space value of key points.
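Formula (8) amounts to central differences followed by a square root and a two-argument arctangent. A minimal sketch, assuming the smoothed image N is stored as `N[y][x]` and the pixel is not on the border (the struct and function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Gradient {
    double magnitude;    // m(x, y)
    double orientation;  // theta(x, y) in radians, as returned by atan2
};

// Formula (8) by central differences on the smoothed image N (indexed N[y][x]).
Gradient gradientAt(const std::vector<std::vector<double>>& N, int x, int y) {
    assert(y >= 1 && y + 1 < static_cast<int>(N.size()));
    assert(x >= 1 && x + 1 < static_cast<int>(N[0].size()));
    const double dx = N[y][x + 1] - N[y][x - 1];  // N(x+1, y) - N(x-1, y)
    const double dy = N[y + 1][x] - N[y - 1][x];  // N(x, y+1) - N(x, y-1)
    return {std::sqrt(dx * dx + dy * dy), std::atan2(dy, dx)};
}
```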

2.
Gradient histogram statistics

The gradient magnitudes and orientations of the pixels in the neighborhood of a key point are accumulated in a histogram. As Fig. 7 shows, the orientation ranges from 0 to 360°, with one bin per 10° and 36 bins in all. The histogram peak represents the dominant gradient orientation in the neighborhood of the feature point and is taken as the main orientation of the key point [16]. Any other peak that reaches 80% of the main peak is retained as an auxiliary orientation, which improves matching robustness.
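The histogram statistics can be sketched as below. This is a simplified illustration under stated assumptions: each neighborhood pixel contributes its gradient magnitude to one of 36 bins, every bin that reaches 80% of the main peak is reported as auxiliary, and refinements such as Gaussian weighting or peak interpolation are omitted. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Sample {
    double magnitude;  // gradient modulus of one neighbourhood pixel
    double angleDeg;   // gradient orientation in degrees
};

struct Orientations {
    int mainBin;                     // bin with the highest accumulated magnitude
    std::vector<int> auxiliaryBins;  // other bins reaching 80% of the main peak
};

// 36-bin histogram (10 degrees per bin) weighted by gradient magnitude.
Orientations dominantOrientations(const std::vector<Sample>& samples) {
    assert(!samples.empty());
    std::vector<double> hist(36, 0.0);
    for (const Sample& s : samples) {
        double a = std::fmod(s.angleDeg, 360.0);
        if (a < 0.0) a += 360.0;  // wrap negative angles into [0, 360]
        const int bin = static_cast<int>(a / 10.0) % 36;
        hist[bin] += s.magnitude;
    }
    int mainBin = 0;
    for (int i = 1; i < 36; ++i)
        if (hist[i] > hist[mainBin]) mainBin = i;
    Orientations out{mainBin, {}};
    for (int i = 0; i < 36; ++i)
        if (i != mainBin && hist[i] > 0.0 && hist[i] >= 0.8 * hist[mainBin])
            out.auxiliaryBins.push_back(i);
    return out;
}
```

Bin i covers the range from 10·i to 10·(i + 1) degrees, so a returned bin index can be converted back to an orientation by taking the bin centre.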

The algorithm does not end once key points have been matched, because a large number of mismatched points occur during matching. The RANSAC (random sample consensus) method is generally used to eliminate mismatched points in the SIFT matching algorithm [17]. Its core is continual iteration through repeated random sampling and testing.
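To make the iteration concrete, here is a deliberately simplified RANSAC sketch. It assumes the two images differ only by a 2D translation, so a single match is a minimal sample; practical SIFT pipelines usually fit a richer model such as a homography, and the paper does not state which model it uses. The struct, function name, tolerance, and iteration count are all illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Match {
    double x1, y1;  // key point in the first image
    double x2, y2;  // matched key point in the second image
};

// Keeps the matches consistent with the best translation found by RANSAC.
std::vector<Match> ransacTranslation(const std::vector<Match>& matches,
                                     double tolerance, int iterations,
                                     unsigned seed) {
    assert(!matches.empty() && tolerance >= 0.0 && iterations > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, matches.size() - 1);
    std::vector<Match> best;
    for (int it = 0; it < iterations; ++it) {
        const Match& s = matches[pick(rng)];  // minimal sample: one match
        const double tx = s.x2 - s.x1;        // hypothesised translation
        const double ty = s.y2 - s.y1;
        std::vector<Match> inliers;
        for (const Match& m : matches) {
            const double ex = m.x1 + tx - m.x2;  // residual under the hypothesis
            const double ey = m.y1 + ty - m.y2;
            if (std::hypot(ex, ey) <= tolerance) inliers.push_back(m);
        }
        if (inliers.size() > best.size()) best = inliers;  // keep the largest consensus set
    }
    return best;
}
```

Matches outside the returned consensus set are treated as mismatches and discarded.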

3.3 SIFT image recognition

The SIFT method searches scale space for stable extreme points and generates local feature descriptors from them. The stable points are highly representative and distinctive, but only a small number of them are generated per picture. Therefore, the RTS game system in this work adopts the classical SIFT matching algorithm as its baseline and manually determines the accuracy and mismatch rate of each match; that is, matching performance is described by accuracy and misrecognition rate. Accuracy is defined as the ratio of the number of valid matches retained to the total number of targets. The misrecognition rate is the ratio of the number of unmatched key points (the total number of key points minus the match points) to the total number of key points. The identification process consists of an image pre-processing module, a feature extraction module, an image description module, a training module, and an identification module. With this algorithm, a package for recognizing an image atlas is exported in Easy AR.

1.
Pre-processing module

The pre-processing module outputs the region of the image that contains the object. It filters the image, searches for the contour, and establishes the minimum bounding rectangle.

① The image is converted to grayscale by the formula L = 0.299 ∗ R + 0.587 ∗ G + 0.114 ∗ B.
② A median filter and a Gaussian filter are applied to the original image separately.
③ The grayscale image is inverted and then binarized.
④ Contours are searched for in the binary image.
⑤ A minimum bounding rectangle is established for the obtained contour, and the image inside this rectangle is output.
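Steps ① and ③ can be sketched with the standard library alone. The following is only an illustration: the type and function names are hypothetical, the filtering of step ② is skipped, and the binarization threshold is an assumed parameter because the paper does not state one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Step 1: luminance L = 0.299*R + 0.587*G + 0.114*B, rounded to 0..255.
std::uint8_t toGray(const Rgb& p) {
    const double l = 0.299 * p.r + 0.587 * p.g + 0.114 * p.b;
    return static_cast<std::uint8_t>(l + 0.5);
}

// Step 3: invert the grey level, then binarise against a threshold.
// The threshold is a hypothetical parameter; the paper does not state one.
std::vector<std::uint8_t> invertAndBinarize(const std::vector<Rgb>& pixels,
                                            std::uint8_t threshold) {
    assert(!pixels.empty());
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (const Rgb& p : pixels) {
        const std::uint8_t inverted = static_cast<std::uint8_t>(255 - toGray(p));
        out.push_back(inverted > threshold ? 255 : 0);
    }
    return out;
}
```

After inversion, dark objects on a light background become white foreground pixels, which is the form a contour search usually expects.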

2.
Module on feature extraction

Feature extraction is divided into two steps. First, the image is read in grayscale mode and the key points are detected, producing a set of key points. Second, the features are described. The key code is as follows:

SiftDescriptorExtractor extractor;               // OpenCV 2.4.x nonfree module, namespace cv
Mat descriptor;                                  // receives one descriptor row per key point
extractor.compute(img, keypoints, descriptor);   // img: grayscale image; keypoints: detected in the first step

Code in the first line declares a SIFT feature descriptor generator. The second line declares a Mat type of data used to store description. The third line directly uses the calculation method of descriptor generator to calculate the descriptor.

3.
Module on image description

This module is to describe the image by spatial pyramid model. The specific process is as follows:

Step 1: Extract the SIFT key points of the image by using the initial layer scale of 1, with the initial layer grid width of 6.
Step 2: Extract the key points of SIFT feature of the original image.
Step 3: Combine the key points assemblage generated in Steps 1 and 2 to form a new set of key points.
Step 4: Describe the SIFT feature of the key points assemblage generated in Step 3, and output the image description vector.

4.
Key point matching

A curve is fitted to the DoG function in scale space, and the principal curvature ratio is calculated from the eigenvalues, which eliminates edge feature points and low-contrast feature points. After the gradient of each extreme point has been calculated, the adjacent region is divided according to the gradient direction of the feature point given by the gradient histogram. The feature point descriptor is then generated through integration, feature points below the threshold value are excluded, and the number of matching points is counted. A threshold is then set: if the number of matching points exceeds the threshold, the program ends and outputs the match result; otherwise, the approximate component of the first-layer image is used to repeat Steps 2–6, and the program ends after a matching result is obtained.

4 System application

To verify the effectiveness of the improved SIFT algorithm, we developed a mobile AR game named LastStandStan [18]. Integrating the SIFT image recognition algorithm, AR technology, and RTS gameplay, the game uses the proposed SIFT recognition technology to recognize pictures taken at random by users.

The game tells a science fiction story set in the future. Because humans waste too many of the earth's resources, the earth finally becomes deserted after 50,000 years as its resources are depleted. Human beings have to leave their homes and drift through space in a spacecraft, but the space environment is not suitable for earth life to breed. All earth life therefore lies dormant in the energy-depleted spacecraft, leaving only the robots to carry on with the instructions given by their developers. As the energy is about to run out, the machine system of the spacecraft starts its final program to create an unprecedented super robot. When the robot is about to wake up, its radio waves connect to the brain waves of players from the distant past. Players need to help the robot complete various production and combat-readiness tasks. The robot is the virtual avatar through which the player controls the spacecraft. Players advance the game through collection, creation, and war, and ultimately complete the final goal of life research.

4.1 Game design features

The 3D models of LastStandStan were built in 3ds Max 2014, and the virtual interaction was implemented in the Unity3D 5.0 engine. To provide the AR function, a plug-in based on Easy AR was developed that performs image recognition with the improved SIFT algorithm. The game output is displayed on the screen of the intelligent mobile terminal [19]. The game design has the following characteristics:

1.
The game integrates the task sequences and construction features of RTS games and blends in characteristics of tower-defense games in the battle scenes. All operations are converted into changes of various numerical values.
2.
The player selects a robot as the protagonist, which provides operating instructions when the player switches game levels. Players tap the UI or the corresponding objects on the touch screen, and different numerical feedback is returned depending on the object tapped. The first time players enter the game, they receive a prompt asking whether to choose a robot [20,21,22,23,24]. When a level loads, the robot prompts the player with the next step and with the daily collection of energy. The energy is used to produce all kinds of equipment for wars, life research, and repair of the spaceship.
3.
Interaction is implemented on AR virtual objects. An interaction is triggered when the touch point coincides with a fixed point on the virtual object. For example, in the spacecraft scene, when the player taps the cabin in the middle of the model, the next-step command is triggered and the cabin changes color to indicate it. The player can then switch levels by tapping the confirmation button again.
4.
Two endings are designed for the game. In one, when the intelligence reaches 99%, the player sacrifices the robot to complete life research, which awakens the sleeping life of the earth. In the other, the player retains the robot when the intelligence reaches 99%, and a new mechanical civilization emerges after a few years.
5.
Different operations are required in the various cabins to advance the game. In the collection cabin, energy collectors should be built to speed up energy collection. The repair cabin can enhance the durability of the spacecraft (the game is lost when durability falls below 0). The library cabin, whose content remains to be expanded, allows players to view background information on the game. The combat cabin can be used to construct defensive equipment and create steed knights for warfare. A steed knight has six lives and must be re-manufactured after they have been used up. The player can enter the battle scene by tapping a steed knight.
6.
The combat scene is designed as a hexagonal terrain on which enemies are generated at random locations. The enemies come in large and small sizes and blow themselves up when they touch the defense equipment or the base. Large monsters move relatively faster and deal more damage to the base. The defensive devices detect surrounding enemies at regular intervals and destroy them. A durability of 0 results in game failure. After all enemies have been destroyed and the battle is won, the player is rewarded with minerals, energy, and metal.
7.
The biological engineering cabin hosts the main task of the game, where players enhance the artificial intelligence using energy as well as the minerals and metal obtained from wars. When the artificial intelligence reaches 99%, the final story of the game is triggered, in which the player either retains the mechanical civilization or completes life research.

4.2 Artificial intelligence

To make the game more enjoyable, behavior logic is added to the game design. This logic is controlled by the system rather than the player and gives feedback through numerical values [25]. When a single player fights against the AI, the AI forces are assembled in random combinations and fight after the player's image has been recognized by AR. The AI behaves as follows: if it has the advantage in battle, it automatically chases the player and attacks once it is within a certain distance; if it is the weaker side, it escapes in a random direction. If both sides belong to the same branch of armed forces, one of two states is generated at random: actively pursuing the player and attacking within a certain distance, or escaping in a random direction. Table 1 lists the specific artificial intelligence behaviors.

Table 1 Artificial intelligence
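The behavior logic described in this subsection can be expressed as a small decision function. The sketch below is an assumption-laden illustration, not the game's actual code: the enum and function names, the `attackRange` parameter, the 50/50 split for units of the same branch, and making the random choice on every call are all hypothetical.

```cpp
#include <cassert>
#include <random>

enum class Matchup { Advantage, Even, Disadvantage };  // AI unit versus player unit
enum class Action { Chase, Attack, Flee };

// Chooses the AI action for one update.
// attackRange and the 50/50 split for even matchups are assumed values.
template <typename Rng>
Action chooseAction(Matchup matchup, double distance, double attackRange, Rng& rng) {
    assert(distance >= 0.0 && attackRange >= 0.0);
    bool aggressive = (matchup == Matchup::Advantage);
    if (matchup == Matchup::Even) {
        std::bernoulli_distribution coin(0.5);  // same branch: pick a state at random
        aggressive = coin(rng);
    }
    if (!aggressive) return Action::Flee;  // weaker side escapes in a random direction
    return distance <= attackRange ? Action::Attack : Action::Chase;
}
```

A caller would typically pass a seeded generator such as `std::mt19937` and, for the flee case, draw the escape direction from the same generator.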
