What is Object?

Let's consider a typical interaction with a software application. For example, we know that we need to type a name in the name field. What do we do?

User Login Input Box

Suppose that the application is running. We try to find the place on the screen where the name should be entered:

  • Part 1: We are looking for something like a text box. It may have a label "User:" or "Name:".
  • Part 2: We click there and type the name.
  • Part 3: Sometimes we look at what we just typed to see if the entry is correct.

A UI automation tool does exactly the same. An interaction has the same 3 parts:

  • Part 1: Locating a widget.
  • Part 2: Interacting with it.
  • Part 3: Checking widget state.

Now suppose that the application has been updated. We see a slightly modified user interface where the text box has changed its style and, maybe, its location.

Now we spend a bit more time recognizing the input box (if we were familiar with the previous layout). Once we realize where it is, we click and type the name.

While the 1st part changes when the layout is modified, the 2nd part of the interaction stays the same: click and enter a value. This brings us to the essential idea of splitting the test tool logic so that object recognition is separated from object interaction and the two parts are never mixed.

Also, when we talk about object recognition we may consider different approaches. Some people read labels before entering text. Others rely on intuition. For routine tasks, such as logging in every day, we expect the input text box to be in the same place every time and are prepared to click on it even while the application is still loading and is just about to be shown on the screen. Some people have a good memory and feeling for colors, so graphical style guides them to the right entry fields.

So it appears natural to have a similar separation in an automated test, and this has become a common approach. Even with open source and ad-hoc solutions involving Selenium WebDriver and Java programming, it is recommended to use the 'Page Object' pattern, which separates object location from interactions. The primary goal of this separation is maintainability (http://toolsqa.com/selenium-webdriver/page-object-model/).

We may also note that object interaction stays the same. If it is a text box, then we can type text in it. So if an input field is moved or replaced with another text field, it is expected to follow the same interaction pattern, or expose the same behavioral API.

To accommodate such an interaction pattern, some UI testing tools introduce the notion of a UI Object, or just Object. Rapise implements this abstraction.

A UI Object is a combination of the object location, a set of interaction actions and state accessors, i.e.:

  1. Object location is a way to find a widget on the screen. In Rapise these are called Locators.
  2. Interaction actions are operations such as click, press and type text (in Rapise those are called Behaviors).
  3. State accessors are APIs to find out widget text, color, checked state and so on (Get- and Set- properties in Rapise).

In Rapise, UI Objects are saved in the object repository and used during test execution. They may be shared between multiple test scenarios.
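The three parts of a UI Object can be sketched in plain JavaScript. This is an illustrative model only, not Rapise's implementation: `fakeScreen`, `makeUiObject`, `setText`, `getText` and the locator string are all hypothetical names chosen for the example.

```javascript
// Illustrative model only; every name here is hypothetical, not Rapise API.

// A fake screen: widgets stored under a location key.
const fakeScreen = {
  "LoginForm/UserName": { type: "textbox", text: "" }
};

// Build a UI Object from a locator string.
function makeUiObject(locator) {
  // 1. Location: the only part that knows where the widget lives.
  function locate() {
    const widget = fakeScreen[locator];
    if (!widget) throw new Error("Object not found: " + locator);
    return widget;
  }
  return {
    // 2. Interaction action: type text into the widget.
    setText(value) { locate().text = value; },
    // 3. State accessor: read the current text back.
    getText() { return locate().text; }
  };
}

const userName = makeUiObject("LoginForm/UserName");
userName.setText("alice");
console.log(userName.getText()); // prints "alice"
```

In this model the actions and accessors never inspect the locator, so a layout change would only require a different locator string.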

Object Locators

In many cases recognizing an object and interacting with it require different skills. This difference becomes deeper in UI automation. While locating an object we need to understand how to find a widget: find a window, then find a region/tab/form within that window, then find an input text widget. The term 'find' has a special meaning depending on the technology. We use different APIs for that: it may be Microsoft Active Accessibility, UI Automation, internal application UI model hooking (Managed, Java, ActiveX) or DOM model interaction.

Here we introduce the notion of an object Locator, whose goal is to provide a simple, clear way of finding a widget within its application. A Locator makes object recognition straightforward while keeping it simple and easy to modify.

Locator Features

Soon we will describe different implementations of Object Locators. How do we tell whether a specific locator is good or bad? Let's talk about the expected features of a locator to see which one is better.

First, a locator should be resilient. It should survive when small updates to the UI design are introduced. Various heuristics are applied to achieve a stable, permanent locator.

For web tests we need even more than resistance to small UI changes. Since we expect the same web app to be rendered by different browsers and mobile devices, we should have a locator that is browser-independent. Luckily there is such a technology: XPath. Rapise uses it for locating web elements.

The same object in Rapise may have several locators. In this case Rapise tries them one by one until something is found. Usually locators are ordered from the fastest and most explicit ones to the more generic and slower ones.
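The "try one by one" idea can be sketched as a simple fallback loop. This is a minimal illustration under assumed names (`resolveWithFallback`, `pathLocator`, `byNameLocator` and the widget tree are hypothetical), not the way Rapise stores or evaluates locators.

```javascript
// Hypothetical sketch: try each locator in order and return the first hit.
function resolveWithFallback(locators, root) {
  for (const locator of locators) {
    const found = locator.find(root);
    if (found) return { widget: found, usedLocator: locator.name };
  }
  return null; // no locator matched: object not found
}

// A tiny fake widget tree.
const calcWindow = {
  name: "Calculator",
  children: [
    { name: "7", children: [] },
    { name: "8", children: [] }
  ]
};

// Fast and explicit: follow a fixed path of child indexes.
const pathLocator = {
  name: "path",
  find(root) {
    let node = root;
    for (const index of [0, 5]) { // deliberately stale path
      node = node && node.children[index];
    }
    return node || null;
  }
};

// Generic and slower: depth-first search by name.
const byNameLocator = {
  name: "byname",
  find(root) {
    if (root.name === "8") return root;
    for (const child of root.children) {
      const hit = this.find(child);
      if (hit) return hit;
    }
    return null;
  }
};

const result = resolveWithFallback([pathLocator, byNameLocator], calcWindow);
console.log(result.usedLocator); // prints "byname"
```

The stale path locator fails quickly, so the slower search is only paid for when the explicit locator no longer works.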

Consider an interaction with an application exposing Microsoft Active Accessibility (MSAA). We use the "Generic" Rapise recording library and the standard Windows calculator as an example:


There are several ways of locating an object in it.

MSAA Locator

MSAA allows applications to expose a tree structure that represents an internal hierarchy of the UI.


Rapise creates a locator when recording an object. The Locator part represents the position in the tree:

MSAA Path Locator

You may notice that the path points to an object with role ROLE_SYSTEM_WINDOW that contains its own children, including ROLE_SYSTEM_PUSHBUTTON. Rapise uses smart structure matching, described later in this article.

Ordinal Locator

An Ordinal locator is a simple way of finding an object by its name. We say that a widget has ordinal number N if it is the Nth in the 0-based array of all objects with matching window class, name and role found recursively within the whole application window.

Here is how it looks for the button '7' in the calculator:

MSAA Ordinal Locator

Here the Ordinal Number is an index in the array of all matching objects.

The logic behind this locator is UI persistence. It is good for objects having a long explicit name, like 'Total Mileage'. Normally there are only one or a few such widgets and the one we need keeps the same order.

The drawback of this locator is the need to recursively search the whole MSAA tree. This may be a very slow operation. In some AUTs (applications under test) it takes seconds or minutes.

This is one of the reasons why the ordinal locator is not automatically generated during recording (its construction may slow down the recorder). It is automatically added for objects created using the Learn feature. We assume that if an object is learned then the user has some more time, so Rapise searches deeper and tries to recognize the object better.
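The ordinal lookup described above can be sketched as a recursive collection followed by an index. This is a simplified model over a plain object tree; `collectMatches`, `findByOrdinal` and the sample tree are hypothetical and do not call MSAA.

```javascript
// Collect, in depth-first order, every node matching class, name and role.
function collectMatches(node, criteria, out = []) {
  if (node.windowClass === criteria.windowClass &&
      node.name === criteria.name &&
      node.role === criteria.role) {
    out.push(node);
  }
  for (const child of node.children || []) {
    collectMatches(child, criteria, out);
  }
  return out;
}

// The ordinal is a 0-based index into the array of all matches.
function findByOrdinal(root, criteria, ordinal) {
  const matches = collectMatches(root, criteria); // visits the whole tree
  return matches[ordinal] || null;
}

// Hypothetical application window with two "OK" buttons.
const appWindow = {
  windowClass: "Frame", name: "Main", role: "window",
  children: [
    { windowClass: "Button", name: "OK", role: "push button", id: "first", children: [] },
    { windowClass: "Panel", name: "Details", role: "pane", children: [
      { windowClass: "Button", name: "OK", role: "push button", id: "second", children: [] }
    ] }
  ]
};

const okButton = { windowClass: "Button", name: "OK", role: "push button" };
console.log(findByOrdinal(appWindow, okButton, 1).id); // prints "second"
```

Because `collectMatches` always walks every node, the cost grows with the size of the whole tree, which is the slowness mentioned above.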

Rectangle Locator

This is the most unstable type of locator. We remember the rectangle of the widget (relative to the window position).

MSAA Rect Locator

So Rapise checks if there is an object with the same size and position within the window and then uses it for further object matching.
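A minimal sketch of rectangle matching shows why this locator is fragile. The function `findByRect` and the coordinates are hypothetical; the only assumption taken from the text is that the saved rectangle is relative to the window position.

```javascript
// Find a widget whose size and window-relative position equal the saved rectangle.
function findByRect(windowRect, widgets, saved) {
  return widgets.find(w => {
    const relX = w.rect.x - windowRect.x; // convert screen to window coordinates
    const relY = w.rect.y - windowRect.y;
    return relX === saved.x && relY === saved.y &&
           w.rect.w === saved.w && w.rect.h === saved.h;
  }) || null;
}

// Rectangle recorded relative to the window.
const saved = { x: 20, y: 60, w: 40, h: 30 };

// Window at (100, 200): the button sits at screen position (120, 260).
const button = { name: "7", rect: { x: 120, y: 260, w: 40, h: 30 } };
console.log(findByRect({ x: 100, y: 200 }, [button], saved) === button); // prints true

// Moving the whole window does not matter, the relative position is unchanged.
const movedButton = { name: "7", rect: { x: 320, y: 360, w: 40, h: 30 } };
console.log(findByRect({ x: 300, y: 300 }, [movedButton], saved) === movedButton); // prints true

// A 5-pixel layout shift inside the window is enough to lose the widget.
const shiftedButton = { name: "7", rect: { x: 125, y: 260, w: 40, h: 30 } };
console.log(findByRect({ x: 100, y: 200 }, [shiftedButton], saved)); // prints null
```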

byname and bytext locators

There are 2 special types of locators:

'byname' - find widget with matching name (or ID, if applicable) within current application window. byname

'bytext' - find widget with matching Text (or value) within current application window. bytext

See how it is done for .NET and for UI Automation

These locators are slow because they imply a recursive search. The benefit is stability, because such an approach is less bound by the control tree structure.
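The trade-off can be illustrated with a small recursive search. The helpers `findFirst`, `byName` and `byText` and both layouts are hypothetical; they only model the idea of searching the whole window by name or text.

```javascript
// Depth-first search for the first node satisfying a predicate.
function findFirst(node, predicate) {
  if (predicate(node)) return node;
  for (const child of node.children || []) {
    const hit = findFirst(child, predicate);
    if (hit) return hit;
  }
  return null;
}

// 'byname': match on name (or ID, if the node has one).
function byName(root, name) {
  return findFirst(root, n => n.name === name || n.id === name);
}

// 'bytext': match on text (or value).
function byText(root, text) {
  return findFirst(root, n => n.text === text || n.value === text);
}

// The same widgets in two differently structured trees.
const layoutV1 = { name: "Login", children: [
  { name: "userName", text: "", children: [] },
  { name: "submit", text: "Sign in", children: [] }
] };
const layoutV2 = { name: "Login", children: [
  { name: "panel", children: [
    { name: "row", children: [
      { name: "submit", text: "Sign in", children: [] }
    ] }
  ] },
  { name: "userName", text: "", children: [] }
] };

console.log(byName(layoutV1, "submit").text); // prints "Sign in"
console.log(byText(layoutV2, "Sign in").name); // prints "submit"
```

Both lookups succeed although the button moved two levels deeper, at the price of visiting nodes until a match is found.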

Manual (API based) location

It is possible to find an object using API calls. Here are some examples of how this could be done for web and for UI Automation. See how it may be done in web.

Technology-specific Locators

We just considered several locators based on MSAA. However, MSAA is a very general approach and it is not supported by some applications (such as Java Swing, Silverlight). Sometimes it is weak (some controls and widgets are not represented) and sometimes it is too slow, so traversing the tree takes a long time. So Rapise has a number of technology-focused libraries that support the location methods provided by other technologies.

UI Automation

Microsoft UI Automation (UIA) is an application programming interface (API) that allows one to access, identify, and manipulate the user interface (UI) elements of another application. UIA is a successor to Microsoft Active Accessibility.

UIA exposes every piece of the UI to client applications as an Automation Element. Elements are contained in a tree structure, with the desktop as the root element.

UIA Tree

The UI Automation tree is more concise and clear compared to MSAA, so a locator may look like a /-separated path:

UIA Path

Here the path elements are built using the Automation ID (if available) or the Name property of each element in the tree path:

UIA Properties

From the application developer's perspective it is important to take care to assign an Automation ID to key elements. This makes UI Automation faster and the produced locator paths cleaner.
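Resolving such a path can be sketched as a walk down the tree, one segment at a time. The function names and the sample tree below are hypothetical, and the sketch assumes that names never contain the "/" character.

```javascript
// Key used for a path segment: Automation ID when present, otherwise Name.
function segmentKey(element) {
  return element.automationId || element.name;
}

// Build the path from the chain of elements leading to the target.
function buildUiaPath(chain) {
  return chain.map(segmentKey).join("/");
}

// Walk the tree one segment at a time; return null when a segment is missing.
function resolveUiaPath(root, path) {
  let current = root;
  for (const segment of path.split("/").filter(Boolean)) {
    current = (current.children || []).find(c => segmentKey(c) === segment);
    if (!current) return null;
  }
  return current;
}

// Hypothetical element tree with the desktop as the root.
const seven = { name: "Seven", automationId: "num7Button", children: [] };
const pad = { name: "Number pad", automationId: "NumberPad", children: [seven] };
const calc = { name: "Calculator", children: [pad] };
const desktop = { name: "Desktop", children: [calc] };

const path = buildUiaPath([calc, pad, seven]);
console.log(path); // prints "Calculator/NumberPad/num7Button"
console.log(resolveUiaPath(desktop, path).name); // prints "Seven"
```

Elements with an Automation ID keep the same segment even if their visible Name is changed or translated, which is one reason such paths stay cleaner.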

Java Path

Java has several UI libraries (AWT, Swing, SWT) exposing different protocols.

For AWT and Swing Rapise has deep integration using Java Access Bridge technology.

This gives a tree structure containing internal object names and Java class names:

Java Tree

Those names are used to produce a /-separated path (as with UIA).

Java Location

Standard Widget Toolkit (SWT) technology exposes UI Automation APIs, so Rapise has an SWT support library with UIA locators.

.NET Windows Forms

Windows Forms (WinForms) is a graphical (GUI) class library included as a part of Microsoft .NET Framework, providing a platform to write rich client applications for desktop, laptop, and tablet PCs.

Rapise has deep support for WinForms apps, so the Spy can see the tree of controls with their type and text contents:

WinForms Tree

The locator is constructed as a /-separated path of control Names:

WinForms Locator

Web Browsers and Selenium Targets

Rapise uses the fact that all web browsers have a strong, standardized way of representing a web page.

When HTML text defining a web page is loaded, the browser creates a Document Object Model of the page (HTML DOM).

Rapise Spy has a Web Spy mode displaying DOM tree for currently loaded pages and frames:

Web Tree

There is a well-known language, XPath, designed for finding nodes in the DOM tree. Rapise uses it as a universal, cross-browser element locator.

Web Locator XPath

In addition, Rapise uses JavaScript as the language of its internal representation, and it is the same language that is used to make web applications live. So Rapise is friendly to representation formats such as JSON.

Mobile Targets

Rapise connects to mobile devices and emulators via Appium.

For native mobile apps the native tree is shown:

Mobile Tree

The locator for native apps is a /-separated path.

For mobile web apps the tree is similar to web:

Mobile Web App Tree

The locator here is XPath, so the recorded object is also cross-browser.

Location Process

Now we can go over the process of locator resolution.

  1. Find a Window. The window is found on the desktop using the Window Name and Window Class properties. For mobile and Selenium targets this step is bypassed.

    Object-specific window information is taken from the location description:


    Rapise then looks through the list of top-level windows:

    HWND Tree

    In many cases regular expressions are used to match window name or class name.

    This process may result in an array of matching windows. In this case the next phase is applied to the 1st found window and, if nothing is found there, proceeds to the others in turn. If nothing is found in any existing window then Rapise assumes that the object is not found.

  2. Connect to technology. At this point Rapise has a root Window and tries to establish a connection to technology-specific APIs, i.e.:

    • For web browsers it is usually a connection to a plugin running inside the browser. Rapise has plugins and extensions for all major browsers.

    • For Selenium profiles it is a connection to a local or remote Selenium server.

    • For Mobile targets it is a connection to a local or remote Appium instance.

    • For Java it is a connection to Java Access Bridge.

    • For .NET WinForms it is Rapise's own ManagedProxy connecting to the message loop of the remote process.

    • For VB6 it is Rapise's own technology that is able to interact with VB forms.

    • Otherwise it is a connection to UI Automation and MSAA using the Windows Automation API.

  3. Resolve location. Next, the locator is resolved using the provided path, the found window and the technology API entry point. If the locator is sufficient then Rapise obtains a technology-specific object (window handle, IAccessible pointer, Managed ControlProxy, link to web browser, etc.). If nothing is found by the given locator then the next window is checked.

  4. Property matching. Once we have a pointer to an object found by the locator, it is time to match its properties to make sure that the locator found what we actually expected to find.

    Usually the object name is checked. It may be the name, ID, caption or text of the object. If the name is not expected to stay the same, one may set the Ignore Object Name locator option in the object properties:


    Alternatively, the Name in the locator may be made more universal using regular expressions.

  5. Pattern matching. For MSAA and UIA, additional deeper matching logic is applied. This is important for complex objects. For example, a Calculator button in MSAA is a subtree containing several elements:

    MSAA Button Tree

    So matching the whole button means matching an object of type ROLE_SYSTEM_WINDOW that has a child with role ROLE_SYSTEM_PUSHBUTTON.

    Similar structures are used to represent Combo Boxes, Lists, Trees and so on. Rapise allows defining Matcher Rules that describe such a structure. You may learn more from this tutorial.
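The steps above can be sketched end to end over a plain object tree. This is a simplified model with hypothetical names (`findWindows`, `resolveIndexPath`, `nameMatches`, `matchesRule`, `locate`); the locator is modelled as a list of child indexes, and step 2 (connecting to the technology) is left out because it depends on the target.

```javascript
// Step 1: keep top-level windows whose name and class match the patterns.
function findWindows(topLevelWindows, nameRe, classRe) {
  return topLevelWindows.filter(w => nameRe.test(w.name) && classRe.test(w.className));
}

// Step 3: resolve the locator, modelled here as a list of child indexes.
function resolveIndexPath(win, path) {
  let node = win;
  for (const index of path) {
    node = (node.children || [])[index];
    if (!node) return null;
  }
  return node;
}

// Step 4: property matching on the object name (plain string or RegExp).
function nameMatches(node, expected, ignoreName) {
  if (ignoreName) return true;
  return expected instanceof RegExp ? expected.test(node.name) : node.name === expected;
}

// Step 5: pattern matching; every child rule must match some child.
function matchesRule(node, rule) {
  if (node.role !== rule.role) return false;
  return (rule.children || []).every(childRule =>
    (node.children || []).some(child => matchesRule(child, childRule)));
}

// Try each matching window in turn until one yields the expected object.
function locate(topLevelWindows, d) {
  for (const win of findWindows(topLevelWindows, d.windowName, d.windowClass)) {
    const node = resolveIndexPath(win, d.path);
    if (node && nameMatches(node, d.name, d.ignoreName) && matchesRule(node, d.rule)) {
      return node;
    }
  }
  return null; // object not found in any matching window
}

const windows = [
  { name: "Notepad", className: "Notepad", children: [] },
  { name: "Calculator", className: "CalcFrame", children: [] }, // matches, but is empty
  { name: "Calculator", className: "CalcFrame", children: [
    { role: "ROLE_SYSTEM_WINDOW", name: "7", children: [
      { role: "ROLE_SYSTEM_PUSHBUTTON", name: "7", children: [] }
    ] }
  ] }
];

const description = {
  windowName: /^Calc/, windowClass: /^CalcFrame$/,
  path: [0], name: "7", ignoreName: false,
  rule: { role: "ROLE_SYSTEM_WINDOW", children: [{ role: "ROLE_SYSTEM_PUSHBUTTON" }] }
};

const found = locate(windows, description);
console.log(found.children[0].role); // prints "ROLE_SYSTEM_PUSHBUTTON"
```

The first "Calculator" window matches by name and class but contains nothing at the path, so the search continues with the second one, as described in step 1.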

Object Recording

Many automation testing tools allow manual object location. Rapise provides recording and learning capabilities so object locators are built automatically.

Automatic creation of a locator is usually a trivial task. However, making it fast, resilient, short and clear is a tricky, complex process. Rapise has many different heuristics used for recording objects for different technologies and under various circumstances. These heuristics are extended and upgraded all the time.

Automatic creation is possible when the application is well designed and all objects have structure, names or unique IDs that make recognition easier.
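To illustrate why unique IDs help, here is one simple heuristic for generating an XPath from a DOM-like tree: stop at the nearest ancestor with an ID, otherwise fall back to positional steps. This is not Rapise's recording heuristic; `el` and `buildXPath` are hypothetical helpers, and IDs are assumed to contain no quote characters.

```javascript
// Create a DOM-like node and link its children back to it.
function el(tag, id, children = []) {
  const node = { tag, id, children, parent: null };
  for (const child of children) child.parent = node;
  return node;
}

// Walk up from the node; anchor the path at the first element that has an ID.
function buildXPath(node) {
  const steps = [];
  for (let cur = node; cur; cur = cur.parent) {
    if (cur.id) {
      steps.unshift(`//${cur.tag}[@id='${cur.id}']`);
      return steps.join("");
    }
    // No ID: use the 1-based position among siblings with the same tag.
    const siblings = cur.parent
      ? cur.parent.children.filter(c => c.tag === cur.tag)
      : [cur];
    steps.unshift(`/${cur.tag}[${siblings.indexOf(cur) + 1}]`);
  }
  return steps.join("");
}

// A form with an ID gives a short locator that ignores the outer layout.
const userInput = el("input", null);
const passInput = el("input", null);
const page = el("html", null, [el("body", null, [el("form", "login", [userInput, passInput])])]);
console.log(buildXPath(passInput)); // prints "//form[@id='login']/input[2]"

// Without any ID the locator depends on every level of the page structure.
const bareInput = el("input", null);
const barePage = el("html", null, [el("body", null, [el("div", null, [bareInput])])]);
console.log(buildXPath(bareInput)); // prints "/html[1]/body[1]/div[1]/input[1]"
```

The second path breaks as soon as any wrapper element is added or removed, while the first survives any change outside the form.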

Why Recording may be Ineffective

In the real world it is common that an application being automated looks perfect externally but has issues inside. This makes object recognition non-optimal: the produced locators are unclear, weak or non-resilient.

In some cases object IDs are set, but change after each rebuild or each application restart. This confuses the recorder because it relies on ID persistence.

Sometimes IDs are not unique. For example, there might be two fields with ID 'name' where one means the user's name and the other is the name of the user's pet.

Usually the situation is not that simple. In many cases an application is mostly recording-friendly except for a few forms or screens.

How to Improve Recording Quality

The easiest way is to improve the application. In many cases assigning IDs and names is simple. Developers don't do it because they don't feel the need for it and no one asks for it. But if you ask them, they usually will.

Sometimes the application source code is unavailable or frozen, so we have to adapt to the application as-is. This is where Rapise provides a set of powerful and useful features.


Rapise allows learning an object by pointing at it. In many cases learning is similar to Recording. However, Learning may be more powerful in the depth and quality of object recognition.

For example, the Generic library may generate additional locators when one learns an object, because it has more time for deeper analysis.

There are more powerful ways for learning an object using the Spy.

Spy / WebSpy / Mobile Spy

The Spy tool provides even more capabilities for high quality learning of objects.

With the Spy one may choose the exact item to be learned from the elements tree. Pointing at an item on the screen is not always possible or easy, because elements may overlap.

Depending on the chosen technology the Spy may provide additional benefits:

  1. Mobile Spy
    • Provides the ability to view the device screen.
    • Allows sending and recording mobile actions (such as Tap).
    • Lets you choose a preferred locator method and try it.
  2. Web Spy
    • Enables several XPath heuristics and shows suggested XPath versions to choose for Learn.
    • Allows trying different XPath versions to develop an optimal one.

    WebSpy XPath

    • While Cross Browser recording only works with direct browser libraries, Web Spy also enables Learn for Selenium targets so it is possible to record remote targets such as Safari.


Another strong feature dedicated to the maintenance of locators is Re-Learn. If the AUT has changed over time so that widgets are still there but are not visible anymore, it is possible to use Re-Learn and point to them.


Advanced Object Location Methods

There are more advanced methods for object location that may be useful in exceptional cases.

The first is Optical Character Recognition (OCR). OCR is the conversion of images of text into editable text. It still requires proper Windows display, theme and font settings to make the recognition quality high enough. However, it may be a solution for very advanced tasks where no APIs are available to read the screen text.

Rapise provides a sample UsingOCR demonstrating this feature.

Another possible way is to use accessibility APIs directly in your test script and gather the required object. This is a way to reproduce the work done by a locator and patch it with variations.


We hope that this topic has shed more light on object recognition internals. Rapise interacts with an application through Accessibility APIs. Usually application widgets are represented by an Object Tree that may be seen in the Spy application. And for the automated test it is all wrapped by the UI Object. In JavaScript we get it when we do:

    // Find an object

    // Interact with an object
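The two steps can be modelled with a small self-contained sketch. It does not use the Rapise API: `repository`, `findObject`, `doSetText`, `getText` and both screens are hypothetical stand-ins that only show how a test script stays unchanged when a locator is updated.

```javascript
// Hypothetical object repository: object IDs mapped to locators.
const repository = { UserName: { locator: "LoginForm/UserName" } };

// Hypothetical application screens before and after a redesign.
const screenV1 = { "LoginForm/UserName": { text: "" } };
const screenV2 = { "Header/LoginPanel/UserName": { text: "" } };

// Find an object: look up its locator and resolve it on the screen.
function findObject(screen, id) {
  const widget = screen[repository[id].locator];
  if (!widget) throw new Error("Object not found: " + id);
  // Behaviors and state accessors exposed to the test script.
  return {
    doSetText(value) { widget.text = value; },
    getText() { return widget.text; }
  };
}

// The test script itself never mentions a locator.
function loginTest(screen) {
  const userName = findObject(screen, "UserName"); // find an object
  userName.doSetText("alice");                     // interact with the object
  return userName.getText();                       // check its state
}

console.log(loginTest(screenV1)); // prints "alice"

// After the redesign only the repository entry is updated.
repository.UserName.locator = "Header/LoginPanel/UserName";
console.log(loginTest(screenV2)); // prints "alice"
```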

A UI Object has an explicit Locator and Behavior, so object recognition and object interaction are separated. Easy ways of updating Locators make automated tests more maintainable. Spy and Learn features allow the creation of more stable and resilient locators.