Doksi Docs
Device API

Interaction

Tap, swipe, type text, and press keys on the device.

These tools send input directly to the device screen. Every interaction tool requires a reason parameter: a first-person description of the action and its intent. The reason is captured for audit logging.
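As a minimal sketch of the reason requirement, the helper below builds the arguments for an interaction tool call and rejects an empty reason. The `build_call` name and the `{"tool": ..., "arguments": ...}` payload shape are assumptions for illustration; this page does not specify how calls are transmitted.

```python
from typing import Any

# Tool names documented on this page.
INTERACTION_TOOLS = {"tap", "long_press", "swipe", "type_text", "press_key"}


def build_call(tool: str, reason: str, **params: Any) -> dict[str, Any]:
    """Build a call payload (hypothetical shape) with a mandatory reason."""
    if tool not in INTERACTION_TOOLS:
        raise ValueError(f"unknown interaction tool: {tool}")
    if not reason.strip():
        raise ValueError("reason is required for every interaction tool")
    # The reason travels alongside the tool-specific parameters.
    return {"tool": tool, "arguments": {**params, "reason": reason}}


call = build_call("tap", "I am tapping the Email field to focus it", x=540, y=750)
print(call)
```

Validating the reason on the client side is a design choice in this sketch; it surfaces a missing reason before any call is sent.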

All interaction tools return a diff of what changed on screen as a result of the action.
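The returned diff can be grouped by section for programmatic checks. The sketch below assumes the layout of the sample outputs on this page (a summary line, then headers such as "Elements changed:" followed by indented entries); the real format may differ, and `parse_diff` is a hypothetical helper.

```python
def parse_diff(output: str) -> dict[str, list[str]]:
    """Group diff lines under their section headers.

    Assumes a summary line first, then unindented headers ending in ":"
    followed by indented entries, as in the sample outputs.
    """
    sections: dict[str, list[str]] = {"summary": []}
    current = "summary"
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line.startswith(" ") and line.rstrip().endswith(":"):
            current = line.strip().rstrip(":")
            sections[current] = []
        else:
            sections[current].append(line.strip())
    return sections


sample = """Tapped at (540, 750).

Elements changed:
  EditText "Email" focused=false -> focused=true
  Keyboard appeared
"""

diff = parse_diff(sample)
keyboard_shown = "Keyboard appeared" in diff.get("Elements changed", [])
print(diff["summary"], keyboard_shown)
```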

tap

Tap at specific screen coordinates. Use get_textual_state to find element positions.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| x | number | Yes | X coordinate |
| y | number | Yes | Y coordinate |
| reason | string | Yes | Intent description |

Example output:

Tapped at (540, 750).

Elements changed:
  EditText "Email" focused=false -> focused=true
  Keyboard appeared
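Element entries in the swipe sample output on this page carry bounds such as [120,1600][960,1700]. Assuming get_textual_state reports positions in the same form, and that the two pairs are the top-left and bottom-right corners (neither is stated here), the center of the box gives usable tap coordinates. `center_of` is a hypothetical helper.

```python
import re

# Matches bounds written as [left,top][right,bottom] (assumed corner order).
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def center_of(entry: str) -> tuple[int, int]:
    """Return the integer center point of the first bounds box in an entry."""
    match = BOUNDS_RE.search(entry)
    if match is None:
        raise ValueError(f"no bounds found in: {entry!r}")
    left, top, right, bottom = map(int, match.groups())
    return (left + right) // 2, (top + bottom) // 2


x, y = center_of('TextView "Item 5" [120,1600][960,1700]')
tap_args = {"x": x, "y": y, "reason": "I am tapping Item 5 to open it"}
print(tap_args)
```

Tapping the center rather than a corner leaves the most margin if the reported bounds are slightly off.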

long_press

Long press at a coordinate. Useful for context menus, drag initiation, or selection.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| x | number | Yes | X coordinate |
| y | number | Yes | Y coordinate |
| duration | number | No | Hold duration in milliseconds. Default: 1000 |
| reason | string | Yes | Intent description |
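A short sketch of building long_press arguments follows. Omitting duration relies on the documented default of 1000 ms. The `long_press_args` helper and the positive-duration check are assumptions, not documented behavior.

```python
from typing import Optional

DEFAULT_LONG_PRESS_MS = 1000  # default stated in the parameter table


def long_press_args(x: float, y: float, reason: str,
                    duration: Optional[float] = None) -> dict:
    """Build long_press arguments; leave duration out to use the default."""
    if duration is not None and duration <= 0:
        # Assumed constraint: a hold needs a positive length.
        raise ValueError("duration must be a positive number of milliseconds")
    args = {"x": x, "y": y, "reason": reason}
    if duration is not None:
        args["duration"] = duration
    return args


default_hold = long_press_args(540, 750, "I am opening the context menu for this message")
slow_hold = long_press_args(540, 750, "I am holding longer to start a drag", duration=1500)
print(default_hold)
print(slow_hold)
```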

swipe

Swipe between two points. Use for scrolling, dismissing cards, pull-to-refresh, and drawer gestures.

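| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| startX | number | Yes | Start X |
| startY | number | Yes | Start Y |
| endX | number | Yes | End X |
| endY | number | Yes | End Y |
| duration | number | No | Swipe duration in milliseconds. Default: 300 |
| reason | string | Yes | Intent description |

Example output:

Swiped from (540, 1800) to (540, 400).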

Elements added:
  TextView "Item 5" [120,1600][960,1700]
  TextView "Item 6" [120,1720][960,1820]
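For scrolling, the swipe endpoints can be derived from the screen size. The sketch below assumes a 1080 x 2400 screen with Y increasing downward, and picks the 75% and 25% height marks as endpoints; the screen size, the fractions, and the `scroll_swipe_args` helper are all illustrative assumptions.

```python
def scroll_swipe_args(width: int, height: int, direction: str, reason: str,
                      duration: int = 300) -> dict:
    """Build swipe arguments for a vertical scroll along the center line.

    direction="down" reveals content further down the list, so the finger
    moves from low on the screen to high. direction="up" is the reverse.
    """
    x = width // 2
    low, high = int(height * 0.75), int(height * 0.25)
    if direction == "down":
        start_y, end_y = low, high
    elif direction == "up":
        start_y, end_y = high, low
    else:
        raise ValueError("direction must be 'up' or 'down'")
    return {"startX": x, "startY": start_y, "endX": x, "endY": end_y,
            "duration": duration, "reason": reason}


args = scroll_swipe_args(1080, 2400, "down", "I am scrolling to find more items")
print(args)
```

Keeping the endpoints away from the screen edges is a cautious choice here, since edge swipes are often reserved for system gestures.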

type_text

Type text into the currently focused input field. The keyboard must be visible — verify with get_general_state before calling.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Text to type |
| reason | string | Yes | Intent description |

Example output:

Typed "user@example.com".
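The keyboard precondition can be enforced before a type_text call is built. In the sketch below, `keyboard_visible` is a boolean the caller derives from get_general_state; the shape of that tool's response is not documented on this page, so the check is left to the caller. `plan_type_text` and the payload shape are hypothetical.

```python
def plan_type_text(keyboard_visible: bool, text: str, reason: str) -> dict:
    """Return a type_text payload only when the keyboard is already visible."""
    if not keyboard_visible:
        # Typing without a focused field would have nowhere to send input.
        raise RuntimeError("keyboard is not visible; tap an input field first")
    if not text:
        raise ValueError("text must not be empty")
    return {"tool": "type_text", "arguments": {"text": text, "reason": reason}}


payload = plan_type_text(True, "user@example.com", "I am entering the login email")
print(payload)
```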

press_key

Press a system key or send a numeric key code.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| key | string | Yes | back, home, enter, recents, or a numeric key code |
| reason | string | Yes | Intent description |

Example output:

Pressed key: back.
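The accepted key values can be validated before calling press_key. This sketch assumes the named keys are lowercase and that a numeric key code is passed as a decimal string, since the parameter type is string; `press_key_args` is a hypothetical helper.

```python
from typing import Union

NAMED_KEYS = {"back", "home", "enter", "recents"}  # names listed in the table


def press_key_args(key: Union[str, int], reason: str) -> dict:
    """Build press_key arguments from a named key or a numeric key code."""
    key_str = str(key).strip()
    is_code = key_str.isascii() and key_str.isdigit()
    if key_str not in NAMED_KEYS and not is_code:
        raise ValueError(f"unsupported key: {key!r}")
    return {"key": key_str, "reason": reason}


named = press_key_args("back", "I am returning to the previous screen")
# 66 is an arbitrary example code; its meaning depends on the device's key map.
coded = press_key_args(66, "I am sending a raw key code")
print(named, coded)
```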
