Exploring PC WeChat Automation with Python: uiautomation + OpenCV + EasyOCR

1 Background

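As PC WeChat keeps being upgraded (especially from 4.x onward), the traditional automation approach based on the UI Automation control tree has gradually stopped working. Newer versions of WeChat make heavy use of a cross-platform, self-drawn UI (canvas-style rendering, similar in spirit to Electron/Skia/Qt). The whole window then behaves like a single canvas: every button and input box is painted rather than being a system control. As a result:

Inspect.exe cannot see the internal controls
pywinauto / uiautomation cannot locate buttons directly
Automation scripts become less stable

Moreover, to curb gray-market device farming, bulk marketing, self-run commercial operations and similar abuse, WeChat may well strengthen its active anti-automation measures further. For legitimate, compliant use, the route that currently remains dependable is: window control + screenshots + template matching + OCR + simulated mouse and keyboard input, that is, fully imitating a human operator. As large AI models mature, this route should become more capable as well. Drawing on practical engineering experience, this article shows how to use: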

uiautomation
OpenCV
EasyOCR

to build a relatively stable WeChat window automation framework. It is meant as a starting point for further discussion.

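2 Environment Setup

First install Python and the required libraries. Python installation is well covered elsewhere; this article uses Python 3.11.9. Install the dependencies as follows: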

pip install uiautomation
pip install opencv-python
pip install pillow
pip install numpy
pip install easyocr

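If the program reports a missing module at run time, install the corresponding package with pip. When the package name is not obvious from the module name (cv2, for example, comes from opencv-python), an AI assistant such as Doubao will tell you which package to install.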

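3 Key Libraries

3.1 uiautomation

uiautomation is a Python library built on the Microsoft UI Automation framework. It is used for automating Windows applications, software testing, and building assistive tools. More material and demo examples are available on the project's GitHub page; a brief introduction to its usage follows.

1 Basic operations

(1) Mouse click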

import uiautomation as uia
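# Click at absolute screen coordinates
uia.Click(x=100, y=200)
# Click relative to a control ('确定' is the "OK" button on a Chinese-language system)
button = uia.ButtonControl(Name='确定')
button.Click(ratioX=0.5, ratioY=0.5)  # click the center of the control

(2) Keyboard input

# Type text
uia.SendKeys('Hello World')
# Simulate a shortcut
uia.SendKeys('{Ctrl}c')  # copy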

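2 Window management

(1) Controlling how a window is shown

window = uia.WindowControl(Name='记事本')  # '记事本' is the Notepad title on Chinese-language Windows
# Set the window state; the second argument is the Win32 nCmdShow value
uia.ShowWindow(window.NativeWindowHandle, 3)  # 3 = SW_SHOWMAXIMIZED (maximize); 1 would only restore the normal size
uia.ShowWindow(window.NativeWindowHandle, 2)  # 2 = SW_SHOWMINIMIZED (minimize)

(2) Waiting for a window

wcwin = uia.WindowControl(searchDepth=1, Name='微信')
if wcwin.Exists(3, 1):
    print("WeChat found; its window will be activated")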

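The two arguments of Exists (maxSearchSeconds and searchIntervalSeconds) can be read as "search repeatedly at a fixed interval within the given time". With Exists(3, 1), the window is looked up immediately at second 0, and True is returned as soon as it is found. If it is not found, the call sleeps 1 second and tries again, up to about the 3-second mark, after which it returns False. Note that calling Exists() with no arguments does not appear to return immediately: judging from the library source, maxSearchSeconds defaults to 5 seconds and the interval to the module-level SEARCH_INTERVAL. For a single immediate check, pass the timeout explicitly, for example Exists(0).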
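
The polling behaviour described above can be sketched without any Windows dependency. The helper below is a hypothetical illustration: the name wait_until and its parameters are introduced here and are not part of uiautomation. It only mirrors the "check now, sleep, check again until the deadline" idea and is not the library's actual implementation.

```python
import time

def wait_until(predicate, max_seconds=3.0, interval=1.0):
    """Poll predicate() until it returns True or max_seconds elapse.

    Hypothetical helper for illustration; not part of uiautomation.
    """
    deadline = time.monotonic() + max_seconds
    while True:
        if predicate():              # the first check happens immediately
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:           # time budget used up
            return False
        time.sleep(min(interval, remaining))

# Usage: the condition becomes true on the third check.
calls = {"n": 0}

def window_appeared():
    calls["n"] += 1
    return calls["n"] >= 3

found = wait_until(window_appeared, max_seconds=1.0, interval=0.05)
print(found, calls["n"])
```

In a real script the predicate could be something like a template match or an OCR lookup, which is useful once the control tree is no longer available.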

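3 Control patterns and advanced operations

(1) InvokePattern (activating a control)

import uiautomation as uia

button = uia.ButtonControl(Name='确定')
button.GetInvokePattern().Invoke()

uia.ButtonControl(Name='确定') (search criteria)

This line does not search immediately. It only creates a "search descriptor" that says: find a control of type Button whose name is "确定" (OK).

button.GetInvokePattern() (obtaining the pattern)

This is central to the UI Automation (UIA) standard: UIA abstracts what a control can do into "patterns", and InvokePattern handles click-like trigger actions. Because the pattern has to come from a real element, the deferred search is effectively carried out once the element is first needed in this chain: the program walks the desktop UI tree according to the descriptor and takes the first match.

.Invoke() (performing the action)

This is the instruction that actually acts: it sends a "trigger" command to the control that was found. The figure below shows the desktop UI tree as displayed by the Inspect tool:

Defining the control directly through uia.ButtonControl, as in the sample code, searches the whole desktop by default. That is dangerous, because it may hit a "确定" button on the taskbar, in a background program, or in some unrelated window. Real code normally restricts the scope, for example by searching inside a window: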

import uiautomation as uia

# First locate the target window
window = uia.WindowControl(Name="记事本")  # or ClassName="Notepad"
# Then look for "确定" (OK) only inside this window
button = window.ButtonControl(Name="确定")
button.GetInvokePattern().Invoke()

If a window contains several "确定" buttons, foundIndex selects which one:

# The first match
button1 = window.ButtonControl(Name="确定", foundIndex=1)
# The second match
button2 = window.ButtonControl(Name="确定", foundIndex=2)

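(2) ValuePattern (reading and setting values)

edit = uia.EditControl(Name='输入框')  # placeholder name meaning "input box"
# Read the value
print(edit.GetValuePattern().Value)
# Set the value
edit.GetValuePattern().SetValue('New Text')

(3) SelectionPattern (selection control)

combo = uia.ComboBoxControl(Name='下拉框')  # placeholder name meaning "drop-down list"
selected_items = combo.GetSelectionPattern().GetSelection()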

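4 Context manager

(1) Thread-safe initialization

with uia.UIAutomationInitializerInThread():
    # Make sure the COM environment is initialized correctly
    main_window = uia.WindowControl(Name='主窗口')  # placeholder name meaning "main window"
    main_window.SetActive()

Inside this block the UI Automation environment of the current thread is initialized, so the window can be operated safely. The context manager initializes the Windows COM environment for the current thread. uiautomation relies on Windows COM interfaces underneath; the main thread is initialized automatically, whereas worker threads must be initialized manually. Without it, calls from a worker thread typically fail with errors such as RPC_E_CHANGED_MODE, access violations, or failed calls. In short, whenever uiautomation is used in a worker thread, the code must be wrapped in this statement.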

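5 Control properties and locating controls

(1) Common control properties

(2) Ways to locate a control

# Locate by name
button = uia.ButtonControl(Name='确定')
# Locate by automation ID
edit = uia.EditControl(AutomationId='txtInput')
# Locate by walking the hierarchy
root = uia.GetRootControl()
for child in root.GetChildren():
    # ControlTypeName carries the "Control" suffix, e.g. 'WindowControl'
    if child.ControlTypeName == 'WindowControl':
        print(child.Name)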

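6 Exception handling and optimization

(1) Handling controls that are not visible

target = uia.ButtonControl(Name='隐藏按钮')  # placeholder name meaning "hidden button"
if target.Exists():
    if target.IsOffscreen:
        target.SendKeys('{PageDown}')  # scroll it into view
    target.Click()

(2) Handling timeouts

for _ in range(3):
    try:
        window = uia.WindowControl(Name='延迟窗口')  # placeholder name meaning "delayed window"
        if window.Exists(1):  # 1-second timeout
            window.Click()
            break
    except Exception as e:
        print(f'Operation failed: {e}')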
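
The retry loop above can be generalized into a small helper. The following is a minimal sketch: retry, attempts and delay are names introduced here for illustration, and flaky_click merely stands in for a UI operation that fails a couple of times before succeeding.

```python
import time

def retry(action, attempts=3, delay=0.1):
    """Call action() up to `attempts` times; return its result or re-raise the last error."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as e:       # real code should catch a narrower exception type
            last_error = e
            print(f"attempt {attempt} failed: {e}")
            if attempt < attempts:
                time.sleep(delay)
    raise last_error

# Stand-in for a UI operation that only succeeds on the third try.
state = {"tries": 0}

def flaky_click():
    state["tries"] += 1
    if state["tries"] < 3:
        raise RuntimeError("window not ready")
    return "clicked"

outcome = retry(flaky_click, attempts=3, delay=0.01)
print(outcome)
```

Re-raising the last error instead of swallowing it keeps a persistent failure visible to the caller, which matters when later steps depend on the click having happened.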

3.2 easyocr

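1 How it works

The principle behind EasyOCR is simple: first find where the text is, then read it. The pipeline has two core steps:

(1) Text detection

Purpose: find every text region in the image and put a box around it

Underlying algorithm: CRAFT, which scores every pixel for how text-like it is and then links adjacent text into a rectangular box

Output: a set of text box coordinates.

(2) Text recognition

Purpose: turn the image inside each box into actual text
Underlying algorithm: CRNN, a deep learning model that takes the cropped image as input
Output: text content + confidence (conf)

The complete flow is:

Image input
   ↓
[Preprocessing] resize, grayscale, contrast
   ↓
[Text detection] find all text regions (CRAFT)
   ↓
[Cropping] cut out each text region
   ↓
[Text recognition] turn each crop into text (CRNN)
   ↓
[Output] coordinates + text + confidence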

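2 Technical strengths

PyTorch: runs the deep learning models

CRAFT: finds where the text is

CRNN: converts the image into text

CRAFT detects irregular, small, and slanted text reliably, and CRNN is dedicated to end-to-end text recognition. Its training data includes a large amount of screen text, UI text, and poster text, which gives it a clear advantage over traditional tools for this kind of task. A comparison with Tesseract follows.

The conclusion first: as a rough rule of thumb, EasyOCR gives better results than Tesseract in about 90% of scenarios, while Tesseract does better in specific ones (pure documents, printed text, long passages).

The core comparison is as follows:

EasyOCR is therefore a good fit for desktop automation, screenshot recognition, and reading UI text (photos, screenshots, application interfaces), whereas Tesseract is a good fit for bulk text extraction and document recognition (documents, scans).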

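3 Usage examples

(1) Default parameters

Suitable when you only want the text and do not need the coordinates.

import easyocr

# 1. Create a reader and specify the languages: Simplified Chinese + English
reader = easyocr.Reader(['ch_sim', 'en'])

# 2. Read the text in the image
result = reader.readtext('test.png')

# 3. Print the result
for bbox, text, conf in result:
    print(f"text: {text}, confidence: {conf:.2f}")

(2) With commonly used parameters

Suitable when you need both the recognized text and its coordinates.

image_path = 'test.png'  # path to the image (same sample file as above)
result = reader.readtext(
    image_path,
    text_threshold=0.3,  # text confidence threshold used during detection (lower = more text picked up, but more false positives)
    low_text=0.3,     # lower-bound score for text regions (lower = faint text is picked up too)
    detail=1          # 1 = return box + text + confidence; 0 = return text only
)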

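The returned value has the following format (a list with one entry per detected text region):

[
   ( [ [x1,y1], [x2,y1], [x2,y2], [x1,y2] ],  "text content",  confidence ),
   ...
]

bbox: the four corner coordinates of the text (top-left, top-right, bottom-right, bottom-left)

text: the recognized text

conf: confidence (0 to 1)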
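
Given that format, clicking on a piece of text comes down to locating the entry and taking the center of its box. The helpers below are a minimal sketch: bbox_center and find_text are names introduced here, and sample is hand-written data in the readtext(detail=1) shape rather than real OCR output.

```python
def bbox_center(bbox):
    """Center of a 4-point box, computed from its top-left and bottom-right corners."""
    (x1, y1), (x2, y2) = bbox[0], bbox[2]
    return int((x1 + x2) / 2), int((y1 + y2) / 2)

def find_text(results, keyword, min_conf=0.3):
    """Return the center of the first entry whose text contains keyword, else None."""
    for bbox, text, conf in results:
        if conf >= min_conf and keyword in text:
            return bbox_center(bbox)
    return None

# Hand-written sample in the readtext(detail=1) shape; not real OCR output.
sample = [
    ([[10, 10], [90, 10], [90, 30], [10, 30]], "微信", 0.95),
    ([[100, 40], [180, 40], [180, 60], [100, 60]], "搜索", 0.80),
    ([[100, 80], [180, 80], [180, 100], [100, 100]], "搜索", 0.10),
]
print(find_text(sample, "搜"))
```

The returned point is relative to the image that was passed to the OCR step, so the window's left and top offsets still have to be added before clicking on the desktop.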

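(3) Supported languages

ch_sim: Simplified Chinese, ch_tra: Traditional Chinese, en: English, ja: Japanese, ko: Korean

(4) Recommended parameters for UI automation and screenshot recognition

result = reader.readtext(
    "screenshot.png",
    text_threshold=0.2,
    low_text=0.2,
    detail=1
)

These low thresholds make detection more sensitive, which suits interface text, button labels, and small text, at the cost of more false positives to filter out by confidence.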

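4 Example Programs

4.1 Template matching (OpenCV)

Put simply, template matching takes a small image (the template), slides it across a large image comparing as it goes, and finds the position that is identical or most similar.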

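import cv2

img = cv2.imread("screenshot.png")     # large image to search in (placeholder file name)
template = cv2.imread("template.png")  # small template image (placeholder file name)

result = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)

# 1. Get the minimum and maximum scores and their positions
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

# 2. The best match is at max_loc (top-left corner of the matched area)
best_x, best_y = max_loc
best_score = max_val  # highest matching score

# 3. Use a threshold to decide whether the match is valid
threshold = 0.8
if best_score >= threshold:
    print("Best match found at:", best_x, best_y, "score:", best_score)
else:
    print("No valid match found")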

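In the code, template is the template image and img is the large image to search in. The TM_CCOEFF_NORMED method is used, and the best match (the maximum of the result map) is accepted only if its score reaches the threshold. The template image must be prepared in advance. The returned coordinates are the top-left corner of the match within the large image; depending on the situation, they may still need to be converted to a position on the whole desktop.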
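
The conversion from a match position to a desktop click point is plain arithmetic. A minimal sketch, assuming the large image is a screenshot of exactly the window rectangle; match_to_screen is a name introduced here for illustration:

```python
def match_to_screen(max_loc, template_size, window_origin):
    """Convert a template-match top-left corner into the desktop coordinates of the match center.

    max_loc:       (x, y) of the best match inside the window screenshot
    template_size: (width, height) of the template image
    window_origin: (left, top) of the window on the desktop
    """
    x, y = max_loc
    w, h = template_size
    left, top = window_origin
    return left + x + w // 2, top + y + h // 2

# Example: a 32x32 icon found at (20, 300) in a window whose top-left corner is at (400, 150).
click_x, click_y = match_to_screen((20, 300), (32, 32), (400, 150))
print(click_x, click_y)
```

Adding half the template size moves the point from the corner of the match to its center, which is usually the safest place to click.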

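4.2 Mouse drag

A mouse drag first needs a start and an end position, plus the total duration and the number of steps to split it into. The sequence is: move to the start position, press the left button, move step by step, release the left button. Example code: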

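import ctypes
import time

user32 = ctypes.windll.user32

def mouse_drag(start_x, start_y, end_x, end_y, duration=0.5, steps=20):
    """
    Simulate a drag with the left mouse button
    start_x, start_y: start point in screen coordinates
    end_x, end_y: end point in screen coordinates
    duration: total drag time in seconds
    steps: number of steps the drag is split into
    """
    print("Simulating mouse drag ({},{})->({},{})".format(start_x, start_y, end_x, end_y))
    # Move to the start point
    user32.SetCursorPos(int(start_x), int(start_y))
    time.sleep(0.1)

    # Press the left button
    user32.mouse_event(2, 0, 0, 0, 0)  # MOUSEEVENTF_LEFTDOWN
    time.sleep(0.1)

    # Drag in small linear steps
    for i in range(1, steps + 1):
        x = int(start_x + (end_x - start_x) * i / steps)
        y = int(start_y + (end_y - start_y) * i / steps)
        user32.SetCursorPos(x, y)
        time.sleep(duration / steps)

    # Release the left button
    user32.mouse_event(4, 0, 0, 0, 0)  # MOUSEEVENTF_LEFTUP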
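
The interpolation inside the loop can be checked on its own, without moving the real cursor. The sketch below computes the same points as the loop above; drag_path is a name introduced here purely for illustration.

```python
def drag_path(start, end, steps=20):
    """Return the integer points visited by a linear drag from start to end."""
    (sx, sy), (ex, ey) = start, end
    return [
        (int(sx + (ex - sx) * i / steps), int(sy + (ey - sy) * i / steps))
        for i in range(1, steps + 1)
    ]

# A vertical drag upward in 4 steps.
path = drag_path((100, 500), (100, 100), steps=4)
print(path)
```

The last point always equals the end position, so the button is released exactly where intended; the pause between points is duration / steps seconds.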

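4.3 Complete code

The code below is a simple example of operating WeChat. It looks up a contact by name in the contact list, selects the contact and sends the message "你好" ("hello"), then simulates dragging the scrollbar with the mouse to view the message history. The complete code: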

import uiautomation as uia
import os
import ctypes
import cv2
import numpy as np
from PIL import ImageGrab
import easyocr
import time

# ⭐ DPI awareness (very important): 2 = per-monitor DPI aware, so coordinates are physical pixels
ctypes.windll.shcore.SetProcessDpiAwareness(2)

user32 = ctypes.windll.user32
screen_w = user32.GetSystemMetrics(0)
screen_h = user32.GetSystemMetrics(1)
print(f"Screen resolution: {screen_w}x{screen_h}")
wcleft = wctop = wcright = wcbottom = 0
wcwidth = wcheight = 0
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
print("Loading easyocr")
reader = easyocr.Reader(['ch_sim', 'en'], gpu=False, verbose=False)

def TempMatch(tempName):
    # ⭐ Capture the window area
    img_pil = ImageGrab.grab(bbox=(wcleft, wctop, wcright, wcbottom))
    img = cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)

    # ⭐ Load the template icon
    template = cv2.imread(tempName)
    if template is None:
        # cv2.imread returns None instead of raising when the file cannot be read
        print("Cannot read template: " + tempName)
        return False, 0, 0
    h, w = template.shape[:2]

    # ⭐ Template matching
    result = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
    threshold = 0.8
    found = False
    screen_x = screen_y = 0
    '''
    Alternative (disabled): take the first location whose score reaches the threshold
    loc = np.where(result >= threshold)
    for pt in zip(*loc[::-1]):
        found = True
        # ⭐ Icon position inside the window
        icon_x = pt[0]
        icon_y = pt[1]
        # ⭐ Convert to desktop coordinates
        screen_x = int(wcleft + icon_x + w // 2)
        screen_y = int(wctop + icon_y + h // 2)
        print("Icon found: "+tempName)
        print("Icon position in window:", icon_x, icon_y)
        print("Icon position on desktop:", screen_x, screen_y)
        # ⭐ Draw a box around the match
        cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
        cv2.imwrite("match_result.png", img)
        break
    '''
    # 1. Get the minimum and maximum scores and their positions
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

    # 2. The best match is at max_loc
    best_score = max_val  # highest matching score

    # 3. Use the threshold to decide whether the match is valid
    if best_score >= threshold:
        found = True
        icon_x, icon_y = max_loc
        # ⭐ Convert to desktop coordinates (center of the matched area)
        screen_x = int(wcleft + icon_x + w // 2)
        screen_y = int(wctop + icon_y + h // 2)
        print("Icon found: "+tempName)
        print("Icon position in window:", icon_x, icon_y)
        print("Icon position on desktop:", screen_x, screen_y)
        # ⭐ Draw a box around the match
        cv2.rectangle(img, (icon_x, icon_y), (icon_x + w, icon_y + h), (0, 0, 255), 2)
        cv2.imwrite("match_result.png", img)

    return found, screen_x, screen_y

def mouse_drag(start_x, start_y, end_x, end_y, duration=0.5, steps=20):
    """
    Simulate a drag with the left mouse button
    start_x, start_y: start point in screen coordinates
    end_x, end_y: end point in screen coordinates
    duration: total drag time in seconds
    steps: number of steps the drag is split into
    """
    print("Simulating mouse drag ({},{})->({},{})".format(start_x, start_y, end_x, end_y))
    # Move to the start point
    user32.SetCursorPos(int(start_x), int(start_y))
    time.sleep(0.1)

    # Press the left button
    user32.mouse_event(2, 0, 0, 0, 0)  # MOUSEEVENTF_LEFTDOWN
    time.sleep(0.1)

    # Drag in small linear steps
    for i in range(1, steps + 1):
        x = int(start_x + (end_x - start_x) * i / steps)
        y = int(start_y + (end_y - start_y) * i / steps)
        user32.SetCursorPos(x, y)
        time.sleep(duration / steps)

    # Release the left button
    user32.mouse_event(4, 0, 0, 0, 0)  # MOUSEEVENTF_LEFTUP

wcwin = uia.WindowControl(searchDepth=1, Name='微信')
if wcwin.Exists(3, 1):      # poll for up to 3 seconds at 1-second intervals; returns True as soon as the window exists
    print("WeChat found; activating its window")
    wcwin.SetActive()
    #wcwin.SetTopmost(True)
    wcwin.CaptureToImage('wcMain.png')
    # ⭐ Get the window position on the desktop
    rect = wcwin.BoundingRectangle

    wcleft, wctop, wcright, wcbottom = rect.left, rect.top, rect.right, rect.bottom
    wcwidth, wcheight = wcright - wcleft, wcbottom - wctop
    print(f"WeChat window position: {wcleft},{wctop},{wcwidth},{wcheight}")

    template_path = os.path.join(BASE_DIR, "contact.png")
    found, screen_x, screen_y = TempMatch(template_path)
    if not found:
        print("Icon not found")
    else:
        uia.Click(screen_x, screen_y)
        wcwin.CaptureToImage('wcWnd.png')
        # Run OCR over the full window screenshot
        result = reader.readtext(r'wcWnd.png', text_threshold=0.3, low_text=0.3)
        search_x = search_y = 0
        print('=== ALL TEXT (low threshold) ===')
        for detection in result:
            bbox, text, conf = detection
            cx = int((bbox[0][0] + bbox[2][0]) / 2)
            cy = int((bbox[0][1] + bbox[2][1]) / 2)
            print(f'({cx:4d},{cy:4d}) {text} [{conf:.2f}]')
            # '搜索' ("Search") is the text shown in the search box; either character counts as a hit
            if '搜' in text or '索' in text:
                search_x, search_y = wcleft+cx, wctop+cy
                print("Search box found")
                break
        if search_x > 0:
            print("Clicking the search box and typing the contact name")
            uia.Click(search_x, search_y)
            uia.SendKeys('ZWW')
            time.sleep(0.5)
            wcwin.CaptureToImage('wcWnd.png')
            # Run OCR over the full window screenshot again
            result = reader.readtext(r'wcWnd.png', text_threshold=0.3, low_text=0.3)
            search_x = search_y = 0
            #print('=== ALL TEXT (low threshold) ===')
            findIdx = 0
            for detection in result:
                bbox, text, conf = detection
                cx = int((bbox[0][0] + bbox[2][0]) / 2)
                cy = int((bbox[0][1] + bbox[2][1]) / 2)
                print(f'({cx:4d},{cy:4d}) {text} [{conf:.2f}]')

                if 'Z' in text or 'W' in text:
                    findIdx += 1
                    # Skip the first hit (presumably the text just typed into the search box) and take the second
                    if findIdx > 1:
                        search_x, search_y = wcleft+cx, wctop+cy
                        break
            if search_x > 0:
                print("Contact found; clicking it and sending '你好'")
                uia.Click(search_x, search_y)
                uia.SendKeys('你好')
                uia.SendKeys('{Enter}')
            else:
                print("Contact not found")
        print("Moving the mouse to the center of the WeChat window")
        # move to the center of wechat
        user32.SetCursorPos(wcleft+wcwidth//2, wctop+wcheight//2)
        template_path = os.path.join(BASE_DIR, "dragdown.png")
        found, screen_x, screen_y = TempMatch(template_path)
        if not found:
            print("Icon not found")
        else:
            print("Dragging the scrollbar")
            # In the latest WeChat the scrollbar at the far right is very sensitive: a slight miss turns the drag
            # into a window resize, so offset by -3 to stay away from the window border
            mouse_drag(screen_x-3, screen_y, screen_x-3, wctop, 2, 30)
else:
    print("WeChat not found")


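To run the program you need to create the two template files used for matching against the WeChat main window yourself: contact.png and dragdown.png. Any screenshot tool can be used to capture them from the WeChat main window.

The program in action is shown below:

4.4 Additional notes

The program above is only a simple example. Building a genuinely good WeChat automation tool involves many more details. For instance, OpenCV line-segment detection can be used to divide the window into its functional regions, so that each operation is carried out within the relevant region, which improves the accuracy of template matching and text recognition.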

import cv2
import numpy as np

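# 1. Read the image
img = cv2.imread("wcMain.png")

# 2. Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 3. Edge detection (required: the Hough transform works on an edge map)
edges = cv2.Canny(gray, 30, 120)

# 4. Line-segment detection (the core call)
lines = cv2.HoughLinesP(
    edges,          # edge map
    1,              # distance resolution (rho) in pixels: 1 suits most cases; 2 is faster but coarser
    np.pi/180,      # angle resolution (theta) in radians, here 1 degree
    threshold=50,   # accumulator threshold; higher means fewer lines detected
    minLineLength=150,  # minimum segment length in pixels
    maxLineGap=6      # maximum gap allowed inside one segment
)

# 5. Draw the segments on the image
if lines is not None:
    print("{} lines detected".format(len(lines)))
    lineColor = 255
    for line in lines:
        x1, y1, x2, y2 = line[0]
        print("(%d,%d) (%d,%d)"%(x1, y1, x2, y2))
        cv2.line(img, (x1,y1), (x2,y2), (0,0,lineColor), 2)
        # Darken the red a little for each segment, but not below 50
        lineColor -= 10
        if lineColor < 50:
            lineColor = 50
else:
    print("no line detected")

# 6. Show the result
cv2.imshow("edges", edges)
cv2.imshow("result", img)
cv2.waitKey(0)
cv2.destroyAllWindows()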
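
Once the segments are available, dividing the window into functional regions could look roughly like the sketch below. It is only one possible approach under simple assumptions (separators are nearly axis-aligned, and nearby duplicates belong to the same separator). The function names, the tolerance values, and the sample segments are all introduced here for illustration.

```python
def split_lines(segments, tol=3):
    """Classify segments (x1, y1, x2, y2) into vertical x-positions and horizontal y-positions."""
    xs, ys = [], []
    for x1, y1, x2, y2 in segments:
        if abs(x1 - x2) <= tol:        # nearly vertical separator
            xs.append((x1 + x2) // 2)
        elif abs(y1 - y2) <= tol:      # nearly horizontal separator
            ys.append((y1 + y2) // 2)
    return xs, ys

def merge_close(values, gap=10):
    """Collapse positions closer than `gap` pixels into one (keeps the first of each cluster)."""
    merged = []
    for v in sorted(values):
        if not merged or v - merged[-1] > gap:
            merged.append(v)
    return merged

def vertical_bands(xs, width):
    """Turn vertical separator positions into (left, right) column ranges."""
    edges = [0] + merge_close(xs) + [width]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]

# Hand-made segments standing in for HoughLinesP output on a 1000-pixel-wide window.
segments = [(60, 0, 60, 700), (62, 10, 61, 690), (320, 0, 320, 700), (320, 60, 1000, 60)]
xs, ys = split_lines(segments)
bands = vertical_bands(xs, 1000)
print(bands)
```

With real output, the segment list can be built as [tuple(l[0]) for l in lines]. Each band can then be cropped out of the screenshot and passed to template matching or OCR separately.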