• How do you implement automatic login with a Tampermonkey userscript?
  • Published 2 months ago
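I recently got fed up with our company's internal site asking for a username, a password, and a captcha on every login. It would be great to skip the login page altogether, so I decided to implement automatic login with a Tampermonkey userscript. The hardest part was cracking the captcha. It took me a day to get it working reliably, and this post is my write-up.
I. Analyzing the captcha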

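Analyzing the captcha is the starting point for all the cracking work.
1. What features does the captcha have?
2. Is it easy to crack?
3. What strategy should be used to crack it?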

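Summary of features
This only summarizes the features of our company site's captcha (the captcha image above).
1. It contains only letters (upper and lower case) and digits, with the hard-to-distinguish characters removed: 1, i, I, l, L, 0, o, O.
2. The same character always appears with the same size, weight, and slant (so it is easy to build a standard character sample library).
3. The first character always starts at the same position (so the background on the left is easy to crop).
4. There are interference lines and a background color, both brighter than the characters (so a threshold can decide whether a pixel belongs to a character).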

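Choosing a cracking strategy
The strategy follows from the captcha features analyzed in the previous step.
1. Build a standard sample library.
2. Match the standard samples against the captcha image by convolution (explained below).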

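II. Building the sample library
1. Request a captcha.
2. Extract the image pixels.
3. Binarize (turn each pixel into 0 or 1).
4. Draw the binarized captcha on a canvas (black text on a white background; it can also be enlarged proportionally to make viewing and taking screenshots easier).
5. Cut suitable characters out of the drawn binarized captcha.
6. Process the character screenshots (trim white edges, remove noise points).
7. Undo the enlargement (if the image was enlarged earlier).
8. Save the result as a template string.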

Fetching the captcha
// Returns the image as a base64 data URL
function getVerifyCode() {
  return fetch(VERIFY_CODE_API)
    .then(rsp => rsp.json())
    .then(data => `data:image/png;base64,${data.data}`)
}
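Converting the base64 data to pixels
This is done with a canvas.
// Accepts a base64 data URL or a local image path
async function getImageData(imageSrc) {
  const image = new Image();
  // Register the handlers before setting src, then wait for the image to load
  await new Promise((resolve, reject) => {
    image.onload = resolve;
    image.onerror = reject;
    image.src = imageSrc;
  });
  // Create a canvas with the same size as the image (the default size is 300x150)
  const canvas = document.createElement('canvas');
  canvas.width = image.width;
  canvas.height = image.height;
  const context = canvas.getContext('2d');
  context.drawImage(image, 0, 0);
  return context.getImageData(0, 0, image.width, image.height);
}
The function returns an ImageData object.
Its data property is a Uint8ClampedArray (a typed array) in which every 4 consecutive elements hold the RGBA values (0-255) of one pixel.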
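To make the flat RGBA layout concrete, here is a minimal sketch of reading one pixel by its coordinates. The helper name `getPixel` and the 2x2 sample object are made up for illustration. A plain object stands in for a real ImageData so the snippet also runs outside the browser.

```javascript
// Read the RGBA channels of the pixel at (x, y) from a flat RGBA array.
// `getPixel` is a hypothetical helper, not part of the original code.
function getPixel(imageData, x, y) {
  const { data, width } = imageData;
  const offset = (y * width + x) * 4; // 4 array elements per pixel
  return {
    r: data[offset],
    g: data[offset + 1],
    b: data[offset + 2],
    a: data[offset + 3],
  };
}

// A plain object standing in for an ImageData: a 2x2 image.
const sample = {
  width: 2,
  height: 2,
  data: Uint8ClampedArray.from([
    255, 0, 0, 255,   0, 255, 0, 255,   // row 0: red, green
    0, 0, 255, 255,   10, 20, 30, 255,  // row 1: blue, a dark pixel
  ]),
};

const pixel = getPixel(sample, 1, 1);
console.log(pixel); // the fourth pixel: r=10, g=20, b=30, a=255
```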

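Binarization
First set a threshold. A pixel whose R, G, and B values are all above the threshold is treated as background. Any other pixel is provisionally treated as part of a character (it may still be a noise point or an interference line). The threshold has to be tuned against the actual results by repeated adjustment. A reasonable starting value is [130, 130, 130] (RGB channel values; alpha is always 255, so it is not set), roughly the midpoint of 0-255.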
const threshold = [130, 130, 130];

// Returns a 2D array in which every item is '0' or '1'
function binarization(imageData) {
  const pixel2binary = pixel => 
    pixel.every((chValue, index) => chValue > threshold[index]) ? '0' : '1';
  
  // Every 4 elements of data represent one pixel
  const { data, width, height } = imageData;
  const binaryData = [];
  let x, y, row, rowLoc, pixel, pixelLoc;
  for (y = 0; y < height; y++) {
    row = [];
    // Start offset of the current row
    rowLoc = y * width * 4;
    for (x = 0; x < width; x++) {
      pixelLoc = rowLoc + x * 4;
      // Take the RGB values of this pixel (alpha is skipped)
      pixel = data.slice(pixelLoc, pixelLoc + 3);
      row.push(pixel2binary(pixel));
    }
    binaryData.push(row);
  }
  return binaryData;
}
Drawing the binarized data (black text on a white background)
function drawBinaryData(context, data, scale = 1) {
  const binary2pixel = binary => 
    binary === '0' ? [255, 255, 255, 255] : [0, 0, 0, 255];
  const repeatAction = (action) => {
    for (let i = 0; i < scale; i++) action();
  };
  const h = data.length;
  const w = data[0].length;
  let x, y;
  const pixelData = [];
  for (y = 0; y < h; y++) repeatAction(() => {
    for (x = 0; x < w; x++) repeatAction(() => {
      pixelData.push(...binary2pixel(data[y][x]));
    });
  });
  // Create the ImageData instance
  const imageData = new ImageData(
    Uint8ClampedArray.from(pixelData),
    w * scale,
    h * scale
  );
  return context.putImageData(imageData, 0, 0);
}
Output of a captcha enlarged 4x in both width and height:

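Saving samples from screenshots
Pick suitable captchas and cut the characters out as screenshots.

The character 5 in the captcha above is not a good sample, because after cropping, its lower right corner contains pixels from another character. Those pixels could also be removed with a tool or with code.

Save samples for all characters. This requires requesting captcha images over and over.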

Trimming the white edges of a character screenshot
function cutWhiteEdge(data) {
  let edge;
  // Stop when no rows are left, otherwise data[0] would be undefined
  const isWhiteEdge = () => 
    data.length > 0 && edge.every(binary => binary === '0');
  // Keep cutting while the current edge is entirely white
  const cutEdgeContinuous = (resetEdge, cutEdge) => {
    const _resetEdge = () => (edge = resetEdge());
    for (_resetEdge(); isWhiteEdge(); cutEdge(), _resetEdge());
  };
  // Cutting order: top, bottom, left, right
  // Top
  cutEdgeContinuous(
    () => data[0],
    () => data.shift()
  );
  // Bottom
  cutEdgeContinuous(
    () => data[data.length - 1],
    () => data.pop()
  );
  // Left
  cutEdgeContinuous(
    () => data.map(r => r[0]),
    () => data.forEach(r => r.shift())
  );
  // Right
  cutEdgeContinuous(
    () => data.map(r => r[r.length - 1]),
    () => data.forEach(r => r.pop())
  );
}
Undoing the enlargement of the binarized data
function restoreDataScale(data, scale) {
  const scaleData = [];
  let x, y, row;
  const h = data.length;
  const w = data[0].length;
  for (y = 0; y < h; y += scale) {
    row = [];
    for (x = 0; x < w; x += scale) {
      row.push(data[y][x]);
    }
    scaleData.push(row);
  }
  return scaleData;
}
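As a sanity check on the enlarge-then-restore idea, the round trip can be tried on a tiny grid. This is only a sketch: `scaleUp` and `scaleDown` are hypothetical helpers, where `scaleUp` mirrors the nearest-neighbour enlargement done while drawing and `scaleDown` uses the same sampling idea as restoreDataScale. It assumes the cropped screenshot is cut exactly on enlarged-pixel boundaries. A crop that is off by a pixel or two would shift the sampling.

```javascript
// Enlarge a 2D grid by an integer factor (nearest-neighbour).
function scaleUp(grid, scale) {
  const out = [];
  for (const row of grid) {
    const wideRow = row.flatMap(v => Array(scale).fill(v));
    for (let i = 0; i < scale; i++) out.push([...wideRow]);
  }
  return out;
}

// Keep every `scale`-th row and column, starting from index 0.
function scaleDown(grid, scale) {
  return grid
    .filter((_, y) => y % scale === 0)
    .map(row => row.filter((_, x) => x % scale === 0));
}

const glyph = [
  ['0', '1'],
  ['1', '1'],
];
const enlarged = scaleUp(glyph, 4);      // 8 rows x 8 columns
const restored = scaleDown(enlarged, 4); // back to 2 x 2
console.log(enlarged.length, enlarged[0].length, restored);
```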
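Saving the template string
This simply converts the processed binary array into a string, which is easier to store (in a database, for example).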
function binaryData2Template(data) {
  return data.map(r => r.join('')).join(' ');
}

What the console on the right prints is the template string, except that the rows are separated by line breaks.
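The same serialization works with any separator, and splitting on the separator turns a stored template back into a 2D array. A minimal sketch, assuming hypothetical helper names `toTemplate` and `fromTemplate` and a made-up 3x3 glyph:

```javascript
// Serialize a 2D array of '0'/'1' into a template string; rows are joined with `sep`.
function toTemplate(binaryData, sep = ' ') {
  return binaryData.map(row => row.join('')).join(sep);
}

// Parse a template string back into a 2D array of '0'/'1'.
function fromTemplate(template, sep = ' ') {
  return template.split(sep).map(row => row.split(''));
}

const glyph = [
  ['0', '1', '0'],
  ['1', '1', '1'],
  ['0', '1', '0'],
];
const template = toTemplate(glyph);        // '010 111 010', compact form for storage
const printable = toTemplate(glyph, '\n'); // one row per line, easier to read in the console
const parsed = fromTemplate(template);
console.log(printable);
```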


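Reading the character screenshots
The sections above covered taking and processing character screenshots but skipped the step of reading them. You could write code that reads the screenshot folder directly and processes all character screenshots in one go. When I did this step, I used an input[type=file] and picked one screenshot at a time by hand (I was short on time). Here is the code.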
fileInput.addEventListener('change', e => {
  // Get the selected file
  if (fileInput.files.length === 0) return;
  const file = fileInput.files[0];
  const reader = new FileReader();
  reader.addEventListener('load', async e => {
    // e.target.result is the image as a base64 data URL
    // getImageData is async, so its result must be awaited
    const imageData = await getImageData(e.target.result);
    const binaryData = binarization(imageData);
    cutWhiteEdge(binaryData);
    // Undo the earlier enlargement of the image
    const restoreData = restoreDataScale(binaryData, 4);
    const template = binaryData2Template(restoreData);
    // Write the template to the clipboard
    navigator.clipboard.writeText(template);
    // It could also be sent to an API and stored in a database...
  });
  reader.readAsDataURL(file);
});
The change event handler of the file input

Adjusting the binarization threshold

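After fetching, binarizing, and inspecting many captchas, I found that in some images whole characters or parts of characters disappeared after binarization, because those characters were also fairly bright.

For example, this captcha prints out like this (the character S is rather bright):


In this case the threshold needs to be adjusted (raised a little):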
const threshold = [140, 140, 140];
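To see why raising the threshold recovers a bright character, the same classification rule can be applied to a single pixel. The sketch below reuses the rule from binarization (all three channels above the threshold means background). The pixel values are invented for illustration.

```javascript
// Same rule as in binarization: '0' (background) only if every channel exceeds the threshold.
const toBinary = (rgb, threshold) =>
  rgb.every((value, i) => value > threshold[i]) ? '0' : '1';

const brightGlyphPixel = [135, 138, 132]; // hypothetical pixel of a light-colored character
const backgroundPixel = [200, 210, 205];  // hypothetical background pixel

const results = {
  glyphAt130: toBinary(brightGlyphPixel, [130, 130, 130]),     // '0': the character pixel is lost
  glyphAt140: toBinary(brightGlyphPixel, [140, 140, 140]),     // '1': the character pixel is kept
  backgroundAt140: toBinary(backgroundPixel, [140, 140, 140]), // '0': still background
};
console.log(results);
```

Raising the threshold too far has the opposite cost: interference lines and background start to turn into 1s, so the value is a trade-off that has to be checked against real captchas.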

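III. Convolution matching
The sections above explained how to obtain character templates. Before convolution matching can start, the templates for all characters have to be processed and saved (this is hard work 😭).
Getting the templates
Here I simply define all character templates as a constant.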
const CODE_TEMPLATES = {
  2: '0000001111100 0000111111110 0001110000111 0001100000011 0011100000011 0000000000011 0000000000110 0000000001110 0000000001100 0000000011000 0000000110000 0000011100000 0000111000000 0001110000000 0011100000000 0111000000000 0111111111110 1111111111110',
  3: '000001111000 000111111110 001110000110 001100000011 011100000011 000000000011 000000000110 000000001110 000011111000 000011111000 000000001100 000000001110 000000000110 110000000110 110000001100 111000011100 011111111000 001111100000',
  4: '0000000000111 0000000001110 0000000011110 0000000111110 0000000110110 0000001101110 0000011001100 0000110001100 0001110001100 0001100001100 0011000001100 0110000011100 1111111111111 1111111111111 0000000011000 0000000011000 0000000111000 0000000111000',
  5: '000111111111 000111111111 001100000000 001100000000 001100000000 001100000000 011011110000 011111111000 011100011100 000000001100 000000001110 000000001110 000000001100 110000001100 110000011100 111000111000 011111110000 001111100000',
  6: '0000001111 0000111111 0001111000 0011100000 0011000000 0110000000 0110111100 1111111110 1111000111 1110000011 1100000011 1100000011 1100000011 1100000011 1100000111 1110001110 0111111100 0011111000',
  7: '111111111111 111111111111 000000000110 000000000110 000000001100 000000011100 000000011000 000000110000 000000110000 000001100000 000011100000 000011000000 000111000000 000110000000 001100000000 011100000000 011000000000 111000000000',
  8: '000001111100 000011111110 000111000111 001110000011 001100000011 001100000011 001100000111 001110001110 000111111100 000111111100 011100001100 011000000110 110000000110 110000000110 110000001110 111000011100 011111111000 000111110000',
  9: '00001111000 00111111100 01110001110 01100000111 11100000011 11000000011 11000000011 11000000011 11100000111 01100001110 01111111110 00111100110 00000001100 00000001100 00000011000 00001110000 01111100000 01110000000',
  a: '00001111100 00111111110 01110000110 01100000111 00000000111 00011111110 01111111110 11100000110 11000000110 11000001110 11000011110 11111111100 01111101110',
  A: '000000000111000 000000000111000 000000001111000 000000001111000 000000011001100 000000011001100 000000110001100 000000110001100 000001100001100 000001100001110 000011000000110 000011111111110 000111111111110 001110000000110 001100000000111 011100000000011 011000000000011 111000000000011',
  b: '000110000000 000110000000 001110000000 001100000000 001100000000 001100000000 001101111000 011111111110 011110001110 011100000110 011000000111 011000000111 011000000111 111000000110 111000000110 111000001110 111000011100 111111111000 110111110000',
  B: '0001111111100 0011111111110 0011100000111 0011000000011 0011000000011 0011000000011 0011000000111 0111000001110 0111111111100 0111111111100 0110000001110 0110000000110 0110000000110 1110000000110 1100000001110 1100000011100 1111111111000 1111111110000',
  c: '00001111100 00011111110 00111000111 01100000011 01100000011 11100000000 11000000000 11000000000 11000000000 11100000111 01100001110 01111111100 00011110000',
  C: '000000111110000 000011111111100 000111100001110 000110000000110 001100000000110 001100000000111 011100000000000 011000000000000 011000000000000 011000000000000 011000000000000 111000000000000 011000000001100 011000000001100 011000000011000 001100000111000 001111111110000 000011111000000',
  d: '0000000000011 0000000000011 0000000000111 0000000000110 0000000000110 0000000000110 0000111100110 0011111111110 0011100011110 0110000001100 0110000001100 1110000001100 1100000001100 1100000001100 1100000011100 1110000011000 0110000111000 0111111111000 0011111011000',
  D: '00011111110000 00011111111100 00111000011110 00110000000110 00110000000111 00110000000011 00110000000011 00110000000011 01110000000011 01100000000111 01100000000110 01100000000110 01100000001110 11100000001100 11100000011100 11000001111000 11111111110000 11111111000000',
  e: '00001111100 00011111110 00110000111 01100000011 01100000011 11111111111 11111111111 11000000000 11000000000 11100000000 01110000110 01111111100 00011111000',
  E: '00011111111111 00011111111110 00111000000000 00111000000000 00110000000000 00110000000000 00110000000000 00110000000000 01111111111000 01111111111000 01100000000000 01100000000000 01100000000000 11100000000000 11100000000000 11000000000000 11111111111000 11111111111000',
  f: '000001111 000111110 000111000 001110000 001100000 001100000 111111100 111111100 001100000 011100000 011000000 011000000 011000000 011000000 011000000 111000000 110000000 110000000 110000000',
  F: '00011111111111 00011111111110 00111000000000 00111000000000 00110000000000 00110000000000 00110000000000 00110000000000 01111111111000 01111111111000 01100000000000 01100000000000 01100000000000 11100000000000 11100000000000 11000000000000 11000000000000 11000000000000',
  g: '0000011110011 0001111111111 0001110001111 0011100000111 0011000000110 0111000000110 0110000000110 0110000000110 0110000001110 0111000001100 0011000011100 0011111111100 0001111101100 0000000011100 0100000011000 1110000111000 0111111110000 0011111000000',
  G: '00000111111000 00001111111100 00011100001110 00110000000110 01110000000111 01100000000000 11100000000000 11000000000000 11000000000000 11000001111110 11000001111110 11000000000110 11000000001110 11000000001100 11100000001100 01110000011100 00111111111000 00011111100000',
  h: '000111000000 000110000000 000110000000 000110000000 000110000000 001110000000 001110111100 001101111110 001111000111 001100000111 001100000011 011100000111 011100000110 011000000110 011000000110 011000000110 011000001110 111000001110 110000001100',
  H: '0001100000000011 0001100000000011 0011100000000111 0011100000000110 0011000000000110 0011000000000110 0011000000000110 0011000000000110 0111111111111110 0111111111111100 0110000000001100 0110000000001100 0110000000001100 1110000000011100 1110000000011100 1100000000011000 1100000000011000 1100000000011000',
  j: '000000110 000000111 000000110 000000000 000000000 000000110 000001110 000001110 000001100 000001100 000001100 000001100 000011100 000011000 000011000 000011000 000011000 000011000 000111000 000110000 000110000 111110000 111100000',
  J: '0000000000011 0000000000011 0000000000011 0000000000011 0000000000111 0000000000110 0000000000110 0000000000110 0000000000110 0000000001110 0000000001110 0000000001100 0000000001100 1110000001100 1110000011100 0111000111000 0111111110000 0001111100000',
  k: '0000110000000 0001110000000 0001100000000 0001100000000 0001100000000 0001100000000 0001100001111 0011100011100 0011000111000 0011001110000 0011011100000 0011111000000 0011111000000 0111111100000 0110001100000 0110000110000 0110000111000 0110000011000 1110000011100',
  K: '0001100000001111 0001100000011100 0011100000111000 0011100001110000 0011000011100000 0011000111000000 0011001110000000 0011011100000000 0111111100000000 0111111100000000 0111101110000000 0111000110000000 0110000111000000 1110000011000000 1110000011100000 1100000001100000 1100000001110000 1100000000111000',
  m: '00111011110000111100 00111111111011111110 00111000011110000110 00110000011100000111 00110000001100000111 01110000011100000110 01110000011000000110 01100000011000000110 01100000011000000110 01100000011000000110 01100000111000001110 11100000111000001100 11000000110000001100',
  M: '00011100000000000111 00011100000000001111 00111100000000001111 00111100000000011110 00110110000000111110 00110110000000110110 00110110000001110110 00110110000001100110 01110111000011101110 01100011000011001100 01100011000110001100 01100011000110001100 01100011001100001100 11100011111100011100 11100001111000011000 11000001111000011000 11000001110000011000 11000001110000011000',
  n: '00110111110 00111111111 01111000111 01110000011 01100000011 01100000011 01100000011 01100000111 11100000110 11000000110 11000000110 11000000110 11000001110',
  N: '00011100000000111 00011100000000111 00011110000000110 00011110000000110 00011111000000110 00011011000000110 00111011100001110 00111001100001110 00110001110001100 00110000110001100 00110000111001100 00110000011001100 01110000011011100 01110000011111000 01100000001111000 01100000001111000 01100000000111000 11100000000111000',
  p: '0001101111000 0001111111110 0011110001110 0011100000110 0011000000111 0011000000111 0011000000110 0111000000110 0110000000110 0110000001110 0111000011100 0111111111000 0110111110000 1110000000000 1100000000000 1100000000000 1100000000000 1100000000000',
  P: '000111111111000 000111111111110 000110000000110 000110000000111 000110000000011 000110000000011 001110000000111 001110000000111 001100000001110 001111111111100 001111111110000 001100000000000 011100000000000 011100000000000 011000000000000 011000000000000 011000000000000 111000000000000',
  q: '000011110011 001111111111 001110001111 011100000110 011000000110 111000000110 110000000110 110000001110 110000001110 111000001100 011000011100 011111111100 001111101100 000000001100 000000011000 000000011000 000000011000 000000011000',
  Q: '00000111110000 00011111111100 00111100001110 00110000000110 01100000000110 01100000000111 11100000000111 11000000000111 11000000000111 11000000000111 11000000000110 11000000000110 11000000001110 11000000001100 11100000011100 01110000111000 01111111110000 00011111110000 00000000111000 00000000011100 00000000010000',
  r: '001110111 001111111 001110000 001100000 001100000 001100000 011100000 011000000 011000000 011000000 011000000 111000000 111000000',
  R: '00011111111000 00011111111100 00111000001110 00110000000110 00110000000111 00110000000111 00110000000110 01110000001110 01110000011100 01111111111000 01111111110000 01100000110000 01100000110000 11100000111000 11100000011000 11000000011000 11000000011100 11000000001100',
  s: '00001111100 00111111110 01110000111 01100000011 01110000000 00111110000 00011111100 00000011110 00000000110 11000000110 11100001110 01111111100 00111110000',
  S: '00000111111000 00001111111100 00011100001110 00111000000110 00110000000111 00110000000000 00110000000000 00011100000000 00001111000000 00000111110000 00000000111000 00000000001100 00000000001100 11000000001100 11000000011100 01110000111000 01111111111000 00011111100000',
  t: '0001100 0001100 0001100 1111111 1111111 0011000 0011000 0011000 0011000 0111000 0110000 0110000 0110000 0110000 0111100 0011100',
  T: '11111111111111 11111111111110 00000111000000 00000111000000 00000110000000 00000110000000 00000110000000 00000110000000 00001110000000 00001110000000 00001100000000 00001100000000 00001100000000 00001100000000 00011100000000 00011100000000 00011000000000 00011000000000',
  u: '011100000111 011000000110 011000000110 011000000110 011000000110 011000001110 111000001100 110000001100 110000001100 110000011100 111000111100 011111111100 001111011000',
  U: '000110000000011 001110000000011 001100000000111 001100000000110 001100000000110 001100000000110 011100000000110 011100000000110 011000000001110 011000000001100 011000000001100 011000000001100 011000000001100 111000000011100 011000000011000 011100001111000 001111111110000 000111111000000',
  v: '11100000011 01100000111 01100000110 01100001110 01100001100 00100011100 00110011000 00110110000 00110110000 00111100000 00011100000 00011000000 00011000000',
  V: '111000000000111 011000000000110 011000000001110 011000000001100 011000000011100 011100000011000 001100000111000 001100000110000 001100001110000 001100001100000 001100011100000 000110011000000 000110111000000 000110110000000 000111110000000 000111100000000 000011100000000 000011000000000',
  w: '111000001100000111 011000011100000110 011000011100001100 011000111100001100 011000110100011000 011001100100011000 011001100110111000 011011000110110000 011011000110110000 011110000111100000 001110000111100000 001100000011000000 001100000011000000',
  W: '111000000111000000111 111000000111000000110 011000001111000001110 011000001111000001100 011000001111000001100 011000011011000011100 011000011011000011000 011000110011000011000 011000110001000110000 011001110001100110000 011001100001101110000 011001100001101100000 011011000001101100000 011011000001111000000 011110000001111000000 001110000001111000000 001110000001110000000 001100000000110000000',
  x: '0001100000111 0001110000110 0000110001100 0000111011100 0000011111000 0000011110000 0000001100000 0000011110000 0000110110000 0001110111000 0011100011000 0111000011100 1110000001100',
  X: '00011100000000111 00001110000001110 00000110000011100 00000111000011000 00000011000111000 00000011101110000 00000011111100000 00000001111000000 00000001111000000 00000001110000000 00000011111000000 00000111011000000 00000110011100000 00001110001100000 00011100001110000 00111000000110000 00110000000111000 11110000000011000',
  y: '0001100000011 0001100000111 0001100000110 0001100001110 0001110001100 0000110011100 0000110011000 0000110111000 0000110110000 0000111100000 0000111100000 0000011000000 0000011000000 0000110000000 0000110000000 0001100000000 1111100000000 1110000000000',
  Y: '11100000000111 01100000001110 01100000001100 01110000011100 00110000111000 00110000110000 00111001110000 00011011100000 00011011000000 00011111000000 00001110000000 00001100000000 00001100000000 00001100000000 00011100000000 00011100000000 00011000000000 00011000000000',
  z: '001111111111 001111111111 000000001110 000000011100 000000111000 000001110000 000011100000 000111000000 000110000000 001100000000 011000000000 111111111100 111111111100',
  Z: '000111111111111 000111111111111 000000000001110 000000000001100 000000000011100 000000000011000 000000001110000 000000011100000 000000111000000 000000110000000 000001100000000 000011100000000 000111000000000 001110000000000 011100000000000 011000000000000 111111111111100 111111111111000', 
};
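Counting the effective pixels in each character template
Counting the effective pixels of a template means counting how many 1s it contains (0 is background, an ineffective pixel). The count is used later when computing similarity. This step can also be done when the template is created, with the result stored in the database.
const tplEffectPoints = Object.keys(CODE_TEMPLATES).reduce((calc, code) => {
  // Count the number of 1s in each character template
  calc[code] = CODE_TEMPLATES[code].split('').filter(c => c === '1').length;
  return calc;
}, {});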
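What is convolution matching?

I made a GIF to illustrate it. Convolution matching, which I used to call scan matching, amounts to moving a template across the image step by step (left to right, top to bottom) and, at each position, measuring how many of the template's effective pixels (the points equal to 1) coincide with effective pixels in the image. That overlap ratio is the similarity.


It is worth thinking about why only the overlap of effective pixels is checked, and not the ineffective pixels.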
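One way to think about that question is with a toy example. Interference lines and neighbouring characters put extra 1s where the template has background. If those positions were compared too, a correct placement would be penalized. The sketch below is an illustration under made-up data (a 3x3 "T" template and a 3x5 image), not the article's actual templates. `similarityAt` is a hypothetical helper implementing the overlap ratio described above.

```javascript
// Share of the template's '1' pixels that are also '1' in the image
// when the template's top-left corner is placed at (x, y).
function similarityAt(image, tpl, x, y) {
  let total = 0;
  let hit = 0;
  for (let i = 0; i < tpl.length; i++) {
    for (let j = 0; j < tpl[i].length; j++) {
      if (tpl[i][j] !== '1') continue; // template background is ignored
      total++;
      if (image[y + i][x + j] === '1') hit++;
    }
  }
  return hit / total;
}

const parse = s => s.split(' ').map(row => row.split(''));

// Toy template: a "T" shape with 5 effective pixels.
const tpl = parse('111 010 010');
// Toy image: the same shape at x = 1, plus one stray '1' in the bottom row (x = 1)
// that falls on template background, standing in for an interference line.
const image = parse('01110 00100 01100');

// Slide the template from left to right along the only possible row.
const scores = [0, 1, 2].map(x => similarityAt(image, tpl, x, 0));
console.log(scores); // similarity at x = 0, 1, 2
```

At x = 1 all 5 template pixels are covered, so the similarity is 1 even though the stray pixel is present. The other two positions cover 3 and 2 of the 5 pixels. The downside of ignoring background is that a character whose pixels are a subset of another character's also scores 1, which is why some post-processing is needed.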
Implementing convolution matching
// Returns the list of matches: position (x, y), similarity, and matched character
function convolution(binaryData, threshold = 1) {
  const codes = Object.keys(CODE_TEMPLATES);
  const h = binaryData.length;
  const w = binaryData[0].length;
  const matches = [];
  let code, tplData, tplH, tplW;

  function doConvolution() {
    let x, y, colLastIdx, rowLastIdx;

    // Returns the position and the overlap ratio (similarity) at that position
    const compare = (x, y, code) => {
      let effectivePointNum = 0;
      for (let i = 0; i < tplH; i++) {
        for (let j = 0; j < tplW; j++) {
          if (tplData[i][j] === '1') {
            if (tplData[i][j] === binaryData[i + y][j + x]) {
              effectivePointNum++;
            }
          }
        }
      }
      // similarity = overlapping points / effective points of the character template
      const similarity = effectivePointNum / tplEffectPoints[code];
      return { x, y, similarity };
    };

    // Scan direction: left to right, top to bottom
    for (y = 0, rowLastIdx = h - tplH; y <= rowLastIdx; y++) {
      for (x = 0, colLastIdx = w - tplW; x <= colLastIdx; x++) {
        const result = compare(x, y, code);
        if (result.similarity >= threshold) {
          matches.push({ ...result, code });
        }
      }
    }
  }

  for (let i = 0; i < codes.length; i++) {
    code = codes[i];
    // Convert the template into a 2D array
    tplData = CODE_TEMPLATES[code].split(' ').map(row => row.split(''));
    tplH = tplData.length;
    tplW = tplData[0].length;
    doConvolution();
  }
  // Sort by position (x axis)
  matches.sort((a, b) => a.x - b.x);
  return matches;
} 
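Other processing

Before convolution matching, the captcha has to be binarized. The binarized image may need further processing, such as removing noise points and interference lines. Here only noise points are handled, in a simple way.


Removing noise points
Noise points are relatively dark dots placed at random positions on the captcha image. If we filter only by a brightness threshold, noise points are easily mistaken for effective pixels.

Features of noise points
In general, noise points are random and not connected to each other.
A simple test is used here: if an effective point (a point equal to 1) has no other effective point directly above, below, left, or right of it, it is considered a noise point.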
function denoising(binData) {
  const h = binData.length;
  const w = binData[0].length;
  const isEffectivePoint = (x, y) => binData[y][x] === '1';
  const checkAround = (x, y) => {
    // Boundary checks
    const checkTop = y > 0;
    const checkBottom = y < h - 1;
    const checkLeft = x > 0;
    const checkRight = x < w - 1;
    
    return (
      (checkTop && isEffectivePoint(x, y - 1)) ||
      (checkBottom && isEffectivePoint(x, y + 1)) ||
      (checkLeft && isEffectivePoint(x - 1, y)) || 
      (checkRight && isEffectivePoint(x + 1, y))
    );
  };
  
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (isEffectivePoint(x, y) && !checkAround(x, y)) {
        // Turn the noise point into an ineffective point
        binData[y][x] = '0';
      }
    }
  } 
}
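The effect of the four-neighbour rule is easy to check on a small grid. The sketch below is a non-mutating variant written for illustration (`removeIsolatedPoints` is a hypothetical name). It applies the same rule but returns a new grid instead of editing the input in place.

```javascript
// Returns a new grid in which every '1' without a '1' directly above, below,
// left, or right of it has been cleared.
function removeIsolatedPoints(grid) {
  const h = grid.length;
  const w = grid[0].length;
  const isSet = (x, y) => y >= 0 && y < h && x >= 0 && x < w && grid[y][x] === '1';
  return grid.map((row, y) =>
    row.map((value, x) => {
      if (value !== '1') return value;
      const hasNeighbour =
        isSet(x, y - 1) || isSet(x, y + 1) || isSet(x - 1, y) || isSet(x + 1, y);
      return hasNeighbour ? '1' : '0';
    })
  );
}

const parse = s => s.split(' ').map(row => row.split(''));
// A 2x2 block of character pixels plus two isolated points in opposite corners.
const noisy = parse('10000 00110 00110 00001');
const cleaned = removeIsolatedPoints(noisy).map(row => row.join('')).join(' ');
console.log(cleaned); // '00000 00110 00110 00000'
```

Note that the point in the bottom right corner touches the block only diagonally, so it still counts as noise under this rule. Thin diagonal strokes of real characters could be affected in the same way, which is worth keeping in mind when tuning.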
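Post-processing
The result of the convolution matching above does not always meet our needs.

Recognizing the captcha image above produces the following matches:

The result not only contains more than 4 matches, it also includes an extra r. This is because, in this font, the character P contains all the effective pixels of the character r. So if r is recognized at the position of P, the r should be discarded. The following lists all characters in this font that have such a containment relationship: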
const containMap = {
  Q: { C: -1 }, // the x of C is 1 less than the x of Q
  E: { F: 0 },
  V: { v: 1 },
  y: { v: 2 },
  m: { r: 0 },
  p: { r: 0 },
};
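A table like this can be checked, or partly derived, by testing one template against another at every offset. The sketch below is only an illustration with made-up "E" and "F" shapes. `findContainment` is a hypothetical helper, and the article's real templates may need a similarity limit below 1 to reproduce its table.

```javascript
const parse = tpl => tpl.split(' ').map(row => row.split(''));

// Best overlap of `innerTpl` inside `outerTpl`: the share of the inner template's '1'
// pixels that land on '1' pixels of the outer template, maximized over all offsets.
// dx/dy say where the inner template's top-left corner sits relative to the outer one.
function findContainment(outerTpl, innerTpl) {
  const outer = parse(outerTpl);
  const inner = parse(innerTpl);
  const points = [];
  inner.forEach((row, i) =>
    row.forEach((v, j) => {
      if (v === '1') points.push([i, j]);
    })
  );
  const isSet = (y, x) => outer[y] !== undefined && outer[y][x] === '1';

  let best = { dx: 0, dy: 0, similarity: 0 };
  for (let dy = -inner.length + 1; dy < outer.length; dy++) {
    for (let dx = -inner[0].length + 1; dx < outer[0].length; dx++) {
      const hits = points.filter(([i, j]) => isSet(i + dy, j + dx)).length;
      const similarity = hits / points.length;
      if (similarity > best.similarity) best = { dx, dy, similarity };
    }
  }
  return best;
}

// Toy shapes: the "F" template has an empty first column, so it sits one column
// to the left of the "E" template when their strokes line up.
const E = '111 100 110 100 111';
const F = '0111 0100 0110 0100 0100';
const fInE = findContainment(E, F);
const eInF = findContainment(F, E);
console.log(fInE, eInF.similarity < 1);
```

Here the strokes of F fit entirely inside E at dx = -1, which corresponds to an entry like E: { F: -1 } in the table, while E never fits entirely inside F.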
Post-processing based on the containment relationships:
function afterEffect(matches) {
  if (matches.length <= 4) return;
  // Build a lookup structure for later processing: {e: [match], r: [match, match], ...}
  const codeMap = matches.reduce((map, item) => {
    const { code } = item;
    (map[code] = map[code] || []).push(item);
    return map;
  }, {});
  
  Object.keys(containMap).forEach(code => {
    if (!codeMap[code]) return;
    Object.keys(containMap[code]).forEach(containCode => {
      if (!codeMap[containCode]) return;
      // x offset between the containing character and the contained character
      const offset = containMap[code][containCode];
      codeMap[code].forEach(Q => {
        let idx = codeMap[containCode].findIndex(C => C.x === Q.x + offset);
        if (idx > -1) {
          // Remove it from codeMap, at the index that was found
          const [C] = codeMap[containCode].splice(idx, 1);
          // Remove it from matches
          idx = matches.findIndex(item => item === C);
          if (idx > -1) matches.splice(idx, 1);
        }
      });
    });   
  });
}
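Post-processing can consist of many steps (only one is done here). It has to be adapted to the specific situation, and the simpler it is, the better.
Finally, extract the captcha from the matches.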
const verifyCodes = matches.map(item => item.code).join('');

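Reproducing and verifying

Before reading the captcha value out of matches, check the number of matches once more. If it is clearly wrong, our processing still has a problem. The result of every processing step can be saved so that the case can be reproduced later and the faulty step improved. Likewise, if the submitted captcha fails verification, all intermediate results and the captcha itself should be saved for later investigation and tuning.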
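A minimal sketch of such a check follows. It assumes the captcha always has 4 characters (as the post-processing code suggests). `inspectResult` and the shape of the snapshot object are hypothetical. Where the snapshot is stored (console, local storage, an API) is left open.

```javascript
const EXPECTED_LENGTH = 4; // assumption: every captcha has exactly 4 characters

// Decide whether a recognition result looks plausible. If it does not, bundle the
// intermediate data so the case can be reproduced later.
function inspectResult(matches, steps = {}) {
  const code = matches.map(m => m.code).join('');
  const ok = matches.length === EXPECTED_LENGTH;
  if (ok) return { ok, code };
  return {
    ok,
    code,
    snapshot: { time: new Date().toISOString(), matches, ...steps },
  };
}

const good = inspectResult([
  { code: 'a', x: 3 }, { code: 'B', x: 20 }, { code: '3', x: 38 }, { code: 'k', x: 55 },
]);
const bad = inspectResult(
  [{ code: 'a', x: 3 }, { code: 'P', x: 20 }, { code: 'r', x: 20 }, { code: '3', x: 38 }, { code: 'k', x: 55 }],
  { template: '...binarized captcha as a template string...' } // placeholder for saved step data
);
console.log(good.ok, good.code, bad.ok, Object.keys(bad.snapshot));
```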


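Handling verification failures
Verification can fail, either because our processing still has problems or because the captcha generation itself keeps being adjusted.
When recognition fails, a limited number of retries can be allowed.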
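A bounded retry loop could look like the sketch below. `withRetry` and the `attempt` callback are hypothetical. In a real userscript, `attempt` would fetch a fresh captcha, recognize it, submit the login form, and resolve to true on success.

```javascript
// Run `attempt` until it succeeds, allowing at most `maxRetries` retries
// after the first try. A thrown error counts as a failed attempt.
async function withRetry(attempt, maxRetries = 3) {
  for (let i = 0; i <= maxRetries; i++) {
    try {
      if (await attempt(i)) return { success: true, attempts: i + 1 };
    } catch (err) {
      // Ignore the error and move on to the next attempt
    }
  }
  return { success: false, attempts: maxRetries + 1 };
}

// Stand-in for a real login attempt: fails twice, then succeeds.
const fakeAttempt = async i => i >= 2;
const outcome = withRetry(fakeAttempt, 3);
outcome.then(result => console.log(result)); // resolves to { success: true, attempts: 3 }
```

Keeping the retry count small matters here, since many sites lock an account or throttle requests after repeated failed logins.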