[Learning a New Language in 30 Days: Ruby] Day 5: Basic HTML Parsing to Find Potential Attack Points

2025-08-14

💡 Why parse HTML?

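Once you have found a web service, the next step is to analyze the page content. Reading HTML source by hand gets tiring XD, and it is easy to miss things, so this is where an automated tool helps.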

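Today we'll use Ruby's nokogiri, a powerful HTML parser. It can quickly pull out the forms, links, comments, and other elements of a page, all of which are important targets in a penetration test.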

P.S. Nokogiri is the Japanese word for "saw" (鋸). My guess is that the name comes from the tool sawing HTML apart w

🎯 Outline

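Install and use the nokogiri gem
Parse HTML to find all forms
Extract all hyperlinks
Search HTML comments (they may contain sensitive information)
Find all JavaScript files
Flag URLs that look like API endpoints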

📚 Key concepts

gem install nokogiri: install a third-party package
Nokogiri::HTML(): parse an HTML string (or IO object)
doc.css(): query with a CSS selector
doc.xpath(): query with XPath
element['attribute']: read an element's attribute
element.text: read an element's text content

💻 Implementation

require 'nokogiri'
require 'open-uri'
require 'uri'

# Fetch a page and report its forms, links, comments, scripts, and likely API endpoints.
def analyze_webpage(url)
  # Fetch the page (the block form closes the connection) and parse it
  html = URI.open(url, &:read)
  doc = Nokogiri::HTML(html)

  puts "\nAnalyzing: #{url}"
  puts "=" * 50

  # 1. Find all forms
  forms = doc.css('form')
  if forms.any?
    puts "\nFound #{forms.count} form(s):"
    forms.each_with_index do |form, i|
      puts "  Form ##{i + 1}:"
      puts "    Action: #{form['action'] || 'N/A'}"
      puts "    Method: #{(form['method'] || 'GET').upcase}"

      # List every input field; name and type are optional attributes
      inputs = form.css('input')
      if inputs.any?
        puts "    Inputs:"
        inputs.each do |input|
          puts "      - #{input['name'] || '(unnamed)'} (type: #{input['type'] || 'text'})"
        end
      end
    end
  end

  # 2. Extract all links, de-duplicated
  links = doc.css('a[href]')
  if links.any?
    unique_links = links.map { |link| link['href'] }.uniq
    puts "\nFound #{unique_links.count} link(s):"
    unique_links.first(10).each do |link|  # show only the first 10
      puts "  - #{link}"
    end
    puts "  ... and #{unique_links.count - 10} more" if unique_links.count > 10
  end

  # 3. Search HTML comments
  comments = doc.xpath('//comment()')
  if comments.any?
    puts "\nFound #{comments.count} HTML comment(s):"
    comments.each do |comment|
      content = comment.text.strip
      next if content.empty?
      # Truncate long comments to 100 characters
      puts "  <!-- #{content[0, 100]}#{content.length > 100 ? '...' : ''} -->"
    end
  end

  # 4. Find all external JavaScript files
  scripts = doc.css('script[src]')
  if scripts.any?
    puts "\nFound #{scripts.count} JavaScript file(s):"
    scripts.each do |script|
      puts "  - #{script['src']}"
    end
  end

  # 5. Flag possible API endpoints among the link and script URLs
  api_patterns = ['/api/', '/v1/', '/v2/', '/graphql', '/rest/']
  all_urls = (links.map { |l| l['href'] } + scripts.map { |s| s['src'] }).compact
  api_endpoints = all_urls.select { |u| api_patterns.any? { |p| u.include?(p) } }

  if api_endpoints.any?
    puts "\nPossible API endpoints:"
    api_endpoints.uniq.each do |endpoint|
      puts "  - #{endpoint}"
    end
  end
rescue StandardError => e
  # Network and HTTP errors end up here; report them and move on to the next target
  puts "Error: #{e.message}"
end

# Test targets
targets = [
  'http://example.com',
  'http://testphp.vulnweb.com'  # a deliberately vulnerable test site
]

targets.each do |target|
  analyze_webpage(target)
end
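
The script prints href values exactly as they appear in the page, so many of them are relative paths such as /about. To follow up on them you usually need absolute URLs, and in a penetration test you normally want to stay on the target host. The sketch below shows one possible way to do that with the standard library's `URI.join`. The `in_scope_links` helper and the sample data are hypothetical, not part of the script above.

```ruby
require 'uri'

# Hypothetical sample data for illustration.
SAMPLE_PAGE_URL = 'http://example.com/docs/index.html'
SAMPLE_HREFS = [
  '/about',
  'contact.html',
  '/about#team',
  'http://other.example.org/x',
  'mailto:admin@example.com'
].freeze

# Resolves raw href values against the page URL and keeps only http(s)
# links on the same host. Fragments are dropped so that /about and
# /about#team count as one page. Hrefs that cannot be parsed are skipped.
def in_scope_links(page_url, hrefs)
  base_host = URI.parse(page_url).host
  resolved = hrefs.filter_map do |href|
    absolute = URI.join(page_url, href)
    next unless absolute.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
    next unless absolute.host == base_host

    absolute.fragment = nil
    absolute.to_s
  rescue URI::Error
    nil
  end
  resolved.uniq
end

puts in_scope_links(SAMPLE_PAGE_URL, SAMPLE_HREFS)
# http://example.com/about
# http://example.com/docs/contact.html
```

Inside `analyze_webpage`, the same helper could be fed with `links.map { |link| link['href'] }`.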

🚀 How to run

First, install nokogiri:

gem install nokogiri

Create the file html_analyzer.rb
Run it:

ruby html_analyzer.rb

The output will look something like this (the values are illustrative and depend on the target page):

Analyzing: http://example.com
==================================================

Found 1 form(s):
  Form #1:
    Action: /search
    Method: POST
    Inputs:
      - q (type: text)
      - submit (type: submit)

Found 15 link(s):
  - /about
  - /contact
  - /api/v1/users
  ...

Possible API endpoints:
  - /api/v1/users
  - /api/v1/posts
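
The analyzer prints every HTML comment it finds, but comments matter mostly when they leak something sensitive. As a rough sketch, the comment texts could be filtered by keyword before printing. The keyword list and the `suspicious_comments` helper below are assumptions for illustration; tune the pattern for the target.

```ruby
# Hypothetical keyword list; extend or narrow it as needed.
SENSITIVE_PATTERN = /password|passwd|secret|token|api[_-]?key|todo|fixme|debug/i

# Returns the stripped, non-empty comments that mention one of the keywords.
def suspicious_comments(comments)
  comments.map(&:strip).reject(&:empty?).grep(SENSITIVE_PATTERN)
end

# Made-up comment texts for illustration.
sample_comments = [
  ' navigation start ',
  ' TODO: remove test account before launch ',
  ' api_key moved to config ',
  ''
]

puts suspicious_comments(sample_comments)
# TODO: remove test account before launch
# api_key moved to config
```

With Nokogiri, the input array could come from `doc.xpath('//comment()').map(&:text)`.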