
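JD.com product comment data contains rich user feedback and is valuable for market analysis, product improvement, and discovering user needs. This article walks through the technical implementation of a JD product comment interface, focusing on the core problems of constructing request parameters, handling anti-scraping mechanisms, and parsing and analyzing the data. It presents a compliant, efficient technical approach while strictly following platform rules and [data collection](https://o0b.cn/ibrad) norms.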

I. JD Comment Interface: Principles and Compliance Essentials

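JD product comment data is loaded dynamically through an API and returned in JSON format. It includes the comment text, rating, user information, and other key fields. Any implementation of the interface must observe the compliance points listed in the compliance and risk section at the end of this article.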

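The core technical flow of the JD comment interface is:

Product ID parsing → comment parameter generation → comment request sending → data parsing and cleaning → structured storage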
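
The flow above can be read as a chain of five stage functions. The following is a minimal sketch of that wiring, not the article's own code: `run_pipeline` and its parameter names are hypothetical, and each stage is passed in as a callable so the control flow can be exercised with stubs and without any network access.

```python
from typing import Callable, List, Optional


def run_pipeline(
    product_url: str,
    parse_sku: Callable[[str], Optional[str]],
    build_params: Callable[[str, int], dict],
    fetch: Callable[[dict], Optional[str]],
    parse: Callable[[str], Optional[dict]],
    store: Callable[[List[dict]], None],
    max_pages: int = 1,
) -> int:
    """Run the five stages in order and return the number of comments stored."""
    sku_id = parse_sku(product_url)            # 1. product ID parsing
    if sku_id is None:
        return 0

    stored = 0
    for page in range(max_pages):
        params = build_params(sku_id, page)    # 2. comment parameter generation
        raw = fetch(params)                    # 3. comment request sending
        if raw is None:
            break                              # stop on a failed or blocked request
        parsed = parse(raw)                    # 4. data parsing and cleaning
        if not parsed or not parsed["comments"]:
            break                              # nothing more to read
        store(parsed["comments"])              # 5. structured storage
        stored += len(parsed["comments"])
    return stored
```

With the classes developed in this article, `parse_sku` would presumably be `JdSkuIdParser.get_sku_id`, `fetch` would be `JdCommentRequester.fetch_comments`, and `parse` would be `JdCommentParser.parse_comments`. The storage stage is left to the caller, for example appending rows to a list or writing them to a database.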

II. Core Implementation: From Interface Analysis to Data Extraction

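The JD comment interface expects a specific combination of parameters, including the product ID, page number, and rating filter, and some of these parameters must be generated dynamically. Everything starts from the product ID (skuId), which the following parser extracts from a product URL or, failing that, from the product page.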
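
To see what the URL-matching part of the product ID step does without touching the network, it can be tried on its own. The sketch below is a standalone illustration rather than the article's class: `extract_sku_id` is a hypothetical helper, the sample URLs use made-up IDs, and the `sku=` pattern is tightened to require a leading `?` or `&`, which is an assumption about where that parameter appears.

```python
import re

# Dots are escaped so that "." matches only a literal dot in the host name.
SKU_URL_PATTERNS = [
    re.compile(r"item\.jd\.com/(\d+)\.html"),     # standard product page URL
    re.compile(r"[?&]sku=(\d+)"),                 # URL carrying a sku query parameter
    re.compile(r"product\.jd\.com/(\d+)\.html"),  # product page URL
]


def extract_sku_id(product_url):
    """Return the first numeric ID matched by any pattern, or None."""
    for pattern in SKU_URL_PATTERNS:
        match = pattern.search(product_url)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    samples = [
        "https://item.jd.com/100012043978.html",    # made-up ID
        "https://example.com/detail?sku=12345&x=1",  # made-up URL
        "https://www.jd.com/",                       # no ID present
    ]
    for url in samples:
        print(url, "->", extract_sku_id(url))
```

Escaping the dots matters: with an unescaped `.`, a string such as `itemXjdYcom/123Zhtml` would also match, because `.` matches any character.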

    import re

    import requests
    from lxml import etree


    class JdSkuIdParser:
        """JD product ID parser: extracts the skuId."""

        def __init__(self):
            self.headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
                "Referer": "https://www.jd.com/"
            }

        def parse_from_url(self, product_url):
            """Extract the skuId from a product URL."""
            # Dots are escaped so that "." only matches a literal dot.
            patterns = [
                r"item\.jd\.com/(\d+)\.html",     # standard product page URL
                r"sku=(\d+)",                     # URL carrying a sku parameter
                r"product\.jd\.com/(\d+)\.html",  # product page URL
            ]

            for pattern in patterns:
                match = re.search(pattern, product_url)
                if match:
                    return match.group(1)
            return None

        def parse_from_page(self, product_url):
            """Extract the skuId from the product page itself."""
            try:
                response = requests.get(
                    product_url,
                    headers=self.headers,
                    timeout=10,
                    allow_redirects=True
                )
                response.encoding = "utf-8"

                # First try the <meta name="skuId" content="..."> tag
                tree = etree.HTML(response.text)
                meta_tag = tree.xpath('//meta[@name="skuId"]/@content')
                if meta_tag:
                    return meta_tag[0]

                # Then fall back to inline scripts
                script_tags = tree.xpath('//script/text()')
                for script in script_tags:
                    # Form: skuId = "123"
                    match = re.search(r'skuId\s*=\s*"(\d+)"', script)
                    if match:
                        return match.group(1)
                    # Form: skuId: 123
                    match = re.search(r'skuId\s*:\s*(\d+)', script)
                    if match:
                        return match.group(1)

                return None
            except Exception as e:
                print(f"Failed to extract skuId from page: {e}")
                return None

        def get_sku_id(self, product_url):
            """Get the skuId: try the URL first, then fall back to the page."""
            sku_id = self.parse_from_url(product_url)
            if sku_id:
                return sku_id
            return self.parse_from_page(product_url)
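
Parameter generation, the second stage of the flow, is described above but not shown in code. The following is a rough sketch of what it might look like. The parameter names `callback`, `productId`, `score`, `page`, and `pageSize` are assumptions inferred from the endpoint and response fields used elsewhere in this article, not a documented contract, and `build_comment_params` is a hypothetical helper. Check the names against an actual authorized request before relying on them.

```python
from urllib.parse import urlencode

# Endpoint as used by the request sender in this article.
COMMENT_API = "https://club.jd.com/comment/productPageComments.action"


def build_comment_params(sku_id, page=0, score=0, page_size=10,
                         callback="fetchJSON_comment98"):
    """Build the query parameters for one page of comments.

    All key names are assumptions. `score` is the rating filter, and 0 is
    assumed here to mean "no filter".
    """
    sku = str(sku_id)
    if not sku.isdigit():
        raise ValueError(f"skuId must be numeric, got {sku_id!r}")
    if page < 0 or page_size <= 0 or score < 0:
        raise ValueError("page and score must be >= 0 and page_size > 0")
    return {
        "callback": callback,   # JSONP callback name expected by the parser
        "productId": sku,
        "score": score,
        "page": page,
        "pageSize": page_size,
    }


if __name__ == "__main__":
    params = build_comment_params("100012043978", page=2)  # made-up ID
    print(f"{COMMENT_API}?{urlencode(params)}")
```

Validating the inputs before a request is sent keeps malformed IDs or negative page numbers from ever reaching the platform, which also helps keep the request volume down.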

3. Comment Request Sender

Sends comment requests and deals with anti-scraping measures so that requests stay stable:

    import time
    import random

    import requests
    from fake_useragent import UserAgent  # third-party package: fake-useragent


    class JdCommentRequester:
        """JD comment request sender."""

        def __init__(self, proxy_pool=None):
            self.comment_api = "https://club.jd.com/comment/productPageComments.action"
            self.proxy_pool = proxy_pool or []
            self.ua = UserAgent()
            self.session = requests.Session()
            self.last_request_time = 0
            self.min_interval = 15  # minimum interval between comment requests (seconds)

        def _get_headers(self):
            """Build the request headers."""
            return {
                "User-Agent": self.ua.random,
                "Accept": "*/*",
                "Accept-Language": "zh-CN,zh;q=0.9",
                "Referer": "https://item.jd.com/",
                "X-Requested-With": "XMLHttpRequest",
                "Connection": "keep-alive",
                "Host": "club.jd.com"
            }

        def _get_proxy(self):
            """Pick a random proxy, or None if the pool is empty."""
            if not self.proxy_pool:
                return None
            return random.choice(self.proxy_pool)

        def _check_request_interval(self):
            """Enforce the minimum interval between requests."""
            current_time = time.time()
            elapsed = current_time - self.last_request_time
            if elapsed < self.min_interval:
                # Sleep for the remaining time plus 1-3 seconds of random jitter
                sleep_time = self.min_interval - elapsed + random.uniform(1, 3)
                print(f"Request interval too short, sleeping {sleep_time:.1f} s")
                time.sleep(sleep_time)
            self.last_request_time = time.time()

        def fetch_comments(self, params):
            """
            Send a comment request.

            :param params: comment request parameters
            :return: response text, or None on failure
            """
            self._check_request_interval()

            headers = self._get_headers()
            proxy = self._get_proxy()
            proxies = {"http": proxy, "https": proxy} if proxy else None

            try:
                response = self.session.get(
                    self.comment_api,
                    params=params,
                    headers=headers,
                    proxies=proxies,
                    timeout=15
                )

                if response.status_code != 200:
                    print(f"Comment request failed, status code: {response.status_code}")
                    return None

                # Check whether the request was intercepted
                if self._is_blocked(response.text):
                    print("Comment request was blocked; verification may be required")
                    # Drop the proxy that triggered the block
                    if proxy and proxy in self.proxy_pool:
                        self.proxy_pool.remove(proxy)
                    return None

                return response.text

            except Exception as e:
                print(f"Comment request error: {e}")
                return None

        def _is_blocked(self, response_text):
            """Return True if the response looks like a block page."""
            # Keywords are kept in Simplified Chinese because they are matched
            # against the site's own response text.
            block_keywords = [
                "验证码",        # "verification code" (captcha)
                "访问过于频繁",  # "visiting too frequently"
                "请稍后再试",    # "please try again later"
                "系统繁忙"       # "system busy"
            ]
            return any(keyword in response_text for keyword in block_keywords)
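
The interval control in the request sender can also be isolated into a small reusable object. This is a sketch under stated assumptions, not the article's implementation: `MinIntervalThrottle` is a hypothetical name, and the clock and sleep functions are injectable so the logic can be tested without actually waiting. It uses `time.monotonic` instead of `time.time` so that system clock adjustments do not affect the measured interval.

```python
import random
import time


class MinIntervalThrottle:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval, jitter=(1.0, 3.0),
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.jitter = jitter      # (low, high) extra seconds added to each sleep
        self._clock = clock
        self._sleep = sleep
        self._last = None         # time of the previous call, None before the first

    def wait(self):
        """Sleep if the previous call was too recent; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            if elapsed < self.min_interval:
                slept = self.min_interval - elapsed + random.uniform(*self.jitter)
                self._sleep(slept)
        self._last = self._clock()
        return slept


if __name__ == "__main__":
    throttle = MinIntervalThrottle(min_interval=2, jitter=(0.0, 0.5))
    for i in range(3):
        waited = throttle.wait()
        print(f"request {i}: waited {waited:.2f} s")
```

Calling `wait()` before every request gives the same behavior as `_check_request_interval`, and keeping it separate makes the minimum interval easy to raise if the platform's limits tighten.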

4. Comment Data Parser

Parses the JSONP data returned by the JD comment interface and extracts structured comment information:

    import re
    import json
    from datetime import datetime


    class JdCommentParser:
        """JD comment data parser."""

        def __init__(self):
            # Matches the JSONP wrapper, e.g. fetchJSON_comment98({...});
            # The digit suffix is treated as optional. The parentheses are escaped,
            # and the greedy match avoids stopping at a ");" inside comment text.
            self.jsonp_pattern = re.compile(r'fetchJSON_comment98\d*\((.*)\);?', re.S)
            # Privacy filter: phone numbers and other long digit runs
            self.privacy_pattern = re.compile(r'1\d{10}|\d{6,20}')

        def parse_jsonp(self, jsonp_text):
            """Unwrap a JSONP response and return the JSON data."""
            match = self.jsonp_pattern.search(jsonp_text)
            if not match:
                return None
            try:
                return json.loads(match.group(1))
            except json.JSONDecodeError:
                print("JSON parsing failed")
                return None

        def clean_comment_text(self, text):
            """Clean comment text and mask private information."""
            if not text:
                return ""
            # Mask phone numbers and other long digit runs
            text = self.privacy_pattern.sub('***', text)
            # Collapse extra whitespace and line breaks
            text = re.sub(r'\s+', ' ', text).strip()
            return text

        def parse_comment_item(self, comment_item):
            """Parse a single comment item."""
            try:
                # Parse the comment time
                comment_time = comment_item.get("creationTime", "")
                if comment_time:
                    try:
                        comment_time = datetime.strptime(comment_time, "%Y-%m-%d %H:%M:%S")
                    except ValueError:
                        comment_time = None

                # Combine product attributes (color and size)
                product_attr = comment_item.get("productColor") or ""
                product_size = comment_item.get("productSize")
                if product_size:
                    product_attr += f" {product_size}"

                # Collect image URLs
                images = comment_item.get("images") or []
                image_urls = [img.get("imgUrl") for img in images if img.get("imgUrl")]

                return {
                    "comment_id": comment_item.get("id", ""),
                    "user_nick": comment_item.get("nickname", ""),
                    "user_level": comment_item.get("userLevelName", ""),
                    "comment_text": self.clean_comment_text(comment_item.get("content", "")),
                    "comment_time": comment_time,
                    "score": comment_item.get("score", 0),  # rating (1-5)
                    "product_attr": product_attr.strip(),   # product attributes
                    "useful_vote": comment_item.get("usefulVoteCount", 0),  # "useful" votes
                    "image_count": len(images),             # number of images
                    "image_urls": image_urls,               # list of image URLs
                    "is_vip": comment_item.get("isVip", False)  # whether the user is a VIP
                }
            except Exception as e:
                print(f"Failed to parse comment: {e}")
                return None

        def parse_comments(self, jsonp_text):
            """
            Parse a page of comments.

            :param jsonp_text: comment response in JSONP format
            :return: dict with the comments and pagination info, or None
            """
            json_data = self.parse_jsonp(jsonp_text)
            if not json_data:
                return None

            result = {
                "total_comments": json_data.get("productCommentSummary", {}).get("commentCount", 0),
                "good_rate": json_data.get("productCommentSummary", {}).get("goodRate", 0),  # positive rate
                "current_page": json_data.get("page", 1),
                "page_size": json_data.get("pageSize", 10),
                "comments": []
            }

            # Total pages by ceiling division; guard against a zero page size
            page_size = max(result["page_size"], 1)
            result["total_pages"] = (result["total_comments"] + page_size - 1) // page_size

            # Parse the comment list
            comment_items = json_data.get("comments", [])
            for item in comment_items:
                comment = self.parse_comment_item(item)
                if comment:
                    result["comments"].append(comment)

            return result
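
The three core steps of the parser (unwrapping JSONP, masking digits, and computing the page count) can be checked offline against a hand-written payload. The sketch below is a standalone illustration using only the standard library. The function names and the `SAMPLE` payload are made up for this example, and the payload is not a real JD response.

```python
import json
import re

# Generic JSONP wrapper: callbackName( ... ) with an optional trailing semicolon.
JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.S)
# Same privacy pattern as the parser class: phone numbers and long digit runs.
PRIVACY_RE = re.compile(r"1\d{10}|\d{6,20}")


def unwrap_jsonp(text):
    """Return the decoded JSON inside a JSONP wrapper, or None if it is malformed."""
    match = JSONP_RE.match(text)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def mask_private_digits(text):
    """Replace phone-like and other long digit runs with ***."""
    return PRIVACY_RE.sub("***", text)


def total_pages(total_comments, page_size):
    """Ceiling division; returns 0 for a non-positive page size."""
    if page_size <= 0:
        return 0
    return (total_comments + page_size - 1) // page_size


# Made-up payload. Note the ");" inside the comment text.
SAMPLE = (
    'fetchJSON_comment98({"productCommentSummary": {"commentCount": 23}, '
    '"comments": [{"id": 1, "content": "Arrived fast :); call 13800000000", '
    '"score": 5}]});'
)

if __name__ == "__main__":
    data = unwrap_jsonp(SAMPLE)
    print(mask_private_digits(data["comments"][0]["content"]))
    print(total_pages(data["productCommentSummary"]["commentCount"], 10))
```

The sample deliberately contains `);` inside the comment text. A non-greedy pattern that ends at the first `);` would cut the JSON short there, which is why the wrapper pattern is greedy and anchored at the end. The digit pattern is intentionally broad, so it will also mask harmless numbers of six or more digits. That trades some detail in the text for a lower risk of storing personal data.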

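IV. Compliance and Risk Notes

- Commercial use requires written authorization from the JD platform and must comply with the relevant provisions of the E-Commerce Law (《電子商務法》).
- Collected comment data must not be used to build products or services that compete with JD.
- Strictly control the request rate to avoid burdening the platform's servers.
- Automatically filter user privacy information out of comments to protect users.
- If the platform's anti-scraping measures are found to have been strengthened, pause collection immediately and assess the risk.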

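With the approach described in this article, you can build a fully functional JD product comment interface system. The approach follows compliance principles and covers the full pipeline from comment collection and parsing through analysis, providing data support for scenarios such as product research and user needs analysis. In practice, adjust the strategy as platform rules change so that the system stays stable and lawful.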

JD Product Comment Interface Implementation: A Complete Approach from Interface Analysis to Data Mining