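[OSS/S3] Object Storage FAQ

1 Overview: Object Storage FAQ

2 FAQ for Object Storage (OSS/S3)

Q: How do the clients for Volcengine TOS differ in access, and what do working examples look like?

All clients: support both intranet and public-network access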

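  • endpoint (S3-compatible access and native (non-S3) access use different endpoints; intranet and public-network endpoints are also completely different)

Intranet: "tos-s3-ap-southeast-3.ivolces.com:443" or "tos-s3-ap-southeast-3.ivolces.com" / ...
Public network: "tos-s3-ap-southeast-3.volces.com:443" or "tos-s3-ap-southeast-3.volces.com" / ... (access over the public network is less efficient)
See: Regions and Endpoints - Volcengine TOS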
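
As a small illustration of the naming pattern visible in the host names above, the sketch below assembles an endpoint from a region name. The function name tos_endpoint and its parameters are hypothetical, and the pattern is only inferred from the examples in this post; the Regions and Endpoints page remains the authoritative list.

```python
def tos_endpoint(region: str, s3_compatible: bool = True, intranet: bool = False) -> str:
    """Build a TOS host name following the pattern of the examples in this post.

    Assumption: "tos-s3-" marks the S3-compatible endpoint, "tos-" the native one;
    "ivolces.com" is the intranet domain, "volces.com" the public one.
    """
    prefix = "tos-s3" if s3_compatible else "tos"
    domain = "ivolces.com" if intranet else "volces.com"
    return f"{prefix}-{region}.{domain}"


# S3-compatible intranet endpoint, as used by the Java client
print(tos_endpoint("ap-southeast-3", s3_compatible=True, intranet=True))
# Native public endpoint, as used by the Python tos module
print(tos_endpoint("ap-southeast-3", s3_compatible=False, intranet=False))
```

Keeping the two switches separate makes it harder to mix up an S3 endpoint with a native one when moving a job between intranet and public access.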

java: com.amazonaws:aws-java-sdk-*(s3/kms/core/...):1.12.261 : speaks the S3 protocol; against TOS, use the VirtualHost access style, because TOS does not accept the PathStyle access style

import com.amazonaws.ClientConfiguration;
import com.amazonaws.Protocol;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Build the credentials from the AK/SK pair
com.amazonaws.auth.BasicAWSCredentials ossCredentials = new BasicAWSCredentials(accessKey, secretKey);
ClientConfiguration clientConfiguration = new ClientConfiguration();
clientConfiguration.setConnectionTimeout(9999 * 1000); // timeouts are in milliseconds
clientConfiguration.setRequestTimeout(9999 * 1000);
clientConfiguration.setSocketTimeout(9999 * 1000);
clientConfiguration.setProtocol(Protocol.HTTPS);

// Create the client instance
com.amazonaws.services.s3.AmazonS3 ossClient = AmazonS3ClientBuilder.standard()
	.withCredentials(new AWSStaticCredentialsProvider(ossCredentials))
	.withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpoint, region))
	.withClientConfiguration(clientConfiguration)
	//.enablePathStyleAccess()
	// Huawei Cloud OBS accepts PathStyle access; Volcengine TOS does not, so keep it disabled
	.withPathStyleAccessEnabled(false)
	.build();
	

// Operation: upload a file to object storage
PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, objectKey, file); // String bucketName, String key, File file
String name = file.getName();
ObjectMetadata metadata = new ObjectMetadata();
// Record the file type (extension); FILE_TYPE is a metadata-key constant defined elsewhere
metadata.addUserMetadata(FILE_TYPE, name.substring(name.lastIndexOf(".") + 1));
putObjectRequest.setMetadata(metadata);
com.amazonaws.services.s3.model.PutObjectResult result = ossClient.putObject(putObjectRequest);


// Operation: list the objects under a folder (prefix) in a bucket
ListObjectsRequest lor = new ListObjectsRequest().withBucketName(bucketName).withPrefix("perfFile/");
ObjectListing objectListing = ossClient.listObjects(lor);
// Iterate over the listing; objectSummary.getKey() returns the object's full key,
// which together with the bucket name identifies the object for further operations.
// Note: one call returns a single page; check objectListing.isTruncated() for more results.
for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
  //...
  String objectKey = objectSummary.getKey();
}

// Operation: delete an object
ossClient.deleteObject(bucketName, key); // String key

  • endpoint (S3-compatible access and native (non-S3) access use different endpoints; intranet and public-network endpoints are also completely different)

Intranet: "tos-s3-ap-southeast-3.ivolces.com:443" or "tos-s3-ap-southeast-3.ivolces.com" / ...
Public network: "tos-s3-ap-southeast-3.volces.com:443" or "tos-s3-ap-southeast-3.volces.com" / ...
See: Regions and Endpoints - Volcengine TOS

  • region

"ap-southeast-3" / "cn-beijing" / "cn-shanghai" / ...

python: tos.TosClientV2 : does not use the S3 protocol (it requires the native, non-S3 TOS endpoints)

  • Install: pip3 install tos
  • Sample code
import os
# from obs import ObsClient
import tos
from tos import TosClientV2 as TosClient
import shutil
import traceback

""" Local test script
Regions and Endpoints | https://www.volcengine.com/docs/6349/107356?lang=zh
Volcengine TOS SDK for Python | https://pypi.org/project/tos/
    pip install tos
Volcengine TOS Python SDK | https://github.com/volcengine/ve-tos-python-sdk/blob/main/README-zh.md
"""

AccessKey='xxx'
SecretKey='xxx'
#Endpoint='obs.cn-north-4.myhuaweicloud.com'
# Note: on Volcengine, the Python tos module only works with the non-S3 endpoints; the Java client uses the S3 endpoints
# Intranet (non-S3)
# Endpoint='tos-ap-southeast-3.ivolces.com'
# Public network (non-S3)
Endpoint='tos-ap-southeast-3.volces.com'
Region="ap-southeast-3"

BucketName="xxxEnv-tos-bigdata-private"
OssDirectory="XXXEnv/ODS/xxx/xxx/"
# Note: do not put a leading '/' (root prefix) in front of the directory
#LocalSourcePath = "/data/xxxx/local_data/source/xxxDayNumber/xxxDeviceIdHash"
LocalSourcePath="E:/tmp/xxx-platform" + "/data/xxxx/local_data/source/normal_data/xxxDayNumber/xxxDeviceIdHash"
# or: "E:\\tmp\\" + ...

# "obs://xxxEnv-tos-bigdata-private/XXXEnv/ODS/xxx/xxx/xxxDayNumber/XxxDeviceId/e7463ffdce5ced753a1d77c246fae273-0"
#ObjectKey = "xxxEnv/ODS/xxxxx/xxxDayNumber/xxxDeviceModelCode/XxxDeviceId/e7463ffdce5ced753a1d77c246fae273-0"
ObjectKey = "xxxEnv/ODS/xxx/xxxDayNumber/xxxDeviceModelCode/XxxDeviceId/e7463ffdce5ced753a1d77c246fae273-0"
#LocalSinkPath = "E:/tmp/xxx-platform" + "/xxx/local_data/sink/normal_data/xxxDayNumber/xxxDeviceIdHash/"
LocalSinkPath='E:\\work_data\\xxx\\XxxDeviceId\\e7463ffdce5ced753a1d77c246fae273-0'

#ossClient = ObsClient(access_key_id=AccessKey, secret_access_key=SecretKey,server=Endpoint)
ossClient = TosClient(ak=AccessKey, sk=SecretKey, endpoint=Endpoint, region=Region)



### CASE: batch-download the objects under a given object-storage directory into a local folder
# https://xxxEnv-tos-bigdata-private.tos-s3-ap-southeast-3.ivolces.com/xxxEnv/ODS/xxx/xxxDayNumber/xxxDeviceModelCode/XxxDeviceId/e7463ffdce5ced753a1d77c246fae273-0
#local_path='/data/xxx/xxx/local_data/source/normal_data/xxxDayNumber/xxxDeviceIdHash'
local_path=LocalSourcePath

def exit_when_exception():
    """Print the current traceback, remove the partial download folder, and exit."""
    print( traceback.format_exc() )
    shutil.rmtree(LocalSourcePath, ignore_errors=True)  # the folder may not exist yet
    os._exit(1)

try:
    # Create the TosClientV2 object; all bucket and object operations go through it
    ossClient = tos.TosClientV2(AccessKey, SecretKey, Endpoint, Region)
    # List all objects under the prefix, one page at a time
    truncated = True
    continuation_token = ''
    total_files = 0
    success_files = 0
    failure_files = 0
    while truncated:
        result = ossClient.list_objects_type2(BucketName, prefix=OssDirectory, continuation_token=continuation_token)
        total_files += len(result.contents)  # accumulate across pages instead of overwriting
        for item in result.contents:
            if not item.key.endswith("/"):  # a file object, not a directory object
                fileName = os.path.basename(item.key)
                try:
                    # per the original note, local folders are created from the prefix as needed
                    file_response = ossClient.get_object_to_file(BucketName, item.key, LocalSourcePath + "/" + fileName)
                    success_files += 1
                    # print("download succeeded: " + item.key)
                except Exception as e:
                    failure_files += 1
                    print(f"download file fail! objectKey:{item.key}, exception:{e}")
            else:  # a directory object, e.g. 'xxxEnv/ODS/xxx/normal_data/xxxDayNumber/xxxDeviceIdHash/'
                total_files -= 1
        truncated = result.is_truncated
        continuation_token = result.next_continuation_token

    # resp = obsClient.downloadFiles(bucketName=BucketName, prefix=OssDirectory, downloadFolder=LocalSourcePath)
    # result = "DownloadFiles summary : total_task:%d, success:%d ,failure:%d" % (resp.total_tasks, resp.successful_tasks, resp.failed_tasks)
    result = "DownloadFiles summary | total_task:%d, success:%d, failure:%d" % (total_files, success_files, failure_files)
    print(result)
except tos.exceptions.TosClientError as e:
    # Client-side failure, usually an invalid request parameter or a network problem
    print('fail with client error, message:{}, cause: {}'.format(e.message, e.cause))
    exit_when_exception()
except tos.exceptions.TosServerError as e:
    # Server-side failure; the response carries the detailed error information
    print('fail with server error, code: {}'.format(e.code))
    # The request id pinpoints the failing request; keeping it in the logs is strongly recommended
    print('error with request id: {}'.format(e.request_id))
    print('error with message: {}'.format(e.message))
    print('error with http code: {}'.format(e.status_code))
    print('error with ec: {}'.format(e.ec))
    print('error with request url: {}'.format(e.request_url))
    exit_when_exception()
except Exception as e:
    print('fail with unknown error: {}'.format(e))
    exit_when_exception()


"""
### CASE: upload a local file to a given object-storage directory (the path is created automatically)
try:
    #resp = ossClient.putFile(BucketName, ObjectKey, LocalSinkPath)
    resp = ossClient.put_object_from_file(BucketName, ObjectKey, LocalSinkPath)
    print(f"resp.response:{resp.resp}")
except Exception:
    import traceback
    print(traceback.format_exc())
    # shutil.rmtree(LocalSourcePath)
    # shutil.rmtree(LocalSinkPath)
    os._exit(1)
"""

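The download case above pages through the listing with a continuation token and skips keys ending in "/". The sketch below isolates that loop so it can be exercised without a network connection. The Page class and iter_file_keys function are hypothetical stand-ins, not part of the tos SDK; only the attribute names is_truncated and next_continuation_token mirror the ones used in the script above.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Page:
    """Hypothetical stand-in for one page of a list_objects_type2 result."""
    keys: List[str]
    is_truncated: bool = False
    next_continuation_token: str = ""


def iter_file_keys(list_page: Callable[[str], Page]) -> Iterator[str]:
    """Yield every non-directory key, following continuation tokens until the last page."""
    token = ""
    while True:
        page = list_page(token)
        for key in page.keys:
            if not key.endswith("/"):  # skip directory placeholder objects
                yield key
        if not page.is_truncated:
            break
        token = page.next_continuation_token


# A fake two-page listing keyed by continuation token, used in place of a real client
fake_pages = {
    "": Page(["dir/", "dir/a.parquet"], is_truncated=True, next_continuation_token="t1"),
    "t1": Page(["dir/b.parquet"]),
}
file_keys = list(iter_file_keys(lambda token: fake_pages[token]))
print(file_keys)
```

With the real client, list_page could wrap ossClient.list_objects_type2(BucketName, prefix=OssDirectory, continuation_token=token) and copy item.key for each entry of result.contents into a Page.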
python : pyarrow (pyarrow.fs.S3FileSystem submodule) : speaks the S3 protocol; against TOS, use the VirtualHost access style, because TOS does not accept the PathStyle access style

#!/usr/bin/env python3.9
# -*- coding: utf-8 -*-

from pyarrow import fs
import pyarrow as pa
import pyarrow.dataset  # explicit import so that pa.dataset is always available
import pyarrow.parquet as pq


# Read the parquet files under the given directory with `pq.read_table`
#table = pq.read_table('/data/xxx/xxx/local_data/source/normal_data/xxxDayNumber/xxxDeviceIdHash')
table = pq.read_table('E:/work_data/xxx/xxx/xxxDayNumber/xxxDeviceIdHash')
file_options = pa.dataset.ParquetFileFormat().make_write_options(compression='zstd')

s3 = fs.S3FileSystem(
    region="ap-southeast-3"
    , endpoint_override='tos-s3-ap-southeast-3.volces.com:443'
    , access_key='xxx'
    , secret_key='xxx'
    , force_virtual_addressing=True         # True forces virtual (VirtualHost) addressing; this is the key setting | https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html
)


# Check read (list) permission
try:
    # Try to list the contents of the bucket
    file_info = s3.get_file_info( fs.FileSelector("xxxEnv-tos-bigdata-private/xxxxxDir", recursive=False))
    print("✅ Basic connection succeeded: the bucket is reachable.")
except Exception as e:
    print(f"❌ Permission check failed: {e}")


# Check write permission
test_path = "xxxEnv-tos-bigdata-private/test_connection.txt"
try:
    with s3.open_output_stream(test_path) as stream:
        stream.write(b"connection test")
    print(f"✅ Write permission verified: file created at {test_path}")
    # Optional: delete the test file afterwards
    # s3.delete_file(test_path)
except Exception as e:
    print(f"❌ Write failed: check the s3:PutObject permission. Details:\n{e}")


pa.dataset.write_dataset(
    table
    , base_dir='xxxEnv-tos-bigdata-private/xxxEnv/ODS/xxx/xxxDayNumber'
    , partitioning=['key1','key2']
    , basename_template='13acbf6ba3fe361bb48f262c3c0148a6-{i}'
    , format='parquet'
    , max_partitions=1000000
    , existing_data_behavior='overwrite_or_ignore'
    , max_open_files=1000000
    , file_options=file_options
    , filesystem=s3
)

rows=table.num_rows

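Q: Volcengine TOS does not support the PathStyle access mode and only supports the VirtualHost access mode, which makes Java/Python clients report InvalidPathAccess errors. How should the clients be configured?

Problem description

  • Java and other S3-protocol clients report: InvalidPathAccess
  • pyarrow reports: AWS Error ACCESS_DENIED during HeadBucket operation: No response body.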
table = pq.read_table('/data/source/normal_data/xxxDayNumber/xxxDeviceIdHash')
file_options = pa.dataset.ParquetFileFormat().make_write_options(compression='zstd')
pa.dataset.write_dataset(table, base_dir='xxxEnv-tos-bigdata-private/xxxEnv/ODS/xxx/xxxDayNumber',partitioning=['key1','key2'],basename_template='13acbf6ba3fe361bb48f262c3c0148a6-{i}',format='parquet',max_partitions=1000000,existing_data_behavior='overwrite_or_ignore',max_open_files=1000000,file_options=file_options,filesystem=s3)

Exception log:

# python3 xxxx.py
  ...
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 1035, in write_dataset
    _filesystemdataset_write(
  File "pyarrow/_dataset.pyx", line 4177, in pyarrow._dataset._filesystemdataset_write
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: When testing for existence of bucket 'xxxEnv-tos-bigdata-private': AWS Error ACCESS_DENIED during HeadBucket operation: No response body.

root@xxx-ecs:~# pip list | grep -i pyarrow
pyarrow                   23.0.0

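Cause analysis

  • Volcengine TOS: does not support the PathStyle access mode and only supports the VirtualHost access mode, so S3 clients that send PathStyle requests fail with InvalidPathAccess. pyarrow only enables virtual addressing by default when endpoint_override is empty, so with a custom TOS endpoint and without force_virtual_addressing=True it sends PathStyle requests and fails at HeadBucket.
  • Huawei Cloud OBS: supports the PathStyle access mode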
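
To make the two access modes concrete, the sketch below prints the URL each mode would produce for the same object. The function names and the bucket name example-bucket are hypothetical, and the helpers do no percent-encoding of the key; they only show where the bucket name ends up.

```python
def virtual_hosted_url(endpoint: str, bucket: str, key: str) -> str:
    """VirtualHost mode: the bucket name is part of the host name."""
    return f"https://{bucket}.{endpoint}/{key}"


def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """PathStyle mode: the bucket name is the first path segment."""
    return f"https://{endpoint}/{bucket}/{key}"


endpoint = "tos-s3-ap-southeast-3.ivolces.com"
# Accepted by TOS
print(virtual_hosted_url(endpoint, "example-bucket", "dir/file-0"))
# Rejected by TOS according to this post
print(path_style_url(endpoint, "example-bucket", "dir/file-0"))
```

The first form matches the object URL shown in the download example earlier in this post, where the bucket name precedes the endpoint host.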

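Solution

  • Java (aws-java-sdk): keep PathStyle access disabled when building the client, i.e. use .withPathStyleAccessEnabled(false) and do not call .enablePathStyleAccess(), as in the Java example under the previous question.
  • pyarrow: pass force_virtual_addressing=True to fs.S3FileSystem, as in the pyarrow example under the previous question.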
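
For pyarrow, one way to avoid forgetting the addressing flag is to build the S3FileSystem arguments in a single place. This is a minimal sketch: the helper names tos_s3_filesystem_kwargs and make_tos_filesystem are hypothetical, while the keyword names are the ones used in the pyarrow example earlier in this post.

```python
def tos_s3_filesystem_kwargs(region: str, endpoint: str, access_key: str, secret_key: str) -> dict:
    """Collect S3FileSystem arguments for TOS, always forcing virtual addressing."""
    return {
        "region": region,
        "endpoint_override": endpoint,
        "access_key": access_key,
        "secret_key": secret_key,
        "force_virtual_addressing": True,  # TOS rejects PathStyle requests
    }


def make_tos_filesystem(**kwargs):
    """Create the filesystem; pyarrow is imported lazily so the helper above needs no pyarrow."""
    from pyarrow import fs
    return fs.S3FileSystem(**kwargs)


kwargs = tos_s3_filesystem_kwargs(
    "ap-southeast-3", "tos-s3-ap-southeast-3.volces.com:443", "xxx", "xxx"
)
print(kwargs["force_virtual_addressing"])
```

Calling make_tos_filesystem(**kwargs) then yields a filesystem object that can be passed as filesystem=... to pa.dataset.write_dataset, as in the failing call above.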


posted @ 2026-02-11 00:26  千千寰宇