# Whisper AI Backend - API Documentation

Base URL: `http://localhost:8000`

## Endpoints

### 1. Health Check

**Endpoint:** `GET /health`

**Description:** Check if the API server is running.

**Request:**

```bash
curl http://localhost:8000/health
```

**Response:**

```json
{
  "status": "healthy"
}
```

**Status Codes:**

- `200 OK` - Service is healthy
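
A script that starts the server and then sends work right away may want to wait until `/health` reports `healthy`. Below is a minimal TypeScript sketch. `waitForHealthy`, its attempt count, and its delay are hypothetical choices and are not part of the API. The fetch function is passed in so the loop can be tested without a running server.

```typescript
// Minimal shape of the HTTP response this helper relies on.
interface HealthResponseLike {
  ok: boolean;
  json(): Promise<unknown>;
}

type HealthFetch = (url: string) => Promise<HealthResponseLike>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls GET /health until the body is {"status": "healthy"}.
// Resolves to true once healthy, or false if every attempt fails.
async function waitForHealthy(
  baseUrl: string,
  fetchFn: HealthFetch,
  attempts = 10,
  delayMs = 1000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(`${baseUrl}/health`);
      if (res.ok) {
        const body = (await res.json()) as { status?: string };
        if (body.status === "healthy") return true;
      }
    } catch {
      // Connection errors are expected while the server is still starting; retry.
    }
    if (i < attempts - 1) await sleep(delayMs);
  }
  return false;
}
```

In Node 18+ or a browser, you can pass the global `fetch` directly: `await waitForHealthy("http://localhost:8000", fetch)`.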

---

### 2. Upload Audio for Transcription

**Endpoint:** `POST /transcribe`

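**Description:** Upload an audio file for transcription. The request returns immediately with a job ID, and transcription runs in the background. Use `GET /job/{job_id}` to check progress and retrieve the result.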

**Supported Audio Formats:**

- MP3 (`audio/mpeg`)
- WAV (`audio/wav`, `audio/x-wav`)
- M4A (`audio/m4a`, `audio/x-m4a`, `audio/mp4`)
- OGG (`audio/ogg`)
- WebM (`audio/webm`)
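
Checking the type before upload avoids a round trip that would end in `415`. The sketch below copies the MIME types listed above. It assumes the server compares the declared type verbatim, so a value with parameters such as `audio/webm;codecs=opus` is treated as unsupported. `guessAudioType` and its extension table are hypothetical helpers, not part of the API. They are useful when a client cannot determine a type and would otherwise fall back to `application/octet-stream`.

```typescript
// MIME types accepted by POST /transcribe (copied from the list above).
const SUPPORTED_AUDIO_TYPES: ReadonlySet<string> = new Set([
  "audio/mpeg",
  "audio/wav",
  "audio/x-wav",
  "audio/m4a",
  "audio/x-m4a",
  "audio/mp4",
  "audio/ogg",
  "audio/webm",
]);

// Hypothetical extension-to-MIME table: one accepted type per extension.
const EXTENSION_TO_TYPE = new Map<string, string>([
  ["mp3", "audio/mpeg"],
  ["wav", "audio/wav"],
  ["m4a", "audio/m4a"],
  ["ogg", "audio/ogg"],
  ["webm", "audio/webm"],
]);

// Exact match, assuming the server does not strip MIME parameters.
function isSupportedAudioType(mimeType: string): boolean {
  return SUPPORTED_AUDIO_TYPES.has(mimeType);
}

// Returns the MIME type for a filename's extension, or undefined if unknown.
function guessAudioType(filename: string): string | undefined {
  const dot = filename.lastIndexOf(".");
  if (dot === -1) return undefined;
  return EXTENSION_TO_TYPE.get(filename.slice(dot + 1).toLowerCase());
}
```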

**Request:**

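```bash
curl -X POST http://localhost:8000/transcribe \
  -F "file=@path/to/audio.m4a;type=audio/m4a"
```

curl does not infer audio MIME types from file extensions. It labels such parts `application/octet-stream`, which this endpoint rejects with `415`, so the `;type=` suffix sets the part's type explicitly. curl adds the `multipart/form-data` header (including its boundary) itself, so no `-H` flag is needed.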

**Request Body:**

- `file` (required): Audio file as multipart/form-data

**Response (Success):**

```json
{
  "job_id": "85388709-04da-4fa3-af02-a708e15b8e4e",
  "status": "queued"
}
```

**Response Structure:**

```typescript
// Response body returned by POST /transcribe
interface TranscriptionRequest {
  job_id: string; // UUID of the transcription job
  status: "queued"; // Initial status is always "queued"
}
```

**Status Codes:**

- `200 OK` - File accepted and job created
- `415 Unsupported Media Type` - Invalid file type

**Error Response (Unsupported File Type):**

```json
{
  "detail": "Unsupported file type: application/octet-stream. Supported types: audio/mpeg, audio/wav, audio/x-wav, audio/m4a, audio/x-m4a, audio/mp4, audio/ogg, audio/webm"
}
```
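
A client can reduce the two documented outcomes to either a job ID or an error that carries the server's `detail` message. The sketch below assumes that error bodies have the `detail` field shown above. `parseUploadResponse` is a hypothetical name. It takes the HTTP status and the parsed JSON body, so it works with any HTTP client.

```typescript
// Shape of a successful POST /transcribe response (mirrors the interface above).
interface UploadAccepted {
  job_id: string;
  status: "queued";
}

// Hypothetical helper: returns the job ID on success. Otherwise it throws an
// error that includes the HTTP status and the server's "detail" message.
function parseUploadResponse(httpStatus: number, body: unknown): string {
  if (httpStatus >= 200 && httpStatus < 300) {
    const jobId = (body as Partial<UploadAccepted> | null)?.job_id;
    if (typeof jobId !== "string") {
      throw new Error("Upload accepted but the response has no job_id");
    }
    return jobId;
  }
  let message = "no detail provided";
  if (typeof body === "object" && body !== null && "detail" in body) {
    const detail = (body as { detail: unknown }).detail;
    // "detail" is a string in the 415 example; stringify anything else.
    message = typeof detail === "string" ? detail : JSON.stringify(detail);
  }
  throw new Error(`Upload failed (HTTP ${httpStatus}): ${message}`);
}
```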

---

### 3. Get Transcription Job Status

**Endpoint:** `GET /job/{job_id}`

**Description:** Check the status and retrieve results of a transcription job.

**Request:**

```bash
curl http://localhost:8000/job/85388709-04da-4fa3-af02-a708e15b8e4e
```

**Path Parameters:**

- `job_id` (required): The job ID returned from the `/transcribe` endpoint

**Response (Queued):**

```json
{
  "job_id": "85388709-04da-4fa3-af02-a708e15b8e4e",
  "status": "queued",
  "text": null,
  "language": null,
  "duration": null,
  "language_probability": null,
  "error": null
}
```

**Response (Processing):**

```json
{
  "job_id": "85388709-04da-4fa3-af02-a708e15b8e4e",
  "status": "processing",
  "text": null,
  "language": null,
  "duration": null,
  "language_probability": null,
  "error": null
}
```

**Response (Completed):**

```json
{
  "job_id": "85388709-04da-4fa3-af02-a708e15b8e4e",
  "status": "completed",
  "text": "Hi, this is Mano, so I spoke to Kofi yesterday and we kind of came to the competition...",
  "language": "en",
  "duration": 36.6506875,
  "language_probability": 0.6491772532463074,
  "error": null
}
```

**Response (Failed):**

```json
{
  "job_id": "85388709-04da-4fa3-af02-a708e15b8e4e",
  "status": "failed",
  "text": null,
  "language": null,
  "duration": null,
  "language_probability": null,
  "error": "Transcription failed: Audio file corrupted"
}
```

**Response Structure:**

```typescript
interface TranscriptionResponse {
  job_id: string; // UUID of the job
  status: "queued" | "processing" | "completed" | "failed";
  text: string | null; // Transcribed text (only when completed)
  language: string | null; // Detected language code (e.g., "en", "es")
  duration: number | null; // Audio duration in seconds
  language_probability: number | null; // Confidence (0-1) in the detected language, not in the transcript
  error: string | null; // Error message (only when failed)
}
```
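
Some fields are only populated in certain states, so a type guard can let TypeScript narrow the response safely. The sketch below assumes the null patterns in the four example responses always hold, although the API does not state this as a guarantee. `CompletedJob`, `isCompleted`, and `isTerminal` are hypothetical names.

```typescript
type JobState = "queued" | "processing" | "completed" | "failed";

// General shape, matching TranscriptionResponse above.
interface TranscriptionResponse {
  job_id: string;
  status: JobState;
  text: string | null;
  language: string | null;
  duration: number | null;
  language_probability: number | null;
  error: string | null;
}

// Narrowed shape of a finished job (assumes these fields are set on completion).
interface CompletedJob extends TranscriptionResponse {
  status: "completed";
  text: string;
  language: string;
  duration: number;
}

// Type guard: checks the status and the fields CompletedJob promises are non-null.
function isCompleted(job: TranscriptionResponse): job is CompletedJob {
  return (
    job.status === "completed" &&
    job.text !== null &&
    job.language !== null &&
    job.duration !== null
  );
}

// True once polling can stop: "completed" and "failed" are final states.
function isTerminal(job: TranscriptionResponse): boolean {
  return job.status === "completed" || job.status === "failed";
}
```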

**Status Codes:**

- `200 OK` - Job found
- `500 Internal Server Error` - Failed to retrieve job status

---

## Usage Flow

### 1. Upload Audio File

```javascript
const formData = new FormData();
formData.append("file", audioFile);

const response = await fetch("http://localhost:8000/transcribe", {
  method: "POST",
  body: formData,
});

const { job_id } = await response.json();
```

### 2. Poll for Results

```javascript
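async function pollTranscription(jobId) {
  const response = await fetch(`http://localhost:8000/job/${jobId}`);
  if (!response.ok) {
    // e.g. 500 when the job status cannot be retrieved
    throw new Error(`Status check failed: HTTP ${response.status}`);
  }
  const result = await response.json();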

  if (result.status === "completed") {
    return result.text;
  } else if (result.status === "failed") {
    throw new Error(result.error);
  } else {
    // Still processing, poll again after delay
    await new Promise((resolve) => setTimeout(resolve, 2000));
    return pollTranscription(jobId);
  }
}

const transcription = await pollTranscription(job_id);
```
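
The recursive poller above places no upper bound on how long it waits, so a job that stays `queued` or `processing` keeps it running indefinitely. The bounded variant below is a TypeScript sketch that assumes giving up after a fixed number of checks is acceptable for your client. `pollJob`, `maxAttempts`, and the injected `fetchJob` function are hypothetical.

```typescript
// Minimal shape of the HTTP response this helper relies on.
interface JobResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Subset of TranscriptionResponse used here.
interface JobResult {
  status: string;
  text: string | null;
  error: string | null;
}

type FetchJob = (url: string) => Promise<JobResponseLike>;

const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical bounded poller: resolves with the transcript. It rejects when the
// job fails, when the status endpoint returns a non-2xx status, or after maxAttempts checks.
async function pollJob(
  baseUrl: string,
  jobId: string,
  fetchJob: FetchJob,
  maxAttempts = 150, // about 5 minutes at 2 s per check (arbitrary choice)
  intervalMs = 2000,
): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchJob(`${baseUrl}/job/${encodeURIComponent(jobId)}`);
    if (!res.ok) {
      throw new Error(`Status check failed with HTTP ${res.status}`);
    }
    const job = (await res.json()) as JobResult;
    if (job.status === "completed") return job.text ?? "";
    if (job.status === "failed") throw new Error(job.error ?? "Transcription failed");
    if (attempt < maxAttempts) await wait(intervalMs);
  }
  throw new Error(`Job ${jobId} not finished after ${maxAttempts} checks`);
}
```

With the global `fetch`, usage looks like `await pollJob("http://localhost:8000", job_id, fetch)`. This sketch surfaces a `500` immediately. Whether that status should instead be retried is a judgment call.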

### 3. Complete Example

```javascript
async function transcribeAudio(audioFile) {
  // Step 1: Upload
  const formData = new FormData();
  formData.append("file", audioFile);

  const uploadResponse = await fetch("http://localhost:8000/transcribe", {
    method: "POST",
    body: formData,
  });

  if (!uploadResponse.ok) {
    const error = await uploadResponse.json();
    throw new Error(error.detail);
  }

  const { job_id } = await uploadResponse.json();

  // Step 2: Poll for results
  while (true) {
    const statusResponse = await fetch(`http://localhost:8000/job/${job_id}`);
    if (!statusResponse.ok) {
      throw new Error(`Status check failed: HTTP ${statusResponse.status}`);
    }
    const result = await statusResponse.json();

    if (result.status === "completed") {
      return {
        text: result.text,
        language: result.language,
        duration: result.duration,
        languageConfidence: result.language_probability, // confidence in the detected language
      };
    } else if (result.status === "failed") {
      throw new Error(result.error);
    }

    // Wait 2 seconds before polling again
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}

// Usage
try {
  const result = await transcribeAudio(myAudioFile);
  console.log("Transcription:", result.text);
  console.log("Language:", result.language);
  console.log("Duration:", result.duration, "seconds");
} catch (error) {
  console.error("Transcription failed:", error.message);
}
```

---

## Notes

- **Job Expiration**: Results are stored for 24 hours after completion
- **Polling Interval**: Poll every 2-5 seconds; more frequent checks do not make processing faster
- **File Size**: No explicit limit is enforced, but larger files take longer to process
- **Processing Time**: Typically 10-30% of the audio duration (e.g., 30 s of audio takes 3-9 s to process)
- **First Request**: The first transcription after a worker starts is slower because the model has to be loaded
- **Concurrent Requests**: Supports 4 concurrent transcriptions (configurable)
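
The polling and processing-time figures above can be turned into client-side timing. The sketch below treats the 2-5 s interval and the 10-30% ratio as rough guides, not guarantees. `pollDelayMs` and `suggestedTimeoutMs` are hypothetical, and the safety factor and the 60 s model-loading allowance are arbitrary illustrative values.

```typescript
// Delay before the nth status check (1-based). It starts at 2 s, grows by 1.5x,
// and is capped at 5 s, which keeps it inside the recommended 2-5 s range.
function pollDelayMs(attempt: number): number {
  const grown = 2000 * Math.pow(1.5, Math.max(0, attempt - 1));
  return Math.min(5000, Math.round(grown));
}

// Hypothetical overall timeout: the upper end of the quoted range (30% of the
// audio length, i.e. 300 ms per second of audio), multiplied by a safety factor,
// plus a fixed allowance for first-request model loading.
function suggestedTimeoutMs(
  audioSeconds: number,
  safetyFactor = 2,
  modelLoadAllowanceMs = 60000,
): number {
  const worstCaseProcessingMs = audioSeconds * 300;
  return Math.round(worstCaseProcessingMs * safetyFactor) + modelLoadAllowanceMs;
}
```

For the 36.65 s clip in the completed-job example, `suggestedTimeoutMs(36.65)` gives about 82 s.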

---

## CORS Configuration

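If a browser frontend is served from a different origin than the API (for example `http://localhost:3000`), the browser blocks its requests unless CORS is enabled in `main.py`: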

```python
from fastapi.middleware.cors import CORSMiddleware

# `app` is the FastAPI instance already created in main.py
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # Your frontend URL
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```